Computer Science and Mathematics

Article
Computer Science and Mathematics
Computer Science

Xie Yang, Jun Yin, Xiujuan Ma, Luqian Wang

Abstract: In reality, infectious diseases rarely spread in isolation; multiple diseases often spread concurrently. Infection with one disease may alter an individual's susceptibility to, or transmissibility of, another disease through mechanisms such as immunosuppression or symptom superposition. Furthermore, population contact structures are not limited to simple pairwise interactions but also involve group events with simultaneous exposure, such as family gatherings. Traditional network models, which are based on pairwise interactions, struggle to capture these higher-order interaction structures and the coupling mechanisms among multiple pathogens. This paper therefore presents a hypergraph SIS transmission model based on dynamic thresholds. It systematically investigates the transmission dynamics of multiple diseases on uniform and non-uniform hypergraph structures in BA and ER networks. Based on co-infection scenarios, three coupling mechanisms are proposed: positive coupling, which promotes co-infection; negative coupling, which suppresses co-infection; and no coupling, under which the diseases spread independently. To account for differences in initial disease intensity, three comparison groups are designed: high-low, same-high and same-low. This paper analyses the combined effects of coupling mechanisms, threshold variations and network structural characteristics on the transmission thresholds, propagation rates and infection scales of multiple diseases. The findings are validated on a dengue fever transmission network.
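The abstract does not spell out the update rule, so the following is only a minimal sketch of two-disease SIS dynamics on a hypergraph under stated assumptions: a susceptible node in a hyperedge can be infected only when the infected fraction of the other members reaches a threshold; the "dynamic" threshold is modelled (hypothetically) as relaxing linearly with current prevalence; and the coupling factors 1.5, 1.0 and 0.5 for positive, no and negative coupling are illustrative values, not the paper's.

```python
import random

# Hypothetical coupling factors: they multiply the infection probability of a
# node that already carries the other disease.
COUPLING = {"positive": 1.5, "none": 1.0, "negative": 0.5}


def step(hyperedges, infected, beta, mu, theta, coupling, rng):
    """One synchronous SIS update for exactly two diseases on a hypergraph.

    infected maps each disease name to its set of infected nodes; every
    hyperedge is assumed to contain at least two nodes.
    """
    factor = COUPLING[coupling]
    new = {d: set(s) for d, s in infected.items()}
    for d, s in infected.items():
        other = next(o for o in infected if o != d)
        # Recovery: I -> S with probability mu[d].
        for v in s:
            if rng.random() < mu[d]:
                new[d].discard(v)
        # Group exposure: a susceptible member can be infected only if the
        # infected fraction of the other members reaches theta[d].
        for edge in hyperedges:
            for v in edge:
                if v in s:
                    continue
                members = [u for u in edge if u != v]
                frac = sum(u in s for u in members) / len(members)
                if frac > 0 and frac >= theta[d]:
                    p = beta[d] * (factor if v in infected[other] else 1.0)
                    if rng.random() < min(1.0, p):
                        new[d].add(v)
    return new


def simulate(hyperedges, seeds, beta, mu, theta0, coupling, steps=50, seed=0):
    """Return the prevalence of each disease after every step."""
    rng = random.Random(seed)
    n = len({v for edge in hyperedges for v in edge})
    infected = {d: set(s) for d, s in seeds.items()}
    history = []
    for _ in range(steps):
        # Hypothetical dynamic threshold: it relaxes as prevalence grows.
        theta = {d: theta0[d] * (1 - len(infected[d]) / n) for d in infected}
        infected = step(hyperedges, infected, beta, mu, theta, coupling, rng)
        history.append({d: len(s) / n for d, s in infected.items()})
    return history
```

Running `simulate` with the same seeds and rates under `"positive"`, `"none"` and `"negative"` coupling gives one prevalence trajectory per setting, which is the kind of comparison the abstract describes.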

Article
Computer Science and Mathematics
Computer Science

Anil Kumar Chengali, Seetharamulu B.

Abstract: Human Activity Recognition (HAR) is the automatic prediction of daily human activities such as walking, running, cooking, and office work. It can greatly benefit the medical sector, especially caregivers of the elderly, personal health care aides, and those maintaining patient records for future reference. A HAR system can use (a) video or still images of people performing activities, or (b) body-motion data collected by sensors in smart devices (accelerometers, gyroscopes, etc.), smart homes, eldercare settings, and the Internet of Things (IoT). The proposed HAR applications rely heavily on the latest developments in AI, such as Deep Learning (DL) and Swarm Intelligence (SI) optimisation algorithms. Here, we use open-source data from wearable sensors to build a reliable HAR system that combines DL and SI. A lightweight feature extraction method, Residual Bidirectional Long Short-Term Memory (Res-BiLSTM), has been developed. We also present novel feature selection approaches, based on the Marine Predator Algorithm (MPA), to choose the best subset of features. We assess the performance of the proposed model on three publicly available HAR datasets from the UCI Machine Learning Repository and compare it against several DL architectures recently proposed for HAR. The proposed model surpasses other state-of-the-art approaches, achieving an accuracy of 96.92%, precision of 95.45%, recall of 94.07%, and F1 score of 96.15% across the three datasets. The approach also outperforms several reported results in robustness and activity detection; in addition to adapting to activity-specific characteristics, it uses fewer parameters and achieves higher accuracy.
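The abstract does not describe preprocessing, but sensor-based HAR pipelines feeding a recurrent model usually segment the raw stream into fixed-length windows first. The sketch below assumes the UCI-HAR convention (50 Hz sampling, 2.56 s windows with 50% overlap, i.e. 128 samples per window) and labels each window by majority vote; `sliding_windows` is a hypothetical helper, not the authors' code.

```python
from collections import Counter


def sliding_windows(signal, labels, window=128, overlap=0.5):
    """Split a sampled signal into fixed-length, overlapping windows.

    The defaults mirror the UCI-HAR setup (2.56 s at 50 Hz, 50% overlap);
    each window receives the most frequent per-sample label inside it.
    """
    step = int(window * (1 - overlap))
    if step <= 0:
        raise ValueError("overlap must be smaller than 1")
    segments, segment_labels = [], []
    for start in range(0, len(signal) - window + 1, step):
        segments.append(signal[start:start + window])
        segment_labels.append(Counter(labels[start:start + window]).most_common(1)[0][0])
    return segments, segment_labels
```

Each resulting window (one per sensor channel in practice) would then be the input sequence for a model such as a BiLSTM.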

Article
Computer Science and Mathematics
Computer Science

Porter E. Coggins III

Abstract: MD-Hill-SPN is the first Hill-based construction to combine a multi-tier diffusion mix layer, a memory-hard KDF, and a simultaneous multi-metric empirical evaluation. Two independent runs of the full metric suite yield: (a) full plaintext avalanche from round 1 (mean 63.97–64.67 of 128 bits, ideal 64); (b) the differential-probability sampling floor of 2×10⁻⁵ reached at round 4 (50,000 of 50,000 output differences distinct in both sessions); (c) algebraic-degree lower-bound saturation at the maximum observable value from round 1; (d) linear bias indistinguishable from random (combined exceedance 4.40%, below the 4.55% noise floor); and (e) branch numbers at the Singleton (MDS) bound for every tier (B = 5 for 4×4, B = 9 for 8×8, B = 17 for 16×16), computed exhaustively over weight-1 inputs. MD-Hill-SPN therefore moves beyond a purely theoretical construction to one that passes a defined empirical evaluation suite (avalanche, differential sampling, linear-bias probing, algebraic-degree lower bounds, and MDS branch numbers) under single-key, known-plaintext conditions with fixed parameters, an evaluation that no prior Hill cipher variant has reported in full.
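The branch-number claim can be checked mechanically. The sketch below works over GF(257), an assumed prime field (Hill ciphers over bytes usually work modulo 256, which is not a field). It computes the weight-1 quantity described in the abstract and, separately, the standard MDS test that every square submatrix is nonsingular. Restricting to weight-1 inputs yields only an upper bound on the branch number: the 2×2 all-ones matrix scores 3 on weight-1 inputs, yet the input (1, 256) maps to zero, so its true branch number is 2. The Cauchy matrix is used only as a known MDS example.

```python
from itertools import combinations

P = 257  # assumption: a prime modulus, so that Z_P is a field


def det_mod(rows, p=P):
    """Determinant of a square matrix over GF(p) via Gaussian elimination."""
    m = [list(r) for r in rows]
    n = len(m)
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det = det * m[c][c] % p
        inv = pow(m[c][c], p - 2, p)  # Fermat inverse of the pivot
        for r in range(c + 1, n):
            f = m[r][c] * inv % p
            for k in range(c, n):
                m[r][k] = (m[r][k] - f * m[c][k]) % p
    return det % p


def is_mds(matrix, p=P):
    """A square matrix is MDS iff every square submatrix is nonsingular."""
    n = len(matrix)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[matrix[r][c] for c in cols] for r in rows]
                if det_mod(sub, p) == 0:
                    return False
    return True


def weight1_branch(matrix, p=P):
    """min over weight-1 inputs of wt(x) + wt(Mx): an upper bound on the branch number."""
    n = len(matrix)
    return min(1 + sum(1 for r in range(n) if matrix[r][c] % p) for c in range(n))


def cauchy(n, p=P):
    """Cauchy matrix 1 / (x_i - y_j) with distinct x_i, y_j; such matrices are MDS."""
    xs, ys = range(n), range(n, 2 * n)
    return [[pow((x - y) % p, p - 2, p) for y in ys] for x in xs]
```

For `cauchy(4)`, both functions agree on the Singleton value B = 5; for matrices that are not known to be MDS, only `is_mds` settles the question.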

Article
Computer Science and Mathematics
Computer Science

Owusu Nyarko-Boateng, Adebayo Felix Adekoya, Isaac Kofi Nti, Romanus Daanaah

Abstract: Intrusion Detection Systems (IDS) remain essential for enterprise and IoT security, yet traditional approaches struggle to balance accuracy, scalability, and adaptability to evolving threats. Signature-based systems such as Suricata efficiently identify known threats but fail against zero-day and polymorphic attacks. Conversely, standalone machine learning models detect novel attacks but often suffer from high false-positive rates and lack the contextual reasoning necessary for operational triage. This research addresses these limitations by proposing a hybrid intrusion detection system that integrates Suricata for signature-based detection, ensemble machine learning models for anomaly detection, and the Diamond Model of Intrusion Analysis (DMIA) for contextual reasoning. The system was implemented and evaluated using the CIC-IoT 2023 and TabularIoTAttack 2024 datasets. Experiments demonstrated high detection accuracy (98.6%), precision (98.1%), recall (97.6%), and a low false-positive rate (1.2%). The DMIA integration uniquely contextualized each intrusion attempt by mapping it to the adversary, capability, infrastructure, and victim dimensions, enhancing both situational awareness and response prioritization. The proposed system bridges the gap between academic IDS models and operationally deployable security platforms by combining deterministic rule-based detection with probabilistic machine learning and structured contextual analysis, offering a robust framework for next-generation enterprise and IoT network defense.
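A minimal sketch of how such a hybrid decision might be fused, assuming a signature match (for example, the name of a fired Suricata rule) takes precedence over an ML anomaly score, and an alert is then mapped onto the four Diamond Model features. The field names, the 0.5 threshold and the `fuse` function are hypothetical; the paper's actual fusion logic is not described in the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiamondEvent:
    """The four core features of the Diamond Model of Intrusion Analysis."""
    adversary: str
    capability: str
    infrastructure: str
    victim: str


@dataclass
class Verdict:
    alert: bool
    source: str  # "signature", "anomaly" or "none"
    score: float
    diamond: Optional[DiamondEvent] = None


def fuse(flow: dict, signature: Optional[str], ml_score: float, threshold: float = 0.5) -> Verdict:
    """Combine an optional signature match with an ML anomaly score for one flow."""
    if signature is not None:
        # Deterministic rule hits take precedence over the probabilistic score.
        source, score, capability = "signature", 1.0, signature
    elif ml_score >= threshold:
        source, score, capability = "anomaly", ml_score, "unclassified anomalous behaviour"
    else:
        return Verdict(alert=False, source="none", score=ml_score)
    event = DiamondEvent(
        adversary=flow.get("actor", "unknown"),  # attribution is rarely known at detection time
        capability=capability,
        infrastructure=flow["src_ip"],
        victim=f'{flow["dst_ip"]}:{flow["dst_port"]}',
    )
    return Verdict(alert=True, source=source, score=score, diamond=event)
```

Keeping the Diamond record on the verdict lets downstream triage sort alerts by victim or infrastructure rather than by raw score alone.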

Article
Computer Science and Mathematics
Computer Science

Maath Frman, Kholood J. Moulood, Mustafa Noori, Ekram H. Hasan, Oqbah Salim Atiyah, Qutaiba Alasad

Abstract: The rapid advancement of modern Deep Neural Networks (DNNs) has played a crucial role in aiding humans in many real-world applications, yet DNN hardware accelerators have been shown to be vulnerable to malicious attacks. One particularly severe attack inserts a hardware Trojan (HT) into DNN accelerator hardware in the supply chain, enabling attackers to stealthily manipulate model predictions. In this paper, we present a stealthy HT that is difficult to detect and has a significant impact on the performance of DNN models. To achieve this, we introduce the Sensitivity-Based Weight Selection (SBWS) algorithm, a novel technique that adapts machine learning (ML) sensitivity analysis to identify and modify a smaller number of weights than previous work, namely those with the highest impact on DNN performance. We evaluate the proposed attack on five DNN models and multiple datasets using two payload types, weight zeroing and sign flipping, and report the results on various security metrics. The experimental results show average accuracy reductions of 26.7% for the zeroing attack and 48.1% for the sign-flipping attack, an overall average of 37.4%, computed over five independent runs per dataset with a standard deviation below 2%. Sign flipping consistently outperforms zeroing because it preserves the magnitudes of the attacked weights while inverting their signs, thereby disrupting the learned decision boundaries more severely and amplifying error propagation in subsequent layers. These results significantly exceed those of previous random-weight perturbation attacks (typically 12–20% drops) and other targeted HT approaches, while incurring lower computational and hardware resource overheads. This work provides a more effective and scalable method for assessing the vulnerability of DNN accelerators under real-world supply-chain threat models.
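SBWS's exact sensitivity measure is not given in the abstract. The sketch below assumes a common first-order Taylor saliency, |w · ∂L/∂w|, to rank weights, then applies either payload to the top-k entries. Gradients are taken as given inputs rather than computed, and all function names are hypothetical.

```python
def sensitivity_scores(weights, grads):
    """First-order Taylor estimate of each weight's impact on the loss: |w * dL/dw|."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def select_top_k(weights, grads, k):
    """Indices of the k most sensitive weights, most sensitive first."""
    scores = sensitivity_scores(weights, grads)
    return sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)[:k]


def apply_payload(weights, indices, mode):
    """Return a copy of weights with the chosen payload applied at the given indices."""
    attacked = list(weights)
    for i in indices:
        if mode == "zero":
            attacked[i] = 0.0
        elif mode == "flip":
            attacked[i] = -attacked[i]  # keeps |w| but reverses its contribution
        else:
            raise ValueError(f"unknown payload: {mode}")
    return attacked
```

The sign-flip branch illustrates the abstract's explanation: a zeroed weight removes its contribution, whereas a flipped weight subtracts the same magnitude, doubling the change in that term of the pre-activation.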

Article
Computer Science and Mathematics
Computer Science

Chuanzhen Wang, Meade Cleti, Pete Jano

Abstract: De novo protein generation has transformative potential in therapeutic design, enzyme engineering, and synthetic biology. While diffusion-based and flow matching approaches have made progress, they typically operate at a single resolution and lack mechanisms for incorporating functional constraints. We introduce ProHiFlo, a hierarchical flow matching framework with three innovations: (1) coarse-to-fine generation that models backbone geometry before refining to all-atom coordinates, reducing computational cost while maintaining accuracy; (2) functional guidance that leverages pretrained predictors to steer generation toward desired properties without retraining; and (3) an adaptive SE(3)-equivariant architecture for efficient multi-scale processing. Experiments on unconditional generation, motif scaffolding, and functional design demonstrate state-of-the-art performance while requiring 4× fewer sampling steps. On enzyme active-site scaffolding, ProHiFlo achieves a 58.9% success rate, compared with 41.2% for RFDiffusion.
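For readers new to flow matching, the sketch below shows the generic linear (rectified-flow style) conditional path, its regression target, and a fixed-step Euler sampler on plain vectors. It is not ProHiFlo: SE(3) frames, the coarse-to-fine hierarchy and functional guidance are not represented, and the `oracle` velocity in the usage note is a hypothetical stand-in for a trained network.

```python
def linear_path(x0, x1, t):
    """Conditional path x_t = (1 - t) * x0 + t * x1 and its target velocity x1 - x0."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut


def cfm_loss(velocity, x0, x1, t):
    """Mean squared error between a velocity model and the conditional target at time t."""
    xt, ut = linear_path(x0, x1, t)
    vt = velocity(xt, t)
    return sum((v - u) ** 2 for v, u in zip(vt, ut)) / len(ut)


def euler_sample(velocity, x0, steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with a fixed number of Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = [xi + dt * vi for xi, vi in zip(x, velocity(x, t))]
    return x
```

With the straight-line field toward a fixed target, `oracle = lambda x, t: [(b - a) / (1 - t) for a, b in zip(x, target)]`, even four Euler steps land exactly on `target`; the number of sampling steps is the cost that claims such as "4× fewer sampling steps" refer to.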

Article
Computer Science and Mathematics
Computer Science

Dimitrios P. Panagoulias, Andrei Ionut Damian, Cosmin Stamate, Vitalli Toderian, Petrica Butusina, Alessandro De Franceschi, Cristian Bleotiu, Evangelos Sakkopoulos, Evangelia-Aikaterini Tsichrintzi

Abstract: Clinical AI systems increasingly rely on large-scale medical imaging data processed through complex and continuously evolving machine learning pipelines. While cloud-based infrastructures enable scalability and performance, they introduce challenges related to trust, auditability, consent management, and reproducibility, particularly in multi-institutional and longitudinal clinical settings. This paper proposes a decentralized governance framework for clinical AI that uses blockchain technology as a verification and policy-enforcement overlay, without decentralizing the storage of sensitive medical data. Raw images and derived clinical artifacts remain within secure cloud-based repositories, while cryptographic proofs, processing manifests, access events, and consent policies are recorded on a distributed ledger. In addition, the framework supports decentralized computation infrastructures for training and experimentation on de-identified or synthetic datasets, enabling scalable and collaborative AI development without exposing patient data. The proposed architecture enables verifiable provenance, tamper-evident audit trails, programmable consent enforcement, and deterministic reconstruction of AI-assisted clinical decisions, while preserving regulatory compliance, system performance, and clinical usability. To demonstrate the feasibility and practical benefits of the approach, we present a medical imaging use case comprising nine simulated clinical scenarios with about 43,000 inferences over patient groups ranging from 50 to 1,000 subjects. The proposed framework achieved a mean Governance Quality Index (a composite measure of security, compliance, performance, and auditability) of 0.93, indicating production-ready performance, with governance overhead below 11 ms per operation and throughput exceeding 220 requests per second. In summary, our approach separates governance from data and computation.
Blockchain is used solely as a tamper-evident governance layer that anchors consent, access, and provenance through cryptographic commitments, while medical data and AI pipelines remain unchanged within existing cloud infrastructures. This enables verifiable auditability and reproducibility without decentralizing clinical data or disrupting workflows.
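The tamper-evident property can be illustrated with a plain hash chain, a deliberately simplified stand-in for a distributed ledger. In the sketch below, each entry stores a SHA-256 commitment over the previous entry's hash and a governance event, so editing any stored event breaks verification. Only digests of manifests are recorded, mirroring the separation of governance from data described above; `AuditLedger` and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def commit(record) -> str:
    """SHA-256 commitment over a canonical JSON encoding of a record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


class AuditLedger:
    """Append-only hash chain of governance events (consent, access, inference)."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"prev": prev, "event": event}
        entry = {**body, "hash": commit(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every link; any edited event or reordered entry fails."""
        prev = GENESIS
        for e in self.entries:
            body = {"prev": e["prev"], "event": e["event"]}
            if e["prev"] != prev or e["hash"] != commit(body):
                return False
            prev = e["hash"]
        return True
```

For example, an inference event can carry `commit({"steps": ["resize", "infer"]})` instead of the pipeline manifest itself, so the ledger never holds patient data.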

Article
Computer Science and Mathematics
Computer Science

Shuo Cai, Yanggan Gu, Zihao Wang, Yuanyi Wang, Yibo Yan, Wenjun Wang, Yuhang Liu, Guanghao Zhu, Sirui Huang, Ming Li, and one more author

Abstract: Model fusion integrates the capabilities of source models into a single target model. As the open-source AI ecosystem matures, Hugging Face now hosts more than 2M models. This growing pool provides a rich basis for model reuse and capability integration. Yet existing surveys often cover only separate parts of this space, and they provide neither a unified definition nor a systematic taxonomy. This survey defines model fusion and organizes prior work into three levels: parameter-level, representation-level, and behavior-level fusion. We also review related metrics, benchmarks, and applications, summarize current challenges, and identify future directions. Our goal is to provide a clear map of this area and support future work on model fusion. A comprehensive list of papers about model fusion is available at https://github.com/Baicaihaochi/Awesome-Model-Fusion-Survey.
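Two simple instances of parameter-level fusion are (weighted) weight averaging and task-vector arithmetic, θ = θ_base + λ Σ_k (θ_k − θ_base). The sketch below represents each model as a hypothetical dictionary mapping parameter names to flat lists of floats with identical shapes; real checkpoints would use tensors and require architecturally aligned models.

```python
def weighted_average(models, coeffs=None):
    """Parameter-level fusion by (weighted) averaging of aligned parameters."""
    coeffs = coeffs or [1.0 / len(models)] * len(models)
    return {
        name: [sum(c * m[name][i] for c, m in zip(coeffs, models)) for i in range(len(values))]
        for name, values in models[0].items()
    }


def task_arithmetic(base, finetuned, lam):
    """theta = theta_base + lam * sum_k (theta_k - theta_base), per parameter entry."""
    return {
        name: [b + lam * sum(m[name][i] - b for m in finetuned) for i, b in enumerate(values)]
        for name, values in base.items()
    }
```

Averaging treats all source models symmetrically, whereas task arithmetic keeps a shared base and adds scaled task-specific deltas, which is why it needs the base checkpoint as an extra input.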

Review
Computer Science and Mathematics
Computer Science

Asifullah Khan, Hamna Asif, Maha Tariq, Hania Khan, Inaya Imran, Nayab Ibrahim, Aqsa Asif, Zunaira Rauf, Aleesha Zainab, Saleha Jamshed

Abstract: Large language models (LLMs) have had a transformative impact on diverse domains such as healthcare, software development and autonomous systems by enabling natural language understanding and reasoning. However, single-agent architectures limit their potential because they are non-collaborative and less able to perform complex, multi-disciplinary tasks that require teamwork, role division and adaptive decision-making. To counter these shortcomings, Multi-Agent Systems (MAS) have emerged as a paradigm of contemporary artificial intelligence in which autonomous agents interact, reason and communicate within dynamic and complex environments. With growing experimentation with, and deployment of, LLMs and generative AI, MAS frameworks are increasingly being applied to real-world problems. This survey offers an in-depth comparative study of three advanced MAS frameworks: AutoGen, Langroid and MetaGPT. It examines their architectural design, communication standards, scalability, applicability and integration into emerging real-world technologies. It presents standard benchmark criteria and performance measures (latency, throughput and memory utilization) through detailed case studies across diverse application domains such as e-commerce, medicine and AI-assisted software engineering. Moreover, it highlights important issues such as explainability, security, computational cost and human-in-the-loop requirements in designing such systems. As a synthesis of theoretical developments and practical implementation experience, it provides a systematic decision-making guide and a basis for further MAS research and development.
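The surveyed frameworks each expose their own APIs, which are not reproduced here. Instead, this framework-agnostic sketch shows the basic round-robin pattern many multi-agent setups build on: agents with distinct roles take turns on a shared transcript until one emits a stop token, with per-turn latency recorded as one of the measures mentioned above. `Agent`, `run_round_robin` and the stub agents are hypothetical; real agents would wrap LLM calls.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker, content)


@dataclass
class Agent:
    name: str
    respond: Callable[[List[Message]], str]  # in practice this would wrap an LLM call


def run_round_robin(agents, task, max_turns=6, stop_token="DONE"):
    """Let agents take turns on a shared transcript; record per-turn latency."""
    transcript: List[Message] = [("user", task)]
    latencies = []
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]
        start = time.perf_counter()
        reply = agent.respond(transcript)
        latencies.append(time.perf_counter() - start)
        transcript.append((agent.name, reply))
        if stop_token in reply:
            break
    return transcript, latencies


# Deterministic stand-ins for LLM-backed agents with a role division.
coder = Agent("coder", lambda t: "def add(a, b):\n    return a + b")
reviewer = Agent("reviewer", lambda t: "LGTM DONE" if "def " in t[-1][1] else "Please write code.")
```

Calling `run_round_robin([coder, reviewer], "Write an add function.")` ends after the reviewer's approval; swapping the stubs for model-backed callables leaves the orchestration unchanged.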

Article
Computer Science and Mathematics
Computer Science

Hassnae Aberkane, Latifa Boubekri, Karim El Hafidi, Mohammed Chaouki Abounaima

Abstract: Selecting appropriate healthcare waste (HW) treatment technologies is a challenging multi-criteria decision-making problem characterized by uncertainty, conflicting evaluation criteria, and limited decision-support information. Existing approaches often rely on subjective weighting schemes and may provide rankings that are sensitive to variations in expert judgments. This study proposes ARAS-H-IW, a hybrid decision-support framework that combines Additive Ratio Assessment under Hesitant Fuzzy Sets (ARAS-H) with an Inverse Weighting (IW) mechanism capable of inferring criterion weights directly from expert preference constraints through constrained quadratic optimization. To evaluate its practical applicability, the framework was applied to a real-world healthcare waste management case study using data provided by the Regional Health Directorate of Fez-Meknes (Morocco). A fully reproducible Python-based web platform was developed to automate the complete analytical workflow, including hesitant fuzzy modeling, multi-expert ranking aggregation, inverse weight inference, comparative evaluation, sensitivity analysis, Monte Carlo robustness assessment, and automated reporting. The proposed framework identified centralized autoclaving as the most favorable treatment alternative, followed by regional outsourcing and microwave disinfection. Comparative analyses with TOPSIS, VIKOR, PROMETHEE II, and EDAS showed strong agreement on the best- and worst-ranked alternatives. Sensitivity and Monte Carlo analyses further demonstrated the stability and robustness of the obtained rankings, while all expert aggregation strategies converged toward the same consensus ordering. The results highlight the capacity of ARAS-H-IW to generate transparent, reproducible, and robust decision recommendations under uncertainty.
The proposed framework provides a practical tool for healthcare waste technology assessment and offers a promising foundation for supporting evidence-based decision-making in regional healthcare waste management.
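For orientation, the sketch below implements the crisp ARAS procedure that ARAS-H generalises: an optimal row is prepended, cost criteria are inverted, columns are sum-normalised, weighted scores S_i are formed, and each alternative's utility degree is K_i = S_i / S_0. Hesitant fuzzy evaluations and the inverse-weighting step are not modelled; the weights are simply given, and the three-alternative data in the usage note are invented.

```python
def aras(matrix, weights, benefit):
    """Crisp ARAS: return the utility degree K_i = S_i / S_0 of each alternative.

    matrix[i][j] is the (positive) score of alternative i on criterion j;
    benefit[j] is True for benefit criteria and False for cost criteria.
    """
    n_crit = len(weights)
    # Row 0 is the optimal alternative: best observed value of each criterion.
    optimal = [
        max(row[j] for row in matrix) if benefit[j] else min(row[j] for row in matrix)
        for j in range(n_crit)
    ]
    rows = [optimal] + [list(r) for r in matrix]
    # Cost criteria are inverted so that larger is always better.
    rows = [[v if benefit[j] else 1.0 / v for j, v in enumerate(r)] for r in rows]
    col_sums = [sum(r[j] for r in rows) for j in range(n_crit)]
    scores = [sum(weights[j] * r[j] / col_sums[j] for j in range(n_crit)) for r in rows]
    return [s / scores[0] for s in scores[1:]]
```

For instance, `aras([[10, 5], [8, 4], [6, 8]], [0.5, 0.5], [True, False])` ranks the second alternative first by a narrow margin, because its low cost outweighs its lower benefit score.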

Review
Computer Science and Mathematics
Computer Science

Mahade Hasan, Farhana Yasmin

Abstract: Evolutionary algorithms (EAs) are widely used nature-inspired optimization methods capable of solving complex and high-dimensional problems across science and engineering. Foundational paradigms such as genetic algorithms, genetic programming, differential evolution, evolution strategies, and evolutionary programming have expanded into multi-objective, surrogate-assisted, hybrid, and large-scale variants, broadening their applicability to dynamic and data-driven environments. This survey provides a structured review of EAs from a domain-centric perspective, focusing on how different techniques are tailored to engineering problems. Applications are examined across renewable energy, civil and structural engineering, electronics, industrial optimization, healthcare, robotics, and smart cities. We present an updated taxonomy of classical and emerging algorithms, consolidate recent application studies, and review benchmarking and reproducibility practices essential for fair evaluation. Key challenges including scalability, constraint handling, and exploration–exploitation balance are discussed alongside future directions such as EA–deep learning integration, federated optimization, and interpretable evolution. This survey offers an updated view of EAs and their engineering relevance.
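As a concrete instance of one foundational paradigm listed above, here is a minimal DE/rand/1/bin minimiser applied to the sphere function. The parameter values (F = 0.5, CR = 0.9, 20 individuals) are common textbook defaults, not recommendations from the survey, and this variant replaces individuals in place within a generation.

```python
import random


def sphere(x):
    """Simple convex benchmark with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def differential_evolution(f, dim, bounds=(-5.0, 5.0), pop_size=20, F=0.5, CR=0.9,
                           generations=300, seed=0):
    """Classic DE/rand/1/bin minimiser; returns (best_vector, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if rng.random() < CR or d == j_rand else pop[i][d]
                     for d in range(dim)]
            trial = [min(max(v, lo), hi) for v in trial]  # clip to the search box
            # Greedy one-to-one selection.
            ft = f(trial)
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Fixing `seed` makes runs repeatable, which is one small part of the reproducibility practices the survey reviews.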

Article
Computer Science and Mathematics
Computer Science

Suntei Leang, Rattanak Visoth Lin, Lihour Nov

Abstract: Depression among university students has emerged as a significant mental health concern worldwide. Traditional assessment methods primarily rely on self-reported questionnaires and clinical evaluations, which may not provide scalable and continuous monitoring. Recent advances in machine learning have created opportunities to identify depression-related behavioral patterns through digital activity data. However, many predictive models operate as black-box systems that provide limited interpretability. This study presents a comparative analysis of explainable artificial intelligence (XAI) approaches for depression risk assessment using digital behavior data collected from university students. Logistic Regression and Random Forest classifiers were developed using behavioral indicators including screen time duration, social media usage frequency, nighttime device usage, sleep patterns, self-perceived digital dependency, and perceived academic impact. Model performance was evaluated using accuracy, precision, recall, and F1-score metrics. SHapley Additive exPlanations (SHAP) were applied to improve transparency and interpretability. Experimental results indicate that Logistic Regression achieved 94.74% accuracy, while Random Forest achieved 100% accuracy on the testing dataset. SHAP analysis identified academic impact as the most influential predictor of depression risk. The findings demonstrate that explainable machine learning models can support transparent and ethical depression risk assessment in higher education environments.
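SHAP approximates Shapley values; for a handful of features they can be computed exactly by enumerating coalitions, which is what the sketch below does. Replacing an absent feature with a single baseline value is a simplifying assumption (SHAP typically averages over a background dataset), and the model passed in is any hypothetical scoring function.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The values satisfy efficiency: they sum to f(x) − f(baseline). For a linear model they reduce to w_i (x_i − baseline_i), which is one way to sanity-check an explanation pipeline before applying it to a Random Forest.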

Article
Computer Science and Mathematics
Computer Science

Narendra Kumar Upadhyay, Sudhakar Periyasamy, Vinod Kumar

Abstract: The advent of Cloud-Fog-Edge computing has transformed distributed data processing by performing computation closer to end devices. Owing to resource constraints at edge nodes and the dynamic nature of fog-assisted communication, secure and efficient group key distribution and batch verification in such decentralized systems remain major challenges. Many existing protocols based on the Chinese remainder theorem (CRT) mask the group key with a straightforward scalar product and no non-linear transformation, and hence fail to provide semantic security. Others suffer from architectural overhead because they require distinct, independent sets of moduli equations with multiple mathematical structures for different network layers, which increases computational overhead, limits scalability and causes synchronization delays during frequent node joins and leaves. To mitigate these challenges, this paper proposes a unified distributed CRT-based protocol for Cloud-Fog-Edge environments. Our protocol introduces a non-linear two-factor encryption logic that incorporates a unique secret parameter for every edge node and breaks the algebraic linearity of the CRT. Additionally, it uses a single set of moduli equations across the Cloud-Fog-Edge network, which drastically reduces computation and storage costs at the fog layer, and it achieves O(1) rekeying cost. Formal security analysis using ProVerif and the Real-or-Random (ROR) model demonstrates that our protocol has considerable security advantages. To demonstrate its practicality, an ESP32-based simulation on Wokwi is used to verify the correctness of group key distribution, retrieval, and batch message verification. The performance analysis shows that our protocol outperforms existing protocols in computation cost, communication cost, security and applicability for resource-constrained Cloud-Fog-Edge computing networks.
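The paper's two-factor construction is not reproduced here. The sketch below shows only the general idea of CRT-based group key broadcast with a non-linear, per-node mask: each edge node holds a pairwise-coprime modulus and a secret, the server solves one CRT system, and each node recovers the key from a single broadcast value. The SHA-256 mask, the 16-bit key size and all function names are assumptions made for illustration; this is not a vetted protocol.

```python
import hashlib
from math import gcd, prod

KEY_BITS = 16  # assumption: toy key size so residues stay below every modulus


def mask(secret: bytes, nonce: bytes) -> int:
    """Hypothetical non-linear per-node mask derived from a node secret and a session nonce."""
    digest = hashlib.sha256(secret + nonce).digest()
    return int.from_bytes(digest, "big") % (1 << KEY_BITS)


def crt(residues, moduli):
    """Smallest non-negative X with X = r_i (mod m_i) for pairwise coprime m_i."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)
    return x % M


def distribute(group_key, nodes, nonce):
    """nodes: list of (modulus, secret). Returns the single broadcast value X."""
    moduli = [m for m, _ in nodes]
    if any(gcd(a, b) != 1 for i, a in enumerate(moduli) for b in moduli[i + 1:]):
        raise ValueError("moduli must be pairwise coprime")
    if any(m <= (1 << KEY_BITS) for m in moduli):
        raise ValueError("every modulus must exceed the masked key range")
    residues = [group_key ^ mask(s, nonce) for _, s in nodes]
    return crt(residues, moduli)


def recover(X, modulus, secret, nonce):
    """An edge node reduces X by its own modulus and removes its mask."""
    return (X % modulus) ^ mask(secret, nonce)
```

Here the XOR with a hash output is what makes each node's residue a non-linear function of its secret; a node that leaves simply drops out of the next `distribute` call.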

Article
Computer Science and Mathematics
Computer Science

Khem Poudel, Lilly-Sophie Schmidt, Clifford N. Jones, Saroj Baral, Thuan Nhan, Satish Wagle, Jorge Vargas

Abstract: Tennis match prediction has been studied extensively, yet the literature offers no controlled comparison of Elo ratings, classical machine learning, and deep neural networks under identical experimental conditions, leaving practitioners without clear guidance on model selection. We address this gap with a unified empirical study on 133,138 professional men’s tennis matches from the Association of Tennis Professionals tour (1968–2024). Four approaches are evaluated on the same temporally split data with a common 16-feature set and an aligned evaluation protocol: an enhanced Elo rating system, ten classical machine learning algorithms, seventeen deep neural network configurations spanning 207,000 to 21,000,000 parameters, and a hybrid Elo–machine learning (ELO-ML) approach that augments classical learners with three Elo-derived features. A tuned Elo baseline alone reaches 65.87% accuracy, the best of ten classical machine learning algorithms reaches 66.30%, seventeen deep neural network configurations cluster at 66.15–66.22%, and the hybrid ELO-ML approach reaches 67.52% (McNemar’s test, p < 0.001 for all ELO-ML pairwise comparisons). All four approaches sit within a 1.65 pp band whose upper edge lies below the 70–72% accuracy commonly cited for bookmaker odds, indicating that pre-match prediction under universally available features is a difficult task in which Elo alone already captures most of the predictable signal and algorithmic sophistication adds only marginal headroom. Deep neural networks deliver substantially better probability calibration than the other approaches (Expected Calibration Error 0.0077 vs. 0.0142). Model capacity exhibits sharply diminishing returns: all seventeen network configurations, spanning a 100-fold range in parameter count (207,000 to 21,000,000), fall within a 0.07 pp accuracy band. 
The study establishes a controlled benchmark for tour-level tennis prediction, quantifies how narrow the headroom above Elo actually is, provides modest but consistent empirical support for the Statistically Enhanced Learning framework, and supplies deployment-ready operating points for sports analytics practitioners.
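The "enhanced Elo" configuration is not specified in the abstract. For reference, the sketch below shows only the standard Elo expected-score and update rules with an assumed K-factor of 32; any surface, recency or margin adjustments used in the study are not represented.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_winner: float, r_loser: float, k: float = 32.0):
    """Return the updated (winner, loser) ratings after one match."""
    e_w = expected_score(r_winner, r_loser)
    delta = k * (1.0 - e_w)  # a bigger upset moves the ratings further
    return r_winner + delta, r_loser - delta
```

The pre-match value of `expected_score` is itself a calibrated probability forecast, which is why Elo-derived quantities make natural extra features in a hybrid Elo–ML model.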

Article
Computer Science and Mathematics
Computer Science

Olga Tarasyuk, Anatoliy Gorbenko, Oleksandr Gordieiev, Artem Akulynichev, Rishad Shafik, Alex Yakovlev

Abstract: Human activity recognition (HAR) based on smartphone and wearable sensor data is commonly addressed using statistical learning methods and deep neural networks, which often provide strong predictive performance but at the expense of limited interpretability and substantial computational and energy requirements. Such limitations reduce their suitability for deployment in practical sensing environments where model decisions must be transparent, verifiable and executable on resource-constrained devices. In this work, we investigate the Convolutional Tsetlin Machine (CTM) for multimodal HAR using the UCI-HAR dataset. The Tsetlin Machine is a novel neuro-symbolic machine learning approach that offers two important advantages over many conventional machine learning methods: (i) it learns logic-based decision rules that are human-readable and formally verifiable, and (ii) it operates with comparatively low computational complexity, making it well suited to efficient, low-power on-device learning. The study systematically analyses the contribution of different feature modalities by decomposing the inertial signal space into semantically defined subsets according to (i) sensor source: accelerometer or gyroscope; (ii) physical component: body or gravity; and (iii) coordinate: x, y or z. A separate CTM classifier was trained for each modality and for their combination to determine the relative discriminative value of each modality group for activity classification. In addition to predictive performance, the study emphasizes the interpretability of the CTM model, which expresses each decision in the form of propositional clauses, thereby enabling visualization and direct inspection of the modality-specific patterns supporting each activity class. Owing to its symbolic structure and modest computational demands, the CTM provides a principled framework for the design of explainable, resource-efficient and deployable HAR systems.
The proposed work therefore contributes toward trustworthy multimodal sensing by jointly addressing predictive performance, interpretability and suitability for embedded and mobile platforms.
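To make the clause-based interpretability concrete, the sketch below shows inference in a vanilla (non-convolutional) Tsetlin Machine: each clause is a conjunction of Boolean literals, positive-polarity clauses vote for a class and negative-polarity clauses vote against it, and the class with the highest vote sum wins. The convolutional variant additionally evaluates clauses over patches, which is not shown; the Booleanized feature names and the tiny hand-written model are hypothetical, not learned from UCI-HAR.

```python
def clause_fires(clause, x):
    """A clause is a conjunction of literals, each given as (feature index, negated?)."""
    return all((not x[i]) if negated else bool(x[i]) for i, negated in clause)


def class_sum(positive, negative, x):
    """Positive-polarity clauses vote for the class, negative ones against it."""
    return sum(clause_fires(c, x) for c in positive) - sum(clause_fires(c, x) for c in negative)


def predict(model, x):
    """model maps each class label to (positive clauses, negative clauses)."""
    return max(model, key=lambda label: class_sum(*model[label], x))


def describe(clause, names):
    """Render a clause as a human-readable propositional rule."""
    return " AND ".join(("NOT " if neg else "") + names[i] for i, neg in clause)
```

With `names = ["body_acc_var_high", "gyro_var_high", "gravity_z_dominant"]`, a "laying" clause `[(2, False), (0, True)]` reads as `gravity_z_dominant AND NOT body_acc_var_high`, exactly the kind of modality-specific rule that can be inspected directly.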

Article
Computer Science and Mathematics
Computer Science

Juan Bonastre-Egea, Andrés Bueno-Crespo, Juan Morales-García

Abstract: Air quality forecasting and environmental health research at urban and regional scales depend on the combination of measurements from heterogeneous sensor networks, yet the construction of integrated multi-source datasets is rarely described or released as a self-contained deliverable. This paper presents an open dataset that combines four sensor-derived sources covering the whole of Spain over the period 2022 to 2024: hourly air quality observations from the 588 stations of the national network operated by the Ministerio para la Transición Ecológica y el Reto Demográfico (MITECO), daily meteorological records from the Agencia Estatal de Meteorología (AEMET), daily mobility indicators derived from anonymised mobile telephony events published by the Ministerio de Transportes y Movilidad Sostenible (MITMA) at the municipality level, and a calendar of national and Autonomous-Community public holidays. The processing pipeline harmonises sources that differ in temporal resolution, spatial codification and quality regime into a tidy hourly table indexed by station and timestamp, with a fixed feature schema of 56 variables per record. Air quality stations are paired with their nearest AEMET station through a three-tier distance rule, and the daily exogenous features are aligned to the air quality time axis through a two-variant temporal-alignment scheme (lag-and-expand to the hourly grid for the hourly release, same-calendar-day join for the daily release). A complementary daily-resolution variant of the dataset is also released, with 72 columns and the same feature schema except for the air quality block, which is aggregated to daily mean, minimum and maximum. The integrated dataset contains approximately 14 million hourly records across the 588 stations and is released on Zenodo (DOI 10.5281/zenodo.20196221) under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. 
It is intended as a substrate for research on air quality forecasting, environmental epidemiology and multi-source data fusion at nationwide scale.
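The three-tier distance rule and the lag length are not given in the abstract. The sketch below pairs an air quality station with its nearest meteorological station by great-circle (haversine) distance and assigns a tier using hypothetical cut-offs of 10 km and 25 km. `lag_and_expand` shows one plausible reading of lag-and-expand (a one-day lag, with each daily value repeated over the 24 hours of the target day), also an assumption.

```python
from datetime import date, datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# Hypothetical tier limits in km; anything farther falls into tier 3.
TIERS = [(1, 10.0), (2, 25.0)]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def pair_station(aq_lat, aq_lon, met_stations):
    """met_stations: dict id -> (lat, lon). Returns (id, distance_km, tier)."""
    best_id, (lat, lon) = min(
        met_stations.items(), key=lambda kv: haversine_km(aq_lat, aq_lon, *kv[1]))
    d = haversine_km(aq_lat, aq_lon, lat, lon)
    tier = next((t for t, limit in TIERS if d <= limit), 3)
    return best_id, d, tier


def lag_and_expand(daily: "dict[date, float]", lag_days: int = 1):
    """Map each hour of day d + lag_days to the daily value observed on day d."""
    hourly = {}
    for day, value in daily.items():
        target = day + timedelta(days=lag_days)
        for h in range(24):
            hourly[datetime(target.year, target.month, target.day, h)] = value
    return hourly
```

Lagging the daily exogenous features by one day avoids leaking same-day information into an hourly forecast, which is a common reason for such an alignment scheme.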

Article
Computer Science and Mathematics
Computer Science

Samson Mayeem, Benjamin Tei Partey, Godson Rashid Dawuni, Osei-Wusu Augustine

Abstract: Aspect-based sentiment analysis (ABSA) is increasingly the granularity at which customer feedback is consumed, and recent work has pushed the field rapidly toward transformer- and graph-based architectures [3,5–7]. However, most modern ABSA approaches assume either a closed manually curated aspect taxonomy or a fully supervised aspect extractor trained on benchmark corpora such as SemEval. Neither assumption holds in low-resource emerging-market settings, where aspects must be discovered from the corpus itself, annotation budgets are negligible, and class distributions can be unexpectedly skewed. This article introduces Soft-Aspect ABSA, a probabilistic, topic-model-agnostic framework that promotes unsupervised topic-model output to first-class aspects via a temperature-controlled softmax over topic-membership posteriors. We instantiate the framework with a spectral-clustering plus non-negative matrix factorisation (NMF) substrate on a corpus of 292 Google Play Store reviews of a Ghanaian retail-bank mobile application (April–September 2024). The corpus exhibits an inverted class imbalance (30.8% positive / 69.2% negative under a keyword-bootstrap rule) and a four-cluster topic decomposition. A baseline TF-IDF embedding head trained with binary cross-entropy collapses to the majority class on the held-out test set: accuracy 0.6949, minority-class F1 0.000, Matthews correlation 0.000, despite a ROC-AUC of 0.934 that indicates well-ranked probabilities. The framework licenses two closed-form remediations — class-weighted cross-entropy and focal loss [28] — that we evaluate empirically on the same head. Focal loss with γ = 1 lifts minority-class F1 from 0.000 to 0.818, Matthews correlation from 0.000 to 0.746, and ROC-AUC to 0.986, demonstrating that the framework correction is not merely formal but is recoverable on the case-study data. 
We also run a bootstrap stability protocol for cluster-count selection (B = 50) that flags the silhouette-max k* = 4 as only moderately stable (I_stab = 0.64). The contribution is methodological: a reusable scaffold for low-resource ABSA pipelines in which the aspect set is not given a priori.
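The abstract does not state whether the temperature-controlled softmax acts on raw posteriors or on their logarithms. The sketch assumes log-posteriors, so T = 1 returns the posterior unchanged and T < 1 sharpens it toward the dominant topic. The focal loss follows the standard binary form FL = −α_t (1 − p_t)^γ log p_t, which reduces to cross-entropy when γ = 0 and no α is given. Function names are hypothetical.

```python
from math import exp, log


def soft_aspects(posteriors, temperature=1.0, eps=1e-12):
    """Temperature-scaled softmax over log topic posteriors: p_k^(1/T) / sum_j p_j^(1/T)."""
    logits = [log(max(p, eps)) / temperature for p in posteriors]
    m = max(logits)  # subtract the max for numerical stability
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(p, y, gamma=1.0, alpha=None, eps=1e-12):
    """Binary focal loss for one example with predicted positive probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    a_t = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
    # (1 - p_t)^gamma down-weights easy, confidently correct examples.
    return -a_t * (1.0 - p_t) ** gamma * log(max(p_t, eps))
```

Because well-classified majority-class examples contribute almost nothing once γ > 0, the gradient is dominated by the hard minority-class examples, which is the mechanism behind the reported recovery of minority-class F1.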

Article
Computer Science and Mathematics
Computer Science

Priya Pal

,

Vivek Shukla

,

Atul .

,

Divya Mishra

,

Rishabh Tiwari

,

Mehul Kumar Das

Abstract: Phishing is one of the most common cybersecurity threats. In a phishing attack, the attacker creates a fraudulent website or manipulates a URL in order to obtain a user's sensitive information, such as login credentials, payment details, or personal data. Phishing attacks target online users by baiting them into clicking a fraudulent link, and they are a growing concern for users worldwide. We propose a lightweight and fast phishing detection framework that identifies URLs with phishing content. The framework extracts a reduced set of URL features that capture the characteristics, structures, and patterns seen in URLs. These features are used as input to three supervised machine learning classifiers: Logistic Regression, Decision Tree, and Random Forest. The classifiers were evaluated on four classification metrics: accuracy, precision, recall, and F1-score. Within this lightweight comparative framework, the Random Forest classifier was the most accurate at phishing detection while keeping computational requirements minimal. The purpose of the framework is to offer real-time cybersecurity protection in web browsers, and it proved scalable and efficient.
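The abstract does not list the exact URL features, so the sketch below uses a hypothetical reduced set of common lexical indicators (length, dots, hyphens, digits, '@', HTTPS scheme, raw-IP host, subdomain count, path depth). It relies only on the Python standard library and illustrates the kind of extraction step described above; it is not the authors' feature set.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url):
    """Hypothetical reduced set of lexical URL features (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),  # '@' can hide the real host
        "uses_https": int(parts.scheme == "https"),
        "host_is_ip": host_is_ip,  # raw IP address instead of a domain name
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


print(url_features("http://192.168.0.1/login/verify"))
```

Each dictionary can be turned into a numeric row with a fixed key order and passed to any of the three classifiers; because every feature is computed from the URL string alone, extraction needs no page download, which fits the real-time browser use case.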

Article
Computer Science and Mathematics
Computer Science

Goo Yun Hai

,

Abdul Salam Shah

,

Noor Ul Amin

Abstract: This paper introduces a Convolutional Neural Network (CNN) for multi-class image classification on the Fashion-MNIST dataset. The model reaches a test accuracy of 90.20% with only about 0.11 million parameters; this lightweight model significantly outperforms a classical baseline (HOG+SVM: 85%) while remaining computationally efficient. The CNN uses three convolutional blocks of increasing filter depth (32, 64 and 128) with ReLU activation, max pooling, batch normalization and dropout regularization, followed by a fully connected classification head trained with the Adam optimizer. These architectural concepts are then generalised to AI-related cybersecurity, namely deep-learning-based Network Intrusion Detection Systems (NIDS) that classify network traffic flows as benign or attack traffic, a problem that shares the core challenges of Fashion-MNIST (hierarchical spatial feature extraction, multi-class discrimination, and imbalanced class difficulty). State-of-the-art CNN-based IDS achieve 94.8–97.5% detection accuracy (Attention-CNN-LSTM; Nature Scientific Reports, 2025), 98.5% with combined host and network data (Springer Nature, 2024), and 99.67% on encrypted malicious traffic.
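The stated filter depths allow a back-of-the-envelope check of the parameter budget. The sketch assumes one 3×3 convolution per block, a single-channel (grayscale) input and bias terms; none of these details is stated in the abstract, and the classification head is omitted because its size is not given.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size=3, bias=True):
    """Weights (k * k * C_in * C_out) plus one bias per output filter."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def conv_stack_param_count(filters=(32, 64, 128), in_channels=1, kernel_size=3):
    """Per-layer and total parameters for a chain of single-convolution blocks."""
    counts = []
    for out_channels in filters:
        counts.append(conv2d_param_count(in_channels, out_channels, kernel_size))
        in_channels = out_channels  # the next block consumes this block's feature maps
    return counts, sum(counts)


print(conv_stack_param_count())  # ([320, 18496, 73856], 92672)
```

Under these assumptions the three convolutions contribute 320 + 18,496 + 73,856 = 92,672 parameters, so the remainder of the reported ~0.11 million would sit in batch normalization and the classification head.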

Article
Computer Science and Mathematics
Computer Science

Rafik Aliev

,

Oleg Huseynov

,

Aziz Nuriyev

Abstract: The concept of a Z-number was introduced to formalize partially reliable information. A Z-number represents a linguistic evaluation of the value of a random variable together with the associated degree of reliability. The latter is defined as a fuzzy restriction on the value of a probability measure, since the actual probability distribution is unknown. Lotfi Zadeh formalized an extension principle for computation with Z-numbers based on fuzzy and probabilistic arithmetic, and noted that the problem of computing with Z-numbers is easy to formulate but difficult to solve. Since then, a series of theoretical studies and practical applications of Z-numbers have been proposed. However, the computational complexity of Z-numbers remains a challenge. Because the actual probability distribution is unknown, a set of probability distributions must be considered, and this set is the main source of computational complexity. In this study, we outline a new approach to computation with Z-numbers that relies on the concept of imprecise probability. Specifically, we use a lower prevision (the lower envelope of a set of probability measures) as the basis for computation. This choice is motivated by the one-to-one correspondence between coherent lower previsions and closed convex sets of probability measures. Experimental results show that the proposed approach reduces computational complexity compared with existing methods.
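The lower envelope mentioned above can be made concrete on a finite possibility space. The sketch assumes the set of probability measures is given by finitely many extreme points (a credal set) and evaluates a gamble f by its lowest expectation over them. How the reliability component of a Z-number generates that set is not shown, and the helper names and example distributions are hypothetical.

```python
from fractions import Fraction


def expectation(prob, gamble):
    """Linear prevision E_P[f] = sum_i P(omega_i) * f(omega_i) on a finite space."""
    return sum(p * f for p, f in zip(prob, gamble))


def lower_prevision(extreme_points, gamble):
    """Lower envelope: minimum expectation over the credal set's extreme points.

    E_P[f] is linear in P, so its minimum over the convex hull of finitely
    many distributions is attained at one of those extreme points.
    """
    return min(expectation(p, gamble) for p in extreme_points)


def upper_prevision(extreme_points, gamble):
    """Conjugate upper prevision: upper(f) = -lower(-f)."""
    return -lower_prevision(extreme_points, [-f for f in gamble])


# Two assumed distributions over three outcomes; exact arithmetic via Fraction.
credal_set = [
    (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)),
    (Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)),
]
first_outcome = (1, 0, 0)  # indicator gamble of the first outcome
print(lower_prevision(credal_set, first_outcome),
      upper_prevision(credal_set, first_outcome))  # 1/5 1/2
```

For the indicator of the first outcome, the two assumed distributions give a lower probability of 1/5 and an upper probability of 1/2; any mixture of them yields a value in between, which is why optimising over the extreme points, rather than over the whole set, suffices.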

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated