Computer Science and Mathematics

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Guoxiu He

,

Jinquan Zheng

,

Fangqing Han

Abstract: Selection bias in Large Language Models has emerged as a fundamental obstacle to reliability, fairness, and robustness. Defined operationally as systematic decision changes under equivalence-preserving input perturbations, including option permutation, label renaming, candidate-order swapping, and evidence relocation, the phenomenon is examined across four representative task families: multiple-choice question answering, in-context classification, LLM-as-a-Judge evaluation, and long-context or retrieval-augmented generation. Selection bias is first analyzed through a causal chain that links biased behavior to training-data priors, architectural asymmetries, and post-training amplification. Existing mitigation methods are then synthesized through an intervention-level taxonomy spanning inference-time calibration and prompt optimization, architecture-level modification, and training-level debiasing. The evaluation landscape is unified by summarizing commonly used metrics, benchmark families, and application settings, with the lack of standardized and cross-task-comparable protocols identified as a central bottleneck. Selection bias is best understood as a failure of invariance under non-semantic reformatting, and mitigating it is essential for trustworthy, robust, and selection-invariant language models.
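The operational definition above (systematic decision changes under equivalence-preserving perturbations such as option permutation) can be made concrete with a tiny invariance check. This is an illustrative sketch, not the paper's protocol: `choose` stands in for any model's option-selection head, and the toy "models" and options are hypothetical.

```python
from itertools import permutations

def fluctuation_rate(choose, options):
    """Fraction of option orderings under which the chosen *content*
    differs from the choice on the canonical ordering. `choose` maps an
    ordered option list to the index of the selected option."""
    baseline = options[choose(options)]
    perms = list(permutations(options))
    flips = sum(1 for p in perms if p[choose(list(p))] != baseline)
    return flips / len(perms)

# A position-biased toy "model" that always picks the first option:
always_first = lambda opts: 0
# A content-based toy model that picks the longest option:
longest = lambda opts: max(range(len(opts)), key=lambda i: len(opts[i]))

opts = ["short", "the longest answer", "mid-size"]
```

A selection-invariant model scores 0.0 under this metric, while the position-biased one flips whenever the permutation moves a different option into the first slot.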

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Thaer Thaher

,

Alaa Sheta

,

Huthaifa I. Ashqar

,

Hamouda Chantar

,

Salim Surani

Abstract: Background/Objectives: Obstructive sleep apnea (OSA) is a common and serious sleep-related disorder that causes repeated interruptions in breathing during sleep. Traditional diagnostic methods, such as polysomnography, are accurate but costly, time-consuming, and unsuitable for large-scale screening. This study proposes and evaluates a lightweight diagnostic framework based on an Extreme Learning Machine (ELM) optimized by a set of basic and advanced metaheuristic optimizers (GA, RUN, MEO, CL-PSO, HI-WOA, GWO, HGS, HHO, SeaHO, MGO, and the hybrid GWO--WOA). The model aims to improve early detection of OSA using demographic and clinical data. Methods: Two real datasets were employed to train and evaluate the proposed framework: (i) a clinical OSA dataset with 274 subjects and 31 demographic/anthropometric and sleep-related predictors, and (ii) a public strongly imbalanced Sleep-Disordered Breathing (SDB) dataset with 500 subjects and 10 structured predictors. Metaheuristic algorithms are used to optimize ELM weights and biases, addressing the instability of random initialization and improving model generalization. The optimized models are evaluated against eight baseline classifiers, including Logistic Regression (LR), k-nearest neighbours (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), XGBoost (XGB), and a standard ELM classifier. Results: Results show that metaheuristic optimization improves ELM on the OSA dataset, increasing ROC-AUC from 0.6527 to about 0.73 and accuracy from 0.6573 to about 0.69–0.70, while on the highly imbalanced SDB dataset, it yields modest ROC-AUC gains (from 0.5132 to about 0.544–0.548) with small decreases in accuracy and F1-score. Conclusions: The proposed framework provides a fast, lightweight, and cost-effective screening tool for large-scale, resource-limited healthcare settings, enabling early OSA detection and preventive intervention.
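The ELM-plus-optimizer pipeline described above can be sketched minimally. The snippet below uses synthetic data and plain random search over the hidden weights as a stand-in for the paper's GA/GWO/HHO-style metaheuristics; every name, size, and number here is illustrative, not the study's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit_predict(Xtr, ytr, Xte, W, b):
    """Extreme Learning Machine: fixed random hidden layer, closed-form
    least-squares output weights."""
    H = np.tanh(Xtr @ W + b)
    beta = np.linalg.pinv(H) @ ytr
    return (np.tanh(Xte @ W + b) @ beta > 0.5).astype(int)

# Toy separable data standing in for the clinical OSA predictors.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Xtr, ytr, Xte, yte = X[:150], y[:150], X[150:], y[150:]

# Minimal "metaheuristic": random restarts over hidden weights/biases,
# addressing the instability of a single random initialization.
# (A real run would score candidates on a validation split, not the test set.)
best_acc, hidden = 0.0, 20
for _ in range(30):
    W = rng.normal(size=(5, hidden))
    b = rng.normal(size=hidden)
    acc = (elm_fit_predict(Xtr, ytr, Xte, W, b) == yte).mean()
    best_acc = max(best_acc, acc)
```

The point the sketch makes is structural: because ELM output weights are a closed-form solve, a metaheuristic only has to search the hidden-layer parameters, which keeps each candidate evaluation cheap.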

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Robert Campbell

Abstract: Anthropic’s April 2026 Claude Mythos Preview release established a new operational category of frontier AI systems—Mythos-class—whose capability profile (extended-context reasoning over codebases, recursive self-correction, native system-tool integration, and agentic scaffolding at deployable scale) renders the dominant AI safety paradigms insufficient as sole controls. Reinforcement learning from human feedback, post-generation output filtering, contractual access vetting, and human-in-the-loop supervision were each calibrated to a generation of systems that did not exhibit autonomous cyber capability at the levels Mythos-class systems now demonstrate, and each is insufficient as a sole control against the new category under the threat assumptions specified here. This paper develops a defense-in-depth reference architecture for detecting and mitigating Mythos-class capability across enterprise and federal deployment surfaces. Detection is structured as a three-tier framework spanning pre-deployment evaluation, deployment-time access and telemetry, and runtime behavioral signatures. Mitigation is structured as four concentric layers: governance, cryptographic enforcement, architectural isolation, and operational monitoring. The cryptographic enforcement layer specifies an authority-binding architecture using post-quantum-attested provenance to bind output release to a verifiable authority chain. The architecture is mapped to the NIST AI Risk Management Framework, the NIST Cybersecurity Framework (CSF) 2.0, and the CISA Zero Trust Maturity Model, and is demonstrated against three application cases: post-quantum cryptography migration, federal AI supply-chain assurance, and critical-infrastructure operational technology defense. Limitations and a research agenda for empirical calibration are stated explicitly.

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yiwen Zhu

,

Lihe Liu

,

Jiaqian Yu

,

Di Zhang

Abstract: The proliferation of large language model (LLM) agents has enabled increasingly complex multi-step automation; however, composing multiple agents into coherent systems introduces significant orchestration challenges that remain poorly documented. This survey examines LLM-based multi-agent orchestration from 2023 through early 2026 (literature cutoff: March 2026). We propose a three-topology, one-adaptivity taxonomy—centralized, decentralized, and hierarchical coordination topologies, each optionally augmented with a dynamic/adaptive control axis—grounded in classical multi-agent systems theory and recent empirical evidence. We compare four leading frameworks (LangGraph, CrewAI, AutoGen/Microsoft Agent Framework, and OpenAI Agents SDK) along axes directly relevant to practitioners: state-management granularity, token cost structure, failure-recovery options, and design philosophy. The emerging protocol stack is examined in terms of why MCP (agent-to-tool) and A2A (agent-to-agent) occupy complementary layers, how the ACP–A2A merger signals protocol convergence, and where ANP's decentralized-discovery design fits. Production design considerations—state management, task planning, error handling, scalability, and security—are evaluated with reference to published benchmarks. We close by identifying five open challenges and proposing a six-dimension evaluation framework for multi-agent coordination quality. This paper provides practitioners with a decision framework spanning taxonomy, framework selection, protocol adoption, and production deployment.

Concept Paper
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Gabriel Axel Montes

Abstract: AGI is often framed as a problem of aligning model objectives with human values or constraining agent behavior. That framing becomes incomplete once AI systems move into the infrastructures through which people and institutions perceive, evaluate, remember, and decide. Cognitive integrity is introduced as the first infrastructure of intelligence, in humans and AGI-mediated systems alike: the evolving capacity of a bounded system to maintain calibrated attention, trust, contestability, and decision under pressure. The central risk is not boundary change as such, but maladaptive boundary reorganization: transitions that leave persons or institutions unable to reform a viable, reality-linked, self-directing boundary after coupling with AI. This reframing surfaces a conceptual vocabulary for AGI governance centered on integrity boundaries and health, failed reintegration, cognitive rails, and successor-safe continuity.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Ali Kia

,

Batuhan Uzunoglu

,

Silvana Andreescu

,

Masudul H Imtiaz

Abstract: Wearable electrochemical biosensors often produce voltammetric signals that are corrupted by noise and long-term drift. Effective on-device denoising is critical to improve signal quality and detect anomalies due to sensor drift or interference. This paper explores lightweight TinyML models for denoising and drift detection in wearable sensor voltammograms under the strict memory constraints of microcontrollers. We apply compact 1D convolutional and dense autoencoder networks, as well as a PCA-based reconstruction, to remove noise and identify drifting signals. Using a public NIST dataset of cyclic voltammograms with added synthetic noise and artifacts, we evaluate each model's denoising performance (signal reconstruction MSE) and drift/anomaly detection capability (ROC-AUC) versus its memory footprint (quantized int8 model size). Results show that a small Conv1D autoencoder (8 KB weights) can reduce noise by 75% and achieve 0.89 AUC for drift detection, approaching the performance of a larger dense autoencoder (35 KB) and outperforming PCA. We observe a trade-off between model size and generalization: the larger autoencoder nearly perfectly flagged anomalies (AUC 1.0), but smaller models remain competitive while using 4–6× less memory. These findings demonstrate that drift-resilient signal enhancement can be achieved on-device with minimal resource usage, enabling more robust wearable electrochemical sensing.
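Of the three approaches compared above, the PCA-based reconstruction is simple enough to sketch end to end. The snippet uses synthetic Gaussian-peak "voltammograms" in place of the NIST cyclic voltammetry data, so the shapes, noise level, and drift artifact are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "voltammograms": smooth peaks standing in for cyclic
# voltammetry curves, plus additive measurement noise.
t = np.linspace(-1, 1, 128)
clean = np.stack([np.exp(-(t - c) ** 2 / 0.05)
                  for c in rng.uniform(-0.5, 0.5, 300)])
noisy = clean + rng.normal(0, 0.1, clean.shape)

# PCA reconstruction: project onto the top-k principal components and
# back; noise outside the signal subspace is discarded.
mu = noisy.mean(axis=0)
U, S, Vt = np.linalg.svd(noisy - mu, full_matrices=False)
k = 8
denoised = (noisy - mu) @ Vt[:k].T @ Vt[:k] + mu

mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((denoised - clean) ** 2)

# Drift detection via reconstruction error: a drifted signal (here a
# synthetic baseline-slope artifact) lies off the learned subspace.
err = lambda x: np.mean((((x - mu) @ Vt[:k].T @ Vt[:k] + mu) - x) ** 2)
drifted = clean[0] + 0.5 * t
```

Thresholding `err` is the anomaly score; the AUC comparison in the paper evaluates exactly this kind of score against labeled drift cases.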

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Numan Saeed

,

Salma Hassan

,

Shadab Khan

,

Mohammad Areeb Qazi

,

Klaus H. Maier-Hein

,

Salman Khan

,

Mohammad Yaqub

Abstract: Clinical care is interventional. Physicians must decide how a patient's trajectory is likely to change under competing actions, not only estimate risk under the status quo. Most deployed medical artificial intelligence, however, remains optimized for classification or passive forecasting. We argue that the useful next abstraction is the medical world model, a learned system that represents patient state, models how that state evolves over time, accepts interventions such as drugs, doses, and procedures, and rolls trajectories forward under those interventions. Progress toward this goal is currently fragmented across digital twins, disease-trajectory models, surgical simulators, and generative electronic health record forecasting, with each community addressing a subset of the necessary ingredients. We organize the field with a capability ladder spanning representation, forecasting, single-arm projection, comparative treatment evaluation, and planning. Across imaging, physiology, longitudinal electronic health records, and surgical simulation, a consistent maturity pattern emerges. Representation and forecasting are widespread, narrow treatment-conditioned simulators are appearing, credible counterfactual comparison remains scarce, and validated treatment planners are absent. Once a model simulates what would happen under alternative treatments, causal validity becomes the binding constraint. Scaling data and generative modeling alone will not solve this. Credible medical world models also require explicit action definitions, causal design, and staged clinical validation with regulatory oversight. In this paper, the medical world model is a claims-to-evidence framework for simulation that can inform clinical decisions.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Lydia Castronovo

,

Giuseppe Filippone

,

Giuseppe Giacopelli

,

Gianmarco La Rosa

,

Marco Elio Tabacchi

Abstract: In Multi-Criteria Group Decision-Making (MCGDM), the assignment of weights to decision-makers is a crucial but methodologically delicate step, especially when the group includes both human experts and artificial experts such as intelligent agents, Artificial Intelligences (AIs) or Large Language Models (LLMs). Existing weighting strategies are often either difficult to interpret or poorly suited to heterogeneous groups of evaluators. In this paper, we investigate a fuzzy rule-based approach to expert weighting, building on a previously introduced methodological framework and focusing here on its application-oriented validation. The proposed method models expert weighting as a Fuzzy Rule-Based System (FRBS) in which relevant properties of the experts are represented by linguistic variables and combined through interpretable IF–THEN rules. In this way, weighting policies can be expressed transparently and adapted to the requirements of the decision domain. The framework produces normalised weights in the interval [0,1], which can then be incorporated into standard MCGDM aggregation procedures. To assess the operational behaviour of the approach, we consider an application involving the weighting of four LLMs evaluated over multilingual performance, computational requirements, and open-sourceness. The experiments show that the proposed framework is flexible enough to encode different weighting policies and that changes in the rule base produce clear and interpretable changes in the resulting rankings. This confirms both the practical usability of the method and its suitability for contexts in which multiple, potentially competing, objectives must be balanced explicitly. Overall, the paper provides an application-oriented study of an FRBS-based weighting scheme for artificial experts, highlighting its interpretability, adaptability, and potential relevance for contemporary MCGDM settings.
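A minimal flavor of the FRBS weighting idea can be sketched with two linguistic inputs and three IF-THEN rules. The membership functions, rule base, and output centroids below are hypothetical placeholders, not the paper's rule base; the point is only that the policy is readable and the output lands in [0, 1].

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def expert_weight(perf, eff):
    """Tiny two-input fuzzy rule-based weighting sketch.
    Inputs in [0, 1] (e.g. multilingual performance, efficiency);
    output is a normalised expert weight in [0, 1].
    Illustrative rules:
      IF perf is HIGH AND eff is HIGH THEN weight is HIGH  (0.9)
      IF perf is HIGH AND eff is LOW  THEN weight is MED   (0.6)
      IF perf is LOW                  THEN weight is LOW   (0.2)"""
    low = lambda x: tri(x, -0.5, 0.0, 0.6)
    high = lambda x: tri(x, 0.4, 1.0, 1.5)
    rules = [
        (min(high(perf), high(eff)), 0.9),
        (min(high(perf), low(eff)), 0.6),
        (low(perf), 0.2),
    ]
    # Weighted-average defuzzification over rule output centroids.
    num = sum(strength * centroid for strength, centroid in rules)
    den = sum(strength for strength, _ in rules)
    return num / den if den else 0.0
```

Changing a rule's consequent (say, demoting "strong but costly" experts from 0.6 to 0.4) transparently shifts the resulting weights, which is the interpretability property the abstract emphasizes.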

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Chenhui Xia

,

Yeqin Shao

,

Meiqin Che

,

Guoqing Yang

Abstract: Accurate traffic sign detection is important for the safety of autonomous driving systems. However, fully supervised methods require a large amount of manual annotation, which is cost-prohibitive and time-consuming. Semi-supervised methods employ a small amount of labeled data and a large amount of unlabeled data to train the models, hence largely reducing the annotation costs. However, these methods have the following challenges: (1) with an imbalanced long-tail class distribution of traffic signs, they tend to achieve poor performance on tail classes; (2) they often fail to detect small traffic signs. To solve these issues, we propose a Semi-Supervised Traffic Sign Detection method with Dynamic Pseudo-Label Selection and Gated-Feature-Fusion-based Proposal Refinement. Firstly, we design a Class-Distribution-based Dynamic Pseudo-Label Selection module (CD-DPLS) to select pseudo-labels for different classes based on the class distribution information, which reduces the tendency to select more pseudo-labels from head classes instead of tail classes, thereby improving the tail class detection performance. Secondly, we employ a Gated-Feature-Fusion-based Proposal Refinement strategy (GFF-PR) to refine detection proposals by fusing different-scale features with a gating mechanism, which facilitates the detection of small traffic signs. In addition, we use an Adaptive-Weight Focal Loss (AWFL), with which the weight of each pseudo-label is determined by the ratio between its classification confidence and the corresponding class-specific classification-confidence threshold. Experiments on traffic sign datasets demonstrate that the proposed method outperforms state-of-the-art semi-supervised approaches, with mAP50 scores of 11.5% and 36.3% using only 1% and 10% labeled data, respectively.
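The class-distribution-based selection idea can be illustrated with per-class confidence thresholds that loosen for tail classes. The linear schedule below is an illustrative rule of my own, not the paper's CD-DPLS formula; class counts and probabilities are toy values.

```python
import numpy as np

def dynamic_thresholds(class_counts, base=0.9, floor=0.5):
    """Per-class pseudo-label thresholds scaled by class frequency:
    head classes keep a strict threshold, tail classes get a looser one.
    (Illustrative linear schedule, not the paper's exact rule.)"""
    freq = class_counts / class_counts.max()
    return floor + (base - floor) * freq

counts = np.array([1000, 300, 40])      # long-tailed class distribution
thr = dynamic_thresholds(counts)         # strict head, loose tail

# Pseudo-label selection: keep a prediction when its confidence clears
# the threshold of its argmax class.
probs = np.array([[0.95, 0.03, 0.02],    # head class, confident -> keep
                  [0.20, 0.61, 0.19],    # mid class, below 0.62  -> drop
                  [0.30, 0.10, 0.60]])   # tail class, above 0.516 -> keep
cls = probs.argmax(axis=1)
keep = probs.max(axis=1) >= thr[cls]
```

Note the third prediction: a flat 0.9 threshold would discard it, while the class-aware threshold admits it, which is exactly how head-class over-selection is countered.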

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Mulavhelesi Rambauli

,

Thakhani Ravele

,

Caston Sigauke

Abstract: Credit risk modelling is essential for assessing the likelihood of borrower default and supporting informed lending decisions. Despite advances in predictive algorithms, challenges remain in ensuring model transparency, reliability, and robustness to uncertain inputs. This study investigates integrating explainable AI (XAI) and uncertainty quantification (UQ) to enhance interpretability and confidence in credit risk predictions. Three modelling approaches, Logistic Regression, Random Forest, and XGBoost, were evaluated using the Home Equity (HMEQ) dataset, with performance assessed on predictive accuracy, probability calibration, interpretability, and uncertainty handling. Ensemble methods achieved superior predictive performance, exceeding 98% accuracy and yielding near-perfect AUC scores above 0.999, whereas Logistic Regression exhibited substantially lower performance. Calibration analysis revealed a discrepancy between accuracy and probabilistic reliability: Random Forest, despite high accuracy, produced less well-calibrated predictions (ECE = 0.0475), while XGBoost achieved both strong predictive performance and reliable confidence estimates (ECE = 0.0117). Entropy-based uncertainty quantification identified instances where the model’s predictions were highly uncertain, effectively highlighting challenging cases. SHAP and LIME consistently identified DELINQ, DEROG, and DEBTINC as primary drivers of default risk, aligning with established financial risk logic. By combining SHAP, LIME, and entropy-based UQ, this study proposes a unified framework that enhances interpretability, supports regulatory compliance, and increases trust in automated lending systems, emphasising the importance of reliable confidence alongside predictive accuracy.
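The entropy-based flagging of uncertain predictions described above reduces to a few lines. The probabilities and the half-of-max-entropy cutoff below are hypothetical illustrations, not values from the study.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (nats) of each predicted class distribution;
    higher entropy means a less confident prediction."""
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)

probs = np.array([[0.99, 0.01],    # confident default/no-default call
                  [0.55, 0.45]])   # borderline applicant
H = predictive_entropy(probs)

# Flag cases above half of the maximum binary entropy (illustrative cutoff).
uncertain = H > 0.5 * np.log(2)
```

Flagged cases are the "challenging" instances the abstract mentions, the ones a lender would route to manual review rather than auto-decide.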

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Alexandros Miteloudis

,

Ioannis Hatzilygeroudis

Abstract: Decision tree ensembles, such as Random Forests and Gradient Boosting Machines, achieve high predictive accuracy but often suffer from limited transparency due to their structural complexity. This opacity raises interpretability challenges in domains where model understanding, accountability, and trust are essential, and many interpretability/explainability techniques have accordingly been proposed for tree-based ensembles. However, although there are plenty of surveys or overviews concerning interpretability/explainability in artificial intelligence or machine learning in general, there are very few surveys or overviews on interpretability/explainability for tree-based ensembles. This paper provides an overview of recent approaches to interpretability and explainability in decision tree ensembles. We present two categorizations: one based on the kind of technique/architecture used and the second based on the level of scope. The former is a unified taxonomy of acquired (or post-hoc) and inherent methods, further analyzed at two more levels. The latter concerns the distinction between local (or instance-related) and global (or model-related) methods. We additionally provide a survey of the interpretability/explainability methods/techniques used in various application domains, such as healthcare, finance, law, and privacy preservation. This overview clarifies the current landscape of interpretable/explainable ensemble learning, explicitly addressing emerging challenges. Ultimately, it aims to support researchers and practitioners in selecting and developing ensemble models that move beyond the traditional accuracy-interpretability trade-off, aligning predictive power with strict regulatory, operational, and domain-specific transparency requirements.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Parvani Mokhammad

,

Mohd Tauheed Khan

Abstract: Air pollution poses a serious environmental and public health problem in Bishkek, Kyrgyzstan, especially during the winter months when the concentration of particulate matter increases dramatically. Despite the urgency of the problem, there are fewer than eight monitoring stations in the city, which leaves large urban areas without proper air quality control. This article presents the first systematic study of image-based AQI assessment for Bishkek, exploring whether transfer learning models can extract visual cues related to environmental pollution from on-site urban photographs under real-world uncontrolled conditions. Two hybrid deep learning architectures, VGG16 and EfficientNetB0, each augmented with scalar PM2.5 input data, were trained and evaluated on a locally collected dataset of 1,014 image–AQI pairs. EfficientNetB0 consistently outperformed VGG16 on all three evaluation metrics, reducing RMSE by 15.5% (66.49 vs. 78.71) and MAE by 16.6% (49.00 vs. 58.78). Both models demonstrated a partial predictive signal in the low-to-moderate AQI range, confirming that atmosphere-related visual features can be detected even with small, locally sourced datasets. The performance limitations reflect the scale of the dataset and the sparse sensor infrastructure rather than shortcomings of the studied architectures, which is consistent with similar pilot studies conducted under comparable data constraints. This work establishes a baseline and a methodological framework for future image-based air quality monitoring in Central Asia and identifies key bottlenecks that should be addressed in future work: dataset size, label noise caused by geographic mismatches between sensors and images, and monitoring-station density.

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Xiao Yu

,

Yichen Zhang

,

Mingzhang Wang

,

Shifang Zhao

,

Weizhe Liu

,

Yuyang Yin

,

Zhongwei Ren

,

Ning An

,

Xinglong Wu

,

Hao Liu

+7 authors

Abstract: Acquiring world knowledge directly from visual observation is fundamental to Artificial General Intelligence (AGI). To support this capability, the Vision World Model (VWM) has emerged as a key paradigm, which learns how the world evolves over time from visual streams. However, recent progress has been driven by diverse research communities, resulting in inconsistent problem formulations, disconnected taxonomies, and divergent evaluation protocols. We argue that addressing this gap requires a conceptual shift: vision should not be treated merely as an input modality, but as the primary driver shaping how world models are represented, learned, and evaluated. Guided by this vision-centric perspective, we introduce a unified framework that organizes VWM research into three core components: vision encoding, knowledge learning, and controllable simulation, and use it to analyze existing model designs and evaluation methodologies. Finally, we outline future research directions that emphasize stronger physical and causal grounding, more meaningful evaluation beyond visual appearance, and scaling toward more general and reliable world modeling capabilities.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Danut Dragos Damian

,

Felicia Michis

,

Luminita Moraru

Abstract: This study proposes a transparent, data-driven framework for behavior recognition based exclusively on IMU measurements, hypothesizing that vehicular jerk-based features can help in differentiating driving behavior. Unlike studies relying on direct jerk values, our approach derives novel findings from jerk-based features. For rolling windows of 300 samples, a comprehensive set of statistical and dynamic descriptors is extracted, including amplitude, variance, standard deviation, coefficient of variation, standard error, skewness, and kurtosis, as well as jerk-based features such as jerk_std, jerk_variance, jerk_amplitude, and jerk_spikes. Statistical analysis is used to identify features with strong discriminative power. Effect sizes, measured by Cohen’s d, quantify the difference between normal and aggressive driving styles. The selected features are used to compute the Driving Score (DS) and provide a driver’s profile. Experimental results reveal a correlation between lower DS scores (< 50) and windows characterized by high jerk variability, large amplitude fluctuations, and frequent spikes. Conversely, higher DS scores (>70) indicate smooth and stable motion patterns. The robustness of the proposed framework is evaluated using several machine learning classifiers as baselines, with the most important jerk-based features as inputs. For the aggressive driver class, the DBS model reports a Recall of 0.952 and an F1 of 0.925. For the normal driver class, the DBS model reports a Recall of 0.839 and an F1 of 0.879. The model has a total accuracy of 0.907. Also, Logistic Regression and ensemble models like XGB and RF perform well. The proposed framework offers an explainable, computationally efficient alternative to conventional machine-learning classifiers for identifying aggressive drivers.
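The jerk-based descriptors named above (jerk_std, jerk_variance, jerk_amplitude, jerk_spikes) are straightforward to compute from one IMU window. The sampling rate, spike rule, and synthetic acceleration traces below are illustrative assumptions, not the study's data or exact definitions.

```python
import numpy as np

def jerk_features(accel, fs=50.0, spike_k=3.0):
    """Jerk-based descriptors over one rolling window of longitudinal
    acceleration samples (m/s^2) at sampling rate fs (Hz).
    Jerk is the discrete time derivative of acceleration (m/s^3)."""
    jerk = np.diff(accel) * fs
    std = jerk.std()
    return {
        "jerk_std": std,
        "jerk_variance": jerk.var(),
        "jerk_amplitude": jerk.max() - jerk.min(),
        # Spikes: samples deviating more than spike_k standard deviations
        # from the window mean (illustrative rule).
        "jerk_spikes": int((np.abs(jerk - jerk.mean()) > spike_k * std).sum()),
    }

rng = np.random.default_rng(2)
t = np.linspace(0, 6, 300)                        # 300-sample window
smooth = 0.3 * np.sin(0.5 * t)                    # gentle accel profile
aggressive = smooth + rng.normal(0, 0.8, t.size)  # abrupt accel changes
```

High jerk variability and amplitude in the "aggressive" window are precisely the patterns the abstract links to low Driving Scores (< 50).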

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Chaoyue He

,

Xin Zhou

,

Di Wang

,

Hong Xu

,

Wei Liu

,

Chunyan Miao

Abstract: This position paper argues that long-horizon robotics should optimize persistent autonomy, not only longer reset-based episodes. Real deployments require robots that remain safely useful over days to months while accumulating memory, adapting to evolving human preferences, recovering from inevitable failures, and managing constrained physical and computational resources. Many embodied AI evaluations still inherit the logic of episodic reinforcement learning—where environments are frequently reset and hidden human labor is often unreported—but continuous operation exposes vulnerabilities in state continuity, resource coupling, recovery, and maintenance. Although long-term autonomy is not conceptually new, recent progress in generalist robot policies, open robot datasets, and language-conditioned control makes persistence a primary machine-learning evaluation target rather than a deferred downstream systems-engineering concern. As base policies grow more competent, the practical bottlenecks of autonomy concentrate in memory staleness, hidden intervention burden, recovery loops, and maintenance debt. To align evaluation with these realities, we propose a persistent-autonomy scorecard and a layered benchmark blueprint centered on long-run service utility, intervention burden, recovery quality, proactive usefulness, memory hygiene, uptime, and wear-adjusted throughput. By treating persistence as the fundamental scientific object, modern robot learning can focus on systems that turn calendar time into compounding competence rather than relying on isolated task success.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Wenqi Gu

,

Carlo Vittorio Cannistraci

Abstract: The recently proposed Cannistraci–Muscoloni–Gu Generalized Logistic–Logit Function (CMG-GLLF) introduced a flexible, trainable activation function capable of modulating input features in multilayer perceptrons (MLPs). However, its initial implicit approximation in the logit phase caused computational overhead and numerical instability during training, limiting its application to broader deep learning tasks and more complex network architectures. In this study, we derive a fully explicit and differentiable expression for the CMG-GLLF using a one-step Newton's method approximation. We demonstrate that this new formulation resolves the prior numerical instabilities and reduces computational overhead to match vanilla networks. When applied as an activation function to input layers, it outperforms vanilla networks and linear modulators across MLPs, a simple CNN, and VGG-16 architectures on CIFAR-10/100, notably achieving superior performance on VGG-16 using a highly efficient "channel-wise" strategy that adds only 6 learnable parameters. Furthermore, we explore CMG-GLLF as a hidden-layer activation function. On image classification tasks, CMG-GLLF parameters naturally converge to approximate a ReLU-like shape, explaining its comparable performance. In contrast, on physics-informed neural networks (PINNs) solving three different physics prediction tasks, CMG-GLLF learns activation functions very different from the standard Tanh activation, achieving significantly better performance. These results establish CMG-GLLF as a trainable activation function capable of explaining the performance gains associated with learned per-node nonlinear functions in neural networks. Overall, this new formulation of CMG-GLLF provides powerful, scalable, and highly explainable node modulation for both classical and physics-informed neural networks.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Pi-Yun Chen

,

Chun-Yu Lin

,

Neng-Sheng Pai

,

Ping-Tzan Huang

,

Chao-Lin Kuo

,

Chien-Ming Li

,

Chia-Hung Lin

Abstract: Parkinson’s disease (PD) is a neurodegenerative disorder with an increasing incidence, significantly affecting patients' motor functions and quality of life. Involuntary upper limb tremor (ULT) commonly manifests unilaterally, affecting either the left or right upper limb. Clinically, ULT frequencies are categorized into three distinct classes: low-frequency (< 4.0 Hz), mid-frequency (4.0–7.0 Hz), and high-frequency (> 7.0 Hz) tremors. These ULT movements manifest as either oscillatory or rotational (angular displacement) motions, producing the so-called micro-Doppler effect (mDE). This study aims to develop a short-range (< 1.0 m) and contactless sensing method based on Doppler millimeter-wave (mm-Wave) radar for ULT detection. The reflected electromagnetic waves exhibit time-varying frequency characteristics, which can be analyzed using time-frequency transform (TFT) methods such as the Wigner–Ville distribution (WVD) and the smoothed pseudo WVD (SPWVD). These TFT methods are used to extract mDE features, which are subsequently visualized as color-coded spectrograms for ULT classification. A two-dimensional (2D) convolutional neural network (CNN) is then employed to automatically recognize the visual feature patterns for ULT classification based on frequency and amplitude information. In the experimental setup, a W-band (76–81 GHz) Doppler mm-Wave biosensor is implemented for sensing and extracting feature patterns. The proposed classifiers based on “WVD + 2D CNN” and “SPWVD + 2D CNN” are trained and validated on the collected datasets, with 60% randomly selected for training and 40% for testing in each fold. Ten-fold cross-validation is applied to evaluate classifier performance, achieving an average Precision of 95.92%, average Recall of 95.89%, average F1-score of 0.9509, and average Accuracy of 95.89%.
The experimental results demonstrate the feasibility of the proposed classifier for real-time ULT classification in PD patients using short-range (< 1.0 m) and contactless sensing.
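The WVD step can be sketched in a few lines of NumPy: for each time index, take the FFT of the instantaneous autocorrelation x[n+τ]·conj(x[n−τ]). This is a minimal unsmoothed WVD on a complex (analytic) test tone, not the smoothed pseudo-WVD or radar data of the paper; the 5 Hz "tremor" signal and all parameters are illustrative.

```python
import numpy as np

def wvd(x, L):
    """Minimal discrete Wigner-Ville distribution sketch: for each time
    index n, |FFT| over lag tau of x[n+tau] * conj(x[n-tau])."""
    x = np.asarray(x, complex)
    N = len(x)
    tau = np.arange(-L, L)
    W = np.zeros((N, 2 * L))
    for n in range(N):
        idx1, idx2 = n + tau, n - tau
        ok = (idx1 >= 0) & (idx1 < N) & (idx2 >= 0) & (idx2 < N)
        r = np.zeros(2 * L, complex)
        r[ok] = x[idx1[ok]] * np.conj(x[idx2[ok]])
        W[n] = np.abs(np.fft.fft(r))
    return W

fs, L = 200.0, 50
t = np.arange(0, 2, 1 / fs)
# Analytic 5 Hz tone standing in for a mid-frequency tremor echo.
tremor = np.exp(2j * np.pi * 5.0 * t)
S = wvd(tremor, L)

# WVD frequency axis is compressed by 2: bin k maps to k * fs / (4L) Hz.
peak_bin = int(S[len(t) // 2].argmax())
```

The resulting time-frequency map `S` is what gets rendered as the color-coded spectrogram fed to the 2D CNN; a real radar pipeline would apply the smoothing windows of the SPWVD to suppress cross-terms.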

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Vivek Shukla

,

Atul .

,

Mehul Kumar Das

,

Rishabh Tiwari

Abstract: Despite recent advances in artificial intelligence, static deep learning models still struggle in non-stationary real-world environments because of concept drift. This paper presents a framework for Self-Evolving Machine Learning Models (SE-MLM) that combines the rapid adaptability of meta-learning with the structural flexibility of Neural Architecture Search (NAS). Unlike train-once approaches that require manual retraining after drift, our framework enables the model to update itself through a bi-level optimization process: an inner loop adapts weights using meta-gradients, and an outer loop refines the architecture through a continuous relaxation of the search space. Experiments on CIFAR-10, CIFAR-100, and Rotated-MNIST show that SE-MLM recovers up to 98% of baseline performance within minutes of a drift event and consistently outperforms static baselines. We also discuss practical applications in healthcare monitoring and high-frequency trading, along with future directions in “Green AI” and explainability.
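The bi-level loop described above (inner weight adaptation, outer architecture refinement over a continuous relaxation) can be shown on a toy regression task. Everything here is a deliberately small stand-in: two candidate operations instead of a full NAS space, and a finite-difference outer gradient instead of true meta-gradients.

```python
import numpy as np

rng = np.random.default_rng(3)
x_tr, x_val = rng.uniform(-1, 1, 64), rng.uniform(-1, 1, 64)
y_tr, y_val = 3 * x_tr ** 2, 3 * x_val ** 2   # post-drift task: quadratic

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def forward(w, alpha, x):
    a = softmax(alpha)                  # continuous relaxation of the
    return w * (a[0] * x + a[1] * x ** 2)  # two candidate ops (x, x^2)

def loss(w, alpha, x, y):
    return np.mean((forward(w, alpha, x) - y) ** 2)

w, alpha = 0.1, np.zeros(2)
for step in range(200):
    # Inner loop: adapt weights on the training split (exact gradient).
    a = softmax(alpha)
    h = a[0] * x_tr + a[1] * x_tr ** 2
    w -= 0.1 * np.mean(2 * (w * h - y_tr) * h)
    # Outer loop: refine architecture on the validation split
    # (finite-difference stand-in for the meta-gradient).
    eps = 1e-4
    g_a = np.array([(loss(w, alpha + eps * np.eye(2)[i], x_val, y_val)
                     - loss(w, alpha, x_val, y_val)) / eps
                    for i in range(2)])
    alpha -= 0.5 * g_a
```

After the loop, the relaxation weight on the quadratic operation dominates and validation loss collapses: the outer level has "re-architected" the model to match the drifted task while the inner level fit its weights.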

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Tsuyoshi Okita

Abstract: The Fréchet Inception Distance (FID), the standard metric for evaluating deep generative models, aggregates all data into a single score and thereby masks quality degradation in safety-critical minority conditions and in specific temporal regions of generated time series. We trace this dilution problem to a single cause—the absence of stratification—and propose Stratified Fréchet Distance (SFD), which partitions evaluation data into strata along a chosen axis and computes the Fréchet distance within each stratum. The choice of axis determines the diagnosis: stratifying by operating condition detects minority-condition failures (generalizing the existing Conditional FID), by temporal segment localizes late-cycle quality breakdown, and by their cross-product yields a two-dimensional condition×time quality map. Comparing SFD at different granularities further enables quantitative detection of inter-condition confounding. Experiments on four battery datasets (161 cells) with CVAE models show that SFD detects condition-dependent quality gaps of 1.97× where FID registers only 1.01×, with up to 79× higher sensitivity for minority conditions. Condition×time stratification reveals that the largest gap (8.69×) occurs in the latter half of 35 °C degradation curves—a physically interpretable failure to reproduce accelerated high-temperature degradation. Granularity comparison further detects temperature–C-rate (charge/discharge rate) confounding (T/J = 1.72×), providing actionable guidance on which conditioning variables a generative model should include. These findings are robust across three feature extractors and four datasets.
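For intuition, the dilution that SFD targets can be reproduced with one-dimensional Gaussian fits, where the Fréchet distance reduces to (μ₁−μ₂)² + (σ₁−σ₂)². In the hypothetical sketch below, a shifted minority stratum is nearly invisible in the pooled score but dominates its own stratum; the strata and numbers are illustrative, not from the paper's battery data:

```python
import math

def frechet_1d(a, b):
    """Squared Frechet distance between Gaussian fits of two 1-D
    samples: (mu_a - mu_b)^2 + (sd_a - sd_b)^2."""
    def fit(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        return mu, math.sqrt(var)
    mu_a, sd_a = fit(a)
    mu_b, sd_b = fit(b)
    return (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2

def sfd(real, gen):
    """Stratified Frechet distance: real/gen map stratum -> samples,
    and the distance is computed within each stratum separately."""
    return {s: frechet_1d(real[s], gen[s]) for s in real}

# the majority stratum is reproduced well; the minority one is shifted
real = {"majority": [float(i) for i in range(100)],
        "minority": [float(i) for i in range(10)]}
gen  = {"majority": [float(i) + 0.1 for i in range(100)],
        "minority": [float(i) + 5.0 for i in range(10)]}

pooled = frechet_1d(real["majority"] + real["minority"],
                    gen["majority"] + gen["minority"])
per_stratum = sfd(real, gen)
print(pooled, per_stratum)
```

The pooled score stays below 1 while the minority stratum's distance is 25, the same masking effect the paper quantifies on real feature embeddings (where the full matrix form with covariance square roots applies).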

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Hiroki Naito

Abstract: Prior work showed that human-in-the-loop oversight becomes structurally untenable in high-loss domains when AI output velocity V exceeds human cognitive capacity C_max. The operative constraint, however, is not V alone but V × L, where L denotes per-item cognitive load. L consists of triage, judgment, and response, which respond asymmetrically to AI capability improvement. Triage cost does not decline as models become more capable, because semantic indeterminacy is inherent in general-purpose design. Response cost is invariant to accuracy improvements. Only judgment cost faces downward pressure, and this pressure often operates by inducing omission rather than genuine reduction. Capability improvement therefore restructures L rather than reducing it. Governance mechanisms based on evaluating whether AI output is correct either delegate that evaluation to AI and inherit hallucination risk, or delegate it to humans and face the V × L ceiling. We propose Flow-by-Flow, a governance paradigm that controls supervisory load without evaluating content. A cognitive cost score based on formal, countable features imposes nonlinear costs on high-volume production, while an institutional capacity cap keeps processing volume within C_max. We derive four design invariants for any content-judgment-bypass exceedance pathway: no content judgment, no scalable consumption of examiner capacity, identity-bound per-application friction, and no batch clearance. One reference implementation is discussed to show that these invariants are jointly satisfiable, while its practical difficulties are explicitly acknowledged. An illustrative Monte Carlo analysis across 1,000 parameter draws suggests that composite multi-metric flow control outperforms supervision reinforcement alone in 90.8% of trials.
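The V × L ceiling can be made concrete with illustrative numbers (not taken from the paper): decomposing L into triage, judgment, and response components shows that shrinking only the judgment term, as capability improvement does, can leave the total supervisory load above C_max.

```python
def supervisory_load(V, triage, judgment, response):
    """Total supervisory load V * L, where L is the per-item
    cognitive cost L = triage + judgment + response (seconds)."""
    return V * (triage + judgment + response)

C_MAX = 8 * 3600  # one examiner-day of attention, in seconds (assumed)
V = 5000          # AI outputs per day (assumed)

# baseline model: L = 2 + 6 + 4 = 12 s/item
before = supervisory_load(V, 2.0, 6.0, 4.0)
# more capable model: only the judgment component shrinks;
# triage and response costs are invariant, per the argument above
after = supervisory_load(V, 2.0, 1.5, 4.0)

print(before > C_MAX, after > C_MAX)  # -> True True
```

Both loads exceed the cap, which is why the paper argues for controlling V × L institutionally rather than waiting for capability gains to restore human oversight.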



Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated