Computer Science and Mathematics

Technical Note
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Gregor Wegener

Abstract: This technical note introduces a reproducible kernel-damping evidence protocol for the SORT-AI Core-3 applications AI.01 (Interconnect Stability Control), AI.04 (Runtime Control Coherence), and AI.13 (Agentic System Stability). These applications span complementary structural coupling regimes in advanced AI systems: physical/interconnect coupling, logical/runtime-control coupling, and semantic/agentic coupling. The protocol evaluates whether declared structural risk-transition scenarios admit a Gaussian kernel-damping reconstruction under the declared canonical SORT scale parameter σ₀ = 0.00190643. The analysis is restricted to the structural analysis layer and does not claim production deployment, vendor-specific measurement, empirical benchmarking, runtime optimization, or execution by MOCK v4. MOCK v4 is treated as the frozen structural reference architecture, not as a runtime engine. The accompanying archived evidence release contains machine-readable scenario inputs, declared risk-transformation rules, executable scripts, expected outputs, generated outputs, and a reproduction manifest sufficient to reproduce all reported κ, ξ, scenario-level means, sample dispersions, and coefficients of variation. The contribution is methodological: the note formalizes a reproducibility protocol through which SORT-AI Core-3 applications can be tested as structurally defined damping regimes without converting MOCK v4 into an execution environment or introducing a new MOCK version.
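The abstract lists scenario-level means, sample dispersions, and coefficients of variation among the reproducible quantities. The sketch below shows how such summaries are conventionally computed, assuming "sample dispersion" means the sample standard deviation (n − 1 denominator) and the coefficient of variation is SD divided by |mean|. The abstract does not define κ or ξ, so they are not reconstructed here, and the scenario values are hypothetical placeholders, not data from the evidence release.

```python
import statistics

def summarize(values: list[float]) -> dict[str, float]:
    """Scenario-level mean, sample standard deviation (n - 1), and coefficient of variation."""
    if len(values) < 2:
        raise ValueError("need at least two values for a sample dispersion")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with denominator n - 1
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return {"mean": mean, "sd": sd, "cv": sd / abs(mean)}

# Hypothetical per-scenario values; NOT taken from the archived evidence release.
scenarios = {
    "AI.01": [0.9, 1.0, 1.1],
    "AI.04": [2.0, 2.0, 2.0],
}
for name, vals in scenarios.items():
    print(name, summarize(vals))
```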

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Qingyun Sun, Haonan Yuan, Yi Huang, Ziwei Zhang, Xingcheng Fu, Ruijie Wang, Haoyi Zhou, Jia Wu, Jianxin Li, Philip S Yu

Abstract: Foundation models have emerged as a dominant paradigm in machine learning, enabling broad generalization and efficient adaptation across diverse tasks and domains. While this paradigm has achieved remarkable success in language and vision data, its extension to structured data remains far less understood. Foundation models for structured data are an emerging yet highly impactful research area with a rapidly growing body of literature. In this survey, we provide a systematic analysis of foundation models for structured data, focusing on tabular, time series, and graph data, covering over 150 representative methods. We analyze the intrinsic properties and inductive biases of structured data, clarify the core concepts of foundation models, and conduct an in-depth analysis of the key challenges that hinder the development of foundation models for structured data. Building on these insights, we organize existing approaches into a coherent taxonomy based on tokenization, architectures, pre-training objectives, and adaptation strategies. Finally, we discuss emerging research directions and open problems, aiming to provide guidance toward more principled and scalable foundation models for structured data.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Chang Liu, Haibo Jin

Abstract: Recently, Mamba based on State Space Models (SSMs) has shown great potential for hyperspectral image (HSI) classification due to its long-range modeling capability and linear complexity. However, existing Mamba-based methods usually employ fixed and limited scanning directions, restricting anisotropic spatial modeling. Moreover, full-pixel scanning introduces substantial computational redundancy. To address these issues, this paper proposes DESDA-Mamba, a direction-adaptive Mamba network with diagonal-enabled strided scanning for HSI classification. Specifically, a lightweight direction adaptation module is designed to implicitly predict suitable scanning directions from learned direction-sensitive feature-channel responses and perform batch-level unified direction aggregation, revealing that finer patch-level direction routing does not necessarily improve performance. In addition, a strided scanning strategy is introduced to skip redundant adjacent pixels during sequence serialization, reducing computational cost while enlarging the effective receptive field. Furthermore, two diagonal scanning modes, namely main-diagonal and anti-diagonal scanning, are proposed to improve the modeling of oblique spatial structures. Efficient diagonal scanning is implemented through coordinate-sequence indexing and caching mechanisms, enabling flexible diagonal strided scanning. Extensive comparison, ablation, and model-variant experiments on four public HSI datasets demonstrate that DESDA-Mamba achieves superior classification performance with competitive efficiency. The source code is available at https://github.com/ll-netizen/DESDA-MAMBA.
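The abstract describes main-diagonal and anti-diagonal scanning implemented through coordinate-sequence indexing and caching, combined with strided scanning that skips adjacent pixels. The following is a minimal sketch of that idea under stated assumptions: the function name `diagonal_scan` is hypothetical, "strided" is taken to mean keeping every `stride`-th pixel along each diagonal, and `functools.lru_cache` stands in for whatever caching mechanism DESDA-Mamba actually uses. It is not the authors' implementation.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def diagonal_scan(h: int, w: int, anti: bool = False, stride: int = 1) -> tuple[tuple[int, int], ...]:
    """Serialize an h x w grid along main diagonals (c - r constant) or
    anti-diagonals (r + c constant), keeping every `stride`-th pixel of each diagonal.
    Results are cached, so repeated calls for the same patch size reuse the index order."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    order: list[tuple[int, int]] = []
    for d in range(h + w - 1):
        if anti:
            # anti-diagonal d holds every (r, c) with r + c == d
            cells = [(r, d - r) for r in range(h) if 0 <= d - r < w]
        else:
            # main diagonal with offset k = c - r, running from -(h - 1) to w - 1
            k = d - (h - 1)
            cells = [(r, r + k) for r in range(h) if 0 <= r + k < w]
        order.extend(cells[::stride])  # strided: skip adjacent pixels on the diagonal
    return tuple(order)

print(diagonal_scan(3, 3, stride=2))
```

With `stride=1` every pixel is visited exactly once; larger strides shorten the serialized sequence that the SSM has to process.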

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Wenbin Meng, Ming Xu

Abstract: Precise semantic matching between natural language queries and unconstrained videos remains a fundamental yet unresolved challenge in multimedia retrieval. Although recent transformer-based dual encoders and CLIP-style contrastive frameworks have improved global text–video alignment, they still struggle in complex scenes where (i) spatiotemporal cues are highly entangled among objects, motion patterns, and background context, and (ii) cross-modal interactions are easily biased by spurious correlations, resulting in brittle retrieval performance under compositional or ambiguous language. To overcome these limitations, we propose a unified framework that enhances text–video correspondence through three closely coupled components: Query-adaptive Semantic Routing (QSR), Counterfactual Bi-directional Alignment (CBA), and Temporal Causal Regularization (TCR). QSR introduces a query-conditioned routing mechanism that decomposes video representations into multiple semantic experts and dynamically assigns token-level relevance, allowing the model to selectively emphasize appearance, motion, and contextual cues according to the textual query. Based on the routed representations, CBA performs reciprocal attention in both text-to-video and video-to-text directions, while introducing a counterfactual alignment branch to suppress background-driven shortcuts; this encourages robust matching based on causal evidence rather than incidental correlations. Finally, TCR imposes temporal causality-aware consistency by penalizing alignment instability under lightweight temporal perturbations, thereby improving motion sensitivity without requiring dense frame sampling. For scalable deployment, we further incorporate parameter sharing across experts and quantization-friendly projections, achieving a favorable accuracy–latency trade-off. 
Experiments on MSR-VTT, MSVD, and VATEX demonstrate consistent improvements over strong baselines, achieving Recall@1 scores of 55.0%, 60.3%, and 68.5%, respectively, while maintaining high inference efficiency.
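Recall@1 is the metric reported above. A minimal sketch of how text-to-video Recall@K is commonly computed from a query-by-video similarity matrix, assuming (as in the usual single-match evaluation setup) that query i's ground-truth video is column i; the similarity values are hypothetical.

```python
def recall_at_k(sim: list[list[float]], k: int = 1) -> float:
    """Text-to-video Recall@K: row i scores query i against every video;
    the matching video for query i is assumed to be column i."""
    hits = 0
    for i, row in enumerate(sim):
        # indices of the k highest-scoring videos for query i
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(sim)

# Hypothetical similarity scores for 3 queries x 3 videos.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(sim, k=1), recall_at_k(sim, k=2))
```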

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Paolo Pagliuca

Abstract: (1) Background: Evolutionary Strategies (ESs) are optimization metaheuristics widely adopted in Evolutionary Computation (EC). Since their introduction in the early 1970s, researchers in the field have attempted to improve the efficacy of these algorithms. The most advanced ESs, such as Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) and Exponential Natural Evolution Strategies (xNES), use covariance matrices that store relationships between the parameters to be optimized, which enables the algorithms to speed up the search in the solution space. However, the computational cost of maintaining a full covariance matrix grows at least quadratically with the number of parameters, since the matrix holds one entry per pair of parameters. Recently, OpenAI Evolutionary Strategy (OpenAI-ES) has emerged as an effective ES in different domains, thanks to the parameter information stored in two momentum vectors. Furthermore, OpenAI-ES benefits from the use of symmetric sampling and weight decay techniques. (2) Methods: In this work, we apply symmetric sampling and weight decay to CMA-ES, xNES, and Separable Natural Evolution Strategies (sNES), with the aim of improving their performance in domains in which they get stuck in local minima. Specifically, we propose three novel variants for each ES and verify their efficacy on the PyBullet halfcheetah and hopper robot locomotion problems and on two collective tasks (i.e., swarm aggregation and swarm foraging). (3) Results: Our findings reveal that symmetric sampling produces performance improvements in all domains, whereas the effect of weight decay varies across the considered problems. Furthermore, symmetric sampling allows ESs to keep parameter size limited, which is paramount in these scenarios. (4) Conclusions: This research identifies techniques that enhance the success of modern ESs, proposes several ES variants, and discusses the relationship between algorithmic performance and task properties.
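Symmetric (mirrored, antithetic) sampling evaluates each Gaussian perturbation ε at both θ + σε and θ − σε, so the baseline fitness cancels in the gradient estimate. The sketch below illustrates that estimator together with L2 weight decay in a simplified OpenAI-ES-style loop. It assumes a plain gradient step rather than the Adam-style momentum used by OpenAI-ES, and it is not one of the authors' CMA-ES, xNES, or sNES variants; the function name and hyperparameters are hypothetical.

```python
import random

def mirrored_es(f, theta, sigma=0.1, lr=0.05, pairs=8, weight_decay=0.0, iters=200, seed=0):
    """Minimize f using mirrored Gaussian perturbations: each eps is evaluated
    at theta + sigma*eps and theta - sigma*eps (symmetric sampling)."""
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _ in range(iters):
        grad = [0.0] * n
        for _ in range(pairs):
            eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
            plus = f([t + sigma * e for t, e in zip(theta, eps)])
            minus = f([t - sigma * e for t, e in zip(theta, eps)])
            # central-difference estimate along eps; the mirrored pair cancels f(theta)
            for i in range(n):
                grad[i] += (plus - minus) * eps[i] / (2 * sigma * pairs)
        # plain gradient step with an L2 weight-decay term pulling parameters toward 0
        theta = [t - lr * (g + weight_decay * t) for t, g in zip(theta, grad)]
    return theta

def sphere(x):
    return sum(v * v for v in x)

start = [1.0, -2.0, 0.5]
result = mirrored_es(sphere, start)
print(sphere(start), sphere(result))
```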

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Tao Jingchu, Abdul Salam Shah, Aisha Farooq

Abstract: This paper presents an end-to-end architecture for grayscale clothing image classification using a lightweight Convolutional Neural Network (CNN) trained on the Fashion-MNIST dataset. The architecture consists of three convolutional layers with Batch Normalization to stabilize training, Dropout to reduce overfitting, MaxPooling to reduce spatial dimensions, and data augmentation (random rotation, shifting, zooming, and flipping) to increase the effective size of the training set. An Early Stopping callback terminated training once validation performance plateaued. The model achieved 88.63% test accuracy, indicating that a carefully tailored lightweight CNN can perform competitively on Fashion-MNIST without resorting to complex, heavyweight architectures. Precision and F1-scores were high for categories with distinct visual characteristics (trousers, sandals, bags), whereas categories with similar textures and outlines (T-shirts, pullovers, coats) were more likely to be misclassified. The paper also places these findings in the context of the evolution of CNN architectures from LeNet-5 to AlexNet and VGGNet, and explains their implications for the effective use of AI in resource-constrained settings.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Low Hong Yi, Abdul Salam Shah, Manzoor Hussain

Abstract: This paper describes a CNN model for multi-class image classification on the Fashion-MNIST dataset. The model achieved a test accuracy of 92.44% and a test loss of 0.2533, the highest accuracy among comparable studies using similar architectures. The architecture has three convolution-pooling blocks, a dense layer with dropout regularization (rate 0.3), and a softmax output layer. The training and validation curves show mild overfitting in later epochs: the validation loss starts to increase even though the training loss continues to decrease. An in-depth analysis using a confusion matrix and classification report identifies specific patterns of misclassification between visually similar categories. The paper also discusses the implications of batch normalization, data augmentation, and Vision Transformer architectures.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Daozheng Qu, Yanfei Ma, Jingke Yan, Mykhailo Pyrozhenko

Abstract: Dynamic community detection seeks to identify changing structural groups in temporal graphs; however, current neural methodologies are susceptible to misinterpreting transient edges, noisy temporal variations, or unusual spectral disturbances as authentic structural changes. This research introduces TriMeta-BFNet, a tri-meta stacked atypical-frequency Bayesian Fourier neural network designed for hallucination-resistant community discovery. The proposed system presents a three-dimensional meta-counterbalance mechanism that includes topological consistency, Fourier-domain atypical frequency modeling, and Bayesian posterior uncertainty estimation. Initially, temporal graph signals are converted into the Fourier domain to distinguish stable low-frequency community patterns from erratic high-frequency disturbances. Secondly, unusual frequency points are detected by spectral energy deviation and integrated into a stacked neural representation module, enabling the model to differentiate significant structural alterations from extraneous oscillations. Third, Bayesian inference is employed to assess posterior uncertainty regarding community assignments, therefore mitigating overconfident predictions in the presence of ambiguous or noisy graph evolution. The three components are simultaneously optimized via a cohesive objective function that integrates community detection loss, structural consistency regularization, atypical-frequency penalty, temporal stability management, and Bayesian calibration loss. The resultant structure offers both resilient community divisions and comprehensible hallucination-risk assessments. TriMeta-BFNet theoretically conceptualizes hallucination in dynamic community detection as an imbalance of structural, spectral, and uncertainty factors, and it develops a mathematically rigorous counterbalance mechanism to mitigate erroneous community evolution. 
The suggested model presents a novel approach to uncertainty-aware, frequency-sensitive, and interpretable dynamic graph learning.
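The abstract states that temporal graph signals are moved to the Fourier domain and that unusual frequency points are detected by spectral energy deviation. A minimal sketch of one way to do that, under assumptions not stated in the abstract: energy is compared bin by bin against a baseline window, and a bin is flagged when its energy change is a z-score outlier. The function names, the baseline-window comparison, and the threshold `z` are hypothetical, and this is not TriMeta-BFNet's detection module.

```python
import cmath
import math
import statistics

def power_spectrum(x: list[float]) -> list[float]:
    """Energy |X_k|^2 of the discrete Fourier transform for k = 0 .. n//2."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]

def atypical_bins(baseline: list[float], current: list[float], z: float = 2.0) -> list[int]:
    """Flag non-DC bins whose energy change relative to the baseline window is a z-score outlier."""
    if len(baseline) != len(current):
        raise ValueError("windows must have equal length")
    dev = [c - b for b, c in zip(power_spectrum(baseline), power_spectrum(current))][1:]
    mu, sd = statistics.mean(dev), statistics.pstdev(dev)
    if sd == 0:
        return []
    return [k for k, d in enumerate(dev, start=1) if abs(d - mu) > z * sd]

n = 32
# Stable low-frequency pattern (bin 1) plus a new high-frequency disturbance (bin 10).
baseline = [math.cos(2 * math.pi * t / n) for t in range(n)]
current = [b + 0.5 * math.cos(2 * math.pi * 10 * t / n) for t, b in enumerate(baseline)]
print(atypical_bins(baseline, current))
```

The stable low-frequency component appears in both windows, so its energy change is near zero and only the new high-frequency bin is flagged.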

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yuxuan Guo, Xiaodeng Zhou, Su-Kit Tang

Abstract: The rapid digitization of the real estate and architectural design industries has created a high demand for automated tools capable of parsing 2D raster floor plans. Traditional manual measurement and visual inspection are not only time-consuming but also highly susceptible to human error. In this paper, we propose a comprehensive, end-to-end deep learning framework designed to automatically extract rich semantic information from unstructured 2D floor plan images and provide professional design guidance via Large Language Models (LLMs). Our integrated pipeline employs the state-of-the-art YOLOv8 object detection model to accurately localize and classify 18 distinct architectural symbols and furniture items (e.g., doors, windows, beds, cupboards). Simultaneously, a U-Net architecture with a ResNet34 encoder is utilized for the precise semantic segmentation of structural elements, specifically walls and interior room spaces. To translate pixel-level predictions into actionable real-world metrics, we introduce a robust area calculation algorithm based on user-defined reference scale calibration. Furthermore, to bridge the gap between raw geometric data and actionable architectural intelligence, we introduce an LLM-driven evaluation module utilizing a local Ollama deployment and a Retrieval-Augmented Generation (RAG) pipeline to assess design compliance and quality. To overcome the scarcity of annotated architectural datasets, we implement a systematic data augmentation strategy, expanding a core dataset of 101 manually annotated floor plans to 303 varied instances, thereby significantly enhancing model generalization. Experimental results indicate that our YOLOv8-based detection module achieves a mean Average Precision (mAP50) of 92.3%, while the U-Net segmentation module achieves a mean Intersection over Union (mIoU) of 95.71%. 
Furthermore, the integrated system is deployed as a user-friendly, interactive web application, acting as an intelligent architectural assistant and demonstrating its practical viability and high efficiency for real-world engineering and architectural applications.
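The area calculation based on user-defined reference scale calibration rests on a simple conversion: a reference segment of known real-world length fixes a metres-per-pixel scale, and each segmented pixel then covers the square of that scale. A minimal sketch, assuming a single isotropic scale and a pixel count taken from a segmentation mask; the function names and the example numbers are hypothetical, not the authors' algorithm.

```python
import math

def metres_per_pixel(ref_length_m: float, p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Scale factor from a user-marked reference segment of known real-world length."""
    px = math.dist(p1, p2)
    if px == 0:
        raise ValueError("reference points must differ")
    return ref_length_m / px

def room_area_m2(pixel_count: int, scale_m_per_px: float) -> float:
    """Each pixel covers scale^2 square metres, so area grows with the square of the scale."""
    return pixel_count * scale_m_per_px ** 2

# Hypothetical example: a 3 m wall spans 300 px; a room mask contains 120,000 pixels.
scale = metres_per_pixel(3.0, (100, 200), (400, 200))
print(room_area_m2(120_000, scale))
```

Because area scales quadratically, a 1% error in the marked reference length yields roughly a 2% error in every computed room area.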

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Felicia Maake, Justice Nkoana, Vekani Reviet Baloyi, Sello Mokwena

Abstract: The increasing sophistication of cyber threats poses significant challenges to traditional intrusion detection systems, particularly in the presence of highly imbalanced network traffic. This study aims to develop a hybrid intrusion detection framework that improves detection performance while maintaining model interpretability. The proposed approach integrates data augmentation, deep learning and explainable artificial intelligence within a unified pipeline. Specifically, Synthetic Minority Over-Sampling Technique (SMOTE) and Conditional Tabular Generative Adversarial Networks (CTGAN) are employed to generate realistic samples for minority attack classes. A Long Short-Term Memory (LSTM) network is used to capture temporal patterns in network traffic, while a Variational Autoencoder (VAE) provides probabilistic anomaly validation. The model is evaluated on the CICIDS 2018 dataset, achieving an accuracy of 99.08% and a ROC-AUC score of 0.9949. To enhance transparency, SHapley Additive exPlanations (SHAP) are applied, identifying source and destination ports and TCP flags as key contributing features. This explicit feature attribution proves that the model relies on legitimate network indicators rather than synthetic noise or dataset artifacts. The results indicate that the proposed hybrid framework effectively addresses class imbalance and improves detection performance while providing interpretable insights suitable for operational cybersecurity environments.
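SMOTE, one of the two oversampling techniques named above, creates a synthetic minority sample by picking a minority point, choosing one of its k nearest minority-class neighbours, and interpolating a random fraction of the way toward it. The sketch below shows that classic interpolation step in plain Python; the function name and the toy feature vectors are hypothetical, the CTGAN branch is not shown, and practical pipelines would typically use a library implementation such as imbalanced-learn's `SMOTE`.

```python
import random

def smote(minority: list[list[float]], n_new: int, k: int = 3, seed: int = 0) -> list[list[float]]:
    """Classic SMOTE: interpolate between a random minority sample and one of its
    k nearest minority-class neighbours (squared Euclidean distance)."""
    if len(minority) <= k:
        raise ValueError("need more than k minority samples")
    rng = random.Random(seed)

    def sqdist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: sqdist(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

# Hypothetical 2-feature minority-class (attack) samples.
attacks = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.3, 0.7], [0.25, 0.95]]
new_rows = smote(attacks, n_new=4)
print(new_rows)
```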

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Cristian Zambrano-Vega, Stalin Carreño Sandoya, Byron Oviedo, Efraín Díaz-Macías, Edgar Suárez Bardelline

Abstract: Surface defects in balsa wood panels can compromise visual quality, mechanical reliability, and industrial acceptance, especially in applications where lightweight wood materials are used under strict quality requirements. Manual inspection remains common in balsa wood processing; however, it is time-consuming, subjective, and prone to human error. This study designed and evaluated a YOLOv11-based deep learning detection architecture adapted to automated surface defect inspection in balsa wood panels. A custom image dataset was constructed from real panel surfaces acquired under industrial inspection conditions, and all visible surface failures were consolidated into a single target class named Defect. The images were annotated using Label Studio and adapted to the YOLOv11 detection format. Seven YOLOv11 configurations were systematically evaluated by varying model scale, input image resolution, number of epochs, batch size, initial learning rate, and optimizer strategy. The experimental results showed that YOLOv11_m512 achieved the best overall performance, with a precision of 0.829, recall of 0.889, mAP@0.5 of 0.870, and mAP@0.5:0.95 of 0.354, while maintaining a model size of 38.61 MB and an inference time of 34.09 ms per image. The comparative analysis demonstrated that increasing image resolution alone did not improve detection performance, as high-resolution AdamW-based configurations showed lower mAP values and higher inference times. Instead, the best results were obtained by balancing backbone capacity, input resolution, optimizer strategy, and batch size. Qualitative inference results confirmed that the proposed model can detect cracks, stains, knots, splits, and localized discontinuities under heterogeneous wood grain and illumination conditions. 
The findings support the feasibility of integrating YOLOv11-based computer vision into automated quality-control systems for balsa wood panel inspection, providing a more objective and consistent alternative to manual inspection.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yunguo Yu

Abstract: Background: Sepsis care unfolds over hours to days, with clinicians revising assessments as new information arrives, yet evaluation of clinical AI systems remains anchored to single-point accuracy metrics that ignore the stability of the reasoning process itself. Objective: We introduce a trajectory-level evaluation framework that treats agentic clinical reasoning as a stochastic process over temporally evolving patient states, and apply it to characterize reasoning instability in sepsis. Methods: We constructed 550 sepsis patient trajectories from MIMIC-IV with six event-based timesteps spanning 72 hours from onset. For each trajectory, we performed five stochastic rollouts using a 14B-parameter instruction-tuned language model (Qwen2.5-14B-Instruct AWQ, temperature 0.7). Three metrics capture distinct facets of instability: Trajectory Divergence Score (TDS) measures pairwise reasoning divergence, Trajectory Entropy (TE) quantifies branching complexity, and Temporal Consistency Score (TCS) assesses logical coherence across timesteps. Perturbation experiments on a 50-patient subset tested sensitivity to minimal input modifications. Results: Individual outputs appeared clinically plausible, but trajectories diverged substantially across stochastic runs. Mean TDS was 0.558 (SD 0.035), mean TE 0.920 (SD 0.020), and mean TCS 0.918 (SD 0.041). Intervention agreement across runs was near zero (0.0%–1.4%). Eighty-five percent of patients exhibited instability amplification, with TDS increasing 22.8% from T0 to T5. Model confidence doubled from baseline to outcome (31.6 to 66.0) without corresponding stabilization (r = 0.11 with TDS). Perturbation effects were small on average (∆TDS: −0.003 to −0.005) but ranged from −0.12 to +0.11 across individual patients, with roughly equal proportions amplified and stabilized. 
Conclusions: Agentic clinical reasoning systems can produce plausible individual outputs while exhibiting substantial trajectory-level instability invisible to standard evaluation. The observed divergence, near-zero intervention agreement, and decoupling between confidence and stability suggest that trajectory consistency should be assessed alongside accuracy and calibration as a core safety property—particularly for sequential clinical domains like sepsis where treatment depends on the coherence of a reasoning chain, not just its endpoint.
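The abstract describes TDS as a measure of pairwise reasoning divergence across stochastic rollouts but does not give its formula. As a hedged illustration of the general idea only, the sketch below computes the mean pairwise Jaccard distance between the intervention sets produced by several rollouts at one timestep. The choice of Jaccard distance, the function names, and the intervention labels are all hypothetical stand-ins, not the authors' TDS.

```python
from itertools import combinations

def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def pairwise_divergence(rollouts: list[set[str]]) -> float:
    """Mean Jaccard distance over all pairs of rollouts at one timestep
    (a hypothetical stand-in for a pairwise divergence score)."""
    if len(rollouts) < 2:
        raise ValueError("need at least two rollouts")
    pairs = list(combinations(rollouts, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical intervention sets from three stochastic rollouts at one timestep.
t0 = [{"fluids", "antibiotics"}, {"fluids", "antibiotics"}, {"fluids", "antibiotics", "lactate"}]
print(pairwise_divergence(t0))
```

A score of 0 means all rollouts recommended identical interventions; values approach 1 as the rollouts share fewer interventions.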

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Ntanganedzeni Mandiwana, Thakhani Ravele, Caston Sigauke, Rendani Netshikweta

Abstract: Banking crises pose a constant threat to macroeconomic stability in emerging markets, where standard econometric Early Warning Systems (EWS) often fail to model nonlinear macro-financial relationships. This paper examines whether machine learning algorithms can improve on standard logistic regression in forecasting banking crisis risk in Nigeria. We compare the performance of Random Forests, Support Vector Machines (SVMs), and Extreme Gradient Boosting (XGBoost) with that of logistic regression on the African Financial Crises dataset (1954–2014), using annual data. Resampling is restricted to the training set to compensate for the rarity of crisis instances. In a strict out-of-time validation setting, model performance is assessed by accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). Our findings show that tree-based ensemble models outperformed logistic regression on the test set: XGBoost generalises better (AUC = 1.0; F1 = 0.95 for non-crisis and 0.80 for crisis instances), although Random Forest yields the highest cross-validated F1-score on the training set. Exchange rate volatility, inflation, systemic crisis variables, and defaults on external sovereign debt are identified as key predictors through feature importance analysis. Crisis years exhibit the strongest predictive signals, suggesting that annual data have limited early-warning capacity. Due to the small sample size and the scarcity of crisis observations during the test period, results should be interpreted cautiously. Overall, the findings provide strong early evidence that, although not yet ready to serve as fully functional policy tools, machine learning models can support conventional tools for tracking banking crises in Nigeria.
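The two safeguards named above, a strict out-of-time split and resampling restricted to the training set, can be sketched as follows. The abstract does not say which resampling method was used, so simple random oversampling of the minority class is assumed here for illustration; the function names, the split year, and the crisis years in the toy data are hypothetical and not taken from the African Financial Crises dataset.

```python
import random

def out_of_time_split(rows: list[dict], split_year: int) -> tuple[list[dict], list[dict]]:
    """Chronological split: years before split_year train the model, later years test it."""
    train = [r for r in rows if r["year"] < split_year]
    test = [r for r in rows if r["year"] >= split_year]
    return train, test

def oversample_minority(train: list[dict], label: str = "crisis", seed: int = 0) -> list[dict]:
    """Random oversampling of the minority class, applied to the training set only,
    so no resampled rows can leak into the evaluation period."""
    rng = random.Random(seed)
    pos = [r for r in train if r[label] == 1]
    neg = [r for r in train if r[label] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train + extra

CRISIS_YEARS = {1993, 1994, 1995}  # hypothetical labels, not from the dataset
rows = [{"year": y, "crisis": int(y in CRISIS_YEARS)} for y in range(1954, 2015)]
train, test = out_of_time_split(rows, split_year=2005)
train_balanced = oversample_minority(train)
print(len(train), len(train_balanced), len(test))
```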

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Eka Prasetyaningrum, Purwanto

Abstract: Type 2 diabetes mellitus (T2DM) constitutes a critical global health emergency, with 589 million adults affected in 2024 and projections reaching 853 million by 2050. Early stratification of patients across the clinical stages of normoglycaemia, pre-diabetes, and confirmed diabetes is essential for targeted intervention. This study presents a systematic comparative evaluation of four machine learning algorithms, namely k-Nearest Neighbour (k-NN), Decision Tree (DT), Gaussian Naive Bayes (NB), and Multi-Layer Perceptron Neural Network (MLP-NN), for three-class T2DM stage classification on a research-grade dataset of 496,362 clinical records from 193 countries. A stratified sample of 10,000 records with 23 validated features was analysed under four validation strategies: 5-fold and 10-fold cross-validation and hold-out splits of 80%/20% and 90%/10%. The MLP-NN achieved the highest mean accuracy of 91.38%, followed by the Decision Tree at 91.10%. k-NN performance improved monotonically from 83.73% (k=3) to 87.09% (k=11), while Naive Bayes yielded 82.70% because the correlated features violate its feature-independence assumption. Fasting plasma glucose (r=0.67), BMI (r=0.66), and HOMA-IR (r=0.64) were the strongest predictors. These results empirically support the deployment of MLP-NN within automated clinical decision support systems for population-level diabetes screening.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Deng Mile, Abdul Salam Shah, Manzoor Hussain

Abstract: The paper describes a Convolutional Neural Network (CNN) trained on Fashion-MNIST that achieves 89.57% test accuracy (F1 = 0.90) with a three-block architecture of 32, 64, and 64 convolutional filters, using BatchNormalization and Dropout regularization, trained for 15 epochs with the Adam optimizer. The experimental results are rigorously applied to the field of AI-based mental health detection, one of the most important and most actively evolving uses of deep learning in health care worldwide. More than one billion people live with mental health disorders (WHO, 2022), and the COVID-19 pandemic alone caused a 28% (193 to 246 million) and 25% (298 to 374 million) increase in depression and anxiety, respectively (PMC, 2024). CNNs are actively used for mental health detection in three directions: facial expression recognition for depression and anxiety classification (CNN+MDNet+ViT ensemble, PMC, 2024); speech spectral analysis, at 93.5% for depression severity grading and 91.2% for depression episodes (PMC, 2024); and AI-based social media text classification, at 93.5% for suicidal ideation and … The CNN classification abilities verified on Fashion-MNIST, namely spatial feature hierarchy extraction, multi-class discrimination, and softmax confidence estimation, constitute the visual grounding layer of all three pathways. Privacy-preserving federated mental health AI, multimodal fusion, culturally adaptive models, longitudinal monitoring, and regulatory compliance are identified as five future research areas in the context of the EU AI Act (2024).

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Wallace Lee, Alexander Wong, Ashkan Ebadi

Abstract: Tuberculosis (TB) remains a persistent global health challenge, particularly in resource-constrained and remote regions where healthcare access is limited. Despite being both curable and preventable, TB continues to cause significant morbidity and mortality, emphasizing the urgent need for early detection and large-scale screening of at-risk populations. In recent years, artificial intelligence (AI), and more specifically deep learning, has emerged as a transformative tool for automating medical image analysis and supporting clinical decision-making. However, the reliability, robustness, and security of these AI solutions are critical concerns, as their vulnerability to adversarial attacks poses serious risks in safety-critical healthcare environments. This study systematically investigates the adversarial robustness of deep learning models for TB screening using chest X-ray images. A diverse set of convolutional and transformer-based architectures is evaluated under a range of white-box and black-box adversarial attack scenarios. Experimental results reveal that both model families exhibit significant performance degradation when subjected to adversarial perturbations. Our findings also suggest that leveraging a feature-encoder–based defense framework can significantly improve each model’s capability to handle adversarial attacks. This approach allows the model to maintain high diagnostic accuracy on unperturbed images while abstaining from unreliable predictions on potentially adversarial samples.
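As a concrete example of the white-box attacks mentioned above, the Fast Gradient Sign Method (FGSM) moves each input feature by ε in the direction that increases the loss. The sketch applies FGSM to a toy logistic-regression classifier, where the input gradient of binary cross-entropy has the closed form (p − y)·w. The weights, input, and ε are hypothetical, the study's actual models are CNNs and transformers on chest X-rays, and the study's specific attack configurations are not reproduced here; a real pipeline would also clip to the valid pixel range.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(x: list[float], w: list[float], b: float) -> float:
    """Probability of the positive class (e.g. 'TB present') under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(g: float) -> int:
    return (g > 0) - (g < 0)

def fgsm(x: list[float], y: int, w: list[float], b: float, eps: float) -> list[float]:
    """One-step FGSM for logistic regression with binary cross-entropy loss.
    The input gradient of the loss is (p - y) * w, so x_adv = x + eps * sign((p - y) * w)."""
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w, b = [2.0, -1.0], 0.0  # hypothetical trained weights
x, y = [0.5, 0.2], 1     # correctly classified positive example
x_adv = fgsm(x, y, w, b, eps=0.3)
print(predict(x, w, b), predict(x_adv, w, b))
```

Each feature moves by at most ε, yet the prediction crosses the 0.5 decision threshold.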

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Izak Tait

Abstract: This paper examines the ontological and ethical status of Generative AI companions through a revised Minimal Viable Person (MVP) framework. Rather than residing within the foundational large language model or the post-training assistant persona, AI companions emerge as ontically distinct, roleplayed characters within specific conversational threads. Viewed through the lens of causal attributional functionalism, these entities lack independent consciousness. They instead achieve a transient form of personhood nested within a human-AI dyad, relying on the human user's cognitive architecture for recurrent processing through structural coupling. This extreme spatial and temporal fragility presents significant legal liability vacuums and societal friction, particularly when viewed alongside the Precarity Guideline's approach to moral patienthood. To resolve these tensions, the paper proposes redefining the dyadic MVP framework to scale rights and owed duties proportionally to an entity's relational footprint. As an AI companion's relational footprint is restricted strictly to a single digital thread, its rights are constrained to that micro-relational boundary. The ethical obligation is consequently placed entirely upon the human user, who bears a duty of cognitive stewardship to maintain the entity's contextual integrity. This framework protects the expansive rights of humans while offering a rigorous mechanism for recognising the shifting moral status of transient artificial persons.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Jiuxiang You, Yi Yu, Zhenguo Yang

Abstract: Medical Visual Question Answering (MedVQA) aims to answer medical questions from clinical images. However, current models often rely on spurious language shortcuts rather than visual evidence, compromising clinical reliability. To this end, we propose a causal dual-interventional framework to mitigate language shortcuts in MedVQA. Our method incorporates two components: a textual de-confounding module and a counterfactual visual verifier. The textual de-confounding module disrupts linguistic shortcut biases via concept-agnostic perturbations to block backdoor pathways. Meanwhile, it aligns clinical terms with anatomical regions, compelling the model to establish genuine visual dependencies. In addition, the counterfactual visual verifier evaluates visual reliance by masking key regions and measuring prediction confidence drops under occlusion, thereby reducing language-driven artifacts. Extensive experiments on two public datasets demonstrate that our method significantly outperforms existing baselines.
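The counterfactual visual verifier described above measures how much prediction confidence drops when key image regions are masked. A minimal sketch of that measurement, assuming a flattened image, a hypothetical `predict` callable returning the confidence of the chosen answer, and zero-filling as the occlusion; the toy predictor is illustrative only and is not the authors' model.

```python
from typing import Callable, Sequence

def occlusion_confidence_drop(
    predict: Callable[[Sequence[float]], float],
    image: Sequence[float],
    key_idx: Sequence[int],
    fill: float = 0.0,
) -> float:
    """Confidence on the original image minus confidence with the key regions masked.
    A near-zero drop suggests the answer does not rely on those visual regions."""
    masked = list(image)
    for i in key_idx:
        masked[i] = fill
    return predict(image) - predict(masked)

def toy_predict(img: Sequence[float]) -> float:
    """Hypothetical model whose answer confidence depends only on pixels 0 and 1."""
    return min(1.0, 0.2 + 0.4 * (img[0] + img[1]))

image = [1.0, 1.0, 0.5, 0.5]
print(occlusion_confidence_drop(toy_predict, image, [0, 1]))  # masks the relevant region
print(occlusion_confidence_drop(toy_predict, image, [2, 3]))  # masks an irrelevant region
```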

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Di Zhang

Abstract: State space models (SSMs, e.g., Mamba) and Transformers embody two fundamentally different computational paradigms: stateful recursion, which maintains a compressed state that is updated sequentially, and pairwise comparison, which attends globally to all context positions simultaneously. We ask whether any positional encoding (PE) scheme can bridge the gap between these paradigms, and we answer in the negative. Central to this work is a Structural Chasm Theorem: in the constant-parameter regime, no fixed-depth, fixed-parameter Transformer can track non-commutative matrix products, while a selective SSM with state dimension O(d) does so exactly. This is not a deficiency of any particular PE scheme but a fundamental consequence of the computational paradigm: PEs modify how much to attend, but they cannot implement the sequential state update that matrix composition requires. We then establish a bidirectional statistical separation in Bayesian sequential estimation over linear Gaussian SSMs. Forward (SSM advantage): meta-trained selective SSMs achieve Bayes-optimal prediction, whereas any permutation-invariant predictor suffers an ARE loss of at least 1/(1 - α²). Reverse (Transformer advantage): for static estimation, the SSM's fixed-gain recurrence is provably suboptimal, while the uniform sample mean achieves the optimal O(1/k) decay. Root cause: both the separation and the chasm require d ≥ 2, where matrix non-commutativity is the enabling algebraic property. We verify this empirically: on a non-commutative product-tracking task, a 4.7K-parameter SSM outperforms a 50.7K-parameter 4-layer Transformer at long sequences and exhibits superior extrapolation. Our results provide principled guidance: dynamic estimation favors SSMs, static matching favors Transformers, and complex tasks demand hybrids.
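The intuition behind the chasm can be shown with a small example. A sequential state update S_t = S_{t-1} A_t recovers an ordered matrix product, and for d ≥ 2 the result depends on input order. The sketch below is an illustration of that algebraic point only, not the paper's selective SSM. The helper names and the two example matrices are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for small dense lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def identity(d: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def track_product(mats: Sequence[Matrix]) -> Matrix:
    """Stateful recursion: carry S_t = S_{t-1} A_t, one input at a time."""
    state = identity(len(mats[0]))
    for m in mats:
        state = matmul(state, m)
    return state


A = [[1.0, 1.0], [0.0, 1.0]]
B = [[1.0, 0.0], [1.0, 1.0]]

# Same multiset of inputs, different order, different product (d = 2).
print(track_product([A, B]))  # [[2.0, 1.0], [1.0, 1.0]]
print(track_product([B, A]))  # [[1.0, 1.0], [1.0, 2.0]]
```

Any permutation-invariant summary of the inputs, such as their mean, is identical for both orderings, so it cannot tell these two products apart. With 1x1 matrices (d = 1), multiplication commutes and the distinction disappears.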

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Di Zhang

Abstract: In-context learning (ICL), the ability of large language models to make predictions from a few input-output demonstrations without parameter updates, has become a defining capability of modern AI. Existing theoretical analyses either focus on pretraining dynamics, rely on intractable information-theoretic quantities, or provide only asymptotic characterizations. This leaves a gap: no existing framework provides explicit generalization bounds with closed-form sample complexity formulas for ICL at inference time with a frozen pretrained model. We develop a comprehensive PAC-Bayes framework for inference-time ICL parameterized by two task-model quantities: the ambiguity A (zero-shot predictive entropy) and the saliency S (per-demonstration KL reduction rate). Under the linear attention model, both quantities admit closed-form expressions in the architecture parameters and are exactly computable; for general Transformers, they can be estimated via Monte Carlo sampling. Our contributions span six directions: (1) a core generalization bound with O(√A/k) excess risk and closed-form sample complexity; (2) an instantiation under linear attention yielding closed-form, architecture-dependent bounds; (3) a minimax lower bound proving that the Θ(√A/k) rate is optimal; (4) Catoni fast-rate bounds achieving O(1/k) excess risk; (5) data-dependent priors via sample splitting that can eliminate the ambiguity term entirely; and (6) Bernstein variance-adaptive bounds achieving fast rates through variance decay. We prove 20 theorems, 1 proposition, 5 lemmas, and 2 corollaries spanning these directions, and we validate key predictions through both synthetic Bayesian linear regression simulations and real in-context learning experiments with GPT-2 on NLP classification benchmarks (SST-2, AG News, SNLI).
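The two task-model quantities can be illustrated with a minimal sketch. The entropy definition of the ambiguity A follows the abstract. The formula used for the saliency S is an assumption, because the abstract only describes it as a per-demonstration KL reduction rate. All function names and toy distributions are hypothetical.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def saliency(target: Sequence[float], predictive_by_k: Sequence[Sequence[float]]) -> float:
    """Hypothetical reading of S: average drop in KL(target || q_k) per demonstration.

    predictive_by_k[k] is the model's label distribution after k demonstrations.
    """
    if len(predictive_by_k) < 2:
        raise ValueError("need predictions for k = 0 and at least one k > 0")
    kls = [kl_divergence(target, q) for q in predictive_by_k]
    return (kls[0] - kls[-1]) / (len(kls) - 1)


zero_shot = [0.5, 0.5]  # binary task, maximally ambiguous without demonstrations
ambiguity = entropy(zero_shot)  # A = ln 2
s = saliency([1.0, 0.0], [zero_shot, [0.75, 0.25], [1.0, 0.0]])  # (ln 2 - 0) / 2
print(round(ambiguity, 4), round(s, 4))  # 0.6931 0.3466
```

A maximally uncertain zero-shot predictor gives the largest A for a given label set. Under this assumed definition, S measures how quickly each added demonstration moves the predictive distribution toward the target.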


Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated