Computer Science and Mathematics

Review
Computer Science and Mathematics
Software

Keston G. Lindsay

Abstract: Repeated measures ANOVA is the statistical method for comparing means of the same sample measured at two or more time points or in two or more contexts. It may also be used to compare means between two or more related groups. This paper serves as a tutorial for repeated measures ANOVA using R. It introduces readers to parametric, nonparametric, and robust one-way repeated measures ANOVA using the rstatix, afex, WRS2, and ARTool packages.
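The tutorial itself works in R; as a rough analogue of the same one-way repeated measures design, here is a minimal sketch in Python using statsmodels' AnovaRM. The toy dataset and the column names (subject, time, score) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a one-way repeated measures ANOVA in Python via
# statsmodels, as an analogue of the R workflow the tutorial covers.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format data: one row per (subject, time) measurement; toy values.
data = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time":    ["t1", "t2", "t3"] * 4,
    "score":   [5.1, 6.3, 7.0, 4.8, 5.9, 6.5, 5.5, 6.1, 7.2, 5.0, 6.0, 6.8],
})

# Within-subject factor "time"; F-test for equality of means across times.
res = AnovaRM(data, depvar="score", subject="subject", within=["time"]).fit()
print(res.anova_table)
```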

Article
Computer Science and Mathematics
Mathematics

Shanmu Jin

Abstract: Let $A\in\mathbb{C}^{d\times d}$ and let $W(A)$ denote its numerical range. For a bounded convex domain $\Omega\subset\mathbb{C}$ with $C^1$ boundary containing $\operatorname{spec}(A)$, consider the operator-valued boundary kernel \[ P_{\Omega}(\sigma,A)\;:=\;\operatorname{Re}\!\Bigl(n_{\Omega}(\sigma)\,(\sigma I-A)^{-1}\Bigr), \qquad \sigma\in\partial\Omega, \] where $n_{\Omega}(\sigma)$ is the outward unit normal at $\sigma$. For convex $\Omega$ with $W(A)\subset\Omega$ this kernel is strictly positive definite on $\partial\Omega$ and underlies boundary-integral functional calculi on convex domains. We analyze the opposite limiting regime $\Omega\downarrow W(A)$. Along any $C^1$ convex exhaustion $\Omega_\varepsilon\downarrow W(A)$, if $\sigma_\varepsilon\in\partial\Omega_\varepsilon$ approaches $\sigma_0\in\partial W(A)$ with convergent outward normals and $\sigma_0\notin\operatorname{spec}(A)$, then $\lambda_{\min}(P_{\Omega_\varepsilon}(\sigma_\varepsilon,A))\to 0$ and the corresponding min-eigenvectors converge (up to subsequences and phases) to the canonical subspace $(\sigma_0 I-A)\mathcal{M}(n)$ determined by the maximal eigenspace of $H(n)=\operatorname{Re}(\overline{n}A)$. Quantitatively, we obtain two-sided bounds in terms of an explicit support-gap scalar, yielding a linear degeneracy rate under bounded-resolvent hypotheses and an explicit rate for outer offsets $W(A)+\varepsilon\mathbb{D}$. For normal matrices we compute the eigenvalues of $P_{\Omega}(\sigma,A)$ explicitly, showing that degeneracy may fail at spectral support points unless the supporting face contains multiple eigenvalues.
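As a concrete illustration of the kernel defined above, the following minimal NumPy sketch evaluates $P_{\Omega}(\sigma,A)$ and its smallest eigenvalue; the choices of $A$, $\sigma$, and the normal $n$ are toy assumptions, not values from the paper.

```python
# Numerical sketch of the boundary kernel P_Omega(sigma, A) and its
# smallest eigenvalue, following the definitions in the abstract.
import numpy as np

def hermitian_part(M):
    # Re(M) := (M + M^*) / 2, the operator "real part" used above.
    return (M + M.conj().T) / 2

def boundary_kernel(sigma, n, A):
    # P_Omega(sigma, A) = Re( n * (sigma I - A)^{-1} ).
    d = A.shape[0]
    R = np.linalg.inv(sigma * np.eye(d) - A)   # resolvent at sigma
    return hermitian_part(n * R)

A = np.array([[0.0, 1.0], [0.0, 0.5j]])        # toy non-normal matrix
sigma, n = 2.0 + 0.0j, 1.0 + 0.0j              # boundary point, outward normal
P = boundary_kernel(sigma, n, A)
print(np.linalg.eigvalsh(P).min())             # lambda_min of the kernel
```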

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Hassan Salarabadi, Dariush Salimi, Seyed Sahand Mohammadi Ziabari, Mozaffar Aznab

Abstract: HER2 status determination is a crucial task in breast cancer prognosis and treatment, yet traditional diagnostic methods such as immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH) are invasive, time-consuming, and costly. Motivated by the need for scalable and data-driven predictive approaches, we propose a hybrid machine learning framework that integrates ensemble learning with fuzzy modeling for HER2 prediction using routinely available clinical and immunohistochemical data. A dataset comprising 624 breast cancer patients from Mahdieh Clinic (Kermanshah, Iran) was analyzed, with extensive feature engineering, scaling, and class balancing applied. We developed an ensemble framework based on tree-based learners (Random Forest, XGBoost, and LightGBM), combined through ensemble strategies and enhanced using fuzzy feature representations and decision threshold optimization. The proposed hybrid model achieved an accuracy of 0.816, an F1-score of 0.814, and an area under the ROC curve (AUC) of 0.862 on the held-out test set, demonstrating strong discriminative capability and balanced classification performance. This work highlights the potential of hybrid fuzzy–ensemble learning for uncertainty-aware predictive analytics in biomedical decision support, aligning with the journal’s focus on information processes, intelligent systems, and data mining.
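A minimal sketch of the soft-voting-plus-threshold idea this abstract describes, assuming synthetic data in place of the clinical dataset and sklearn's GradientBoostingClassifier standing in for XGBoost/LightGBM; for brevity the threshold search below runs on the test split, whereas in practice it belongs on a validation split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=624, n_features=20,
                           weights=[0.6, 0.4], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ens = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft",                       # average predicted probabilities
).fit(X_tr, y_tr)

# Decision-threshold optimization: pick the cut-off maximizing F1.
proba = ens.predict_proba(X_te)[:, 1]
best = max(np.linspace(0.1, 0.9, 81),
           key=lambda t: f1_score(y_te, proba >= t))
print(f"best threshold={best:.2f}, F1={f1_score(y_te, proba >= best):.3f}")
```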

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yuchen Liu

Abstract: This paper proposes a self-supervised modeling framework based on contrastive time-series representation learning to address the complexity of backend system performance anomaly prediction in cloud computing and microservice environments. The method constructs a time-varying service dependency graph and a temporal encoding mechanism to achieve joint representation of spatial structural features and temporal dynamic features, enabling the unsupervised identification of potential performance degradation patterns. The model consists of four main components: a dynamic graph construction module, a graph convolution feature extraction module, a time-series encoding module, and a contrastive learning optimization module. The dynamic graph module captures the evolving dependencies among services, while the time-series encoding module extracts multi-scale temporal features. The contrastive learning module builds positive and negative sample pairs to achieve representation aggregation and differentiation in the latent space. Extensive experiments on real backend system monitoring datasets, along with sensitivity analyses on learning rate, optimizer, temperature coefficient, and data missing rate, demonstrate that the proposed model outperforms mainstream methods in accuracy, precision, recall, and AUC, showing strong generalization and robustness. This study provides a new technical approach for early identification of performance anomalies in complex distributed systems and offers practical, theoretical, and methodological support for intelligent operation and performance assurance in cloud platforms.
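The contrastive optimization module described above builds positive/negative pairs in the latent space; a standard way to do this is an InfoNCE-style loss. The following NumPy sketch shows that loss under assumed shapes and temperature (the paper's exact formulation is not given in the abstract).

```python
import numpy as np

def info_nce(z, z_pos, tau=0.1):
    """z, z_pos: (batch, dim) embeddings of two views of each sample."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = z @ z_pos.T / tau           # pairwise similarities / temperature
    # The positive pair sits on the diagonal; softmax cross-entropy against it
    # pulls positives together and pushes the other batch samples apart.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 64))                       # toy batch of embeddings
print(info_nce(z, z + 0.05 * rng.normal(size=z.shape)))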

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Aniket Deroy

Abstract: For decades, competitive debate has been hailed as the "ultimate mental gym," sharpening critical thinking, research skills, and public speaking. However, it has often remained an elitist activity, confined to schools with the budget for specialized coaches and extensive travel. The emergence of Large Language Models (LLMs) like Gemini and GPT-4 represents a seismic shift, offering a way to democratize high-level dialectical training. While some fear that AI might encourage "intellectual laziness," I argue that, if implemented correctly, LLMs can serve as the ultimate "Digital Socrates"—an infinitely patient, remarkably well-read sparring partner for the next generation of thinkers.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Zirui Zhao, Keyu Yuan, Ziyue Wang, Jiaqing Shen, Yirui Huang

Abstract: Graph Neural Networks (GNNs) have demonstrated exceptional performance in modeling structural dependencies within networked data. However, in complex decision-making environments, structural information alone often fails to capture the latent semantic logic and domain-specific heuristics. While Large Language Models (LLMs) excel in semantic reasoning, their integration with graph-structured data remains loosely coupled in existing literature. This paper proposes CSSA, a novel Cross-modal Semantic-Structural Alignment framework that synergizes the zero-shot reasoning of LLMs with the topological aggregation of GNNs through a contrastive learning objective. Specifically, we treat node attributes as semantic prompts for LLMs to distill high-level "risk indicators," while a GNN branch encodes the local neighborhood topology. A cross-modal alignment layer is then introduced to minimize the representational gap between semantic intent and structural behavior. We evaluate CSSA on a massive dataset of 2.84 million online transaction records. Experimental results demonstrate that CSSA achieves a superior F1-score and AUC compared to state-of-the-art GNNs, particularly in scenarios characterized by extreme class imbalance and covert adversarial patterns.

Article
Computer Science and Mathematics
Algebra and Number Theory

Fang-an Deng, Tao Chen, Yichuan Yang, Xiuli Li

Abstract: An N(2,2,0)-algebra (abbreviated as NA-algebra) is an algebraic structure equipped with two binary operations, $\ast$ and $\bigtriangleup$, satisfying specific axioms. This paper investigates a special class of NA-algebras where the operation "$\ast$" exhibits nilpotent properties. We study several fundamental concepts within NA-algebras, including ideals, congruence decomposition, congruence kernels, and multiplicative stabilizers. A notion of NA-morphism is introduced, and a corresponding NA-morphism theorem is established. Furthermore, we explore the relationships between NA-algebras and other related logical algebraic structures, such as quantum B-algebras, Q-algebras, CI-algebras, pseudo-BCH-algebras, and RM-algebras. Notably, we prove that any nilpotent NA-algebra forms a quantum B-algebra. These results lay a foundation for further research into the structure and potential applications of NA-algebras.

Short Note
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Satyadhar Joshi

Abstract: The rapid evolution of Large Language Models (LLMs) between 2024 and 2026 has ushered in a transformative era of artificial intelligence capabilities, characterized by significant architectural innovations, multimodal integration, and enhanced reasoning abilities. This paper presents a comprehensive comparative analysis of state-of-the-art LLMs including Anthropic's Claude Opus 4.6, OpenAI's GPT-5 series, Google's Gemini 2.5/3 Pro, and emerging models such as GLM-4.6. The release of Claude Opus 4.6 in early 2026 represents a significant milestone, introducing a 1 million token context window and demonstrating state-of-the-art performance across diverse domains. We systematically examine key technological trends including Mixture-of-Experts (MoE) architectures, extended context windows exceeding 1 million tokens, and advanced alignment techniques. We analyze the technical implementation of extended context windows, MoE architectures, and advanced reasoning capabilities that enable superior performance. Comprehensive benchmarking reveals Claude Opus 4.6's leading position in agentic coding, tool use, and complex reasoning tasks, while comparative analysis with competing models highlights evolving architectural strategies. Performance is rigorously evaluated across multiple domains including automated coding, medical informatics, regulatory document processing, and general reasoning benchmarks. The paper further investigates practical applications in software development, healthcare informatics, and regulatory compliance, demonstrating how architectural choices translate to real-world performance advantages. Our analysis reveals that while parameter scaling remains relevant, strategic divergence in architectural philosophy and deployment strategies increasingly defines the competitive landscape. This study provides insights into the current state of LLM technology, identifies key trends shaping future development, and offers recommendations for future evaluation methodologies in this rapidly advancing field.

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Jixiao Yang, Sebastian Sun, Yang Wang, Yutong Wang, Xikai Yang, Chi Zhang

Abstract: To address the limited controllability, unstable output consistency, and weakly constrained decision processes of large language models in text classification tasks, this work proposes a controllable prompt-driven text classification method that establishes an end-to-end unified modeling framework from instruction alignment to constrained decoding. Text classification is reformulated as an instruction-conditioned generative discriminative problem. Input texts and task instructions are jointly encoded to form a unified internal representation that integrates textual semantics with classification constraints. On this basis, a category semantic alignment mechanism is introduced to ensure that the model explicitly follows category boundaries and decision criteria defined by the instructions, thereby reducing classification inconsistency caused by prompt variation or implicit bias. To further improve output reliability, a structured constrained decoding strategy is designed to restrict the generation space to a predefined set of valid categories, preventing redundant text or invalid outputs from interfering with classification results. Comparative analysis under unified data and evaluation settings demonstrates that the proposed method achieves more consistent advantages in classification accuracy, discriminative stability, and overall separability. These findings indicate that deeply integrating instruction understanding and output control into the decision process of large language models effectively transforms their generative capacity into stable, interpretable, and controllable text classification capability, providing a systematic solution for building reliable intelligent text analysis systems.
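The structured constrained-decoding strategy described above restricts the generation space to a predefined category set. A minimal sketch of that idea: mask all vocabulary logits outside the valid labels before taking the argmax. The toy vocabulary and logits below are assumptions standing in for a real LLM decoding step.

```python
import numpy as np

vocab = ["positive", "negative", "neutral", "the", "a", "score", "is"]
valid_labels = {"positive", "negative", "neutral"}   # allowed categories

logits = np.array([1.2, 0.4, 0.9, 2.5, 1.8, 0.3, 0.7])  # raw next-token logits
mask = np.array([tok in valid_labels for tok in vocab])
constrained = np.where(mask, logits, -np.inf)        # forbid invalid tokens

# The unconstrained argmax would emit "the"; the constrained decode can
# only ever return one of the predefined categories.
print(vocab[int(np.argmax(constrained))])
```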

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Harris Wang

Abstract: Constrained Object Hierarchies (COH) present a comprehensive theoretical framework for artificial general intelligence (AGI) grounded in neuroscience principles. This paper develops the complete mathematical foundation of COH theory, demonstrating how intelligence emerges from hierarchical compositional structures constrained by adaptive optimization principles. We provide formal definitions, prove theorems regarding the theory's soundness and completeness, and establish connections with established mathematical frameworks including category theory, dynamical systems, and information theory. The paper shows that COH provides a mathematically rigorous basis for modeling intelligent systems across domains while maintaining the flexibility required for general intelligence.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Vincenzo Manca

Abstract: Current Artificial Neural Networks based on Large Language Models (LLMs) primarily use statistical token prediction, often lacking rigorous structural semantic consistency and illocutionary force. This paper introduces the Tensor Functional Language Logic (T-FLL) as a formal bridge between symbolic reasoning and continuous neural manifolds. We redefine linguistic units as functional noemes and propose a mapping of logical operators onto tensor operations. Sentences are translated into noematic formulae, and we show that the attention mechanism driving the semantics of a dialog can be reformulated more efficiently if directed by the noematic formulae. In this way, we outline a path toward more explainable and structurally sound AI architectures.

Concept Paper
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Felipe Valentim

Abstract: During the pandemic, the positive value of technologies was emphasized. In the post-pandemic era, shortly after the easing of confinement, their negative aspects also became evident again. Despite this noted depreciation, it is agreed that technological advancement will always have a positive balance, provided that there is no injustice due to lack of access to technology. This can be the subject of studies on digital inclusion. In turn, the set of values and practices that seek to ensure that the development and use of artificial intelligence (AI) systems are safe, fair, and responsible is discussed in the ethical and moral sciences of AI. This work presents a write-up that attempts to generalize the framework presented by Michalski et al. (2025) and discusses a) norms for evaluating needs and areas of application, b) definition of the values of the methods, and c) definition of criteria for comparing techniques.

Article
Computer Science and Mathematics
Signal Processing

Sapthak Mohajon Turjya

Abstract: This paper presents a hybrid model for the control of brain-computer interfaces (BCIs) in Metaverse environments, with the goal of advancing such interfaces beyond traditional motor imagery (MI) or P300-based designs. The hybrid model uses P300 for interaction with virtual devices and MI for navigation and movement imagination in the Metaverse, with each EEG modality dedicated to a particular control state and state changes made sequentially based on the context of the interaction. In the simulated experiment, imagined movement of the left and right hands is used for rotational navigation, while discrete device actions use P300 responses under a five-stimulus oddball paradigm. In the performance evaluation, the paper shows that the hybrid model, combining MI and P300 in a single BCI, achieves accuracy comparable to single-mode BCIs while offering advantages over existing systems in interaction capability and adaptability, thus demonstrating the effectiveness of hybrid control for dynamic and flexible Metaverse interactions.
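A minimal sketch of the sequential control-state switching this abstract describes: one state (MI) handles navigation, the other (P300) handles device interaction, and context events switch between them. The event names and the state-machine shape are illustrative assumptions, since the abstract does not specify them.

```python
from enum import Enum, auto

class ControlState(Enum):
    MI_NAVIGATION = auto()     # motor imagery: rotate/move in the scene
    P300_INTERACTION = auto()  # P300 oddball: select a device action

def next_state(state, context_event):
    # Context-driven, sequential state changes (no parallel decoding).
    if state is ControlState.MI_NAVIGATION and context_event == "device_in_focus":
        return ControlState.P300_INTERACTION
    if state is ControlState.P300_INTERACTION and context_event == "interaction_done":
        return ControlState.MI_NAVIGATION
    return state

state = ControlState.MI_NAVIGATION
for event in ["none", "device_in_focus", "interaction_done"]:
    state = next_state(state, event)
    print(event, "->", state.name)
```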

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yossef Emara, Xinran Han, Alice Zhang, Yi Lin, Yang Zhang

Abstract: EEG-based automated classification pipelines for identifying mental disorders increasingly rely on deep learning architectures that are computationally intensive and difficult to interpret, limiting reproducibility and clinical deployment in resource-constrained or cross-site settings. There is a need for algorithmically transparent frameworks that balance accuracy, generalization, and computational efficiency. We propose an interpretable EEG classification framework that integrates multiscale spectrotemporal feature extraction with ensemble machine learning. The pipeline combines standardized preprocessing with extraction of time-domain, spectral power, and entropy-based features, followed by minimum redundancy–maximum relevance feature selection. Classification is performed using voting and stacking ensembles of heterogeneous base learners. The proposed algorithm achieved 98.06% accuracy on a primary open EEG dataset (Warsaw IPN; 19 channels, 250 Hz) and 91.47% accuracy on an independent external dataset (Moscow adolescent cohort; 16 channels, 128 Hz) without retraining or dataset-specific tuning. The framework exhibited low computational overhead and stable cross-dataset performance. The results demonstrate a generalizable, computationally efficient, and interpretable EEG classification framework that favors feature-level transparency and ensemble diversity over deep architectures, supporting scalable and reproducible biomedical signal processing applications.
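A minimal sketch of the heterogeneous stacking-ensemble step named in this abstract, with synthetic features standing in for the extracted EEG features (time-domain, spectral power, entropy); the base learners and meta-learner below are assumptions, since the abstract does not enumerate them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
print(cross_val_score(stack, X, y, cv=3).mean())
```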

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Cui Li, Yu Wang, Lei Gao, Qiaoyan Ding

Abstract: To address the challenges of integrating multi-source heterogeneous data and low knowledge utilization rates in water conservancy facility safety management, this study proposes a knowledge graph construction method that integrates ontology modeling with large language model enhancement. First, an ontology framework for water conservancy facility safety is constructed, encompassing four core elements: agencies and personnel, engineering equipment, risks and hidden dangers, and systems and processes. Subsequently, a KG-LLM-GraphRAG architecture is designed, which optimizes the knowledge extraction effectiveness of large language models through ontology-constrained prompt templates and utilizes the Neo4j graph database for knowledge storage and multi-hop reasoning. Experimental results demonstrate that the proposed method significantly outperforms traditional approaches in entity-relationship extraction tasks. The constructed knowledge graph not only effectively supports application scenarios such as safety hazard identification, emergency decision-making, and knowledge reuse but also provides an efficient knowledge organization and reasoning tool for water conservancy facility safety management, strongly propelling the digital transformation of the water conservancy industry.
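A minimal sketch of two pieces of the pipeline this abstract outlines: an ontology-constrained extraction prompt and a multi-hop query against Neo4j. The label and relationship names, and the connection details, are hypothetical; the abstract only fixes the four ontology classes and the use of Neo4j.

```python
from neo4j import GraphDatabase

# Ontology-constrained prompt template: the LLM may only emit triples
# whose entity types come from the four classes in the ontology.
ONTOLOGY_PROMPT = """Extract entities and relations from the text below.
Only use these entity types: AgencyPersonnel, EngineeringEquipment,
RiskHiddenDanger, SystemProcess. Output only (head, relation, tail) triples.
Text: {text}"""

# Hypothetical two-hop query: which personnel are responsible for
# equipment linked to an identified hidden danger?
MULTI_HOP = """
MATCH (p:AgencyPersonnel)-[:RESPONSIBLE_FOR]->(e:EngineeringEquipment)
      -[:HAS_RISK]->(r:RiskHiddenDanger)
RETURN p.name, e.name, r.description
"""

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder creds
with driver.session() as session:
    for record in session.run(MULTI_HOP):
        print(dict(record))
```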

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yuelian Wang, Bo Tao, Wenzheng Hu, Jiaqi Zhao, Fa Su, Zuguo Ma, Ye Zhuang

Abstract: To meet the real-time computational requirements of active suspension control systems, this study shifts from complex microscopic physical equations to a direct nonlinear functional mapping between the relative motion states (displacement and velocity) and the output force of air springs. This approach aims to preserve critical nonlinear hysteresis characteristics while significantly reducing the computational overhead. A progressive modeling strategy is implemented to characterize these complex behaviors. Initially, polynomial fitting is employed to identify key input features; however, its limited capacity to capture intricate nonlinearities necessitates more advanced methods. Subsequently, standard Feedforward Neural Networks (FNN) are explored for their nonlinear mapping capabilities, yet their inherent "black-box" nature often leads to convergence difficulties and restricted generalization. To address these issues, a Physics-Informed Neural Network (PINN) architecture is introduced, embedding physical governing equations as regularization constraints within the loss function to integrate data-driven flexibility with mathematical rigor. Recognizing that conventional PINNs often encounter convergence challenges due to conflicts between PDE constraints and data-driven loss terms, this research develops a Physics-Embedded Hierarchical Network (PEHN). By deriving specialized PDE constraints tailored to air spring dynamics and designing a hierarchical architecture aligned with these physical requirements, the PEHN effectively balances physical priors with experimental data. Experimental results demonstrate that the proposed PEHN ensures robust convergence and superior accuracy in capturing complex nonlinearities, hysteresis effects, and dynamic stiffness variations compared to baseline models.
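A minimal PyTorch sketch of the physics-informed loss composition the PINN/PEHN line of work builds on: a data-fitting term plus a PDE-residual term weighted into one objective. The toy residual (a first-order ODE $u' = -u$) is an assumption standing in for the air-spring constraints, which the abstract does not spell out.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def pinn_loss(x_data, y_data, x_phys, lam=1.0):
    data_loss = torch.mean((net(x_data) - y_data) ** 2)   # fit measurements
    x = x_phys.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    phys_loss = torch.mean((du + u) ** 2)                 # residual of u' = -u
    return data_loss + lam * phys_loss                    # weighted composite

x_d = torch.linspace(0, 1, 16).unsqueeze(1)
y_d = torch.exp(-x_d)                                     # exact solution data
print(pinn_loss(x_d, y_d, torch.rand(64, 1)).item())
```

The convergence conflict the abstract mentions corresponds to the data and physics terms pulling the weights in different directions; the weight `lam` (and, in PEHN, the hierarchical architecture) mediates that trade-off.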

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yuchen Liu

Abstract: This paper proposes a self-supervised performance degradation identification model to address the challenges of high-dimensional heterogeneous data, complex dependency structures, and dynamic non-stationarity in large-scale microservice architectures. The model takes multi-source monitoring data as input and first performs semantic alignment among different metrics through multidimensional feature embedding and projection layers. An adaptive dynamic graph convolutional network is then employed to capture the topological dependencies and interaction features among service nodes, constructing time-varying structural representations. In the temporal modeling stage, a gated recurrent unit-based embedding mechanism is introduced to jointly characterize long-term dependencies and local fluctuations of performance evolution, while a residual fusion structure enhances the stability of feature propagation. To improve feature discrimination under unsupervised conditions, the model adopts a contrastive learning optimization strategy and utilizes a temperature adjustment mechanism to strengthen the distinction between positive and negative samples in the latent space, enabling adaptive aggregation and recognition of degradation patterns. Furthermore, multiple hyperparameter sensitivity experiments are conducted to systematically evaluate the effects of learning rate, residual coefficient, temperature parameter, and monitoring sampling interval on model performance. Experimental results show that the proposed model outperforms mainstream methods in accuracy, precision, recall, and F1-score, achieving efficient and stable identification of performance degradation in complex microservice systems under unsupervised settings, thus providing a practical solution for intelligent operations and maintenance.

Article
Computer Science and Mathematics
Computer Networks and Communications

Craig S. Wright

Abstract: This article examines Nash equilibrium stability in digital cash systems, using Bitcoin as a canonical model for protocol-constrained strategic interaction. Building on the formal framework established in Wright (2025), we characterise mining as a repeated non-cooperative game under endogenous constraints: hashpower allocation, latency asymmetries, fee-substitution dynamics, and institutional noise. We show that equilibrium behaviours are sensitive to the structural composition of miner rewards—specifically, the transition from subsidy-dominated to fee-dominated environments—and that volatility in protocol rules leads to equilibrium multiplicity and eventual collapse. Using tools from mainstream game theory and Austrian time preference theory, we demonstrate that rational strategic cooperation is only sustainable under strict protocol immutability. Rule mutation introduces uncertainty that distorts intertemporal valuation and incentivises short-term extractive strategies. These results suggest that digital monetary systems must be governed by non-negotiable constitutional rules to preserve incentive compatibility across time.
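As a small worked illustration of the subsidy-to-fee transition the article analyzes: under proportional hashpower, miner $i$'s expected reward per block is $s_i(S + F)$ for hashpower share $s_i$, subsidy $S$, and fees $F$. The fee level below is an illustrative assumption, not a calibrated figure.

```python
def expected_reward(hash_share, subsidy, fees):
    # Proportional hashpower: expected reward per block = share * (S + F).
    return hash_share * (subsidy + fees)

for halvings in range(0, 8, 2):
    subsidy = 50.0 / (2 ** halvings)         # BTC block subsidy after halvings
    fees = 0.5                               # assumed average fees per block
    r = expected_reward(0.1, subsidy, fees)  # miner with 10% of hashpower
    fee_share = fees / (subsidy + fees)
    print(f"halvings={halvings}: E[reward]={r:.3f} BTC, "
          f"fee share={fee_share:.1%}")
```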

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Zheng Yu

Abstract: Understanding the microstructure of brain tumours without invasive methods remains a major challenge in neuro-oncology. The VERDICT MRI technique provides biologically meaningful metrics, such as cellular and vascular fractions, that help distinguish tumour grades and align closely with histological findings [1,2]. Yet, traditional non-linear fitting approaches are both computationally heavy and prone to errors, which restricts their use in clinical practice. Deep learning presents a promising solution by enabling faster and more reliable diffusion analysis [3]. Still, there is limited evidence on which specific neural network designs are best suited for accurate VERDICT parameter mapping. We present the first head-to-head benchmark of eight neural network families for predicting VERDICT parameters: multilayer perceptron (MLP), residual MLP, Long short-term memory (LSTM)/Recurrent Neural Network (RNN), Transformer, 1D-Convolutional Neural Networks (CNN), variational autoencoder (VAE), Mixture of Experts (MoE), and TabNet. All models were trained and evaluated under a unified protocol with standardized preprocessing, matched optimization settings, and common metrics (coefficient of determination R2, RMSE), supplemented with bootstrap-based uncertainty and pairwise significance testing. Across targets, simple feedforward baselines performed competitively with more complex sequence and attention-based models, indicating that architectural complexity does not uniformly translate into superior accuracy for VERDICT regression on tabular features. Compared to traditional fitting, learned predictors enable fast inference and streamlined deployment, suggesting a practical path toward near-real-time VERDICT mapping. By establishing performance baselines and a reproducible evaluation protocol, this benchmark provides actionable guidance for model selection and lays the groundwork for clinically viable, learning-based microstructure imaging in neuro-oncology.
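A minimal sketch of the evaluation protocol this abstract describes, for one baseline: an MLP regressor scored with R² and RMSE plus a bootstrap interval. Synthetic regression data stands in for the VERDICT signal features; model size and bootstrap count are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=30, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pred = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr).predict(X_te)
print("R2  :", r2_score(y_te, pred))
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)

# Bootstrap-based uncertainty for R2, as in the paper's protocol.
rng = np.random.default_rng(0)
scores = [r2_score(y_te[idx], pred[idx])
          for idx in (rng.integers(0, len(y_te), len(y_te))
                      for _ in range(200))]
print("R2 95% CI:", np.percentile(scores, [2.5, 97.5]))
```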

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Amolika Roy, Paras Jiskar, Om Mishra

Abstract: As cyber threats continue to evolve, traditional security measures often fail to detect emerging vulnerabilities in real-time, particularly for small and medium-sized enterprises with limited resources. This study develops an AI-driven supervised classification algorithm for website vulnerability detection that integrates insights from the National Vulnerability Database (NVD) and Common Vulnerability Scoring System (CVSS) scores. A dataset of 40,000 vulnerability entries was curated using reconnaissance tools including Nmap and Nessus, with HTML code snippets labeled according to severity levels. The methodology employed CodeBERT transformer models for converting raw HTML into numerical embeddings, followed by Random Forest classification trained on AWS SageMaker. A Chrome browser extension was developed to extract live webpage content and communicate with a Flask-based API hosted on Amazon EC2 for real-time inference. Following optimization through TF-IDF vectorization and hyperparameter tuning, the model achieved 66.3% accuracy with ROC-AUC values ranging from 0.60 to 0.70 across severity classes. The system successfully classifies websites into Low, Medium, or High-risk categories in real-time. This research demonstrates that supervised machine learning offers a practical, cost-effective, and auditable alternative to computationally intensive deep learning approaches, providing accessible vulnerability detection while maintaining compliance with emerging AI governance frameworks such as ISO 42001 and the NIST AI Risk Management Framework.
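A minimal sketch of the TF-IDF + Random Forest variant this abstract mentions, classifying raw HTML snippets into severity classes. The tiny toy corpus and labels are assumptions; the real system trains on 40,000 NVD-derived entries and serves predictions behind a Flask API on EC2.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

html = ['<form action="login.php"><input name="pwd">',
        '<script src="http://cdn.example/x.js"></script>',
        '<a href="page.html">link</a>',
        '<input type="text" onmouseover="alert(1)">']
labels = ["High", "Medium", "Low", "High"]   # toy severity labels

clf = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
                ("rf", RandomForestClassifier(random_state=0))]).fit(html, labels)
print(clf.predict(['<input onclick="eval(x)">']))  # predicted severity class
```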
