Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Structural Syntax Network for Code Classification
Received: 11 December 2023 / Approved: 12 December 2023 / Online: 12 December 2023 (05:35:17 CET)
How to cite: Shah, M.; Patel, R.; Terry, A. Structural Syntax Network for Code Classification. Preprints 2023, 2023120805. https://doi.org/10.20944/preprints202312.0805.v1
Abstract
Program classification, a critical task in software engineering, supports the understanding and categorization of code across applications such as anomaly detection and code quality assessment. Advances in cross-language program classification open avenues for efficient code translation between programming languages, helping developers code faster and shorten development cycles. Existing research focuses primarily on semantic code analysis, with limited attention to cross-language challenges. To address this gap, we introduce a neural network model, CodeSemanticsNN, which leverages a refined Syntax Structure (SS) approach. The model relies on two mechanisms. First, it adopts a novel SS representation that combines sequential and graph-based SS structures to improve semantic feature capture. Second, it employs a 'unified vocabulary' strategy to bridge the gap between programming languages and enable cross-language classification. In addition, we have compiled a dataset of 20,000 files spanning five programming languages to serve as a benchmark for cross-language classification. Our experiments on this dataset show that CodeSemanticsNN surpasses existing models on four metrics: Precision, Recall, F1-score, and Accuracy.
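As an illustration only, the two mechanisms named in the abstract can be sketched on a toy syntax tree. The abstract does not specify how CodeSemanticsNN builds its unified vocabulary or its SS representation, so everything below is an assumption made for this sketch: the `UNIFIED_VOCAB` table, the shared token names, the `Node` class, and the helpers `unify`, `to_sequence`, and `to_edges` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical table mapping (language, node type) pairs to shared tokens.
# The real vocabulary used by the model is not given in the abstract.
UNIFIED_VOCAB = {
    ("python", "FunctionDef"): "FUNC_DEF",
    ("java", "MethodDeclaration"): "FUNC_DEF",
    ("python", "For"): "LOOP",
    ("java", "ForStatement"): "LOOP",
    ("python", "Return"): "RETURN",
    ("java", "ReturnStatement"): "RETURN",
}


@dataclass
class Node:
    """A toy syntax-tree node: a type label plus child nodes."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def unify(language: str, node: Node) -> Node:
    """Relabel a tree with shared tokens; unknown node types become 'UNK'."""
    token = UNIFIED_VOCAB.get((language, node.kind), "UNK")
    return Node(token, [unify(language, child) for child in node.children])


def to_sequence(node: Node) -> List[str]:
    """Sequential view: node labels in pre-order."""
    seq = [node.kind]
    for child in node.children:
        seq.extend(to_sequence(child))
    return seq


def to_edges(node: Node) -> List[Tuple[str, str]]:
    """Graph view: (parent label, child label) pairs for every tree edge."""
    edges = []
    for child in node.children:
        edges.append((node.kind, child.kind))
        edges.extend(to_edges(child))
    return edges


# Two structurally equivalent toy trees written with language-specific labels.
py_tree = Node("FunctionDef", [Node("For", [Node("Return")])])
java_tree = Node(
    "MethodDeclaration",
    [Node("ForStatement", [Node("ReturnStatement")])],
)

py_unified = unify("python", py_tree)
java_unified = unify("java", java_tree)
```

Under these assumptions, both toy trees reduce to the same label sequence and the same edge list once relabeled, which is the property a shared vocabulary is meant to provide to a cross-language classifier.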
Keywords
code analysis; syntax-based classification; multi-language code parsing; cross-linguistic code analysis
Subject
Computer Science and Mathematics, Computer Networks and Communications
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.