The continuous development and growing popularity of digital media technology have made it an integral part of people's lives. Digital media takes many forms, including text, images, audio, and video, and the volume of media traffic transmitted over networks is constantly growing. This rapid growth has created a number of challenges, among them digital media traffic management, which involves managing and optimizing the transmission of digital media in the network [1]. Digital media traffic classification is a crucial technique in digital media traffic management. It helps network administrators understand network usage and develop better network strategies, and it assists network security personnel in identifying malicious traffic and network attacks. Digital media traffic classification is therefore of great practical significance and has broad application prospects [2, 3].
Currently, several approaches are used to classify digital media traffic, including port number and protocol-based approaches, traffic feature-based approaches, and deep learning-based approaches [4, 5]. Nguyen et al. [3] reviewed the current state of research on Internet traffic classification and obfuscation techniques, analyzed the limitations of traditional classification methods with respect to complex management and the handling of new traffic features, and summarized various data representation methods and the different objectives of Internet traffic classification. The traffic feature-based approach has become one of the mainstream methods for classifying digital media traffic. It categorizes traffic using statistical features such as packet size, inter-packet time interval, and transmission rate. Clustering algorithms such as K-means and classification algorithms such as KNN and SVM are widely used for this purpose. Erman et al. [6] used the unsupervised K-means algorithm to identify core traffic classes (e.g., P2P, Web, and FTP), using average packet inter-arrival time, flow duration, and average packet length, among others, as feature values. William et al. [7] compared five machine learning algorithms: discretized naive Bayes, kernel naive Bayes, the C4.5 decision tree, the Bayesian network, and the naive Bayes tree. Their experimental results show that all of the algorithms except kernel naive Bayes can achieve 90% classification accuracy, but that computational performance varies greatly among them, with C4.5 being the fastest. However, these methods have several practical limitations. First, they often require manual feature extraction, and feature selection demands significant domain knowledge and experience. Second, they often do not handle encrypted or multi-purpose traffic well, and their computational complexity is often high. Finally, they become less efficient when dealing with large-scale traffic data.
To address the limitations of traditional classification methods, researchers have developed deep learning-based methods, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for classifying digital media traffic. These methods learn and classify end to end, avoiding manual feature extraction, and achieve higher accuracy and stronger generalization than traditional methods. For example, Zhang et al. [8] used deep convolutional neural networks for end-to-end learning and classification of traffic, achieving high classification accuracy and strong generalization capability in experiments. Similarly, Bryan et al. [9] proposed an SDN-based method for classifying digital media traffic, which achieved high classification accuracy and fast classification speed in experiments. Wang et al. [10] compared the classification performance of one-dimensional and two-dimensional CNNs with that of the traditional C4.5 algorithm. Wu et al. [11] showed, both theoretically and experimentally, the advantages of 2D CNNs over 1D networks, traditional machine learning algorithms, and RNNs in terms of computation and number of parameters, with performance not inferior to that of the control group.
Although deep learning methods have shown promising results, they have some limitations in the field of digital media traffic classification [12]. First, they require a large amount of training data to reach good accuracy and generalization ability, yet data collection and annotation are difficult in this field, which limits the amount of available training data. Second, feature extraction is challenging because different types of digital media have different features and structures and therefore require different feature extraction methods. Finally, deep learning methods require significant time and computational resources for model training, which increases the difficulty and cost of implementing the algorithm.
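To make the traffic feature-based approach discussed above more concrete, the following is a minimal sketch of how per-flow statistical features (packet size, time interval, and transmission rate) might be computed and then grouped with K-means. It is an illustration under stated assumptions, not the implementation used in any of the cited works or in this study: the `Packet` structure, the `flow_features` and `kmeans` helpers, and the sample flows are all hypothetical, and the centroids are initialized deterministically from the first k points for simplicity.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence, Tuple


@dataclass
class Packet:
    """One captured packet of a flow (hypothetical structure for illustration)."""
    timestamp: float  # arrival time in seconds
    size: int         # packet length in bytes


def flow_features(packets: Sequence[Packet]) -> Tuple[float, float, float]:
    """Return (mean packet size, mean inter-arrival time, byte rate) for one flow."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    sizes = [p.size for p in ordered]
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    duration = ordered[-1].timestamp - ordered[0].timestamp
    rate = sum(sizes) / duration if duration > 0 else 0.0
    return float(mean(sizes)), float(mean(gaps)), rate


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 10) -> List[int]:
    """Plain K-means with squared Euclidean distance; returns one label per point.

    Assumption: centroids start at the first k points, which keeps the sketch
    deterministic. Features are not rescaled here, although in practice they
    usually should be, since byte rates dwarf time intervals.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [
                sum((x - c) ** 2 for x, c in zip(p, centroid))
                for centroid in centroids
            ]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [mean(dim) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical flows: two bulk, video-like flows and two sparse, chat-like flows.
    flows = [
        [Packet(0.00, 1400), Packet(0.01, 1400), Packet(0.02, 1400)],
        [Packet(0.0, 80), Packet(1.0, 120), Packet(2.0, 100)],
        [Packet(0.00, 1300), Packet(0.01, 1300), Packet(0.02, 1300)],
        [Packet(0.0, 60), Packet(2.0, 90), Packet(4.0, 90)],
    ]
    points = [flow_features(flow) for flow in flows]
    print(kmeans(points, k=2))
```

The feature vector is deliberately small. A real classifier would typically use many more flow statistics and would normalize them before clustering, and the resulting clusters would still need to be mapped to traffic classes using labeled samples.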
This study investigates the suitability, advantages, and limitations of different algorithms for digital media traffic classification. Three classical machine learning algorithms (decision trees, support vector machines, and neural networks) are compared in terms of performance. The experimental results demonstrate that all three algorithms can solve the digital media traffic classification problem, but that their performance and applicability vary. Decision trees and support vector machines perform better when the data volume is small and the feature dimension is low, whereas neural networks achieve better classification performance when the data volume is large and the feature dimension is high. This study therefore provides practical guidance for applying machine learning techniques to digital media traffic classification problems.