2.1. System Hardware
Data from the arms was collected using two Myo Armbands from Thalmic Labs, each consisting of an IMU and sEMG sensors [44]. The device is depicted in Figure 1.
Myo is a commercial motion controller that contains an ARM Cortex-M4 120 MHz microprocessor, eight dry sEMG sensors with a sampling rate of 200 Hz, and a nine-axis IMU with a sampling rate of 50 Hz. The IMU provides ten different types of data. The accelerometer measures acceleration in units of standard gravity (g), while the gyroscope measures rotation in rad/s. The device includes magnetometers to obtain orientation and positioning measurements, which are used to determine the movement of the arm. Additionally, the eight dry sEMG sensors allow for the detection of finger movements.
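To make this data layout concrete, the sketch below models one IMU frame (ten values: orientation quaternion, accelerometer, and gyroscope) and one sEMG frame (eight channels). The type and field names are illustrative assumptions, not the Myo SDK’s actual types.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ImuFrame:
    """One 50 Hz IMU sample: 4 + 3 + 3 = 10 values (field names are assumptions)."""
    orientation: Tuple[float, float, float, float]  # quaternion (x, y, z, w)
    accel: Tuple[float, float, float]               # acceleration in units of g
    gyro: Tuple[float, float, float]                # angular velocity in rad/s

@dataclass
class EmgFrame:
    """One 200 Hz sEMG sample: eight signed 8-bit channel values."""
    channels: Tuple[int, ...]                       # len == 8
```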
The Myo armband was selected for this study because its features are particularly well suited to real-time sign language recognition (SLR). It integrates both inertial measurement unit (IMU) sensors, which capture arm movements, and surface electromyography (sEMG) sensors, which detect muscle activity related to hand and finger movements; this combination provides the comprehensive data needed for accurate gesture recognition. The armbands are wireless and highly portable, allowing practical, everyday use outside laboratory environments, and they support real-time processing of the captured data, making them suitable for translating sign language into text or speech with minimal delay. Their sensors are sensitive and precise, supporting high recognition accuracy; in this study, the Random Forest algorithm achieved an accuracy rate of 99.875%. The armband is also user-friendly and comfortable to wear, which is crucial for user acceptance and continuous use: it can be put on and taken off effortlessly and requires no complex setup. Compared with other motion capture devices, such as data gloves and camera-based systems, Myo armbands are cost-effective, making the technology accessible and practical for large-scale deployment. Finally, the Myo armband provides a robust Application Programming Interface (API) that enables the creation of tailored applications, such as the SLR system described in this manuscript, allowing developers to adapt applications to the specific demands of their projects. Together, these properties make the Myo armband an excellent choice for developing an SLR system that is effective and practical for everyday use and that addresses many of the limitations of previous technologies.
2.2. Developed Software
The application was developed in the Delphi programming language using the Embarcadero RAD Studio 11 Integrated Development Environment. Two versions were created, for the Windows and Android platforms, and additional versions for the Linux and iOS platforms can be built from the same code. SQLite was used as the database.
Figure 2 presents a flowchart outlining the sequential steps of the proposed methodology for recognizing sign language using the Myo armband.
The program has five different user interfaces.
2.2.1. Main User Interface
Upon running the program for the first time, the user interface depicted in Figure 3 appears. The interface is designed with simplicity in mind. Pressing the Connect button triggers the program to check for location and Bluetooth permissions. If a permission has not been granted, the program requests it from the user. If permission is granted but the Bluetooth connection or location access is turned off, the program turns them on. Finally, the program establishes a connection to both Myo Armbands using the Bluetooth Low Energy (BLE) protocol. After the connection is established, the labels and oval-shaped markers on the left of the screen appear if the EMG and IMU services are successfully subscribed. These markers turn blue when there is a change above the threshold value in the data. The blue oval indicator under the TEST label indicates that the program can make predictions from instantaneous movements. The predicted word is displayed in written form in the upper middle of the screen. The color transitions in the three-dimensional arms can be used to observe changes in the EMG data of each arm. The battery status of each armband is displayed as a number ranging from 0 (empty) to 100 (fully charged) in the upper left and right corners of the interface. To return the three-dimensional arms to their reference position, the user clicks the refresh icon in the lower right corner. The Settings interface is accessed by clicking the three-line icon located in the lower right corner.
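As an illustration of this connection flow, the sketch below subscribes to notification characteristics over BLE using the Python `bleak` library. The device address and the EMG/IMU characteristic UUIDs are placeholders, not the Myo’s actual identifiers; the study’s application implements this step in Delphi.

```python
import asyncio
from bleak import BleakClient

ADDRESS = "AA:BB:CC:DD:EE:FF"                      # placeholder armband address
EMG_CHAR = "00000000-0000-0000-0000-000000000001"  # hypothetical EMG characteristic UUID
IMU_CHAR = "00000000-0000-0000-0000-000000000002"  # hypothetical IMU characteristic UUID

def on_emg(_, data: bytearray):
    print("EMG:", list(data))  # eight signed 8-bit channel values per frame

def on_imu(_, data: bytearray):
    print("IMU:", list(data))  # orientation, accelerometer, and gyroscope payload

async def main():
    async with BleakClient(ADDRESS) as client:
        # Subscribing to a characteristic corresponds to the 'service subscribed'
        # markers described for the main interface.
        await client.start_notify(EMG_CHAR, on_emg)
        await client.start_notify(IMU_CHAR, on_imu)
        await asyncio.sleep(10)  # receive notifications for ten seconds

asyncio.run(main())
```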
2.2.2. Settings User Interface
The interface (Figure 4) allows access to and modification of all device and software settings. It is possible to adjust the device’s data, vibration, lock, sleep, and wake functions, as well as to change the language. Currently, Turkish and English are available, and more languages will be added in the future.
Precise data segmentation settings can be adjusted individually. For segmentation, each EMG and IMU measurement is stored in a global variable. Motion is detected by calculating the difference between the current and previous measurements of the 200 Hz (EMG) and 50 Hz (IMU) data streams. The user can adjust the limit values for this in the settings interface to suit their needs.
The maximum value of the EMG trackbar is 256, which corresponds to the range of the EMG data from the device (−128 to 127). In the example interface, this value is set to 40; therefore, any absolute change of 40 or more in the values of the eight EMG sensors indicates motion.
The pause between two signs can be adjusted between 20 ms and 1 s (1000 ms). A user who wishes to sign quickly can select the minimum value of 20 ms. In the example interface, this value is set to 60 ms. If the limit value specified for the EMG or IMU is not exceeded for 60 ms (12 measurements, each taking 5 ms, since the device operates at 200 Hz), the sign is considered finished. After the end of the sign, the data features are calculated and sent to the artificial intelligence algorithm for classification.
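A minimal sketch of this thresholded segmentation logic is shown below, assuming a stream of 8-channel EMG frames at 200 Hz. The threshold (40) and pause length (12 frames ≈ 60 ms) mirror the example settings; the function and variable names are illustrative, not the application’s actual code.

```python
EMG_THRESHOLD = 40   # example trackbar setting: absolute change that counts as motion
PAUSE_FRAMES = 12    # 60 ms at 200 Hz: this many quiet frames end the sign

def segment_signs(frames):
    """Yield lists of EMG frames, one list per detected sign.

    `frames` is an iterable of 8-tuples of signed 8-bit EMG values.
    """
    prev, current_sign, quiet = None, [], 0
    for frame in frames:
        moving = prev is not None and any(
            abs(a - b) >= EMG_THRESHOLD for a, b in zip(frame, prev)
        )
        prev = frame
        if moving:
            current_sign.append(frame)
            quiet = 0
        elif current_sign:
            current_sign.append(frame)
            quiet += 1
            if quiet >= PAUSE_FRAMES:
                yield current_sign[:-PAUSE_FRAMES]  # drop the trailing pause
                current_sign, quiet = [], 0
```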
The IMU orientation data determine the position of the device in the x, y, and z planes and are used to detect motion. The IMU trackbar maximum is set to 360, as this is the maximum value of the angles being measured. In the provided interface, the value is set to 20. If the absolute value of the mean of the changes in the Euler angle labels (α, β, γ) exceeds 20 degrees, this indicates movement. The Euler angles are obtained from the quaternion output of the IMU sensor. Denoting the roll, pitch, and yaw angles of the Euler angle notation by α, β, and γ, respectively, and assuming the rotation order yaw, then pitch, then roll, the corresponding quaternion q = (q_w, q_x, q_y, q_z) is defined as:

\[
\begin{aligned}
q_w &= \cos(\alpha/2)\cos(\beta/2)\cos(\gamma/2) + \sin(\alpha/2)\sin(\beta/2)\sin(\gamma/2) \\
q_x &= \sin(\alpha/2)\cos(\beta/2)\cos(\gamma/2) - \cos(\alpha/2)\sin(\beta/2)\sin(\gamma/2) \\
q_y &= \cos(\alpha/2)\sin(\beta/2)\cos(\gamma/2) + \sin(\alpha/2)\cos(\beta/2)\sin(\gamma/2) \\
q_z &= \cos(\alpha/2)\cos(\beta/2)\sin(\gamma/2) - \sin(\alpha/2)\sin(\beta/2)\cos(\gamma/2)
\end{aligned}
\]
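For reference, the same standard conversion written as a small Python function (angle arguments in radians; the function name is ours):

```python
import math

def euler_to_quaternion(roll: float, pitch: float, yaw: float):
    """Standard Euler-to-quaternion conversion for the yaw-pitch-roll order.

    Returns (qw, qx, qy, qz); roll, pitch, and yaw correspond to α, β, γ.
    """
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    qw = cr * cp * cy + sr * sp * sy
    qx = sr * cp * cy - cr * sp * sy
    qy = cr * sp * cy + sr * cp * sy
    qz = cr * cp * sy - sr * sp * cy
    return qw, qx, qy, qz
```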
The minimum duration of a movement is set with the minimum record time, which is 0.4 s in the example interface. Calculating certain features requires a minimum amount of data, so this option was added to prevent the software from producing errors on recordings that are too short.
Additionally, detailed information can be displayed or hidden for simplicity and performance improvement using checkboxes for Log, 3D Arm, Progress, Sound, and Graph.
The Log Checkbox allows for instant viewing of all data and software/device changes.
The 3D Arm Checkbox displays arm movements on the screen in real time; the feature can be hidden to reduce processing load and battery consumption.
The Progress Checkbox displays recording and training times as a progress bar.
If the Sound Checkbox is selected, the signs are read aloud using the operating system’s text-to-speech feature, enabling the user’s phone to convert sign language movements into speech.
The Graph Checkbox allows for visualization of device data in chart form.
Additionally, the Myo armband’s built-in gesture classification can be used to control the application (e.g., putting the device to sleep, waking it up, starting and stopping).
Figure 5 shows the five different hand gestures that the device classifies in hardware and sends over the BLE service. The application can utilize these gestures to generate a total of ten different commands from the two devices. For instance, when the user makes a ‘Fist’ gesture with the right arm, the 3D arms move to the reference position. The logs can also be displayed by clicking the three-line icon in the lower right corner of the main graphical interface; they appear when the ‘Wave In’ gesture is made with the right arm and disappear when the same gesture is made again.
When the ‘Fingers Spread’ gesture is made with the right arm, the BLE characteristics of the devices are subscribed, meaning that data begins to be received; in other words, the program starts. When the gesture is repeated, the program stops. Many such commands can be performed with the wristbands without touching the phone, even while it is in the user’s pocket.
2.2.3. Records User Interface
This interface was designed to collect data for training artificial intelligence.
Figure 6 displays a list box on the left showing the words and the number of records associated with each word. To create a new record, the user selects the word and clicks the ‘New Rec’ button. Recording begins when the motion starts and continues until it ends. When the recording is complete, the word’s record count increases by one. If the Auto Checkbox is selected, the list box automatically switches to the next word, and when the next movement starts, the data are saved for that word. Upon reaching the last word in the list, it moves back to the first word and continues until the ‘Stop Rec’ button is clicked. This interface also displays the device data graphically during recording. Additionally, if a GIF file with the same name as the word exists, the movement video is displayed. The list box at the top displays the record numbers of the selected word in the database. Selecting a record number displays that record’s data graphically, allowing the identification of any incorrect records. To remove an unwanted record, double-click (double-tap on the phone) the record number and select ‘yes’ in the warning message. The interface displays the graph of record number 244 for the word ‘correct’.
2.2.4. Train User Interface
Training uses the data obtained after extracting features from the raw IMU and EMG data. The algorithm and parameters selected in this interface are used to train on the existing data, and the training is then tested using the 10-fold cross-validation method. The performance values of the trained algorithm are displayed in Figure 7. The sample interface uses the k-nearest neighbor algorithm with the parameter K = 3.
2.2.5. Dictionary User Interface
This interface allows words or sentences to be added to or removed from the database (see Figure 8). Words can also be uploaded from a txt file for convenience; however, loading from a file deletes all existing records. If the user accepts, a new dictionary is created. Another noteworthy feature of the application is its support for multiple languages. Because the application is personalized, it can create a unique experience by allowing users to choose their own words, create custom records, and train the system using artificial intelligence.
2.3. Data Collection
Determining the Turkish Sign Language signs with which the system would be tested is a crucial aspect of this study. To achieve this, we received support from Hülya AYKUTLU, a licensed sign language trainer and translator with years of experience in the Special Education Department. The testing location is shown in Figure 9.
The system was designed to predict more than 80 words, but 80 words, the maximum number used in similar studies, were chosen to test its performance. These 80 words, frequently used in sign language and daily speech, were selected based on requests from sign language instructors and users; the system also supports additional words. The selected words can be categorized as follows:
Greeting and introduction words (hello, I’m glad, see you, etc.), Turkish (merhaba, sevindim, görüşürüz vb.);
Family Words (mother, father, brother, etc.), Turkish (anne, baba, abi vb.);
Pronouns and Person Signs (I, you, they etc.), Turkish (ben, sen, onlar vb.);
Common Verbs (come, go, take, give, etc.), Turkish (gel, git, al, ver, vb.);
Question Words (what, why), Turkish (ne, neden);
Other Daily Life Words (home, name, good, warm, easy, married, year, etc.), Turkish (ev, isim, iyi, sıcak, kolay, evli, yıl vb.).
The 80-word dictionary was repeated 10 times. During each recording, the IMU sensors of the Myo Armband sample 50 times per second, producing 10 measurements per sample: gyroscope (x, y, z), accelerometer (x, y, z), and orientation (x, y, z, w). Additionally, data from the device’s eight sEMG sensors are measured 200 times per second and stored in memory during recording. At the end of the recording, 1044 features are extracted from the stored data, and both the raw and feature-extracted data are stored in the database. For a sign recording lasting 1 s, a total of 4200 raw data values (across the two armbands) and 1044 feature values are stored. The data were initially segmented using either a fixed time or a fixed reference arm position, but the slow and tiring nature of this approach was not well received by users during application testing. To ensure fast and effective communication, motion detection is therefore employed for data segmentation instead of a fixed time, and the sensitivity can be adjusted in the settings section to determine the most suitable option. Another important aspect of this feature is that hearing-impaired individuals may produce signs at varying speeds depending on their level of excitement and emotional state; as a result, the duration of the same sign may differ. To account for this, ten different recordings were taken for each sign.
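This section does not enumerate the 1044 features; as a hedged illustration only, the sketch below computes a few time-domain features that are common in sEMG work (mean absolute value, root mean square, waveform length, zero crossings) per channel, the kind of per-channel statistics from which such a feature vector could be assembled.

```python
import numpy as np

def channel_features(x) -> list:
    """Common time-domain features for one raw sensor channel (illustrative set)."""
    x = np.asarray(x, dtype=float)
    mav = np.mean(np.abs(x))                 # mean absolute value
    rms = np.sqrt(np.mean(x ** 2))           # root mean square
    wl = np.sum(np.abs(np.diff(x)))          # waveform length
    zc = int(np.sum(x[:-1] * x[1:] < 0))     # zero crossings
    return [mav, rms, wl, zc]

def feature_vector(channels) -> np.ndarray:
    """Concatenate per-channel features for a (n_channels, n_samples) recording."""
    return np.concatenate([channel_features(c) for c in channels])
```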
2.5. Classification
In this study, we used various classification methods to address the research objectives. The Weka Deep Learning (WDL) algorithm was employed to harness the power of deep learning for extracting features and classifying data. The k-nearest neighbor (KNN) method, a non-parametric algorithm, was utilized for pattern recognition based on the proximity of data points in the feature space. We also employed the multi-layer perceptron (MLP), a type of artificial neural network, for its ability to model complex relationships within the data through its layered structure. Naïve Bayes (NB), a probabilistic classifier, was chosen for its simplicity and computational efficiency. The Random Forest (RF) method, an ensemble learning technique, was applied to combine the predictions of numerous decision trees, improving classification performance. Support vector machines (SVM), known for their effectiveness in high-dimensional spaces, were employed to determine the optimal separating hyperplane. Each of these classification methods was selected to make use of its specific strengths in tackling the complexities of the research problem, enabling a comparative study for a thorough analysis.
The study compared the classification performance of the extracted features using various classification algorithms and parameters. All algorithms were tested using the 10-fold cross-validation method. The Weka Application Programming Interface (API), developed specifically for the Windows platform, allows all algorithms available in Weka to be used by converting the data in the database into the ARFF file format [47]. Trainings are saved as a model file named after the algorithm and its parameters; therefore, a previously trained model can be used for prediction without retraining. On the Android platform, only the KNN algorithm is used, due to its classification performance and fast operation [48]. The aim is to incorporate additional algorithms in the future, including the Weka API on the Android platform.
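For illustration, ARFF is a plain-text format; a minimal sketch of exporting feature vectors to it might look as follows (the relation, attribute, and file names are made up for the example).

```python
def write_arff(path, feature_vectors, labels, class_names):
    """Write feature vectors and their sign labels as a Weka ARFF file."""
    n_features = len(feature_vectors[0])
    with open(path, "w", encoding="utf-8") as f:
        f.write("@relation signs\n\n")
        for i in range(n_features):
            f.write(f"@attribute f{i} numeric\n")
        f.write("@attribute class {" + ",".join(class_names) + "}\n\n")
        f.write("@data\n")
        for vec, label in zip(feature_vectors, labels):
            f.write(",".join(str(v) for v in vec) + f",{label}\n")

# e.g. write_arff("signs.arff", X, y, sorted(set(y)))
```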
To add Weka algorithms to the program, edit the ‘Data\weka\algorithms.txt’ file located in the program’s installation folder. Here is an example of the file’s contents:
bayes.NaiveBayes
lazy.IBk -K 1 -W 0 -A
trees.J48 -C 0.25 -M 2
functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 50 -V 0 -S 0 -E 20 -H 50
trees.RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1
bayes.BayesNet -D -Q
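Each line follows Weka’s convention of a classifier name (relative to the `weka.classifiers` package) followed by command-line options. A sketch of how such lines could be split for display, assuming the program prefixes the package name as Weka does:

```python
def parse_algorithm_line(line: str):
    """Split one algorithms.txt line into a Weka classifier name and its options."""
    parts = line.split()
    classname, options = parts[0], parts[1:]
    return f"weka.classifiers.{classname}", options

# parse_algorithm_line("trees.RandomForest -P 100 -I 100")
# -> ("weka.classifiers.trees.RandomForest", ["-P", "100", "-I", "100"])
```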
When the program is executed and the training interface is accessed, each added line is displayed in a combo box, as shown in Figure 10.
In measurements taken over a total of 800 data points (80 signs with 10 repetitions each), the average time for feature extraction from the raw data and classification after the signal ended was 21.2 ms. This time was imperceptible to the users testing the system, demonstrating the system’s ability to perform in real time.
The training results of six different algorithms, selected based on their popularity and classification success, were compared.
Table 3 presents the results of the training conducted using a total of 800 data points, with 80 signs and 10 records for each sign. The 10-fold cross-validation method, which uses all data for both testing and training, was used for evaluation. The algorithms’ default parameters in Weka were used for this comparison, as their performance was already quite high; no alternative parameters were tested.
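The study ran this evaluation in Weka; as a rough equivalent for readers, 10-fold cross-validation of a Random Forest can be expressed in scikit-learn as below. The synthetic stand-in dataset and its dimensions are placeholders, not the study’s data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 800 recordings, 100 features, 8 classes.
X, y = make_classification(n_samples=800, n_features=100,
                           n_informative=20, n_classes=8, random_state=1)
clf = RandomForestClassifier(n_estimators=100, random_state=1)  # 100 trees, as in Weka's -I 100
scores = cross_val_score(clf, X, y, cv=10)  # one accuracy value per fold
print(f"mean accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```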
In another experiment, the training was conducted by splitting the data at different ratios instead of using 10-fold cross-validation. Some of the randomly selected records out of the 10 records for each sign were used for training, while the remaining records were used for testing. The results of these classifications, made using the same algorithms and parameters, are shown in Table 4.
The classification results obtained from the different algorithms indicate that the Random Forest algorithm outperformed the Naïve Bayes algorithm. It is important to note that training with a single record per sign resulted in very low success rates; therefore, it is recommended that each sign be recorded at least three times. Performance increases with the number of repetitions. Despite variations in recording speed, the classification performance remains consistently high.
The variation in performance among different algorithms in a 10-fold cross-validation classification task can be attributed to several factors, such as the nature of the algorithms, their handling of data complexity, and their sensitivity to the specifics of the dataset used. In this section, we will evaluate the performance of the listed algorithms based on three key metrics: accuracy, kappa statistic, and root mean squared error (RMSE).
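For reference, a sketch of how these three metrics can be computed in Python. Weka’s RMSE for classifiers is derived from predicted class probabilities against a one-hot encoding of the true class, which the helper below mirrors under that assumption; the toy arrays are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

def classifier_rmse(proba: np.ndarray, y_true: np.ndarray) -> float:
    """RMSE between predicted class probabilities and a one-hot truth matrix,
    analogous to Weka's root mean squared error for classifiers."""
    onehot = np.zeros_like(proba)
    onehot[np.arange(len(y_true)), y_true] = 1.0
    return float(np.sqrt(np.mean((proba - onehot) ** 2)))

# Toy example: true labels, predicted probabilities, and derived predictions.
y_true = np.array([0, 1, 2, 2])
proba = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.1, 0.3, 0.6]])
y_pred = proba.argmax(axis=1)
print(accuracy_score(y_true, y_pred),      # accuracy
      cohen_kappa_score(y_true, y_pred),   # kappa statistic
      classifier_rmse(proba, y_true))      # RMSE
```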
WDL and RF Performance: Both WDL and RF have demonstrated exceptional accuracy of 99.875%, with identical kappa statistics of 0.9987, indicating almost perfect classification capabilities compared to a random classifier. However, it is worth noting that WDL outperforms RF in terms of RMSE, with an impressively low value of 0.0053, compared to RF’s RMSE of 0.037. The analysis shows that WDL is more consistent in its predictions across the dataset, possibly due to better handling of outlier data or noise within the dataset.
KNN performs moderately well, with an accuracy of 95.5% and a kappa statistic of 0.9542. It has the lowest RMSE among all algorithms at 0.0020, indicating very tight clustering around the true values. KNN is a strong model, despite its lower accuracy compared to WDL and RF. It is important to note that KNN may require parameter tuning, such as the choice of ‘k’, and may be sensitive to noisy data.
MLP exhibits a strong performance with an accuracy of 98% and a kappa statistic of 0.9797, despite its relatively higher RMSE of 0.0201. The higher RMSE, in comparison to its accuracy and kappa, indicates variation in prediction errors, possibly due to the complexity of the model and the need for careful tuning of its layers and neurons.
NB: In contrast, NB demonstrates the lowest performance among all evaluated models, with an accuracy of 87.625%, a kappa statistic of 0.8747, and a relatively high RMSE of 0.0556. NB may encounter difficulties with datasets whose features are not independent, since feature independence is a core assumption of the algorithm.
SVM: SVM handles such correlated-feature datasets more easily and clearly outperforms NB in terms of accuracy and kappa statistic. Although SVM has the highest RMSE of the algorithms at 0.11, its strong performance in the other metrics makes it a reasonable choice. The high RMSE, despite good accuracy and kappa statistic, suggests that SVM’s decision boundary may be less stable or more sensitive to individual data points, possibly due to the choice of kernel or regularization parameters.
WDL and RF outperform the other models in terms of accuracy and kappa statistics, likely due to their robustness to data imperfections and their ability to model complex patterns. WDL is superior in handling outliers or noise compared to RF, as evidenced by its lower RMSE. The other models’ performance is dependent on their intrinsic assumptions and sensitivity to data characteristics. It is imperative to select the appropriate model based on the specific requirements and nature of the dataset.