1. Handwriting: acquisition and role
Handwriting, considered as language by hand [1], is a complex perceptual-motor task involving attentional, perceptual, linguistic, and fine motor skills. Handwriting occupies a large proportion of children’s daily activities at school [
2,
3] and is the basis, together with reading, for the acquisition of higher-level skills such as spelling, grammar, syntax, and text composition. A relationship between the mastery of handwriting movement and the quality of writing content has been established, both at the semantic level in text production [
4] and at the orthographic level, in word formation [
5]. If children pay too much attention to handwriting movements, they may have difficulties in the allocation of cognitive resources to higher-level processes [
6].
From a developmental perspective, handwriting originates from drawing, from which it slowly differentiates as the child grows. In younger children, the quality of drawings is correlated with the quality of handwriting [
7]. Then, with the acquisition of handwriting, this relationship between drawing and writing quality decreases and eventually disappears [
8]. The formal acquisition of handwriting begins around the age of 5 at preschool, and its mastery requires about 10 years of practice and training. The automation of handwriting is partial at the age of 10 (5th-grade) and is considered almost complete around the age of 14 (9th-grade) (for a review, see [
9]). During acquisition, handwriting evolves first in terms of quality (primarily between 1st- and 5th-grade), then in terms of speed (essentially from 4th-grade onwards). Efficient, fully automated handwriting relies on a balance between speed and quality: it should be fast enough to allow note-taking during a lesson or the transcription of ideas, and of sufficient quality to be readable by the writer and by others.
2. Handwriting deficits
Despite correct learning and practice of handwriting, some children never master this skill to a sufficient level of automation (reviewed in [
10,
11,
12]). These handwriting deficits, referred to as developmental dysgraphia in children, have been defined as a written-language disorder that concerns mechanical writing skills in children of average intelligence and with no distinct neurological or perceptual-motor deficits [
13]. Currently, dysgraphia is not recognised as a disorder per se by the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5) [
14] or the International Classification of Diseases 11th edition (ICD-11). The DSM-5 only mentions « deficits in the fine motricity required for handwriting », in the chapter dedicated to the development and evolution of learning disorders. Due to the diversity of methodological approaches and the absence of a consensual definition, the exact prevalence of dysgraphia is not known, and probably differs between countries and writing systems.
Dysgraphia is generally found in association with neurodevelopmental disorders, namely dyslexia (DL), Developmental Coordination Disorder (DCD) and Attention-Deficit/Hyperactivity Disorder (ADHD) [
15,
16,
17,
18,
19,
20]. Dysgraphia preferentially affects boys (3:1 ratio), most likely because of the prevalence of the associated disorders in boys [
10,
21]. Many studies have shown differences in handwriting deficits depending on the associated disorder [
22,
23,
24,
25,
26,
27,
28]. DCD primarily affects handwriting quality [
24,
29,
30] while DL affects both speed and, to a lesser extent, handwriting quality [
28,
31]. Children with comorbid DL and DCD present nearly the same profile of difficulties as children with DL, although with a much higher within-group variability. Comorbidity seems to lead to the addition of DCD and DL writing difficulties but without aggravation of the deficits in each of the two dimensions [
26].
Given the central role of handwriting in learning, these deficits can seriously hamper the acquisition of other skills [
32,
33,
34]. It has been shown that, at equal content, lower grades are given to less legible schoolwork [
35], which can lower the child’s self-esteem. Dysgraphia may thus impact the academic success of the child if it is not diagnosed and managed early [
36,
37]. To this end, different tools are available to allow researchers and clinicians to analyze the two dimensions of handwriting: the final product of handwriting and the dynamic process of handwriting that generates the trace [
38].
Evaluating the handwriting product refers to the static, spatial features of the written trace. This kind of analysis is performed after the writing task is completed. This is the principle of many tests used in different countries (for a review see [
10]). The quality of the trace is evaluated based on different features such as letter size and form, spatial organization of handwriting on the paper sheet, margin, etc.
Evaluating the handwriting process refers to the analysis of dynamic, kinematic and temporal features of handwriting. Several types of variables can be analyzed, depending on the tools used for the evaluation: posture, finger and arm movements, pen grip and finger pressure on the pen, in-air and on-paper durations, pen velocity, pen pressure, etc. The increasing number of publications on the analysis of the handwriting process over the past years attests to the growing interest of researchers in this field (
e.g., [
39,
40,
41,
42]).
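Two of the temporal variables named above, in-air and on-paper durations, can be illustrated with a minimal sketch. It assumes that the recording device delivers time-stamped pressure samples and that a pressure at or below a threshold means the pen is lifted; the function name `air_and_paper_durations`, the threshold, and the sample values are hypothetical and not taken from any of the cited studies.

```python
from typing import Sequence, Tuple

PenSample = Tuple[float, float]  # (time in s, pen pressure in arbitrary units)


def air_and_paper_durations(
    samples: Sequence[PenSample], threshold: float = 0.0
) -> Tuple[float, float]:
    """Return (in_air_seconds, on_paper_seconds).

    Each interval between two consecutive samples is attributed to the
    state (pen down or pen up) of the sample that opens it.
    """
    in_air = 0.0
    on_paper = 0.0
    for (t0, p0), (t1, _) in zip(samples, samples[1:]):
        if p0 > threshold:
            on_paper += t1 - t0  # pen was touching the surface
        else:
            in_air += t1 - t0    # pen was lifted
    return in_air, on_paper


# Invented recording: two strokes separated by a short pen lift.
recording = [(0.00, 0.4), (0.01, 0.5), (0.02, 0.0), (0.03, 0.0), (0.04, 0.3), (0.05, 0.3)]
in_air, on_paper = air_and_paper_durations(recording)
print(f"in air: {in_air:.2f} s, on paper: {on_paper:.2f} s")
```

Attributing each interval to the sample that opens it is only one possible convention; a real analysis would also need to handle sensor noise around the threshold.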
The objective of this quasi-systematic review is to provide a concise list of the tools and methods most frequently reported in the literature for the analysis of handwriting and the diagnosis of dysgraphia. Tools focusing on both the final handwriting product and the handwriting process will be considered. We will then discuss the pros and cons of the existing tools, and the perspectives for the development of future tools.
3. Handwriting tools based on the product
In order to list the diagnostic tools based on the analysis of the handwriting product, we searched the two scientific search engines PubMed and Google Scholar using the following keywords: Handwriting, Assessment, Test, Tool, Quality, Evaluation, Battery, Children, Students, and Questionnaire.
The tools meeting our search criteria are listed in
Table 1. We included only tools for which the following data were available: norms or age class, type of task, subdomain analyzed, criteria evaluated.
Although mainly designed for a developmental population (from the age of 5), some diagnostic tools can also be used in adults up to the age of 80 (QNST-3; [
57]). The test duration is variable, from a few minutes up to 30 minutes. This parameter is interesting because deficits may not be visible during the first few minutes of handwriting, but may appear during a continuous handwriting task, as is the case in the classroom. The tasks used in the tests are of three main types: copying a text or a sentence, writing under dictation (letters, digits, words, or text), and spontaneous writing. These complementary tasks explore different aspects of handwriting. The copy task is the easiest, and can be used with beginner writers. Moreover, it resembles classroom conditions, where children are often asked to copy texts. However, the reading component can pose problems for children with dyslexia, introducing a possible bias in the interpretation of the test results. The dictation task is also ecologically valid and has no reading component, but the spelling processes and the orthographic components may again pose problems for children with dyslexia. Finally, the spontaneous writing task is likely to be the most relevant task. The difficulty here is the establishment of norms, since the texts produced are all unique. General criteria of legibility and quality are thus used in this case, which may provide a less fine-grained analysis of handwriting.
It should be noted that one test includes an analysis of texts produced at school, the TOLH (Test Of Legible Handwriting, [
59]). Two others include writing from memory: the ETCH-M (Evaluation Tool of Children’s Handwriting – Manuscript, [
50]) and the MMHAP (Mac Master Handwriting Assessment Protocol, [
54]). Two tests also add another level of analysis, thanks to two conditions in the copy task: normal speed and maximum speed (the DASH, [
48]). This approach is particularly interesting since it mimics certain classroom conditions, and it is well known that adding constraints (temporal or spatial) during handwriting helps reveal handwriting deficits [
63,
64]. Combining different tasks and/or conditions can provide a fine and detailed analysis of handwriting. It is worth noting that although these tasks are complementary, only three tests involve all three types: the BVSCO-3 [
45], the ETCH-M [
50], and the MMHAP [
54].
The majority of the tests listed in
Table 1 analyze handwriting quality using different criteria such as legibility, letter form, spatial organization of letters or words, alignment, etc. Some tests also measure handwriting speed by evaluating the number of characters or letters (BHK, [
43]; French adaptation, [
65]; BHK-ado, [
44]; BVSCO-3, [
45]; CHES-M, [
47]; ETCH-M, [
50]; EVEDP, [
51]; MMHAP, [
54]; MHA, [
55,
56]) or the number of words (DASH, [
48]; EVEDP, [
51]) produced in a fixed period of time. Since a universal, gold standard test for the diagnosis of dysgraphia is not available, it is sometimes necessary to combine several tests to perform an optimal clinical assessment. The DASH test appears to be the most complete one, since it includes various types of tasks and different writing constraints, and requires about 15 minutes of writing. Its weakness is that it only evaluates handwriting speed.
Finally, we also mention in
Table 1 a couple of questionnaires, which can be interesting to use in complement to the other tests (HPSQ, [
61]; « questionnaire for children », [
62]). Indeed, these questionnaires provide information about the evaluation of handwriting quality by the teacher or the child, which can be useful when planning a rehabilitation program.
Another important point to consider when choosing which test to use is the existence of standards. Table 2 presents the psychometric properties of the main tests used both in research and in clinical practice. A number of tests have relatively high inter-rater and test-retest reliabilities (the French adaptation of the BHK for example, [
65]), while others reach high validity-related standards (the MHA, [
55], and the TOLH [
59] for example).
More recently, a few computerized diagnostic tools based on the analysis of the final product of handwriting have also been developed. They are listed in
Table 3.
These algorithms are all based on pattern recognition methods using images of letters, digits, words or sentences. They use a large database of images, from which characteristic features of « poor writing » are extracted and analyzed using machine learning approaches. The performance of computer tools is evaluated using a series of criteria. The precision, also called positive predictive value, is defined as the number of correct classifications of dysgraphic children divided by the total number of children classified as dysgraphic. The sensitivity represents the true-positive detection rate (correct classification of children with dysgraphia). The specificity represents the true-negative detection rate (correct classification of typically developing children).
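These three criteria can be expressed in terms of confusion-matrix counts. The following is a minimal sketch, not taken from any of the reviewed tools: the names `Metrics` and `classification_metrics` and the counts in the usage example are hypothetical. Precision is computed in its standard form, that is, true positives divided by all children classified as dysgraphic.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float    # TP / (TP + FP), also called positive predictive value
    sensitivity: float  # TP / (TP + FN), true-positive rate
    specificity: float  # TN / (TN + FP), true-negative rate


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Compute the three criteria from confusion-matrix counts.

    tp: children with dysgraphia correctly classified as dysgraphic
    fp: typically developing children wrongly classified as dysgraphic
    tn: typically developing children correctly classified as such
    fn: children with dysgraphia missed by the tool
    """
    if tp + fp == 0 or tp + fn == 0 or tn + fp == 0:
        raise ValueError("each ratio needs a non-zero denominator")
    return Metrics(
        precision=tp / (tp + fp),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


# Hypothetical counts, for illustration only (not data from any study).
m = classification_metrics(tp=40, fp=5, tn=95, fn=10)
print(f"precision={m.precision:.3f} "
      f"sensitivity={m.sensitivity:.3f} specificity={m.specificity:.3f}")
```

Reporting sensitivity and specificity together matters for screening: a tool can reach a high specificity while still missing a substantial share of children with dysgraphia.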
As shown in
Table 3, the performance of these classification tools is below that of the paper-and-pen tools listed above (73% for [
69], 79.7% for [
70]). The only exception is TestGraphia, the algorithm developed by Dimauro
et al. [
68], with good performance very close to that of the original BHK test. It analyses the same criteria as the original BHK test [
66], but using scanned images of the BHK texts. The sensitivity of TestGraphia is 83%, and its specificity is 98%. This algorithm thus seems very promising for the future development of computerized diagnostic tools.
4. Handwriting tools based on the process
Collecting the spatio-temporal characteristics of a written trace has become possible thanks to the development of digital tablets. The principle is simple: the tablet records the x, y and sometimes z (up to 2 cm) positions of the pen with a high frequency (every 5 or 10 milliseconds), as well as the time, the pen pressure, and the angle of the pen with the tablet. From these data, a large variety of static (size, alignment…), kinematic (speed, acceleration, jerk…) and dynamic (pen pressure, pen tilt…) features can be calculated. To avoid the undesirable effects of loss of surface roughness (e.g., [
40]), a sheet of paper must be attached to the digital tablet and an ink pen compatible with the tablet must be used.
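The derivation of kinematic features from the sampled pen positions can be sketched as follows. This is a simplified illustration under stated assumptions, not the processing pipeline of any tool reviewed here: samples are assumed to be (time, x, y) triples in seconds and millimetres, derivatives are approximated by plain forward differences without any smoothing, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (time in s, x in mm, y in mm)


def finite_difference(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Forward difference of `values` with respect to `times` (one element shorter)."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


def kinematic_features(samples: Sequence[Sample]) -> Dict[str, float]:
    """Mean tangential speed, plus peak absolute acceleration and jerk."""
    if len(samples) < 4:
        raise ValueError("need at least 4 samples to reach the third derivative")
    t = [s[0] for s in samples]
    # Tangential speed between consecutive samples (mm/s).
    speed = [
        math.hypot(samples[i + 1][1] - samples[i][1],
                   samples[i + 1][2] - samples[i][2]) / (t[i + 1] - t[i])
        for i in range(len(samples) - 1)
    ]
    # Each derivative is attached to the start time of its interval.
    accel = finite_difference(speed, t[:-1])   # mm/s^2
    jerk = finite_difference(accel, t[:-2])    # mm/s^3
    return {
        "mean_speed": sum(speed) / len(speed),
        "max_abs_accel": max(abs(a) for a in accel),
        "max_abs_jerk": max(abs(j) for j in jerk),
    }


# Hypothetical trace: a straight line drawn at 50 mm/s, sampled every 10 ms.
trace = [(i * 0.01, 0.5 * i, 0.0) for i in range(6)]
print(kinematic_features(trace))
```

Because differentiation amplifies measurement noise, acceleration and jerk computed on real recordings are usually obtained after low-pass filtering the position signal; that step is omitted here for brevity.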
Over the last decades, a growing number of studies have focused on the development of tools for the diagnosis of dysgraphia using digital tablets. In this review, we present a non-exhaustive overview of these tools, which are not yet available to clinicians (
Table 4).
The different digital tools for the diagnosis of dysgraphia presented in
Table 4 combine dynamic, kinematic, and static features extracted from the handwritten traces. These features are then analyzed mainly using
machine learning approaches to classify the data (i.e. classifiers). These tools differ by the nature of the tasks analyzed (handwriting or graphomotor tasks), the size of the dataset, and the computational approach used to analyze the data.
Of the 22 studies reported here, four use graphomotor tasks; the others use handwriting alone or a combination of handwriting and drawings. It is interesting to mention that several studies use tasks that have been validated in clinical practice, such as the BHK [
39,
71,
74,
80], the BVSCO2 [
76], or the Minnesota Handwriting Assessment (MHA, [
78]).
The size of the dataset used varies between 35 and 580 participants, and the children included in the different studies are between 5 and 15 years of age.
Nine studies use classical statistical comparisons to identify discriminative features between groups (in blue in
Table 4; [
39,
75,
76,
78,
79,
80,
81,
86,
89]). The others (in black in
Table 4) use different machine learning algorithms (Random Forest, Support Vector Machine, Convolutional Neural Network, etc.) to classify the children into different groups. These methods are called « supervised learning approaches » since the algorithm is trained to identify groups which were previously labeled. Most of the studies reported here present a simplistic classification of children in two groups: with or without dysgraphia. Only one study classifies the children into four groups: typically developing, with mild dysgraphia, with moderate dysgraphia, and with severe dysgraphia [
87]. This approach is interesting since it considers dysgraphia as a continuum of severity. This is probably closer to reality than a dichotomous classification, as recently suggested by Lopez & Vaivre-Douret [
90], who described 3 levels of handwriting disorders in children from 1st- to 5th-grade: mild disorder, moderate disorder, and dysgraphia.
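The principle of supervised learning described above (training on previously labeled groups, then assigning new samples to one of them) can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid rule, which is not one of the algorithms used in the reviewed studies and is chosen only because it fits in a few lines. The two features, their values, and the labels are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def fit_centroids(
    features: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Training step: average the feature vectors of each labeled group."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids: Dict[str, List[float]], vector: Sequence[float]) -> str:
    """Assign a new sample to the group whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Invented training data: [mean speed (mm/s), mean letter height (mm)].
X = [[32.0, 3.1], [30.0, 3.4], [18.0, 4.6], [16.0, 5.0]]
y = ["typical", "typical", "dysgraphia", "dysgraphia"]

centroids = fit_centroids(X, y)
print(predict(centroids, [29.0, 3.3]))
print(predict(centroids, [17.5, 4.9]))
```

Nothing in this rule is specific to two groups: passing four labels (for example typically developing, mild, moderate, and severe) would yield four centroids, in line with the severity-based classification discussed above. In practice, features measured on different scales would need to be normalized before distances are compared.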
The tools based on the analysis of handwriting samples obtained the best classification performance. For example, Asselborn
et al. [
71] reached a sensitivity of 96.6% and a specificity of 99%, and Mekyska
et al. [
84] reached a sensitivity of 96%. It is worth noting, however, that the excellent performances obtained in [
71] must be considered with caution since they may be biased by the fact that the authors only included participants with severe dysgraphia [
91]. The most discriminative features between children with and without dysgraphia vary among the studies, but generally include a larger size in dysgraphic handwriting, numerous velocity variations, a lower mean speed, increased lift and stop duration, and variations in the pen angle with the tablet.
The tools based on the analysis of drawing samples appear promising too, although their performances are slightly lower than those based on handwriting. For instance, the algorithm developed by Mekyska
et al. [
85] obtained a sensitivity of 90%. The fact that dysgraphia can be identified based on graphomotor tasks suggests that dysgraphia can be independent from higher order processes, namely linguistic ones. Developing diagnostic tools based on drawings is interesting for two reasons: these tools would be more universal since they are independent of the language and the alphabet, and they can be used with younger children to identify « at-risk » children, who could then be supported earlier.
Developing a computer tool for the diagnosis of dysgraphia is not trivial, as attested by the variability in the performance of the tools presented in
Table 4. Several reasons can explain these differences. First, the variety of the tasks used and the number of participants lead to large differences in the size of the databases, which is a critical determinant of a classifier’s performance. Second, a wide range of
machine learning approaches was used, with a different number of features analyzed among studies. Although certain classification methods appear better than others (Random Forest for example), none has reached excellent performance to date. Since the interest of researchers in these tools is growing, their efficiency is likely to improve rapidly. To do so, however, a number of key elements will be important to consider. First, it will require the constitution of large databases of handwriting and drawing samples from children who are fully characterized from a clinical point of view. It will also be necessary to estimate the severity of dysgraphia, and not only provide a dichotomous classification of children with or without dysgraphia, as proposed by Sihwi
et al. [
87]. Moreover, other processes involved in handwriting, such as visuomotor aspects, which are currently investigated [
92], would be interesting to include in future diagnostic tools. Finally, it is also worth noting that diagnostic tools fully integrated into the pen and using
machine learning approaches are also under investigation [
93,
94,
95].
5. Perspectives: Towards a universal, standardized test of dysgraphia?
Although very promising, none of the computer and paper-and-pen tools presented above is fully satisfactory and sufficient to provide a completely reliable diagnosis of dysgraphia. In addition, most of the tools available to clinicians today do not give precise information about the specific handwriting difficulties of each child.
In this context, it appears interesting to think about developing a reliable, comprehensive and universal diagnostic tool for dysgraphia. Several important points need to be considered for the development of such an instrument. First, an “ideal” diagnostic tool will probably combine computer and paper-and-pen approaches, since they are complementary and provide distinct information on the writing process and product, respectively. A fully computerized tool could also be envisaged, provided that it is complemented by the assessment of the clinician, who must remain the reference assessor. Indeed, the widespread availability of tablets and the rapidity of computerized analyses could allow the collection of written samples at school or at home, which could then be sent to a clinician. Standard pen-and-paper tools could subsequently be used to confirm the diagnosis in case the computer tools detected a risk for dysgraphia in the child’s handwritten productions. In this perspective, the goal of the computer tools is thus not to replace the clinician and the existing, validated tests, but to help in screening larger populations of children and in facilitating the clinician’s diagnosis (
Figure 1). In addition, these tools provide valuable information on the handwriting process itself, by identifying dynamic or kinematic features which may be altered in each particular child. This information would be very relevant for the clinician, since it would offer cues for an individualized rehabilitation of handwriting.
Second, using a combination of tasks targeting different skills seems crucial to provide more information about handwriting difficulties. Indeed, some children with dysgraphia may succeed in certain tasks and thus go undiagnosed if only a single one is used. Combining different tasks in a single test would thus greatly increase its efficacy, as previously suggested by Safarova
et al. [
96]. Specifically, the test should include spontaneous handwriting, copying of words and/or sentences, writing to dictation, digit writing, writing under speed and accuracy constraints, and drawing and/or graphomotor tasks. Temporal (i.e. speed) or spatial (i.e. size) constraints add a cognitive load and are known to increase handwriting difficulties [
18,
63,
64]. With regard to the spontaneous production task, we could, for example, ask the child to write a 7-sentence text corresponding to the writer’s ideal weekly schedule. This would enable a specific analysis to be made of the days of the week, which would be common to all texts produced. As mentioned above, the addition of graphomotor and/or drawing tasks, which are language-independent, would make it possible to target younger children than with the existing tests and thus to detect and support children “at-risk” of dysgraphia earlier. It would also provide a universal test, allowing comparisons between countries and alphabetic systems. In addition, the test needs to last at least 20 minutes in order to enhance the difficulty of the task and induce fatigue. Finally, complementing the test with a self-report questionnaire will enable the clinician to better characterize the difficulties experienced by the writer.
Third, the choice of the cohort of participants will be crucial. A large developmental window ranging from 5 to at least 15 years old should be included, and the content of the test should be adapted depending on the age and/or class of the child, and the level of handwriting automation. The number of participants should be large enough to allow machine learning approaches. It would also be important to include children presenting with dysgraphia in various clinical contexts and precisely characterized from a clinical perspective. This would make it possible to evaluate the severity of dysgraphia, which could eventually be an additional evaluation criterion provided by the diagnostic tool. Finally, participants should be recruited in multiple sites representative of different socio-economic and educational statuses.
Developing such a complete diagnostic tool requires collecting large databases of handwriting and drawing samples in different places around the world. This would be possible through the involvement of a consortium of laboratories and clinicians. Besides the diagnostic tool itself, the benefits of these developments would be twofold: (i) from a clinical perspective, it would make it possible to estimate the prevalence of dysgraphia in different countries and to further tailor rehabilitation programs to the characteristics of handwriting difficulties, and (ii) from a research perspective, it would provide large annotated databases that could be freely available to researchers working in the field of graphonomics, whether in educational, clinical, or human movement sciences.