4.1. Precision of GP-Canary Atlas
Our results show that the GP-Canary Atlas exhibits high intra-rater precision in BA assessments. Agreement for the radiology specialist (Rater 1) was nearly perfect, with ICCs of 0.995 for females and 0.996 for males. The general practitioner (Rater 2) also demonstrated high precision, with ICCs of 0.990 for females and 0.992 for males. The medical student (Rater 3) showed slightly lower precision, with ICCs of 0.921 for females and 0.976 for males.
These findings align with previous research in pediatric populations from Anglo-Saxon countries. Hackman and Black (2012) [37] reported an ICC of 0.969 for Scottish children, and Maggio et al. (2016) [38] found ICCs of 0.970 for males and 0.972 for females in Australia. Similarly, high correlations were reported in Germany and the Netherlands, with Schmidt et al. (2007) [39] finding an ICC of 0.96 for both sexes and Van Rijn et al. (2001) [40] reporting ICCs of 0.979 for males and 0.974 for females. Our results also slightly exceed those reported in Southern European countries. Santos et al. (2011) [41] observed excellent intra-rater agreement in Portugal, with ICCs of 0.99 for both boys and girls, while Pinchi et al. (2014) [42] reported ICCs of 0.907 for males and 0.928 for females in Italy. However, Santoro et al. (2012) [43] found moderate intra-rater concordance in Southern Italy, with ICCs of 0.88 for males and 0.81 for females. Studies in the United States and Sweden have shown lower reliability, with Calfee et al. (2010) [44] reporting an ICC of 0.890 in a Latin American sample, and Kullman (1995) [45] finding ICCs ranging from 0.64 to 0.74 in Swedish teenagers.
Similarly high intra-rater agreement has been observed in African studies. Govender and Goodier (2018) [33] in South Africa reported an ICC of 0.99, and Olaotse et al. (2023) [46] in Botswana found ICCs of 0.97 for males and 0.98 for females. Dembetembe et al. (2012) [47] observed moderate precision (r = 0.76) using the GP Atlas in Cape Town. Comparable agreement has been reported elsewhere, such as in Saudi Arabia, with Albaker et al. (2021) [48] finding ICCs of 0.995 for males and 0.996 for females, and in Malaysia, with Nang et al. (2023) [49] reporting ICCs of 0.947 for males and 0.933 for females.
The excellent intra-rater agreement observed with both the GP-Canary Atlas and the GP Atlas can be attributed to the quick and direct visual comparisons they allow, facilitating efficient BA assessments across various pediatric populations. However, the slight variability observed when applying the GP-Canary Atlas may be due to individual cognitive biases and potential misinterpretations of the Atlas [50,51]. Biases such as anchoring, confirmation bias, experience-based bias, overconfidence, the availability heuristic, and the observer expectancy effect can affect the rater's judgment, resulting in inconsistencies and longer BA assessment times [52,53]. Additionally, errors may occur because of the limited number of maturity indicators available for evaluation, especially in young children. The number of ossification points increases as a child grows, so when fewer points are present, the potential for assessment error is higher.
Our findings also demonstrate a high level of agreement among different raters when using the GP-Canary Atlas for BA determinations. The concordance between the radiology specialist (Rater 1) and the general practitioner (Rater 2) was remarkably high for both females (ICC = 0.982) and males (ICC = 0.944), indicating that both raters consistently produced similar BA assessments.
These results align with previous studies on the degree of agreement between two expert evaluators when determining the BA of children from Anglo-Saxon countries. For instance, Alshamrani et al. (2019) [54] observed high agreement between two raters in a sample of British children aged 8.80 to 9.59 years. In northern Europe, Zabet et al. (2015) [55] identified an excellent level of inter-rater concordance among assessors in France (ICC = 0.94, 95% CI: 0.91-0.96, p < 0.05). Similarly, Calfee et al. (2010) [44] found very high inter-rater reliability (ICC = 0.982) in a study involving children from Washington, United States. Additionally, significant agreement among examiners was reported in Oceania, with a Cohen's kappa of 0.887 (p < 0.001) when the GP Atlas was used to assess BA in Western Australian children [38]. In Africa, inter-rater agreement for the GP Atlas was remarkably similar to that observed here for the GP-Canary Atlas. Olaotse et al. (2023) [46] reported that the degree of agreement between two expert raters in assessing BA in the Palapye region of Botswana reached an ICC of 0.94 for girls and 0.93 for boys.
However, inter-rater reliability declines sharply when the scores assigned by Rater 3 are compared with those of Rater 1 and Rater 2. Agreement drops for both girls (ICC = 0.463 and 0.509, respectively) and boys (ICC = 0.408 and 0.327, respectively). This decrease in concordance among evaluators with varying levels of experience suggests that less experienced raters might interpret the characteristics of the images differently or make errors when applying the scoring criteria of the GP-Canary Atlas. As reported for other radiological diagnostic tests, this lack of precision may be due to limited familiarity with the specific methodology [56,57,58,59]. This highlights the need for more comprehensive training and rigorous standardization of evaluation procedures to ensure that all raters, regardless of experience, apply the criteria consistently and accurately. Such measures are crucial for preventing inconsistent diagnostic decisions in clinical practice.
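To make the agreement statistic discussed in this section concrete, the following is a minimal sketch of how an ICC can be computed from a table of BA scores. The ICC form is an assumption: this section does not state which model was used, so the sketch uses the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name `icc_2_1` and the example scores are hypothetical and are not study data.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is a list of rows, one per radiograph; each row holds one BA
    score per rater (or per reading session, for intra-rater use).
    This ICC form is an assumption, not necessarily the one used in the study.
    """
    n = len(ratings)                      # number of radiographs
    k = len(ratings[0]) if n else 0       # number of raters or sessions
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 radiographs and >= 2 complete scores each")

    grand = _mean(x for row in ratings for x in row)
    row_means = [_mean(row) for row in ratings]
    col_means = [_mean(row[j] for row in ratings) for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical BA scores in months from two raters on five radiographs.
example_scores = [[60, 62], [84, 84], [96, 93], [120, 126], [150, 150]]
example_icc = icc_2_1(example_scores)
```

Because this form penalizes systematic offsets between raters as well as random disagreement, a rater who consistently scores higher or lower than a colleague lowers the ICC even when the two rank the radiographs identically.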
4.2. Accuracy of GP-Canary Atlas
In preschool-aged children, the GP-Canary Atlas underestimates BA in both sexes, showing statistically significant differences (MD = 17.036 mos., p < 0.001). This level of underestimation is considerably greater than that reported in several European studies. For example, Martrille et al. (2023) [60] found a significant underestimation using the GP Atlas in Caucasian children from southern France, with a mean difference (MD) of 1.27 mos. (SD = 1.56 mos., p < 0.05). Similarly, Santoro et al. (2012) [43] reported underestimations in a southern Italian cohort, with MDs of 1.2 mos. for boys (SD = 15.6 mos., p = 0.18) and 4.8 mos. for girls (SD = 12.0 mos., p < 0.001). Kullman (1995) [45] also noted a smaller mean underestimation (MD = 4.8 mos.) in Swedish children.
The GP-Canary Atlas may lack accuracy in assessing BA in preschool-aged children for several reasons. Firstly, during this period, known as "turgor primus", children experience rapid and significant growth influenced by thyroid hormones, leading to high variability in the ossification points that the Atlas evaluates. This variability makes it challenging to capture changes accurately. Secondly, the Atlas may not be well designed to reflect developmental changes that are influenced by genetic and familial factors [61] and therefore do not follow a uniform pattern [62]. Additionally, the presence of children with constitutional delay of growth and puberty (CDGP) could introduce further growth variations that the Atlas may not capture due to insufficient calibration [63]. Finally, errors in the reference PA-HW radiographs may also contribute to the lack of accuracy, suggesting that the Atlas might not be reliable for evaluating BA in preschool children.
For school-aged children, the GP-Canary Atlas also underestimates BA, but less than in preschoolers, with a mean difference (MD) of 8.165 mos. (p < 0.001). The Atlas is more accurate for school-aged girls than boys, largely because the underestimation of BA is substantially smaller for girls (MD = 3.949 mos., p = 0.239). This closer alignment with the Atlas's developmental stages leads to measurements that are within the normal accuracy range established for this age group, which is up to 12 mos. [2].
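The arithmetic behind these accuracy statements can be sketched as follows. This is an illustration only: the sign convention (chronological age minus estimated BA, so that positive values indicate underestimation) is an assumption, as are the function names `mean_difference` and `within_tolerance` and the sample ages, none of which come from the study. The tolerances of 12 and 24 mos. are the values cited from [2] in this section.

```python
def mean_difference(chronological_months, bone_age_months):
    """Mean of (chronological age - bone age) in months.

    Assumed convention: a positive result means the Atlas underestimates BA,
    a negative result means it overestimates BA.
    """
    if not chronological_months or len(chronological_months) != len(bone_age_months):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [ca - ba for ca, ba in zip(chronological_months, bone_age_months)]
    return sum(diffs) / len(diffs)


def within_tolerance(md_months, tolerance_months):
    """True when the magnitude of the mean difference is within the tolerance."""
    return abs(md_months) <= tolerance_months


# Hypothetical ages in months for three children (not study data).
md_example = mean_difference([60, 72, 84], [50, 70, 80])

# Reported values from this section checked against the cited tolerances.
school_girls_ok = within_tolerance(3.949, 12)   # school-aged girls, 12 mos.
teen_boys_ok = within_tolerance(-4.85, 24)      # adolescent boys, 24 mos.
```

Comparing the magnitude of the MD with the tolerance treats under- and overestimation symmetrically, which matches how the ranges are used in this section; a per-child check would be stricter than this group-level one.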
During this age, alternating changes in bone maturation occur, such as increases in length ("proceritas prima" and "proceritas secunda") and in weight ("turgor secundus"). In girls, these changes may be accelerated by early puberty [64,65], which is often associated with lifestyle factors, exposure to endocrine disruptors, or genetic determinants [66,67]. Obesity is a significant factor contributing to early puberty, particularly among Hispanic girls [68,69,70]. The girls in our sample have a high body mass index (BMI = 22.76, SD = 5.49), indicating obesity, which likely leads to an earlier onset of puberty and adolescence. In this phase, bone maturation becomes more regular and standardized than in preschool children, and girls reach developmental stages earlier than boys. This reduces individual differences in bone growth patterns, making them more consistent and predictable, thereby allowing the GP-Canary Atlas to provide more accurate assessments of BA in girls than in boys.
With respect to teenagers, the accuracy of the GP-Canary Atlas increases as children mature. However, the Atlas slightly overestimates BA, with a mean difference (MD) of 3.159 mos. overall (p = 0.823), 4.497 mos. for girls (p = 0.980), and 4.85 mos. for boys (p = 0.095), though these overestimations are not statistically significant. This trend is consistent with studies in geographically similar regions, such as Portugal and Spain, where the GP Atlas also showed progressive overestimation of BA. For instance, Santos et al. [41] reported an increasing MD from 2 to 7 mos. in Portuguese adolescents, while comparisons with the Spanish-adapted Ebrí method showed overestimations from 5 to 6.5 mos. (both p < 0.05) [71].
Other studies across Europe, including in Lower Saxony, Germany (Schmidt et al., 2007) [39], and the Loire Valley, France (Zabet et al., 2015) [55], demonstrated similar overestimations of BA in teenagers when the GP Atlas was used, with MDs ranging from 2.29 to 5.8 mos. (all p < 0.05). In Anglo-Saxon countries, Hackman and Black [37] found BA overestimations from 1.62 to 11.05 mos. in adolescents aged 13 to 14 years in the northern UK (p < 0.05), while Paxton et al. [34] observed an underestimation of 0.81 mos. in early childhood (p = 0.719) but a significant overestimation of 3.8 mos. in adolescence (p = 0.001) in Caucasian Australian children. Similar trends have been reported in Middle Eastern countries. For example, Soudack et al. (2012) [72] found significant underestimations in Israeli Caucasian children across various age groups: 6-10 years (MD = 2.3 mos., p < 0.0001), 10-15 years (MD = 5.4 mos., p < 0.0001), and 15-18 years (MD = 3.7 mos., p < 0.0001). However, a slight overestimation was noted in those over 18 years (MD = 2.9 mos., p = 0.0043). Similarly, Cantekin et al. (2012) [73] reported comparable results in Turkish Caucasian children, with underestimations of 1.32 to 5.76 mos. (p < 0.05) in the 7-10 years age group and overestimations of up to 9 mos. (p < 0.05) in the 10-17 years age group.
Furthermore, from puberty onwards, the GP-Canary Atlas appears more accurate than the GP Atlas has proved in children from the nearby African continent. However, due to limited data, direct comparisons with North African regions influenced by the Berber ethnic group, the ancestors of the Canary Islands' Guanches, are not possible. In other African countries, Tsehay et al. (2017) [74] reported that the GP Atlas overestimated BA in children aged 10 to 22 years, with a mean difference (MD) of 8.7 mos. for males and 11.8 mos. for females (both p < 0.05). Similarly, Olaotse et al. (2023) [46] found overestimations in Botswana ranging from 3 mos. in early bone development to 11.2 mos. for adolescents aged 15 to 18 years (p < 0.05). Kowo-Nyakoko et al. (2023) [75] also reported that the GP Atlas overestimates BA by approximately 9.12 mos. in peripubertal children in Zimbabwe.
The GP-Canary Atlas shows increased accuracy in predicting BA for both boys and girls during the final phase of development, known as "turgor tertius," which is characterized by rapid growth and hormonal changes driven by sex steroids. This is followed by the "post-pubertal period" or "internubil-puberal of Godin", marked by the closure of the epiphyseal growth plates, indicating the end of bone growth and the attainment of full skeletal maturity. In this phase, the differences in development between boys and girls decrease, leading to more synchronized and predictable maturation patterns. The GP-Canary Atlas captures this synchronization, as evidenced by the mean differences in BA for females (MD = -4.497 mos.) and males (MD = -4.85 mos.). These consistent changes allow the Atlas to predict BA accurately within the normal range of up to 24 mos. [2], making it a reliable tool for assessing BA in adolescents of both sexes.