Participants
We tested 26 children with ASD and 19 NT children (see Table 1 for a summary of participant profiles). All eligibility criteria were determined through a literature review and pilot testing. Details of the recruitment criteria are summarized in Table 2. Recruitment and research procedures were approved by the Institutional Review Board of the Shanghai Mental Health Center in accordance with the Declaration of Helsinki. Fully informed parents or guardians of the young children were required to provide written consent. Financial compensation was provided for participation.
The two groups of children were matched for sex composition, mean chronological age, attention, and handedness (all right-handed). Most participants in the ASD group were outpatients from the Child and Adolescent Psychiatry of Shanghai Mental Health Center, and some were students or members from partner institutions. Participants from partner institutions had been diagnosed at Grade III Level A hospitals between 3 and 7 years of age. To investigate SPIN abilities and the effects of HAT use among children with autism before auditory and cognitive stabilization, we recruited participants aged between 6 and 12 years. Here, auditory and cognitive stabilization refers to having adult-like auditory skills and cognitive abilities and maintaining generally consistent auditory and cognitive processing as a child grows older. According to previous behavioral and neurological studies (e.g., Edgar et al., 2020; Nettelbeck & Burns, 2010), children or adolescents older than 12 years typically exhibit relatively stable and adult-like performance in both cognitive and auditory processing tasks. Because dialectal influence could be an important factor, we ensured that language background was comparable across participants and between the two groups: most participant families were from Shanghai and some were from neighboring cities, speaking variants of the Wu dialect that are similar in phonological features and largely mutually intelligible.
All participants with ASD were further confirmed to meet the latest diagnostic criteria, those of the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5; American Psychiatric Association, 2013), by two veteran pediatric neurologists at the Child and Adolescent Psychiatry of Shanghai Mental Health Center who are unrelated to the present study. The diagnostic procedure could not use the standardized diagnostic instruments widely applied in English-speaking populations, e.g., the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2000; Lord et al., 2012; Lord et al., 1989) and the Autism Diagnostic Interview–Revised (ADI-R; Lord et al., 1994), because of a series of practical issues, including the absence of an officially validated Chinese version, limited application, a lack of normative data, and a small number of certified examiners (Huang et al., 2012; Sun, Allison, Auyeung, et al., 2013; Sun, Allison, Matthews, et al., 2013). Instead, the Chinese versions of the Autism Behavior Checklist (ABC; Krug et al., 1980) and the Childhood Autism Rating Scale (CARS; Schopler et al., 1980) were jointly adopted to confirm the diagnosis and assess the severity level. In mainland China, both the ABC and the CARS have long been validated and widely adopted for diagnosing ASD in both research and clinical practice (e.g., Li et al., 2005; Li et al., 2018; Lu et al., 2004; Shan et al., 2021; Yang et al., 1993). Parents or relatives who had lived with the child for at least three years filled out the ABC under the guidance of two licensed psychologists on the research team. As suggested by Krug et al. (1980), total scores above 68 points indicate a high probability of autism; all ASD participants scored higher than 68.
Meanwhile, the two pediatric neurologists, who are trained and qualified CARS examiners, completed the CARS evaluation based on behavioral observations of the children during other testing, inquiries into their medical and/or developmental history, and reports from their parents and/or relatives. According to the CARS manual, a score at or above 30 strongly indicates the presence of ASD; more specifically, a score from 30 to 37 suggests mild to moderate symptoms, whereas a score from 38 to 60 suggests severe autism. In the current study, only one participant (Number 6) was characterized by severe symptoms, with a total score of 40, while the rest had scores between 30 and 36.
All participants had an IQ above 70 and could repeat simple sentences. IQ was assessed by two certified examiners using the short form of the age-appropriate Wechsler Intelligence Scale for Children, Fourth Edition, Chinese version (WISC-IV-Chinese; Zhang, 2009), which is commonly used in clinical practice among children with neurodevelopmental conditions owing to its implementation efficiency and estimation accuracy (Hrabok et al., 2014). Notably, IQ differed significantly between the two groups and was therefore included as a covariate in the statistical analyses.
Two subscales (inattention and hyperactivity/impulsivity) of the parent-reported Chinese version of the Swanson, Nolan, and Pelham version IV scale (SNAP-IV; Gau et al., 2009) were used to screen for behavioral symptoms of attention-deficit/hyperactivity disorder (ADHD). Each subscale includes nine items rated on a 4-point Likert scale (0 = never, 1 = occasionally, 2 = often, and 3 = very often). Subscale scores below 13 indicate no clinically significant attention issues. The two groups did not differ significantly on these scores, and neither group showed clinically significant ADHD symptoms.
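As an illustration of the screening rule above, the following minimal Python sketch sums nine item ratings for one scale section and flags totals of 13 or more. The function name `snap_iv_subscale` and its interface are hypothetical and were not part of the study's procedure; only the item count, rating range, and cut-off come from the description above.

```python
def snap_iv_subscale(ratings, cutoff=13):
    """Sum one SNAP-IV section (nine items, each rated 0-3).

    Returns (total, flagged); flagged is True when the total reaches the
    cut-off, i.e. when it is NOT "below 13" as described in the text.
    The function name and interface are illustrative assumptions.
    """
    if len(ratings) != 9:
        raise ValueError("each section has exactly nine items")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("ratings must be integers from 0 to 3")
    total = sum(ratings)
    return total, total >= cutoff


# Hypothetical parent ratings for the inattention items of one child.
total, flagged = snap_iv_subscale([1, 1, 0, 2, 1, 1, 0, 1, 1])
print(total, flagged)
```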
Pure-tone audiometry was conducted in a separate sound booth with a noise level below 30 dBA, under the guidance of a certified audiologist, to confirm that children's hearing thresholds were ≤20 dB HL from 0.25 to 8 kHz bilaterally. Testing used an Inventis Bell Plus diagnostic audiometer calibrated following ANSI S3.6 standards (American National Standards Institute, 1996). No participant had a medical history of cerebral injuries, visual impairments, hearing loss, comorbid diseases, or psychotropic medication use within three months before participation. Comorbid diseases here refer to co-occurring medical conditions restricted to neurological disorders, such as severe somatic complications, Fragile X syndrome, Down syndrome, and epilepsy, whose symptoms are physically apparent and distinct from ASD and would potentially make the experimental design very difficult to implement. Two participants (Numbers 6 and 18) were confirmed to have ASD-associated mental health conditions (see Appendix 2 for details), according to parents’ reports of medical history or the clinical judgment of the two pediatric neurologists mentioned above.
Target Measures
This study included two SPIN experiments with children and a parent/teacher survey questionnaire. Two types of SPIN tests were implemented: adaptive and fixed-level. All tests were conducted using a custom program on a ThinkPad X1 Carbon Gen 7 laptop in laboratory rooms with background noise below 35 dBA. The test stimuli were presented via an M-Audio M-Track 2 × 2 sound card and a pair of M-Audio BX8 D2 studio monitor loudspeakers. In line with previous studies, the loudspeakers were positioned at 0° azimuth (speech signal) and 180° azimuth (noise), at head level and one meter from the participant, to simulate an ideal classroom scenario in which the participant sits in the front row and listens to a teacher speaking in front, with distractions from behind (Figure 1).
Because few studies had tested SPIN in young Chinese children with autism, several considerations guided the choice of materials and tasks to ensure feasibility and validity. The Mandarin Hearing in Noise Test for Children (MHINT-C; Chen & Wong, 2020) was used for both types of tests. It consists of 12 highly equivalent test lists, with the mean deviation of each list in SRT within ±1 dB. Every list includes ten simple sentences, each of which contains ten characters (e.g., “爸爸带回来一个大西瓜。”, “Dad brought back a big watermelon.”) and is presented in a male voice with the same 2-s duration. During the SPIN tests, participants were required to repeat each sentence as accurately as possible after hearing a prompt tone; the experimenter listened to each word in the sentence and marked it as right or wrong in the program. As a preliminary exploration, this study used a speech-spectrum-shaped steady-state masker matching each target sentence's long-term average spectrum as background noise. From a neural decoding perspective, separating speech signals from speech-shaped steady-state noise is relatively simple, as the two stimuli differ greatly in spectro-temporal features (i.e., the spectrogram of speech is dynamic, both temporally and spectrally, whereas that of the masker is not). For young children whose sensory perception and linguistic skills are yet to mature, maskers involving complex acoustic features and speech or speech-like elements are rather demanding (Maamor & Billings, 2017), and this demand would be aggravated by the speech-specific difficulty seen in most children with ASD. As the participants in the current study were relatively young, adopting a steady-state masker reduced the risk of attrition that could arise from emotional reactions to adverse listening conditions. Another consideration for choosing the steady-state masker was to allow comparison with results obtained in steady-state noise from older Chinese children with ASD in a previous study (Yu et al., 2021).
Although the steady-state masker was relatively easy to implement and test, it has a drawback in terms of ecological validity. What children experience in everyday life, particularly at school, is much more complex and generally involves informational masking. In prior studies on SRTs, the Bamford-Kowal-Bench Speech-in-Noise test (BKB-SIN, 2005), a sentence-level SPIN test similar to the MHINT-C, documented significant improvements among children with ASD and other auditory disorders with the aid of the HAT (Schafer et al., 2013; Schafer, Traber, et al., 2014; Schafer et al., 2016). Compared with the BKB-SIN, the MHINT-C has slightly lower test-retest reliability for SRT measures, with a 95% confidence interval of 2.8 dB (1.8 dB for the BKB-SIN), but it best suits the age group in the current study among all available tests of the same type. Importantly, previous BKB-SIN studies generally adopted multi-talker babble rather than the steady-state noise we used, thereby involving informational masking in addition to energetic masking.
In our study, participants took two adaptive tests in noise without HAT use, and the results were averaged to give the final SRT. The starting presentation level was set at -5 dB SNR with the steady-state masker fixed at 60 dBA, calibrated using a sound level meter (Aihua AWA6228+) and a default calibration track. Based on the participant’s responses, the signal level was adaptively adjusted to achieve different SNRs. The first four sentences were presented with a 4-dB step size and the remaining sentences with a 2-dB step size. The SNRs at which sentences 5 through 10 were presented were averaged to obtain the participant's SRT.
In addition, fixed-level accuracy tests were administered to provide a comprehensive profile of the participants' SPIN abilities and of the effects of HAT use relative to baseline performance. Participants completed four fixed-level tests wearing the receiver of the HAT on the right ear, with the microphone plus transmitter attached to the monitor loudspeaker at 0° azimuth (Figure 1). A unilateral fitting was chosen because it allows the user to keep one ear open to surrounding sounds (Feldman et al., 2022) and because bilateral fitting would be inappropriate for listeners with bilaterally normal hearing (Tharpe et al., 2004). Following a recent HAT study by Feldman et al. (2022), we fit the receiver on the right ear. The maturation of SPIN abilities has been shown to be more advanced in children’s right ear, with SPIN abilities becoming adult-like in the right ear by 10 years and in the left ear by around 13–14 years (Chandni et al., 2020). The order of the 2 (Noise/Quiet) × 2 (HAT-on/HAT-off) testing conditions was counterbalanced across participants and test sessions. For the noise conditions, 0 dB SNR (speech level equal to noise level) was chosen based on a pilot study in which 5 dB SNR, 0 dB SNR, and -5 dB SNR were each examined among seven children with ASD. Significant differences were detected between the HAT-on and HAT-off sessions at 0 dB SNR and -5 dB SNR, but the latter appeared too challenging for some autistic children to tolerate and cooperate with.
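The adaptive SRT procedure described earlier in this section can be sketched in a few lines of Python. This is a minimal illustration and not the custom test program used in the study. It assumes a one-down, one-up rule (the SNR is lowered after a correctly repeated sentence and raised after an incorrect one) and a single right/wrong judgment per sentence; neither detail is stated in the text, and the function name `adaptive_srt` is hypothetical.

```python
from statistics import mean


def adaptive_srt(sentence_correct, start_snr=-5.0, big_step=4.0,
                 small_step=2.0, n_big=4, first_scored=5):
    """Track presentation SNRs over one 10-sentence list and return the SRT.

    sentence_correct: ten booleans, one per sentence (assumed scoring unit).
    ASSUMPTION: one-down/one-up tracking; the text gives only the step
    sizes, the starting SNR, and the averaging window (sentences 5-10).
    """
    if len(sentence_correct) != 10:
        raise ValueError("each list contains ten sentences")
    snr = start_snr
    presented = []
    for index, correct in enumerate(sentence_correct, start=1):
        presented.append(snr)
        # 4-dB steps for the first four sentences, 2-dB steps afterwards.
        step = big_step if index <= n_big else small_step
        snr += -step if correct else step
    # Average the SNRs at which sentences 5 through 10 were presented.
    srt = mean(presented[first_scored - 1:])
    return srt, presented


# Hypothetical response pattern for one participant on one list.
responses = [True, True, False, True, False, True, False, True, False, True]
srt, track = adaptive_srt(responses)
print(srt)    # -12.0
print(track)  # [-5.0, -9.0, -13.0, -9.0, -13.0, -11.0, -13.0, -11.0, -13.0, -11.0]
```

Under these assumptions, the final SRT for a participant would be the mean of the values returned for the two adaptive lists.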
For the questionnaire, the ASD group’s listening difficulty under various circumstances at school was assessed by their parents/teachers using the Children's Auditory Performance Scale (CHAPS; Smoski et al., 1998) before and after a ten-day trial period of HAT use. In previous studies, the CHAPS recorded less listening difficulty in sum scores for all conditions among children with ASD after an at-school HAT trial (Schafer et al., 2013; Schafer et al., 2016). The Chinese translation of the CHAPS was produced by two graduate students from the Department of Translation and Interpreting at Shanghai Jiao Tong University. Examples of test items are listed in Appendix 1. The CHAPS uses a 7-point Likert scale from +1 (less difficulty) to -5 (cannot function at all) for 36 items to identify behaviors related to auditory processing disorders among children above six years of age in six conditions (Auditory Attention Span, Auditory Memory Sequencing, Ideal, Multiple Inputs, Noise, and Quiet) with reference to NT peers. All CHAPS conditions were administered in the current study because each represents a common scenario in daily life, and together they contextualize autistic children’s SPIN abilities and device efficacy across various situations.
Data Analysis
All statistical analyses were conducted in R version 3.6.3 (R Core Team, 2020). The SPIN tests yielded two types of data: SRTs and accuracy rates. This study defined the SRT as the lowest SNR at which participants recognized 50% of the speech signals. Lower SRTs indicate better recognition of the signal in concurrent noise. The SRTs of the two groups were compared using the Mann-Whitney U test. A linear mixed-effects model (LMM) was adopted to analyze accuracy rates because of its strength in fitting non-independent, repeated-measures data. Before modeling, raw accuracy data were converted into rationalized arcsine units (RAU; Studebaker et al., 1995) to reduce potential ceiling effects, given the participants’ high accuracy rates in this study. We constructed two LMMs using the lmerTest package (Kuznetsova et al., 2017) to examine (1) the effect of listening condition (Noise vs. Quiet) on the two groups’ sentence recognition accuracy without HAT use, and (2) the effect of HAT use (Off vs. On) on the two groups’ sentence-in-noise recognition accuracy. The first LMM, on HAT-off accuracy (hereafter the baseline LMM), started with Subject as a random effect; Group (NT vs. ASD), Listening Condition (Quiet vs. Noise), and IQ (continuous) as fixed effects; and all interaction terms. The second LMM, on the effect of HAT use on SPIN accuracy (hereafter the HAT LMM), included Group (NT vs. ASD), HAT Use (HAT-off vs. HAT-on), IQ (continuous), and their possible interactions as fixed effects, with Subject as a random effect. Parsimonious models were selected by AIC in a stepwise algorithm using the step function from the stats package and were visually inspected through residual plots for obvious deviations from homoscedasticity or normality. Post hoc tests were implemented using the emmeans or emtrends function from the emmeans package (Lenth, 2020).
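For readers unfamiliar with the rationalized arcsine transform mentioned above, the sketch below shows the commonly used formulation, in which a score of X correct out of N items is mapped to RAU = (146/π)·[arcsin√(X/(N+1)) + arcsin√((X+1)/(N+1))] − 23. It is written in Python for illustration only; the study's analyses were run in R, and the function name `rau` and the choice of scoring unit for N are assumptions rather than details reported here.

```python
import math


def rau(n_correct, n_items):
    """Rationalized arcsine units for n_correct out of n_items scored items.

    The scoring unit behind n_items is an assumption of this sketch
    (e.g. 100 if every character of a ten-sentence list were scored).
    """
    if not 0 <= n_correct <= n_items or n_items <= 0:
        raise ValueError("need 0 <= n_correct <= n_items and n_items > 0")
    theta = (math.asin(math.sqrt(n_correct / (n_items + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_items + 1))))
    # Linear rescaling so that mid-range values track percent correct.
    return (146.0 / math.pi) * theta - 23.0


# Near the ceiling, equal steps in percent correct become larger steps in RAU.
for score in (50, 90, 95, 100):
    print(score, round(rau(score, 100), 1))
```

Because the transform stretches values near 0% and 100%, differences among high-scoring participants contribute more variance than they would on the raw percentage scale, which is the motivation given above for using it.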
The CHAPS questionnaires provided children’s sum scores for difficulty rating and the number of children rated at risk for each listening condition before and after HAT use. The sum score for each condition was obtained by dividing the total condition score (i.e., the sum of the circled responses for that condition) by the number of items in the condition (Noise: 7, Quiet: 7, Ideal: 3, Multiple Inputs: 3, Auditory Memory Sequencing: 8, Auditory Attention Span: 8). A sum score of -1 or lower implies a risk of significant difficulties in the corresponding area of auditory processing. However, the “at-risk” counts in the current study need to be interpreted with caution because the cut-off score has not been validated against Chinese norms. The sum scores were analyzed using the Wilcoxon signed-rank test on paired samples, whereas the number of children at risk before and after device use was compared using Fisher's exact test.
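The CHAPS scoring rule described above amounts to a per-condition average compared against a cut-off of -1. The following Python sketch illustrates that arithmetic under the item counts given in the text; the names `CHAPS_ITEMS`, `chaps_condition_scores`, and `chaps_at_risk` are hypothetical, and the example ratings are invented for demonstration, not study data.

```python
# Number of items per CHAPS condition, as listed in the text (36 in total).
CHAPS_ITEMS = {
    "Noise": 7,
    "Quiet": 7,
    "Ideal": 3,
    "Multiple Inputs": 3,
    "Auditory Memory Sequencing": 8,
    "Auditory Attention Span": 8,
}
AT_RISK_CUTOFF = -1.0  # a condition score at or below this value is "at risk"


def chaps_condition_scores(responses):
    """Divide each condition's summed ratings by its number of items."""
    scores = {}
    for condition, n_items in CHAPS_ITEMS.items():
        ratings = responses[condition]
        if len(ratings) != n_items:
            raise ValueError(f"{condition} needs {n_items} ratings")
        if any(r not in range(-5, 2) for r in ratings):
            raise ValueError("ratings must be integers from -5 to +1")
        scores[condition] = sum(ratings) / n_items
    return scores


def chaps_at_risk(scores):
    """Flag each condition whose score is -1 or lower."""
    return {c: s <= AT_RISK_CUTOFF for c, s in scores.items()}


# Invented ratings for one child, for illustration only.
example = {
    "Noise": [-2] * 7,
    "Quiet": [0] * 7,
    "Ideal": [1, 0, -1],
    "Multiple Inputs": [-1, -1, -1],
    "Auditory Memory Sequencing": [0] * 8,
    "Auditory Attention Span": [-1] * 4 + [0] * 4,
}
scores = chaps_condition_scores(example)
print(scores["Noise"], chaps_at_risk(scores)["Noise"])  # -2.0 True
```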