1. Introduction
Reviewing survey errors is essential for maintaining the accuracy of self-administered questionnaire data. This process helps identify and minimize errors like sampling, nonresponse, and measurement issues specific to self-administered questionnaires (Ziegel 1990). By addressing these errors, researchers can enhance the reliability and validity of future survey results.
While ensuring data quality is essential in any research project, it becomes especially challenging when surveying adolescents. They are a heterogeneous group, with some being forthcoming and others being rebellious or unwilling to answer personal questions. In addition, adolescents tend to be impatient and restless, especially when surveyed in groups. These factors can complicate the survey process and potentially compromise data quality and the independence of individual responses. Adequate supervision can therefore help to identify and mitigate sources of bias and to ensure accurate and reliable results.
Schools are key institutions in adolescent life and an ideal setting for research dealing with the topic (Hatch et al. 2023). A typical approach to surveying adolescents is to conduct standardized surveys in classrooms, with teachers or external supervisors ensuring consistency of the process (Hallfors et al. 2000; Lucia et al. 2007; Alibali & Nathan 2010; Bartlett et al. 2017; March et al. 2022). Classrooms are convenient for several reasons. Working with schools saves time because whole classes can be interviewed at once, without visiting homes (Walser & Killias 2012) or waiting for postal responses. Surveys conducted in schools are also considered more valid than surveys taking place at home (Kann et al. 2002). Brener et al. (2006), for instance, found a 25% lower likelihood of reporting offending behaviour when data were collected in homes rather than schools. Cops et al. (2016) likewise found a significantly higher prevalence of delinquency when youths filled out questionnaires in school compared to a mail survey design. When topics or target groups are difficult to reach, recruitment via public schools is an efficient strategy for reaching representative populations (Ellonen et al. 2023). Others point out that research in schools adds the risk of selection bias at the school level, depending on which schools are selected for the sample (Newransky et al. 2020).
Given successful recruitment of schools and classrooms, teachers can be pivotal for a survey’s success: parents trust them (parental consent, unit nonresponse), students rely on their expertise (item nonresponse, validity) and are expected to respect their authority (unit & item nonresponse, validity). However, relying on them to convey a scientific survey might bias results because of their (possibly unnoticed) influence on the students’ response behaviour (see Strange et al. 2003; Heath et al. 2013; Rasberry et al. 2020). Many questionnaires contain questions about delinquency, drug abuse, school, relationships to adults in school and further topics that are prone to social desirability bias. The presence of external supervisors may counteract this, as suggested by Bidonde et al. (2023): they can emphasize the importance of neutrality among the adults present, either through verbal communication or simply by being physically present.
Many studies examining the effects of the presence of third parties focus on face-to-face or telephone interviews (e.g. Deakin & Wakefield 2014; Weller 2017; Hennessey et al. 2022; Goh & Binte Rafie 2023). Much of the research on self-administered surveys evaluates methods in studies of adolescents’ (mental) health (e.g. Kann et al. 2002; Brener et al. 2006; Raat et al. 2007; Rasberry et al. 2020; Bidonde et al. 2023; Ellonen et al. 2023; Hatch et al. 2023) or in criminology and research on the delinquency of young adults (e.g. Lucia et al. 2007; Walser & Killias 2012; Kivivuori et al. 2013; Cops et al. 2016; Gomes et al. 2019). However, meta-studies still demand more systematic evaluation of the effects of supervision on response behaviour (Gomes et al. 2019; Bidonde et al. 2023).
Although there is extensive research dealing with mode effects, additional research with controlled designs is necessary to find effective methods and strategies that ensure satisfactory response rates among adolescents in mental health surveys (Bidonde et al. 2023) and to find out how different settings might affect response behaviour (Gomes et al. 2019).
Given the sensitive environment of schools and the difficult task of obtaining good-quality data from minors, social science must constantly evaluate its methods (Newransky et al. 2020; Hatch et al. 2023). In addition, the many challenges of survey research in this setting and frequently low research budgets call for well-justified decisions on how to carry out the planned survey (Walser & Killias 2012). Two questions arise: is teacher supervision a threat to data quality, and is investment in external supervision necessary?
To examine the effects of different modes of supervision on data quality in standardized survey research, this publication utilizes a dataset resulting from a quasi-experimental study design. The survey study under investigation is based in Germany’s largest metropolitan area and tested different forms of digital, hybrid and in-person supervision. UWE („Umwelt, Wohlbefinden und Entwicklung“ = “Environment, Well-Being and Development”) is a classroom-based, repeated cross-sectional study. It serves multiple purposes, among them sociological research on adolescent life, but more importantly it is meant to empower youths to have their voices heard by school and municipality officials. The aim of this publication is to test how item nonresponse, interview duration, drop-out rate and response patterns differ between groups depending on their supervision. The results could help researchers surveying youths and adolescents to decide whether to allow confidants or other adults to be present during (group) interviews, whether they can rely on teachers alone when surveying classrooms, and whether it is cost-efficient to send external supervisors to classroom sessions.
1.1. Previous Research
Researchers surveying young people during the pandemic have faced similar challenges all over the world (e.g., Gassman-Pines et al. 2020; Goh & Binte Rafie 2023) and have found similar workarounds, although most of them had to start from scratch. Until very recently, there has not been much published research on the effects of these lockdown workarounds on survey data quality. As the term “workaround” suggests, much of the research conducted during that time, the present study included, used ad-hoc methods fitting individual needs and available infrastructure. Assuming the researchers in the field had profound knowledge of survey methodology and put it to good use when developing workarounds, scholars can now benefit from the resulting pioneering work.
The following section presents previous research on third-party effects and on the effects of external supervision, as well as general survey-research literature that is relevant when surveying youths and adolescents with standardized questionnaires in classrooms using different modes of supervision.
1.2. Supervision during Self-Administered Surveys
In theory, a well-designed questionnaire should not require any supervision at all, but the evidence on data quality is ambiguous. For instance, Felderer et al. (2019) found that web surveys tend to have higher nonresponse rates than surveys led by interviewers. Their results contradict the findings of Mühlböck et al. (2017), who investigated self-completion surveys among young adults and found no significant differences in response behaviour between web surveys without supervision and modes with interviewers present – although completion rates were higher in the latter group. However, self-administration has been shown to be less prone to response bias (Felderer et al. 2019). Atkeson et al. (2014) argue that the presence of an interviewer can alter response patterns on questions that pertain to an individual’s personal beliefs, attitudes, or experiences. Bidonde et al. (2023) found that response rates of adolescents in studies of mental health can indeed vary with survey mode, consent type, and incentives used. Overall, response rates appeared to be at least slightly higher whenever some form of supervision was involved. The results of Cops et al. (2016) suggest that the mode of administration can affect response behaviour on several levels: it influences both individuals’ likelihood of participating in the study, eliciting selection bias, and participants’ tendency to report criminal behaviour, prompting measurement bias. Thus, variations in prevalence estimates of criminal behaviour between studies arise from differences in the participating population as well as from potential effects related to the setting or anonymity. Since young people are considered less likely to participate in surveys than older people (Ziegel 1990), a closer look into potentially enhancing methods is worthwhile.
1.3. Effects of Teachers’ Presence
Recruiting and surveying respondents in schools is an inexpensive and efficient way to achieve high response rates (Heath et al. 2009: 146; Alibali & Nathan 2010; Bartlett et al. 2017; March et al. 2022) and representative samples (Ellonen et al. 2023). An important prerequisite for successful research involving schools and school personnel is to tailor survey research designs to schools’ needs (Hatch et al. 2023). However, the study presented by Rasberry et al. (2020) shows how difficult survey research in schools can be. Their study was prepared and conducted carefully, and every pitfall seemed to be anticipated; in the field, they still struggled with teachers failing to monitor the survey and unforeseen issues with post-survey logistics.
When surveys take place in schools, it is common practice to work with teachers as “assistant supervisors”, as they are already figures of authority for the target group and can usually handle group dynamics and disruptions. Although their ability to do so may vary (March et al. 2022), this should increase data quality in terms of validity and completion rates (Bidonde et al. 2023).
A typical notion in the literature is that teachers are important, or at least very helpful, for getting the survey infrastructure set up (e.g., in Strange et al. 2003; Heath et al. 2009). They can help keep respondents’ motivation up and help retrieve basic information, such as zip codes, which are typically used to obtain small-scale data necessary to localise neighbourhoods that require municipal attention. Retrospective data might also be more accurate when a confidant can help with remembering. Teachers can further provide valuable information and insights about academic abilities and behaviour. All of the above is also noted in the protocols of the UWE survey. In conclusion, there are arguments speaking in favour of teachers facilitating the survey process:
Hypothesis 1a: When conducting standardized survey research in classrooms, the presence of teachers increases data quality.
When conducting or assisting with a survey in their classroom, teachers find themselves in a unique position where they need to balance two roles: they act as supervisors but are a third party at the same time. Yet neither of these attributions fits perfectly, and both are ambiguous in their potential effects. In their role as supervisors, they hand out questionnaires or online survey links and are the primary source of information on comprehension questions. In many cases, they also read an introduction. From the researcher’s perspective, however, they are considered a third party, primarily because their personal relationship to the respondents jeopardizes their neutrality. In addition, they are usually not familiar with the questionnaire at all. In Demkowicz et al. (2020), students reported that their teachers were not very helpful when asked comprehension questions. In most cases, teachers have not been trained as interviewers before the survey takes place in their classroom, although Hatch et al. (2023) strongly recommend such training. Depending on their seniority, they might or might not have been part of research in school before (March et al. 2022). Hence, it cannot be assumed that they are aware of how their assistance can affect the accuracy of responses. Rasberry et al. (2020) even report teachers filling out the survey themselves, discussing survey questions or reading sensitive questions out loud, potentially preventing students from answering truthfully.
The literature identifies further sources of potential influence that teachers or other adults can have on young people’s response behaviour. An obvious one is social desirability bias, the tendency of respondents to provide answers that are socially acceptable rather than their true opinions or behaviours, which is more likely to occur in interviewer-administered surveys (Atkeson et al. 2014). Adolescents may be particularly prone to social desirability bias because they are often highly influenced by their peers and social norms (Ziegel 1990; Brown & Larson 2009). Cops et al. (2016) found that self-reported delinquency was significantly lower when youths were supervised by adults close to them, which implies that third parties should be avoided and that surveys with sensitive questions should rather not take place at home. As Tourangeau and Smith (1996) found in an early review of studies using the computer-assisted personal interview (CAPI) approach, social desirability bias can be reduced by using self-administered questionnaires or computer-assisted self-interviewing, which increase anonymity and decrease social pressure to conform.
In contrast, the mere presence of teachers might increase that social pressure, since they are authority figures (Möhring & Schlütz 2019: 49). They establish the typical classroom atmosphere, and respondents thus likely find themselves in their social role as students. When asked how they feel in this role, the presence of involved teaching personnel likely influences their response behaviour. The relationship between students and teachers could also increase item nonresponse: Strange et al. (2003) report that students were hesitant when asked about personal information by their own teachers.
Duncan and Magnuson (2013) discuss potential bias arising from the presence of adults during surveys in school settings due to differences in socioeconomic status. This might be especially relevant in this particular case: Teachers in Germany are highly qualified personnel and thus have a high socio-economic status. The survey analysed in this study was conducted in two of the poorest cities in Germany, so there is a high share of youths from materially deprived households.
The difference between a survey and an exam can be difficult to internalize for both students and teachers. A frequent observation from the protocols of UWE is that teachers tend to rush things. Respondents should of course take their time; teachers, however, are used to tight schedules and like to get things done on time. This may partly stem from the practical necessity for schools to efficiently manage their staff resources, given that they are frequently understaffed, leading them to be cautious about allocating excessive time to survey projects (March et al. 2022). Moreover, surveys are often perceived as additional work by teachers, which is not wrong (ibid.). Alibali and Nathan (2010) as well as Hatch et al. (2023) strongly recommend being prepared for that and being patient with schools and teaching staff, as their time is limited.
The problem is recognized and not easily surmountable, so we need to assess whether it has adverse consequences. Strange et al. (2003) showed that when strict time limits were established, students with literacy problems showed higher drop-out rates; consequently, students from lower social classes were less likely to complete their survey. Gummer and Roßmann (2015) found that longer interview duration is related to higher motivation among respondents – strict time limits might dampen this motivation or at least eradicate its positive impact on data quality (e.g. item nonresponse). Following these arguments against teachers’ presence, a counterhypothesis challenging my first one would be:
Hypothesis 1b: When conducting standardized survey research in classrooms, the presence of teachers reduces data quality.
1.4. Effects of External Supervision
The obvious solution to this dilemma would be to educate teachers on the science of survey research, like Hatch et al. (2023) did. Unfortunately, this can be very difficult to pull off for various reasons, with time being the most pressing one. A potential remedy is the presence of a third, neutral party that is familiar with the pitfalls of conducting surveys. Trained interviewers can intervene when teachers are unintentionally biasing responses and help with comprehension questions. Communication between members of the research team can also help teachers and respondents to understand the purpose of the study, resulting in higher motivation among all parties involved (March et al. 2022).
Demkowicz et al. (2020) report that respondents were motivated to complete their questionnaire because they were aware that their participation might be helpful to others in the future. Reflecting on their own lives and well-being has also been found to be helpful to children and young people, again increasing motivation to complete the questionnaire. Those who are actively engaged in the projects associated with the survey (e.g. researchers acting as supervisors, or hired but trained staff) are more likely to effectively convey to respondents the significant impact their participation can have on the project’s success and its subsequent benefits for youths and adolescents.
Hypothesis 2a: When conducting standardized survey research in classrooms, the presence of external supervisors increases data quality.
According to Epstein (1998), adults who work in a school environment can be perceived as "another teacher" by both students and teachers. This can have an impact on the objectivity of interviewers and may also influence how young people respond. Or, if it does not, sending in supervisors might not even be worth the effort, which should not be underestimated (see Walser & Killias 2012; Bartlett et al. 2017; March et al. 2022). Strange et al. (2003) found that when surveys were administered to students either by a teacher or by a researcher, there was no significant difference in the likelihood of students completing the questionnaire. Walser and Killias (2012) and Kivivuori et al. (2013) dealt with highly sensitive questionnaires and also found few significant differences in the response behaviour of juveniles supervised by teachers versus external supervisors. The opposing hypothesis must therefore be:
Hypothesis 2b: When conducting standardized survey research in classrooms, the presence of external supervisors does not increase data quality.
Demkowicz et al. (2020) discuss the role of non-teaching staff in creating an environment that deviates from the typical classroom atmosphere and empowers respondents with a choice regarding their participation. Ethically, it is highly preferable for respondents to be aware that survey participation is voluntary. Moreover, informed consent can significantly enhance completion rates (ibid.). However, participation rates can also benefit from what Demkowicz et al. refer to as a ’fait accompli’ scenario (2020: 11). This refers to situations where parents and teachers of the respondents have already agreed to the surveys, and the assigned schoolwork is generally compulsory. Gaining consent from schools and parents in the first place is a field of research in itself (see Alibali & Nathan 2010; Bartlett et al. 2017; Hatch et al. 2023).
1.5. Effects of Using Video Conference Software
Until recently, cost has been the primary challenge associated with deploying external supervisors or trained interviewers for surveys. Professional interviewers are expensive and survey projects usually run on a tight budget (Walser & Killias 2012). In 2020 and the following years, another big problem superseded that issue: contact restrictions and school closures. With contact limitations and lockdowns in place, sending in external supervisors was simply impossible.
One solution to that problem was the increasingly common use of video conference software, which allows people to communicate face-to-face virtually without having to be in the same room. Some schools were already using it for distance learning and, like many others, UWE took advantage of these efforts. A complex coordination process generated a set of new survey modes.
Video conference software is already used quite regularly in qualitative social science (see e.g. Deakin & Wakefield 2014; Weller 2017; Hennessey et al. 2022; Goh & Binte Rafie 2023). Common sources of bias are along the same lines as in face-to-face or telephone interviews (ibid.). An additional one is representativeness, as not all social strata have equal access to, and competence in using, video conference technology (VCT).
For quantitative research and questionnaire-based surveys, it is uncommon to make use of it, because it may appear impractical or redundant. Impractical, because it takes effort to arrange and set up despite being technically unnecessary; redundant, because a good questionnaire speaks for itself and does not need an interviewer. As discussed above, however, supervision is required when dealing with adolescents in larger groups. There tend to be disruptions and group dynamics that are unique to this age group, as anyone who has worked with teenagers can imagine; Strange et al. (2003) described them vividly. Simply put, we cannot assume independent observations in shared classrooms. Finally, if external supervision is needed, VCT offers a relatively inexpensive solution.
1.6. Web-Surveys
The age group analysed here was born after 2004 and belongs to the generation commonly considered “digital natives”. It can be argued that their extensive experience with digital devices means that completing a questionnaire with a pen, a school PC, or a smartphone should not result in any significant differences in data quality. Raat et al. (2007) found negligible differences in feasibility, reliability, and validity of the scales between adolescents answering a health-related survey on paper in schools or via web survey. Hallfors et al. (2000) report that computer-assisted self-administered surveys decrease item nonresponse compared to paper forms.
Young respondents of this generation might even prefer digital delivery. In the study conducted by Demkowicz et al. (2020), respondents expressed a preference for digital delivery, citing reasons such as increased efficiency, their familiarity with surveys on digital devices, concerns about anonymity due to recognizable handwriting, and heightened security concerns associated with the potential loss of paper forms.
Gummer and Roßmann (2015) argue convincingly that the device does matter. However, their main argument is that the questionnaire design must fit the device, because there may be differences in the visibility of the questionnaire or different download times when using mobile devices – affecting response latency. A few respondents in Demkowicz et al. (2020) also stated that they sometimes struggled with the visual formatting. The survey project analysed here utilized software that is suitable for all devices (“SoSci Survey”, Leiner 2021) and tested the resulting questionnaire on all possible devices – visibility of the questionnaire was satisfactory and there were no reports of problems with downloading the questionnaire or uploading responses.
2. Materials and Methods
In the following, the hypotheses are tested using regression analyses based on data from the 2021 wave of the survey study UWE described briefly above (available via Stefes 2023). Data quality indicators are predicted based on different supervision modes, while controlling for respondent and interview characteristics.
Since 2019, UWE has set out to ask every youth in grades seven and nine in two Ruhr-Area municipalities about their well-being, everyday life, and social resources every other year, using standardised questionnaires (see Schwab et al. 2021 and Stefes et al. 2023). The analyses in this study utilize data from the survey round in the one municipality that enabled cooperation with local secondary schools in spring 2021. Within this wave, 923 students from grades seven and nine, typically aged 12 to 15, in a single municipality began responding to the questionnaires. During this period, schools were closed or opened for a limited number of students, depending on the incidence rates of the Covid-19 pandemic. Due to these unique circumstances, different modes of supervision were used in the same schools, and even single classrooms were surveyed in varying setups.
The survey covers a multidimensional operationalisation of subjective well-being, social resources and contexts and allows drawing a comprehensive picture of adolescent life from a socio-ecological perspective (Knüttel et al. 2021). Initially, data collection involved handing out questionnaires in classrooms with an external supervisor and a teacher present. Supervisors were responsible for explaining the survey and answering questions, while teachers represented figures of trust and authority during these sessions. The supervisors were mostly the researchers responsible for the survey itself and their student assistants, who had been trained as interviewers.
The study provided three modes of supervision for groups of respondents aged 12 to 15: (A) only teachers present, (B) teachers present and external supervisors connected via video-conference technology (VCT), or (C) external supervisors present via VCT without teachers. Questionnaires were self-completed using school-owned devices, personal devices, or paper forms. This quasi-experimental design allows for a systematic evaluation of potential differences in data quality between these modes of supervision.
2.1. Data Quality Indicators and Analyses
There are four aspects of data quality that can be indicated reliably using the data sets of UWE: drop-outs, item nonresponse, interview duration and straightlining. This section describes these indicators and discusses why they are used and how they are analysed.
A response is defined as a drop-out when the survey ended before the last quarter of the questionnaire was answered. The drop-out threshold lies between two item batteries: more general questions about school life and a battery about bullying. Roughly 5% of all respondents answered less than 75% of the overall questionnaire; these are considered to have dropped out prematurely. The indicator can be seen as the inverse of the completion rate, which is a common measure of data quality. Mühlböck et al. (2017) used drop-out rates to compare self-administered versus supervised surveys among young adults but found no significant differences. Drop-out is a binary variable; hence logistic regression is adequate to predict it (Heeringa et al. 2017; Niu 2020). Logistic regression assumes a linear relationship between the independent variables and the log-odds of the dependent variable. It also assumes the absence of multicollinearity, meaning that independent variables are not highly correlated. Multicollinearity concerns have been addressed, as evidenced by satisfactory Variance Inflation Factor (VIF) values within the regression model. Limitations include sensitivity to outliers and the assumption of linearity, which may not always hold in complex relationships. Additionally, logistic regression assumes that the observations are independent, and violations of this assumption can affect the accuracy of parameter estimates (Niu 2020). The independence assumption usually does not hold in classrooms, as all respondents in the same classroom are exposed to the exact same conditions, which can differ between classrooms. Therefore, clustered standard errors and controls for all available conditions are implemented in the logistic regression model.
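The following sketch illustrates, in Python with statsmodels, how such a logistic drop-out model with classroom-clustered standard errors and a VIF check can be specified. The data file, variable names (dropout, mode, classroom_id, and the covariates) and model formula are illustrative assumptions, not the original UWE code.

```python
# Sketch: logistic regression for drop-out with classroom-clustered standard errors.
# All column names and the file name are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

cols = ["dropout", "classroom_id", "mode", "age", "gender", "migration",
        "literacy_problem", "grade", "school_type", "location", "form"]
df = pd.read_csv("uwe_2021.csv")[cols].dropna()   # hypothetical data export

formula = ("dropout ~ C(mode) + age + C(gender) + migration + literacy_problem "
           "+ C(grade) + C(school_type) + C(location) + C(form)")

# Clustered (sandwich) standard errors by classroom session.
logit = smf.logit(formula, data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["classroom_id"]}
)
print(logit.summary())

# Multicollinearity check: VIF for every column of the design matrix.
X = pd.DataFrame(logit.model.exog, columns=logit.model.exog_names)
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
                index=X.columns)
print(vif.round(2))
```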
As a second indicator, I analyse item nonresponse. There were 210 items presented to all respondents. The large number of items is one of the reasons why a completely unsupervised mode was not considered in the field. Most respondents answered most of the questions, but there are also many gaps in the data. While the nonresponse rate is a common indicator of survey data quality (e.g. Gummer et al. 2021; Leiner 2019), its use as a proxy indicator for nonresponse bias has been questioned (Wagner 2010). Wagner (2010; 2012) calls on researchers to use the fraction of missing information instead, concluding that although completion rates might not be a good indicator of nonresponse bias, they are adequate for comparing different data collection methods (Wagner 2012). For the analyses in this study, item nonresponse is measured by simply counting the items with missing values in the data set. Some questions were filtered and could not be answered by all respondents, such as follow-up questions about migration background; they are excluded from the count. Respondents who were defined as drop-outs have been excluded from this analysis because they are extreme outliers. The decision against a share of missing information instead of the raw count is based on the argument that a count is easier to interpret than a share, and the transformation into a percentage adds no information. Since the resulting variable can be described as count data, a variation of Poisson regression is adequate (Little 1988; Little 1992). There is no reason to assume zero-inflation, because all zeros in the data are simply complete questionnaires. Therefore, a negative binomial model is applied to predict the number of unanswered questions. This model, like the previously discussed logistic model, assumes independence of observations, which is addressed by using clustered standard errors. VIF values cannot be estimated in negative binomial regression models. As recommended by Türkan and Özel (2013), the model utilizes jackknifed estimators to remedy potential effects of multicollinearity and reduce bias in the estimation process.
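As a rough illustration of this step, the sketch below fits a negative binomial model with classroom-clustered standard errors and adds a simple delete-one-cluster jackknife of the coefficients. It is a stand-in under assumed variable names, not the jackknifed estimator of Türkan and Özel (2013) or the original analysis code.

```python
# Sketch: negative binomial regression for the item-nonresponse count with
# classroom-clustered standard errors. Column names and the file are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

cols = ["n_missing", "dropout", "classroom_id", "mode", "age", "gender",
        "migration", "literacy_problem", "grade", "school_type", "location", "form"]
df = pd.read_csv("uwe_2021.csv")[cols].dropna()
df = df[df["dropout"] == 0]                      # drop-outs excluded as outliers

formula = ("n_missing ~ C(mode) + age + C(gender) + migration + literacy_problem "
           "+ C(grade) + C(school_type) + C(location) + C(form)")
nb = smf.negativebinomial(formula, data=df).fit(
    disp=0, cov_type="cluster", cov_kwds={"groups": df["classroom_id"]}
)
print(nb.summary())

# Simple delete-one-cluster jackknife of the coefficients (illustrative only):
# refit without each classroom and average the coefficient vectors.
jack = [smf.negativebinomial(formula, data=df[df["classroom_id"] != c]).fit(disp=0).params
        for c in df["classroom_id"].unique()]
print(pd.concat(jack, axis=1).mean(axis=1).round(3))
```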
The duration of interviews conducted with digital questionnaire forms was recorded, allowing examination for any indication of haste. Interview duration is an ambiguous indicator: we cannot simply claim that a longer duration indicates higher quality than a shorter one, in the way that fewer drop-outs or nonresponses indicate better quality. Questions can be answered too slowly or too quickly. The former might imply that respondents have trouble understanding the questions or are distracted; the latter could indicate speeding through the questionnaire or straightlining and, consequently, not answering truthfully. A very fast completion of the questionnaire can hint at a careless or fake response, hence Leiner (2019) suggests using completion times to identify meaningless data. In terms of response times for single items, Revilla and Ochoa (2015) mention that highly skilled respondents could answer very quickly, although extremely short response times rule out the possibility that respondents read the question at all. They lean towards the notion that short response times are generally related to lower quality of responses. Their results support this claim convincingly and are in line with the findings of Gummer and Roßmann (2015), who found longer durations among more highly motivated respondents. Interview duration was recorded in seconds during the self-administered questionnaire procedure; to make the results more comprehensible, it was recoded into minutes. The analysis excludes drop-outs and respondents who filled out paper forms: the former because their duration is irrelevant for this question, the latter because the required data could not be collected for this group. Since Tourangeau et al. (2013) used Cox regression to model interview duration, this study follows their approach here. Critical assumptions and limitations of Cox regression are similar to those mentioned above and involve linearity of effects and independence of observations (Braekers & Veraverbeke 2005; Tourangeau et al. 2013). To assess potential inaccuracies due to multicollinearity, Variance Inflation Factors have been calculated and are satisfactory for all covariates.
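A minimal sketch of such a duration model, assuming hypothetical variable names and using the lifelines implementation of Cox regression with classroom-clustered robust errors, could look as follows; since all retained interviews were completed, the event indicator is simply 1 for every case.

```python
# Sketch: Cox proportional hazards model for interview duration (in minutes),
# with classroom-clustered robust standard errors. Names are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("uwe_2021.csv")
df = df[(df["dropout"] == 0) & (df["form"] == "online")].copy()   # exclusions as in the text

X = pd.get_dummies(df[["mode", "gender", "grade", "school_type"]],
                   drop_first=True, dtype=float)
X[["age", "migration", "literacy_problem"]] = df[["age", "migration", "literacy_problem"]]
X["duration_min"] = df["duration_sec"] / 60.0    # recorded in seconds, recoded to minutes
X["completed"] = 1                               # no censoring: all retained interviews finished
X["classroom_id"] = df["classroom_id"]

cph = CoxPHFitter()
cph.fit(X, duration_col="duration_min", event_col="completed",
        cluster_col="classroom_id", robust=True)
cph.print_summary()
```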
Whether a response is truthful or not is difficult to assess. Satisficing response behaviour may lead respondents to anchor their answers on the first response option they find satisfactory, which minimizes the effort required for survey completion (Gummer et al. 2021). If they then align their subsequent responses with the initial choice, straightlining occurs. The absence of straightlining enhances the reliability of data by signaling genuine respondent engagement and truthful reporting (Leiner 2019; Gummer et al. 2021). UWE uses several item-battery (grid) questions suitable for a thorough examination of possible straightlining patterns. Many grid questions contain only three or four items, which can be answered truthfully with the same option repeatedly. Hence, the analysis presented in this work seeks patterns in item batteries containing at least five items. The questionnaire contains six such grids that are unlikely to be answered truthfully by repeatedly choosing the same option; all of their items are answered on a five-point scale. Another logistic regression model predicts how likely it is that at least one occurrence of straightlining can be detected in each supervision mode. The same measures are applied as in the first logistic model, and all VIF values are satisfactory.
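The detection step can be sketched as follows; the grid names and item columns are hypothetical placeholders for the six UWE grids with at least five items, and the resulting binary flag would then enter a logistic regression analogous to the drop-out model above.

```python
# Sketch: flag straightlining as "all items of a grid answered with one and the
# same response option". Grid and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("uwe_2021.csv")

grids = {
    "grid_wellbeing":      ["wb_1", "wb_2", "wb_3", "wb_4", "wb_5"],
    "grid_school_climate": ["sc_1", "sc_2", "sc_3", "sc_4", "sc_5", "sc_6"],
    # ... four further grids with at least five items each
}

def straightlined(frame: pd.DataFrame, cols: list) -> pd.Series:
    """True if every item in the grid is answered and all answers are identical."""
    answered = frame[cols].notna().all(axis=1)
    constant = frame[cols].nunique(axis=1, dropna=True) == 1
    return answered & constant

flags = pd.DataFrame({name: straightlined(df, cols) for name, cols in grids.items()})
df["any_straightlining"] = flags.any(axis=1).astype(int)
# df["any_straightlining"] then serves as the outcome of a logistic regression with
# classroom-clustered standard errors, as in the drop-out model above.
```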
2.2. Explanatory Variables
All statistical models control for the same characteristics of the respondents and the interview setting. Respondent characteristics include age, gender, migration background, German literacy, grade, and school type. School type distinguishes three tracks: higher secondary track (Gymnasium), comprehensive secondary track (Sekundarschule and Gesamtschule) and practical secondary track (Realschule and Hauptschule). The main difference is that completion of the first qualifies graduates for tertiary education, in the second this qualification is optional, and the third does not offer it.
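For illustration, the five German school forms can be collapsed into the three analysed tracks with a simple mapping; the column name and track labels below are assumptions, not the original coding.

```python
# Sketch: collapse school forms into the three analysed tracks.
# Column name and labels are hypothetical.
import pandas as pd

track_map = {
    "Gymnasium":      "higher secondary",
    "Sekundarschule": "comprehensive secondary",
    "Gesamtschule":   "comprehensive secondary",
    "Realschule":     "practical secondary",
    "Hauptschule":    "practical secondary",
}
df = pd.read_csv("uwe_2021.csv")
df["school_type"] = df["school_form"].map(track_map)
```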
Table 1 presents descriptive statistics of the variables used in the analyses:
Girls are slightly overrepresented in the sample. The maximum age in the sample is 17, although the usual age in grades seven and nine is 12 to 15. A very small portion of the sample has literacy problems, indicated by answering “hard” or “very hard” to the question “How easy is it for you to read German?”. 41% of the sample have a migration background, defined as being born in another country or having a parent born in a country other than Germany. This share is consistent with the overall population of that age in metropolitan Germany.
The interview setting is characterised by the supervision (teacher present or not, supervisor present or not), the survey location (school or home) and the form in which the questionnaire was delivered (paper or online). There were 84 classroom sessions, and all models account for potential heteroskedasticity and within-classroom correlation by clustering the standard errors accordingly. Supervision took three possible forms; assignment was based on schools’ equipment, teachers’ preferences and the official contact restrictions. An overview of which supervision modes were used in which schools can be found in Table 2. During the time the survey was conducted, some schools were practicing what was called “Wechselunterricht” (“alternating distance learning”): classes were split into two groups, which were taught at home or in school for one week and swapped places the following week. This allowed the researchers to test different approaches on the same classroom; assignment to one of the two groups was usually random by surname.
(A) Teacher only: Some classrooms were surveyed without any external supervision. Teachers were given paper questionnaires or shortened online-survey links, the latter also available as QR codes. They had the option to present an introductory video or to explain the procedure themselves; for the latter option, an introduction and manual prepared by the institute was provided to ensure that all respondents received the same information. Respondents could use either their personal devices or school-owned devices, depending on the teacher’s perception of what would work best in their classroom.
(B) Teacher and external supervisor: Another group of classrooms had teachers present and was more or less constantly connected to an external supervisor via video conference software. The supervisors introduced the survey and offered help with comprehension questions. They also observed the situation but could not intervene effectively, given their merely virtual presence. This mode was combined with online surveys conducted on school computers or respondents’ personal devices.
(C) Supervisor only: Some classes could not be surveyed in school. The distance-learning channel they were already accustomed to by that time (Microsoft Teams in most cases) was used to establish the group survey. Respondents were all online in a video call and filled out an online survey on their own devices at home. No teachers were present in this mode.
4. Discussion
A successful recruitment process can be seen as a prerequisite for high data quality, and previous literature shows that schools can be quite efficient partners for this endeavour (Heath et al. 2009; Alibali & Nathan 2010; Bartlett et al. 2017; March et al. 2022; Ellonen et al. 2023). This study does not question this but advises to carefully assess potential effects of any involvement of school staff when it comes to data collection. Hatch et al. (2023) suggest working as closely with school officials and personnel as possible, a conclusion based primarily on considerations about response rates. This claim is valid, as the loss of a single school in the process can threaten the representativeness of the whole sample (Newransky et al. 2020; Ellonen et al. 2023). The main message of this publication is that, beyond recruitment, the involvement of school staff in standardized survey research in classrooms should be kept to a minimum.
This rather challenging claim is based on evidence from previous research as well as the results presented above. While there is a reasonable claim that any supervision can increase completion rates and the validity of survey data when surveying adolescents (March et al. 2022; Bidonde et al. 2023), supervision is not unproblematic in every case. There are two key factors speaking against teachers supervising survey research: they can make poor research assistants (Demkowicz et al. 2020; Rasberry et al. 2020; March et al. 2022) and they can unintentionally increase social desirability or related response bias (Strange et al. 2003; Duncan and Magnuson 2013; Atkeson et al. 2014; Cops et al. 2016; Möhring & Schlütz 2019). The results of the analyses in this article allow the conclusion that data quality is lower with regard to item nonresponse and the prevalence of satisficing patterns when teachers are responsible for data collection.
In light of the potential compromise of data quality, the literature suggests that the introduction of external supervisors, acting as neutral parties well-versed in survey conduct, may serve as a valuable countermeasure. External supervisors have been shown to increase motivation (Demkowicz et al. 2020; March et al. 2022) and contribute to a neutral atmosphere (Demkowicz et al. 2020), potentially enhancing the data collection process. While teachers are usually present in schools anyway, (additional) external supervision is expensive (Walser and Killias 2012; Bartlett et al. 2017; March et al. 2022). Thus, the decision to involve them or not requires justification. This study tested two possible ways to ensure external supervision. In the mode labelled “Mode B”, additional supervisors were present virtually in a video conference while teachers were present in the classroom; this mode had minimal effect on data quality, and the efforts invested in establishing it were disproportionately high in comparison to its impact. “Mode C” yielded considerably better results and did not involve teachers or classrooms in the first place. However, this study is clearly limited in that it cannot determine whether the enhanced data quality of this mode is a result of the supervisor or the setting.
Future research should ideally investigate the impact of external supervisors in classrooms in the absence of teachers and explore teachers’ supervision of web surveys conducted in students’ homes. Additionally, the data collection in UWE did not feature a completely unsupervised survey mode, which is a clear limitation. The “supervisor only” mode could be considered almost unsupervised, as there was no physical presence of any adult and no risk of responses being exposed to a third party. Mühlböck et al. (2017) examined differences between web surveys with interviewers present and modes without supervision and found none in terms of response behaviour, but higher completion rates in the supervised groups. Both findings are contradicted by this study: drop-out rates did not differ significantly, but satisficing response patterns were more prevalent in physically supervised modes than under virtual supervision. These results are in line with Tourangeau and Smith (1996), who concluded that self-administration can indeed yield better results.
The study can contribute to the debate on whether web surveys or digital questionnaires work better in self-administered surveys than paper forms. In addition to the evident advantage of eliminating the need to transcribe digitally delivered questionnaires, early research particularly supports this mode for its favourable impact on anonymity, reducing item nonresponse associated with social desirability (Tourangeau & Smith 1996; Hallfors et al. 2000). More recent studies indicate that the data quality of digital surveys is on par with traditional paper forms (Raat et al. 2007; Felderer et al. 2019), given that all formats and devices function as intended (Gummer and Roßmann 2015). Moreover, there is a notable inclination among younger respondents towards digital delivery (Demkowicz et al. 2020). In contrast, the results presented here indicate a higher drop-out rate among respondents using the online questionnaire. There are two explanations for this which are backed by the protocols of the UWE study. Firstly, intermittent disruptions in the connection between digital devices and the survey server occurred, primarily attributable to the limited infrastructure in schools and to respondents’ devices occasionally lacking sufficient charge. Secondly, a paper form does not disappear when it is closed, posing a higher barrier to dropping out. However, when drop-outs were excluded from the analyses, item nonresponse and the prevalence of satisficing did not differ between the modes.
The quasi-experimental nature of this study could be considered a potential limitation, and it would be worthwhile to conduct further research using randomly assigned supervision modes. However, respondents did not self-select into supervision modes. The assignment was completely external and based on decisions of the school boards, depending on factors unrelated to respondents’ competencies or motivation to fill out a questionnaire. Hence, the differences in item nonresponse and response patterns are valid arguments for future research not to rely on teachers alone when conducting standardized survey research in classrooms.