Types of Studies
Most studies employed quantitative methods in the evaluation. Sixteen of 31 studies (52%) were purely quantitative [
8,
9,
10,
11,
12,
13,
17,
18,
19,
21,
23,
25,
28,
29,
31,
32], while an additional six studies used qualitative methods as part of a mixed-methods design [
14,
15,
20,
22,
30,
38]. Surveys were the most common instrument used to collect data through randomized controlled trials (n=5) [
8,
12,
18,
21,
28], cluster randomized controlled trials (n=6) [
9,
19,
23,
25,
29,
31], cross-sectional cohort studies (n=3) [
11,
17,
32], and controlled cohort studies (n=1) [
10]. In the eight mixed-methods and six qualitative-only evaluations [
16,
21,
24,
26,
27,
33], the most popular methods employed were in-depth interviewing (n=7) [
14,
16,
20,
22,
26,
27,
33], focus group discussions (n=4) [
16,
20,
22,
26], and semi-structured interviewing (n=3) [
15,
24,
26]. In addition, two articles employed distinctive ethnographic methods [
26,
27]. Lees et al. utilized hearsay ethnographies where participants chronicled their daily lives in diaries after receiving training on ethnographic observation [
26], while Exner-Cortens et al. employed Photovoice, a qualitative research methodology in which participants use photography to capture their experiences around a health intervention [
27].
Four articles were systematic reviews of the existing literature [
34,
35,
36,
37]. Each had a different focus. Adamou et al. explored gaps in male engagement in family planning programs [
36]. Sharma et al. reviewed “promising practices” for the monitoring of gender-based violence interventions in humanitarian settings around the world [
35]. More broadly, Mandal et al. reviewed measures of women’s empowerment and other gender-related constructs in family planning and maternal health interventions [
37], while Kowalczyk et al. surveyed issues in conducting and evaluating gender-based programs in global health [
34]. Adamou et al. and Sharma et al. also conducted in-depth interviews alongside their reviews to add depth to their findings [
35,
36].
Ways in Which Gender Was Integrated into M&E
Through the data extraction process, we mapped the existing evidence on gender responsiveness in M&E. The approaches identified were: the use of gender-related theory or frameworks to guide M&E, the inclusion of gender scores, the inclusion of gender domains of analysis, the disaggregation of data by sex, and the use of community-based/participatory approaches.
Use of gender-related theory, models, or frameworks to guide M&E
Many tools and frameworks used in gender integration are grounded in theory, both gender theory and theory that is relevant for understanding gender. Gender theories include, for example, the theory of gender and power [
39] and male gender norms theory, such as masculinities and hegemonic masculinities [
40]. Other cross-cutting theories were used to help contextualize gender as a construct and the ways it manifests, such as intersectionality, the socio-ecological model, life course approach, and social cognitive theory.
Twelve references described the use of some theory or conceptual model to guide the M&E (whether in data collection, development, or interpretation of results), though most were not explicitly related to gender. Commonly used models and theories included the socio-ecological model (SEM) (n=3) [
12,
22,
23] and social cognitive theory (SCT) (n=2) [
19,
24]. Kowalczyk et al.’s review on evaluating gender-based programs found that roughly 25% (n=32) of the included studies mentioned a conceptual model or theory that guided the research; the most commonly used was SCT [
34].
Five studies developed their own guiding frameworks which included a specific gender focus such as male engagement [
36], gender norms [
10], empowerment [
25], and agency and decision-making [
9,
16]. Sharma et al.’s review to identify promising practices for the monitoring and evaluation of gender-based violence (GBV) interventions highlights the valuable role of a theory of change to guide evaluations, particularly those seeking to assess effectiveness [
35].
In Exner-Cortens et al.’s Photovoice study, the authors followed “feminist evaluation principles” to design their community-based participatory research of an empowerment intervention for teenage boys in Canada. They designed the photo-based evaluation method to be participatory, reflexive, and action-oriented to advance feminist goals to move away from patriarchal societal structuring and the perpetuation of misogyny. Aiming to transform the young men into active participants in the effort to reduce gender-based disparities in health through reflecting critically on gender inequity and how to respond to it, they utilized an evaluation that centered gender [
27].
Incorporation of gender scores into data collection and analysis
Gender scores, or the use of gender variables to quantify the impact of gender as a social construct, measure and elucidate the different ways in which gender norms, roles, and relations manifest and their associated impact on a particular outcome or outcomes [
41]. As the way in which gender manifests is multi-faceted and context specific, a standardized set of gender variables is not relevant for all studies. As such, many kinds of gender scores exist, with studies often developing their own gender scores relevant to the topic. Composite gender scores are also sometimes used.
Twelve articles employed a gender score tool in their data collection. Often, studies made use of an established gender score tool, most of which measured Likert scale responses. For example, the Gender Equitable Men (GEM) scale, a theoretically based measure of inequitable and equitable gender norms within sexual and intimate relationships, was used in four evaluations [
14,
32,
36,
38]. This scale was originally developed by Pulerwitz et al. for their study on AIDS interventions in the United States. The GEM scale was also described in Mandal et al.’s systematic review on measures of women’s empowerment and related gender constructs in family planning and maternal health program evaluations [
37]. Similarly, Krishnan et al. used the Gender Equity Scale for Women (GESW), an adaptation of the GEM, for their study on workplace gender equity in India [
19]. Santhya et al. created three additive indices by combining indicators across three domains of interest (index of gender role attitudes and notions of masculinity; index of attitudes rejecting men’s controlling behaviors; index of attitudes rejecting violence against women and girls). These indicators were captured using items from the GEM, the WHO multi-country study on violence, the National Family Health Survey, and other studies of violence conducted in India [
14].
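As an illustration of how an additive index of this kind can be built from Likert items, the following is a minimal Python sketch. It is not the scoring procedure of any study cited here: the item names, the 1 to 5 response range, and the choice of which item is reverse-coded are all hypothetical assumptions made for the example.

```python
from typing import Dict, Set

# Assumed response range (hypothetical): 1 = strongly disagree ... 5 = strongly agree.
LIKERT_MIN, LIKERT_MAX = 1, 5


def recode(response: int, reverse: bool) -> int:
    """Return the item score, flipping the scale for reverse-coded items."""
    if not LIKERT_MIN <= response <= LIKERT_MAX:
        raise ValueError(f"response {response} is outside the Likert range")
    return LIKERT_MIN + LIKERT_MAX - response if reverse else response


def additive_index(responses: Dict[str, int], reverse_items: Set[str]) -> int:
    """Sum the recoded item scores so that a higher total means more equitable attitudes."""
    return sum(recode(value, item in reverse_items) for item, value in responses.items())


# Hypothetical respondent; "item_a" is assumed to be worded inequitably,
# so agreement with it should lower the index and it is reverse-coded.
respondent = {"item_a": 2, "item_b": 5, "item_c": 4}
REVERSE_ITEMS = {"item_a"}

score = additive_index(respondent, REVERSE_ITEMS)
print(score)
```

Reverse-coding before summing keeps the direction of every item aligned, which is what allows a simple sum to be read as a single attitude score.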
Figueroa et al. developed a gender attitudes scale for their analysis to evaluate an HIV prevention intervention in Mozambique [
28]. The scale was based on a series of 12 statements about gender roles assessed using Likert scale responses. Also focusing on gender attitudes, Seff et al. developed a caregiver gender-equitable attitude score to measure the effectiveness of a girls’ education and safety intervention in the Democratic Republic of the Congo [
12]. This score was similarly based on Likert scale responses to 10 statements related to gender roles and dynamics. In their evaluation of a teen sexual confidence intervention, Lecroy et al. employed girl efficacy and self-assertive behavior measures developed in previous studies, also based on sets of Likert items [
18].
Tura et al. used the previously validated Women’s generalized self-efficacy scale (GSE) to evaluate agency created by a women’s savings group in Mozambique [
9]. The GSE consists of positively worded statements such as “I can always manage to solve difficult problems if I try hard enough”; these are reported on a Likert scale. Continuing the theme of efficacy and agency-focused scores, Sahyoun et al. used both the Women’s Empowerment in Agriculture Index (WEAI) and the Duke Social Support Index (DSSI) [
15]. Evaluating a food and cooking-based employment intervention among refugee women in Palestine, they used the WEAI to assess empowerment of the women and the DSSI to understand the financial and social wellbeing of women through the tool’s open-ended questions. In Mandal et al.’s systematic review, the authors identified and described the sexual relationships power scale as a key tool to measure determinants or dimensions of women’s empowerment [
37].
Finally, Long et al. made use of The Menstruation Engagement, Self-efficacy, and Stress assessment (MENSES), a proven tool that aims to measure girls’ experiences managing menses at school [
17]. Used to analyze a menstrual health intervention among young adolescents in Mexico, this tool consists of 45 questions across three outcomes: school engagement, stress, and self-efficacy.
Gender domains of analysis
Gender domains of analysis include the different ways in which gender power relations manifest as inequities. A gender analysis framework provides a structure for organizing information about gender power relations. Key domains that constitute these relations include access to resources; division of labor and activities; social norms, ideologies, beliefs, and perceptions; and rules and decision-making [
2]. These can be incorporated into interview and survey questions, indicators, and variables for analysis. Even though most studies did not use a specific gender framework to guide the M&E, many incorporated gender domains within their analysis, with some using specific tools to assess different gender domains, particularly gender norms.
For example, one study conducted a cultural consensus analysis (CCA) to assess norms among targeted cultural subgroups across the West Bank. Men and women answered questions across the dimensions of GBV, economic empowerment for women, and household and community dynamics of GBV [
13]. The UZIKWASA project also captured changing norms and practice in relation to key areas of their programs, including early forced marriage, education support, and gender equitable parenting [
26].
Blanchard et al. used composite empowerment indicators to explore associations between involvement with community mobilization programs, dimensions of self-reported empowerment (defined as power within, power with, and power over), and outcomes of HIV risk reduction [
25]. The Sensemaker tool also explored empowerment and decision-making through a complex mixed-methods approach [
30]. The authors note that this tool, while highly contextualized through co-development, can also be time and labor-intensive in low-literacy environments. It is also one of the few non-Likert scale approaches that were used.
Berti et al. modified a tool from the “Child Survival Technical support project” to assess mothers’ perceptions of fathers’ roles in caring for their pregnant partners and in family planning [
29]. In terms of gender analysis domains, these measurements can be related to access to resources (i.e., access to male engagement and support) and division of labor.
Approaches Which Support Gender Responsive M&E
Other approaches which support the integration of gender into M&E include the use of sex-disaggregated data and community-based/participatory methods. On their own, these approaches would not be considered gender responsive; however, they are reported here due to their important role in gender responsive M&E. M&E that only employs data disaggregated by sex or community-based/participatory methods, without other dimensions of gender integration, would have been excluded from the search results.
Sex disaggregation of data
Sex disaggregation is important to determine whether outcomes differ for different gendered groups. There are three important points of note regarding sex disaggregation. Firstly, a precursor to sex disaggregation is including people of different sexes or genders within the intervention and/or M&E process itself. Secondly, not all programs or interventions will need to include sex-disaggregated data, as some will be focused on one sex or gender only (however, there may still be a need for engagement of other genders). Thirdly, gender differences will likely be missed if gender domains of analysis, some of which do not pertain to all genders, are not also included, irrespective of whether sex-disaggregated data is used.
Six studies that included both men and women reported sex-disaggregated data [
8,
10,
12,
13,
21,
23]. The MENSES study collected some data from both boys and girls but did not present disaggregated data in their tables [
17]. Instead, they used socio-economic status to disaggregate data among female respondents. The review conducted by Sharma et al. describes the importance of including participants of various genders and ages, as well as disaggregating data by age, sex, and disability status, to ensure that GBV risk mitigation M&E effectively captures potential risks for vulnerable groups. They further add that data analysis should include triangulation of data from multiple sources to assess convergence and divergence of findings [
35].
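To make the mechanics of sex disaggregation concrete, the following is a minimal Python sketch that reports one outcome separately for each sex category rather than as a single pooled figure. The records, field names, and scores are hypothetical and are not drawn from any of the included studies.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records: one dictionary per respondent.
records = [
    {"sex": "female", "score": 14},
    {"sex": "male", "score": 10},
    {"sex": "female", "score": 12},
    {"sex": "male", "score": 8},
]


def disaggregate(rows, group_key, value_key):
    """Group rows by group_key and return the count and mean of value_key per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: {"n": len(values), "mean": mean(values)}
        for group, values in groups.items()
    }


pooled_mean = mean(row["score"] for row in records)
by_sex = disaggregate(records, "sex", "score")

print("pooled mean:", pooled_mean)
for group, summary in sorted(by_sex.items()):
    print(group, summary)
```

The same function could be called with a different grouping field, such as an age band or disability status, if those variables were collected, which corresponds to the additional disaggregation that Sharma et al. recommend.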
Meaningful engagement and participation in gender responsive M&E
Some definitions of gender responsive M&E include process-related considerations, such as how data is collected and analyzed, by whom, and/or whether an M&E process is participatory and inclusive of diverse voices, particularly women’s voices. Often such approaches are not described as being gendered unless in reference to feminist methodologies. Such approaches are important because they emphasize collaboration and partnership between investigators and participants, center lived experience as an important form of knowledge, and seek to empower, foster agency, and amplify the voices of participants. This includes considerations for who is engaged and how they are engaged. For example, are gender norms, hierarchies, and power differentials accounted for when designing meetings and other activities? When applied in concert with other gender responsive actions, these approaches can inform how gender and gender-related elements can be integrated within M&E.
Seven studies described the use of community-based or participatory methods in the evaluation design and implementation [
9,
12,
13,
16,
20,
26,
30]. Five of these studies collaborated with local non-governmental or civil society organizations in study design [
9,
30] or data collection [
12,
13,
20]. Two studies worked directly with community members to collect and interpret data [
16,
26]. In Lees et al., a male and a female community member trained as primary data collectors gave contextual information for the study. Further, because the study was iterative and dynamic, preliminary findings were discussed with participants in the early stages of the project [
26].
The systematic review by Sharma et al. found that “building partnerships with key actors and stakeholders, including local groups and organizations” was important to ensuring effective M&E design and implementation [
35]. The authors further argue that diversity factors must be considered, and M&E should be developed with relevant subgroups in mind, and with an intersectional approach that recognizes gender domains such as power and agency. Promising practices in community engagement included providing mechanisms for community feedback that are not unidirectional and facilitating community dialogues to obtain buy-in.
Sharma et al. also identified positionality of data collectors as an important domain [
35]. Three studies described positionality considerations in the implementation and write-up of M&E. One manuscript described the positionality of the data collectors, including members of the authorship team [
27]. Burke et al. noted that, to avoid potential harms, programs need to take steps to ensure that staff do not introduce their own biases, particularly those that reinforce inequitable gender norms [
22]. Seff et al. discussed intentionally using a separate tool for collecting sensitive information from girls, such as information on sexual behaviors and exposure to violence [
12].