Preprint
Article

This version is not peer-reviewed.

Quantifying the Cross-Sectoral Intersecting Discrepancies Within Multiple Groups Using Latent Class Analysis Towards Fairness

Submitted: 08 February 2025
Posted: 10 February 2025


Abstract
The growing interest in fair AI development is evident. The "Leave No One Behind" initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation and service scheme development, across sectors such as health, energy, and housing. Therefore, exploring joint inequalities in these sectors is significant and valuable for a thorough understanding of overall inequality and unfairness. This research introduces an innovative approach to quantifying cross-sectoral intersecting discrepancies among user-defined groups using latent class analysis. These discrepancies can be used to approximate inequality and provide valuable insights into fairness issues. We validate our approach using both proprietary and public datasets, including the EVENS and Census 2021 (England & Wales) datasets, to examine cross-sectoral intersecting discrepancies among different ethnic groups. We also verify the reliability of the quantified discrepancy by conducting a correlation analysis with a government public metric. Our findings reveal significant discrepancies both among minority ethnic groups and between minority and non-minority ethnic groups, emphasising the need for targeted interventions in policy-making processes. Furthermore, we demonstrate how the proposed approach can provide valuable insights into ensuring fairness in machine learning systems.

1. Introduction

The “Leave No One Behind” (LNOB) principle emphasises the importance of addressing multiple, intersecting inequalities that harm individuals’ rights [1]. Intersecting inequality refers to the compounded disadvantages that arise from both the overlap of marginalised social categories (e.g., being female and living with a disability) and the intersection of multiple, mutually reinforcing dimensions of exclusion (e.g., deprivation in both health and education) [2]. Meanwhile, the increasing adoption of AI tools in decision-making processes across various sectors, including health [3], energy [4], and housing [5], underscores the urgency of ensuring fairness in their design and implementation [6], as unfair AI systems may risk deepening existing inequalities.
AI fairness research spans various sectors, including healthcare [7,8,9], finance [10], and education [11]. However, research on cross-sectoral intersecting AI fairness remains limited. The term "cross-sectoral intersecting" refers to the interaction and overlap of multiple sectors, such as healthcare, housing, and energy.
To address this gap, we propose quantifying cross-sectoral intersecting discrepancies between groups. These discrepancies refer to differences in user profiles across groups and can serve as a proxy for underlying inequalities, providing valuable insights for stakeholders. Additionally, the quantified discrepancies in data can offer insights into AI fairness when the data is used to train models. According to the LNOB principle of equal opportunities, everyone should ideally have equal access to public services and resources without discrepancies. We prefer the term “discrepancy” over “disparity” or “difference” because it suggests an unexpected difference.
Bias and Fairness Recent studies highlight concerns that AI-supported decision-making systems may be influenced by biases [12], which can unfairly impact vulnerable groups, such as ethnic minorities, emphasising the need for research on AI system fairness [13]. A primary obstacle in advancing and implementing fair AI systems is the presence of bias [14]. In AI, bias can originate from various sources, including data collection, algorithmic design, and user interaction, as illustrated in Figure 1.
In particular, most AI systems rely on data for training and prediction. This close connection means that any inherent biases in the training data can be propagated and embedded into the AI systems, leading to biased predictions: the so-called "bias in, bias out" problem. Even if the data itself is not inherently biased, algorithms can still exhibit biased behaviour due to inappropriate design and configuration choices. These biased outcomes can influence AI systems in real-world applications, creating a feedback loop where biased data from user interactions further trains and reinforces biased algorithms, resulting in a vicious cycle [6].
Bias stemming from data is a crucial factor affecting fairness, as inappropriate handling of it may trigger a cascade of other biases, exacerbating fairness issues.
Discrepancy in data may indicate inequalities and lead to biases and unfairness, given that, ideally, individuals should be treated equally. Figure 2 illustrates the correlations between fairness, bias, inequality, and discrepancy. Essentially, fairness is indirectly linked with discrepancy, and discrepancy can contribute to unfairness. The difference between fairness and bias is that the former can be viewed as a technical issue, while the latter can be viewed as a social and ethical issue [14]. Furthermore, bias is a problem caused by historical and current social inequality [15], and inequality can manifest as discrepancies. Figure 2 starts with discrepancy and moves through inequality and bias to fairness. Therefore, our research focuses on quantifying cross-sectoral intersecting discrepancies among different groups, with the aim of uncovering insights or patterns related to inequality, bias, and AI fairness.
Background and Motivation Currently, there is limited research focusing on quantifying discrepancies. Most recent research on quantifying bias and/or inequality primarily revolves around resource allocation strategies and generally relies on objective data (e.g., [16]). However, these approaches have limitations and face challenges in effectively assessing and measuring bias or discrepancy in datasets unrelated to resource allocation. For example, in social sciences, much data is collected through questionnaires, which often include binary, categorical, or ordinal data types related to subjective responses and user experiences. These questionnaires may cover various aspects, resulting in intersecting and cross-sectoral data of high dimensions. Analysing data based on no more than two dimensions or sectors may overlook important information or patterns. Therefore, we believe that quantifying cross-sectoral intersecting discrepancies is valuable, as it can provide comprehensive insights.
The Anonymous project1 aims to establish safer online environments for minority ethnic (ME) populations in the UK. Its survey questionnaire covers five key aspects: demography, energy, housing, health, and online services. Notably, the data collected in this context does not directly pertain to resource allocation, making it challenging to explicitly define and detect bias within the data using current methods, despite the presence of discrepancies. These discrepancies may arise from various factors, including culture, user experience, and discrimination, potentially contributing to bias or unfairness. This research is mainly motivated by the Anonymous project, so the datasets we used primarily cover the health, energy, and housing sectors, with our research targeting ME groups.
In our preliminary research (see Appendix A) in England, based on an Anonymous project survey question regarding health and digital services2, we observed a notable discrepancy in the Chinese group: 30.16% lacked English proficiency and 26.98% struggled to use the online system, while most other ethnic groups reported fewer or no concerns. These discrepancies, stemming from cultural differences, user experiences, or discrimination, contribute to inequality and may affect AI fairness. For instance, an AI system might inappropriately assume that only Chinese individuals require English language support, thereby neglecting other ME groups who may also need assistance.
However, investigating multiple and cross-sectoral questions simultaneously is challenging. For instance, the Anonymous project’s data contains similar questions for the energy and housing sectors, and current methods struggle to analyse these sectors jointly. Therefore, we propose an approach to quantify intersecting and cross-sectoral discrepancies for multiple ethnic groups by leveraging latent class analysis (LCA) [17].
LCA is a popular method in social science [18] because it can identify latent groups within a population based on observed characteristics or behaviours. LCA offers a flexible framework for exploring social phenomena and integrating with other analytical techniques. In this research, we use LCA to cluster intersecting and cross-sectoral data, encompassing questions across the health, energy, and housing sectors. This method enables us to derive latent classes and outcomes, with each class describing a distinct cross-sectoral user profile. This approach moves beyond defining user classes solely based on individual questions. More details of our proposed approach are presented in Section 2, with experiments and results reported in Section 4.
The main contributions of this research are as follows: (1) we propose a novel and generic approach to quantify intersecting and cross-sectoral discrepancies between user-defined groups; (2) our findings reveal that ME groups cannot be treated as a homogeneous group, as varying discrepancies exist among them; and (3) we demonstrate how the proposed approach can be used to provide insights into AI fairness.

2. Quantifying the Cross-Sectoral Intersecting Discrepancies

The overall workflow of the proposed approach is shown in Figure 3, using binary-encoded survey data as an example. Note that our approach is not limited to this specific format and can be applied to a wide range of similar problems. We present further experiments with other datasets in Section 4 to validate and showcase the features of our approach.
In Figure 3, Stage (1) illustrates the binary-encoded data $D$, where $\{Q_1, Q_2, \ldots\}$ represents the selected survey questions, covering user experiences across different sectors. Similarly, $\{u_1, u_2, u_3, \ldots\}$ denotes the set of survey respondents. Here, $q$ represents the response options; for example, $q_{2,1}$ refers to the first option for $Q_2$. A value of `1' indicates a selected option, while `0' signifies that it was not selected. Each respondent's responses can be represented as a vector $\mathbf{q}$. The set of all indicator variables is denoted by $X$, with $\mathbf{q} \in \mathcal{X}$, where $\mathcal{X}$ is the space of all possible response vectors. For other datasets, the encoding method should be selected based on the data format and type. The red arrow in Figure 3 illustrates the LCA process3, which includes hyperparameter selection and model fitting.
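To make the encoding in Stage (1) concrete, the following is a minimal pandas sketch assuming single-choice questions stored as one column per question; the question and option names are purely illustrative, and multi-select questions would instead need one indicator column per option.

```python
import pandas as pd

# Illustrative raw responses: one row per respondent, one column per question.
raw = pd.DataFrame({
    "Q1": ["no_concern", "language_barrier", "no_concern"],
    "Q2": ["privacy", "privacy", "usability"],
})

# One 0/1 indicator column per response option (q_{1,1}, q_{1,2}, ...),
# giving a binary-encoded matrix in the spirit of D in Stage (1).
D = pd.get_dummies(raw, prefix_sep="_").astype(int)
print(D)
```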
For simplicity, the LCA process is defined in Equation (1), where $\theta_C$ and $\theta_X$ specify the marginal distribution of the latent classes $C$ and the class-conditional distribution of the indicator variables $X$, respectively. Here, $\theta = (\theta_C, \theta_X)$, and each latent class is denoted by $c$. In fact, the LCA model is specified by a set of parameters $\theta \in \Theta$, where $(\theta_C, \theta_X) \in \Theta_C \times \Theta_X$.
$$p(c, \mathbf{q}; \theta) = p(c; \theta_C)\, p(\mathbf{q} \mid c; \theta_X) \qquad (1)$$
The advantage of using LCA is straightforward: it considers the joint probability distribution of all variables. This means potential inequalities or discrepancies can be analysed jointly. Once we obtain the distributions of latent classes $\{c_1, c_2, c_3, \ldots\}$ over user-defined groups $\{e_1, e_2, e_3, \ldots\}$ (as shown in Figure 3 Stage (2)), we can calculate the discrepancy $\Delta$.
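As a concrete illustration of the model-fitting step (the red arrow in Figure 3), the sketch below uses the StepMix package mentioned in footnote 3. The parameter names follow its scikit-learn-style interface as we understand it and may differ slightly between versions; the data and the number of classes are placeholders.

```python
import numpy as np
from stepmix.stepmix import StepMix

# Placeholder binary-encoded response matrix (n_respondents x n_indicator_variables).
X = np.random.randint(0, 2, size=(500, 12))

# Fit an LCA model with a preset number of latent classes; in practice the number
# of classes is chosen by hyperparameter optimisation (see Appendix A).
lca = StepMix(n_components=4, measurement="binary", random_state=42)
lca.fit(X)

# Hard class assignment per respondent, later cross-tabulated with the
# user-defined groups to obtain the |G| x |C| proportion matrix of Stage (3).
classes = lca.predict(X)
```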
Quantification of Discrepancy Let us denote the size of the dataset as $N$, where $i \in \{1, 2, \ldots, N\}$ indexes an individual sample. Concurrently, let $c \in C$ denote a latent class, with the total number of classes being $|C|$, and let $N_c$ represent the count of samples classified into latent class $c$. To quantify the discrepancies, it is necessary to establish a grouping variable $G$, which can be defined based on factors such as ethnicity, age, or income level. Here, $|G|$ denotes the total number of user-defined groups, $e \in G$ represents one specific group within this set, $N_e$ denotes the number of individuals from group $e$, and $N_c^e$ denotes the number of individuals from group $e$ assigned to class $c$.
To initiate the quantification process, the proportions $r$ of samples from each user-defined group within each latent class need to be calculated, as detailed in Line 6 of Algorithm 1. This calculation is performed as $r_c^e = N_c^e / N_e$, $\forall e \in G$, $\forall c \in C$. The reason for calculating $r$ is that user-defined groups may have different numbers of samples; therefore, using percentages for subsequent analyses ensures fairness and consistency.
Subsequently, we can derive a matrix of results with dimensions $|G| \times |C|$, as shown in Figure 3 Stage (3). Within this matrix, each row corresponds to the proportions of samples from a specific group within each latent class. It is important to note that, in this context, each latent class effectively represents an individual user profile and can be viewed as a distinctive feature. Consequently, each row of the matrix may be employed as a feature vector, denoted as $\mathbf{r}_e$, serving as a representation of a specific group within the feature space.
In the assessment of discrepancy between two feature vectors, various methods may be employed, including the Euclidean distance, Kullback-Leibler divergence, Earth Mover's Distance, and Manhattan distance, among others. In our approach, we propose the use of cosine similarity to calculate the discrepancy, defined as $\Delta = 1 - \cos(\theta) = 1 - \frac{\mathbf{r}_e \cdot \mathbf{r}_{e'}}{\|\mathbf{r}_e\| \|\mathbf{r}_{e'}\|}$. Finally, we can iteratively calculate $\Delta$ between any pair of vectors $\mathbf{r}$ and obtain the discrepancy matrix $S$ (as shown in Stage (4) of Figure 3). The AVG column in Stage (4) contains the mean discrepancy value for each $e$, which can be viewed as an approximation of how each $e$ differs from the others.
Algorithm 1 Quantifying the Intersecting Discrepancies within Multiple Groups
1: Input: $D$ and $G$    ▷ $G$ denotes a set of user-defined groups
2: Initialise $M$    ▷ Create LCA model $M$
3: Estimate $M$ based on $D$
4: for $e$ in $G$ do
5:     for $c$ in $C$ do
6:         $r_c^e = N_c^e / N_e$
7:     end for
8: end for
9: for $e$ in $G$ do
10:     for $e'$ in $G$ do
11:         $\Delta_{e,e'} = 1 - \frac{\mathbf{r}_e \cdot \mathbf{r}_{e'}}{\|\mathbf{r}_e\| \|\mathbf{r}_{e'}\|}$    ▷ Pair-wise calculation
12:     end for
13: end for
14: Output: Discrepancy matrix $S$ of size $|G| \times |G|$
We suggest the use of cosine similarity due to its inherent characteristics, including a natural value range from 0 to 1, since $\mathbf{r}_e$ and $\mathbf{r}_{e'}$ contain no negative values. Importantly, it does not necessitate additional normalisation procedures. This metric, with its fixed value range, enhances comparability and supports subsequent AI fairness research. The proposed approach is summarised in Algorithm 1.
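For readers who prefer code, the following is a compact NumPy/pandas sketch of Algorithm 1. It assumes that `classes` holds one latent-class label per sample (e.g., from a fitted LCA model) and that `groups` holds the corresponding user-defined group labels; variable names are illustrative rather than taken from our implementation.

```python
import numpy as np
import pandas as pd

def discrepancy_matrix(classes, groups):
    """Return the |G| x |G| discrepancy matrix S (with an AVG column) and the
    |G| x |C| proportion matrix r of Algorithm 1."""
    df = pd.DataFrame({"c": classes, "e": groups})
    # Lines 4-8: r_c^e = N_c^e / N_e, each group's distribution over latent classes.
    r = pd.crosstab(df["e"], df["c"], normalize="index")
    R = r.to_numpy()
    # Lines 9-13: pairwise cosine-based discrepancy, Delta = 1 - cos(theta).
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    S = pd.DataFrame(1.0 - (R @ R.T) / (norms @ norms.T), index=r.index, columns=r.index)
    # AVG column: mean discrepancy of each group against all groups (Stage (4)).
    S["AVG"] = S.mean(axis=1)
    return S, r
```

Passing the hard class assignments from a fitted model together with the ethnicity labels of the respondents would produce a matrix with the same structure as Table 1, up to the choice of hyperparameters.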

3. Related Work

Quantifying and improving AI fairness As AI technologies are increasingly deployed in real-world settings, concerns about their ethics and fairness persist, especially when AI is applied to problems involving sensitive data [19]. Morley et al. [13] and Garattini et al. [20] observed that an algorithm can "learn" to prioritise patients it predicts will have better outcomes for a particular disease, and that AI models have discriminatory potential towards ME groups in health contexts. Consequently, increasing attention is being paid to the impact of AI bias and to methods for mitigating it.
Wu et al. [16] propose the allocation-deterioration framework for detecting and quantifying health inequalities induced by AI models. This framework quantifies inequalities as the area between two allocation-deterioration curves. They conducted experiments on synthetic datasets and real-world ICU datasets to assess the framework's performance, applied it to the ICU dataset, and quantified the unfairness of AI algorithms between White and Non-White patients. So et al. [21] explore the limitations of fairness in machine learning and propose a reparative approach to address historical housing discrimination in the US. In that work, they used contemporary mortgage data and historical census data to conduct case studies demonstrating the impact of historical discrimination on wealth accumulation and estimating housing compensation costs. They then proposed a remediation framework that includes analysing historical biases, intervening in algorithmic systems, and developing machine learning processes that correct historical harms.
Latent Class Analysis (LCA) is a statistical method based on mixture models and often used to detect potential or unobserved heterogeneity in samples [22]. By analysing response patterns of observed variables, LCA can identify potential subgroups within a sample set [23]. The basic idea of LCA is that some parameters of a postulated statistical model differ across unobserved subgroups, forming the categories of a categorical latent variable [24]. In 1950, Lazarsfeld [25] introduced LCA as a means of constructing typologies or clusters using dichotomous observed variables. Over two decades later, Goodman [26] enhanced the model’s practical applicability by devising an algorithm for obtaining maximum likelihood estimates of its parameters. Since then, many new frameworks have been proposed, including models with continuous covariates, local dependencies, ordinal variables, multiple latent variables, and repeated measures [24].
Because LCA is a person-centred mixture model, it is widely used in sociology and statistics to interpret and identify different subgroups in a population that often share certain external characteristics [27]. In the social sciences, LCA is used in both cross-sectional and longitudinal studies. For example, in studies in psychology [28], social sciences [29], and epidemiology [30], mixture models and LCA can be used to establish probabilistic diagnoses when no suitable gold standard is available [17].
In [28], the relationship between cyberbullying and social anxiety among Spanish adolescents was explored. The sample consisted of 1,412 Spanish secondary school students aged 12 to 18 years. After applying LCA, significant differences in cyberbullying patterns were found across all social anxiety subscales. Compared with other profiles, students with higher cyberbullying traits scored higher on social avoidance and distress in social situations, as well as lower on fear of negative evaluation and distress in new situations. Researchers in [29] developed a tool, using LCA, to characterise energy poverty without the need to arbitrarily define binary cutoffs. The authors highlight the need for a multidimensional approach to measuring energy poverty and discuss the challenges of identifying vulnerable consumers. The research in [30] aimed to identify subgroups in COVID-19-related acute respiratory distress syndrome (ARDS) and compare them with previously described ARDS subphenotypes using LCA. The study found two COVID-19-related ARDS subgroups with differential outcomes, similar to previously described ARDS subphenotypes.

4. Experiments

4.1. The Anonymous Project

Within the Anonymous project4, a systematic online survey was conducted to explore the experiences of individuals from ME groups in the UK, focusing on digitalised health, energy, and housing aspects. To examine cross-sectoral intersecting discrepancies among the seven ethnic groups (as shown in Table 1), we selected three questions from the Anonymous survey data related to these sectors. The answers from ME participants formed the dataset for this experiment. For England and Scotland, we have 594 and 284 samples, respectively. Due to varying sample sizes across ethnicities, we calculated discrepancies separately for each region.
The number of latent classes in LCA must be preset. To address this, we conducted hyperparameter optimisation for each experiment to find the elbow point, and applied this optimisation to all subsequent experiments.
The discrepancy results for England are presented in the left part of Table 1. The table shows that the Chinese group has the largest average (AVG) discrepancy value compared to other groups. Meanwhile, the Indian group exhibits the smallest discrepancy with the Bangladeshi group and also shows similarity to the Pakistani group. This is likely due to their close geographical locations and similar cultural backgrounds and lifestyles.
The right part of Table 1 presents the results for Scotland, showing similar outcomes: the Chinese group is distinct from others, while the Bangladeshi group is similar to the Pakistani group. The differences between England and Scotland may be attributed to their different policies and circumstances. We hypothesise that the primary reason for the Chinese group standing out is a lack of English proficiency. This is supported by our preliminary research mentioned in Section 1, which shows a significant number of Chinese participants expressing this concern. Additionally, BBC News [31] reports that the Chinese community experiences some of the highest rates of racism among all ethnic groups in the UK. Our discrepancy values may help explain this, as the Chinese group shows different experiences in digitalised online services.
To further verify the reliability of the discrepancy computation, we used PCA dimensionality reduction to visualise the relationships among ME groups based on the percentage distributions (as shown in Figure 3 (3)) of each ME group across different latent classes. The patterns presented in Figure 4 align with the findings obtained from the analysis of Table 1 (England). Overall, the Chinese group continues to stand out, positioned further away from other groups, indicating it has the highest average discrepancy. The Mixed and Bangladeshi groups are relatively distant from others, as they are scattered on opposite sides. Indian, Bangladeshi, and Pakistani groups are closer to each other compared to other groups, implying a similarity supported by the discrepancy values in Table 1. Meanwhile, the African and Caribbean groups are the nearest to each other, as represented by the double arrows between them.
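A visualisation along the lines of Figure 4 can be produced directly from the proportion matrix; the sketch below assumes `r` is the |G| x |C| matrix from the earlier sketches (rows indexed by ethnic group) and uses scikit-learn's PCA. It is illustrative only; the plotting details of Figure 4, such as the arrows to nearest neighbours, are omitted.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project each group's class-distribution vector onto the first two principal components.
coords = PCA(n_components=2).fit_transform(r.to_numpy())

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for (x, y), name in zip(coords, r.index):
    ax.annotate(name, (x, y))  # label each ethnic group in the reduced 2-D space
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
plt.show()
```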

4.2. EVENS

The Centre on the Dynamics of Ethnicity (CoDE), funded by the Economic and Social Research Council (ESRC), conducted "The COVID Race Inequalities Programme". As part of this project, CoDE carried out the Evidence for Equality National Survey (EVENS)5, which documents the lives of ethnic and religious minorities in Britain during the coronavirus pandemic. The EVENS dataset comprises 14,215 data points and categorises participants into 18 different ethnic minority groups. To facilitate a comparative analysis with the Anonymous project's experiment, we focused on the same seven ethnic groups, resulting in a filtered dataset of 4,348 participants from England and 253 participants from Scotland. We excluded entries with missing values to ensure the robustness of our analysis.
It is important to note that the EVENS dataset uses a different definition for mixed or multiple ethnic groups compared to the Anonymous project. To avoid inconsistencies that could affect the final analysis, we excluded mixed or multiple ethnic groups from the EVENS calculations. Due to the different questions in the EVENS and Anonymous project surveys, we selected three types of cross-sectoral questions from EVENS for analysis: housing, experiences of harassment, and financial situation. These questions were chosen to provide a broad view of participants’ living conditions, social experiences, and economic status during the pandemic.
We applied LCA separately to the data from England and Scotland to uncover patterns and discrepancies within and between these regions. This approach allows us to identify distinct subgroups within the ethnic communities based on their responses to the selected questions, providing deeper insights into the intersecting cross-sectoral experiences of these groups.
Turning to the discrepancies in the EVENS England data, shown in the left part of Table 2, it is clear that all discrepancy values are relatively small compared to the Anonymous project results. Notably, there are large discrepancies between the Indian and Bangladeshi groups, and between the Indian and Chinese groups. This differs slightly from the conclusions of the Anonymous project's experiment, likely due to the different types of cross-sectoral questions selected. In the AVG column, the Bangladeshi group has the highest values, indicating that they experienced COVID-19 differently. ONS analysis of deaths involving COVID-19 reported that mortality rates were highest for the Bangladeshi group, for both males and females [32], supporting our findings. The right part of Table 2 describes the discrepancies for the EVENS Scotland data. There are significant discrepancies between the Caribbean and other ethnic groups, likely related to the unique background of the Caribbean group.

4.3. Census 2021 (England and Wales)

To further test our approach, we applied it to the Census 2021 dataset [33], which gathers information on individuals and households in England and Wales every decade. These data help plan and finance essential local services. We compared our results with the UK deprivation indices data from 2019 [34], which classify relative deprivation in small areas. We hypothesised that discrepancy values should correlate with the deprivation indices, reflecting discrepancies across energy, health, housing, and socioeconomic sectors. The key difference is that our discrepancy values are data-driven, while deprivation indices are based on human-centred assessments, suggesting that our approach is complementary.
For our experiments, we selected four cross-sectoral questions from the census related to energy (type of central heating), health (general health), housing (occupancy rating for bedrooms), and socioeconomic status (household deprivation). Note that the socioeconomic data in Census 2021 differ from the 2019 deprivation indices due to different definitions and coverage [33,34].
Additionally, for Census 2021, we selected Lower Layer Super Output Areas (LSOAs) [33] as samples instead of individuals, as individual data were not accessible. After cleaning the data and removing unmatched LSOAs, we had 31,810 LSOAs in total. Unmatched LSOAs, which appear only in either Census 2021 or Deprivation 2019, were removed. In the Deprivation 2019 dataset, each LSOA is labelled with a deprivation level from 1 to 10 (1 being the most deprived). As our samples are LSOAs, we quantified discrepancies between different LSOAs. Since the raw data does not include group attributes, we classified LSOAs into five groups based on the percentage of the population from ME groups: [0%, 20%), [20%, 40%), [40%, 60%), [60%, 80%), and [80%, 100%].
The proposed approach quantifies the discrepancies between the defined ME population-related groups based on the selected Census 2021 data. The results are shown in the left part of Table 3. It is evident that the discrepancies between LSOAs increase as differences in ME population percentages increase, indicating significant disparities in living conditions for ME individuals across different LSOAs, particularly in terms of energy, housing, and health aspects. Notably, the 0%-20% group shows the largest AVG discrepancy compared to other groups, suggesting that White individuals in those LSOAs experience significantly different living conditions. Since the 0%-20% group constitutes a large portion of the UK (see Appendix B), this finding suggests potential unequal treatment and possible neglect of other LSOAs. Additionally, the 40%-60% group has the smallest AVG discrepancy value, likely due to its intermediate position among the ME groups, sharing characteristics with the 0%-20% and 20%-40% groups, as well as 60%-80% and 80%-100% groups.
Correlation Analysis Furthermore, based on the deprivation indices from Deprivation 2019 [34] and the defined groups, we calculated the percentages of LSOAs in each deprivation-labelled group across the ME population groups. The results are shown in Appendix B, and we treated each row as a feature vector representing one group of LSOAs. We then iteratively calculated the deprivation discrepancies for each pair of rows, with the results presented in the right part of Table 3. We observed patterns (colour gradients) similar to those in the left part of Table 3, which supports the reliability of our proposed approach.
To statistically verify our proposed approach, we ran Pearson and Spearman row-wise correlation analyses for Census 2021 discrepancies (the left part of Table 3) and Deprivation 2019 discrepancies (the right part of Table 3). The detailed results are shown in Appendix B. All rows exhibit very strong correlations, implying that our approach can draw conclusions very similar to those of experts. Furthermore, we also flattened both matrices to run a one-time correlation analysis. The Pearson correlation coefficient is 0.9797 with a p-value of 1.4437e-17, and the Spearman correlation coefficient is 0.9872 with a p-value of 7.436e-20. Both p-values are far less than 0.001, indicating a strong correlation.
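For reproducibility, the row-wise and flattened correlation checks can be expressed with SciPy as sketched below; `census_disc` and `depriv_disc` stand for the two 5 x 5 matrices in Table 3 and are assumed to be loaded from files whose names are purely illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical CSV exports of the two discrepancy matrices in Table 3.
census_disc = np.loadtxt("census_discrepancy.csv", delimiter=",")
depriv_disc = np.loadtxt("deprivation_discrepancy.csv", delimiter=",")

# Row-wise correlations, one pair of coefficients per ME population group (Table A2).
for row_c, row_d in zip(census_disc, depriv_disc):
    print(pearsonr(row_c, row_d), spearmanr(row_c, row_d))

# One-time correlation on the flattened matrices.
print(pearsonr(census_disc.ravel(), depriv_disc.ravel()))
print(spearmanr(census_disc.ravel(), depriv_disc.ravel()))
```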
Additionally, to further assess the performance of LCA, we conducted experiments using k-Means as a replacement for LCA to calculate the discrepancies and performed the same correlation analysis. While the correlation coefficients remain strong, they are lower than those achieved with LCA. This demonstrates the effectiveness of LCA and further indirectly supports the usability of the proposed framework.

5. Discrepancy for AI Fairness

As we discussed earlier, our proposed approach can be used to quantify the discrepancies in the distribution of user (sample) profiles across multiple groups. We consider that these discrepancies can negatively impact the fairness of machine learning methods. Intuitively, the training process of machine learning models may struggle to extract patterns equally from two parts of a dataset with significant discrepancies. We expect that the discrepancy can serve as a data exploratory metric to alert AI users to the risk of fairness issues and support fairness analysis in AI.

5.1. Predefined Range Groups (Fixed Intervals)

Now, we show how discrepancies relate to potential AI bias. We selected logistic regression (LR) to classify deprivation indices for LSOAs using the Census 2021 data from the previous experiments. To simplify the classification task, we redefined the deprivation indices as deprived (indices 1-5, labelled as 0) and not deprived (indices 6-10, labelled as 1). We randomly split the dataset into training and validation sets in an 8:2 ratio, and the experiments were repeated 10 times.
The bias is measured using the False Positive Rate (FPR). In this study, we argue that the FPR deserves particular attention, as it quantifies the extent to which deprived areas are incorrectly predicted as not deprived. Such misclassification could potentially exacerbate deprivation. For instance, groups with higher FPR may not receive the attention they need from the government when it comes to resource allocation decisions.
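The group-wise FPR used here can be computed as in the sketch below, assuming `y_true` and `y_pred` are the true and predicted labels of the validation LSOAs (0 = deprived, 1 = not deprived) and `groups` is the corresponding array of ME population-range labels; all names are illustrative.

```python
import numpy as np

def group_fpr(y_true, y_pred, groups):
    """False Positive Rate per group: the share of truly deprived areas (label 0)
    that the model predicts as not deprived (label 1)."""
    fpr = {}
    for g in np.unique(groups):
        mask = groups == g
        negatives = y_true[mask] == 0                   # truly deprived samples in group g
        false_positives = negatives & (y_pred[mask] == 1)
        fpr[g] = false_positives.sum() / max(negatives.sum(), 1)
    return fpr
```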
The experimental results indicate that the two groups with the largest discrepancy values exhibit the greatest difference in FPR. As shown in Table 4, the discrepancy between the 0-20% group and the 80-100% group is the largest, with a value of 0.9301. At the same time, the former group has the largest FPR of 0.1225, while the latter group has the smallest FPR of 0. This study reveals that the machine learning model can treat areas predominantly populated by non-ethnic minorities unfairly. Moreover, significant discrepancies between groups within a dataset warrant attention, as these disparities may lead to differential treatment by machine learning models.

5.2. Equal-Size Groups (Quantile-Based)

We observed that the grouping method employed in the aforementioned experiments presents a data imbalance issue, as the 0-20% group constitutes 82.59% of the LSOA samples. This imbalance may undermine the robustness of our findings, given the significant decrease in the number of samples from the 0-20% group to the 80-100% group. Consequently, the 80-100% group may achieve an FPR of 0 simply because it has few test samples, despite also having a limited amount of training data.
To address this issue, we propose an alternative grouping method. First, we calculate the ME population percentages for all LSOAs, then sort and divide them into five groups, ensuring each group contains an equal number of samples. The ME population percentage ranges for these groups are as follows: 0-0.96%, 0.96-2.33%, 2.33-6.16%, 6.16-17.34%, and 17.34-95.02%. These groups are labelled as 1, 2, 3, 4, and 5, respectively. It is worth noting that the average ME population percentage in the UK is 18%, indicating that only the last group can be considered representative of the ME population.
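This quantile-based grouping can be reproduced with pandas as sketched below, assuming `me_pct` is a Series holding the ME population percentage of each LSOA; the variable name is illustrative.

```python
import pandas as pd

# Five equal-size groups, labelled 1..5 by increasing ME population percentage.
group_label = pd.qcut(me_pct, q=5, labels=[1, 2, 3, 4, 5])

# Inspect the percentage range and the number of LSOAs covered by each group.
print(me_pct.groupby(group_label).agg(["min", "max", "count"]))
```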
To further explore the correlations between discrepancy and AI fairness, we conducted experiments using two models: LR and a Multilayer Perceptron (MLP). Additionally, we applied two sampling ratios (90% and 80%) to randomly generate two new datasets. From our perspective, the sampled datasets should preserve the patterns discussed in the previous section, as random sampling does not substantially alter the overall distribution. All the results presented in Table 5 and Table 6 are derived from experiments repeated 10 times.
In Table 5, we observed that the discrepancies between Groups 1, 2, 3, 4, and 5 have increased compared to the values presented in Table 4. We believe this is because, under the new grouping method, all predominantly ME LSOAs are concentrated in Group 5. In other words, the ME group and non-ME groups differ in their user profiles. Meanwhile, the largest discrepancy remains between Group 1 (predominantly non-ME) and Group 5 (predominantly ME) across the two sampled datasets (90% and 80%) and the original dataset (100%). In Table 5, "LR" and "MLP" indicate that the discrepancy calculations were performed on independently sampled datasets. These datasets are used to observe the correlations between discrepancy and AI bias.
In Table 6, the first group (predominantly non-ME), approximately equivalent to the 0-20% group in Table 4, still exhibits the largest FPR across both models and all three datasets. Meanwhile, Group 5 continues to have the smallest FPR. Thus, the results align with the findings presented in Section 5.1; the two groups with large discrepancy values may be treated differently by AI.
Additionally, compared to the results shown in Table 4, we observe that the FPR values for Groups 2∼4 increase along with the increase of the discrepancy values between Groups 2∼4 and Group 5. Meanwhile, the overall discrepancies within Groups 2∼4 are smaller than the discrepancies between Group 5 and the other groups (Groups 1∼4). This indicates that the features of the data for Groups 2∼4 are relatively similar, and the models treat them similarly. As shown in Table 4, Groups 1∼4 received relatively similar FPRs, while the FPR for Group 5 is significantly smaller. In summary, we believe our proposed method effectively quantifies the discrepancies between different groups, and these values are important for informing AI users and highlighting potential risks of AI unfairness.

6. Conclusion and Limitations

In conclusion, the issue of AI fairness is of paramount importance and warrants attention from all stakeholders. In our research, we addressed this challenge by focusing on quantifying the discrepancies present in data, recognising that AI models heavily rely on data for their performance. Our proposed data-driven approach is aligned with the LNOB initiative, as it aids in discovering and addressing discrepancies between user-defined groups, thus contributing to efforts to mitigate inequality. Moreover, we believe that our proposed approach holds promise for applications across a broad spectrum of tasks, offering insights for developing fair AI models. Through testing on three datasets, we have demonstrated the efficacy and informativeness of our approach, yielding satisfactory results. Our proposed approach should be considered an approximation of bias, as selecting different parameters for LCA may yield slightly varying results; to mitigate this, we performed hyperparameter optimisation.
In summary, our research represents a significant step towards promoting fairness in AI and offers an innovative avenue for social science research. By highlighting data-driven approaches and their alignment with broader societal initiatives, we aim to foster a more equitable and inclusive landscape for AI development and deployment.

Appendix A. The Anonymous Project

As part of the Anonymous project, a multilingual online survey, available in 10 languages, was conducted to investigate the experiences of individuals from minority ethnic groups with digitalised housing, health, and energy services. The survey contains a total of 32 questions. The survey data include 594 responses from England and 284 responses from Scotland. In terms of respondent selection, researchers carefully determined the required number of participants from each ethnic group in England using a proportional allocation method based on their respective population percentages from the 2021 Census (England & Wales). However, due to the unavailability of Scotland's 2021 census results during the planning phase, the project aimed to limit the number of respondents from each ME group in Scotland to a maximum of 40. The total number of survey respondents was 878. A detailed breakdown of the respondents' demographic information, including their ethnicities, is provided in Table A1.
Table A1. The number of respondents from each ME group in both England and Scotland
Anonymous Project’s Target Ethnic Group England Scotland Total
African 176 37 213
Bangladeshi 97 41 138
Indian 93 40 133
Chinese 63 39 102
Pakistani 62 40 102
Caribbean 47 32 79
Mixed or Multiple ethnic groups 56 55 111
Total 594 284 878
We show the distribution of responses from seven ethnic groups in England regarding health and digital services based on Anonymous project’s data. We observed a notable discrepancy in the Chinese group, with 30.16% lacking English proficiency and 26.98% struggling to use the online system, while most other ethnic groups reported no concerns, as shown in Figure A1.
Figure A1. This figure gives an example of the exploratory data analysis (EDA) for the Anonymous project's England data, focusing on Question 21: "Which of the following concerns do you have about communicating with your general practice (GP) through apps, websites, or other online services?" The bars represent the proportion of respondents from each ethnic group selecting each option.
Figure A2. The example of hyperparameter optimisation process to seek the elbow point.
It should be noted that the definitions of ethnicities differ between Scotland's census and the England & Wales census. For instance, in the England and Wales census, individuals may select "Pakistani" as a sub-category under the broader "Asian or Asian British" category, whereas in Scotland's census, this category is listed as "Pakistani, Pakistani Scottish, or Pakistani British". For consistency, this paper adopts the ethnicity naming conventions used in the England and Wales census.
The questions selected to quantify discrepancies in this study include:
  • Question 20: Which of the following concerns do you have about communicating with your GP through apps, websites or other online services?
  • Question 23: Do you have any concerns about using an app, website or other digital service for these housing-related activities?
  • Question 27: Do you have any concerns about using an app, website or digital system to carry out energy-related activities?
Regarding the selection of the number of latent classes, we used grid search for hyperparameter optimisation with 10-fold cross-validation. The search space for all experiments was adjusted based on empirical observations. In the Anonymous project experiments, we set the range from 2 to 10, while for the EVENS and Census experiments, it was set from 2 to 30. One example result of the hyperparameter optimisation is shown in Figure A2.
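A minimal sketch of this selection step is given below, assuming StepMix's scikit-learn-compatible estimator (whose score method returns an average log-likelihood, as in scikit-learn mixture models) so that it can be plugged into GridSearchCV; the range shown matches the Anonymous project setting (2 to 10), and `X` is the binary-encoded response matrix.

```python
from sklearn.model_selection import GridSearchCV
from stepmix.stepmix import StepMix

# Grid search over the number of latent classes with 10-fold cross-validation.
grid = GridSearchCV(
    StepMix(measurement="binary", random_state=42),
    param_grid={"n_components": range(2, 11)},
    cv=10,
)
grid.fit(X)

# The elbow point is then inspected from the cross-validated scores.
print(grid.best_params_)
print(grid.cv_results_["mean_test_score"])
```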

Appendix B. Census 2021 (England and Wales)

Table A2. The correlation analysis between Deprivation 2019 and the results obtained from our proposed approach based on Census 2021.
Pearson Spearman
0-20% 0.9802 1
20-40% 0.9769 1
40-60% 0.9949 0.9
60-80% 0.9829 1
80-100% 0.9830 1
Table A3. The percentages of LSOAs in different ME population groups across 10 deprivation levels based on the Deprivation 2019 dataset, and the total number of LSOAs in each group.
Table A2 provides details of the correlation analysis discussed in Section 4.3. Meanwhile, Table A3 presents the distribution of the deprivation index across 10 predefined LSOA groups.

Appendix C. Experiment Details

It is worth noting that, unlike deep learning models, our approach is not time-consuming. The time required, based on the hardware shown in Table A4, ranges from 5 seconds to a maximum of 5 minutes, depending on the dataset volume.
Table A4. The hardware and software details of experiments.
Hardware
CPU 12th Gen Intel(R) Core(TM) i9-12950HX 2.30 GHz
GPU NVIDIA GeForce RTX 3080 Ti Laptop GPU
Storage 1 TB
RAM 64.0 GB
OS Windows 11 Pro

References

  1. UNSDG. Universal values principle two: leave no one behind, 2022.
  2. Arciprete, C.; Biggeri, M.; Ciani, F.; et al. Intersecting inequalities: theoretical challenges and implications for research on poverty and social exclusion in Europe, 2022.
  3. Lysaght, T.; Lim, H.Y.; Xafis, V.; Ngiam, K.Y. AI-assisted decision-making in healthcare: the application of an ethics framework for big data in health and research. Asian Bioethics Review 2019, 11, 299–314.
  4. Danish, M.S.S.; Senjyu, T. AI-enabled energy policy for a sustainable future. Sustainability 2023, 15, 7643.
  5. Chan, H.; Rice, E.; Vayanos, P.; Tambe, M.; Morton, M. Evidence from the past: AI decision aids to improve housing systems for homeless youth. In Proceedings of the 2017 AAAI Fall Symposium Series, 2017.
  6. Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR) 2021, 54, 1–35.
  7. Cirillo, D.; Catuara-Solarz, S.; Morey, C.; Guney, E.; Subirats, L.; Mellino, S.; Gigante, A.; Valencia, A.; Rementeria, M.J.; Chadha, A.S.; et al. Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. NPJ Digital Medicine 2020, 3, 81.
  8. Celi, L.A.; Cellini, J.; Charpignon, M.L.; Dee, E.C.; Dernoncourt, F.; Eber, R.; Mitchell, W.G.; Moukheiber, L.; Schirmer, J.; Situ, J.; et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities—A global review. PLOS Digital Health 2022, 1, e0000022.
  9. Byrne, M.D. Reducing bias in healthcare artificial intelligence. Journal of PeriAnesthesia Nursing 2021, 36, 313–316.
  10. Zhang, Y.; Zhou, L. Fairness assessment for artificial intelligence in financial industry. arXiv 2019, arXiv:1912.07211.
  11. Fenu, G.; Galici, R.; Marras, M. Experts' view on challenges and needs for fairness in artificial intelligence for education. In Proceedings of the International Conference on Artificial Intelligence in Education; Springer, 2022; pp. 243–255.
  12. Leslie, D.; Mazumder, A.; Peppin, A.; Wolters, M.K.; Hagerty, A. Does "AI" stand for augmenting inequality in the era of COVID-19 healthcare? BMJ 2021, 372.
  13. Morley, J.; Machado, C.C.; Burr, C.; Cowls, J.; Joshi, I.; Taddeo, M.; Floridi, L. The ethics of AI in health care: a mapping review. Social Science & Medicine 2020, 260, 113172.
  14. Ferrara, E. Fairness and bias in artificial intelligence: A brief survey of sources, impacts, and mitigation strategies. Sci 2023, 6, 3.
  15. IBM Data and AI Team. Shedding light on AI bias with real world examples, 2025.
  16. Wu, H.; Wang, M.; Sylolypavan, A.; Wild, S. Quantifying health inequalities induced by data and AI models. arXiv 2022, arXiv:2205.01066.
  17. Morin, S.; Legault, R.; Bakk, Z.; Giguère, C.É.; de la Sablonnière, R.; Lacourse, É. StepMix: A Python package for pseudo-likelihood estimation of generalized mixture models with external variables. arXiv 2023, arXiv:2304.03853.
  18. Collins, L.M.; Lanza, S.T. Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences; Vol. 718, John Wiley & Sons, 2009.
  19. Trocin, C.; Mikalef, P.; Papamitsiou, Z.; Conboy, K. Responsible AI for digital health: a synthesis and a research agenda. Information Systems Frontiers 2023, 25, 2139–2157.
  20. Garattini, C.; Raffle, J.; Aisyah, D.N.; Sartain, F.; Kozlakidis, Z. Big data analytics, infectious diseases and associated ethical impacts. Philosophy & Technology 2019, 32, 69–85.
  21. So, W.; Lohia, P.; Pimplikar, R.; Hosoi, A.; D'Ignazio, C. Beyond fairness: Reparative algorithms to address historical injustices of housing discrimination in the US. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022; pp. 988–1004.
  22. Hagenaars, J.A.; McCutcheon, A.L. Applied Latent Class Analysis; Cambridge University Press, 2002.
  23. Muthén, B.; Muthén, L.K. Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research 2000, 24, 882–891.
  24. Vermunt, J.K.; Magidson, J. Latent class analysis. The Sage Encyclopedia of Social Sciences Research Methods 2004, 2, 549–553.
  25. Lazarsfeld, P.F. The logical and mathematical foundation of latent structure analysis. In Studies in Social Psychology in World War II, Vol. IV: Measurement and Prediction; 1950; pp. 362–412.
  26. Goodman, L.A. The analysis of systems of qualitative variables when some of the variables are unobservable. Part I—A modified latent structure approach. American Journal of Sociology 1974, 79, 1179–1259.
  27. Weller, B.E.; Bowen, N.K.; Faubert, S.J. Latent class analysis: a guide to best practice. Journal of Black Psychology 2020, 46, 287–311.
  28. Martínez-Monteagudo, M.C.; Delgado, B.; Inglés, C.J.; Escortell, R. Cyberbullying and social anxiety: a latent class analysis among Spanish adolescents. International Journal of Environmental Research and Public Health 2020, 17, 406.
  29. Bardazzi, R.; Charlier, D.; Legendre, B.; Pazienza, M.G. Energy vulnerability in Mediterranean countries: A latent class analysis approach. Energy Economics 2023, 126, 106883.
  30. Sinha, P.; Furfaro, D.; Cummings, M.J.; Abrams, D.; Delucchi, K.; Maddali, M.V.; He, J.; Thompson, A.; Murn, M.; Fountain, J.; et al. Latent class analysis reveals COVID-19-related acute respiratory distress syndrome subgroups with differential responses to corticosteroids. American Journal of Respiratory and Critical Care Medicine 2021, 204, 1274–1285.
  31. Sarpong, J. BAME we're not the same: Chinese, 2024. Available online: https://www.bbc.com/creativediversity/nuance-in-bame/chinese.
  32. Drummond; Pratt, M. Updating ethnic and religious contrasts in deaths involving the coronavirus (COVID-19), England: 24 January 2020 to 23 November 2022, February 2023. Available online: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/updatingethniccontrastsindeathsinvolvingthecoronaviruscovid19englandandwales/24january2020to23november2022.
  33. ONS. Census 2021 data and analysis from Census 2021, 2022. Available online: https://www.ons.gov.uk/.
  34. GOV.UK. National Statistics: English indices of deprivation 2019, 2019. Available online: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019.
1 Project name removed to maintain anonymity.
2 Question 21: Which of the following concerns do you have about communicating with your general practice (GP) through apps, websites, or other online services?
3 The StepMix (https://stepmix.readthedocs.io/en/latest/index.html) Python package is used to implement LCA in this research.
4 More dataset details can be found in Appendix A.
5
Figure 1. The loop of bias placed in the data, algorithm, and user interaction feedback [6].
Figure 2. The correlations between fairness, bias, inequality, and discrepancy in the context of AI [14,15].
Figure 3. The process of the proposed approach for quantifying cross-sectoral discrepancies within different groups.
Figure 4. The PCA visualisation of relationships among ME groups based on percentage distributions across latent classes for the Anonymous project dataset (England). The dashed line with an arrow indicates the distance from one dot to its nearest neighbour.
Table 1. The matrix of discrepancies between 7 ethnic groups for Anonymous project’s England and Scotland data. The AVG denotes the average discrepancy value for one group.
England Scotland
African Bangladeshi Caribbean Chinese Indian Mixed Group Pakistani AVG African Bangladeshi Caribbean Chinese Indian Mixed Group Pakistani AVG
African 0.0000 0.0899 0.0383 0.1810 0.0547 0.0590 0.0517 0.0678 0.0000 0.0666 0.0296 0.1956 0.0342 0.0118 0.0227 0.0515
Bangladeshi 0.0899 0.0000 0.1359 0.3734 0.0200 0.2738 0.0308 0.1320 0.0666 0.0000 0.0324 0.3043 0.0989 0.0563 0.0118 0.0815
Caribbean 0.0383 0.1359 0.0000 0.2456 0.0764 0.1131 0.0951 0.1006 0.0296 0.0324 0.0000 0.3430 0.0191 0.0546 0.0159 0.0706
Chinese 0.1810 0.3734 0.2456 0.0000 0.3700 0.1201 0.2459 0.2194 0.1956 0.3043 0.3430 0.0000 0.3717 0.1334 0.2438 0.2274
Indian 0.0547 0.0200 0.0764 0.3700 0.0000 0.2139 0.0311 0.1094 0.0342 0.0989 0.0191 0.3717 0.0000 0.0821 0.0575 0.0948
Mixed Group 0.0590 0.2738 0.1131 0.1201 0.2139 0.0000 0.1987 0.1398 0.0118 0.0563 0.0546 0.1334 0.0821 0.0000 0.0209 0.0513
Pakistani 0.0517 0.0308 0.0951 0.2459 0.0311 0.1987 0.0000 0.0933 0.0227 0.0118 0.0159 0.2438 0.0575 0.0209 0.0000 0.0532
Table 2. The matrix of discrepancies for EVENS England and Scotland data.
England Scotland
African Bangladeshi Caribbean Chinese Indian Pakistani AVG African Bangladeshi Caribbean Chinese Indian Pakistani AVG
African 0.0000 0.0112 0.0014 0.0038 0.0062 0.0018 0.0040 0.0000 0.0246 0.5408 0.0300 0.0637 0.1800 0.1398
Bangladeshi 0.0112 0.0000 0.0102 0.0031 0.0227 0.0090 0.0094 0.0246 0.0000 0.5283 0.0522 0.1539 0.2697 0.1714
Caribbean 0.0014 0.0102 0.0000 0.0047 0.0030 0.0002 0.0032 0.5408 0.5283 0.0000 0.3946 0.4505 0.2506 0.3608
Chinese 0.0038 0.0031 0.0047 0.0000 0.0138 0.0048 0.0050 0.0300 0.0522 0.3946 0.0000 0.0662 0.1036 0.1078
Indian 0.0062 0.0227 0.0030 0.0138 0.0000 0.0040 0.0083 0.0637 0.1539 0.4505 0.0662 0.0000 0.0589 0.1322
Pakistani 0.0018 0.0090 0.0002 0.0048 0.0040 0.0000 0.0033 0.1800 0.2697 0.2506 0.1036 0.0589 0.0000 0.1438
Table 3. The matrices for the discrepancies of Census 2021 and the deprivation discrepancies between LSOAs across user-defined ME population percentage groups based on Deprivation 2019.
Census Deprivation
0-20% 20-40% 40-60% 60-80% 80-100% AVG 0-20% 20-40% 40-60% 60-80% 80-100% AVG
0-20% 0.0000 0.4865 0.6783 0.8603 0.9347 0.5920 0.0000 0.2001 0.2896 0.3877 0.5064 0.2768
20-40% 0.4865 0.0000 0.1371 0.3934 0.5565 0.3147 0.2001 0.0000 0.0313 0.1123 0.2203 0.1128
40-60% 0.6783 0.1371 0.0000 0.1173 0.2744 0.2414 0.2896 0.0313 0.0000 0.0314 0.0963 0.0897
60-80% 0.8603 0.3934 0.1173 0.0000 0.0445 0.2831 0.3877 0.1123 0.0314 0.0000 0.0283 0.1119
80-100% 0.9347 0.5565 0.2744 0.0445 0.0000 0.3620 0.5064 0.2203 0.0963 0.0283 0.0000 0.1703
Table 4. The Discrepancies and False Positive Rate.
Table 5. Discrepancy Values for Five Groups with Three Sampling Ratios.
Table 6. False Positive Rates for Five Groups with Three Sampling Ratios.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.