4.1. The Anonymous project
Within the Anonymous project⁴, a systematic online survey was conducted to explore the experiences of individuals from ME groups in the UK, focusing on digitalised health, energy, and housing. To examine cross-sectoral intersecting discrepancies among the seven ethnic groups (as shown in Table 1), we selected three questions related to these sectors from the Anonymous survey data. The answers from ME participants formed the dataset for this experiment, with 594 samples for England and 284 for Scotland. Because sample sizes vary across ethnicities, we calculated discrepancies separately for each region.
LCA requires the number of latent classes to be set in advance. To choose it, we performed hyperparameter optimisation in each experiment and selected the number of classes at the elbow point; the same procedure was applied in all subsequent experiments.
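The text does not state which fit criterion was plotted or how the elbow was located. As a minimal sketch, assuming a score such as BIC has already been computed for each candidate number of latent classes, the elbow can be taken as the point farthest from the straight line joining the first and last points of the min-max-normalised curve. This is a common heuristic, not necessarily the one used here; the function name `elbow_index` is hypothetical.

```python
import math


def elbow_index(scores):
    """Return the index of the elbow in a curve of model-selection scores.

    Assumed heuristic: normalise both axes to [0, 1] and pick the point with
    the largest perpendicular distance to the chord from the first to the
    last point.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three candidate values")
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        return 0  # flat curve: no elbow, fall back to the first candidate
    xs = [i / (n - 1) for i in range(n)]
    ys = [(s - lo) / span for s in scores]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)  # dx == 1, so norm is never zero
    distances = [
        abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm for x, y in zip(xs, ys)
    ]
    return max(range(n), key=distances.__getitem__)
```

For the hypothetical scores `[1000, 700, 560, 540, 530, 525]` for one to six classes, `elbow_index` returns 2, i.e. three latent classes. Normalising the axes first keeps the choice independent of the units of the score.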
The discrepancy results for England are presented in the left part of Table 1. The Chinese group has the largest average (AVG) discrepancy value of all groups. Meanwhile, the Indian group exhibits the smallest discrepancy with the Bangladeshi group and is also similar to the Pakistani group. This is likely due to their close geographical origins and similar cultural backgrounds and lifestyles.
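The definition of the AVG column is not repeated in this section; a natural reading, assumed in this sketch, is the mean of a group's row in the pairwise discrepancy matrix with the zero self-discrepancy on the diagonal excluded. The function name and any group names used with it are hypothetical.

```python
def average_discrepancy(groups, matrix):
    """Mean discrepancy of each group to all *other* groups (diagonal excluded).

    matrix[i][j] is the discrepancy between groups[i] and groups[j].
    """
    n = len(groups)
    if n < 2 or len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("need a square matrix with one row per group (at least two groups)")
    return {
        g: sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
        for i, g in enumerate(groups)
    }
```

With a hypothetical 3 x 3 matrix `[[0, 2, 4], [2, 0, 6], [4, 6, 0]]` for groups A, B, and C, the averages are 3.0, 4.0, and 5.0, so C would be the group that stands out.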
The right part of Table 1 presents the results for Scotland, which are similar: the Chinese group is distinct from the others, while the Bangladeshi group is similar to the Pakistani group. The differences between England and Scotland may be attributed to their different policies and circumstances. We hypothesise that the primary reason for the Chinese group standing out is a lack of English proficiency. This is supported by our preliminary research mentioned in Section 1, in which a significant number of Chinese participants expressed this concern. Additionally, BBC News [29] reports that the Chinese community experiences some of the highest rates of racism among all ethnic groups in the UK. Our discrepancy values may help explain this, as the Chinese group shows distinct experiences of digitalised online services.
4.2. EVENS
The Centre on the Dynamics of Ethnicity (CoDE), funded by the Economic and Social Research Council (ESRC), conducted “The COVID Race Inequalities Programme”. As part of this project, CoDE carried out the Evidence for Equality National Survey (EVENS)⁵, which documents the lives of ethnic and religious minorities in Britain during the coronavirus pandemic. The EVENS dataset comprises 14,215 data points and categorises participants into 18 ethnic minority groups. To enable a comparison with the Anonymous project’s experiment, we focused on the same seven ethnic groups as the Anonymous project, resulting in a filtered dataset of 4,348 participants from England and 253 from Scotland. We excluded entries with missing values to ensure the robustness of our analysis.
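The filtering step can be sketched as below, assuming each respondent is a dictionary and missing answers are stored as `None` (or the field is absent). EVENS variable names and group labels are not given in the text, so `ethnic_group`, `region`, and the function name are hypothetical. Records are also split by region, since the analysis is run separately for England and Scotland.

```python
from collections import defaultdict


def prepare_regional_subsets(records, keep_groups, required_fields,
                             group_field="ethnic_group", region_field="region"):
    """Keep respondents from the selected groups with complete answers, split by region."""
    keep = set(keep_groups)
    by_region = defaultdict(list)
    for record in records:
        if record.get(group_field) not in keep:
            continue  # outside the groups shared with the comparison survey
        # Listwise deletion: drop the respondent if any selected answer is missing.
        if any(record.get(field) is None for field in required_fields):
            continue
        by_region[record.get(region_field)].append(record)
    return dict(by_region)
```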
It is important to note that EVENS defines mixed or multiple ethnic groups differently from the Anonymous project. To avoid inconsistencies that could affect the final analysis, we excluded mixed or multiple ethnic groups from the EVENS calculations. Because the EVENS and Anonymous project surveys ask different questions, we selected three types of cross-sectoral questions from EVENS for analysis: housing, experiences of harassment, and financial situation. These questions were chosen to provide a broad view of participants’ living conditions, social experiences, and economic status during the pandemic.
We applied LCA separately to the data from England and Scotland to uncover patterns and discrepancies within and between these regions. This approach allows us to identify distinct subgroups within the ethnic communities based on their responses to the selected questions, providing deeper insights into the intersectional and cross-sectoral experiences of these groups.
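The LCA software, initialisation, and number of restarts used in these experiments are not specified in this section. For readers unfamiliar with the model, the sketch below fits a basic latent class model to categorically coded answers with the expectation-maximisation (EM) algorithm. It is an assumed, illustrative implementation: the name `fit_lca` and the small smoothing pseudo-count `alpha` are hypothetical, it would be run once per region, and it does not compute the discrepancy measure itself.

```python
import math
import random


def fit_lca(data, n_classes, n_categories, n_iter=200, tol=1e-6, seed=0, alpha=1e-3):
    """Fit a latent class model to categorical responses by smoothed EM.

    data: list of tuples; data[n][j] is respondent n's answer to item j,
          coded as an integer in range(n_categories[j]).
    alpha: pseudo-count keeping every probability strictly positive.
    Returns (class_priors, item_probs, log_likelihood_history), where
    item_probs[k][j][c] = P(answer c to item j | latent class k).
    """
    rng = random.Random(seed)
    n, n_items = len(data), len(n_categories)
    priors = [1.0 / n_classes] * n_classes
    # Random, strictly positive starting values break the symmetry between classes.
    item_probs = []
    for _ in range(n_classes):
        per_item = []
        for c_j in n_categories:
            weights = [rng.random() + 0.5 for _ in range(c_j)]
            total = sum(weights)
            per_item.append([w / total for w in weights])
        item_probs.append(per_item)

    history = []
    for _ in range(n_iter):
        # E-step: posterior class membership of each respondent, in log space.
        resp, log_lik = [], 0.0
        for row in data:
            logs = [
                math.log(priors[k])
                + sum(math.log(item_probs[k][j][row[j]]) for j in range(n_items))
                for k in range(n_classes)
            ]
            top = max(logs)
            log_norm = top + math.log(sum(math.exp(v - top) for v in logs))
            log_lik += log_norm
            resp.append([math.exp(v - log_norm) for v in logs])
        history.append(log_lik)

        # M-step: re-estimate class sizes and item-response probabilities.
        for k in range(n_classes):
            n_k = sum(r[k] for r in resp)
            priors[k] = (n_k + alpha) / (n + n_classes * alpha)
            for j, c_j in enumerate(n_categories):
                counts = [alpha] * c_j
                for row, r in zip(data, resp):
                    counts[row[j]] += r[k]
                total = sum(counts)
                item_probs[k][j] = [c / total for c in counts]

        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return priors, item_probs, history
```

EM only finds a local optimum, so in practice one would try several seeds and keep the fit with the highest final log-likelihood.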
The discrepancies for the EVENS England data, shown in the left part of Table 2, are all relatively small compared with the Anonymous project results. Notably, there are relatively large discrepancies between the Indian and Bangladeshi groups, and between the Indian and Chinese groups. This differs slightly from the conclusions of the Anonymous project’s experiment, likely because different types of cross-sectoral questions were selected. In the AVG column, the Bangladeshi group has the highest value, indicating that this group experienced COVID-19 differently. The Census 2021 reported that COVID-19 mortality rates were highest for the Bangladeshi group, for both males and females [30], which supports our findings. The right part of Table 2 shows the discrepancies for the EVENS Scotland data. There are large discrepancies between the Caribbean group and the other ethnic groups, likely related to the unique background of the Caribbean group.
4.3. Census 2021 (England and Wales)
To further test our approach, we applied it to the Census 2021 dataset [31], which gathers information on individuals and households in England and Wales every ten years. These data are used to plan and fund essential local services. We compared our results with the UK deprivation indices data from 2019 [32], which classify the relative deprivation of small areas. We hypothesised that our discrepancy values should correlate with the deprivation indices, since both reflect differences across the energy, health, housing, and socioeconomic sectors. The key difference is that our discrepancy values are data-driven, whereas the deprivation indices are based on human-centred assessments, suggesting that the two approaches are complementary.
For our experiments, we selected four cross-sectoral questions from the census related to energy (type of central heating), health (general health), housing (occupancy rating for bedrooms), and socioeconomic status (household deprivation). Note that the socioeconomic data in Census 2021 differ from the 2019 deprivation indices due to different definitions and coverage [31, 32].
Additionally, for Census 2021, we used Lower Layer Super Output Areas (LSOAs) [31] as samples instead of individuals, as individual-level data were not accessible. During data cleaning, we removed unmatched LSOAs, i.e. those that appear in only one of Census 2021 and Deprivation 2019, leaving 31,810 LSOAs in total. In the Deprivation 2019 dataset, each LSOA is labelled with a deprivation level from 1 to 10 (1 being the most deprived). As our samples are LSOAs, we quantified discrepancies between groups of LSOAs. Since the raw data do not include group attributes, we classified LSOAs into five groups based on the percentage of the population from ME groups: [0%, 20%), [20%, 40%), [40%, 60%), [60%, 80%), and [80%, 100%].
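The grouping rule above amounts to binning each LSOA's ME population share into left-closed intervals, with the last interval also including 100%. A minimal sketch, assuming the percentage has already been derived from the census tables (that derivation is not described here); `me_share_group` is a hypothetical name.

```python
import bisect

# Upper-exclusive edges of the first four bins; the last bin is closed at 100%.
EDGES = [20.0, 40.0, 60.0, 80.0]
LABELS = ["[0%, 20%)", "[20%, 40%)", "[40%, 60%)", "[60%, 80%)", "[80%, 100%]"]


def me_share_group(me_percentage):
    """Assign an LSOA to one of the five groups by its ME population share (0-100)."""
    if not 0.0 <= me_percentage <= 100.0:
        raise ValueError("percentage must lie in [0, 100]")
    # bisect_right sends a value equal to an edge into the higher bin, giving [a, b).
    return LABELS[bisect.bisect_right(EDGES, me_percentage)]
```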
The proposed approach quantifies the discrepancies between the defined ME population groups based on the selected Census 2021 data. The results are shown in the left part of Table 3. The discrepancy between two groups of LSOAs increases as the difference in their ME population percentages increases, indicating significant disparities in living conditions for ME individuals across LSOAs, particularly in terms of energy, housing, and health. Notably, the 0%-20% group shows the largest AVG discrepancy of all groups, suggesting that White individuals in those LSOAs experience markedly different living conditions. Since the 0%-20% group accounts for a large proportion of LSOAs (see Appendix B), this finding suggests potential unequal treatment and possible neglect of the other LSOAs. Additionally, the 40%-60% group has the smallest AVG discrepancy value, likely due to its intermediate position among the groups: it shares characteristics with the 0%-20% and 20%-40% groups as well as with the 60%-80% and 80%-100% groups.
Correlation Analysis. Based on the deprivation indices from Deprivation 2019 [32] and the defined groups, we calculated the percentages of LSOAs at each deprivation level across the ME population groups. The results are shown in Appendix B; we treated each row as a feature vector representing one group of LSOAs. We then calculated the deprivation discrepancy for each pair of rows, with the results presented in the right part of Table 3. The patterns (colour changes) are similar to those in the left part of Table 3, which supports the reliability of our proposed approach.
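The discrepancy measure applied to these deprivation feature vectors is the one defined earlier in the paper and is not reproduced here. As an illustration only, the sketch below converts per-group LSOA counts at each deprivation level into percentage vectors, normalising each row to 100% (an assumption about the layout of the Appendix B table), and fills a pairwise matrix using the Jensen-Shannon divergence as a stand-in distance. All function names are hypothetical.

```python
import math


def row_percentages(counts):
    """Convert each row of counts (e.g. LSOAs per deprivation level) to percentages."""
    result = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a row has no LSOAs")
        result.append([100.0 * c / total for c in row])
    return result


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two non-negative vectors."""
    sp, sq = sum(p), sum(q)
    p = [v / sp for v in p]
    q = [v / sq for v in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Zero entries of a contribute nothing; b = m is positive wherever a is.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pairwise_matrix(rows, distance):
    """Symmetric matrix of distance(rows[i], rows[j]) over every pair of rows."""
    n = len(rows)
    return [[distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]
```

Any symmetric distance between distributions could be passed as `distance`; Jensen-Shannon is used here only because it is bounded and defined even when some deprivation levels have zero LSOAs.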
To verify our approach statistically, we ran row-wise Pearson and Spearman correlation analyses between the Census 2021 discrepancies (the left part of Table 3) and the Deprivation 2019 discrepancies (the right part of Table 3). The detailed results are shown in Appendix B. All rows exhibit very strong correlations, implying that our data-driven approach reaches conclusions very similar to those of the expert-defined indices. We also flattened both matrices and ran a single correlation analysis. The Pearson correlation coefficient is 0.9797 (p-value 1.4437e-17), and the Spearman correlation coefficient is 0.9872 (p-value 7.436e-20). Both p-values are far below 0.001, so the correlations are statistically significant, and the coefficients close to 1 indicate a strong correlation.
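Both coefficients can be computed directly from the two matrices. The sketch below implements Pearson's r, Spearman's rho (Pearson's r on average ranks, so ties are handled), and a helper that flattens a matrix row by row for the one-off analysis; the row-wise analysis applies the same functions to corresponding rows. It omits p-values: `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return both the coefficient and its p-value and are the usual choice in practice. Function names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def flatten(matrix):
    return [value for row in matrix for value in row]
```

Flattening a full symmetric matrix includes the diagonal and counts each pair twice, which affects the degrees of freedom behind any p-value.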
Discrepancy for AI fairness. We now show how discrepancies relate to potential AI bias. We used logistic regression to classify the deprivation indices of LSOAs using the Census 2021 data from the previous experiments. To simplify the classification task, we redefined the deprivation indices as deprived (indices 1-5, labelled 0) and not deprived (indices 6-10, labelled 1). We split the dataset into training and validation sets in an 8:2 ratio. The prediction results on the Census 2021 dataset are shown in the Census column of Table 4: the overall accuracy is 90.35%, and the standard deviation (STD) of the five group accuracies is 4.20. The accuracy for each group is shown on the left of Figure 4. The accuracy varies across groups, with the 0-20% group reaching 100%. In the context of LNOB and AI fairness, we consider this variation biased and problematic. In this research, the STD is considered an important indicator of fairness: smaller STD values imply that the model treats each group more equally. In the following paragraphs, we discuss how the discrepancy relates to AI fairness using two undersampled datasets.
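The classification setup can be sketched as follows. This is an assumed, dependency-free illustration, not the authors' code: labels are binarised as described, the data are shuffled and split 8:2, a plain logistic regression is trained by batch gradient descent, and per-group accuracies are summarised by their spread. The text does not say whether the reported STD is the population or the sample standard deviation, so both are returned. In practice, scikit-learn's `LogisticRegression` and `train_test_split(..., test_size=0.2)` would normally be used; all names here are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def binarise_deprivation(decile):
    """Deciles 1-5 -> 0 (deprived), 6-10 -> 1 (not deprived)."""
    if not 1 <= decile <= 10:
        raise ValueError("deprivation decile must be between 1 and 10")
    return 0 if decile <= 5 else 1


def split_8_2(rows, seed=0):
    """Shuffle and split rows into 80% training and 20% held-out data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


def group_accuracy_report(groups, y_true, y_pred):
    """Per-group accuracy (%) plus population and sample STD across groups."""
    hits, totals = defaultdict(int), defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        totals[g] += 1
        hits[g] += int(t == p)
    acc = {g: 100.0 * hits[g] / totals[g] for g in totals}
    values = list(acc.values())
    sample_std = statistics.stdev(values) if len(values) > 1 else 0.0
    return acc, statistics.pstdev(values), sample_std
```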
For the Census 2021 data, we noticed two data imbalance issues likely to affect accuracy. Firstly, the 0-20% group constitutes 82.59% of the LSOA samples and has the largest AVG discrepancy and relatively low accuracy, potentially indicating that its features are distinct from those of the other groups. In machine learning, imbalanced data can lead a model to overfit the majority group. To address this, we used random undersampling [33], undersampling every group except the smallest, the 80-100% group, which had 125 samples. After undersampling, all groups had an equal number of samples, and we split them into training and test sets in an 8:2 ratio. The aim was to observe changes in discrepancies, accuracy, and STD to gain insights into bias. The results are shown in the middle of Figure 4 and in the Undersampled Census (ME) column of Table 4.
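Random undersampling keeps, for every group, a random subset of rows as large as the smallest group (here the 80-100% group with 125 LSOAs). A minimal sketch, assuming rows are hashable records and sampling without replacement; the helper name `undersample` is hypothetical, and libraries such as imbalanced-learn provide an equivalent `RandomUnderSampler`. Because the grouping is passed in as a `key` function, the same helper can balance any attribute, for example the deprivation label instead of the ME group.

```python
import random
from collections import defaultdict


def undersample(rows, key, seed=0):
    """Randomly drop rows so every key value keeps as many rows as the rarest one.

    key: function mapping a row to the attribute being balanced (e.g. its ME group).
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row)].append(row)
    if not buckets:
        return []
    target = min(len(bucket) for bucket in buckets.values())
    rng = random.Random(seed)
    balanced = []
    for value in sorted(buckets):
        # Sampling without replacement; the smallest bucket is kept in full.
        balanced.extend(rng.sample(buckets[value], target))
    return balanced
```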
We found that all AVG discrepancies increased, as did the STD (from 4.20 to 4.83), and the prediction accuracy decreased for three groups. This may indicate that our approach effectively detects data discrepancies that can exacerbate bias. Notably, the 40-60% group showed the largest increase in discrepancy, accompanied by a dramatic decrease in accuracy. We believe that this is due not only to the increased discrepancies but also to the reduced data size, which may not provide enough data for effective model training. When samples from different groups are far apart, the model may need more data to train effectively; otherwise, it may focus on some subregions of the feature space. Note that while the overall accuracy increases, the accuracy of most groups decreases; this is a statistical artefact of the change in test-set size and composition. In this research, our primary focus is on the STD and discrepancies.
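To see why the overall figure can move in the opposite direction to most group accuracies, note that when the groups partition the test set, the overall accuracy is the test-size-weighted mean of the group accuracies, where n_g is the number of test samples in group g. The second line uses purely hypothetical numbers for two groups: both group accuracies fall, yet the overall accuracy rises because the weight of the lower-accuracy group drops from 0.8 to 0.5.

```latex
\mathrm{Acc}_{\mathrm{overall}} = \frac{\sum_{g} n_g \, \mathrm{Acc}_g}{\sum_{g} n_g}

\frac{80 \cdot 85 + 20 \cdot 95}{100} = 87
\quad\longrightarrow\quad
\frac{20 \cdot 84 + 20 \cdot 93}{40} = 88.5
```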
A second data imbalance issue concerns the deprivation labels. In the original Census 2021 data, all groups except the 0-20% group have more samples labelled 0 than 1, and the ratio of samples labelled 0 to those labelled 1 increases with the percentage of the ME population. For example, the 40-60% group has 1,014 samples labelled 0 and 215 labelled 1, and the 60-80% group has 537 and 54, respectively. We believe this may contribute to bias. Therefore, we undersampled the original Census 2021 data with respect to the labels, so that the dataset had an equal number of samples labelled 0 and 1. We then split the dataset into training and test sets following the same 8:2 rule. The results are shown on the right of Figure 4 and in the Undersampled Census (deprivation) column of Table 4.
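The label imbalance quoted above can be made concrete as ratios of samples labelled 0 to samples labelled 1. The ratio roughly doubles between the two groups; label-based undersampling equalises the two labels over the whole dataset, although not necessarily within each group.

```latex
\frac{1014}{215} \approx 4.72 \ \ (\text{40--60\% group}),
\qquad
\frac{537}{54} \approx 9.94 \ \ (\text{60--80\% group})
```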
Overall, we found that all AVG discrepancies decreased, indicating that samples from different groups became closer to each other. This suggests that the model is likely to treat them more similarly in the feature space. On the right of Figure 4, the accuracies of the different groups are more similar than in the other two panels (left and middle), as reflected by the STD decreasing sharply to 1.82. In the context of LNOB, the model is now less biased. Meanwhile, the overall accuracy is higher than that of the original model. In general, this set of experiments demonstrates that the data discrepancies obtained from our proposed approach can potentially indicate the degree to which a model treats each group equally.