Preprint
Article

Achieving Demographic Parity Across Multiple Artificial Intelligence Applications: A New Approach for Real-Time Bias Mitigation

This version is not peer-reviewed

Submitted: 04 December 2024
Posted: 05 December 2024

Abstract
Through quantitative analysis of three datasets, this study examines the efficacy of synthetic data generation in mitigating demographic bias within artificial intelligence (AI) systems across multiple sectors. It evaluates approaches based on Generative Adversarial Networks for creating demographically balanced synthetic data whilst maintaining data fidelity and model performance. The findings demonstrate significant improvements in fairness metrics, with Demographic Parity scores increasing markedly for both race and gender attributes, whilst maintaining comparable accuracy scores between synthetic and original datasets (0.87 versus 0.88). Kullback-Leibler Divergence values below 0.003 for categorical features indicate successful replication of demographic characteristics. The study reveals that synthetic data can effectively address representational imbalances without compromising predictive performance, particularly in bias-sensitive domains such as healthcare and criminal justice. However, implementation requires rigorous validation protocols and human oversight to ensure quality and fairness. The research contributes to the growing body of evidence supporting synthetic data as a viable solution for developing more equitable AI systems, particularly within regulatory frameworks emphasising fairness and privacy. These findings have significant implications for the advancement of AI development, suggesting the integration of human-in-the-loop systems to enhance synthetic data generation quality and fairness.
Keywords: 
Subject: Business, Economics and Management - Business and Management

Introduction

The use of synthetic data has greatly aided artificial intelligence (AI) model training in recent years, particularly in reducing pervasive biases in data-driven applications. Traditional datasets, frequently constrained by real-world sociodemographic variations, introduce biases into AI models. Models trained on such data often replicate or exaggerate pre-existing biases, resulting in unfair or erroneous outcomes in sectors where accuracy and equity are paramount, such as criminal justice, healthcare, and finance. These biases compromise the reliability and trustworthiness of AI systems. Synthetic data improves demographic inclusivity and offers a novel way to promote impartiality in AI algorithms.
Thanks to advances in generative AI, synthetic data production can now imitate real-world settings while maintaining privacy and regulatory compliance. In contrast to real data, synthetic datasets can be constructed to reflect a range of demographic groupings accurately, thereby correcting imbalances. Recent trends suggest that the use of synthetic data is growing quickly: by 2026, almost 75% of organisations are predicted to incorporate synthetic data into their AI applications, a significant rise from less than 5% in 2023. This increase highlights the demand for privacy-focused solutions as well as the growing recognition that synthetic data can lessen biases in real-world datasets.
The ability of synthetic data to add variation to training datasets is one of its main advantages. By directly addressing the biases frequently ingrained in current models, synthetic data helps create a more equitable representation across gender and racial lines in domains such as facial recognition. However, producing synthetic data is challenging; for it to be useful, it must accurately mimic real-world dynamics. Even though synthetic data can capture many real-world patterns, it still struggles with the subtleties of social relationships and human behaviour, which may restrict its usefulness in particular applications.
Concerns about ethics and regulation highlight how crucial synthetic data is to the development of AI. Since synthetic data does not reveal genuine personal information, it complies with current data privacy frameworks such as the US Executive Order on AI and the EU AI Act. These regulations place a strong emphasis on accountability, transparency, and equity in AI systems, which makes synthetic data a desirable option for businesses that prioritise compliance. However, as synthetic data technology advances, constant vigilance is essential to prevent inadvertent biases in data creation. Here, human-in-the-loop (HITL) systems, which incorporate human oversight, are essential. About 80% of businesses that use synthetic data employ HITL techniques to find and fix biases that fully automated systems might miss.
Synthetic data offers particular advantages in sectors where data collection is costly, time-consuming, or constrained by privacy laws. While it cannot remove every cost and logistical obstacle associated with gathering real-world data, it can circumvent the moral and legal barriers to data collection that arise in fields like healthcare and autonomous systems. For instance, by providing data that complies with regulations, synthetic images support identity verification and autonomous system training. For AI applications that need to scale while maintaining compliance with data protection regulations such as the CCPA and GDPR, this flexibility is crucial.
Despite its benefits, synthetic data comes with limitations that need to be carefully addressed. One major concern is the risk of transferring biases from the original datasets to the synthetic data, especially if the source data lacks demographic diversity. While synthetic data has the potential to enhance demographic inclusivity, biases can still persist if the generative processes are not thorough enough. To ensure that synthetic data supports fair AI model development, it’s essential to implement fidelity checks and bias-monitoring protocols to maintain demographic alignment.
Recent case studies from various fields show the growing use of synthetic data. In human resources, for example, synthetic data is used to create more balanced datasets, helping to reduce biases in hiring algorithms and support fairer hiring practices. In healthcare, synthetic patient data is improving diagnostic models, ensuring they perform equitably across diverse demographics. Goyal and Mahmoud (2024) argue that advances in prompt engineering for large language models (LLMs) are expanding the versatility of synthetic data, allowing for the generation of task-specific data in areas like customer service, financial analytics, and other bias-sensitive functions.
Synthetic data enables the formation of varied, privacy-compliant datasets, addressing restrictions associated with real-world data, such as privacy risks, scarcity, and ingrained biases. It supports the development of datasets with broad demographic coverage, thereby improving the representational equity of AI models. Moreover, synthetic data complements real-world datasets by expanding the range of scenarios in which AI models operate effectively, particularly where real data is inadequate or constrained.
This research assesses the efficacy of synthetic data in mitigating bias, analyses its impact on model performance, and identifies best practices for generating and integrating synthetic data.

Techniques for Generating Synthetic Data

Synthetic data generation plays a critical role in addressing data limitations and biases in AI. Key techniques in this domain include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models, each presenting unique benefits and difficulties. GANs, which comprise a generator and a discriminator trained adversarially to produce and evaluate synthetic data, generate highly realistic results. However, they face issues such as training instability and mode collapse, which reduce output variety. VAEs, by contrast, encode and decode within latent spaces, offering controlled variability, more stable training, and lower complexity than GANs. More recently, diffusion models have attracted increasing attention: they refine random noise into detailed, high-resolution data but demand substantial computational resources. Diffusion models are particularly valuable for applications requiring high data fidelity and demographic diversity, ensuring accuracy and inclusivity.
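To make the adversarial setup concrete, the following is a minimal PyTorch sketch of a GAN for tabular data. The layer sizes, batch size, and learning rates are illustrative assumptions, not a configuration drawn from any study discussed here.

```python
# Minimal tabular GAN sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

latent_dim, data_dim = 32, 8  # assumed dimensions for a small tabular dataset

generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, data_dim),
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_batch = torch.randn(128, data_dim)  # stand-in for a batch of real rows

for step in range(1000):
    # Discriminator step: learn to separate real rows from generated ones.
    z = torch.randn(128, latent_dim)
    fake = generator(z).detach()
    d_loss = (loss_fn(discriminator(real_batch), torch.ones(128, 1))
              + loss_fn(discriminator(fake), torch.zeros(128, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator (the minimax game).
    z = torch.randn(128, latent_dim)
    g_loss = loss_fn(discriminator(generator(z)), torch.ones(128, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

Mode collapse shows up in this loop as the generator mapping many different z vectors to nearly identical rows, which is why diversity metrics are checked alongside realism.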
These techniques hold promise for mitigating bias, though each comes with its own limitations. GANs, while producing high-quality outputs, may fail to ensure sufficient diversity in fields sensitive to demographic representation, such as healthcare and finance. VAEs, though stable and efficient, may sacrifice fidelity, restricting their applicability to complex data distributions. Diffusion models, known for their detailed outputs, are well suited to applications that require high resolution and demographic representation, but their substantial computational requirements restrict wider accessibility.
The rise of large language models (LLMs) has significantly advanced synthetic data generation, particularly in applications involving text and multimodal data. Through prompt engineering, researchers can guide LLMs to create synthetic datasets that address demographic imbalances, improving representational accuracy. Prompt engineering also increases the flexibility of synthetic data applications, enabling the generation of precise and diverse text that meets specific demographic needs. This advancement is crucial for bridging demographic gaps and enhancing AI model inclusivity.
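As a rough illustration of prompt engineering for demographically balanced text, the sketch below builds one prompt per target group. The `llm_generate` helper is a hypothetical stand-in for whichever LLM API is available, not a real library call, and the group attributes are invented examples.

```python
# Sketch: prompt-engineered synthetic text for under-represented groups.
import json

def build_prompt(demographic: dict, n: int) -> str:
    """Ask the model for customer-service dialogues from a target group."""
    return (
        f"Generate {n} short, realistic customer-service dialogues written by "
        f"a speaker with these attributes: {json.dumps(demographic)}. "
        "Return one dialogue per line; avoid stereotyped language."
    )

# Iterate over under-represented groups so the final corpus is balanced.
target_groups = [
    {"age_band": "65+", "language": "non-native English"},
    {"age_band": "18-24", "language": "non-native English"},
]
corpus = []
for group in target_groups:
    prompt = build_prompt(group, n=50)
    # corpus.extend(llm_generate(prompt))  # hypothetical LLM call
```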
Nonetheless, ethical considerations remain a concern. Synthetic data methods, especially those powered by large language models, may risk perpetuating biases present in the training data. This underscores the need for cautious oversight to avoid reinforcing societal biases, even as generative models enhance data quality and diversity. These advancements emphasise both the opportunities and difficulties of using synthetic data to create fairer AI systems, illustrating the complexities of promoting inclusivity in artificial intelligence.

Synthetic Data in Bias Mitigation

Synthetic data is a powerful tool for alleviating biases in AI models, especially in addressing demographic imbalances frequently present in real-world datasets. Analysts can promote fairer AI outcomes by generating synthetic data that represents underrepresented groups. In the case of facial recognition, synthetic data fosters balanced demographic representation by including diverse racial, gender, and age groups, helping to reduce the biases commonly found in traditional data. Additionally, studies show that training models on synthetic datasets with varied facial characteristics improves model accuracy across demographics, contributing to more equitable outcomes for marginalised groups.
In healthcare, synthetic data similarly decreases bias in diagnostic outcomes: synthetic patient data with an even distribution of age, gender, and ethnicity substantially enhances diagnostic accuracy, particularly for non-Caucasian patients. This illustrates the potential of synthetic data to address healthcare disparities, promote patient equity, and foster trust in AI-driven healthcare applications. Furthermore, synthetic data has been used in customer service to train chatbots on varied linguistic and cultural interactions. This integration improves chatbot responses across diverse user groups, reducing biases that previously produced inaccuracies for non-native speakers. These applications highlight the versatility of synthetic data in fostering fairness across domains.
Several key metrics assess the effectiveness of synthetic data in reducing bias. Demographic parity examines whether positive model predictions are distributed equally across demographic groups, which is especially important in areas like hiring and criminal justice, where equity is crucial. Another important metric, equality of opportunity, ensures that individuals from different demographic groups who meet the same qualifying criteria are treated equally. These metrics help refine synthetic datasets, improving model fairness and reliability, and providing a structured framework for evaluating demographic equity.
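Both metrics can be computed directly from model predictions. A minimal sketch, assuming binary predictions and a binary protected attribute A:

```python
import numpy as np

def demographic_parity_ratio(y_pred, group):
    """P(Y_hat = 1 | A = 1) / P(Y_hat = 1 | A = 0); 1.0 indicates parity."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 1].mean() / y_pred[group == 0].mean()

def equality_of_opportunity_gap(y_true, y_pred, group):
    """|TPR(A=1) - TPR(A=0)|; 0.0 indicates equal opportunity."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tpr = lambda g: y_pred[(y_true == 1) & (group == g)].mean()
    return abs(tpr(1) - tpr(0))

# Toy example with invented labels and predictions for two groups.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_ratio(y_pred, group))            # 0.75 / 0.50 = 1.5
print(equality_of_opportunity_gap(y_true, y_pred, group))  # |1.0 - 2/3| ~= 0.33
```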
In spite of these advantages, challenges remain in balancing fairness metrics, particularly in complex applications where multiple biases intersect. Demographic parity and equality of opportunity can conflict with one another, necessitating a holistic approach that integrates synthetic data generation with rigorous metric assessments to continuously refine models. While synthetic data plays a crucial role in advancing fairness, fully unbiased AI systems require a multi-faceted strategy, combining data augmentation and algorithmic debiasing with synthetic data to promote inclusive and reliable AI systems.

Quality and Validation of Synthetic Data

Evaluating the quality and effectiveness of synthetic data is essential for ensuring reliable AI performance and fair outcomes. Key quality metrics—fidelity, accuracy, and representational diversity—are fundamental for reviewing how well synthetic data matches real-world datasets. Fidelity, which measures the similarity between synthetic and real data, is crucial in areas such as autonomous driving and medical diagnostics, where realistic data is necessary for accurate training. Fidelity is usually assessed by comparing statistical properties like means and variances between synthetic and real datasets. Accuracy, which reflects the precision of synthetic data, is equally important in predictive applications, as it indicates how closely models trained on synthetic data perform compared to those trained on actual data. Representational diversity, which guarantees synthetic datasets encompass a broad demographic range, addresses underrepresentation and promotes more inclusive AI decisions.
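A fidelity check of the kind described above can be written as a per-feature comparison of means and variances. This is a minimal sketch assuming numeric features held in pandas DataFrames; the column set and thresholds would be dataset-specific.

```python
import pandas as pd

def fidelity_report(real: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Compare per-feature means and variances of real vs. synthetic data."""
    report = pd.DataFrame({
        "real_mean": real.mean(numeric_only=True),
        "syn_mean": synthetic.mean(numeric_only=True),
        "real_var": real.var(numeric_only=True),
        "syn_var": synthetic.var(numeric_only=True),
    })
    # Absolute mean gap per feature; large gaps flag low-fidelity columns.
    report["mean_gap"] = (report["real_mean"] - report["syn_mean"]).abs()
    return report
```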
Multiple empirical and statistical techniques are employed to validate synthetic data. Statistical tests, such as the Kolmogorov-Smirnov and Chi-square tests, are frequently used to assess the consistency between feature distributions in synthetic and real data. Real-world comparisons, where AI models trained on both synthetic and real data are evaluated using a shared test set, offer practical insights into the usefulness of synthetic data. This approach, specifically relevant when synthetic data serves as a substitute or complement to real data, demonstrates its effectiveness in real-world scenarios. Furthermore, domain-specific evaluations strengthen the validation process, as subject-matter experts assess the accuracy of synthetic data within particular fields. In healthcare, for instance, medical professionals may examine synthetic patient data to ensure it reflects clinically relevant patterns, thereby elevating its suitability for diagnostic purposes.
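Both tests are available in SciPy. A small sketch with stand-in data (the arrays below are random placeholders, not values from any study): the KS test covers a continuous feature, the chi-square test a categorical one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
real_age = rng.normal(44, 13, 5000)    # stand-in for a real continuous column
syn_age = rng.normal(44.3, 12.7, 5000)  # stand-in for its synthetic counterpart

# Continuous feature: two-sample Kolmogorov-Smirnov test.
ks = stats.ks_2samp(real_age, syn_age)

# Categorical feature: chi-square test on the real-vs-synthetic count table.
real_counts = np.array([3300, 1700])  # e.g. category counts in real data
syn_counts = np.array([3260, 1740])   # and in the synthetic data
chi2, chi_p, dof, expected = stats.chi2_contingency(
    np.stack([real_counts, syn_counts]))

print(f"KS statistic={ks.statistic:.4f} (p={ks.pvalue:.3f}); chi2 p={chi_p:.3f}")
```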
Difficulties persist in generating high-fidelity synthetic data that accurately reflects real-world complexities. Although generative models are advanced, they frequently struggle to capture the complex, high-dimensional patterns found in fields like medical imaging and geospatial analysis. These limitations can impede model performance in practical applications, where even small discrepancies in data fidelity can affect outcomes. Additionally, synthetic data can reproduce biases present in training data, thus perpetuating demographic or contextual biases, especially when derived from biased datasets.
To tackle these challenges, researchers are increasingly advocating hybrid approaches that combine synthetic and real data, aiming to leverage the diversity of synthetic data while maintaining the accuracy of real-world data. This strategy underscores the necessity for robust validation methods and ongoing enhancements in quality metrics as synthetic data becomes a crucial component of AI development.

Ethical and Regulatory Considerations

Synthetic data has gained prominence as a privacy-preserving solution that aligns with regulatory frameworks such as the GDPR, the CCPA, and the EU AI Act. These frameworks impose stringent controls on data collection, storage, and processing to safeguard individual privacy. For example, the GDPR requires strict handling of personally identifiable information (PII), mandating data minimisation or anonymisation. Synthetic data complies with these standards by offering a non-identifiable alternative that minimises PII exposure and reduces the risk of data breaches. Similarly, the CCPA emphasises consumer rights over personal data, reinforcing synthetic data's role in generating functional datasets without compromising privacy. The EU AI Act further strengthens these privacy standards by establishing rules for the use of synthetic data in sensitive applications, promoting data privacy while supporting ethical AI development.
In spite of its privacy benefits, synthetic data introduces ethical challenges, especially regarding bias transfer. If the original datasets lack diversity, synthetic data may acquire or even exacerbate existing biases, creating risks in fields like criminal justice and healthcare, where biased data could harm marginalised groups. Without rigorous oversight, synthetic data could perpetuate societal biases, undermining fairness in AI applications. Even synthetic data created for representational fairness may inadvertently reinforce existing biases if generated from flawed source data, highlighting the need for strong regulation to ensure ethical data practices.
Human-in-the-loop (HITL) systems have become a key approach to mitigating these risks, allowing experts to monitor and address biases in synthetic data pipelines. HITL systems provide expert oversight at multiple stages, incorporating human judgment to identify biases that automated systems might miss. These frameworks involve domain experts assessing demographic representation and fairness, and adjusting generation parameters to correct any biases detected. However, HITL systems are resource-intensive and face difficulties in establishing universal fairness standards. Striking a balance between automation and HITL oversight will be crucial for promoting ethical and responsible synthetic data practices in bias-sensitive AI applications.

Comparing Synthetic Data with Real and Augmented Data

Comparing synthetic, real, and augmented data reveals distinct advantages and drawbacks, particularly in bias-sensitive AI applications. Real data, sourced from actual events, provides unparalleled authenticity and reflects complex social patterns. Nevertheless, it is frequently constrained by privacy regulations, such as the GDPR, and by demographic imbalances that can perpetuate biases in AI models. Augmented data, generated by applying modifications like rotation or noise to existing datasets, enhances the variability of real data but remains reliant on the distribution of the source data, thereby preserving any inherent biases.
On the other hand, synthetic data, created algorithmically, allows for the incorporation of diverse demographic features without compromising privacy. Synthetic data is particularly beneficial in facial recognition, where balanced demographic representation is crucial for fairness. Research has demonstrated that synthetic data can enhance model fairness and precision by addressing the underrepresentation of certain groups, as seen in areas like healthcare and facial recognition. Nevertheless, synthetic data may lack the fine-grained details present in real data, which could affect tasks dependent on subtle contextual cues, such as automated driving, where real or augmented data might better capture essential environmental nuances.
Hybrid approaches, which combine synthetic, augmented, and real data, are increasingly recognised as effective strategies for creating balanced datasets. By integrating the authenticity of real data, the variability of augmented data, and the demographic balance of synthetic data, hybrid datasets offer a thorough solution for mitigating bias. This approach is especially beneficial in applications such as customer service, where real engagements provide context and synthetic data fosters cultural inclusivity. Yet, managing these different data sources presents complexities and necessitates advanced tools for data integration and quality verification. As hybrid datasets become increasingly common, creating standardised structures for combining and verifying these data types will be crucial to ensuring fairness and dependability in bias-sensitive AI applications, signalling a broader shift in AI advancement regarding ethical data practices and tactics for handling bias.
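One way such a hybrid dataset might be assembled is sketched below with pandas. The mixing fraction and provenance tagging are illustrative design choices, not a standard recipe; tagging each row's source is one simple way to keep the downstream quality verification auditable.

```python
import pandas as pd

def build_hybrid(real: pd.DataFrame, augmented: pd.DataFrame,
                 synthetic: pd.DataFrame, synth_frac: float = 0.3) -> pd.DataFrame:
    """Blend real, augmented, and a sampled slice of synthetic rows,
    tagging provenance so validation can audit each source separately."""
    n_synth = min(int(len(real) * synth_frac), len(synthetic))
    frames = [
        real.assign(source="real"),
        augmented.assign(source="augmented"),
        synthetic.sample(n=n_synth, random_state=0).assign(source="synthetic"),
    ]
    hybrid = pd.concat(frames, ignore_index=True)
    # Shuffle so batches mix all three sources during training.
    return hybrid.sample(frac=1.0, random_state=0).reset_index(drop=True)
```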

Methodology

This study employs a quantitative approach to assess synthetic data's role in reducing AI model bias. It is structured around three objectives: evaluating data generation techniques, analysing bias mitigation, and assessing synthetic data quality. Three datasets—UCI Adult, COMPAS Recidivism, and MIMIC-III Clinical—were selected for their demographic attributes, with fairness, fidelity, and performance metrics guiding analysis.
For synthetic data generation (Objective 1), the UCI Adult Dataset was used with Generative Adversarial Networks (GANs) to replicate demographic features. In a GAN, a generator $G$ and a discriminator $D$ optimise data realism through a minimax game. Kullback-Leibler Divergence (KL Divergence) measures distributional similarity, while the Inception Score (IS) evaluates diversity. For bias mitigation (Objective 2), the COMPAS Dataset was used, emphasising demographic balance in race and gender for recidivism predictions. Logistic regression was trained on synthetic data, with Demographic Parity:
$$P(\hat{Y} = 1 \mid A = 1) = P(\hat{Y} = 1 \mid A = 0)$$
and Equality of Opportunity:
$$P(\hat{Y} = 1 \mid Y = 1, A = 1) = P(\hat{Y} = 1 \mid Y = 1, A = 0)$$
as fairness metrics.
Lastly, the MIMIC-III Clinical Database tested fidelity and diversity (Objective 3). Fidelity was assessed with the Kolmogorov-Smirnov (KS) statistic:
$$D = \sup_{x} \left| F_{\mathrm{real}}(x) - F_{\mathrm{syn}}(x) \right|$$
Meanwhile, representational diversity across demographic variables confirmed balanced representation. Model predictive power was compared using Accuracy and AUC-ROC, where high AUC values indicated strong performance and demographic fairness across predictions.
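The evaluation loop described in this section, training a classifier on synthetic data and scoring it on held-out real data, might look as follows. The feature matrices here are random stand-ins for the actual datasets, and the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(1)
# Stand-ins: replace with the synthetic training set and a real test set.
X_syn, y_syn = rng.normal(size=(4000, 6)), rng.integers(0, 2, 4000)
X_real, y_real = rng.normal(size=(1000, 6)), rng.integers(0, 2, 1000)

# Train on synthetic data only; evaluate on held-out real data.
model = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
y_pred = model.predict(X_real)
y_score = model.predict_proba(X_real)[:, 1]

print(f"Accuracy={accuracy_score(y_real, y_pred):.2f}, "
      f"AUC-ROC={roc_auc_score(y_real, y_score):.2f}")
```

The fairness metrics defined above would then be computed on `y_pred` per protected group, as in the earlier sketch.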

Results

(1) Evaluation of Synthetic Data Generation Techniques
To assess the effectiveness of GAN-generated synthetic data in replicating the demographic characteristics of the UCI Adult Dataset, the Kolmogorov-Smirnov (KS) test was applied to the continuous feature (age) to measure distributional similarity, and KL Divergence was calculated for the categorical features (gender, race, education, and income level) to evaluate alignment with the original dataset. Additionally, the Inception Score (IS) assessed the diversity within the synthetic samples, ensuring a broad representation of demographic characteristics. KL Divergence values for the categorical features are consistently low: gender achieved a KL Divergence of 0.000, indicating an almost perfect match with the original data, while race, education, and income level have divergence values of 0.0004, 0.0028, and 0.0023, respectively. These low values demonstrate minimal divergence between synthetic and original distributions, underscoring the GAN's capability to replicate demographic balance across features.
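For reference, KL Divergence over a categorical feature reduces to a sum over category proportions. A minimal sketch (the example counts are illustrative, not the study's data):

```python
import numpy as np

def kl_divergence(p_counts, q_counts, eps=1e-10):
    """KL(P || Q) between two categorical distributions given raw counts."""
    p = np.asarray(p_counts, dtype=float)
    q = np.asarray(q_counts, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    # eps guards against log(0) / division by zero for empty categories.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Illustrative race-category counts: original data (P) vs. GAN output (Q).
print(kl_divergence([27816, 3124, 1039, 311, 271],
                    [27700, 3180, 1060, 320, 280]))  # small value ~ close match
```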
(2) Analysis of Bias Mitigation through Synthetic Data
This analysis evaluates the effectiveness of synthetic data in mitigating bias within models trained on the COMPAS Recidivism Dataset. Key fairness metrics, including Demographic Parity and Equality of Opportunity, were calculated for both race and gender to determine the extent to which synthetic data reduces bias, and model performance metrics (Accuracy and AUC-ROC) were assessed to ensure predictive power is maintained. Equality of Opportunity scores show clear improvement when synthetic data is used: for race, the score increases from 0.65 in the original data to 0.83 in the synthetic dataset, while for gender, it rises from 0.66 to 0.84. These gains in fairness metrics demonstrate that synthetic data contributes positively to ensuring equitable outcomes across demographic groups, aligning with the goal of bias mitigation.
(3) Assessment of Quality and Fairness of Synthetic Data
To assess the quality and fairness of synthetic data generated from the MIMIC-III Clinical Database, key metrics were evaluated, including fidelity (measured by KS Test values across age, gender, and ethnicity), accuracy, and representational diversity, which examines the demographic balance within the synthetic data. The Accuracy scores are comparable for both datasets, with the original dataset scoring 0.88 and the synthetic dataset achieving 0.87. This minimal difference suggests that the synthetic data effectively preserves predictive performance, ensuring that quality is not compromised in the generation process. Representational Diversity scores across age, gender, and ethnicity are close in value for both datasets. For example, gender diversity is 0.78 in the original data and 0.77 in the synthetic data, indicating that the synthetic data maintains demographic balance effectively, crucial for achieving fair outcomes in healthcare applications.
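The paper does not give a formula for its Representational Diversity score; normalised Shannon entropy of the demographic distribution is one common proxy, sketched below under that stated assumption.

```python
import numpy as np

def representational_diversity(counts):
    """Normalised Shannon entropy of a demographic distribution
    (1.0 = perfectly balanced). One common proxy; the study's exact
    diversity score may differ."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

print(representational_diversity([520, 480]))  # near-balanced split -> ~0.999
```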

Discussion

The results of this study underscore the potential of synthetic data as a strategic tool to address biases in AI model training, aligning with a growing body of literature that advocates synthetic data's role in creating fairer and more inclusive AI systems. In the evaluation of synthetic data generation techniques (objective one), the findings showed that the GAN-generated synthetic data maintained demographic alignment with the original UCI Adult Dataset, as evidenced by the Kolmogorov-Smirnov (KS) test and low KL Divergence values across key demographic features. The high Inception Score across all features further demonstrated diversity within the synthetic samples, essential for ensuring representational inclusivity in model training. These findings resonate with previous studies, which highlight GANs' effectiveness in replicating real-world distributions and maintaining demographic representation within synthetic datasets.
Moreover, in the analysis of bias mitigation through synthetic data (objective two), models trained on synthetic versions of the COMPAS Recidivism Dataset showed considerable improvements in fairness metrics, including Demographic Parity and Equality of Opportunity, without compromising model performance. Demographic Parity scores increased significantly for both race and gender, and similar trends were observed in Equality of Opportunity, which is critical for minimising outcome disparities across groups. These results align with existing research suggesting that synthetic data can support fairer outcomes by achieving demographic parity across sensitive attributes. Notably, accuracy and AUC-ROC scores were nearly identical for models trained on synthetic versus original datasets, demonstrating that the fairness improvements did not come at the expense of predictive performance, which concurs with prior studies that emphasise synthetic data's capability to retain model accuracy.
Further analysis in the assessment of quality and fairness of synthetic data (objective three) validated the synthetic data's fidelity, representational diversity, and accuracy when compared to the original MIMIC-III Clinical Database. KS Test values for age, gender, and ethnicity were notably low, with marginal differences from the original dataset, indicating robust fidelity, a result consistent with prior research positing synthetic data's ability to mirror complex real-world distributions accurately. The minimal difference in accuracy between synthetic and original datasets (0.87 vs. 0.88) reinforces the efficacy of synthetic data in healthcare applications where predictive precision is paramount, aligning with existing literature that advocates synthetic data's potential to maintain model performance in critical domains such as healthcare and autonomous systems.
Notably, the study's results align with ethical and regulatory frameworks that prioritise fairness and accountability in AI systems. The synthetic data's preservation of demographic diversity aligns with principles outlined in the GDPR, the CCPA, and the EU AI Act, which mandate ethical and equitable AI practices. Synthetic data's compliance with these regulations is further validated by its ability to maintain balanced representation across demographic groups without infringing on privacy, a priority in sensitive fields like criminal justice and healthcare, where privacy risks and demographic biases can hinder fairness. The alignment with regulatory frameworks in this study reflects a broader trend noted in previous research, advocating synthetic data as a viable alternative to real-world data in bias-sensitive domains.
These findings support the broader argument that synthetic data can be an effective solution to demographic imbalances, provided that its generation and validation processes are conducted with rigour. Recent studies highlight the need for human-in-the-loop (HITL) systems to oversee data generation and mitigate residual biases, a point reinforced by the consistency observed across demographic features in the current study. Human oversight remains a critical component, particularly in applications where demographic nuances are essential for accurate and fair predictions. The synthetic data's success in balancing fairness metrics and fidelity without significantly impacting model accuracy reinforces its utility as an equitable solution in AI training.

Conclusions

This study demonstrates the efficacy of synthetic data in mitigating bias within AI model training whilst maintaining performance standards. The empirical analysis across three datasets—UCI Adult, COMPAS Recidivism, and MIMIC-III Clinical—reveals that GAN-generated synthetic data successfully replicates demographic characteristics whilst improving fairness metrics. Notably, Demographic Parity and Equality of Opportunity scores showed marked improvement for both race and gender attributes, with minimal impact on model accuracy. The findings indicate that synthetic data can effectively address demographic imbalances without compromising predictive performance, achieving KL Divergence values below 0.003 for categorical features and comparable accuracy scores (0.87 versus 0.88) between synthetic and original datasets.
These results hold significant implications for AI development, particularly in bias-sensitive domains such as healthcare and criminal justice. However, the study acknowledges that successful implementation requires rigorous validation protocols and human oversight. Future research should explore the integration of human-in-the-loop systems to enhance the quality and fairness of synthetic data generation. The findings support synthetic data as a viable solution for developing more equitable AI systems, particularly within regulatory frameworks emphasising fairness and privacy.
