Applying the agent-based modeling paradigm to epidemics requires two major components: the population and the disease model. Epidemiological Agent-Based Modeling relies critically on the definition of a population that is usually synthetic. Such a population is built from two kinds of data: aggregated and disaggregated. Aggregated data entails statistical summaries or distributions of various population characteristics, often represented in the form of statistical tables, such as the count of individuals within specific age groups or income brackets. Disaggregated data, on the other hand, consists of individual-level records for persons or households, encompassing diverse attributes such as age, income, and gender.
The first major challenge of an epidemiological ABM is the generation of the synthetic population with all its characteristics and properties. In particular, a synthetic population requires the definition of the agents and the properties that characterize them, their spatial placement, and finally their relations. In fact, this is usually the order in which the complete synthetic population is generated. Traditionally, population synthesis can be achieved through three approaches [6]: 1) Synthetic Reconstruction (SR) (e.g., [7]), 2) Combinatorial Optimization (CO) (e.g., [8]), and 3) Statistical Learning (SL) (e.g., [9,10]). Recently, a small number of studies have adopted machine learning (ML) approaches (e.g., [11]) that fall under the umbrella of SL but efficiently support the generation of synthetic populations with a high number of characteristics/traits, which is not the focus of this paper. In principle, both SR and SL are based on samples of the population, although sample-free SR methods have also been devised. Sample-free methods try to rebuild disaggregated population data from the aggregated real population data, while sample-based methods try to generate the entire population by replicating the available disaggregated data, which is a sample of the real population. Indeed, in [12] a sample-free method is compared with a similar sample-based method, leading to the conclusion that the sample-free method is superior regarding the population fitting although it requires more preprocessing. In [6] these approaches are compared for generating a two-layered population (individuals and households), and a decision-making procedure is proposed for choosing the best available approach based on the characteristics of the data that describe the real population.
Initially, the synthetic agents and their properties need to be defined, resembling the target population with respect to various statistical measures (e.g., age distribution among agents). The key objective is to reduce the disparity between the synthetic population and the actual population concerning these statistical measures.
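As an illustration of how such a disparity can be reduced, the following minimal sketch applies iterative proportional fitting (IPF), a technique commonly used in synthetic reconstruction, to a small two-attribute table. The seed counts, the marginal totals, and the function names `ipf` and `total_absolute_error` are hypothetical and chosen only for illustration; they are not taken from any of the cited works.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a seed table until its margins match the target totals."""
    table = [list(row) for row in seed]
    for _ in range(max_iter):
        # Row step: scale every row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        # Column step: scale every column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        if total_absolute_error(table, row_targets, col_targets) < tol:
            break
    return table


def total_absolute_error(table, row_targets, col_targets):
    """Sum of absolute differences between table margins and targets."""
    row_err = sum(abs(sum(row) - t) for row, t in zip(table, row_targets))
    col_err = sum(
        abs(sum(row[j] for row in table) - t) for j, t in enumerate(col_targets)
    )
    return row_err + col_err


# Hypothetical sample cross-tabulation: 3 age groups (rows) x 2 sexes (columns).
seed = [[10.0, 12.0], [20.0, 18.0], [8.0, 12.0]]
# Hypothetical aggregated (census-like) totals; both sets sum to 1000.
age_totals = [300.0, 500.0, 200.0]
sex_totals = [480.0, 520.0]

fitted = ipf(seed, age_totals, sex_totals)
error_before = total_absolute_error(seed, age_totals, sex_totals)
error_after = total_absolute_error(fitted, age_totals, sex_totals)
print(f"disparity before: {error_before:.3f}, after: {error_after:.2e}")
```

The fitted cell counts can then be used as weights for drawing synthetic individuals; other disparity measures could be substituted for the total absolute error used here.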
2.1. Related Work
Before discussing ABM-based systems for virus propagation that contain a population synthesis module, we briefly discuss open-source approaches to population synthesis in general (we do not consider approaches whose code has not been published). The creation of synthetic populations (or ecosystems, to be more precise) dates back to 1996 with the Transportation Analysis Simulation System (TRANSIMS) [15], which simulates traffic patterns on synthetic individuals. In a rather outdated survey on population synthesizers [16], the goal was set to create a general software solution. Indeed, in 2017 a general framework for generating Synthetic Populations and Ecosystems of the World (SPEW) was implemented [17]. SPEW supports various sampling methods for constructing the synthetic population and placing it within a geographic region. Given that appropriate data exists, SPEW can create a synthetic ecosystem for different agents. Its authors also claim to have generated synthetic populations for over 70 countries worldwide (including Greece) based on data taken from the Integrated Public Use Microdata Series International (IPUMS-I) [18]. However, the web domain is deactivated and the synthesized data cannot be found elsewhere.
Understanding the current state of medical knowledge about COVID-19 and considering demographic factors are crucial when developing strategies to mitigate its spread. Simulation models can aid policymakers in making informed decisions by taking into account the prevailing situation. Agent-based modeling (ABM), which incorporates human behavior and interactions, has proven particularly useful in studying the spread of COVID-19. A comprehensive literature search yielded several papers that focus on agent-based modeling of COVID-19 transmission pathways (a representative small subset is shown in
Table 1 and discussed further below). These models vary in their objectives, the number of simulated individuals (agents), the geographical areas they represent, and their approaches to modeling transmission patterns, illness states, human behavior, and treatments. However, a gap exists between policymakers’ requirements and the capabilities of simulation models in accurately reflecting real-world factors that influence human decision-making and transmission dynamics [
19].
[
20] reports on the results of agent-based modeling of the COVID-19 outbreak in
Australia using a fine-grained computational simulation. This model has been calibrated to match important COVID-19 transmission parameters. The age-dependent fraction of symptomatic cases is a key calibration outcome, with this fraction for children being one-fifth of that for adults. The model is used to compare a variety of intervention strategies, including international flight restrictions, case isolation, home quarantine, social distancing with varied levels of compliance, and school closures. School closures do not appear to provide significant benefits unless they are accompanied by a high level of social distancing compliance.
In [
21], a stochastic agent-based microsimulation model of the COVID-19 epidemic in
France is presented. The model also assesses the possible effects of post-quarantine interventions, such as social isolation, mask usage, and population shielding of those most susceptible to severe COVID-19 infection on the disease’s overall incidence and death, as well as on the utilization of ICU beds. The model was accurately calibrated, and changes in model parameter values had no effect on the estimations of the results. The authors concluded that even while quarantine is efficient in stopping viral transmission, no matter how long it lasts, it is unlikely to stop the epidemic from spreading again.
In [
22], an agent-based model for towns in
Ireland was created. This data-driven agent-based model, built from publicly accessible data, mimics the spread of an airborne infectious disease in an Irish town. The model was tested by recreating a measles outbreak that happened in Schull, Ireland in 2012. The same outbreak was then replicated in 33 different towns, and the relationships between the model’s output and the attributes of each town (such as population, area, vaccination rates, and age distribution) were examined to see whether these attributes affect the model’s output.
In [
23] a multilayer network with an extended SIR disease model taking into account homes, transport, workplaces, schools, religious activities, and random encounters was considered for the COVID-19 virus in
Brazil. Due to efficiency issues, they considered only a population of
agents that matches census data of Brazil and then scaled up their results. By explicitly calculating the demand for hospital ICUs in the case where schools and universities are closed, social isolation of individuals over sixty is imposed, and home quarantine is proposed on a voluntary basis, they show that the Brazilian health system cannot cope with the demand.
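To make the idea of a multilayer contact network more concrete, the following minimal sketch computes the daily infection probability of one agent from its infectious contacts in each layer. It is not the model of [23]: the layer names, the group memberships, the per-contact transmission probabilities in `betas`, and the function name `infection_probability` are all hypothetical, and independent transmission across contacts is assumed.

```python
def infection_probability(agent, layers, betas, infectious):
    """Probability that `agent` is infected today by at least one contact.

    layers:     dict mapping a layer name to a list of groups (sets of agent ids)
    betas:      dict mapping a layer name to a per-contact transmission probability
    infectious: set of currently infectious agent ids
    """
    p_escape = 1.0
    for name, groups in layers.items():
        for group in groups:
            if agent not in group:
                continue
            # Count infectious members that share this group with the agent.
            k = sum(1 for other in group if other != agent and other in infectious)
            # The agent must escape infection from each of them independently.
            p_escape *= (1.0 - betas[name]) ** k
    return 1.0 - p_escape


# Hypothetical toy population of five agents spread over three layers.
layers = {
    "home": [{0, 1, 2}, {3, 4}],
    "work": [{0, 3}, {1, 4}],
    "school": [{2}],
}
betas = {"home": 0.10, "work": 0.05, "school": 0.05}  # illustrative values only
infectious = {1, 3}

p_agent_0 = infection_probability(0, layers, betas, infectious)
p_agent_2 = infection_probability(2, layers, betas, infectious)
print(p_agent_0, p_agent_2)
```

Closing a layer (for example, schools) can be emulated in this sketch by setting its entry in `betas` to zero.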
EnerPol is another agent-based simulation framework used in disease spread in [
24] for influenza and COVID-19 epidemics in
Switzerland. They offer a stochastic model for daily activities as well as a mobility model, and they also take the weather into account at a mesoscopic level. All this information is extracted from publicly available data. The daily activities model generates contacts when agents are in the same place (workplace, school, etc.), while the mobility model accounts for public and private transportation. For the mobility model, the road network, the train network, and other public transportation (e.g., buses or airports) have been taken into account. For approximately 9 million agents over a 3-month period with sub-hourly time steps, a single scenario requires a few hours to run on a GPU.
A less sophisticated model called REINA (2020) (Realistic Epidemic Interaction Network Agent) [
25] maintains 1.6 million agents in the region of Helsinki University Hospital,
Finland. The implementation is open-source and one instance runs within a few seconds on their online platform. The agents are individuals with certain properties, while the epidemic model is essentially an SEIR with additional states related to hospitalization or ICU. The contact network is rather simple: each day, an agent has a number of random contacts drawn from an age-dependent distribution. Thus, the contacts are essentially random (homogeneous population mixing) and no attempt is made to model real social networks (e.g., working environments).
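The homogeneous-mixing scheme described above can be sketched as follows. This is not the REINA implementation: for brevity, every infectious agent has a fixed number of daily contacts rather than an age-dependent one, the hospitalization and ICU states are omitted, and all parameter values and the function name `simulate` are hypothetical.

```python
import random

S, E, I, R = "S", "E", "I", "R"


def simulate(n_agents=500, n_days=60, contacts_per_day=8, p_transmit=0.05,
             latent_days=3, infectious_days=5, n_seed=5, seed=42):
    """Agent-based SEIR with uniformly random daily contacts."""
    rng = random.Random(seed)
    state = [S] * n_agents
    timer = [0] * n_agents  # days left in the current E or I state
    for a in rng.sample(range(n_agents), n_seed):
        state[a], timer[a] = I, infectious_days

    history = []
    for _day in range(n_days):
        # 1) Each infectious agent meets random agents (homogeneous mixing).
        newly_exposed = set()
        for a in range(n_agents):
            if state[a] != I:
                continue
            for _contact in range(contacts_per_day):
                b = rng.randrange(n_agents)
                if b != a and state[b] == S and rng.random() < p_transmit:
                    newly_exposed.add(b)
        # 2) Progress agents that are already exposed or infectious.
        for a in range(n_agents):
            if state[a] in (E, I):
                timer[a] -= 1
                if timer[a] == 0:
                    if state[a] == E:
                        state[a], timer[a] = I, infectious_days
                    else:
                        state[a] = R
        # 3) Today's infections enter the exposed state.
        for b in newly_exposed:
            state[b], timer[b] = E, latent_days
        history.append({s: state.count(s) for s in (S, E, I, R)})
    return history


history = simulate()
print(history[-1])
```

Replacing the uniform draw in step 1 with a draw from household, school, or workplace groups would turn this random-mixing sketch into a structured contact network.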
FRED (Framework for Reconstructing Epidemic Dynamics) is tailored for epidemic diseases in the
USA. FRED is an open-source agent-based modeling system that is free to use and closely based on models used in earlier studies of the pandemic flu. FRED makes use of open-access census-based synthetic populations that accurately reflect the demographic and geographic diversity of the population, as well as social networks in the workplace, in homes, and in schools. Every state and county in the United States as well as a few other countries presently have access to FRED epidemic models [
26].
In [
27], the authors use an SEIR model for case importation and an individual-based model (IBM) for modeling the spread of pandemic influenza in
Italy. The impact of various control strategies was assessed. Travel destinations that matched the information from the 2001 census for the 57 million Italians were used. Several
values (
,
, and 2) to assess the effect of control methods (vaccination, antiviral prophylaxis, international air travel restrictions, and increased social distancing) were used.
In [
28], an age-structured agent-based model of the
Canadian population was created to simulate how public health interventions at current and projected levels may affect the spread of SARS-CoV-2. Case identification and isolation, contact tracing and quarantine, physical distancing, and community closures were among the interventions tested separately and in combination.
In [
29] they simulate the spread of COVID-19 Omicron by using an innovative three-dimensional agent-based model that takes into account
Hong Kong’s vertically extended hyperdense urban environment. The model evaluated the efficacy of the "zero-COVID" initiatives, such as citywide lockdown and compulsory universal testing (CUT), that were under discussion during the Omicron wave in Hong Kong. It was found that even quicker and stricter execution was required for such rigorous interventions to be successful. They conclude that adaptable long-term strategies for controlling and preventing future epidemics should also be taken into consideration.
In [
30], the COVID-19 spread among the 11.2 million residents of Shenzhen City,
China, was replicated using a spatially explicit agent-based model. This was achieved by integrating large-scale mobile phone tracking records, census data, and building features. After being validated against local epidemiological evidence, the model was used to determine the likelihood of a COVID-19 resurgence if sporadic cases appeared in a city that had fully reopened. Combined scenarios of three crucial non-pharmaceutical interventions (contact tracing, mask-wearing, and rapid testing) were evaluated at different degrees of public compliance.
In [
31], an agent-based model is presented that replicates the spatiotemporal patterns of the COVID-19 epidemic. The effects of various COVID-19 outbreak control strategies, including office closures, social distancing, and closing of schools and educational facilities, in Urmia City,
Iran are examined. All control methods used in Urmia city together with the accompanying actions of each control strategy were incorporated into the ABM. The transmission of COVID-19 between human agents was replicated using the SEIRD propagation model.
In [
32], an agent-based model named INFEKTA is proposed for modeling the spread of infectious diseases subject to social distancing regulations. INFEKTA combines demographic data (population density, age, and different types of people) from geographical regions of the actual town or city under investigation with the transmission dynamics of a particular disease (according to parameters found in the literature). Agents (virtual people) move through a complex network of accessible places defined on a Euclidean space that represents a town or city, in accordance with their mobility patterns and the imposed social distancing policy. With one million virtual people, INFEKTA simulates the COVID-19 transmission dynamics in Bogotá, the capital of
Colombia, under various social distancing policies. Based on a sensitivity analysis of the effects of social distancing policies, they concluded that the implementation of "medium" strength social distancing policies (i.e., closure of
of the sites) significantly reduces the spread of the disease.
In [
33], COVID-19 propagation modeling results for several mitigation and confinement scenarios are presented for the Madrid,
Spain metropolitan region. These scenarios were implemented and tested using EpiGraph, an epidemic model that has been extended to replicate COVID-19 spread. EpiGraph analyzes connection patterns in social networks to create a social interaction model that reflects a variety of individual and group traits as well as their specific links. Along with the epidemiological and social interaction components, a transportation model is used to simulate how individuals move across short and long distances. These characteristics give EpiGraph the ability to replicate the COVID-19 evolution and identify the medium-term consequences of the virus under mitigation techniques, in addition to the ability to model scenarios with millions of people and apply various contention and mitigation mechanisms. In the Madrid metropolitan region, EpiGraph produces infection and death curves that closely match those of the first wave, attaining comparable seroprevalence levels. The authors demonstrate a reduction of the death toll when a selective lockdown policy for the elderly (over 60) is imposed. Additionally, the impact of mask-wearing after the initial wave was considered, demonstrating that the proportion of people who wear masks as instructed is a key element in limiting the spread of the virus.
In [
34], a method for calculating the level of immunization in the Austrian population was presented, together with a discussion of possible repercussions for herd immunity effects. A calibrated agent-based simulation model of the COVID-19 epidemic in
Austria is used to determine vaccination rates. The number of vaccinated individuals may be determined from the generated synthetic individual-based statistics. Then, by varying the acquired degree of vaccination in simulations of a hypothetical uncontrolled epidemic wave, the pandemic’s course was extrapolated to show potential implications for the effective reproduction number.
In [
35], an iterative process based on data from land use, questionnaire surveys, and population censuses was used to create a synthetic population for
American Samoa. The population serves as the foundation for an agent-based model created primarily to close knowledge gaps regarding the transmission and eradication of lymphatic filariasis while also being easily adaptable to mimic other infectious diseases. The statistically realistic population and household structure, as well as the high-resolution geographic placements of households, were characteristics of the synthetic population. From 2010 to 2050, the population was simulated over a 40-year period. The projected and estimated populations from the U.S. Census Bureau were contrasted with those from the simulation. The findings suggested that contrary to the huge number of emigrants that were seen, the total population would continue to decline. The study indicated that the population was ageing, consistent with the estimates from the Bureau and the two latest population censuses. The examination of sex ratios across various age groups indicated a rise in the percentage of males in both the 0–14 and 15–64 age brackets. Concerning household size, the simulation consistently followed a Gaussian distribution, with an average size close to
. Interestingly, this average size was slightly lower than the initial average size of
.
In [
36], a population-based prospective study on mixing patterns in eight European countries was conducted using a conventional paper-diary method. Mixing patterns and contact characteristics were found to be remarkably similar across the countries. Contact patterns were strongly assortative with age, with schoolchildren and young adults in particular tending to mix with people of similar ages. Preliminary modeling indicates that, during the initial phase of an epidemic of a new infection spread through social contacts, when the population is fully susceptible, children aged 5 to 19 will have the highest incidence.
In [
37], the researchers focus on developing a high-resolution, data-driven agent-based model to analyze the spread of COVID-19 in five Spanish cities: Barcelona, Valencia, Seville, Zaragoza, and Murcia. Utilizing synthetic populations based on multiple data sources, the model incorporates detailed interaction environments through multilayer networks, considering home, nursing home, school, work, university, and community layers. The research aims to simulate and assess the impact of various non-pharmaceutical interventions on COVID-19 transmission. The work addresses the need for quantitative approaches to characterize intervention impacts, which can vary based on cultural, regional, and population-specific circumstances. By presenting a detailed framework, the study offers a tool for simulating different intervention scenarios, contributing to evidence-based decision-making in managing the pandemic. The model’s effectiveness is demonstrated through a case study, illustrating the impact of key interventions in the studied cities.
In [
38], the limitations of existing models at capturing COVID-19’s impact on human mobility at a neighborhood level are addressed. Employing an agent-based model (VIABLE), the study simulates individual mobility choices based on social activities in neighborhoods, focusing on Porto Alegre,
Brazil. The model considers agents’ adaptation to exposure risks and their impact on well-being, revealing temporal variations and segregation in mobility patterns among agents with different vulnerability levels. It highlights the shift in mobility choices during the pandemic, influenced by socio-demographic factors like age, car ownership, and economic status. While previous studies explored mobility tendencies at larger scales, this model aims to bridge the gap, providing insights into individual-level adaptations and neighborhood-specific mobility patterns under COVID-19, offering a nuanced perspective for urban planning and public health interventions.
In [
39], the spread of a viral infection was modeled using agents representing citizens of the Moscow Oblast,
Russia. In [
40], an agent-based model framework was created to predict the
Liberian Ebola epidemic in 2014–2015 and then used for Ebola forecasting. GSAM [
41] is a global-scale (billions of agents) ABM framework written in Java. It is a high-performance distributed platform for agent-based epidemic modeling that can simulate a disease outbreak in a population of several billion agents. Its efficiency depends critically on the spatial homogeneity of the population at a specific granularity level.
There are numerous other relevant research works. In [
42], the applications of three simulation approaches (System Dynamics Model - SDM, Agent-Based Model - ABM, and Discrete Event Simulation - DES) and their hybrids in COVID-19 research are systematically reviewed. Out of 372 eligible papers, 72 focused on COVID-19 transmission dynamics, 204 evaluated interventions, 29 predicted the pandemic, and 67 investigated the impacts of COVID-19. ABM was the most widely used simulation method (275 papers), followed by SDM (54 papers), DES (32 papers), and hybrid models (11 papers). The primary focus was on evaluating and designing intervention scenarios, accounting for 204 of the 372 papers.
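The counts reported in the review can be converted into shares with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the dictionary and function names are hypothetical.

```python
TOTAL_PAPERS = 372

# Counts as quoted from the systematic review [42].
papers_by_focus = {
    "transmission dynamics": 72,
    "intervention evaluation": 204,
    "pandemic prediction": 29,
    "impact assessment": 67,
}
papers_by_method = {"ABM": 275, "SDM": 54, "DES": 32, "hybrid": 11}


def shares(counts, total):
    """Return each category's percentage of the total."""
    return {name: 100.0 * n / total for name, n in counts.items()}


focus_shares = shares(papers_by_focus, TOTAL_PAPERS)
method_shares = shares(papers_by_method, TOTAL_PAPERS)

for name, pct in {**focus_shares, **method_shares}.items():
    print(f"{name}: {pct:.1f}%")
```

Both groupings sum to the 372 eligible papers, so each set of shares adds up to 100%.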
Table 1.
Indicative Agent-Based Models comparison per country
| Country | Population Creation | Number of Agents | Model type | Infection Model | Year | Reference |
|---|---|---|---|---|---|---|
| Australia | census, national data sources | 0.5m | several mixing groups | SEIR | 2020 | [20] |
| France | previous work, papers | 0.5m extrapolated to 67m | stochastic microsimulation ABM | not defined | 2020 | [21] |
| Ireland | census mainly | 0.1m | NetLogo User Community Models | SEIR | 2018 | [22] |
| Brazil | census | 10m | multi-layer network | SIRD | 2020 | [23] |
| Switzerland | synthetic population from census | 9m | ABM and a stochastic model that simulates, on a sub-hourly timescale, the different daily activities of all individuals | not defined | 2020 | [24] |
| Finland | census statistics | 1.6m | random interactions | SEIR | 2020 | [25] |
| USA | synthetic population from census | 30m | mixing patterns | SEIRS | 2013 | [26] |
| Italy | census | 57m | multi-layer network | SEIR | 2008 | [27] |
| Canada | projections | not defined | multi-layer network | SEIR | 2020 | [28] |
| Hong Kong | synthetic population from census | 0.73m | three-dimensional vertically expanded | not defined | 2022 | [29] |
| Shenzhen, China | mobile phone records, census | 11.2m | spatially explicit ABM | SLIR | 2021 | [30] |
| Urmia, Iran | census and spatial data | 0.75m | mobile & static agents | SEIRD | 2020 | [31] |
| Bogotá, Colombia | synthetic population from census | 9m | random network | SEIRMD | 2021 | [32] |
| Madrid, Spain | census and social network data | 5m | multi-layer network | SEIR | 2021 | [33] |
| Austria | census | 9m | multi-layer network | not defined | 2022 | [34] |
| Moscow Oblast, Russia | census | 10m | multi-layer network | SLIR | 2022 | [39] |
| American Samoa | census, questionnaires and land usage | 0.055m | age and household distribution, population evolution | not defined | 2017 | [35] |