Abstract
International migrants comprised 14% of the UK population in 2020, but migrant health in the UK has rarely been studied at a population level using primary care electronic health records (EHRs). Given the difficulty of determining migration status from EHRs, this study developed a migration phenotype and assessed its validity. We developed a phenotyping algorithm using codes for country of birth, visa status, non-English main/first language and non-UK origin. The algorithm was applied to a Clinical Practice Research Datalink (CPRD) GOLD database of 16,071,111 primary care patients between 1997 and 2018. We compared the completeness and representativeness of the identified migrant population with Office for National Statistics (ONS) country of birth and 2011 census data by year, age, sex, geographic region of birth and ethnicity. Between 1997 and 2018, the phenotype identified 403,768 migrants (2.51% of the CPRD GOLD population). Of these, 178,749 (1.11% of the CPRD GOLD population) were identified by codes indicating foreign country of birth or visa status, 216,731 (1.35%) by codes indicating a non-English main/first language, and 8,288 (0.05%) by codes indicating non-UK origin. The cohort's distribution by sex and region of birth was similar to that in ONS migration statistics. Recording of migration status improved over time: the phenotype identified approximately one-tenth of the proportion of migrants expected from ONS data in 2004, rising to a quarter in 2018. Younger migrants were better represented than those aged 50 and over. The migration phenotype identified a large number of migrants and can be used to undertake large-scale migration health research in CPRD GOLD to inform healthcare policy, practice and action. While the cohort was representative of the UK migrant population in terms of sex and region of birth, migration status was under-recorded in earlier years and at older ages, and findings from future studies of these groups should therefore be interpreted with caution.
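
The category counts reported above sum to the cohort total, which suggests that each migrant is assigned to exactly one category. A minimal Python sketch of such an assignment follows, assuming a priority order of country of birth or visa status, then language, then origin. The priority order, the category labels, the `classify` and `percent` helpers and the toy records are all hypothetical illustrations and are not taken from the study's code lists or CPRD GOLD itself.

```python
from collections import Counter

# Hypothetical category labels, ordered from highest to lowest priority.
# The ordering is an assumption made for illustration only.
PRIORITY = ["birth_or_visa", "language", "origin"]


def classify(categories):
    """Return the highest-priority migration category present, or None."""
    for category in PRIORITY:
        if category in categories:
            return category
    return None


def percent(count, total):
    """Express count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Toy records (not real data): patient id -> categories of codes on record.
patients = {
    "p1": {"birth_or_visa", "language"},
    "p2": {"language"},
    "p3": set(),  # no migration-related codes: not identified as a migrant
    "p4": {"origin"},
    "p5": {"language", "origin"},
}

labels = (classify(codes) for codes in patients.values())
counts = Counter(label for label in labels if label is not None)

# Reproduce the headline percentage from the figures reported in the abstract.
overall = percent(403_768, 16_071_111)
```

Because each patient receives a single label, the category counts always sum to the number of identified migrants, mirroring 178,749 + 216,731 + 8,288 = 403,768 in the reported results.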