3. Results
Figure 1 shows changes in false omission rates, RFO, as a function of prevalence from 0 to 100%. The red curves reflect the results of testing three times for asymptomatic home self-testers participating in the Collaborative Study. Please see the inset table for details.
Repeating RAgTs improved sensitivity from an initial 34.4% to 55.3% on the first repetition and 68.5% on the second repetition when singleton RT-PCR positives were included. The initial PB was 7.37% (red dot). However, subsequent PBs (10.46%, 2nd test; 14.22%, 3rd test) did not improve as theoretically predicted for successive repetitions.
In community settings and hotspots with prevalence >7.37%, the RFO curves predict that more than 1 in 20 diagnoses will be missed with the first test, while with the second and third tests, RFO breaches will occur at 10.46% and 14.22% prevalence, respectively.
Relaxation of the RFO threshold to 10%, 20%, and 33.3% for the third test generates unacceptable levels of missed diagnoses (1 in 10, 1 in 5, and 1 in 3, respectively) as the PB moves up and to the right at 25.9, 44.1, and 61.2% prevalence, indicated by the red symbols (see inset table) on the exponentially increasing red curve for the third test.
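As a rough illustration of how an RFO curve behaves, the sketch below computes the rate of false omissions from its standard definition, RFO = FN/(FN + TN), as a function of prevalence. The function name `rfo` and the choice of Tier 2 inputs (95% sensitivity, 97.5% specificity) are assumptions made for illustration only; the curves in Figure 1 are not regenerated here.

```python
def rfo(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Rate of false omissions, FN / (FN + TN), per unit of population tested."""
    false_neg = prevalence * (1.0 - sensitivity)   # infected but test negative
    true_neg = (1.0 - prevalence) * specificity    # not infected and test negative
    return false_neg / (false_neg + true_neg)

# Illustrative inputs (assumed): Tier 2 sensitivity and specificity.
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"prevalence {p:.0%}: RFO = {rfo(p, 0.95, 0.975):.2%}")
```

With these inputs the RFO is still below 5% at 50% prevalence and climbs steeply beyond it, which is the behavior the PB is meant to flag.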
The second repetition of the RAgTs (third RAgT) did not achieve the World Health Organization (WHO) performance criteria [10] (blue dot and curve) for RAgT sensitivity of at least 80% and generated a PB of only 14.22% (RFO = 5%), which is 69.9% of the PB (20.34%) calculated (using Eq. 24) for the WHO specifications.
The highest levels of performance in Figure 1 were attained by the home molecular loop-mediated isothermal amplification (LAMP, purple curve) assay median performance [11] (sensitivity 91.7%, specificity 98.2%, and PB 38.42%), the mathematically predicted performance of a Tier 2 test (PB 50.6%, green curve), and the Tier 2 repeated test (PB 95.2%, large green dot). Tier 2 sensitivity is 95%; for RFO = 5%, the predicted PB would increase by 44.6% to 95.2% when the test is repeated.
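The Tier 2 figures can be approximated with a short sketch. Eq. 24 is not reproduced in this section, so the closed form below is an assumption: it is obtained by solving the standard RFO definition for prevalence. The repeated-test model (combined sensitivity 1 − (1 − Se)², combined specificity Sp²) is also an assumption that treats the two tests as independent. The function names and the Tier 2 specificity of 97.5% (taken from the recommendations later in the paper) are used for illustration.

```python
def prevalence_boundary(sens: float, spec: float, rfo: float = 0.05) -> float:
    """Prevalence at which the rate of false omissions reaches `rfo`."""
    return rfo * spec / ((1.0 - sens) * (1.0 - rfo) + rfo * spec)

def repeated(sens: float, spec: float, n_tests: int = 2) -> tuple[float, float]:
    """Combined (sensitivity, specificity) for n independent tests (assumed model):
    any positive result counts as positive, so every test must be negative."""
    return 1.0 - (1.0 - sens) ** n_tests, spec ** n_tests

tier2 = (0.95, 0.975)  # Tier 2 sensitivity and specificity
pb_once = prevalence_boundary(*tier2)
pb_twice = prevalence_boundary(*repeated(*tier2))
print(f"Tier 2 PB, one test: {pb_once:.1%}")
print(f"Tier 2 PB, repeated: {pb_twice:.1%}")
print(f"Gain in PB:          {pb_twice - pb_once:.1%}")
```

Under these assumptions the sketch returns values close to the 50.6% and 95.2% quoted above; small differences from other PBs in the inset table may arise from rounding of the inputs.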
Figure 1.
False Omission Rates Increase Exponentially with Prevalence. The median performance of a home molecular diagnostic test (HMDx LAMP, purple curve) performed only once beats that for three serial RAgTs in the Collaborative Study. A repeated Tier 2 test (green curve rising on the right) will not miss more than 1 in 500 diagnoses until the prevalence exceeds 43.8%, then 1 in 200 up to 65.9% prevalence, and subsequently 1 in 100 up to 79.4%, 1 in 50 up to 88.6%, and 1 in 20 (large green dot) up to 95.24%. Abbreviations: HMDx, home molecular diagnostic; LAMP, loop-mediated isothermal amplification; NPA, negative percent agreement; PB, prevalence boundary; RAgT, rapid antigen test; RFO, rate of false omissions; and WHO, World Health Organization.
Figure 2 displays gain in the prevalence boundary, ∆PB, on the vertical (y) axis versus sensitivity of the test on the horizontal (x) axis. Initially, the ∆PB curve is relatively shallow. As sensitivity increases, the curve peaks at a sensitivity of 91.0 to 91.4% (see the magnifier at the top). The curves cluster together because of the small span in specificity (please see the left column of the inset table). The magnifier at 25% ∆PB shows that the relative order within the cluster is the same as the ranking by specificity in the inset table.
The right-hand columns of the inset table in Figure 2 list actual PBs and theoretical predictions. For the Collaborative Study, the gain in PB obtained with the first repeated test, 3.09%, approximated that predicted, 3.37%. With the second repetition (third test), the gain in PB of 3.76% was only 37.1% of the 10.13% predicted. There is no clear explanation for the meager improvement.
The PBs for the second and third tests, 10.46% and 14.22%, respectively, lagged behind the theoretical predictions of 10.74% and 20.59%, respectively. The two red boxes show where the repetition points lie on the red ∆PB curve and explain the progression of PBs. The arrows point to the coordinates of ∆PB (y axis) and sensitivity (x axis).
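The arithmetic behind these comparisons can be checked directly from the PB values quoted in the text. The variable names below are illustrative, and the predicted gain is assumed to be measured from the PB actually observed on the preceding test, which is what reproduces the 3.37% and 10.13% figures.

```python
# PB values (%) quoted in the text for the Collaborative Study.
actual_pb = [7.37, 10.46, 14.22]   # first, second, and third test
predicted_pb = [10.74, 20.59]      # theoretical PB for the second and third tests

# Observed gain at each repetition.
actual_gain = [b - a for a, b in zip(actual_pb, actual_pb[1:])]
# Predicted gain relative to the PB observed on the preceding test (assumption).
predicted_gain = [p - a for p, a in zip(predicted_pb, actual_pb)]

for k, (got, want) in enumerate(zip(actual_gain, predicted_gain), start=1):
    print(f"repetition {k}: gain {got:.2f}% vs predicted {want:.2f}% "
          f"({got / want:.1%} of predicted)")
```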
Looking back at Figure 1, we see that for RFO = 5%, the median of home molecular diagnostic LAMP tests (HMDx, purple curve) performs better with just one test than three serial RAgTs and beats WHO performance by positioning itself between the Tier 1 and Tier 2 RFO curves. In general, the plot of ∆PB versus sensitivity in Figure 2 reveals that when one tolerates 1 in 20 missed diagnoses, repeating a test will not increase the PB maximally unless the sensitivity is 91.03-91.41%.
In Figure 2 the curves cluster together (see magnifiers) in the right-skewed peak shape because specificity is uniformly high (95-99.2%), and its range is small. The rate of gain in ∆PB depends primarily on sensitivity (x axis) and follows the slope of the curve cluster. The slope is highest from about 75-85%, which implies test performance has the most to gain there. This mathematical analysis is not exclusive to COVID-19 testing. It applies to other positive/negative qualitative diagnostic tests for infectious diseases and can help optimize future assay design.
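The location of the peak can be explored numerically. Eq. 26 is not shown in this section, so the sketch below assumes that ∆PB is the PB of a test repeated once minus the PB of the single test, with repetitions treated as independent (combined sensitivity 1 − (1 − Se)², combined specificity Sp²) and RFO fixed at 5%. The function names, the sensitivity grid, and the three specificities (the ends of the quoted 95-99.2% range plus the Tier 2 value) are illustrative choices.

```python
def prevalence_boundary(sens, spec, rfo=0.05):
    """Prevalence at which the rate of false omissions reaches `rfo`."""
    return rfo * spec / ((1.0 - sens) * (1.0 - rfo) + rfo * spec)

def delta_pb(sens, spec, rfo=0.05):
    """Gain in PB from one repetition, assuming independent tests."""
    sens2 = 1.0 - (1.0 - sens) ** 2   # either test positive counts as positive
    spec2 = spec ** 2                 # both tests must be negative
    return prevalence_boundary(sens2, spec2, rfo) - prevalence_boundary(sens, spec, rfo)

grid = [s / 1000 for s in range(300, 1000)]   # sensitivity 30.0% to 99.9%
peaks = {}
for spec in (0.95, 0.975, 0.992):
    peaks[spec] = max(grid, key=lambda s: delta_pb(s, spec))
    print(f"specificity {spec:.1%}: Delta PB peaks near sensitivity {peaks[spec]:.1%}")
```

Under these assumptions the peak stays close to 91% sensitivity across the specificity range and shifts only slightly with specificity, consistent with the clustering of the curves described above.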
Figure 2.
Gain in Prevalence Boundary as a Function of Test Sensitivity. This figure illustrates three key findings: 1) The curves cluster together because of the narrow range in clinical specificity (95% to 99.2%), which means that the primary driver of the increase in prevalence boundary (∆PB) is sensitivity; 2) The shallow shape of the curves on the left emphasizes how little is gained by repeating RAgTs that start with low sensitivity; and 3) Only when sensitivity is 91.0-91.4% will a repeated test maximally increase the prevalence boundary, as shown by the peaks on the right, making the tests more useful in settings of different prevalence because missed diagnoses are minimized. Please see the inset table for performance metrics. The curves were created using Eq. 26. Abbreviations: ∆PB, the increase in PB with repeated testing; PB, prevalence boundary; RAgT, rapid antigen test; RFO, rate of false omissions; T1, Tier 1; T2, Tier 2; and WHO, World Health Organization.
4. Discussion
Clinical evaluations show that the specificity of COVID-19 RAgTs is high [7]. In Figure 2 the ∆PB curves cluster together because the range of specificity (95-99.2%) is narrow. Therefore, the degree to which a repeated RAgT increases the PB depends primarily on the test sensitivity. Investigators have addressed the sensitivity of RAgTs in various settings.
In hospitalized patients, Kweon et al. [12] found that for RT-PCR cycle thresholds of 25-30, point-of-care antigen test sensitivity ranged from 34.0% to 64.4% with higher sensitivity within the first week. Hirotsu et al. [13] reported antigen testing exhibited 55.2% sensitivity and 99.6% specificity in 82 nasopharyngeal specimens from seven hospitalized patients tested serially.
In twenty community clinical evaluations of asymptomatic subjects, RAgT sensitivity ranged from 37% to 88% (median 55.75%) and specificity from 97.8% to 100% (median 99.70%) [8]. During a nursing home outbreak, McKay et al. [14] documented a RAgT sensitivity of 52% in asymptomatic patients.
In correctional facilities, Lind et al. [15] showed that serial RAgTs had higher but diminishingly different sensitivities for symptomatic versus asymptomatic residents. In a university setting, Smith et al. [16] found that serial testing multiple times per week increased the sensitivity of RAgTs. Wide variations in sensitivity in these studies and others indicate that for RAgTs to rule out disease, performance should be improved and also more consistent with less uncertainty [2].
Asymptomatic infections highlight the need to moderate false negatives, that is, curtail missed diagnoses and assure that repeating RAgTs shifts PBs to the right to mitigate spread of disease. The schematic in Figure 3 illustrates how missed diagnoses might trigger dysfunctional outcomes. Starting in the top left, highly specific tests may generate false positives when prevalence is very low (e.g., <2%) [1]. For graphs of false positive to true positive ratios versus prevalence, please see Figure 1 in reference [1].
Figure 3.
Potential Vicious Cycle Fueled by Repeating Poorly Performing Rapid Antigen Tests. Poorly performing RAgTs can perpetuate virus transmission by missing diagnoses, more so as prevalence increases and the weighting of test performance shifts from specificity (top left) to sensitivity (top right). In high-risk settings and hotspots, prevalence breaches and evolving variants may compound an outbreak to generate an epidemic. Repeating the RAgTs consumes valuable time. Asymptomatic people may unknowingly spread disease to family, friends, workers, and clients creating a vicious cycle. Abbreviation: RAgTs, rapid antigen tests.
Patients with false positive COVID-19 test results generally will be isolated (upper left, Figure 3) and cannot spread disease because they are not infected with SARS-CoV-2. The prevalence in the Collaborative Study was in the range of 2.39 to 2.75% (134/5,609 to 154/5,609) in late 2021 and early 2022 when data were collected [4]. The singleton RT-PCR positives reported by the investigators may have been false positive RT-PCR reference test results; to avoid bias in the present study, singletons were not excluded.
As prevalence increases, the weighting of RAgT performance shifts from specificity to sensitivity (top sequences in Figure 3). A vicious cycle may develop as diagnoses are missed. Repeating low-sensitivity RAgTs does not advance PBs substantially (see Figure 2). False negatives will increase exponentially (see Figure 1) as prevalence hits double digits. Pollan et al. [17] reported seroprevalence >10% in Madrid in 2020. Gomez-Ochoa et al. [18] reported healthcare worker prevalence of 11% with 40% asymptomatic.
In 2020 Kalish et al. [19] documented 4.8 undiagnosed infections for every case of COVID-19 in the United States. The 2022 meta-analysis of Dzinamarira et al. [20] found 11% prevalence of COVID-19 among healthcare workers. In a 2021 meta-analysis, Ma et al. [21] discovered that asymptomatic infections were common among COVID-19 confirmed cases, specifically 40.5% overall, 47.5% in nursing home residents or staff, 52.9% in air or cruise travelers, and 54.1% in pregnant women.
Prevalence can be estimated from positivity rates using Eq. 30 when high-sensitivity RT-PCR testing is used. For example, if the positivity rate is 5%, sensitivity is 100%, and specificity is 99% (Tier 3), estimated prevalence will be ~4%, and if the positivity rate is 20% and specificity 97.5% with 100% sensitivity, then ~18%. Cox-Ganser et al. [22] documented test positivity percentages of up to 28.6% in high-risk occupations. In 2020 the median New York City positivity was 43.6% (range 38-48.1% across zip codes) [23]; estimated prevalence is 43.0%.
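The two worked examples can be reproduced with a brief sketch. Eq. 30 is not shown in this section; the code assumes it corresponds to the standard correction of apparent prevalence for imperfect test accuracy (often called the Rogan-Gladen estimator). The function name is illustrative.

```python
def estimated_prevalence(positivity: float, sens: float, spec: float) -> float:
    """Assumed form: (positivity + spec - 1) / (sens + spec - 1)."""
    return (positivity + spec - 1.0) / (sens + spec - 1.0)

# Examples from the text: 100% sensitivity with 99% or 97.5% specificity.
low = estimated_prevalence(0.05, 1.0, 0.99)
high = estimated_prevalence(0.20, 1.0, 0.975)
print(f"positivity 5%,  specificity 99%:   prevalence ~{low:.1%}")
print(f"positivity 20%, specificity 97.5%: prevalence ~{high:.1%}")
```

Both results round to the ~4% and ~18% stated above. The correction is small when specificity is high, so positivity and prevalence diverge mainly at low positivity rates.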
Thus, RAgTs and other COVID-19 diagnostic tests must perform well over wide ranges of prevalence that vary geographically and in time. Higher sensitivity point-of-care molecular diagnostics (left in Figure 3), such as LAMP assays [11] with EUAs for home testing or other portable molecular diagnostics, offer a way out of the vicious cycle. Exiting the vicious cycle with highly sensitive and highly specific molecular testing will decrease community risk and enhance resilience [24,25].
Time spent testing is important too. Delaying diagnosis increases the risk of infecting close contacts (see the inner feedback loop in Figure 3). Asymptomatic people carrying SARS-CoV-2 may unknowingly spread disease to family, friends, workers, and patients as viral loads increase during the protracted 3-test, 5-day protocol now mandated by the US FDA for RAgTs. Delays allow new variants to emerge, which, in turn, increase prevalence. The Eris variant, EG.5 (a descendant lineage of XBB.1.9.2), currently threatens well-being, especially that of the elderly.
The US FDA now requires RAgT labeling to state that results are “presumptive.” RT-PCR or other COVID-19 molecular diagnostic tests should be used to confirm negative RAgT results. The WHO and the US declared an end to the pandemic, but people still need to test [26,27]. For the week ending July 29th, 9,056 new US hospitalizations were reported, ER cases doubled, and the positivity rate rose to 8.9% for tests reported to the CDC [28].
There are limitations to this work. First, Bayesian theory was not proven during the pandemic, although it appears to explain testing phenomena. Second, self-testing in the Collaborative Study was not controlled and the reference test comparison was incomplete. Third, QC was omitted and reagents may have degraded. Fourth, layperson testing technique may have been faulty or inconsistent. Fifth, manufacturer PPA and NPA specifications may have been overstated in the small studies submitted to the FDA to obtain EUAs.
Further, there was no comparison LAMP molecular assay included in the Collaborative Study for parallel self-testing at points of care. Nonetheless, these limitations do not obviate the need for higher performance standards and upgrading of RAgT and other diagnostic assays that will be needed for future threats. Timely diagnosis of COVID-19 is important, especially for children this fall. Mellou et al. [29] found that 36% of children who self-tested were asymptomatic, the median lag to testing positivity was two days, and early diagnosis “…probably decreased transmission of the virus…”.
5. Conclusions and Recommendations
Speed and convenience are two of the primary reasons people seek COVID-19 self-testing [11]. Repeating RAgTs three times over five days defeats the purpose of rapid point-of-care testing, does not inform public health in a timely manner, could complicate contact tracing, and may not be cost-effective. Missed diagnoses can perpetuate virus transmission, exponentially more so when prevalence exceeds PBs. Tolerance limits for missed diagnoses have not been established, nor have they been tied to different levels of prevalence. The ∆PB [Eq. 26] does not depend on prevalence, per se, and should be optimized if tests are repeated.
No precise temporal trend maps of COVID-19 prevalence in different countries are available for comparison, so the impact of prevalence, per se, is uncertain, although prevalence is known to have been very high in COVID-19 hotspots and high-risk settings [30]. Breaches of RAgT PBs may have generated vicious cycles, adversely transformed outbreaks into endemic disease, prolonged contagion, defeated mitigation, allowed new variants to arise, and fueled the pandemic, as Figure 3 illustrates.
The FDA allowed manufacturers to support RAgT serial screening claims with new clinical evaluations [9]. Upgraded performance should be demonstrated in multicenter trials with large numbers of subjects. To decrease missed diagnoses with a repeated test, mathematical analysis suggests that RAgT sensitivity should be 91.03 to 91.41% in actual clinical evaluations. The theory also shows that a test with Tier 2 clinical sensitivity of 95% will generate a PB of 95.2% when repeated only once (see Table 2).
COVID-19 was shown to have positivity rates and/or prevalence as high as 75% or more [30,31], which creates potential for asymptomatic infections to spread silently. If superior RAgT performance is not attainable, the FDA should retire EUAs. New RAgTs for COVID-19 or future highly infectious disease threats should achieve high performance proven clinically to be at least the level of Tier 2 (95% sensitivity, 97.5% specificity), especially in high-risk settings and hotspots.