Covid-19 test results: a tale of cause, effect and conditional probabilities
By Nick Petford
Declared a global pandemic by the World Health Organisation on 11 March 2020, the SARS-CoV-2 virus responsible for the respiratory disease Covid-19 has so far claimed the lives of over 1.9 million people¹. In response, governments worldwide have imposed a mix of non-pharmaceutical interventions (NPIs), ranging in scale and severity from social distancing measures to full lockdowns, to quash the spread while waiting for a vaccine. Our own research has shown that the spring 2020 lockdown in the UK had a positive effect in ‘flattening the curve’ and reducing deaths in Northamptonshire². Nevertheless, research into the relative effectiveness of NPIs, and their impact on lowering the time-varying reproduction number more generally, remains controversial³ ⁴ ⁵. Time will tell which interventions have worked best to curtail transmission, given that death rates in the UK and elsewhere continue to rise, as does the economic fallout. By November 2020 the UK Government had borrowed £280 billion to pay for costs associated with the pandemic, over £4,000 per head of population and set to rise further. This is without factoring in the longer-term impact on employment, preventable non-Covid deaths and mental health.
In what follows I’m going to show how official test data published daily by Gov.UK may be overestimating the true number of Covid-19 cases in the UK. This line of reasoning might seem insensitive when hospitals are in danger of overflowing. Clearly no reasonable person would want to see our most vulnerable exposed unnecessarily to a deadly virus, nor public health services brought to their knees by the pandemic. However, this is more than a technical distinction. If the data used to justify NPIs are inaccurate in a significant way, then best efforts to contain the pandemic could themselves become the cause of the effect they are designed to mitigate. Put simply, well-intended interventions may inadvertently make things worse.
Covid-19 test results
To date, approximately 3.2 million people (c. 5%) out of a total of nearly 59 million tests have tested positive⁶ using one or more of the three types of test being carried out in the UK to detect the SARS-CoV-2 virus. The most common is the reverse transcription polymerase chain reaction (RT-PCR) test for viral RNA. The others are serology tests, which look for antibodies to the virus, and a lateral flow test (LFT), which gives results in less than an hour. The LFT is the test currently being used on students and staff at the University of Northampton. You can find out more about results to date here.
How reliable are the tests?
No medical test is ever 100% accurate. Errors arise for various reasons, including faulty kit and human error. More fundamentally, the test itself presents a dilemma to do with the probabilities of cause and effect. To understand this better, let’s suppose you have had a medical test for a disease, the result of which comes back positive. How likely is it that you really have the disease?
Brief introduction to conditional probability
To answer, we need to step back in time to the eighteenth century and draw on the wisdom of Thomas Bayes. Bayes (1702–1761), a practising theologian, was interested, amongst other things, in the idea of cause and effect. If we know that a prior event, such as flicking a coin, has caused something, we can work out the probability P of the effect: the coin landing as a head or a tail, 50:50 per flick. This forward probability, where each flick is independent of the others, is a familiar concept in our everyday lives. But Bayes was curious about what is now called inverse probability: reasoning backwards from an observed effect to the probability of its cause. To understand what this means, return to the coin-tossing example. If you see a coin lying heads up (an effect), what is the probability that it got there because of some prior event, such as being flicked (the cause)? Put differently, suppose you know that some event B has occurred. What is the probability of a second event, A, given B? Bayes found a way to answer this tricky question. The theorem named after him, shown in Fig. 1, has not only stood the test of time; it is currently at the forefront of complex data analysis, artificial intelligence and machine learning.
Fig. 1. Thomas Bayes and his famous formula for conditional (or inverse) probability.
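For readers without the figure to hand, the usual textbook statement of Bayes’ theorem for two events A and B, which answers how probable A is once B is known to have occurred, is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here P(B | A) is the forward probability (cause to effect) and P(A | B) the inverse one (effect to cause).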
What are false positives and why should we care?
It seems a ridiculous proposition. You’ve been tested for a disease and it came back positive. Surely that’s the end of it? Positive means 100% certain, right? Well, not exactly. The test result could be a false positive. To explain this apparent contradiction, I’ve borrowed an example from Pearl and Mackenzie⁷, who, in their 2018 book on the science of cause and effect, used Bayes’ Theorem to work out the proportion of false positives among women who tested positive for breast cancer. Although you can’t catch cancer from someone else, the principles of conditional probability still apply to a communicable disease like Covid-19.
Often it’s false negatives (FNs) that draw most concern, because they cause the true incidence rate to be underestimated. An FN patient is free to infect others undetected, making them a significant transmission risk⁸. You can find out more about work into FNs in the UK here. But the implications of false positives (FPs) are serious too, including delayed surgery, unnecessary isolation, track and trace misattribution, or infection of wrongly diagnosed patients who are placed alongside genuinely infected ones in a high-prevalence setting (hospital or care home).
So, let’s look a bit more closely at the PCR test and its ability to generate a false positive result. As already mentioned, a false positive can come about in several ways, including sample contamination and sensitivity issues with the test equipment⁹. The FP rate in PCR testing is currently estimated at between 0.8% and 4.0% (interquartile range), with a test sensitivity of approximately 95%¹⁰. Given this information, and remembering that our simple example of flipping a coin has both a forward probability (cause to effect) and an inverse, or conditional, probability (effect to cause), we can think about the PCR test in a counterintuitive way. The forward probability is the chance that the test (T) comes out positive given that you already have Covid-19 (CV). We can write this formally as P(T|CV), where the symbol | means “given that”. So far, so good. But what about the inverse probability, P(CV|T)? Put simply, if you test positive (the effect), how likely is it that the cause is that you really do have Covid-19? This is where things get a bit more complicated. To answer, we need a revised probability of having the disease: your initial, or prior, probability multiplied by the ratio of the probability of a positive result if you have Covid-19 to the probability of a positive result in the population in general. Following Bayes’ Rule (Fig. 1), this can be written in symbols as:

$$P(CV \mid T) = \frac{P(T \mid CV)\,P(CV)}{P(T)}$$
It looks complicated, so let’s break it down, starting with the denominator, P(T). This is the weighted average of the probability of a positive test result in those who have Covid-19, P(T|CV), and the probability of a positive result in those who don’t (the PCR false positive rate), written P(T|~CV).
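Written out in the same notation, this weighted average is the law of total probability, with P(~CV) = 1 − P(CV):

$$P(T) = P(T \mid CV)\,P(CV) + P(T \mid {\sim}CV)\,P({\sim}CV)$$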
The average is weighted because only around 1 person in 80 carries the disease, so P(CV) ≈ 1/80, while the other 79 don’t. The probability of not having Covid-19, P(~CV), is thus 79/80. Taken together, with a false positive rate of 4%, P(T) = 95% × 1/80 + 4% × 79/80, and P(CV|T) is 95% × 1/80 divided by this P(T). The surprising result is that the revised probability of having Covid-19 after testing positive by PCR in mass random testing lies between 60% (FP = 0.8%) and 23% (FP = 4.0%). Despite this being well below 100%, you will nonetheless be recorded automatically as a positive Covid-19 case in the ONS data.
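The arithmetic can be checked with a short Python sketch. It uses only the figures quoted above (a prior of 1 in 80, 95% sensitivity, and the 0.8% and 4.0% ends of the false positive range); the function name `posterior_positive` is purely illustrative.

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """Return P(CV|T), the probability of having Covid-19 after a positive test."""
    # Denominator P(T): weighted average of positive results in infected
    # and uninfected people.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' rule: P(CV|T) = P(T|CV) * P(CV) / P(T)
    return sensitivity * prior / p_positive


PRIOR = 1 / 80        # P(CV): roughly 1 person in 80 carrying the virus
SENSITIVITY = 0.95    # P(T|CV)

# The two ends of the quoted interquartile range for the FP rate, P(T|~CV)
for fp_rate in (0.008, 0.04):
    p = posterior_positive(PRIOR, SENSITIVITY, fp_rate)
    print(f"FP rate {fp_rate:.1%}: P(CV|T) = {p:.0%}")
```

Running it should print about 60% for the 0.8% false positive rate and about 23% for 4.0%, in line with the figures in the text.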
There are some important caveats here, though. At a practical level, the experimental evidence on false positive and false negative rates for RT-PCR (and other Covid-19 tests) needs improving. Because Covid-19 is a communicable disease, the infection rate may change rapidly, although a new rate is easily factored into a revised calculation. Nor will the revised probability be the same for all individuals. For example, as with non-communicable diseases, genetic factors or underlying conditions mean some people will be more susceptible to infection than others chosen at random.
Community prevalence will also play a role, as shown in a blog here. In a random population of 10,000 people with a 1% chance of having Covid-19, and where the test detects 95% of those infected, a positive result still means you are about 5 times more likely to be uninfected than infected. However, for community testing with a prior 5% chance of having Covid-19, a positive result means you are equally likely to be infected or uninfected. Finally, in a high-risk community (e.g. hospital or care home), where the prior probability of having Covid-19 before being tested is 10%, a positive result means it’s about twice as likely that you are infected as that you are not.
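The three scenarios can be reproduced by counting people rather than multiplying probabilities. The false positive rate behind them is not stated in the text, so the Python sketch below assumes 95% sensitivity and a 5% false positive rate, an assumption chosen because it reproduces the quoted ratios. The function name `positives_in_population` is hypothetical.

```python
def positives_in_population(population, prevalence, sensitivity, false_positive_rate):
    """Split the positive results in a tested population into true and false positives."""
    infected = population * prevalence
    uninfected = population - infected
    true_positives = infected * sensitivity
    false_positives = uninfected * false_positive_rate
    return true_positives, false_positives


SENSITIVITY = 0.95           # share of infected people who test positive
FALSE_POSITIVE_RATE = 0.05   # ASSUMPTION: not given in the text, chosen to match its ratios

for prevalence in (0.01, 0.05, 0.10):
    tp, fp = positives_in_population(10_000, prevalence, SENSITIVITY, FALSE_POSITIVE_RATE)
    share_infected = tp / (tp + fp)   # P(infected | positive result)
    print(f"prior {prevalence:.0%}: {tp:.0f} true and {fp:.0f} false positives, "
          f"P(infected | positive) = {share_infected:.0%}")
```

Under these assumptions, false positives outnumber true positives roughly five to one at 1% prevalence, the two are equal at 5%, and true positives outnumber false ones about two to one at 10%.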
What does this mean for Covid-19 positive case rates in the UK?
It is quite possible, for the reasons given above, that official ONS data on positive Covid-19 cases obtained by RT-PCR are overestimates. This is especially likely where mass testing is being done randomly in low-prevalence areas¹¹. To be sure that the numbers are accurate, more granular information is needed on prior (pre-test) probabilities, particularly for groups in which the people being tested are more likely to have Covid-19 than a random sample. This is because the higher the infection rate in the group being tested, the more confident you can be that a positive test result means you really do have Covid-19.
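One way to see why the pre-test probability matters so much is to rewrite Bayes’ rule in odds form: the test contributes a fixed likelihood ratio (sensitivity divided by false positive rate), and the post-test odds are the pre-test odds multiplied by it. The Python sketch below assumes the 95% sensitivity and 4.0% false positive rate quoted earlier; the function name and the list of pre-test probabilities are illustrative, not measured values.

```python
def posterior_from_odds(prior, sensitivity, false_positive_rate):
    """Bayes' rule in odds form: post-test odds = pre-test odds x likelihood ratio."""
    likelihood_ratio = sensitivity / false_positive_rate  # a property of the test alone
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability


# Hypothetical pre-test probabilities, from low-prevalence mass testing
# up to a high-risk setting such as a hospital ward.
for prior in (0.001, 1 / 80, 0.05, 0.10, 0.25):
    p = posterior_from_odds(prior, sensitivity=0.95, false_positive_rate=0.04)
    print(f"pre-test {prior:6.1%} -> post-test {p:6.1%}")
```

Because the likelihood ratio stays the same, the post-test probability rises steadily with the pre-test probability, which is the point made in the paragraph above.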
“Headlines of death and sorrow, they tell of tomorrow, madmen on the rampage…”
Daily media reports of increasing Covid-19 cases, based on PCR tests that may be overestimating the true number of positive cases (and, by extension, fatalities), have inevitably added to the sense of fear and frustration felt by many. An unhelpful outcome has been the polarisation of public attitudes on how best to tackle the pandemic. For some, the “madmen on the rampage” are lockdown rule breakers whose perceived irresponsible and reckless actions are driving up case rates. For others, the “madmen” are governments and their advisors, who seem hell-bent on needlessly sacrificing education, jobs and the economy. Wherever you stand, one thing is for sure: the pandemic is a lose-lose situation. That’s why it’s vital to better understand pre-test probabilities in the population at large¹², and the magnitude of false positives and negatives being recorded, both to give full confidence in the official figures and to help refine future public health policy decisions.
Fig. 2. Two examples of how cause and effect related to health service provision might be interpreted differently in a situation where data (evidence) are unreliable. (a) The desirable case (negative feedback loop), where NPIs lead to a beneficial effect (reduced pressure on services). (b) The undesirable case (positive feedback loop), where restrictions (NPIs) become the cause that amplifies the effect (increased pressure on services) they were supposed to reduce.
2. Petford, N, Campbell, J. (2020). Covid-19 mortality rates in Northamptonshire UK: initial sub-regional comparisons and provisional SEIR model of disease spread. medRxiv preprint. https://doi.org/10.1101/2020.07.30.20165399
3. Chin, V, Ioannidis, JPA, Tanner, MA, Cripps, S. (2020). Effects of non-pharmaceutical interventions on COVID-19: A Tale of Three Models. medRxiv preprint 2020.07.22.20160341. https://doi.org/10.1101/2020.07.22.20160341
4. Soltesz, K, et al. (2020). The effect of interventions on COVID-19. Nature, 588, 24–31. https://doi.org/10.1038/s41586-020-3025-y
5. Larochelambert, QD, Marc, A, Antero, J, Le Bourg, E, Toussaint, JF. (2020). Covid-19 Mortality: A Matter of Vulnerability Among Nations Facing Limited Margins of Adaptation. Frontiers in Public Health, 19 November. https://doi.org/10.3389/fpubh.2020.604339
7. Pearl, J, Mackenzie, D. (2018). The Book of Why. Penguin, Great Britain, 418 pp.
8. Arevalo-Rodriguez, I, et al. (2020). False-negative results of initial RT-PCR assays for COVID-19: a systematic review. medRxiv preprint 2020.04.16.20066787. https://doi.org/10.1101/2020.04.16.20066787
9. Mayers, C, Baker, K. (2020). Impact of false-positives and false-negatives in the UK’s COVID-19 RT-PCR testing programme. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/895843/S0519_Impact_of_false_positives_and_negatives.pdf
10. Cohen, AN, Kessel, B. (2020). False positives in reverse transcription PCR testing for SARS-CoV-2. medRxiv preprint. https://www.medrxiv.org/content/10.1101/2020.04.26.20080911v1.full.pdf
11. Healey, B, Kahn, A, Metezia, H, Blyth, I, Asad, H. (2021). The impact of false positive COVID-19 results in an area of low prevalence. Clinical Medicine, 21, 1–3. https://doi.org/10.7861/clinmed.2020-0839
12. Surkova, E, Nikolayevskyy, V, Drobniewski, F. (2020). False-positive COVID-19 results: hidden problems and costs. The Lancet Respiratory Medicine. https://doi.org/10.1016/S2213-2600(20)30453-7
Nick is Vice Chancellor of the University of Northampton. Although a geologist by training, he has published several medical-related research articles on topics including malaria prevention, mathematical modelling of blood flow in stroke victims and the three-dimensional structure of animal skin. He is also Chair of Northamptonshire Health and Wellbeing Board, one of over 100 statutory bodies responsible for developing integrated health and social care strategies and reducing health inequalities.