Now that we are two months past the peak of the UK coronavirus epidemic, many fear the emergence of a second wave of the disease and remain anxious about any evidence that reopening the country has gone too far. For this reason media headlines like ‘Germany’s R number rockets again – from 1.79 to 2.88’ (Sky News) and ‘UK coronavirus cases no longer falling, ONS figures show’ (the Times) are amplified very quickly. But how worried should we really be by these headlines?
By now, we have become familiar with the R number (the average number of people that each infected person will themselves infect) and are alert to the danger of it being greater than 1. However, we have likely spent less time thinking about how the value is found. It is important to understand that we cannot directly measure R, like temperature on a thermometer. Simply put, it is impossible to be sure how many people are infected at any given time, let alone who infected them. For this reason, we have to infer the value of R indirectly.
One way to do this is to count daily positive tests and track the rate at which they are growing or shrinking. But positive tests are an imperfect proxy for the number of infections in the population. If several people display symptoms in a single workplace, it makes sense to test the entire workforce there. As a result, more positive tests will occur, as asymptomatic people or those with mild symptoms are also discovered. It is not necessarily that infections have become more common, but rather that we know where to look for them.
When we were close to the peak, discovering a few hundred such extra cases would make very little difference to our estimate of R. But because the prevalence of the disease is now so low, a few localised outbreaks of this kind can cause a spike in the number of daily cases, making estimates of R jump. This is why Germany’s R number rose so dramatically last week. A few days later the spike no longer affected the estimate of R, which then fell to 0.72 – a fact barely reported by the media.
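To see why the same cluster matters so much more at low prevalence, here is a rough sketch in Python. The case counts are invented purely for illustration (they are not German or UK figures), and the ratio computed is a crude measure of growth in reported cases, not an estimate of R itself.

```python
def growth_ratio(baseline_cases, extra_cases):
    """Ratio of this period's reported cases to the last period's,
    assuming the underlying baseline is flat and only a newly
    discovered cluster adds cases."""
    return (baseline_cases + extra_cases) / baseline_cases

# Hypothetical numbers: the same 300-case cluster found in two settings.
near_peak = growth_ratio(5000, 300)   # many daily cases already
low_prevalence = growth_ratio(400, 300)  # few daily cases

print(f"near peak: {near_peak:.2f}, low prevalence: {low_prevalence:.2f}")
# near peak: 1.06, low prevalence: 1.75
```

Under these assumed numbers, the cluster nudges reported cases up by 6 per cent near the peak but by 75 per cent when prevalence is low, even though the underlying epidemic is flat in both settings.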
The true R number is unlikely to fluctuate so wildly: in fact, the sheer size and speed of Germany’s reported increase made it clear that this was probably a statistical artefact due to low prevalence. I suggest you should be more concerned about estimates of R that are roughly constant and above 1, particularly if they concern many regions of the same country. This is why the present situation in the United States is so troubling.
An alternative way to estimate R is also prone to uncertainty for a similar reason. Each week, the ONS announces the results of its coronavirus infection survey, which broadly speaking works like an opinion poll. In a poll, if we survey a representative sample of 10,000 people and find that 4,000 of them will vote Conservative, we may infer that 40 per cent of people are Tory supporters. However, proper reporting of an opinion poll will carry a margin of error, more correctly referred to as a confidence interval. For example, in this case statistical analysis shows the true proportion of Tory voters is likely to be between 39 and 41 per cent.
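The 39 to 41 per cent range can be reproduced with a short calculation. This is a minimal sketch using the textbook normal approximation for a 95 per cent confidence interval; real pollsters may use other methods, and the function name is my own.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion,
    using the normal approximation (z = 1.96)."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error

low, high = proportion_ci(4000, 10_000)
print(f"{low:.1%} to {high:.1%}")
# 39.0% to 41.0%
```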
With a reduced number of coronavirus cases in the UK, the proportion of positive tests in the infection survey is very low. The headline numbers underlying the Times report showed that the ONS estimate of prevalence had gone up from 0.05 per cent to 0.09 per cent from one week to the next. Roughly speaking, if we imagine that the ONS test around 10,000 people weekly for the coronavirus, in one week they had five positive tests and in the next they had nine. This does not feel like a significant difference, and may well just have been caused by random fluctuations.
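One way to check that intuition is to ask how often counts like five or nine would turn up if nothing had changed at all. The sketch below assumes, purely for illustration, a constant true prevalence of 0.07 per cent and a simple random sample of 10,000 people each week; the real ONS survey design is more involved.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k positives among n independent tests."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical assumption: prevalence stays fixed at 0.07 per cent.
n, p = 10_000, 0.0007

p_five_or_fewer = sum(binom_pmf(k, n, p) for k in range(0, 6))
p_nine_or_more = 1 - sum(binom_pmf(k, n, p) for k in range(0, 9))

print(f"P(5 or fewer) = {p_five_or_fewer:.2f}")
print(f"P(9 or more)  = {p_nine_or_more:.2f}")
```

Under these assumptions each outcome occurs in a little under a third of weeks, so seeing five positives one week and nine the next is unremarkable even when prevalence is unchanged.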
To understand this better, a good habit is to go past the headline numbers and look for reports of the confidence intervals of studies. For example, the ONS tell us that their figure of 0.09 per cent should really be thought of as ‘somewhere between 0.04 and 0.19 per cent’. This does not reflect any failure in the survey, but is rather an inevitable mathematical outcome of the low prevalence of the virus. In that sense, rather than looking at headline values, it is better to look at weekly changes in the confidence intervals, and to remember that any number within the interval is somewhat plausible.
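To give a feel for how wide such intervals are at low prevalence, here is a sketch using the Wilson score interval, a standard method for small proportions. It assumes a simple sample of 10,000 tests with five and then nine positives; the ONS uses its own methods on a more complex survey, so these figures will not match theirs exactly.

```python
import math

def wilson_interval(positives, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical sample size of 10,000 tests per week.
week1 = wilson_interval(5, 10_000)
week2 = wilson_interval(9, 10_000)

for label, (low, high) in (("5 positives", week1), ("9 positives", week2)):
    print(f"{label}: {low:.2%} to {high:.2%}")
```

On these assumptions the two intervals come out at roughly 0.02 to 0.12 per cent and 0.05 to 0.17 per cent. They overlap substantially, which is another way of saying the data cannot distinguish a real rise from chance.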
One final piece of advice is to not judge the situation by any one piece of data. All numbers involving coronavirus come with uncertainty, which is inevitably not reflected in a headline or a tweet.
Extremely surprising numbers make better headlines and will spread faster, but almost by definition such results are more likely to be wrong. By relying on social media reports of coronavirus, particularly by only following people who share your prior beliefs, you will receive an unbalanced view of the true situation. As always, it would be wise to consider as many views as possible, including those with which you disagree.
Oliver Johnson, School of Mathematics, University of Bristol, @BristOliver