How do you make a projection?

Our world runs on projections. We collect data to predict everything from the weather and sports results to retail sales and investment performance. Now we’re turning to predictive models to estimate potential COVID-19 cases and deaths, allocate healthcare resources, and decide when we can ease or remove social distancing policies.

But projection modeling for COVID-19 is difficult. For starters, the models are only as good as the data going into them. During a pandemic, we only get that data from testing, which will need to scale up in a massive way. New York City plans to buy 100,000 test kits per week — a good start, but a densely populated city of 8.7 million people will need more to get back to anything resembling “normal.”
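To put the testing figures in perspective, here is a rough back-of-the-envelope calculation. It uses only the two numbers quoted above and assumes, as a simplification, one test per person with no repeat testing.

```python
# Figures taken from the paragraph above.
tests_per_week = 100_000
population = 8_700_000

# Share of residents who could be tested once in a single week.
weekly_coverage = tests_per_week / population

# Weeks needed to test every resident once at that pace.
# Assumption: one test per person and no repeat testing.
weeks_to_test_everyone = population / tests_per_week

print(f"Weekly coverage: {weekly_coverage:.2%}")
print(f"Weeks to test everyone once: {weeks_to_test_everyone:.0f}")
```

At that pace, about 1.15% of residents could be tested each week, and a single pass through the whole city would take 87 weeks.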

There is also a lot we still don’t know, including the symptomaticity ratio (the share of infections that produce symptoms), the fatality rate, and the extent of immunity after infection. The fatality rate is also affected by demographics, such as age and underlying conditions, and by access to healthcare, all of which vary between countries, cities, and even neighborhoods.

Outcomes will also vary based on behaviors, such as how well communities adhere to social distancing measures. (The IHME model revised its projections down last week, based on more optimistic assumptions about adherence.) Models are not our destiny, and the ultimate outcomes will depend on how we respond. As New York’s Governor Cuomo noted this morning: “Whatever we do today will determine the infection rate tomorrow. You stop what you’re doing or behave differently, and you will get a different result.”
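One way to see why behavior changes the outcome is a toy compartmental model. The sketch below is a textbook SIR (susceptible, infected, recovered) simulation, not the method used by IHME or any other model mentioned here. The population size, contact rates, and recovery rate are hypothetical values chosen only for illustration.

```python
def simulate_sir(population, initial_infected, contact_rate, recovery_rate, days):
    """Discrete-time SIR model with daily steps; returns the infected count per day."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    history = [infected]
    for _ in range(days):
        # New infections scale with contacts between susceptible and infected people.
        new_infections = contact_rate * susceptible * infected / population
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        history.append(infected)
    return history


# Hypothetical parameters: halving the contact rate stands in for social distancing.
baseline = simulate_sir(1_000_000, 10, contact_rate=0.30, recovery_rate=0.10, days=365)
distanced = simulate_sir(1_000_000, 10, contact_rate=0.15, recovery_rate=0.10, days=365)

print(f"Peak infections, baseline:  {max(baseline):,.0f}")
print(f"Peak infections, distanced: {max(distanced):,.0f}")
```

With the lower contact rate, the peak number of simultaneous infections is smaller and arrives later, which is the "different result" the quote describes. Real projection models are far more detailed, so the numbers printed here should not be read as forecasts.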

As the focus shifts to reopening the economy, countries and states are deciding what indicators to use to inform decisions to relax social distancing rules. For example, yesterday New York City started reporting on three indicators — hospital admissions, critical care capacity, and positive test rates — which will inform when social distancing rules are relaxed. And today, the World Health Organization shared criteria for lifting restrictions, including that health system capacities are in place to detect, test, isolate and treat every case and trace every contact.

The unknowns

As with testing for any virus, there are concerns over false-negative virology test results, but a positive result is very likely to be accurate. Without accurate and widespread testing, we don’t know how many people have the virus or what the actual fatality rate is.
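A short calculation shows how much the apparent fatality rate depends on testing coverage. All numbers below are hypothetical, and the sketch assumes every death is counted, which may not hold in practice.

```python
def fatality_rate(deaths, infections):
    """Deaths divided by infections, as a fraction."""
    return deaths / infections


# Hypothetical numbers, chosen only for illustration.
deaths = 500
confirmed_cases = 10_000

# Each scenario assumes a different number of undetected infections
# for every confirmed case; 0 means testing found every infection.
for undetected_per_confirmed in (0, 1, 5, 10):
    total_infections = confirmed_cases * (1 + undetected_per_confirmed)
    rate = fatality_rate(deaths, total_infections)
    print(f"{undetected_per_confirmed:>2} undetected per confirmed case: {rate:.2%}")
```

The same death count yields a rate of 5% if testing found every infection and under 0.5% if ten infections went undetected for each confirmed case, which is why the denominator matters so much.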

There’s evidence that COVID-19 patients who are asymptomatic or have mild symptoms can transmit the virus; biometric data from wearables and connected devices can point to trends, but won’t identify every potential case if they rely on symptoms alone. Kinsa is aggregating temperature data from its smart thermometers to track “atypical fevers” in the United States; the company’s Health Weather Map showed the positive impact of social distancing measures before public health officials could spot the decline in hospital admissions.

Understanding social distancing measures, and local adherence to them, can also improve projection models. Stanford researchers are attempting to build a county-level dataset of measures, and companies like Apple and Google are using geolocation data from map apps to publish mobility reports. The CDC used cell phone data to see whether people in four cities (New York City, Seattle, New Orleans, and San Francisco) were limiting their movements last month. Steep declines in public transportation ridership also indicate a sharp drop in mobility.

These secondary sources of data help us understand social distancing and how it’s helping us flatten the curve. Similarly, we could use nontraditional data sources to help us assess opportunities for selectively opening and closing parts of the economy. There’s a lot we don’t know, and we may need to think creatively about how to spot real-time, community-level trends. Could data from credit card payments, traffic cameras, transit turnstiles, smartwatches and fitness trackers, or internet search results tell us about which areas are safe and which ones might be emerging hotspots?

Where do we go from here?

Assuming states and cities can obtain, distribute, and process tests quickly, and trace potential transmission in real time, we’ll need post-peak models that can identify stubborn hotspots and project future waves. Then, finally, lower-risk areas may be able to reopen.

The longer-term outlook depends on several factors we’re still learning about — such as the degree of seasonal variation in transmission, the role of antibodies, the duration of immunity, and the degree of cross-immunity between COVID-19 and other coronaviruses.

Of course, the need for accurate projection models won’t end with everyone going back to work. We need to ask and answer important questions — and create models that will address the longer-term health impacts of widespread shutdowns. For example, how many children won’t get MMR vaccinations on time? How many people will have untreated cavities because they weren’t able to go to the dentist? And what is the cumulative impact on outcomes for those living with conditions such as diabetes, heart disease, or kidney disease?