Facts are hard-won in medical science. To establish that one drug is slightly more effective than another might require hundreds of millions of dollars and a years-long trial among thousands of patients. A year ago, when the coronavirus first appeared, scientists knew virtually nothing about it—whom it might affect, how it spread. In March, when the pandemic forced much of the United States into lockdown, there was no consensus about whether people should wear masks; more fundamentally, there was no consensus about whether the virus was spread by droplets or whether it was airborne. Such facts are so hard to establish because they need to be sturdy enough to support a person’s life.
Last Friday morning, I received a phone alert from the Boston Globe: epidemiologists and genomicists had traced COVID-19 infections at a Boston biotech conference in late February and estimated that, by October, the conference had led to between roughly two hundred thousand and three hundred thousand cases, across twenty-nine states and multiple countries. That was the headline, but the substance of the paper, which appeared in Science, was a careful tracking of a mutation of the coronavirus that had appeared among patients who were infected at the conference, which was held by the drug company Biogen, and then moved through Massachusetts and to other states and continents. These are the kinds of empirical facts that have been in short supply, and they provide a glimpse of not just a cluster of infections isolated in place and time but a branch of the pandemic as it spread through the world and through the year.
By 4 p.m. Friday, I was on a video call with the two lead authors of the paper, Jacob Lemieux, an infectious-disease physician and postdoctoral researcher at Harvard, and Bronwyn MacInnis, the director of pathogen genomic surveillance at the Broad Institute’s Infectious Disease and Microbiome Program. They looked a little worn out, having spent the day trying to explain a paper about the processes of superspreading to reporters who were primarily interested in how many infections could be traced to the conference. “In the initial version of the paper, we didn’t have a number—we didn’t want to go there,” Lemieux said. “People kept asking, ‘How many cases? How many cases?’ So we did our best, as scientists, to flesh this out.” Lemieux noted that, though the paper offered an estimate of the number of infections, it also included half a paragraph of caveats, most of which acknowledged the incompleteness of databases of coronavirus genomes and the imperfections of the calculations, including Lemieux and MacInnis’s, that rely on them. “What we’re trying to point out is it’s a big number,” Lemieux said. “It’s bigger than we would have expected.”
From one point of view, an end to the coronavirus seems near. This week, the first doses of the vaccine were administered at the hospital where Lemieux works and at many others around the country. From another point of view, we are just beginning to see clearly how high the stakes have been, both in every public gathering and in every effort to regulate them. Although researchers have suspected since early in the pandemic that the virus’s spread has been shaped by “superspreader events,” in which some branches of the disease spread much more explosively than others, no one knows whether the cause is the viral load in a host or the situations in which a host encounters other people—or some combination of the two. Nothing about the Biogen conference was unique, Lemieux emphasized: a hundred and seventy-five people from several countries gathered in a hotel at the end of February, before anyone wore masks. It was just a normal event in the course of the modern world. Eight months later, that meeting had led to the infections of a quarter of a million people, give or take. MacInnis tilted her head back and looked straight upward while she thought through what she wanted to say. She said, “It certainly captures my imagination.”
Lemieux works in the lab of Pardis Sabeti, a researcher who has helped to pioneer the field of genomic epidemiology. (Sabeti was profiled in The New Yorker in 2014, at the height of the Ebola outbreak, when she led a global effort that pinpointed the first person to be infected with the virus; the paper in which this finding was published credited five co-authors who were killed by the disease.) MacInnis is a close collaborator. The idea behind the work, as Lemieux described it to me, is that classical epidemiology can be informed by the study of a virus’s genome; researchers go back and forth between contact tracing and the lab; insights from each pull the other along “in a constant tug of war.” In the case of the Biogen conference, an early breakthrough came from twenty-eight samples collected through initial testing. Having sequenced and compared them, Lemieux and MacInnis’s research team discovered a mutation that was common to all of the cases. Remarkably, the mutation, C2416T, had appeared only twice in GISAID, an international database of COVID-19 sequences. Those two samples came from a pair of elderly French patients who were tested a couple of days after the conference ended. Lemieux, MacInnis, and their co-authors wrote, “This strongly suggests low-level community transmission of C2416T in Europe in February 2020 before the allele came to Boston via a single introduction, which was then amplified by superspreading at the conference.” The researchers also identified a second mutation, which was present in seven of the twenty-eight samples but not recorded in any database; they believe that it developed during or shortly after the two-day conference, which gives some sense of how quickly the virus can evolve.
In general, the team at the Broad Institute was more interested in understanding how the disease had spread than in the conference as a spreader event, so even as they worked on the Biogen samples they also sought and analyzed other samples. Even in those with no connection to C2416T, they tended to find some similar patterns. One data set came from a nursing home in greater Boston that had tested all its residents for COVID-19. None of the residents had been suspected of having the virus, but it turned out that eighty-five per cent did. Sequencing the genomes from these samples, Lemieux and MacInnis’s team discovered that three separate lineages of the virus had entered the facility, likely over a period of a few weeks in March—three knocks at the door. Two lineages infected a couple of patients and did not spread further. The third spread like wildfire, quickly infecting nearly the entire facility. This lineage also followed the superspreader pattern: some factor had given it an unusual reach. “Most cases don’t lead to a lot of onward infection—they stop after one or two cases,” MacInnis said. “But then certain cases, and certain introductions, just explode.”
The C2416T lineage evolved every week or two, but its genomic signature remained consistent enough that the researchers could follow it out from the conference and across the country. In March, the Massachusetts Department of Public Health had identified about a hundred cases that were connected to the Biogen conference. But public-health tracing is designed to identify and isolate close contacts of those who have been infected, not to follow the spread of the virus infinitely into the future. “They’re usually limited in scope to the people at the event and maybe first-degree contacts—maybe secondary, at most,” MacInnis told me. “But, when you bring the genomic data to the picture, you can see just how much further these events can go, and how much more of a devastating effect they can have.” Even while following the path of the C2416T lineage, the researchers were also collecting unrelated samples of the virus in the community. One cluster of samples that included the C2416T allele turned out to have come from a group of homeless shelters. Lemieux said, “And that was a moment when we were, like, ‘Oh, my God.’ ” Within a couple of weeks, the virus had migrated from the scientists and executives at the conference to the homeless. Lemieux and MacInnis’s team eventually estimated that the viruses descended from the conference were responsible for between thirty and forty per cent of cases across the state by Halloween.
In the press, there were clues pointing farther afield. A state health department might issue a press release saying they suspected that an outbreak originated with someone who had returned from Boston; a newspaper report might suggest the same thing. The researchers could pinpoint which samples to search, and quite frequently they found the telltale mutations. By the end of October, they discovered that the C2416T lineage was present in 1.9 per cent of all American genomes in the GISAID database. In North Carolina and Indiana, where outbreaks had clear connections to the Biogen conference, nearly twenty per cent of samples contained the C2416T allele. Another marker, called G26233T, which Lemieux and MacInnis believe developed during or shortly after the conference, had travelled to Australia, Sweden, and Slovakia.
In Lemieux’s view, he and his colleagues had supplanted epidemiological theory about how the virus spread with empirical fact, and found that the spread was far more explosive than they might have otherwise predicted. The story of the pandemic in the United States is marked by suspected superspreader events: a motorcycle rally in South Dakota, Mardi Gras in New Orleans, a funeral in Albany, Georgia, and another in Detroit. Lemieux said, “If you look at the sequences in Louisiana, they’re highly related in much the same way that a good fraction of the sequences in Boston are related. Then you wonder, Well, was that Mardi Gras? But no one, to my knowledge, connected where those people went and who they interacted with to those sequences.” It was very difficult to see the full picture, Lemieux said, unless you were returning “to the blackboard and drawing the linkages between people” and then merging that information with the data from genomic sequencing. He added, “I think that if we looked and did similar studies in other places, at other events, we’d really come to appreciate how the virus may not be behaving as we predicted it would, or how we think it does.”
The Biogen conference ended on February 27th, at which point there had been only about a dozen deaths from COVID-19 across the United States. The turning point in the Democratic nominating contest occurred two days later, when Joe Biden won the South Carolina primary; it would be effectively over a few days after that, when Biden largely swept the Super Tuesday slate and several of his rivals dropped out and endorsed him at a rally in Dallas, an evening that ended with Biden and Beto and Amy O’Rourke happily sliding into a booth at a Whataburger, without masks. On the campaign trail, I’d grown accustomed to candidates, when events grew sleepy, turning to two reliable applause lines: one was to criticize Betsy DeVos, the Secretary of Education, and the other was to promise that a Democratic White House would “listen to the scientists” about COVID-19. Listening to scientists would have been an immeasurable improvement on how Donald Trump spent the year, likely saving tens of thousands of lives. But the slogan also obscured how little scientists knew at the time about how the virus spread, and how partial our understanding of it remains.
On Sunday, after speaking with Lemieux and MacInnis, I called a Harvard epidemiologist named William Hanage, who collaborated on their paper and had been thinking about its public-health implications, to ask what should have been done differently. I suppose, in a way, I was asking about a parallel year—about what could have happened had politicians listened to the advice of scientists, and had scientists seen the whole pandemic as clearly as the Broad researchers now see the Biogen event. Hanage said that, although scientists did not immediately know how unbalanced the spread could be, they did suspect that there were superspreading dynamics from early on in the pandemic, and that public policy could have been built around that insight. The study, Hanage said, “strongly suggests that limiting opportunities for superspreading helps, and what that means is limiting gatherings, basically.” For example, Hanage pointed out that Japan had successfully pursued a far more aggressive and centralized approach aimed at “cluster-busting.” He said, “I think also the number of introductions illustrates the relative importance of—I don’t want to call them travel bans, but—preventing the early introductions of the virus into the community. That’s something which I think is much more widely accepted now than it was.” He predicted that the “dogma” that argued against restricting travel would be widely revisited.
They don’t sound like measures that Americans were ready to accept, before even Biden was wearing a mask, before public-health authorities were sure that masks would help, and before they fully understood that the virus was airborne. Measures like these required trust, and every link in that chain was weak: the public did not broadly trust the political leadership, the President did not trust the scientists, and the scientists could not yet clearly see the disease for what it was, because they were still basing their knowledge on theory and analogy rather than empirical fact. In many ways, they still are. We expect a certain pattern to a catastrophic event like this, but the pandemic has upended it; the vaccine has arrived before we fully understand the disease. When I replayed the tape of my conversation with Lemieux and MacInnis, I could hear in MacInnis’s voice, in particular, a current of awe—that, a year in, we were still discovering how contagious and unpredictable the virus can be. In the recording of my own voice, I heard futility—that, no matter how brilliant the scientists studying it, the virus had a head start.