Genomic Surveillance for SARS-CoV-2 Variants Circulating in the United States, December 2020–May 2021
Weekly / June 11, 2021 / 70(23);846–850
Prabasaj Paul, PhD1; Anne Marie France, PhD1; Yutaka Aoki, PhD1; Dhwani Batra, MS, MBA1; Matthew Biggerstaff, ScD1; Vivien Dugan, PhD1; Summer Galloway, PhD1; Aron J. Hall, DVM1; Michael A. Johansson, PhD1; Rebecca J. Kondor, PhD1; Alison Laufer Halpin, PhD1; Brian Lee, MPH1; Justin S. Lee, DVM, PhD1; Brandi Limbago, PhD1; Adam MacNeil, PhD1; Duncan MacCannell, PhD2; Clinton R. Paden, PhD1; Krista Queen, PhD1; Heather E. Reese, PhD1; Adam C. Retchless, PhD1; Rachel B. Slayton, PhD1; Molly Steele, PhD1; Suxiang Tong, PhD1; Maroya S. Walters, PhD1; David E. Wentworth, PhD1; Benjamin J. Silk, PhD1
What is already known about this topic?
SARS-CoV-2 variants have the potential to affect transmission, disease severity, diagnostics, therapeutics, and natural and vaccine-induced immunity.
What is added by this report?
CDC’s genomic surveillance for SARS-CoV-2 variants generates population-based estimates of the proportions of variants among all SARS-CoV-2 infections in the United States. During April 11–24, 2021, the B.1.1.7 and P.1 variants represented an estimated 66.0% and 5.0% of U.S. infections, respectively, demonstrating the potential for new variants to emerge and become predominant.
What are the implications for public health practice?
Robust genomic surveillance can help guide prevention strategies (e.g., enhanced vaccination coverage efforts) and clinical management decisions (e.g., monoclonal antibody distribution) to control the COVID-19 pandemic in the United States.
SARS-CoV-2, the virus that causes COVID-19, is constantly mutating, leading to new variants (1). Variants have the potential to affect transmission, disease severity, diagnostics, therapeutics, and natural and vaccine-induced immunity. In November 2020, CDC established national surveillance for SARS-CoV-2 variants using genomic sequencing. As of May 6, 2021, sequences from 177,044 SARS-CoV-2–positive specimens collected during December 20, 2020–May 6, 2021, from 55 U.S. jurisdictions had been generated by or reported to CDC. These included 3,275 sequences for the 2-week period ending January 2, 2021, compared with 25,000 sequences for the 2-week period ending April 24, 2021 (0.1% and 3.1% of reported positive SARS-CoV-2 tests, respectively). Because sequences might be generated by multiple laboratories and sequence availability varies both geographically and over time, CDC developed statistical weighting and variance estimation methods to generate population-based estimates of the proportions of identified variants among SARS-CoV-2 infections circulating nationwide and in each of the 10 U.S. Department of Health and Human Services (HHS) geographic regions.* During the 2-week period ending April 24, 2021, the B.1.1.7 and P.1 variants represented an estimated 66.0% and 5.0% of U.S. SARS-CoV-2 infections, respectively, demonstrating the rise to predominance of the B.1.1.7 variant of concern† (VOC) and emergence of the P.1 VOC in the United States. Using SARS-CoV-2 genomic surveillance methods to analyze surveillance data produces timely population-based estimates of the proportions of variants circulating nationally and regionally. Surveillance findings demonstrate the potential for new variants to emerge and become predominant, and the importance of robust genomic surveillance. Along with efforts to characterize the clinical and public health impact of SARS-CoV-2 variants, surveillance can help guide interventions to control the COVID-19 pandemic in the United States.
With high levels of SARS-CoV-2 transmission globally, continued emergence of new variants is expected. Variants have potential impacts on COVID-19 severity, transmission, diagnostics, therapeutics, and natural and vaccine-induced immunity (1). The emergence and rapid expansion of multiple SARS-CoV-2 variants of interest§ (VOIs) and VOCs, and the potential for variants of high consequence¶ (VOHCs) (Supplementary Table 1, https://stacks.cdc.gov/view/cdc/106690), indicate the need for robust genomic surveillance to monitor circulating viruses and help guide the public health response to the COVID-19 pandemic.
CDC’s national SARS-CoV-2 genomic surveillance program includes genomic sequences from the National SARS-CoV-2 Strain Surveillance (NS3) program and contracted commercial laboratories. Each week, public health laboratories from all U.S. jurisdictions (50 states, the District of Columbia, and eight U.S. territories and freely associated states) are requested to submit a target number of specimens representative of the geographic and demographic diversity in each jurisdiction collected during the preceding 7 days, which can be achieved through random selection.** Specimens are submitted to CDC for assessment, sequencing, and genomic analysis. SARS-CoV-2 lineages are assigned using the Phylogenetic Assignment of Named Global Outbreak Lineages software (PANGOLIN; version 3.03; Rambaut Laboratory) (2).
In December 2020, CDC expanded the volume of SARS-CoV-2 sequencing through contracts with large commercial diagnostic laboratories, which were selected based on geographic coverage and specimen volume. Commercial laboratories submit random samples of geographically diverse sequences with limited demographic data to CDC weekly. Specimen sources for these laboratories include retail pharmacies, community testing sites, and inpatient and outpatient health care settings served by large commercial laboratories; any type of specimen tested for SARS-CoV-2 by reverse transcription–polymerase chain reaction (RT-PCR) may be submitted. Commercial laboratories use a variety of platforms and approaches to conduct sequencing; all SARS-CoV-2 sequence data are submitted to CDC for quality assessment, genomic analysis, and database upload. Sequences generated by both NS3 and commercial laboratories are deposited into public repositories (National Center for Biotechnology Information [NCBI] and Global Initiative on Sharing All Influenza Data [GISAID]). Data from genomic surveillance based on specimens received from NS3 and commercial laboratories were analyzed weekly to monitor SARS-CoV-2 variants circulating in the United States. This activity was reviewed by CDC and was conducted consistent with applicable federal law and CDC policy.††
The estimated proportions of variant lineages among circulating SARS-CoV-2 viruses are calculated based on specimen collection date. Proportions of all lineages accounting for >1% of sequences nationally during the preceding 12 weeks as well as all VOIs and VOCs identified among circulating viruses are estimated nationally and for all 10 HHS regions and are updated weekly to CDC’s COVID Data Tracker.§§
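The reporting rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `lineages_to_report`, the lineage counts, and the set of designated lineages are hypothetical and are not taken from CDC's surveillance code or data.

```python
from collections import Counter

def lineages_to_report(lineage_calls, designated, threshold=0.01):
    """Return lineages whose share of sequences exceeds `threshold`,
    plus every designated (VOI or VOC) lineage that was observed at all."""
    counts = Counter(lineage_calls)
    total = sum(counts.values())
    if total == 0:
        return []
    common = {lin for lin, n in counts.items() if n / total > threshold}
    flagged = {lin for lin in counts if lin in designated}
    return sorted(common | flagged)

# Hypothetical lineage calls for sequences collected in the preceding 12 weeks.
calls = (
    ["B.1.1.7"] * 660 + ["B.1.2"] * 300 + ["P.1"] * 35
    + ["B.1.351"] * 4 + ["B.1.596"] * 1
)
# Hypothetical set of lineages designated as VOIs or VOCs.
designated = {"B.1.1.7", "P.1", "B.1.351"}

selected = lineages_to_report(calls, designated)
print(selected)
```

In this hypothetical input, B.1.351 falls below the 1% threshold but is still reported because it is in the designated set, whereas B.1.596 is dropped.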
Because the proportion of sequenced SARS-CoV-2 infections varies geographically and over time, proportions of variants at the jurisdiction level and by week of specimen collection are weighted to generate population-based national and regional estimates of the proportion of each circulating variant among all SARS-CoV-2 infections. Weighting accounts for the inverse probability that 1) a specimen from a positive RT-PCR test was sequenced (w_p), and 2) a person with SARS-CoV-2 infection was tested by RT-PCR (w_i) (i.e., the infection was diagnosed). To calculate w_p, first the number of positive RT-PCR tests is divided by the number of sequences in the sample to obtain a weight to represent all RT-PCR positive cases; this weight is then adjusted for a known sampling bias (oversampling of S-gene target failure [SGTF] results by one laboratory) using a logistic regression model that assumes no sampling bias in the remainder of the laboratories. Second, w_i is calculated to account for variations in probability of RT-PCR testing among persons with SARS-CoV-2 infection. SARS-CoV-2 infection incidence is estimated as the geometric mean of the incidence of test-positive cases and the percentage of positive test results.¶¶ The estimated number of infections, divided by the number of RT-PCR–positive cases, yields w_i. The final weight is the inverse of the probability that a person with SARS-CoV-2 infection contributes to the sample of sequences and is calculated as w_p multiplied by w_i. Variance is estimated for 95% confidence intervals (CIs) using a survey design-based approach. HHS regions are designated as survey strata, and data sources within each state are designated clusters (i.e., NS3 or each commercial laboratory).
Because the time from specimen collection to sequence availability currently is approximately 3 weeks, projections extending beyond the time frame of available data are made to enable estimation of current variant proportions during this 3-week interval. These projections, termed “nowcasts,” and their 95% prediction intervals, are generated by using a multinomial logistic regression model fit to weighted sequencing data. The nowcast model is a multivariant extension to a two-variant framework previously described (3). Nowcast estimates are projections and might differ from weighted estimates that are subsequently generated for the same periods.***
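The projection step of a multinomial logistic model can be sketched as follows. This is not the nowcast code referenced in the supplementary materials: the coefficients are hypothetical placeholders standing in for values that would be fit to weighted sequencing data, the function name `project_proportions` is an assumption, and prediction intervals are not computed.

```python
import math

def project_proportions(intercepts, slopes, week):
    """Softmax of per-variant linear predictors a_k + b_k * week.

    The reference category carries intercept 0 and slope 0, so every other
    coefficient is a log-ratio relative to it.
    """
    scores = {k: intercepts[k] + slopes[k] * week for k in intercepts}
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(s - peak) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical fitted coefficients; week 0 is the last week with sequence data.
intercepts = {"other": 0.0, "B.1.1.7": 0.5, "P.1": -2.5}
slopes = {"other": 0.0, "B.1.1.7": 0.2, "P.1": 0.15}

# Project across the roughly 3-week gap before sequences become available.
nowcast = {week: project_proportions(intercepts, slopes, week) for week in range(4)}
for week, shares in nowcast.items():
    print(week, {k: round(v, 3) for k, v in shares.items()})
```

Because the shares come from a softmax, they sum to 1 at every projected week, and a variant with a larger slope than the others gains share over time.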
As of May 6, 2021, a total of 177,044 SARS-CoV-2 viral sequences for specimens collected during December 20, 2020–May 6, 2021, from 55 U.S. states and territories had been generated by NS3 or reported to CDC by contract laboratories; these included 3,275 sequences from specimens collected during the 2-week period ending January 2, 2021, compared with a sixfold increase to 25,000 sequences from specimens collected during the 2-week period ending April 24, 2021 (accounting for 0.1% and 3.1% of positive RT-PCR tests reported to CDC, respectively) (Figure). The proportion of specimens with sequences varied across states (Supplementary Table 2, https://stacks.cdc.gov/view/cdc/106690); weighting methods generated regional- and national-level estimates of variant proportions over time.
The B.1.1.7 VOC represented an estimated 0.2% of U.S. infections during the 2-week period ending January 2 and increased to 66.0% during the 2-week period ending April 24 (Table). During this period, estimated proportions of B.1.1.7 infections varied across HHS regions, from 50.9% in HHS Region 1 to 74.1% in HHS Region 6 (Supplementary Table 2, https://stacks.cdc.gov/view/cdc/106690). This rapid expansion is consistent with a model-based prediction that B.1.1.7 could become a predominant variant (3). The nowcast model estimated that B.1.1.7 represents 72.4% (95% prediction interval = 67.4%–77.1%) of infections for the 2-week period April 25–May 8, 2021 (Table). The P.1 VOC first appeared during the 2-week period ending January 30; by the 2-week period ending April 24, the P.1 variant represented an estimated 5.0% of infections, ranging from 1.6% in HHS Region 3 to 7.7% in HHS Region 5 (Table). Uncertainty around point estimates, as captured by confidence and prediction intervals, differed substantially by variant, time period, and region (Table).
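As a rough illustration of how fast the B.1.1.7 proportion rose, the two national point estimates above can be converted to an average weekly change in log-odds. This back-of-the-envelope calculation is not part of the report's methods: it assumes steady logistic growth between the two periods, treats the period end dates (January 2 and April 24, 16 weeks apart) as the time points, and ignores the uncertainty around both estimates.

```python
import math

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

p_start = 0.002     # B.1.1.7 share, 2-week period ending January 2, 2021
p_end = 0.660       # B.1.1.7 share, 2-week period ending April 24, 2021
weeks_elapsed = 16  # January 2 to April 24, 2021

# Average weekly increase in log-odds under the steady-growth assumption.
weekly_log_odds_growth = (logit(p_end) - logit(p_start)) / weeks_elapsed
# Time for the odds (B.1.1.7 versus all other lineages) to double.
odds_doubling_weeks = math.log(2) / weekly_log_odds_growth

print(f"log-odds growth per week: {weekly_log_odds_growth:.2f}")
print(f"odds doubling time (weeks): {odds_doubling_weeks:.1f}")
```

Under these assumptions, the odds of an infection being B.1.1.7 doubled in well under 2 weeks on average over the period.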
The distribution of circulating SARS-CoV-2 variants in the United States changed rapidly during December 2020–May 2021. The expansion of the B.1.1.7 VOC to become the predominant variant in all U.S. regions within a 4-month period, and the more recent emergence of the P.1 VOC in all regions, underscore the critical need for robust and timely genomic surveillance. These findings are consistent with reports of potential increased transmission of the B.1.1.7 and P.1 variants††† (4). In addition, there is evidence of potential impact of B.1.1.7 on diagnostics (i.e., SGTF in at least one RT-PCR–based diagnostic assay) (5) and disease severity, and of potential impact of P.1 on therapeutics and immunity (1). Four additional VOIs or VOCs (B.1.526, B.1.526.1, B.1.429, and B.1.427) are estimated to each account for >1% of circulating infections domestically as of the 2-week period ending April 24. Currently, there is no variant listed as a VOHC.
The findings in this report are subject to at least four limitations. First, although U.S. SARS-CoV-2 genomic sequencing has rapidly expanded in volume and in geographic coverage since late 2020, assessments of the national and regional representativeness of sequence data are needed. Second, although the weighting and variance estimation methods used for this analysis adjust these data to generate population-based estimates of variant proportions and quantify uncertainty, the methods assume that, within strata and clusters, sequence reporting is random. This assumption might be inaccurate; the true representativeness of sequenced specimens within each jurisdiction is unknown. Linking sequencing with epidemiologic data, for example from national case-based surveillance, might provide a better understanding of representativeness, so that specimen selection and weighting methods can be further adjusted as needed. Analyses at state and local levels have demonstrated the utility of linking sequencing with sentinel or population-based surveillance data to characterize new SARS-CoV-2 variants (6,7). Third, sequencing data from many state and local public health laboratories that are conducting SARS-CoV-2 surveillance sequencing apart from NS3§§§ are not yet available for inclusion in national estimates. Efforts to integrate such state and local genomic surveillance data into national surveillance and further improve national and regional surveillance are in progress. Finally, as sequence data become more complete over time, national and regional weighted estimates might change.
To respond to emerging SARS-CoV-2 variants, CDC rapidly expanded national genomic surveillance to monitor trends in circulating SARS-CoV-2 variants nationally and regionally. Along with efforts to characterize the clinical and public health impact of variants, surveillance can help guide interventions to mitigate the COVID-19 pandemic in the United States by informing prevention strategies (e.g., enhanced vaccination coverage efforts) and clinical management decisions (e.g., monoclonal antibody distribution).
Public health program and laboratory staff members who contribute to NS3, including the Association of Public Health Laboratories, and commercial laboratory staff members; Mark Burroughs, Jason Caravas, Roxana Cintron Moret, Peter W. Cook, Morgan L. Davis, Katie Dillon, Christopher Gulvik, Norman Hassell, Dakota Howard, Maliha Ishaq, Jesica Jacobs, Kristen Knipe, Kristine Lacek, Shoshona Le, Yan Li, Rachel L. Marine, Gillian McAllister, Anna Montmayeur, Kara Moser, Sarah Nobles, Jasmine Padilla, Benjamin Rambo Martin, Alexis Roundtree, Lori A. Rowe, Matthew Schmerer, Mili Sheth, Samuel Shepard, Sarah Talarico, Ying Tao, Anna Uehara, Yvette Unoarumhi, Haibin Wang, Jing Zhang, CDC.
Corresponding author: Prabasaj Paul, email@example.com.
All authors have completed and submitted the International Committee of Medical Journal Editors form for disclosure of potential conflicts of interest. No potential conflicts of interest were disclosed.
† A variant for which there is evidence of an increase in transmissibility, more severe disease (e.g., increased hospitalizations or deaths), significant reduction in neutralization by antibodies generated during previous infection or vaccination, reduced effectiveness of treatments or vaccines, or diagnostic detection failures.
§ A variant with specific genetic markers that have been associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact, or predicted increase in transmissibility or disease severity.
¶ A variant with clear evidence of significantly reduced effectiveness of prevention measures or medical countermeasures relative to previously circulating variants.
†† 45 C.F.R. part 46.102(l)(2), 21 C.F.R. part 56; 42 U.S.C. Sect. 241(d); 5 U.S.C. Sect. 552a; 44 U.S.C. Sect. 3501 et seq.
*** Supplementary materials, including R code and sample data, are available at https://github.com/CDCgov/SARS-CoV-2_Genomic_Surveillance.
1. CDC. SARS-CoV-2 variant classifications and definitions. Atlanta, GA: US Department of Health and Human Services, CDC; 2021. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-surveillance/variant-info.html
2. Rambaut A, Holmes EC, O’Toole Á, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol 2020;5:1403–7. https://doi.org/10.1038/s41564-020-0770-5 PMID:32669681
3. Galloway SE, Paul P, MacCannell DR, et al. Emergence of SARS-CoV-2 B.1.1.7 lineage—United States, December 29, 2020–January 12, 2021. MMWR Morb Mortal Wkly Rep 2021;70:95–9. https://doi.org/10.15585/mmwr.mm7003e2 PMID:33476315
4. Davies NG, Abbott S, Barnard RC, et al.; CMMID COVID-19 Working Group; COVID-19 Genomics UK (COG-UK) Consortium. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science 2021;372:eabg3055. https://doi.org/10.1126/science.abg3055 PMID:33658326
5. Public Health England. Investigation of novel SARS-CoV-2 variant: variant of concern 202012/01. Technical briefing 3. London, United Kingdom: Public Health England; 2020. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959360/Variant_of_Concern_VOC_202012_01_Technical_Briefing_3.pdf
6. Thompson CN, Hughes S, Ngai S, et al. Rapid emergence and epidemiologic characteristics of the SARS-CoV-2 B.1.526 variant—New York City, New York, January 1–April 5, 2021. MMWR Morb Mortal Wkly Rep 2021;70:712–6. https://doi.org/10.15585/mmwr.mm7019e1 PMID:33983915
7. Martin Webb L, Matzinger S, Grano C, et al. Identification of and surveillance for the SARS-CoV-2 variants B.1.427 and B.1.429—Colorado, January–March 2021. MMWR Morb Mortal Wkly Rep 2021;70:717–8. https://doi.org/10.15585/mmwr.mm7019e2 PMID:33988184
FIGURE. Number of SARS-CoV-2 genomic sequences generated by National SARS-CoV-2 Strain Surveillance or reported to CDC by commercial laboratories* for specimens collected December 20, 2020—May 6, 2021, by laboratory source — United States, May 6, 2021
Abbreviation: NS3 = National SARS-CoV-2 Strain Surveillance
* Sequences generated by or reported to CDC through NS3 and contract laboratories do not include the >5,000 sequences per week produced by public health laboratories and other U.S. institutions, which are not currently integrated into CDC’s surveillance for SARS-CoV-2 variants using genomic sequencing. https://covid.cdc.gov/covid-data-tracker/#published-covid-sequences
Suggested citation for this article: Paul P, France AM, Aoki Y, et al. Genomic Surveillance for SARS-CoV-2 Variants Circulating in the United States, December 2020–May 2021. MMWR Morb Mortal Wkly Rep 2021;70:846–850. DOI: http://dx.doi.org/10.15585/mmwr.mm7023a3.
MMWR and Morbidity and Mortality Weekly Report are service marks of the U.S. Department of Health and Human Services.
Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of Health and Human Services.
References to non-CDC sites on the Internet are provided as a service to MMWR readers and do not constitute or imply endorsement of these organizations or their programs by CDC or the U.S. Department of Health and Human Services. CDC is not responsible for the content of pages found at these sites. URL addresses listed in MMWR were current as of the date of publication.