Genomic Surveillance for SARS-CoV-2 Variants Circulating in the United States, December 2020–May 2021

SARS-CoV-2, the virus that causes COVID-19, is constantly mutating, leading to new variants (1). Variants have the potential to affect transmission, disease severity, diagnostics, therapeutics, and natural and vaccine-induced immunity. In November 2020, CDC established national surveillance for SARS-CoV-2 variants using genomic sequencing. As of May 6, 2021, sequences from 177,044 SARS-CoV-2-positive specimens collected during December 20, 2020-May 6, 2021, from 55 U.S. jurisdictions had been generated by or reported to CDC. These included 3,275 sequences for the 2-week period ending January 2, 2021, compared with 25,000 sequences for the 2-week period ending April 24, 2021 (0.1% and 3.1% of reported positive SARS-CoV-2 tests, respectively). Because sequences might be generated by multiple laboratories and sequence availability varies both geographically and over time, CDC developed statistical weighting and variance estimation methods to generate population-based estimates of the proportions of identified variants among SARS-CoV-2 infections circulating nationwide and in each of the 10 U.S. Department of Health and Human Services (HHS) geographic regions.* During the 2-week period ending April 24, 2021, the B.1.1.7 and P.1 variants represented an estimated 66.0% and 5.0% of U.S. SARS-CoV-2 infections, respectively, demonstrating the rise to predominance of the B.1.1.7 variant of concern† (VOC) and emergence of the P.1 VOC in the United States. Using SARS-CoV-2 genomic surveillance methods to analyze surveillance data produces timely population-based estimates of the proportions of variants circulating nationally and regionally. Surveillance findings demonstrate the potential for new variants to emerge and become predominant, and the importance of robust genomic surveillance. Along with efforts to characterize the clinical and public health impact of SARS-CoV-2 variants, surveillance can help guide interventions to control the COVID-19 pandemic in the United States.

With high levels of SARS-CoV-2 transmission globally, continued emergence of new variants is expected. Variants have potential impacts on COVID-19 severity, transmission, diagnostics, therapeutics, and natural and vaccine-induced immunity (1). The emergence and rapid expansion of multiple SARS-CoV-2 variants of interest§ (VOIs) and VOCs, and the potential for variants of high consequence¶ (VOHCs) (Supplementary Table 1, https://stacks.cdc.gov/view/cdc/106690), indicate the need for robust genomic surveillance to monitor circulating viruses and help guide the public health response to the COVID-19 pandemic.
CDC's national SARS-CoV-2 genomic surveillance program includes genomic sequences from the National SARS-CoV-2 Strain Surveillance (NS3) program and contracted commercial laboratories. Each week, public health laboratories from all U.S. jurisdictions (50 states, the District of Columbia, and eight U.S. territories and freely associated states) are requested to submit a target number of specimens representative of the geographic and demographic diversity in each jurisdiction collected during the preceding 7 days, which can be achieved through random selection.** Specimens are submitted to CDC for assessment, sequencing, and genomic analysis. SARS-CoV-2 lineages are assigned using the Phylogenetic Assignment of Named Global Outbreak Lineages software (PANGOLIN; version 3.03; Rambaut Laboratory) (2).
In December 2020, CDC expanded the volume of SARS-CoV-2 sequencing through contracts with large commercial diagnostic laboratories, which were selected based on geographic coverage and specimen volume. Commercial laboratories submit random samples of geographically diverse sequences with limited demographic data to CDC weekly. Specimen sources for these laboratories include retail pharmacies, community testing sites, and inpatient and outpatient health care settings served by large commercial laboratories; any type of specimen tested for SARS-CoV-2 by reverse transcription-polymerase chain reaction (RT-PCR) may be submitted. Commercial laboratories use a variety of platforms and approaches to conduct sequencing; all SARS-CoV-2 sequence data are submitted to CDC for quality assessment, genomic analysis, and database upload. Sequences generated by both NS3 and commercial laboratories are deposited into public repositories (National Center for Biotechnology Information [NCBI] and Global Initiative on Sharing All Influenza Data [GISAID]). Data from genomic surveillance based on specimens received from NS3 and commercial laboratories were analyzed weekly to monitor SARS-CoV-2 variants circulating in the United States. This activity was reviewed by CDC and was conducted consistent with applicable federal law and CDC policy.††
§ A variant with specific genetic markers that have been associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact, or predicted increase in transmissibility or disease severity.
¶ A variant with clear evidence of significantly reduced effectiveness of prevention measures or medical countermeasures relative to previously circulating variants.
** https://www.aphl.org/programs/preparedness/Crisis-Management/Documents/2021.04.09_NS3_REVISED.pdf
The estimated proportions of variant lineages among circulating SARS-CoV-2 viruses are calculated based on specimen collection date. Proportions of all lineages accounting for >1% of sequences nationally during the preceding 12 weeks as well as all VOIs and VOCs identified among circulating viruses are estimated nationally and for all 10 HHS regions and are updated weekly to CDC's COVID Data Tracker.§§
Because the proportion of sequenced SARS-CoV-2 infections varies geographically and over time, proportions of variants at the jurisdiction level and by week of specimen collection are weighted to generate population-based national and regional estimates of the proportion of each circulating variant among all SARS-CoV-2 infections. Weighting accounts for the inverse probability that 1) a specimen from a positive RT-PCR test was sequenced (w_p), and 2) a person with SARS-CoV-2 infection was tested by RT-PCR (w_i) (i.e., the infection was diagnosed). To calculate w_p, first the number of positive RT-PCR tests is divided by the number of sequences in the sample to obtain a weight to represent all RT-PCR positive cases; this weight is then adjusted for a known sampling bias (oversampling of S-gene target failure [SGTF] results by one laboratory) using a logistic regression model that assumes no sampling bias in the remainder of the laboratories. Second, w_i is calculated to account for variations in probability of RT-PCR testing among persons with SARS-CoV-2 infection. SARS-CoV-2 infection incidence is estimated as the geometric mean of the incidence of test- [...] weight is the inverse of the probability that a person with SARS-CoV-2 infection contributes to the sample of sequences and is calculated as w_p multiplied by w_i. Variance is estimated for 95% confidence intervals (CIs) using a survey design-based approach. HHS regions are designated as survey strata, and data sources within each state are designated clusters (i.e., NS3 or each commercial laboratory).
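The weighting and variance steps described above can be illustrated with a minimal Python sketch. It is not CDC's implementation: the function names, the `est_infections` input (a stand-in for the report's infection incidence estimate, which is not reproduced here), and the Taylor-linearization variance with a normal-approximation 95% CI are assumptions chosen for illustration, and the SGTF oversampling adjustment is omitted.

```python
from collections import defaultdict
from math import sqrt


def sequence_weight(positive_tests, sequences, est_infections):
    """Weight for one sequence from a jurisdiction-week.

    w_p: positive RT-PCR tests per sequence (inverse probability that a
         positive specimen was sequenced).
    w_i: estimated infections per positive test (inverse probability that
         an infection was tested); est_infections is a hypothetical input.
    """
    w_p = positive_tests / sequences
    w_i = est_infections / positive_tests
    return w_p * w_i


def weighted_proportion(records, variant):
    """Weighted share of `variant` with an approximate 95% CI.

    records: iterable of (stratum, cluster, weight, lineage), where the
    stratum is an HHS region and the cluster is a data source in a state.
    """
    records = list(records)
    total = sum(w for _, _, w, _ in records)
    p = sum(w for _, _, w, lin in records if lin == variant) / total

    # Assumed variance method: Taylor linearization of the ratio estimator,
    # summed to cluster totals within each stratum.
    z = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, lin in records:
        z[stratum][cluster] += w * ((lin == variant) - p) / total

    var = 0.0
    for clusters in z.values():
        n = len(clusters)
        if n < 2:
            continue  # a single-cluster stratum adds no variance in this sketch
        mean = sum(clusters.values()) / n
        var += n / (n - 1) * sum((v - mean) ** 2 for v in clusters.values())

    se = sqrt(var)
    return p, (max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se))


# Hypothetical data: one region, two data sources with different coverage.
w_ns3 = sequence_weight(positive_tests=1000, sequences=10, est_infections=4000)
w_lab = sequence_weight(positive_tests=2000, sequences=100, est_infections=4000)
records = [("Region 1", "NS3", w_ns3, "B.1.1.7"),
           ("Region 1", "NS3", w_ns3, "other")]
records += [("Region 1", "Lab A", w_lab, "other")] * 4
estimate, ci = weighted_proportion(records, "B.1.1.7")
```

In this toy data set the unweighted B.1.1.7 share is one of six sequences, but the weighted share is larger because the sequence comes from a source where each sequence stands in for many more infections.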
Because the time from specimen collection to sequence availability currently is approximately 3 weeks, projections extending beyond the time frame of available data are made to enable estimation of current variant proportions during this 3-week interval. These projections, termed "nowcasts," and their 95% prediction intervals are generated by using a multinomial logistic regression model fit to weighted sequencing data. The nowcast model is a multivariant extension to a two-variant framework previously described (3). Nowcast estimates are projections and might differ from weighted estimates that are subsequently generated for the same periods.***
As of May 6, 2021, a total of 177,044 SARS-CoV-2 viral sequences for specimens collected during December 20, 2020-May 6, 2021, from 55 U.S. states and territories had been generated by NS3 or reported to CDC by contract laboratories; these included 3,275 sequences from specimens collected during the 2-week period ending January 2, 2021, and 25,000 sequences from specimens collected during the 2-week period ending April 24, 2021 (0.1% and 3.1% of positive RT-PCR tests reported to CDC, respectively), a more than sevenfold increase (Figure). The proportion of specimens with sequences varied across states (Supplementary Table 2, https://stacks.cdc.gov/view/cdc/106690); weighting methods generated regional- and national-level estimates of variant proportions over time.
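The nowcast projection described above can be sketched in a similarly hedged way. The example below fits a multinomial logistic model with time as the only covariate to weighted lineage counts and projects proportions beyond the last observed period. The function names, the plain gradient-ascent fit, and the data layout are assumptions for illustration; the report's model specification and its prediction intervals are not reproduced.

```python
from math import exp


def softmax(etas):
    """Convert linear predictors to proportions that sum to 1."""
    m = max(etas)
    e = [exp(x - m) for x in etas]
    s = sum(e)
    return [x / s for x in e]


def fit_nowcast(times, counts, lr=0.2, iters=20000):
    """Fit p_k(t) = softmax(a_k + b_k * t) by gradient ascent.

    times:  period indices (e.g., 0, 1, 2, ... for successive 2-week periods).
    counts: counts[i][k] is the weighted count of lineage k in period i;
            lineage 0 is the reference (its a and b stay at 0).
    Returns a function mapping any time t to projected proportions.
    """
    K = len(counts[0])
    t0 = sum(times) / len(times)  # centre time to improve conditioning
    total = sum(sum(row) for row in counts)
    a = [0.0] * K
    b = [0.0] * K
    for _ in range(iters):
        ga = [0.0] * K
        gb = [0.0] * K
        for t, row in zip(times, counts):
            n = sum(row)
            p = softmax([a[k] + b[k] * (t - t0) for k in range(K)])
            for k in range(1, K):
                r = row[k] - n * p[k]  # observed minus expected weighted count
                ga[k] += r
                gb[k] += r * (t - t0)
        for k in range(1, K):
            a[k] += lr * ga[k] / total
            b[k] += lr * gb[k] / total
    return lambda t: softmax([a[k] + b[k] * (t - t0) for k in range(K)])


# Hypothetical weighted counts for [other, variant A, variant B] in six periods.
times = [0, 1, 2, 3, 4, 5]
counts = [[950, 45, 5], [870, 115, 15], [715, 260, 25],
          [480, 480, 40], [260, 700, 40], [115, 845, 40]]
nowcast = fit_nowcast(times, counts)
projected = nowcast(6.5)  # proportions about 3 weeks past the last period
```

Because the model is fit on the log-odds scale relative to a reference lineage, a lineage with a positive slope keeps gaining share in the projection until another lineage grows faster, which is why such projections can differ from the weighted estimates generated later for the same period.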
The B.1.1.7 VOC represented an estimated 0.2% of U.S. infections during the 2-week period ending January 2 and increased to 66.0% during the 2-week period ending April 24 (Table). During this period, estimated proportions of B. [...] (Table). Uncertainty around point estimates, as captured by confidence and prediction intervals, differed substantially by variant, time period, and region (Table).

Discussion
The findings in this report are subject to at least four limitations. First, although U.S. SARS-CoV-2 genomic sequencing has rapidly expanded in volume and in geographic coverage since late 2020, assessments of the national and regional representativeness of sequence data are needed. Second, although the weighting and variance estimation methods used for this analysis adjust these data to generate population-based estimates of variant proportions and quantify uncertainty, the methods assume that, within strata and clusters, sequence reporting is random. This assumption might be inaccurate; the true representativeness of sequenced specimens within each jurisdiction is unknown. Linking sequencing with epidemiologic data, for example from national case-based surveillance, might provide a better understanding of representativeness, so that specimen selection and weighting methods can be further adjusted as needed. Analyses at state and local levels have demonstrated the utility of linking sequencing with sentinel or population-based surveillance data to characterize new SARS-CoV-2 variants (6,7). Third, sequencing data from many state and local public health laboratories that are conducting SARS-CoV-2 surveillance sequencing apart from NS3§§§ are not yet available for inclusion in national estimates. Efforts to integrate such state and local genomic surveillance data into national surveillance and further improve national and regional surveillance are in progress. Finally, as sequence data become more complete over time, national and regional weighted estimates might change.
Table (fragment). Estimated proportion of infections, % (95% interval), for 10 consecutive entries of one Table row; the first and ninth values match the B.1.1.7 estimates reported in the text for the 2-week periods ending January 2 and April 24, 2021: 0.2 (0.1-0.4), 0.3 (0.1-0.9), 1.2 (0.7-2.1), 4.5 (2.9-6.9), 11.4 (8.2-15.6), 27.3 (22.1-33.2), 44.6 (39.3-50.1), 59.5 (54.9-64.0), 66.0 (62.0-69.8), 72.4 (67.4-77.1).

Summary
What is already known about this topic? SARS-CoV-2 variants have the potential to affect transmission, disease severity, diagnostics, therapeutics, and natural and vaccine-induced immunity.
What is added by this report? CDC's genomic surveillance for SARS-CoV-2 variants generates population-based estimates of the proportions of variants among all SARS-CoV-2 infections in the United States. During April 11-24, 2021, the B.1.1.7 and P.1 variants represented an estimated 66.0% and 5.0% of U.S. infections, respectively, demonstrating the potential for new variants to emerge and become predominant.
What are the implications for public health practice? Robust genomic surveillance can help guide prevention strategies (e.g., enhanced vaccination coverage efforts) and clinical management decisions (e.g., monoclonal antibody distribution) to control the COVID-19 pandemic in the United States.
[...] circulating SARS-CoV-2 variants nationally and regionally. Along with efforts to characterize the clinical and public health impact of variants, surveillance can help guide interventions to mitigate the COVID-19 pandemic in the United States by informing prevention strategies (e.g., enhanced vaccination coverage efforts) and clinical management decisions (e.g., monoclonal antibody distribution).