2020 Project: Broad Institute of MIT and Harvard

Real-time SARS-CoV-2 genomic surveillance to support clinical and public health response and monitor functionally relevant mutations

What to know

The Broad Institute of MIT and Harvard, Massachusetts General Hospital, and the Massachusetts Department of Public Health sought to enhance SARS-CoV-2 genomic surveillance in Massachusetts. They coupled a logical sampling plan with epidemiological and clinical data to better understand regional transmission patterns and link genetic variants to clinical outcomes. Awarded in 2020, this project developed bioinformatics tools for open data analysis and easy data sharing.

Findings on SARS-CoV-2 surveillance and investigations

This project:

  • Assisted the Massachusetts Department of Public Health with an urgent investigation of a local outbreak, which was one of the first to identify and characterize an outbreak of the P.1 (Gamma) variant outside of northern Brazil.1
  • Partnered with the Massachusetts Department of Public Health and local county public health to investigate a Delta variant outbreak among vaccinated persons in Provincetown, MA.2
  • Investigated comparative dynamics of the SARS-CoV-2 Delta and Alpha variants in the New England area, with Yale collaborators.3,4
  • Developed a simple and scalable method to track samples and identify contamination in high-throughput sequencing workflows. Validated the method on more than 7,000 samples and assisted the Mass General Brigham hospital network with real-time outbreak investigations and integration of genomic sequencing into infection control analysis, finding that cross-contamination did not explain the close relatedness among sequenced samples.5
  • Highlighted the transmissibility of the Omicron variant, its propensity to rapidly dominate small populations, and the ability of robust asymptomatic surveillance programs to offer early insights into the dynamics of pathogen arrival and spread.6

Findings on SARS-CoV-2 variant characteristics

This project developed PyR0, a multinomial logistic regression model that inferred the relative fitness of SARS-CoV-2 lineages and forecasted the growth of new lineages from their mutational profiles.7
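The general idea of a multinomial logistic growth model can be illustrated with a minimal sketch. This is not PyR0's implementation; it only assumes that each lineage's growth rate is the sum of the effects of the mutations it carries, and that lineage proportions at time t are a softmax of intercept plus growth rate times t. All function names, mutation profiles, effect sizes, and intercepts below are hypothetical values chosen for illustration.

```python
import math

def softmax(scores):
    """Convert real-valued scores into proportions that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def lineage_growth_rates(mutation_profiles, mutation_effects):
    """Growth rate of a lineage = sum of the effects of the mutations it carries."""
    return [
        sum(effect for carried, effect in zip(profile, mutation_effects) if carried)
        for profile in mutation_profiles
    ]

def forecast_proportions(intercepts, growth_rates, t):
    """Multinomial logistic forecast: softmax(intercept + rate * t) across lineages."""
    return softmax([a + b * t for a, b in zip(intercepts, growth_rates)])

# Hypothetical data: 3 lineages described by 2 made-up mutations (1 = carries it).
profiles = [[0, 0], [1, 0], [1, 1]]
effects = [0.05, 0.03]          # illustrative per-mutation growth effects (per day)
intercepts = [0.0, -3.0, -6.0]  # illustrative log-odds offsets at t = 0

rates = lineage_growth_rates(profiles, effects)
day0 = forecast_proportions(intercepts, rates, 0)
day120 = forecast_proportions(intercepts, rates, 120)

print("growth rates:", rates)
print("day 0 proportions:", [round(p, 3) for p in day0])
print("day 120 proportions:", [round(p, 3) for p in day120])
```

In this toy setup the lineage carrying both mutations starts rare but has the highest growth rate, so it overtakes the others by day 120. Because the growth rate depends only on the mutation profile, the same calculation can be applied to a lineage that has not yet been observed at scale, which is the sense in which such a model can forecast from mutational profiles.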

New SARS-CoV-2 dashboards and websites

This project developed a visualization dashboard, Cases, Deaths, and Testing in Massachusetts, for real-time tracking of SARS-CoV-2 genomic surveillance and epidemiology data in the New England region.

New and improved sequencing software tools

This project:

  • Facilitated the deposition, sharing, aggregation, and visualization of SARS-CoV-2 genomes from US public health labs through tool development, training, and support:
    • Adapted Terra, the Broad's cloud-based genomic data platform, as a data environment for academic and public health labs to manage and analyze genomic data. With additional support from CDC and in partnership with Theiagen Genomics, Terra enabled more than 40 US public health labs to securely manage and analyze their SARS-CoV-2 sequencing data.
    • Shared a collection of vetted tools for viral genomic data analysis in Dockstore, the open-source bioinformatics tool repository. These tools could be exported directly to Terra for viral genomic data analysis and quality control.
    • Developed tutorials demonstrating use of viral genomics tools in the Dockstore collection and Terra workspaces, hosted on the Terra YouTube channel.
  • Brought together publicly available data and tools in Terra for analysis, including:
    • A workspace that enabled users to analyze data from NCBI's SRA or GenBank, or data from their own research. Once the data was imported, researchers could perform a variety of analyses and data preparation steps, such as demultiplexing, reference-based assembly, QC, and calling of Pango lineages. The workspace also created visualizations using Nextclade and Nextstrain.
    • A workspace that demonstrated an analysis of the introduction and spread of SARS-CoV-2 in Massachusetts and New England. The analysis began with raw reads and produced a phylogenetic tree, using tools from the Viral Genomics collection in Dockstore.