2021 Project: University of California, Santa Cruz (UCSC)

Empowering comprehensive SARS-CoV-2 strain surveillance and transmission pattern inference for public health practitioners

What to know

UCSC advanced the understanding of SARS-CoV-2 by developing a genomic data platform for both scientists and public health officials. The platform simplified interactions with genomic data and enabled users to rapidly cross-reference and identify genomic variation data sets in a unified setting. It provided downloadable SARS-CoV-2 evolutionary maps, both as scientific diagrams and in plain language, to facilitate widespread use. All platform efforts ensured that users had access to complete, accurate, and understandable genome sequence data for all SARS-CoV-2 variants.

New and improved sequencing software tools

This project:

  • Expanded and improved data aggregation, display, and visualization within the UCSC SARS-CoV-2 Genome Browser through development of the following tools:
    • Cluster-Tracker rapidly identified strains that had recently been introduced into, and transmitted within, a region, and inferred each strain's likely geographic origin. Its documentation allows other jurisdictions to build their own versions of the tool.1
    • The matUtils tool suite enabled rapid queries, interpretation, and manipulation of mutation-annotated phylogenetic trees.2
    • Big Tree Explorer allowed effective analysis of global SARS-CoV-2 and other pathogen phylogenies.
  • Developed a database of SARS-CoV-2 phylogenetic trees, updated daily, to provide a comprehensive view of the virus's evolutionary history using public data.2
  • Developed matOptimize, a method enabling online phylogenetics of SARS-CoV-2. (Phylogenetics is the study of evolutionary relationships among organisms.) The method supports significantly increased workloads (e.g., extremely large data sets arriving daily) when refining SARS-CoV-2 phylogenetic trees, and it uses and maintains the same libraries as UShER, a program using algorithms to infer what mutations might occur. matOptimize can be installed from, and has workflows available on, free and open-source platforms (e.g., Conda, Dockstore).3
  • Developed ShUShER, a private, client-side port of UShER (a program using algorithms to infer what mutations might occur). ShUShER performs phylogenetic placement of private genome sequence data so that the analysis can stay behind a firewall. The codebase and documentation are available.4
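
For readers unfamiliar with the term, the idea behind a mutation-annotated phylogenetic tree can be illustrated with a small Python sketch. This is a toy model only, not the data format or API used by matUtils, UShER, or matOptimize. The class `Node`, the helpers `add_child` and `mutations_from_root`, and the sample names are hypothetical, and the mutation labels are illustrative placeholders. The sketch assumes each branch stores the mutations that arose along it, so a sample's mutations relative to the root can be read off by walking up the tree. It ignores complications such as back-mutations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    # Mutations that arose on the branch leading to this node.
    mutations: List[str] = field(default_factory=list)
    parent: Optional["Node"] = None


def add_child(parent: Node, name: str, mutations: List[str]) -> Node:
    """Create a child node attached to `parent` with its branch mutations."""
    return Node(name=name, mutations=list(mutations), parent=parent)


def mutations_from_root(node: Node) -> List[str]:
    """Collect branch mutations along the path from the root down to `node`."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    collected: List[str] = []
    for step in reversed(path):  # root first, then toward the sample
        collected.extend(step.mutations)
    return collected


# Illustrative tree: the labels below are placeholders, not real results.
root = Node("root")
clade_a = add_child(root, "clade_a", ["A23403G"])
sample_1 = add_child(clade_a, "sample_1", ["C241T", "G28881A"])
sample_2 = add_child(root, "sample_2", ["C14408T"])

print(mutations_from_root(sample_1))
```

Storing mutations on branches rather than full genomes on every leaf is one reason such trees can stay compact even when they contain very large numbers of samples.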

New SARS-CoV-2 dashboards and websites

RIVET is a platform for exploring putatively recombinant SARS-CoV-2 lineages. The platform runs the RIPPLES algorithm daily to identify potential recombinant lineages and provides exhaustive quality control to support their exploration.