Data Science and Public Health

At a glance

The CDC Injury Center uses a strategic approach to data science and data systems to protect the public's health and safety.

Data science and public health

Data science blends techniques from computer science, statistics, epidemiology, and other areas. It often focuses on large or new data sources and can apply sophisticated mathematical methods such as machine learning or natural language processing.

New data science approaches show promise in addressing critical public health needs, including injury and violence prevention. These approaches can help improve the timeliness of health information, respond to public health threats earlier, and increase the efficiency and effectiveness of prevention campaigns.

The Injury Center data science strategy

The Injury Center's data science strategy outlines the goals and activities to improve CDC's data science work in injury and violence prevention. To advance these goals, the Injury Center will strengthen its internal data science workforce, expand public health partnerships, advance information technology infrastructure, and increase investments in data science activities.

The Injury Center's specific goals are listed below, along with key progress made in each goal.

Goal 1: Expand the availability and utility of more timely data for injury and violence prevention

The Injury Center has increased the number and diversity of data sources used to understand nationwide trends. We have developed, validated, and published machine-learning-based models for real-time estimation of national suicide, opioid overdose, and firearm homicide fatality trends with predictive accuracy of 99%.

Goal 2: Improve rapid identification of health threats and response to communities

We published scientific work demonstrating how new online data sources and natural language processing methods can improve early tracking of overdose trends.

Goal 3: Increase access to accurate health information and prevent misinformation

We collaborated with industry, non-profit, and academic partners to study safe-reporting guidelines for suicide information, evaluate media campaigns for help-seeking during mental health crises, and identify common overdose-related health misinformation.

Goal 4: Enhance the usefulness of current data systems by improving data linkage

We funded recipients to link nonfatal and fatal overdose data and fatal firearm data with additional data sources to better understand the context leading to these events. Additionally, the Division of Violence Prevention linked Social Determinants of Health data to the National Violent Death Reporting System.
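
As a rough illustration of what linking event records to contextual data can look like, here is a minimal Python sketch. It is not the Injury Center's method: the field names (`county_fips`, `median_income`, `pct_uninsured`), the sample values, and the choice of a county-level key are all assumptions made for the example.

```python
# Minimal sketch of record linkage on a shared geographic key.
# All field names and values are hypothetical, for illustration only.

def link_records(events, context_by_county):
    """Attach county-level context to each event record (left join)."""
    linked = []
    for event in events:
        # Events with no matching county keep their own fields unchanged.
        context = context_by_county.get(event["county_fips"], {})
        linked.append({**event, **context})
    return linked


events = [
    {"event_id": 1, "county_fips": "13121"},
    {"event_id": 2, "county_fips": "99999"},  # no context available
]
context_by_county = {
    "13121": {"median_income": 70000, "pct_uninsured": 12.5},
}

linked = link_records(events, context_by_county)
```

Real linkage work usually also has to handle inconsistent identifiers, privacy safeguards, and probabilistic matching, none of which this sketch attempts.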

Goal 5: Share information in compelling, useful, and accessible ways

We developed dashboards and enhanced the Web-based Injury Statistics Query and Reporting System (WISQARS) to provide access to more timely injury and violence data.

Goal 6: Advance ethical practices for data science for injury and violence prevention

The Injury Center met with researchers and ethicists to identify best practices in using social media data for public health research and supported research that leverages innovative data science methods to advance health equity. We are developing a manuscript that details ethical considerations for data science in injury and violence prevention.

Goal 7: Increase efficiency of analytic and scientific processes for injury and violence prevention

We developed automated methods to clean, validate, and remove personally identifiable information from injury and violence datasets, which has increased the timeliness and availability of data.
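
To make the idea of automated cleaning, validation, and removal of personally identifiable information concrete, the following Python sketch shows one simple pattern-based approach. It is an assumption-laden illustration, not the Injury Center's pipeline: the patterns, the record fields (`age`, `narrative`), and the age range are hypothetical, and production systems need far broader coverage (names, addresses, dates) than three regular expressions.

```python
import re

# Hypothetical patterns for illustration; they cover only a few
# US-style formats and would miss many real-world identifiers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_pii(text):
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def clean_record(record):
    """Trim string fields, blank out implausible ages, redact the narrative."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    age = cleaned.get("age")
    # Assumed validation rule: age must be an integer between 0 and 120.
    if not isinstance(age, int) or not 0 <= age <= 120:
        cleaned["age"] = None
    cleaned["narrative"] = redact_pii(cleaned.get("narrative", ""))
    return cleaned


example = clean_record({"age": 230, "narrative": "  Reached at 404.555.0123 "})
```

Pattern matching of this kind is easy to audit but brittle, so it is typically combined with manual review or statistical de-identification methods.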

Goal 8: Evaluate promising state and local data science efforts for injury prevention and expand the capacity of local health partners in data science methodologies

We funded over 30 state and local health departments and their partners to increase data science capacity and improve injury and violence prevention in localities across the United States.

Featured resources and projects

Using Data and Research to Save Lives

Public health data science blends techniques from computer science, statistics, and epidemiology to extract insights from data. It often focuses on novel, large, and complex data sources and applies methods such as machine learning and natural language processing. Data science can help public health efforts to prevent injury by improving data timeliness, identifying emerging health trends, and using novel data sources.

Advancing Safe Reporting on Suicide

CDC partnered with Facebook to better understand adherence to suicide reporting guidelines in news articles shared on social media. The work produced the first scientific evidence that greater adherence to suicide reporting guidelines in news articles is not only beneficial for the health of individuals but also increases publisher reach and engagement, which helps motivate better use of safe reporting practices for suicide news online.