Mining Contract: Mine Health and Safety Big Data Analysis and Text Mining by Machine Learning Algorithms

Keywords: Data analysis
Contract # 75D30121C12375
Start Date 9/1/2021
Research Concept

The number of fatal accidents in the United States mining industry has fallen significantly over the last two decades. While encouraging, this decline is somewhat misleading because the total number of mine workers and employee hours worked have also declined over this period, especially in the coal sector. Consequently, the overall incidence rate of fatal accidents has remained essentially unchanged since 2015. In response, the mining industry and government have both committed to reducing the incidence rates of fatalities and nonfatal lost-time injuries through new initiatives, processes, and technologies, including advances in safety and health management and data-driven decision making.
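The rate argument above can be made concrete with a short calculation. The sketch below assumes the common US convention of expressing injury rates per 200,000 employee hours (roughly 100 full-time workers for one year). The function name and all figures are hypothetical and chosen only to show how a falling fatality count can leave the rate unchanged when hours worked fall in proportion.

```python
def incidence_rate(cases: int, hours_worked: float, base_hours: float = 200_000) -> float:
    """Return cases per `base_hours` employee hours worked.

    The 200,000-hour base is an assumed convention, not taken from the source.
    """
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return cases * base_hours / hours_worked


# Hypothetical figures for illustration only (not real MSHA data).
earlier = incidence_rate(cases=40, hours_worked=600_000_000)
later = incidence_rate(cases=20, hours_worked=300_000_000)

# Fatalities halved, but so did hours worked, so the rate did not move.
print(f"earlier period: {earlier:.4f} per 200,000 hours")
print(f"later period:   {later:.4f} per 200,000 hours")
```

Comparing rates rather than raw counts is what separates a real safety improvement from a shrinking workforce.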

Contract Status & Impact

This contract is ongoing. For more information on this contract, send a request to mining@cdc.gov.

The objective of this contract research is to apply Big Data analytics and machine learning techniques to leading indicators (as opposed to lagging indicators) to help identify, predict, and control risks in mining. This work represents a collaborative effort with Michigan Technological University, the Colorado School of Mines, and private and governmental agencies.

This research will address three fundamental questions:

  1. What leading indicator data from State Workers’ Compensation Claim and Mine Safety and Health Administration (MSHA) databases are best suited to predict health and safety risks or adverse outcomes?
  2. What machine learning techniques are best suited to analyze health and safety Big Data in the mining industry?
  3. What health and safety outcomes (i.e., risks) can be predicted using available leading indicator data?

To achieve the research goal, the following objectives will be addressed: (a) identify and extract possible leading indicator variables and health and safety (H&S) risks from MSHA databases; (b) develop machine learning models to uncover the most influential leading indicators for H&S risks; and (c) develop a software tool that applies the developed models to H&S risk assessment.
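As a rough illustration of objective (b), the sketch below ranks candidate leading indicators by the strength of their linear association with an outcome. This is a deliberately simple stand-in: the contract does not specify its models, and a plain Pearson correlation is not a machine learning model. The indicator names, the outcome series, and every value are synthetic and hypothetical, not drawn from MSHA or workers' compensation data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_indicators(indicators, outcome):
    """Sort indicator names by absolute correlation with the outcome, strongest first."""
    scores = {name: abs(pearson(values, outcome)) for name, values in indicators.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic, hypothetical data: one value per mine-year.
indicators = {
    "citations_per_inspection": [1, 2, 3, 4, 5, 6],
    "near_miss_reports": [5, 3, 6, 2, 7, 4],
    "training_hours": [9, 9, 9, 9, 9, 9],  # constant, so it carries no signal
}
injury_rate = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

ranking = rank_indicators(indicators, injury_rate)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A real analysis would need far more data, would have to account for nonlinear effects and interactions between indicators, and would validate any ranking on held-out records before using it for risk assessment.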


Page last reviewed: February 27, 2023
Page last updated: February 27, 2023