NIOSHTIC-2 Publications Search

Development and evaluation of an auto-coding model for coding unstructured text data among workers' compensation claims.

Bertke SJ; Meyers AR; Wurzelbacher SJ; Bell J; Lampl ML; Robins D
Use of workers' compensation data for occupational safety and health: proceedings from June 2012 workshop. Utterback DF, Schnorr TM, eds. Cincinnati, OH: U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, DHHS (NIOSH) Publication No. 2013-147, 2013 May; :153-156
Work-related musculoskeletal disorders caused by ergonomic risk factors (MSDs), such as overexertion and repetitive motion, and injuries caused by a slip, trip, or fall (STF) are common among workers and result in pain, disability, and substantial cost to workers and employers (Bureau of Labor Statistics, 2011; Liberty Mutual Research Institute for Safety, 2011). The majority of occupational injuries and illnesses can be categorized as an MSD or an STF (Bureau of Labor Statistics, 2011). Improved surveillance of occupational illnesses and injuries (II) classified as MSDs and STFs has been a high national priority, as determined by the National Occupational Research Agenda (NORA). In fact, surveillance of MSDs and STFs was included as a strategic goal in ninety percent of the agendas of the ten NORA sectors (e.g., manufacturing, construction, wholesale/retail trade [WRT]). Tracking the incidence and prevalence of MSDs and STFs among Ohio workers is one aim of the partnership between the National Institute for Occupational Safety and Health (NIOSH) and the Ohio Bureau of Workers' Compensation (OBWC). The OBWC collects claims data primarily to manage claims and determine future workers' compensation premiums. Prior to 2007, OBWC had no systematic way of tracking events or exposures (i.e., causation) such as ergonomic risk factors and slips, trips, or falls. Causation was recorded only in a free-text field (unstructured data) describing the work-related cause of the claim. Tracking the incidence and prevalence of MSDs and STFs among Ohio workers would therefore require coding causation for millions of unstructured fields, which was not feasible to do manually. Recently, Lehto et al. (Lehto et al., 2009; Wellman et al., 2004) demonstrated that computer learning algorithms using Bayesian methods could auto-code injury narratives into causation groups efficiently and accurately, without any manual intervention.
The authors demonstrated that the algorithms could code thousands of claims in minutes or hours with a high degree of accuracy by "learning" from a set of claims previously coded by experts (the training set). Furthermore, these algorithms assigned each claim a score reflecting the algorithm's confidence in its prediction, so claims with low confidence scores could be flagged for manual review. The main goal of this project was to develop and evaluate an auto-coding method that could aid the manual coding of OBWC claim causation as MSD, STF, or other (OTH).
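The abstract does not give the model details. As a minimal sketch, assuming a multinomial naive Bayes classifier (one common Bayesian text classifier; the exact variant used by Lehto et al. or by the authors is not stated here), the code below learns word frequencies from a small expert-labeled training set, assigns each new narrative to MSD, STF, or OTH, treats the posterior probability of the chosen code as the confidence score, and flags predictions below a threshold for manual review. The training narratives, the `NaiveBayesCoder` and `auto_code` names, and the 0.9 threshold are hypothetical illustrations, not OBWC data or parameters from the study.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, hand-made training narratives (illustrative only, NOT OBWC data).
TRAINING_SET = [
    ("lifting heavy box strained lower back", "MSD"),
    ("repetitive typing pain in right wrist", "MSD"),
    ("pulling pallet jack strained shoulder", "MSD"),
    ("slipped on wet floor and fell", "STF"),
    ("tripped over hose fell on knee", "STF"),
    ("fell from ladder onto concrete", "STF"),
    ("cut hand on sharp metal edge", "OTH"),
    ("struck by falling object on head", "OTH"),
]


def tokenize(text):
    """Lowercase a narrative and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes coder with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.class_totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(examples)
        return self

    def predict(self, text):
        """Return (most probable code, its posterior probability)."""
        v = len(self.vocab)
        log_scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior P(code)
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # words unseen in training carry no evidence
                # log P(word | code), add-one smoothed
                score += math.log((self.word_counts[c][w] + 1) / (self.class_totals[c] + v))
            log_scores[c] = score
        # Convert log scores to normalised posteriors (subtract max for stability).
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(weights.values())
        best = max(weights, key=weights.get)
        return best, weights[best] / total


def auto_code(coder, narratives, threshold=0.9):
    """Code each narrative; flag predictions whose confidence is below threshold."""
    results = []
    for text in narratives:
        label, confidence = coder.predict(text)
        results.append((text, label, confidence, confidence < threshold))
    return results


if __name__ == "__main__":
    coder = NaiveBayesCoder().fit(TRAINING_SET)
    new_claims = ["worker slipped on ice and fell", "pain in back from lifting"]
    for text, label, conf, needs_review in auto_code(coder, new_claims):
        flag = "MANUAL REVIEW" if needs_review else "auto-coded"
        print(f"{label}  {conf:.2f}  {flag}  | {text}")
```

With this toy training set, the first narrative is coded STF with confidence just above the threshold, while the second is coded MSD with lower confidence and is flagged for manual review. Raising the threshold sends more claims to human coders in exchange for fewer auto-coding errors.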
Keywords: Workers; Work-environment; Injuries; Accidents; Risk-factors; Hazards; Health-protection; Surveillance-programs; Preventive-medicine; Traumatic-injuries; Humans; Men; Women; Health-care; Statistical-analysis; Ergonomics; Repetitive-work; Construction; Construction-industry
Publication Date: 2013 May
Document Type: Conference/Symposia Proceedings
Editors: Utterback DF; Schnorr TM
Priority Area: Wholesale and Retail Trade
Source Name: Use of workers' compensation data for occupational safety and health: proceedings from June 2012 workshop
Page last reviewed: April 1, 2022
Content source: National Institute for Occupational Safety and Health Education and Information Division