Collecting and Using Industry and Occupation Data
Things to Consider Before You Analyze Your Data
Here are important things to consider when analyzing industry and occupation data.
If you have questions while going through this process, please email us for help at NIOSHIOCoding@cdc.gov.
Prepare your dataset for analysis once you get your coded output
- Determine if your cell sizes are large enough for a meaningful analysis.
First, perform a frequency analysis for all industry or occupation groups to see if cell sizes are large enough for a meaningful analysis (e.g., 5 or more per cell to protect respondent identities). If they are not, group or categorize the industry and/or occupation codes into higher-level groups.
Categorizing data into higher-level groups
- ensures sufficient cell size, and
- protects identity of survey participants.
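The frequency check described above can be sketched in a few lines. This is a minimal illustration in Python (the page itself points to SAS, R, and Epi Info); the function name `small_cells`, the sample codes, and the threshold constant are assumptions made for the example, with the threshold of 5 taken from the example in the text.

```python
from collections import Counter

# Threshold from the example in the text; adjust to your own disclosure rules.
MIN_CELL_SIZE = 5

def small_cells(codes, min_size=MIN_CELL_SIZE):
    """Return {code: count} for every group whose frequency is below min_size."""
    counts = Counter(codes)
    return {code: n for code, n in counts.items() if n < min_size}

# Hypothetical coded output: one industry group per respondent.
industry_codes = ["23"] * 12 + ["31-33"] * 7 + ["11"] * 3 + ["21"] * 1

flagged = small_cells(industry_codes)
print(flagged)  # groups that may need to be combined into broader categories
```

Any group that appears in `flagged` is a candidate for merging into a higher-level category before analysis.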
- If cell sizes are large enough, but there are still too many industry or occupation groups to practically analyze the data, group the data into even broader industry or occupation categories using SAS, R or Epi Info 7.
The following can be used to create industry or occupation categories:
Keep in mind that the level of the detail needed for data categories depends on the purpose of the study.
If using NAICS and SOC Classification System codes:
NAICS and SOC are hierarchical.
- Industry codes using NAICS may be grouped into 20 two-digit codes, which represent the 20 large industry sectors (e.g., Construction, Manufacturing).
- Occupation codes using SOC may be grouped into 23 two-digit codes, which represent the 23 major categories of occupations (e.g., Protective Services, Management).
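Because both systems are hierarchical, rolling detailed codes up to the top level mostly amounts to keeping the first two digits. The sketch below assumes codes are stored as strings or integers; the function names are hypothetical. One detail to watch: three NAICS sectors span more than one two-digit prefix (Manufacturing 31-33, Retail Trade 44-45, Transportation and Warehousing 48-49), so those prefixes are folded into a single sector label.

```python
# NAICS sectors that span more than one two-digit prefix.
MULTI_PREFIX_SECTORS = {
    "31": "31-33", "32": "31-33", "33": "31-33",  # Manufacturing
    "44": "44-45", "45": "44-45",                  # Retail Trade
    "48": "48-49", "49": "48-49",                  # Transportation and Warehousing
}

def naics_sector(code):
    """Map a NAICS code (2 to 6 digits) to its sector label."""
    prefix = str(code).strip()[:2]
    return MULTI_PREFIX_SECTORS.get(prefix, prefix)

def soc_major_group(code):
    """Map a SOC code such as '47-2061' to its two-digit major group ('47')."""
    return str(code).strip()[:2]

print(naics_sector("236220"))      # a construction code rolls up to sector 23
print(naics_sector(332710))        # a manufacturing code rolls up to 31-33
print(soc_major_group("47-2061"))  # a construction occupation rolls up to 47
```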
If using Census Industry and Occupation Classification System codes:
Group the Census industry and occupation codes into categories such as those developed by the National Center for Health Statistics (NCHS) for the National Health Interview Survey (NHIS) or by the Census Bureau for the Current Population Survey (CPS). See the SAS and R code, and how Epi Info can be used to create such groupings.
- National Health Interview Survey (NHIS)
From 2004 through 2018, NCHS published two sets of industry and occupation groupings for the NHIS. One set uses broader or “simple” categories, the other uses detailed categories.
- 21 simple industry groups and 23 simple occupation groups
- 79 detailed industry groups and 94 detailed occupation groups
The NHIS simple categories roughly match the NAICS two-digit codes (plus an additional code for armed forces) and SOC major groups.
- Current Population Survey (CPS)
Another convenient, preexisting classification system for combining Census industry and occupation codes into either major (broad) or detailed groups is that used by the Census Bureau for the CPS and other datasets. This system combines related ranges of codes into
- 14 major industry groups and 11 major occupation groups
- 52 detailed industry and 23 detailed occupation groups
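Since these recodes combine related ranges of Census codes, the grouping step can be expressed as a range lookup. The sketch below is illustrative only: the ranges and labels in `GROUP_RANGES` are placeholders, not the published NHIS or CPS recodes, and the function name is hypothetical. Substitute the official ranges for the Census code version used in your data.

```python
# HYPOTHETICAL ranges for illustration only. Replace them with the published
# NHIS or CPS recode ranges for the Census code version used in your data.
GROUP_RANGES = [
    (100, 199, "Group A"),
    (200, 499, "Group B"),
    (500, 899, "Group C"),
]

def census_group(code, ranges=GROUP_RANGES):
    """Return the group label whose inclusive range contains the code, else None."""
    value = int(code)  # also handles zero-padded strings such as "0250"
    for low, high, label in ranges:
        if low <= value <= high:
            return label
    return None  # code falls outside every defined range

print(census_group("0250"))
print(census_group(950))
```

Returning `None` for codes outside every range makes uncategorized records easy to find and review, rather than silently placing them in a default group.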
- Select a comparison population or a denominator for analyses.
When calculating rates and proportions, consider whether you will use data from an external or internal source as a denominator. Some possible external sources for denominator data include (among others)
- American Community Survey (ACS)
Data from the ACS are available for national and individual lower-level geographies through census.gov.
If you need specific categories defined in the denominator, the ACS Public Use Microdata Sample (PUMS) files can be used to produce custom population estimates.
- Current Population Survey (CPS)
The CPS is a large, complex monthly survey. Creating annual or custom population estimates from the CPS dataset requires advanced analytical skills. If using CPS data for denominators, the NIOSH Employed Labor Force (ELF) query system easily generates CPS-based data tables.
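Once a denominator source is chosen, a rate is the case count in each group divided by the employed population in the same group, scaled to a convenient base. The following is a minimal sketch; the function name, the per-100,000 base, and all counts are made-up assumptions for illustration, not figures from ACS or CPS.

```python
def rates_per_100k(case_counts, denominators):
    """Compute cases per 100,000 workers for each group.

    Groups with no (or zero) denominator get None instead of a rate.
    """
    rates = {}
    for group, cases in case_counts.items():
        population = denominators.get(group)
        if not population:
            rates[group] = None  # denominator missing or zero
            continue
        rates[group] = cases * 100_000 / population
    return rates

# Hypothetical counts: cases from the analytic dataset, employed
# population from an external source.
cases = {"Construction": 30, "Manufacturing": 45, "Mining": 2}
employed = {"Construction": 10_000, "Manufacturing": 30_000}

rates = rates_per_100k(cases, employed)
print(rates)
```

A `None` result signals that the group exists in the analytic data but has no matching denominator, which usually points to a classification, version, or grouping mismatch between the two sources.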
Things to Consider When Using Denominator Data from an External Source
Make sure your analytic dataset and the external denominator data are categorized using the same classification system (NAICS, SOC or Census).
If the external denominator data was coded using a different classification system than your analytic dataset, reclassify either the external data or your data so that they both are coded using the same classification system. Reclassify either dataset by cross-walking to a common classification system. Visit the Census Bureau’s Industry and Occupation Codes Lists and Crosswalks website for instructions for creating crosswalks.
Be aware that each classification system has its own limitations, which may affect the analyses.
What do we mean by version?
All classification systems are updated periodically to assign standardized codes to new occupations and industries and remove codes for occupations and industries that no longer exist. Each time an update occurs, a new version is created.
Versions are referred to by the year the update occurred (e.g., Census 2002, Census 2010).
Use the same version for your analytic dataset and the denominator data.
If the denominator data are coded using a different version than your analytic dataset, reclassify either the denominator data or your analytic dataset so that they both use the same version. Align the versions by cross-walking your analytic dataset.
Visit the Census Bureau’s Industry and Occupation Codes Lists and Crosswalks website for instructions for creating crosswalks.
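In its simplest form, applying a crosswalk is a lookup from each old-version code to its new-version equivalent. The sketch below assumes a one-to-one or many-to-one mapping; the codes in `CROSSWALK` and the function name are hypothetical. Published crosswalks can also split one code into several, which needs allocation rules that are not shown here.

```python
# HYPOTHETICAL crosswalk: old-version code -> new-version code.
# Real crosswalks come from the Census Bureau's published tables.
CROSSWALK = {"1000": "1005", "1010": "1005", "2000": "2001"}

def apply_crosswalk(codes, crosswalk):
    """Convert codes to the target version.

    Returns (converted, unmatched): converted holds the new code or None for
    each input, and unmatched lists the distinct codes missing from the crosswalk.
    """
    converted = []
    unmatched = set()
    for code in codes:
        new_code = crosswalk.get(code)
        if new_code is None:
            unmatched.add(code)
        converted.append(new_code)
    return converted, sorted(unmatched)

converted, unmatched = apply_crosswalk(["1000", "2000", "9999"], CROSSWALK)
print(converted)
print(unmatched)  # review these codes by hand before analysis
```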
Grouping level refers to how detailed the codes are. If your analytic dataset is coded to the 20 major industry codes (e.g., construction, manufacturing), then the external denominator data also need to be coded to the 20 major industry codes to compare them.
Categorize the industry and occupation codes at the same grouping level for your analytic dataset and the denominator data (e.g., major industry and occupation groups).
If the level of grouping differs between the datasets, one or both must be recoded so that the categories match.
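A quick way to confirm that the two datasets share the same categories is to compare their sets of group labels before computing anything. This is a sketch with a hypothetical function name and made-up labels.

```python
def category_mismatch(analytic_groups, denominator_groups):
    """List group labels that appear in only one of the two datasets."""
    analytic = set(analytic_groups)
    denominator = set(denominator_groups)
    return {
        "only_in_analytic": sorted(analytic - denominator),
        "only_in_denominator": sorted(denominator - analytic),
    }

# Hypothetical labels: the analytic data uses a detailed group that the
# denominator data does not have.
analytic = ["Construction", "Manufacturing", "Machine shops"]
denominator = ["Construction", "Manufacturing", "Mining"]

report = category_mismatch(analytic, denominator)
print(report)
```

If `only_in_analytic` is not empty, some records in the analytic dataset have no comparable denominator and need to be recoded to the shared grouping level. Labels that appear only in the denominator may simply be groups with no cases.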
Complete Your Analysis
Once cell sizes are large enough for analysis, and your analytic dataset and the denominator data use the same classification system, version, and grouping level, assess outcomes of interest.
In the table below, we provide information about published examples of how NIOSH investigators have used industry and occupation data to analyze health, injury, exposures, fatalities, illnesses, and economic risk factors.
|Topic||Data Source (year/s)||I/O categories used||Result highlights||Link to publication|
|Workplace secondhand smoke (SHS) exposure||NHIS (2015)||78 detailed industry recode categories and some specific Census industry codes that were within recode categories with high reported prevalence of SHS exposure that had adequate sample sizes (data accessed through RDC)||Nonsmoking workers employed in the commercial and industrial machinery and equipment repair and maintenance industry reported the highest prevalences of any workplace SHS exposure (65.1%), whereas the construction industry had the highest reported number of exposed workers (2.9 million)||Link to article|
|Low Back Pain (LBP)||NHIS (2015)||22 occupation groups (military excluded)||The prevalence of any LBP and work-related LBP was highest in construction and extraction occupations.||Link to article|
|Health Insurance Coverage||NHIS (2015)||4 broad occupational categories (see Appendix)||Workers in service and farming and production occupations were least likely to have health insurance in 2010 and 2015.||Link to article|
|Overdose deaths||NOMS||26 occupation groups||Construction occupations had the highest PMRs for drug overdose deaths and for both heroin-related and prescription opioid–related overdose deaths. The occupation groups with the highest PMRs from methadone, natural and semisynthetic opioids, and synthetic opioids other than methadone were construction, extraction (e.g., mining, oil and gas extraction), and health care practitioners.||Link to article|
|Opioid prescriptions||MEPS||8 occupation groups||Workers in occupations at higher risk for injury and illness – including construction and extraction; farming; service; and production, transportation, and material moving occupations – were more likely to obtain opioid prescriptions.|
|Asthma||BRFSS (2013)||21 industry groups and 23 occupation groups||State-specific prevalence of current asthma was highest among workers in the information industry (18.0%) in Massachusetts and in health care support occupations (21.5%) in Michigan.||Link to article|
|Asthma Mortality||NOMS (1999–2016)||U.S. Census 2000 Industry and Occupation Classification System||By industry, asthma mortality was significantly elevated among males in food, beverage, and tobacco products manufacturing, other retail trade, and miscellaneous manufacturing, and among females in social assistance. By occupation, asthma mortality was significantly elevated among females in community and social services.||Link to article|
|Workplace Smokefree policies and cessation programs||TUS-CPS (tobacco use supplement-CPS): 2014–2015||21 industry groups and 23 occupation groups||The proportion of indoor workers reporting a 100% smoke-free policy at their workplace varied by sociodemographic characteristics, industry, and occupation. By industry, the proportion was highest in the education services industry and lowest in the agriculture, forestry, fishing, and hunting industry; by occupation, it was highest in education, training, and library occupations (92.2%) and lowest in farming, fishing, and forestry occupations (63.6%). Overall, 27.2% of all working adults reported having employer-offered cessation programs.||Link to article|
|COPD among those who never smoked||NHIS 2013–2017||21 industry groups and 23 occupation groups||During 2013–2017, an estimated 2.4 million (2.2%) U.S. working adults aged ≥18 years who never smoked had COPD. The highest COPD prevalences among persons who never smoked were in the information (3.3%) and mining (3.1%) industries and office and administrative support occupation workers (3.3%). Women had higher COPD prevalences than did men.||Link to article|
|Tobacco Use Among Working Adults||NHIS 2014–2016||21 industry groups and 23 occupation groups||During 2014–2016, 22.1% of workers currently (every day or some days) used any form of tobacco product; 15.4% used cigarettes, 5.8% used other combustible tobacco products, 3.0% used smokeless tobacco, and 3.6% used electronic cigarettes; overall, 4.6% used two or more tobacco products. By industry, any tobacco product use ranged from 11.0% among education services to 34.3% among construction workers; use of two or more tobacco products was highest among construction industry workers. By occupation, any tobacco use ranged from 9.3% among life, physical, and social science workers to 37.2% among installation, maintenance, and repair workers; use of two or more tobacco products was highest among installation, maintenance, and repair workers.||Link to article|
|Tobacco product use among workers in the construction industry||NHIS 2014–2016||Major industry code “04” was used to identify workers in the construction industry. Seven categories of construction occupations within the sector were identified: management; office, and administrative support; supervisors, construction, and extraction trade; installation, maintenance, and repair; production, transportation, warehousing, and repair; and all other construction workers||Over one-third of U.S. construction workers use some form of tobacco product, and use varies by worker and workplace characteristics. An estimated 43% of workers in the installation, maintenance and repair occupations used some form of tobacco products||Link to article|
|Airflow obstruction||NHANES 2007–2012||264 detailed industry codes recoded into 44 industry groups and 501 detailed occupation codes recoded into 57 occupation groups (detailed data accessed through the Research Data Center)||High airflow obstruction prevalence and significant PORs were reported in mining; manufacturing; construction; and services to buildings industries as well as extraction; bookbinders, prepress, and printing; installers and repairers; and construction occupations.||Link to article|