Key points
- After collecting and coding your occupation and industry data, prepare for analysis.
- Ensure your cell sizes are large enough for a meaningful analysis.
- Ensure your analytic dataset and the denominator data use the same classification system, version, and grouping level.
Are your cell sizes large enough to analyze?
Perform a frequency analysis for all occupation and/or industry groups. This will show whether the number of respondents in each group is large enough to support a meaningful analysis and to protect respondent identities (e.g., 5 or more per cell).
If the groups are not large enough:
Organize the occupation or industry codes into broader groups, such as 3-digit Standard Occupational Classification (SOC) codes or North American Industry Classification System (NAICS) codes.
If the groups are large enough, but there are too many to analyze:
Organize the codes into even broader occupation or industry groups, such as 2-digit SOC or NAICS codes.
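The cell-size check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `cell_size_report`, the sample responses, and the use of 5 as the threshold (taken from the example above) are all assumptions made for this sketch.

```python
from collections import Counter

# Assumed threshold, based on the "5 or more per cell" example in the text.
MIN_CELL_SIZE = 5


def cell_size_report(codes, min_size=MIN_CELL_SIZE):
    """Count respondents per group and split groups by whether they meet min_size."""
    counts = Counter(codes)
    large_enough = {group: n for group, n in counts.items() if n >= min_size}
    too_small = {group: n for group, n in counts.items() if n < min_size}
    return large_enough, too_small


# Hypothetical coded responses: one SOC-style code per respondent.
responses = ["47-2061"] * 7 + ["47-2031"] * 3 + ["11-9021"] * 2 + ["29-1141"] * 6

large_enough, too_small = cell_size_report(responses)
print("Large enough:", large_enough)
print("Too small:", too_small)
```

Any group that lands in `too_small` is a candidate for combining into a broader group before analysis.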
Making broader groups
The way you group depends on whether you coded using SOC/NAICS or Census Occupation and Industry Classification.
Learn about SOC, NAICS, and Census classification systems
When using SOC and NAICS codes
For SOC and NAICS, the more digits a code has, the more detailed the occupation or industry it describes.
- SOC occupation codes may be grouped into 23 two-digit codes. These represent the 23 major categories of occupations (e.g., Construction and Extraction Occupations, Management Occupations).
- NAICS industry codes may be grouped into 20 two-digit codes. These represent the 20 large industry sectors (e.g., Construction, Manufacturing).
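Because the leading digits of a SOC or NAICS code identify its broader group, detailed codes can be rolled up by keeping only those digits. The sketch below assumes codes are stored as strings (SOC as "NN-NNNN", NAICS as digits only); the helper names and sample codes are hypothetical. Note that some NAICS sectors span a range of two-digit prefixes (for example, Manufacturing covers 31 through 33), so the prefixes may need a further combining step that this sketch does not perform.

```python
from collections import Counter


def soc_major_group(soc_code):
    """Return the two-digit major group prefix of a SOC code such as '47-2061'."""
    return soc_code.strip()[:2]


def naics_two_digit(naics_code):
    """Return the first two digits of a NAICS code such as '236115'."""
    return naics_code.strip()[:2]


# Hypothetical detailed codes, one per respondent.
soc_codes = ["47-2061", "47-2031", "11-9021", "11-1021"]
naics_codes = ["236115", "238160", "311811"]

# Re-count respondents at the broader two-digit level.
soc_counts = Counter(soc_major_group(code) for code in soc_codes)
naics_counts = Counter(naics_two_digit(code) for code in naics_codes)
print(soc_counts)
print(naics_counts)
```

After regrouping, it is worth repeating the cell-size check on the broader groups.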
When using Census codes
We suggest organizing Census occupation and industry codes into groups like those developed by the National Health Interview Survey or the Current Population Survey.
National Health Interview Survey (NHIS)
From 2004 through 2018, the National Center for Health Statistics (NCHS) published two sets of occupation and industry groupings for the NHIS. One set uses broader or "simple" groups, the other uses detailed groups.
- The simple set includes 23 occupation groups and 21 industry groups.
- The detailed set includes 94 occupation groups and 79 industry groups.
The NHIS simple set maps to Census occupation and industry codes. Its groups roughly match the SOC major groups and NAICS two-digit codes (plus an additional code for armed forces).
The following can be used to organize Census occupation and industry codes into groups like those used by NHIS:
Current Population Survey (CPS)
The U.S. Census Bureau uses another convenient classification system. It combines Census occupation and industry codes into either broad (also called "major") or detailed groups:
- The broad ("major") groupings include 11 occupation groups and 14 industry groups.
- The detailed groupings include 23 occupation groups and 52 industry groups.
Select a comparison population
When calculating rates and proportions, consider whether you will use data from an external or internal source as a denominator.
Some possible external sources for denominator data are the CPS or the American Community Survey (ACS).
- CPS is a large and complex monthly survey. Creating annual or custom population estimates from the CPS dataset requires advanced analytical skills. If you use CPS data for denominators, the NIOSH Employed Labor Force (ELF) query system can easily generate CPS-based data tables.
- ACS data are available for national and individual lower-level geographies through census.gov. If you need specific categories defined in the denominator, the ACS Public Use Microdata Sample (PUMS) files can be used to produce custom population estimates.
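Once a denominator source is chosen, a rate for each group is the number of cases divided by the number of workers in that group. The sketch below is illustrative only: the counts are made up, the function name `rates_per_100k` is hypothetical, and expressing the rate per 100,000 workers is an assumption rather than a requirement of either data source.

```python
def rates_per_100k(case_counts, denominators):
    """Compute cases per 100,000 workers for each group that has a usable denominator."""
    rates = {}
    for group, cases in case_counts.items():
        population = denominators.get(group)
        if not population:
            # Skip groups with a missing or zero denominator instead of guessing.
            continue
        rates[group] = cases * 100_000 / population
    return rates


# Made-up numerators (analytic dataset) and denominators (external source).
case_counts = {"Construction": 30, "Manufacturing": 45, "Mining": 4}
denominators = {"Construction": 150_000, "Manufacturing": 300_000}

rates = rates_per_100k(case_counts, denominators)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.1f} per 100,000 workers")
```

Groups that are skipped here (such as the made-up "Mining" entry) signal that the numerator and denominator categories do not line up and need attention before rates are reported.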
When using denominator data from an external source:
Make sure your analytic dataset and the denominator data use the same classification system (SOC, NAICS, or Census).
If they are not coded using the same system, reclassify one of them so that both use the same system.
You can reclassify a dataset by crosswalking to a common classification system:
- NIOSH's Industry and Occupation Computerized Coding System (NIOCCS) can be used to crosswalk data.
- You can also visit the Census Bureau's Industry and Occupation Codes Lists and Crosswalks website for instructions for creating crosswalks.
What is crosswalking?
Crosswalking is the mapping of a code (1) from one occupation or industry classification system to another, or (2) to a different code within the same occupation and industry classification system for a different year.
Classification systems are updated regularly, so crosswalk files help link data across classification systems and versions to enable analysis.
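In practice, a crosswalk can be thought of as a lookup table from source codes to target codes. The sketch below uses entirely hypothetical codes ("A100", "B10", and so on) and a hypothetical helper `apply_crosswalk`; it does not reproduce any published crosswalk or the behavior of NIOCCS. It also assumes each source code maps to exactly one target code, whereas real crosswalks can split one code into several and may need extra handling.

```python
def apply_crosswalk(codes, crosswalk):
    """Map each code through the crosswalk and collect codes that have no mapping."""
    mapped, unmapped = [], []
    for code in codes:
        if code in crosswalk:
            mapped.append(crosswalk[code])
        else:
            unmapped.append(code)
    return mapped, unmapped


# Hypothetical crosswalk: two old codes merge into one new code.
crosswalk = {"A100": "B10", "A110": "B10", "A200": "B25"}
codes = ["A100", "A110", "A200", "A999"]

mapped, unmapped = apply_crosswalk(codes, crosswalk)
print("Mapped:", mapped)
print("Needs review:", unmapped)
```

Keeping the unmapped codes in a separate list makes it easy to review them by hand rather than silently dropping records.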
What do we mean by version?
All classification systems are updated periodically to:
- Assign standardized codes to new occupations and industries.
- Remove codes for occupations and industries that no longer exist.
Each time an update occurs, a new version is created. Versions are referred to by the year the update occurred (e.g., Census 2002, Census 2010).
If your denominator data and your analytic data are not coded to the same version, reclassify one so they match.
You can align the versions by crosswalking your data using NIOCCS. In this case, crosswalking is the mapping of occupation and industry codes in the same classification system from one year to a different year.
What do we mean by grouping level?
Grouping level refers to how detailed the codes are. If your analytic dataset is coded to the 20 major industry codes, then the denominator data also need to be coded to the 20 major industry codes to compare them.
If the level of grouping differs between the datasets, one or both must be recoded so that the categories match.
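One simple way to spot a grouping-level mismatch is to compare the set of categories in each dataset. The sketch below is an assumption-laden illustration: the function name `compare_categories` and the category labels are hypothetical, and a category that appears only in the denominator data may simply mean no cases were observed in that group.

```python
def compare_categories(analytic_groups, denominator_groups):
    """Report categories that appear in one dataset but not in the other."""
    analytic = set(analytic_groups)
    denominator = set(denominator_groups)
    return {
        "only_in_analytic": sorted(analytic - denominator),
        "only_in_denominator": sorted(denominator - analytic),
    }


# Hypothetical labels: the analytic data uses a 3-digit code where the
# denominator data uses a broader sector label.
analytic_groups = ["23", "311", "62"]
denominator_groups = ["23", "31-33", "62"]

mismatch = compare_categories(analytic_groups, denominator_groups)
print(mismatch)
```

Any entry under `only_in_analytic` has no matching denominator, so that category would need to be recoded before rates can be compared.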