Collecting and Using Industry and Occupation Data



You’ve got text descriptions of industry and occupation. Now what? You’ll need to code your data. Here are the most important things to understand and consider when coding your data. Also check out our blog, “Making Industry and Occupation Information Useful for Public Health: A guide to coding industry and occupation text fields.”

Things to Understand Before You Code Your Data

  1. Industry and occupation coding is the process of converting word descriptions (free text) of individuals’ type of work and type of business into a numeric code.
    1. Codes are standardized. This means each occupation and each industry has a unique number associated with it to enable grouping and better data analysis.
    2. Coding the data allows public health officials and other researchers to assess patterns and trends in work-related diseases, injuries, and exposures.
  2. In the US, there are a few different classification systems that can be used to assign codes associated with each specific occupation and each specific industry.
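    The two main options are (1) the North American Industry Classification System (NAICS) used with the Standard Occupational Classification (SOC), and (2) the Census Industry and Occupation Classification System: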

NAICS can be used to find the standard codes for industries in North America. It is maintained by the Office of Management and Budget.

SOC can be used to find the standard codes for occupations in the United States. It is maintained by the Bureau of Labor Statistics.

NAICS and SOC codes are not related, meaning the industry codes in NAICS do not have any influence on the occupation codes in SOC, and vice versa. Therefore, NAICS and SOC can be used separately, if you only have industry data OR occupation data, or they can be used together to create industry AND occupation codes.

The Census Industry and Occupation Classification System is maintained by the U.S. Census Bureau. It was created to code both occupation and industry text obtained from surveys.

Census industry and occupation codes are derived from NAICS and SOC. The detailed industry codes from NAICS were grouped to make a smaller number of broader Census industry codes. The same was done for SOC: the detailed occupations were grouped into broader categories to create a smaller set of Census occupation codes.

The broader industry and occupation categories in the Census coding system better protect the privacy of survey respondents.
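
The grouping described above is a many-to-one mapping from detailed codes to broader ones. The sketch below illustrates the idea in Python. The codes (`D101`, `B10`, and so on) and the counts are made-up placeholders, not real NAICS, SOC, or Census codes; the real correspondence is published by the Census Bureau.

```python
from collections import defaultdict

# Hypothetical placeholder codes, NOT real NAICS or Census codes.
# Each detailed code maps to exactly one broader code.
DETAILED_TO_BROAD = {
    "D101": "B10",
    "D102": "B10",
    "D103": "B10",
    "D201": "B20",
    "D202": "B20",
}


def group_by_broad_code(detailed_counts):
    """Sum record counts for detailed codes into their broader categories."""
    broad_counts = defaultdict(int)
    for detailed_code, count in detailed_counts.items():
        broad_code = DETAILED_TO_BROAD[detailed_code]
        broad_counts[broad_code] += count
    return dict(broad_counts)


# Invented counts of survey records per detailed code.
detailed_counts = {"D101": 3, "D102": 1, "D103": 2, "D201": 40, "D202": 25}
broad_counts = group_by_broad_code(detailed_counts)
print(broad_counts)
```

With these invented counts, one detailed code holds a single record, while the smallest broader category holds six. That is the intuition behind broader categories offering more privacy protection.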

  3. You will need to choose a coding system to code your data. To do this, decide if you are going to compare your data to another dataset.


    Comparing

    If you are comparing your data to another dataset (e.g., to calculate rates), the classification system you use depends on the data you want to use for comparison. If the dataset for your comparison population is coded using Census, code your data using Census. If the comparison dataset is coded using NAICS and SOC, then use NAICS and SOC. Most public health data are coded using Census. These include:

    • National Health Interview Survey
    • Current Population Survey

    However, some employer-based surveys used for business or economic analysis code their industry and occupation data using NAICS/SOC. These include:

    • Quarterly Census of Employment and Wages
    • Occupational Information Network (O*NET)
    • Bureau of Labor Statistics (BLS) Survey of Occupational Injuries and Illnesses
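
To see why the two datasets must share a classification system, consider a rate calculation: case counts (the numerator) and worker counts (the denominator) can only be matched when both use the same codes. The following is a minimal Python sketch. The function name, the codes, and all counts are hypothetical and chosen only for illustration.

```python
def rates_per_100k(case_counts, population_counts):
    """Return cases per 100,000 workers for codes found in both datasets.

    Codes that have cases but no matching (non-zero) population count
    are returned separately so they can be reviewed.
    """
    rates = {}
    unmatched = []
    for code, cases in case_counts.items():
        population = population_counts.get(code)
        if not population:
            unmatched.append(code)
            continue
        rates[code] = cases * 100_000 / population
    return rates, unmatched


# Hypothetical placeholder codes and invented counts.
cases = {"B10": 12, "B20": 5, "X99": 2}
workers = {"B10": 60_000, "B20": 50_000}

rates, unmatched = rates_per_100k(cases, workers)
print(rates)      # rate per 100,000 workers for each matched code
print(unmatched)  # codes with no denominator in the comparison data
```

In this sketch, the code `X99` has no counterpart in the comparison data, so no rate can be calculated for it. Coding both datasets with the same system avoids this kind of mismatch.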

    Not Comparing

    If you are not comparing your data to another dataset, we recommend you use the Census Industry and Occupation Classification system for a few reasons:

    • Most public health data are in this format.
    • Generally, Census industry and occupation codes are best for analyzing health or population data.
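
The decision rule in this section can be summarized in a few lines of Python. This is only a sketch of the guidance above; the function name and the string labels are assumptions made for illustration.

```python
def choose_coding_system(comparison_system=None):
    """Pick a classification system following the guidance above.

    comparison_system: "Census" or "NAICS/SOC" if you are comparing your
    data to a dataset coded that way, or None if there is no comparison
    dataset.
    """
    if comparison_system is None:
        # Not comparing: Census is the recommended default.
        return "Census"
    if comparison_system not in ("Census", "NAICS/SOC"):
        raise ValueError(f"Unknown classification system: {comparison_system!r}")
    # Comparing: match the system used by the comparison dataset.
    return comparison_system


print(choose_coding_system())             # no comparison dataset
print(choose_coding_system("NAICS/SOC"))  # comparison data coded with NAICS/SOC
```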

How Do I Code My Data?

Coding industry and occupation data using the classification systems (NAICS, SOC, Census) requires training and experience. Luckily, autocoders are available to help with this process. Autocoders are software applications that assign industry and occupation codes to your descriptive data, using the classification system you choose.

Autocoding has several advantages:

  1. There is consistency in codes from one record to another, which means less random error.
  2. Autocoding is faster than manual coding.
  3. Autocoding is less costly.
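
To illustrate the consistency point, here is a toy autocoder in Python. It is not how NIOCCS or any real autocoder works; it is a simple lookup on normalized text, and the lookup table and codes (`OCC-001`, `OCC-002`) are hypothetical placeholders.

```python
import re

# Hypothetical lookup table: normalized occupation text -> placeholder code.
OCCUPATION_LOOKUP = {
    "registered nurse": "OCC-001",
    "truck driver": "OCC-002",
}


def normalize(text):
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def autocode(text):
    """Return the placeholder code for a description, or None if it needs manual review."""
    return OCCUPATION_LOOKUP.get(normalize(text))


responses = ["Registered Nurse", "  registered  nurse. ", "TRUCK-DRIVER", "astronaut"]
codes = [autocode(response) for response in responses]
print(codes)
```

Because the same rule is applied to every record, the two differently typed "registered nurse" responses receive the same code, and the description the table does not recognize is returned as `None` for manual review.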

NIOCCS, the NIOSH Industry and Occupation Computerized Coding System, is a free, web-based autocoder that anyone can use to code data. You can use NIOCCS to look up the industry and occupation codes for a single record, or you can create an account (or log in to an existing one) to code a large file of survey responses.


NIOCCS was developed to code industry and occupation free-text descriptions to Census codes. NIOCCS can also code text descriptions to NAICS and SOC codes by first coding to Census and then crosswalking the data to NAICS and SOC codes.

Crosswalking is the mapping of a code from one industry and occupation classification system to another, or to the corresponding code in a different year's version of the same classification system. Classification systems are updated regularly, so crosswalk files help link data across classification systems and versions to enable analysis. Use NIOCCS to crosswalk a single record, or log in to crosswalk an entire file.
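
Conceptually, a crosswalk is a lookup table from source codes to target codes. The Python sketch below shows the idea. It is not the NIOCCS implementation, and the table, codes, and status labels are hypothetical. It assumes that a broader code may correspond to several detailed codes, in which case the record is flagged rather than guessed.

```python
# Hypothetical crosswalk from broader placeholder codes to detailed
# placeholder codes. These are NOT real Census, NAICS, or SOC codes.
CROSSWALK = {
    "B10": ["D101", "D102", "D103"],
    "B20": ["D201"],
}


def crosswalk(code, table=CROSSWALK):
    """Return the target codes for a source code, or an empty list if unmapped."""
    return list(table.get(code, []))


def crosswalk_records(records, table=CROSSWALK):
    """Attach target codes to (record_id, code) pairs and label each result."""
    results = []
    for record_id, code in records:
        targets = crosswalk(code, table)
        if len(targets) == 1:
            status = "mapped"
        elif targets:
            status = "ambiguous"  # several candidate codes; needs review
        else:
            status = "unmapped"   # code not found in the crosswalk
        results.append((record_id, code, targets, status))
    return results


records = [(1, "B10"), (2, "B20"), (3, "Z99")]
results = crosswalk_records(records)
for row in results:
    print(row)
```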

If you want to code to NAICS/SOC using NIOCCS, the process is seamless: when you enter the text and choose to code to NAICS/SOC, NIOCCS simply outputs NAICS and SOC codes.

To learn more, please visit the NIOCCS page.

Page last reviewed: February 9, 2021