
INDUSTRY AND OCCUPATION CODING

SYSTEM OVERVIEW



Introduction

The NIOSH Industry and Occupation Computerized Coding System (NIOCCS) is a web-based software tool designed to translate industry and occupation (I&O) text to standardized I&O codes. It is used by occupational researchers, federal government agencies, state health departments and other organizations that collect and/or evaluate information using I&O. Its purpose is to provide a tool that reduces the high cost of manually coding I&O information while simultaneously improving uniformity of the codes.

NIOCCS is available free of charge and requires only internet access and a web browser for use. Users are required to register for a NIOCCS account if they wish to upload files of records for coding.

NIOCCS software can be accessed using the following URL:
http://wwwn.cdc.gov/niosh-nioccs/

NIOCCS Primary System Features:
 

- Industry and Occupation Coding

  • Single Record or Batch File Processing
  • Automatic and Computer-Assisted Coding
  • Selection of I&O Classification Scheme (Census 2000, 2002, 2010)
    • NIOCCS codes text to the Census Industry and Occupation classification schemes with option to include associated NAICS and SOC code in output file
  • Crosswalk Coding (Census 1990, 2000, 2002, 2010)
    • Associated NAICS and/or SOC codes can be included in the crosswalked output file
    • Ability to crosswalk forward or backward

- File History Reporting

- User Support

  • NIOCCS User Manual and supporting documentation
  • Frequently Asked Questions
  • Industry and Occupation Support website
  • NIOCCS Email Contact for Questions and to Provide Feedback

NIOSH Training Recommendations

NIOSH strongly recommends that users be trained in I&O coding prior to using the NIOCCS system. NIOCCS is not intended to take the place of trained I&O coders. Using the computer-assisted features of this system will still require trained I&O coders with the knowledge needed to use the system for selecting the appropriate I&O codes.

NIOSH provides I&O coding training classes several times a year. Requests for training can be made on the NIOSH I&O Coding website at:

http://www.cdc.gov/niosh/topics/coding/training.html

If attending a training class is not possible, NIOSH recommends reviewing the instruction manuals for the 2000 or 2002 Census coding schemes (also available on the above website). The instruction manuals were developed for the I&O training class and can be used as a guide for determining industry and occupation codes when using the NIOCCS computer-assisted feature.

Contact Information

For more information about NIOCCS, users can contact the NIOCCS Support Team in one of three ways:

  1. Submit a question or suggestion using the following NIOCCS form: http://wwwn.cdc.gov/niosh-nioccs/ContactNotLoggedIn.aspx
  2. Send an email to NIOCCS@CDC.gov
  3. Contact one of the following NIOSH staff:
    1. Pam Schumacher, pschumacher@cdc.gov 513-458-7133
    2. John Lu, jlu@cdc.gov 513-841-4565
    3. Susan Nowlin, snowlin@cdc.gov 513-841-4467

NIOCCS Coding Engine

NIOCCS codes industry and occupation text based on the Census Industry and Occupation Classification system, supplemented with special codes developed by CDC/NIOSH for non-paid workers, non-workers, and the military. See the NIOSH I&O coding documentation for more information:

http://www.cdc.gov/niosh/topics/coding/NIOCCSUserDocumentation.html

The NIOCCS Coding Engine combines several matching processes: phrase-based and word-based, exact match and proximity match, and weighted and unweighted. Each process performs best on a particular kind of input, so combining them improves overall coding ability.

A high-level view of the NIOCCS coding engine is illustrated in the diagram below.

[Figure: NIOCCS coding engine diagram]

The NIOCCS Knowledgebase (KB) is designed to handle common industry and occupation combinations and common misspellings. It is the first process in the coding engine. Input records that have an exact match in the KB are automatically coded and do not need to be processed through further coding algorithms. The NIOCCS KB was developed using one million records coded by the Bureau of the Census on Census surveys and 260,000 death certificate records coded by NIOSH. These records were reviewed by expert NIOSH I&O coders for inclusion in the KB. The initial NIOCCS KB has approximately 40,000 records.
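The exact-match step described above can be pictured as a dictionary lookup on normalized text. The following is a minimal sketch only: the `KB` entries, the placeholder codes, and the `normalize` rules are all assumptions made for illustration and do not reflect the actual NIOCCS knowledgebase or its matching rules.

```python
import re

# Hypothetical miniature knowledgebase: normalized (industry, occupation)
# text pairs mapped to (industry code, occupation code). The entries and
# codes are placeholders, not real Census codes.
KB = {
    ("hospital", "registered nurse"): ("IND-A", "OCC-A"),
    ("hospital", "registerd nurse"): ("IND-A", "OCC-A"),  # common misspelling
    ("elementary school", "teacher"): ("IND-B", "OCC-B"),
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def kb_lookup(industry_text, occupation_text):
    """Return the stored code pair on an exact match, otherwise None."""
    key = (normalize(industry_text), normalize(occupation_text))
    return KB.get(key)
```

A record for which `kb_lookup` returns `None` would, in this sketch, be passed on to the later coding processes.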

NIOCCS makes use of Confidence Levels (CL) to decide the coding path, i.e., autocoding or computer-assisted coding. Records that meet the user-specified autocode confidence level setting are automatically coded. Records that fall below the confidence level setting are made available in the computer-assisted coding module.

Confidence Level (CL) Setting options

High

If records are processed using the HIGH confidence level setting, then only matched candidates where NIOCCS has 90% or greater confidence of accuracy will be automatically coded.

Medium

If records are processed using the MEDIUM confidence level setting, then only matched candidates where NIOCCS has 70% or greater confidence of accuracy will be automatically coded.

NOTE: The higher confidence level (CL) setting will normally result in higher accuracy of the coded results; however, it may reduce the number of records automatically coded. See Chapter 5 in the NIOCCS User Manual for more information about the NIOCCS Autocoding Confidence Levels.
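The routing rule for the two settings can be sketched as a simple threshold comparison. The 90% and 70% thresholds come from the text above. The function name, the representation of confidence as a fraction between 0 and 1, and the return labels are assumptions for illustration; how NIOCCS computes a confidence value is not described here.

```python
# Thresholds taken from the HIGH and MEDIUM setting descriptions.
AUTOCODE_THRESHOLDS = {"HIGH": 0.90, "MEDIUM": 0.70}


def route_record(confidence, setting):
    """Decide the coding path for one matched candidate.

    confidence: assumed to be a fraction between 0 and 1.
    setting: "HIGH" or "MEDIUM" (case-insensitive).
    """
    threshold = AUTOCODE_THRESHOLDS[setting.upper()]
    if confidence >= threshold:
        return "autocode"
    return "computer-assisted"
```

For example, a candidate with 85% confidence would be autocoded under MEDIUM but sent to computer-assisted coding under HIGH.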

The I&O Restriction Filter is an inter-dependency arbitrator. Industry and occupation codes are sometimes inter-dependent: one industry title may map to more than one industry code, and the most accurate one can be decided only by considering the occupation information. Likewise, one occupation title may map to more than one occupation code, and only the industry code can narrow them down to the most appropriate one. NIOCCS therefore assigns the industry code first and then the occupation code, because in most cases the occupation codes are restricted by industry codes. If more than one set of industry and occupation codes remains and cannot be screened further, all possible candidates are output together with their confidence levels. See Chapter 6.5.2.4 in the NIOCCS User Manual for more information on industry restriction rules.
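One way to picture this kind of restriction filtering is shown below. This is a sketch under stated assumptions, not the NIOCCS implementation: the `ALLOWED_INDUSTRIES` table, the placeholder codes, and the choice to score a pair by the lower of its two confidences are all hypothetical.

```python
# Hypothetical restriction table: occupation code -> industry codes it may
# be paired with. Occupation codes absent from the table are treated as
# unrestricted in this sketch.
ALLOWED_INDUSTRIES = {
    "OCC-NURSE": {"IND-HOSPITAL", "IND-CLINIC"},
    "OCC-TEACHER": {"IND-SCHOOL"},
}


def restrict(industry_candidates, occupation_candidates):
    """Pair candidates, keeping only pairs the restriction table permits.

    Each candidate is a (code, confidence) tuple. Industry candidates are
    considered first, and occupation candidates are then filtered against
    each one. The pair confidence is the lower of the two values, which is
    an illustrative choice only.
    """
    pairs = []
    for ind_code, ind_conf in industry_candidates:
        for occ_code, occ_conf in occupation_candidates:
            allowed = ALLOWED_INDUSTRIES.get(occ_code)
            if allowed is None or ind_code in allowed:
                pairs.append((ind_code, occ_code, min(ind_conf, occ_conf)))
    # All surviving candidates are returned, highest confidence first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

When more than one pair survives, the function returns every remaining candidate with its confidence, mirroring the behavior described above for sets that cannot be screened further.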

Crosswalk Coding Engine

Crosswalk coding is the mapping of a code from one I&O classification coding scheme to another I&O classification coding scheme or to a different code within the same I&O coding scheme for a different year.

The crosswalk coding engine uses stored tables that have the code mappings for each year and each scheme. Only exact match processing is used.
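A stored-table crosswalk with exact-match processing can be sketched as a keyed lookup. The table layout, scheme labels, and codes below are hypothetical placeholders; the actual NIOCCS crosswalk tables are not described in this overview.

```python
# Hypothetical crosswalk table keyed by (source scheme, target scheme,
# source code). Codes are placeholders, not real Census codes.
CROSSWALK = {
    ("CENSUS2002", "CENSUS2010", "A100"): "B100",  # forward
    ("CENSUS2010", "CENSUS2002", "B100"): "A100",  # backward
}


def crosswalk(code, source, target):
    """Return the mapped code on an exact match, otherwise None."""
    return CROSSWALK.get((source, target, code.strip()))
```

Because only exact matches are used, a code that is not in the table for the requested direction yields no result rather than a nearest guess.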

Autocoding Results

Benchmarks for NIOCCS autocoding are based on accuracy rates of the data autocoded by the system. Accuracy is tested using large sets of records that have been coded and verified by NIOSH-trained I&O coders. The benchmark goals for NIOCCS are:

High Confidence Level: 10% or less error rate found in autocoded data

Medium Confidence Level: 25% or less error rate found in autocoded data

Production rates are determined by calculating the percent of records coded automatically by NIOCCS. NOTE: The quality of data input for coding can result in very different autocoding production rates. With the accuracy benchmarks above met, the average NIOCCS autocoding production rates have been as follows:

HIGH Confidence Level, Both Industry and Occupation Autocoded

Data Type                    Year 2013    Year 2014
Death Certificates           64%          64%
Surveys                      49%          50%
Other                        52%          57%
Average of All Data Types    51%          56%

 

Medium Confidence Level settings will typically result in a 10-15% increase in the number of records autocoded depending on the quality of the data.
 

NOTE: The higher confidence level (CL) setting will normally result in higher accuracy of the coded results; however, it may reduce the number of records automatically coded.
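The two measures used in this section, the error rate among autocoded records and the production rate, can be computed as in the sketch below. The benchmark limits (10% and 25%) come from the text above. The record layout, a list of (autocoded code or None, verified code) pairs, and the function name are assumptions for illustration.

```python
# Maximum acceptable error rate among autocoded records, per setting.
BENCHMARK_MAX_ERROR = {"HIGH": 0.10, "MEDIUM": 0.25}


def evaluate(records, setting):
    """Return (production rate, error rate, meets benchmark).

    records: list of (autocoded_code_or_None, verified_code) pairs, where
    None means the record was not autocoded.
    """
    autocoded = [(auto, verified) for auto, verified in records if auto is not None]
    # Production rate: share of all records that were coded automatically.
    production = len(autocoded) / len(records) if records else 0.0
    # Error rate: share of autocoded records that differ from the verified code.
    errors = sum(1 for auto, verified in autocoded if auto != verified)
    error = errors / len(autocoded) if autocoded else 0.0
    return production, error, error <= BENCHMARK_MAX_ERROR[setting.upper()]
```

Note that the error rate is measured only over the records that were autocoded, while the production rate is measured over all input records.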

Continual Improvement

The NIOCCS project team continually works to identify adjustments that can be made to the system to improve autocoding and accuracy rates. User feedback is welcome and is used to identify and prioritize improvements to be made to the system. NIOCCS system architecture was developed to enable the following types of ongoing system improvements:

Knowledgebase (KB)
The NIOCCS KB will be continually evaluated as NIOSH coding and IT staff analyze more coded data to identify the refinements that could be made to the knowledgebase to improve accuracy and efficiency.

Coding Engine
As more data are processed and studied, the internal parameters (such as the weight of each process and the weight of keywords) will be adjusted toward their optimal values to increase accuracy and production rates.

Special Coding Rules
Specific rules for unique industry or occupation titles will be added or modified as needed to improve coding accuracy. Each rule will be tested and approved by expert coders before being added to the system, and will be periodically validated so that invalid or obsolete rules are removed.

Data Quality

Coding results will vary depending on the overall quality of the source data. Different data sources may yield significantly different accuracy and production rates. Structured and detailed data sources will have higher accuracy and production rates than data sources with free-form text, insufficient information, or numbers or symbols included in the text.

NIOCCS uses only the industry and occupation text to assign codes. Records that contain employer name and/or job duties will not code at the same rate of accuracy as records containing only industry and occupation. This is because the additional pieces of information (employer and job duties) can conflict and/or provide more detailed information that could alter the I&O codes assigned. Including this information can be helpful, however, when using the computer-assisted coding module to ensure that appropriate codes are assigned manually.

Limitations

Performance

Internet bandwidth will significantly affect the interactivity of the computer-assisted coding.

The autocoding process may take a significant amount of time when the volume of data is large. The turnaround time for autocoding may also depend on the number of coding jobs waiting in the queue.

File Size Limitations

Upload file size is currently (September 2014) limited to 2.5 MB. The number of records this equates to will vary depending on how many of the optional fields in the input file format are used. For files that use the expanded file format, the limit equates to approximately 10,000 – 20,000 records. For files that use the slim file format, it equates to approximately 20,000 – 25,000 records.
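For data sets larger than the upload limit, one option is to split the records into several files before uploading. The sketch below groups text lines into batches that stay within a byte limit. It assumes the limit is 2,500,000 bytes (the page does not say how the limit is counted), one record per line, and UTF-8 encoding. It does not handle header rows or any NIOCCS-specific file layout.

```python
MAX_UPLOAD_BYTES = 2_500_000  # assumption: the limit is 2,500,000 bytes


def split_into_batches(lines, max_bytes=MAX_UPLOAD_BYTES):
    """Group record lines into batches whose encoded size fits max_bytes."""
    batches, current, size = [], [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        # Start a new batch when adding this line would exceed the limit.
        if current and size + line_bytes > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        batches.append(current)
    return batches
```

Each returned batch could then be written to its own file and uploaded separately.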

Coding directly to NAICS and SOC

NIOCCS coding is based on the Bureau of the Census I&O Classification schemes. NAICS and SOC codes can be obtained through NIOCCS; however, they are limited to the detail provided in the Census Alphabetic Indexes. Users cannot code directly to NAICS and SOC codes.

 
  • Page last reviewed: September 17, 2013
  • Page last updated: September 30, 2013