Task 1: Key concepts about the limitations of CMS data: Medicaid

It is helpful to understand the contents of the MAX files. Refer to Course 1, Module 4, Task 1 for more information on the types of services that appear in the different claims files. A useful reference is “Documentation and Analytic Guidelines for NCHS surveys linked to Medicaid Analytic eXtract (MAX) files”, which is listed in the Resources section below. You are also encouraged to visit the ResDAC website for more information on Medicaid data. See the Resources section for the link to the website.


Information

Much of the information for this module is based on an NCHS technical paper, Documentation and Analytic Guidelines for NCHS surveys linked to Medicaid Analytic eXtract (MAX) files. The link is provided in the Resources section.


Key concepts about the limitations of Medicaid data

The advantages of Medicaid data are that they are population-based, not subject to recall bias, and can be linked to NCHS population health surveys to expand the analytic potential of both data sources. However, because Medicaid data were collected for the purpose of making healthcare payments, and not for research, there are limitations to the data that you should consider when constructing an analytic data file and conducting analyses.

How to identify multiple records on the Person Summary (PS) file

Many survey participants will be linked to multiple MAX file PS records. Most often, this is because a respondent is linked to several years of MAX data. However, less frequently, a survey respondent may be linked to multiple PS records within the same year. This occurs in a small percentage of records. For example, for 1999-2004 NHANES, 9.5% of observations that linked to MAX files had linkages to more than one PS record in one or more years. There are multiple explanations for this situation.

You may find more information on data anomalies for each state in the Medicaid Statistical Information Systems Anomalies/Issues Report, produced by CMS.  See the Resources section for the link to this report.

Multiple PS records per year for a respondent were generally due to MAX file records coming from multiple states. Among observations with multiple records in 1999-2004 NHANES years, 81.7% came from different states.

Another source of multiple PS records within the same year could be from false matches due to misreporting of personally identifiable information or issues with linkage methodology. The validity of multiple records in the same year can be difficult to ascertain. While some records show eligibility in different states in non-overlapping months, others show eligibility in different states in the same months of the same year.

When a respondent has multiple PS records within a year, the records may contain overlapping months of Medicaid enrollment, which can complicate analyses. To work with multiple PS records within a year, you can use the Medicaid monthly enrollment variables (MAX_ELG_CD_MO_1 - MAX_ELG_CD_MO_12) in each record. By determining whether a person was enrolled in each month on any of the records within a year, you can obtain the total number of months of enrollment across records.
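The month-by-month approach described above can be sketched in Python as follows. This is a minimal illustration, not an NCHS-provided routine. It assumes each PS record is a dictionary keyed by variable name, and it assumes that the values "00", "99", blank, and missing mean "not enrolled or unknown" for a month; confirm the actual codes in the MAX data dictionary. The helper make_record and the eligibility code "11" are made up for the example.

```python
MONTHS = range(1, 13)

# Assumption: these values mean "not enrolled" or "unknown" in a month.
# Confirm the actual codes in the MAX data dictionary before use.
NOT_ENROLLED = frozenset({"00", "99", "", None})


def enrolled_months(record, not_enrolled=NOT_ENROLLED):
    """Return the set of months (1-12) in which one PS record shows enrollment."""
    return {m for m in MONTHS
            if record.get(f"MAX_ELG_CD_MO_{m}") not in not_enrolled}


def months_enrolled(records, not_enrolled=NOT_ENROLLED):
    """Months with enrollment on at least one of a person's PS records."""
    months = set()
    for record in records:
        months |= enrolled_months(record, not_enrolled)
    return months


def overlapping_months(records, not_enrolled=NOT_ENROLLED):
    """Sorted months in which more than one PS record shows enrollment."""
    counts = {m: 0 for m in MONTHS}
    for record in records:
        for m in enrolled_months(record, not_enrolled):
            counts[m] += 1
    return [m for m in MONTHS if counts[m] > 1]


def make_record(months, code="11"):
    """Build a toy PS record enrolled (with a made-up code) in the given months."""
    return {f"MAX_ELG_CD_MO_{m}": (code if m in months else "00") for m in MONTHS}


rec_a = make_record(range(1, 7))    # enrolled January-June
rec_b = make_record(range(5, 10))   # enrolled May-September, e.g. in another state

total_months = len(months_enrolled([rec_a, rec_b]))
overlap = overlapping_months([rec_a, rec_b])
print(total_months, overlap)  # 9 [5, 6]
```

Taking the union of months avoids double counting: the two toy records show 6 and 5 enrolled months, but only 9 distinct months across both, with May and June overlapping.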

To help identify enrollees with multiple PS records within the same year, NCHS has provided a set of variables that act as flags on the data files to identify these observations. These variables can help identify enrollees with multiple PS records in a year, whether they occurred in the same or different state, and help determine the number of enrollees with multiple PS records per year.

The variable FLG_YEAR_MULT_RECS identifies enrollees with multiple records in the same year:
0 = no multiple records
1 = multiple records in the same state
2 = multiple records in different states
3 = multiple records in the same and different states

You may choose to exclude these records, depending on the research question being explored.  
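As a rough sketch of how the flag might be used, the Python below tallies person-year rows by FLG_YEAR_MULT_RECS and keeps only rows with no multiple records. The row layout and the person_id and year field names are hypothetical; only FLG_YEAR_MULT_RECS and its four values come from the description above.

```python
from collections import Counter

FLAG_LABELS = {
    0: "no multiple records",
    1: "multiple records in the same state",
    2: "multiple records in different states",
    3: "multiple records in the same and different states",
}


def tally_flags(rows):
    """Count person-year rows by the meaning of FLG_YEAR_MULT_RECS."""
    return Counter(
        FLAG_LABELS.get(int(row["FLG_YEAR_MULT_RECS"]), "unknown") for row in rows
    )


def keep_single_record_years(rows):
    """Keep only person-years with no multiple PS records (flag value 0)."""
    return [row for row in rows if int(row["FLG_YEAR_MULT_RECS"]) == 0]


# Toy data: person_id and year are hypothetical field names.
rows = [
    {"person_id": "A", "year": 2001, "FLG_YEAR_MULT_RECS": 0},
    {"person_id": "B", "year": 2001, "FLG_YEAR_MULT_RECS": 2},
    {"person_id": "B", "year": 2002, "FLG_YEAR_MULT_RECS": 0},
    {"person_id": "C", "year": 2003, "FLG_YEAR_MULT_RECS": 1},
]

tally = tally_flags(rows)
kept = keep_single_record_years(rows)
print(tally["no multiple records"], len(kept))  # 2 2
```

Tallying before excluding shows how many observations a given exclusion rule would remove, which may inform whether exclusion suits the research question.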

Medicaid services and payments not included on the MAX files

There are some limitations to the information contained in the MAX files. Because these files contain only Medicaid-paid services, they do not capture service use or expenditures during periods of non-enrollment, services paid by other payers, or services provided at no charge.
The following Medicaid services and payments are not included on the MAX files:

How to identify Dual Eligibles

Service information may be missing or incomplete in MAX files for certain groups of enrollees. This is particularly important for individuals enrolled in both Medicaid and Medicare, often referred to as dual eligibles. Because Medicare is the first payer for services used by dual enrollees that are covered by both Medicare and Medicaid, MAX captures such service use only if additional Medicaid payments are made on behalf of the enrollee for Medicare cost sharing or for shared services. Medicare premiums paid by Medicaid on behalf of duals are not included in MAX.
Dual eligibles can be identified in several ways in the MAX files. However, the suggested method is to use the annual dual eligibility indicator (EL_MDCR_DUAL_ANN) provided on the PS file.


We recommend that you use values 50-59 to identify dual eligibles as the Medicare enrollment database is the preferred indicator of dual enrollment.
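The recommended range check can be sketched in Python as below. This is an illustration only: it assumes the value of EL_MDCR_DUAL_ANN may arrive either as a number or as a numeric string (for example "52"), and it treats missing or non-numeric values as not dual eligible, which is a choice you may want to revisit.

```python
def is_dual_eligible(el_mdcr_dual_ann):
    """Return True when EL_MDCR_DUAL_ANN falls in the recommended range 50-59."""
    try:
        code = int(el_mdcr_dual_ann)
    except (TypeError, ValueError):
        # Assumption: missing or non-numeric values are treated as not dual.
        return False
    return 50 <= code <= 59


examples = ["52", 59, "08", 0, None, ""]
flags = [is_dual_eligible(value) for value in examples]
print(flags)  # [True, True, False, False, False, False]
```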

How to identify Children’s Health Insurance Program (CHIP) enrollees

Each state has the option of expanding Medicaid eligibility to children who previously had been ineligible due to their income (M-CHIP) and/or creating a program distinct from its existing Medicaid program (S-CHIP). All persons with Medicaid or M-CHIP are included in the MAX files. However, states have the option of not reporting information on S-CHIP enrollees to the Medicaid Statistical Information System (MSIS). Therefore, only some S-CHIP enrollees are included in the MAX files. Further, S-CHIP enrollees who are included in the MAX files may have incomplete data. There are variables (EL_CHIP_FLAG_1 - EL_CHIP_FLAG_12) on the MAX files that provide monthly information on CHIP eligibility, as well as whether an enrollee was enrolled in M-CHIP or S-CHIP.

EL_CHIP_FLAG_1 - EL_CHIP_FLAG_12 (CHIP Monthly Eligibility):
0 = Not eligible for Medicaid or CHIP during this month
1 = Enrolled in Medicaid during this month
2 = M-CHIP during this month
3 = S-CHIP during this month

For S-CHIP enrollees in the files, some data elements contain no information. Therefore, variables that are counts of months (as noted above) are suspect for persons enrolled in S-CHIP for one or more months, since those months are not counted in the totals. Although S-CHIP enrollees may be a group of particular interest for some researchers, it should be noted that they account for a small percentage of respondents linked to the MAX files.
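One way to find records whose month counts may be affected is to tally the monthly CHIP flags directly. The Python sketch below assumes a PS record is a dictionary keyed by variable name and that flag values may be stored as numbers or numeric strings. Any value other than 0-3 is grouped as "other/unknown", which is an assumption made for the example.

```python
from collections import Counter

CHIP_LABELS = {0: "not eligible", 1: "Medicaid", 2: "M-CHIP", 3: "S-CHIP"}


def chip_month_counts(record):
    """Count the 12 months of a PS record by EL_CHIP_FLAG_1 - EL_CHIP_FLAG_12."""
    counts = Counter()
    for month in range(1, 13):
        value = record.get(f"EL_CHIP_FLAG_{month}")
        try:
            label = CHIP_LABELS.get(int(value), "other/unknown")
        except (TypeError, ValueError):
            # Assumption: missing or non-numeric values are grouped as unknown.
            label = "other/unknown"
        counts[label] += 1
    return counts


def has_schip_month(record):
    """True when the record shows S-CHIP in at least one month."""
    return chip_month_counts(record)["S-CHIP"] > 0


# Toy record: Medicaid January-March, S-CHIP April-September, then not eligible.
record = {
    f"EL_CHIP_FLAG_{m}": (1 if m <= 3 else 3 if m <= 9 else 0)
    for m in range(1, 13)
}

counts = chip_month_counts(record)
print(counts["Medicaid"], counts["S-CHIP"], counts["not eligible"])  # 3 6 3
print(has_schip_month(record))  # True
```

Records flagged by has_schip_month are the ones for which month-count variables should be treated with caution.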

How to identify Medicaid enrollees

Estimates of the number of Medicaid enrollees differ by data source. In general, population surveys such as NHANES tend to yield lower estimates than Medicaid enrollment data collected by the states. There are likely several reasons for these differences: some enrollees may respond incorrectly to population surveys, survey populations may differ from the population included in administrative records, and the reference periods for administrative records and population surveys may differ.

A multi-phase research project referred to as the Medicaid Undercount Project was undertaken to explain why discrepancies exist between survey estimates of enrollment in Medicaid and the number of enrollees reported in state and national administrative data. This project is also called the SNACC project. SNACC is an acronym for the agencies conducting the project: the University of Minnesota's State Health Access Data Assistance Center (SHADAC), the National Center for Health Statistics (NCHS), the Agency for Healthcare Research and Quality (AHRQ), the U.S. Department of Health and Human Services Assistant Secretary for Planning and Evaluation (ASPE), the Centers for Medicare and Medicaid Services (CMS), and the U.S. Census Bureau. More information on this project can be found on the U.S. Census Bureau website. See the Resources section for the link to this material.

The best method for identification of respondents who were Medicaid enrollees depends on the exact research question. In general, however, the variable MSNG_ELG_DATA on the PS file provides information on their enrollment status.

MSNG_ELG_DATA on the PS file indicates enrollment status:
.  = enrolled in Medicaid during the year
2 = enrolled in S-CHIP
1 = enrolled in neither Medicaid nor S-CHIP.  See section H below for further discussion of observations where MSNG_ELG_DATA is coded as “1”.

Of note, the variable MSNG_ELG_DATA exists on each of the MAX files (PS, RX, IP, LT, OT), and at times different values are assigned in the different files for the same person in the same year. However, the value assigned on the PS file is the most valid of these and should be used for all of the data for that person and year, regardless of the file from which the data originate.
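The rule of carrying the PS file value across the other files can be sketched in Python as follows. The person_id and year keys are hypothetical field names, None stands in for the "." value, and the sketch assumes one PS record per person and year; if a person has multiple PS records in a year, you would first need to decide which record to use.

```python
def apply_ps_status(ps_rows, claim_rows, id_key="person_id", year_key="year"):
    """Overwrite MSNG_ELG_DATA on claim rows with the PS file value.

    Rows are matched on (person, year). Claim rows with no matching PS row
    are returned unchanged. Input rows are not modified.
    """
    ps_status = {
        (row[id_key], row[year_key]): row["MSNG_ELG_DATA"] for row in ps_rows
    }
    harmonized = []
    for row in claim_rows:
        updated = dict(row)  # copy so the original claim row is left intact
        key = (row[id_key], row[year_key])
        if key in ps_status:
            updated["MSNG_ELG_DATA"] = ps_status[key]
        harmonized.append(updated)
    return harmonized


# Toy data: None represents the "." value on the PS file.
ps_rows = [
    {"person_id": "A", "year": 2002, "MSNG_ELG_DATA": None},
    {"person_id": "B", "year": 2002, "MSNG_ELG_DATA": 2},
]
rx_rows = [
    {"person_id": "A", "year": 2002, "MSNG_ELG_DATA": 1, "claim": "rx1"},
    {"person_id": "B", "year": 2002, "MSNG_ELG_DATA": 2, "claim": "rx2"},
    {"person_id": "C", "year": 2002, "MSNG_ELG_DATA": 1, "claim": "rx3"},
]

harmonized = apply_ps_status(ps_rows, rx_rows)
print([row["MSNG_ELG_DATA"] for row in harmonized])  # [None, 2, 1]
```

In the toy data, person A's RX row disagrees with the PS file and is replaced by the PS value, while person C has no PS row and is left as is.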

How to identify Managed Care vs. Fee-For-Service (FFS) enrollees

Many Medicaid and CHIP enrollees are enrolled in managed care plans, and enrollment in these programs has expanded over time. Managed care enrollment also varies markedly across states.  For enrollees in Medicaid managed care plans, information in MAX is restricted to premium payments and some service-specific utilization information. While records for services delivered (including diagnoses and procedures) are uniformly provided for recipients with fee-for-service coverage, encounter records for those with comprehensive managed care plans are not provided by all states. In some states, only a portion of managed care recipients have encounter data recorded. When included in the files, managed care encounter data list $0 as the amount paid for the services provided, even when the services are covered by the managed care plan.  

The Person Summary file contains a variable (EL_PHP_PLN_MO_CNT_CMCP) that can be used to identify beneficiaries enrolled in any type of managed care plan and the number of months of enrollment in the plan.

A set of additional monthly variables (EL_PHP_TYPE_1_1 - EL_PHP_TYPE_4_12) on the PS file identifies up to 4 different types of managed care plans that a beneficiary could be enrolled in during each month of the year.

The following types of managed care plans can be identified using EL_PHP_TYPE_1_1 through EL_PHP_TYPE_4_12:

How to identify enrollees who are eligible for waivers

Section 1115 of the Social Security Act provides the Secretary of Health and Human Services broad authority to authorize experimental, pilot, or demonstration projects likely to assist in promoting the objectives of the Medicaid statute. These projects are intended to demonstrate and evaluate a policy or approach that has not been widely used. Some states expand eligibility to individuals not otherwise eligible under the Medicaid program, provide services that are not typically covered, or use innovative service delivery systems. Examples include expanding care for children in foster care, providing specialty mental health care, and expanding Medicaid eligibility for family planning services to women of child-bearing age not otherwise eligible for Medicaid. Enrollees who are eligible for Medicaid as a result of one of these programs are referred to as eligible through a waiver or waiver program. General information about the Medicaid/CHIP waiver programs can be found on the CMS website. See the Resources section for the link to this material.

For data years before 2005, it is only possible to know the Maintenance Assistance Status (MAS) and the Basis of Eligibility (BOE) of each enrollee by month using MAX_ELG_CD_MO_1 - MAX_ELG_CD_MO_12. Waivers for specific groups make up one of the MAS categories, although the specific type of waiver cannot be identified.

Starting in 2005, MAX files include three elements for each month (MAX_WAIVER_TYPE_1_MO_1 - MAX_WAIVER_TYPE_3_MO_12) that give detailed information on the type of waivers under which enrollees are eligible for Medicaid.

Child survey participants

Survey participants under 18 years of age at the time of the survey are considered linkage-eligible (that is, they meet the criteria by which survey participants can potentially be linked to CMS data) if consent is provided by their parent or guardian. Survey data are linked to multiple years of CMS administrative data. Consequently, linkage-eligible child survey participants can be under 18 years of age for some years of linked administrative data and 18 years of age or older for later years. For example, a 15-year-old 2003-2004 NHANES participant can be linked to CMS data for 2006 and earlier years as a child, but would be an adult in 2007 (approximately) and later years.

In accordance with NCHS Ethics Review Board (ERB) guidelines, for survey participants younger than 18 years of age at the time of the survey, NCHS will only provide linked CMS data generated for program participation, claims, and other events that occurred prior to the participant’s 18th birthday. In the linkage of NHANES to the CMS Medicaid data, a potentially large number of child survey participants are linked to one or more years of Medicaid data collected after age 18. Analysts should take this into consideration when estimating their potential sample size for RDC proposals. Analysts requiring more information about potential sample sizes for RDC proposals or about this NCHS ERB guidance should contact the NCHS Data Linkage team (datalinkage@cdc.gov).
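For rough sample-size planning, the age arithmetic can be sketched in Python as below. This is an approximation, not the NCHS rule: it works in whole calendar years, ignores exact birth dates, and uses the end year of the survey cycle (2004 for 2003-2004 NHANES) as the survey year. The function names and the 1999-2009 range of data years in the example are hypothetical.

```python
def last_child_data_year(age_at_survey, survey_year):
    """Approximate last calendar year in which a child participant is under 18."""
    if age_at_survey >= 18:
        raise ValueError("participant was not a child at the time of the survey")
    return survey_year + (17 - age_at_survey)


def child_data_years(age_at_survey, survey_year, first_data_year, last_data_year):
    """Approximate list of linked data years that fall before age 18."""
    end = min(last_child_data_year(age_at_survey, survey_year), last_data_year)
    return list(range(first_data_year, end + 1))


# A 15-year-old 2003-2004 NHANES participant, using 2004 as the survey year.
print(last_child_data_year(15, 2004))          # 2006
print(child_data_years(15, 2004, 1999, 2009))  # [1999, ..., 2006]
```

Because actual availability depends on the participant's 18th birthday rather than the calendar year, treat the result as an upper-level planning estimate only.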



