Gaps in Survey Data on Cancer in American Indian and Alaska Native Populations: Examination of US Population Surveys, 1960–2010

Introduction: Population-based data are essential for quantifying the problems and measuring the progress made by comprehensive cancer control programs. However, cancer information specific to the American Indian/Alaska Native (AI/AN) population is not readily available. We identified major population-based surveys conducted in the United States that contain questions related to cancer, documented the AI/AN sample size in these surveys, and identified gaps in the types of cancer-related information these surveys collect.
Methods: We conducted an Internet query of US Department of Health and Human Services agency websites and a Medline search to identify population-based surveys conducted in the United States from 1960 through 2010 that contained information about cancer. We used a data extraction form to collect information about the purpose, sample size, data collection methods, and type of information covered in the surveys.
Results: Seventeen survey sources met the inclusion criteria. Information on access to and use of cancer treatment, follow-up care, and barriers to receiving timely and quality care was not consistently collected. Estimates specific to the AI/AN population were often lacking because of inadequate AI/AN sample size. For example, 9 national surveys reviewed reported an AI/AN sample size smaller than 500, and 10 had an AI/AN sample percentage less than 1.5%.
Conclusion: Continued efforts are needed to increase the overall number of AI/AN participants in these surveys, improve the quality of information on racial/ethnic background, and collect more information on treatment and survivorship.


Introduction
The interplay of complex individual and social factors has resulted in cancer-related disparities, including higher cancer incidence and deaths among American Indians and Alaska Natives (AI/ANs) (1). Substantial geographic differences in cancer deaths exist among AI/ANs (2). AIs in the Northern Plains region have a higher incidence of cancer than AI/ANs in other regions and the US population. Among the US AI/AN population, Northern Plains AIs have the highest cancer death rate and one of the highest cancer incidence rates (3).
Comprehensive cancer control (CCC) programs are pooling community resources to reduce the cancer burden through risk reduction, early detection, better treatment, and enhanced survivorship (4). Population-based data are essential for establishing the baselines and measuring the progress made by these programs. The Centers for Disease Control and Prevention (CDC) funds 50 states, 7 tribes or tribal organizations, and 7 US-affiliated Pacific Islands or territories to establish CCC coalitions; assess the burden of cancer; determine priorities; and develop, implement, and evaluate CCC plans (4). However, tribal and territorial CCC programs often experience challenges finding public health information that is specific to their population groups, relevant to their community priorities, or both.
To increase the understanding of cancer data gaps, we systematically reviewed survey data related to cancer and the AI/AN population. The 3 objectives of this study were to 1) identify major population-based surveys conducted in the United States that contain questions related to cancer, 2) document the sample size of the AI/AN population in these surveys, and 3) identify gaps in the types of cancer-related information collected by these surveys.

Data sources
We identified major survey data sources by conducting an Internet query of US Department of Health and Human Services agency websites. The websites included those of the Agency for Healthcare Research and Quality, CDC, the Centers for Medicare and Medicaid Services, the Indian Health Service, and the National Institutes of Health (specifically, the National Cancer Institute). We searched for information about data collection systems sponsored by these agencies and reviewed their reports to identify data sources related to cancer. We then conducted a Medline search to identify English-language articles using the terms "cancer" and "survey," which resulted in more than 10,000 records. We narrowed the search by specifying the author address to include "USA." On the basis of the authors' knowledge of population-based surveys, we expected most surveys with national or close-to-national coverage to have been conducted after 1960, and the search results confirmed this expectation. We reviewed the abstracts of approximately 5,000 articles to identify additional surveys that were not identified through the Internet search. Most articles reported results from a research study that used a survey as the method for data collection. Some articles reported results from national surveys that were already identified in the website query. The Medline search helped us identify 2 additional sources for potential inclusion: the Health and Retirement Study (HRS) (5) and the National Mortality Followback Survey (NMFS) (6). Although many data collection activities occur at the local or state level, this analysis focused on gaps in existing data with wide geographic coverage.

Selection of surveys
Surveys were subject to further analysis using 3 inclusion criteria. First, the survey data had to include information from at least 1 of the following categories: cancer risk or protective factors (eg, smoking, obesity, physical activity, vaccination for human papillomavirus), cancer incidence and death, or cancer screening and treatment use. Second, the survey had to be population-based. That is, the purpose of the data collection was to produce a representative sample of the defined target population. If, for example, data collection was conducted on the basis of a convenience sample of patients seen at 1 health clinic, the survey would not meet this inclusion criterion. Third, the survey was required to have national coverage or be based on a national initiative to collect multistate data.
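The 3 criteria can be read as a screening rule applied to each candidate survey. The sketch below is illustrative only: the `CandidateSurvey` record and its field names are hypothetical and are not taken from the study's extraction form, and the 2 example records are invented to show one survey that passes and one that fails.

```python
from dataclasses import dataclass

# Content categories named in the first inclusion criterion.
CANCER_CATEGORIES = {
    "risk_or_protective_factors",
    "incidence_and_death",
    "screening_and_treatment_use",
}


@dataclass
class CandidateSurvey:
    # Hypothetical record; field names are illustrative assumptions.
    name: str
    categories: set                # cancer-related content the survey covers
    population_based: bool         # designed to represent a defined target population
    national_or_multistate: bool   # national coverage or a national multistate initiative


def meets_inclusion_criteria(survey: CandidateSurvey) -> bool:
    """Return True only if all 3 inclusion criteria are satisfied."""
    has_cancer_content = bool(survey.categories & CANCER_CATEGORIES)
    return (
        has_cancer_content
        and survey.population_based
        and survey.national_or_multistate
    )


# Invented examples: a national population-based survey and a
# convenience sample of patients seen at 1 health clinic.
national = CandidateSurvey(
    "Example national survey", {"risk_or_protective_factors"}, True, True
)
clinic = CandidateSurvey(
    "Example single-clinic sample", {"screening_and_treatment_use"}, False, False
)

print(meets_inclusion_criteria(national))
print(meets_inclusion_criteria(clinic))
```

The single-clinic example fails on the second and third criteria even though it covers a qualifying content category, which mirrors the convenience-sample case described above.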

Data extraction
The unit of analysis was a survey. A data extraction form was created in Microsoft Excel (Microsoft Corporation, Redmond, Washington) to collect detailed information about each survey, including purpose and description, type of cancer information collected, race and tribal affiliation, sample size, data collection methods, and period of data collection. We sought sample size information for the most recently available data at the time of data extraction. To determine the type of information collected by the surveys, we used the continuum of cancer care model used by CCC programs because we wanted to examine the data sources from the perspective of cancer prevention and control to understand their usefulness and limitations. CCC programs include activities across the continuum of cancer care to encourage people to live a healthy lifestyle (prevention), promote cancer screening tests (early detection), increase access to good cancer care (treatment), and improve the quality of life for people who survive cancer (survivorship) (4). To extract information for these 4 phases or categories, we reviewed survey forms and manuals published on the websites of the organizations that administer the surveys. We reviewed all the available forms and manuals across multiple years of data collection.
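One way to picture the extraction form is as one record per survey, with the 4 continuum phases tracked explicitly so that uncovered phases can be listed as gaps. This is a minimal sketch under stated assumptions: the actual form was a Microsoft Excel spreadsheet, and the class name, field names, and example values here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# The 4 phases of the continuum of cancer care named in the text.
CONTINUUM_PHASES = ("prevention", "early detection", "treatment", "survivorship")


@dataclass
class ExtractionRecord:
    # Hypothetical layout; the study's spreadsheet columns are not published here.
    survey: str
    purpose: str = ""
    race_information_source: str = ""
    aian_sample_size: Optional[int] = None   # None when not available
    data_collection_method: str = ""
    data_collection_period: str = ""
    phases_covered: set = field(default_factory=set)


def uncovered_phases(record: ExtractionRecord) -> list:
    """List the continuum phases for which the survey collects no information."""
    return [p for p in CONTINUUM_PHASES if p not in record.phases_covered]


# Invented example: a survey that covers only prevention and early detection.
example = ExtractionRecord(
    survey="Hypothetical survey",
    phases_covered={"prevention", "early detection"},
)
print(uncovered_phases(example))  # ['treatment', 'survivorship']
```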

Sampling and data collection methods
Because many of the surveys reviewed target specific groups such as students and new mothers, it is important to be aware of differences in age distribution between the US AI/AN population and the US population. According to the 2010 US census, the median age of the US population was 35.8 years for both men and women, and the median age of the AI/AN population was 29.4 years for men and 31.0 years for women. In 2010, 24% of the US population and 30% of the AI/AN population were younger than 18 years (7). The American Indian Adult Tobacco Survey (AI-ATS) (8), the Alaska Native Adult Tobacco Survey (AN-ATS) (9), and the South Dakota Tribal Pregnancy Risk Assessment Monitoring System (SD-Tribal PRAMS) (10) are all cross-sectional surveys that sample exclusively from AI/AN communities (Table 1). These surveys provide tribe- and community-specific data. The AI-ATS and AN-ATS handbooks provide detailed information about different sampling methods so that individual tribes and communities can choose the method best suited to their needs.
Three surveys (Health Behavior in School-aged Children [HBSC], the National Youth Tobacco Survey [NYTS], and the Youth Risk Behavior Survey [YRBS]) are school-based, cross-sectional surveys that target children and adolescents. The HBSC was the only cross-national survey reviewed in this study. All 3 surveys collect data on smoking and other risk behaviors among children and youth. The Tobacco Use Supplement to the Current Population Survey (TUS-CPS) (11) is a cross-sectional telephone survey of smoking and other tobacco uses among people aged 15 years or older. The TUS-CPS produces state-specific data and a nationally representative sample by using a multistage stratified sample of households based on US census data. The National Immunization Survey (NIS) (12) also focuses on children. The NIS is a cross-sectional survey that collects information about vaccination history among infants and young children (aged 19-35 months) and teens (aged 13-17 years). Unlike data collected in the HBSC, NYTS, and YRBS, the NIS data are collected from parents and vaccination providers.
The Behavioral Risk Factor Surveillance System (BRFSS) (13), the National Health Interview Survey (NHIS) (14), and the National Health and Nutrition Examination Survey (NHANES) (15) are among the most comprehensive cross-sectional health surveys conducted in the United States. The BRFSS focuses exclusively on adults, and the NHIS and the NHANES include children. The NHIS and NHANES, which conducts laboratory tests, are 2 of the longest-running surveys sponsored by the National Center for Health Statistics. The Medical Expenditure Panel Survey (MEPS) (16) is an ongoing panel survey that collects information on health conditions, use of and satisfaction with medical services, and medical cost. The survey sample is from a nationally representative subsample of households that participated in the prior year's NHIS.
The Health Information National Trends Survey (HINTS) (17) is a cross-sectional survey that collects nationally representative data about the public's use of cancer-related information. The first HINTS survey was conducted in 2003, making it one of the newest population-based surveys on cancer.
The NMFS (6) is a cross-sectional survey that uses death certificate data and information provided by proxy to study the etiology of disease and demographic trends in mortality. The HRS (5) is a longitudinal survey that focuses on issues among adults aged 51 years or older.
Two cross-sectional surveys of ambulatory medical care, the National Ambulatory Medical Care Survey (NAMCS) (18) and the National Hospital Ambulatory Medical Care Survey (NHAMCS) (19), annually collect data on the provision and use of ambulatory medical care services. The NAMCS focuses on office-based physicians, and the NHAMCS focuses on general and short-stay hospitals.

Race, tribal affiliation, and AI/AN sample size
Most of the data sources reviewed collect general population data from people of different racial/ethnic groups. The exceptions are the AI-ATS, the AN-ATS, and the SD-Tribal PRAMS, which primarily focus on AI/ANs. Eleven surveys collect race/ethnicity information from the survey respondents, 3 (NAMCS, NHAMCS, NMFS) rely on race/ethnicity information found in health care facility administrative data, 2 (AI-ATS, AN-ATS) use tribal membership information provided by the participating tribes, and 1 (SD-Tribal PRAMS) uses race/ethnicity reported in birth certificate data provided by a parent. Tribal affiliation information is found in only 1 data source reviewed in this study: SD-Tribal PRAMS.
Of the 15 surveys for which sample size information was available (AI-ATS and AN-ATS are not included because the data are owned by participating tribes that require approval to access the information), only 6 had an AI/AN sample size larger than 500: BRFSS, NHIS, TUS-CPS, NYTS, SD-Tribal PRAMS, and HBSC (Table 2). Three surveys reported an AI/AN sample size smaller than 100: HINTS (n = 61), NHANES (n = 86), and NAMCS (n = 93). The percentage of AI/AN respondents in the total sample ranged from 0.4% for NAMCS to 8.5% for NYTS. In the US 2000 census, 1.5% of people self-identified as AI/AN; only 4 surveys (NYTS, HBSC, YRBS, BRFSS) had an AI/AN sample percentage higher than 1.5%. Currently, 4 surveys (HINTS, HRS, NHIS, and NHANES) oversample minorities, but none of them oversample the AI/AN population.
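The 2 benchmarks used in this comparison, an AI/AN sample size of 500 and the 1.5% census share, can be applied with simple arithmetic. The sketch below is an illustration, not the study's computation: the function name is hypothetical, and the NAMCS total of 23,250 is back-calculated from the reported n = 93 and the rounded 0.4% figure, so it is an assumption rather than a reported total.

```python
import math

CENSUS_2000_AIAN_PCT = 1.5   # percent self-identifying as AI/AN in the 2000 census
SAMPLE_SIZE_BENCHMARK = 500  # AI/AN sample size benchmark used in this review


def summarize_aian_sample(n_aian: int, n_total: int) -> dict:
    """Compare a survey's AI/AN sample with the 2 benchmarks."""
    pct = 100.0 * n_aian / n_total
    return {
        "aian_pct": round(pct, 1),
        "below_500": n_aian < SAMPLE_SIZE_BENCHMARK,
        "below_census_share": pct < CENSUS_2000_AIAN_PCT,
        # AI/AN respondents needed for the sample share to reach 1.5%
        "n_needed_for_census_share": math.ceil(CENSUS_2000_AIAN_PCT / 100.0 * n_total),
    }


# NAMCS: n = 93 is reported; the total of 23,250 is an assumed,
# back-calculated value consistent with the rounded 0.4% share.
namcs = summarize_aian_sample(n_aian=93, n_total=23_250)
print(namcs)
```

Under that assumed total, the sample would need roughly 349 AI/AN respondents just to match the census share, and it would still fall short of the 500 benchmark.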

Cancer information collected
A total of 15 surveys reviewed in this project contain information about cancer risk and protective factors (Table 3), primarily health behaviors or lifestyle factors. Use of commercial tobacco (ie, tobacco products available to the general public) is addressed by 14 surveys, but the AI-ATS is the only survey that collects data on use of sacred tobacco (ie, traditional tobacco used by American Indians for ceremonial purposes). Physical activity data are collected in 8 surveys, and nutrition information is obtained by 6 surveys. Other less commonly collected information includes tobacco policies (AI-ATS, AN-ATS, BRFSS, TUS-CPS), sexual behavior (NHANES, YRBS), occupational exposure (NHIS, NMFS), and household pesticide use (BRFSS, NHANES).
Nine surveys obtain information about cancer screening tests or diagnosis. Five surveys (BRFSS, HINTS, HRS, MEPS, NHIS) provide information about the receipt of cancer screening tests based on self-reported data. Two surveys (NAMCS, NHAMCS) collect information from health care providers to determine types of cervical cancer screening tests offered at the facility. These surveys also collect information on the Pap test guideline used at the facility. Seven surveys (BRFSS, HINTS, HRS, MEPS, NHANES, NHIS, NIS) collect self-reported cancer diagnosis information. NAMCS collects information about availability of cancer diagnostic tests at a given health care facility.
Treatment and survivorship information data are less frequently collected than data on prevention and early detection.

Discussion
To our knowledge, this is the first review to identify the strengths and limitations of existing population-based surveys as sources of cancer-related information for the AI/AN population. Many surveys date before the 1990s and provide data that can be used to look at trends in cancer risks, protective behaviors, incidence, and outcomes. Many of the surveys reviewed in this study provide data for the Healthy People program to establish national objectives and track progress in cancer and other health issues. However, estimates specific to the AI/AN population are often lacking because of inadequate AI/AN sample size.
Because approximately 1.5% of the US population identify as AI/AN, oversampling is required to produce a representative sample of AI/AN respondents. However, achieving an adequate AI/AN sample size is challenging, even with oversampling. For instance, some surveys, such as NHANES, have a limited total sample size and are administered in selected geographic areas each year. In addition, oversampling may not have a substantial effect on AI/AN sample sizes for facility-based surveys such as the NAMCS and the NHAMCS, which exclude federally employed physicians and federal hospitals. Another option, which may be used alone or in conjunction with oversampling, is to use poststratification weighting to ensure that AIs are adequately represented in survey samples.
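To make the weighting option concrete, the sketch below computes simple poststratification weights as the ratio of each group's population share to its sample share. It is a minimal illustration under stated assumptions: the sample counts are hypothetical, the 1.5% population share is the census figure cited above, and none of the surveys reviewed is claimed to use this exact scheme.

```python
def poststratification_weights(sample_counts: dict, population_shares: dict) -> dict:
    """Weight for each group = population share / sample share."""
    n_total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n_total)
        for group, count in sample_counts.items()
    }


# Hypothetical sample of 10,000 in which AI/AN respondents make up 0.4%.
sample_counts = {"AI/AN": 40, "other": 9_960}
population_shares = {"AI/AN": 0.015, "other": 0.985}

weights = poststratification_weights(sample_counts, population_shares)

# After weighting, the AI/AN share of the weighted sample matches the population share.
weighted_total = sum(weights[g] * sample_counts[g] for g in sample_counts)
weighted_aian_share = weights["AI/AN"] * sample_counts["AI/AN"] / weighted_total

print(round(weights["AI/AN"], 2))      # 3.75
print(round(weighted_aian_share, 3))   # 0.015
```

Each AI/AN respondent counts 3.75 times in this example, but the weighted estimate still rests on the same 40 respondents. Weighting corrects the group's share in overall estimates; it does not improve the precision of AI/AN-specific estimates, which is why it is best viewed as a complement to a larger AI/AN sample.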
Nevertheless, the findings that 9 national surveys reviewed in this study had an AI/AN sample size smaller than 500 and that 10 had an AI/AN sample percentage smaller than 1.5% were surprising. The current practice in designing and implementing survey data collection for race and ethnicity needs to be rethought. For instance, the sample size of surveys that are used to develop health indicators should be large enough to produce AI/AN-specific estimates. Some questions are age- and sex-specific (eg, colorectal and breast cancer screening use) and require an even larger sample of AI/AN respondents to produce reliable estimates after stratification. Conducting the surveys less frequently could offset the cost of oversampling.
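The effect of stratification on reliability can be illustrated with the usual margin of error for a proportion. The sketch below assumes simple random sampling and ignores design effects, which in complex surveys would typically widen the intervals further; the 20% eligibility fraction is a hypothetical value chosen only for illustration.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1.0 - p) / n)


n_aian = 500                 # AI/AN sample size benchmark discussed in the text
eligible_fraction = 0.20     # hypothetical share eligible for an age- and sex-specific question
n_eligible = int(n_aian * eligible_fraction)

# p = 0.5 gives the widest interval, so it is a conservative choice.
moe_all = margin_of_error(0.5, n_aian)
moe_eligible = margin_of_error(0.5, n_eligible)

print(round(moe_all, 3))       # 0.044
print(round(moe_eligible, 3))  # 0.098
```

In this example, a sample that supports estimates within about 4 percentage points overall supports only about 10 percentage points once the question applies to one-fifth of respondents.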
We identified surveys that collect a range of cancer-related data, from prevention to survivorship. In terms of prevention, smoking, physical activity, and nutrition are addressed by many surveys. NHANES is unique because self-reported measures of cancer risk can be validated by laboratory test data. HINTS, a newer survey, collects data on cancer information communication. These types of data can be useful for examining variations in information use and communication style across different population groups. In general, surveys reviewed in this study put much greater emphasis on risk factors and screening use than on treatment and survivorship. More people are living with cancer and surviving longer. Population-based data on access to and use of cancer treatment and follow-up care as well as barriers to receiving timely and quality care should be collected more consistently. For AI/ANs, who have a lower cancer survival rate than some other racial/ethnic groups (20), this information is especially important. Because of the complexity of cancer therapy, medical records and claims data may be a better source of information on treatment than surveys. Medical records and claims data also provide comorbidity and cost information. The work of the Office of the National Coordinator for Health Information Technology on the meaningful use of data (21) will increase the usability of clinical data and can supplement information collected from surveys.
This study has limitations. This review was designed to inform federal policy and practice for collection of chronic disease surveillance data. Therefore, we focused on major national health surveys. The decision to restrict our analysis to the examination of population health data meant that potentially relevant data from research studies may have been overlooked. Also, we did not examine other cancer data sources such as registries and administrative databases that supplement treatment information, which is often missing in survey data. Future research may examine issues unique to these data sources.
The Affordable Care Act, Section 4302, will address health inequalities by improving collection of data on race and ethnicity, sex, and primary language (22). The implementation of new data standards began in 2012, and the new survey questions will be standard on most surveys sponsored by the US Department of Health and Human Services (23). The revised survey question for race will continue to list AI/AN separately. The new survey questions may improve the quality of data on race and ethnicity, which could lead to more accurate estimates of all racial/ethnic categories. These improvements may have the added benefit of increased rates for the AI/AN population.

Tables
The date in parentheses is the data collection year of the sample size information presented in this table. We sought to obtain sample size information for the most recently available data for each survey at the time of data extraction.
Abbreviations: AI/AN, American Indian/Alaska Native; NA, not available.
The American Indian Adult Tobacco Survey (AI-ATS) and Alaska Native Adult Tobacco Survey (AN-ATS) are not included in this summary because the data are owned by participating tribes that require approval to access the information.