Methodology for Calculating County-Level Estimates and 95% Confidence Intervals for Immunization Indicators
The need for small area estimates of health indicators is well documented, and such estimates are important for efficient public health planning. For many indicators, directly measured data are not available below the state level, which makes effective planning difficult. In the absence of directly measured data, statistical modelling can provide useful information. Various approaches are available to derive county-level estimates from data collected at higher geographic levels. The Epidemiology and Surveillance Branch in the CDC’s Division of Population Health employs an estimation method that uses multilevel logistic regression and post-stratification. County-level prevalence estimates were derived using data from the 2019 Behavioral Risk Factor Surveillance System (BRFSS), the 2015–2019 American Community Survey (ACS), and 2019 Census population estimates, as described in this document. All immunization indicators are dichotomous and defined as follows:
- The proportion of the county population aged 18 years or older who have received an Influenza vaccination within the past 12 months.
- The proportion of the county population aged 18 to 64 years who are considered at increased risk and who have received a Pneumococcal vaccination.
- The proportion of the county population aged 65 years or older who have received a Pneumococcal vaccination.
- The proportion of the county population aged 18 years or older who have received a Tetanus shot within the past 10 years.
- The proportion of the county population aged 18 years or older who have received a Tetanus, Diphtheria, and Pertussis (Tdap) vaccine within the past 10 years.
Steps in Calculating Prevalence Estimates
County-level prevalence estimates were calculated using a multilevel regression and post-stratification method. This method was used in the 500 Cities Project (2016–2019), which expanded into the PLACES Project in 2020. The steps used to create the estimates and their 95% confidence intervals (CIs) are as follows:
- Model construction. A multilevel logistic regression model was constructed for each indicator. Each model used 2019 BRFSS unit-level data on age (13 categories), sex (2 categories), race/ethnicity (8 categories), and education level (4 categories); ACS data on the county-level percentage of adults below 150% of the poverty line for 2015 to 2019; and state- and county-level random effects.
- Population linkage. All the parameters from the model were linked with 2019 Census population estimates by county, age, sex, and race/ethnicity, and with the 2015–2019 county-level percentage of adults below 150% of the poverty line. The 2019 Census county-level population data by age, sex, and race/ethnicity were not stratified by the 4 education levels. Therefore, the distribution of education levels within each age, sex, and race/ethnicity category was imputed by bootstrapping (a resampling approach) using the ACS county-level percentage of adults in each of the 4 education categories.
- Estimates and 95% CIs calculation. The predicted probability of each indicator was calculated from the model parameters and the county population data using the multilevel logistic model formula. County-level prevalence estimates were obtained by multiplying the predicted probability by the county-level population in each of the 208 categories (13 age groups × 2 sexes × 8 race/ethnicity groups). These steps were repeated 1,000 times using Monte Carlo simulation, which generated 1,000 sets of estimates for each indicator. The final estimate of each indicator by county was reported as the mean of the 1,000 sets of estimates, and its 95% CI was given by the 2.5th and 97.5th percentiles of those estimates.
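The post-stratification and Monte Carlo steps above can be sketched roughly as follows. This is an illustrative sketch only, not the code used to produce the estimates. The function names, the cell populations, and the cell-level logits and standard errors are all hypothetical. Two simplifications are assumed: each draw perturbs the cell-level logit independently with a normal distribution (a stand-in for drawing the fitted model parameters), and prevalence is taken as the population-weighted sum of predicted probabilities divided by the total county population.

```python
import numpy as np

N_CELLS = 13 * 2 * 8  # age x sex x race/ethnicity strata = 208


def inv_logit(x):
    """Convert a value on the logit scale to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def poststratify(cell_prob, cell_pop):
    """Population-weighted average of cell-level predicted probabilities."""
    cell_prob = np.asarray(cell_prob, dtype=float)
    cell_pop = np.asarray(cell_pop, dtype=float)
    return float((cell_prob * cell_pop).sum() / cell_pop.sum())


def monte_carlo_estimate(logit_mean, logit_se, cell_pop, n_draws=1000, seed=0):
    """Return (mean, 2.5th percentile, 97.5th percentile) over simulated estimates.

    ASSUMPTION: independent normal draws per cell on the logit scale stand in
    for draws of the fitted multilevel model parameters.
    """
    rng = np.random.default_rng(seed)
    logit_mean = np.asarray(logit_mean, dtype=float)
    draws = rng.normal(loc=logit_mean, scale=logit_se,
                       size=(n_draws, logit_mean.size))
    estimates = np.array([poststratify(inv_logit(d), cell_pop) for d in draws])
    lower, upper = np.percentile(estimates, [2.5, 97.5])
    return float(estimates.mean()), float(lower), float(upper)


# Hypothetical inputs for one county (not real BRFSS, ACS, or Census values).
example_rng = np.random.default_rng(42)
cell_pop = example_rng.integers(50, 5000, size=N_CELLS)
logit_mean = example_rng.normal(-0.3, 0.5, size=N_CELLS)
logit_se = np.full(N_CELLS, 0.2)

estimate, ci_low, ci_high = monte_carlo_estimate(logit_mean, logit_se, cell_pop)
print(f"prevalence = {estimate:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The interval here is read directly from the percentiles of the simulated estimates, which matches the reporting rule described above. It does not reproduce the education-level imputation or the random effects structure of the actual model.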
Limitations of the Procedure
Several limitations of this procedure should be noted. First, the modeling process carries over to the estimates all the known biases of BRFSS data, such as recall bias and reporting bias. Second, the multilevel regression tends to pull the highest and lowest rates towards the global mean of the dataset. Third, because actual county-level population counts were unavailable, the population estimates provided by the Census were used instead, which may also affect the precision of the estimates. Fourth, the 95% CIs of the estimates were generated through Monte Carlo simulation and are narrower than those produced by hierarchical Bayesian estimation. Thus, caution should be taken when using CIs to compare estimates between two or more counties. Fifth, estimates were not provided for counties in New Jersey because 2019 BRFSS data for that state were not available. Finally, for the Tdap vaccination estimates, the 31% of respondents who answered “received tetanus shot but not sure what type” were treated as missing. Therefore, caution should be taken when using county-level Tdap estimates because they could be underestimated.
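The narrowing of extreme rates towards the global mean noted above can be illustrated with a deliberately simplified partial-pooling formula. This is not the multilevel logistic model itself: the function name, the `prior_strength` parameter, and the example rates are hypothetical, and the weighting rule is a textbook-style approximation chosen only to show the direction of the effect.

```python
def partial_pool(county_rate, n_respondents, global_rate, prior_strength=50.0):
    """Blend a county's raw rate with the global rate.

    ASSUMPTION: the weight n / (n + prior_strength) is an illustrative
    stand-in for the shrinkage a multilevel model applies; counties with
    fewer respondents are pulled harder towards the global rate.
    """
    weight = n_respondents / (n_respondents + prior_strength)
    return weight * county_rate + (1.0 - weight) * global_rate


# Hypothetical raw rates for a low and a high county, 20 respondents each.
global_rate = 0.40
low_county = partial_pool(0.10, 20, global_rate)
high_county = partial_pool(0.70, 20, global_rate)
print(f"low: 0.10 -> {low_county:.3f}, high: 0.70 -> {high_county:.3f}")
```

Both pooled values lie between the raw county rate and the global rate, so the spread between the highest and lowest counties shrinks. That is the effect the limitation describes.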
Multilevel Regression and Poststratification for Small-Area Estimation of Population Health Outcomes: A Case Study of Chronic Obstructive Pulmonary Disease Prevalence Using the Behavioral Risk Factor Surveillance System. American Journal of Epidemiology. 2014;179:1025–1033.
Validation of Multilevel Regression and Poststratification Methodology for Small Area Estimation of Health Indicators from the Behavioral Risk Factor Surveillance System. American Journal of Epidemiology. 2015;182:127–137.
Comparison of Methods for Estimating Prevalence of Chronic Diseases and Health Behaviors for Small Geographic Areas: Boston Validation Study, 2013. Preventing Chronic Disease. 2017;14:170281.
Using 3 Health Surveys to Compare Multilevel Models for Small Area Estimation for Chronic Diseases and Health Behaviors. Preventing Chronic Disease. 2018;15:180313.