Methodology for Calculating County-Level Estimates and 95% Confidence Intervals for Immunization Indicators
The need for small area estimates of health indicators is well documented, and such estimates are important for efficient public health planning. For many indicators, direct data are not available below the state level, which makes effective planning difficult. In the absence of directly measured data, statistical modelling can provide useful information. Various approaches are available for deriving county-level estimates from data collected at higher geographic levels. The Epidemiology and Surveillance Branch in the CDC’s Division of Population Health employs an estimation method that uses multilevel logistic regression and post-stratification. County-level prevalence estimates were derived using data from the 2021 Behavioral Risk Factor Surveillance System (BRFSS), the 2017–2021 American Community Survey (ACS), and 2021 Census population estimates, as described in this document. All immunization indicators are dichotomous and defined as follows:
- The proportion of the county population aged 18 years or older who have received an Influenza vaccination within the past 12 months.
- The proportion of the county population aged 18 to 64 years who are considered at increased risk and who have received a Pneumococcal vaccination.
- The proportion of the county population aged 65 years or older who have received a Pneumococcal vaccination.
- The proportion of the county population aged 18 years or older who have received an Influenza vaccination in non-medical settings.
Steps in Calculating Prevalence Estimates
County-level prevalence estimates were calculated using a multilevel regression and post-stratification method. This method was used in the 500 Cities Project (2016–2019), which expanded into the PLACES Project in 2020. The steps used to create the estimates and their 95% confidence intervals (CIs) are as follows:
- Model construction. A multilevel logistic regression model was constructed for each indicator. Each model used 2021 BRFSS unit-level data on age (13 categories), sex (2 categories), race/ethnicity (8 categories), and education level (4 categories); 2017–2021 ACS data on the county-level percentage of adults below 150% of the poverty line; and state- and county-level random effects.
- Population linkage. All the parameters from the model were linked with 2021 Census population estimates by county, age, sex, and race/ethnicity, and with the 2017–2021 county-level percentage of adults below 150% of the poverty line. The 2021 Census county-level population data by age, sex, and race/ethnicity were not stratified by the 4 education levels. Therefore, the distribution of education levels within each age, sex, and race/ethnicity category was imputed by bootstrapping (a resampling approach) using the ACS county-level percentage of adults in each of the 4 education categories.
- Estimates and 95% CIs calculation. The predicted probability of each indicator was calculated from the model parameters and the county population data using the multilevel logistic model formula. County-level prevalence estimates were obtained by multiplying the predicted probability by the county-level population across 208 categories (13 age groups × 2 sexes × 8 race/ethnicity groups). These steps were repeated 1,000 times using Monte Carlo simulation, which generated 1,000 sets of estimates for each indicator. The final estimate of each indicator by county was reported as the mean of the 1,000 sets of estimates, and its 95% CI was defined by the 2.5th and 97.5th percentiles.
Note: Before 2023, we applied the parameters of the multilevel regression models for the measures to the Census population data categorized by age, sex, and race/ethnicity and used Monte Carlo simulation to draw 1,000 random samples to generate the distribution of the estimates and construct 95% CIs. That simulation assumed that the random error for the random effects varied within each population category. In 2023, we still use the Monte Carlo simulation approach, but we assume that the random error for the random effects varies only within counties. As a result, the estimated CIs are more conservative, and their width is similar to that of the 95% credible intervals obtained for the same datasets by hierarchical Bayesian estimation via Markov Chain Monte Carlo. For details, please refer to the publication Constructing Statistical Intervals for Small Area Estimates Based on Generalized Linear Mixed Model in Health Surveys (Open Journal of Statistics, 2022).
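The estimation and interval steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Branch's actual implementation: the function name `county_estimate`, its arguments, and all numeric inputs are hypothetical; the fixed-effect part of the linear predictor is treated as known for each demographic category; and the statement that the random error "varies only within counties" is interpreted here as one normally distributed random-effect draw per simulation, shared by every category in the county.

```python
import numpy as np

def inv_logit(x):
    """Inverse of the logit link used by a logistic model."""
    return 1.0 / (1.0 + np.exp(-x))

def county_estimate(cell_logit, cell_pop, re_mean, re_sd, n_draws=1000, seed=0):
    """Post-stratified prevalence and Monte Carlo 95% interval for one county.

    cell_logit : fixed-effect linear predictor per demographic category (assumed known)
    cell_pop   : population count per demographic category
    re_mean    : assumed point estimate of the county random effect
    re_sd      : assumed standard error of the county random effect
    """
    rng = np.random.default_rng(seed)
    cell_logit = np.asarray(cell_logit, dtype=float)
    cell_pop = np.asarray(cell_pop, dtype=float)

    # One random-effect draw per simulation, shared by all categories in the county.
    re_draws = rng.normal(re_mean, re_sd, size=n_draws)

    # Predicted probability for every draw and category: shape (n_draws, n_cells).
    prob = inv_logit(cell_logit[None, :] + re_draws[:, None])

    # Post-stratification: population-weighted average of the probabilities.
    prevalence = (prob * cell_pop).sum(axis=1) / cell_pop.sum()

    # Report the mean of the draws and the 2.5th/97.5th percentiles.
    lower, upper = np.percentile(prevalence, [2.5, 97.5])
    return prevalence.mean(), lower, upper

# Hypothetical inputs for one county with 13 x 2 x 8 = 208 categories.
demo_rng = np.random.default_rng(42)
n_cells = 13 * 2 * 8
demo_logit = demo_rng.normal(-0.3, 0.5, size=n_cells)
demo_pop = demo_rng.integers(50, 5000, size=n_cells)

estimate, ci_low, ci_high = county_estimate(demo_logit, demo_pop, re_mean=0.1, re_sd=0.2)
print(f"prevalence = {estimate:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Drawing a separate random effect for each category instead of one per county would average out much of the variation across the 208 categories and give a narrower interval, which is consistent with the Note's description of the county-level approach as more conservative. The sketch ignores uncertainty in the fixed effects and the state-level random effect.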
Limitations of the Procedure
Several limitations of this procedure should be noted. First, the modeling process carries over all known biases in the BRFSS data, such as recall bias and reporting bias, to the estimates. Second, the BRFSS response rate is relatively low, and small area estimation is not intended to correct for non-response. Third, because county-level population counts were unavailable, Census population estimates were used instead, which may also affect the precision of the estimates. Finally, because the BRFSS sample size in some counties is very small (e.g., <10), the accuracy of the estimates for these counties may be affected.
References
Multilevel Regression and Poststratification for Small-Area Estimation of Population Health Outcomes: A Case Study of Chronic Obstructive Pulmonary Disease Prevalence Using the Behavioral Risk Factor Surveillance System. American Journal of Epidemiology. 2014;179:1025–1033.
Validation of Multilevel Regression and Poststratification Methodology for Small Area Estimation of Health Indicators from the Behavioral Risk Factor Surveillance System. American Journal of Epidemiology. 2015;182:127–137.
Comparison of Methods for Estimating Prevalence of Chronic Diseases and Health Behaviors for Small Geographic Areas: Boston Validation Study, 2013. Preventing Chronic Disease. 2017;14:170281.
Using 3 Health Surveys to Compare Multilevel Models for Small Area Estimation for Chronic Diseases and Health Behaviors. Preventing Chronic Disease. 2018;15:180313.
Constructing Statistical Intervals for Small Area Estimates Based on Generalized Linear Mixed Model in Health Surveys. Open Journal of Statistics. 2022;12.