Using the Data

There are a number of methods for small area estimation (SAE). The multilevel regression and poststratification framework, which was used in the 500 Cities Project, is one methodology that communities might consider when generating their own small area estimates. Additional information on the methodology is available on the 500 Cities web site. Some communities have already generated their own direct survey estimates or small area estimates, and they are encouraged to use their local estimates as their primary data. However, the estimates from the 500 Cities Project may give those communities additional insights into the health issues affecting their residents.

The SAE code used in the 500 Cities Project was developed specifically for the 500 Cities outcomes and for the census tract and city levels, using the entire BRFSS dataset for all states and DC and including variables in the model for state and local levels. The use of the code by other communities may or may not be appropriate without some modification.

In addition, use of the SAE code assumes that the end user has access to geocoded survey data (for 500 Cities, the data were geocoded to the county). Restricted BRFSS data, which include substate geographic identifiers (county), are available through the Research Data Centers (RDCs) by way of a formal data hosting agreement on a case-by-case basis for research purposes. Learn more about the proposal process.

Unfortunately, at this time CDC does not have the capacity or resources to respond to all individual requests for technical assistance on the modeling process, modifying the code, or running special data analyses (e.g., computing estimates for portions of census tracts not contained within the city boundaries). Requests for such assistance will have to be handled on a case-by-case basis and will depend on existing resources and workload.

We cannot include policy or program intervention effects, which would occur locally, in the modeling process. Therefore, the estimates for local areas are the statistically expected prevalence of the risk factor, health outcome, or preventive service use based on the associations that we observe through the overall model. It is possible that a community may have a program intervention that has a substantial effect, such that the resulting prevalence of a health risk factor (for example) is lower, or the use of a preventive service is higher, than what is statistically predicted by our model. In that case, if a community relies solely on the small area estimates, the effect of that local intervention would be underestimated. Thus, without reliable local information about public health programs, model-based local estimates should not be used to evaluate the effect of local public health programs, policies, or interventions. We suggest that communities use the model-based estimates as the baseline and conduct their own surveys to evaluate the effect of their interventions.
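The arithmetic behind this caution can be sketched with a few lines of Python. This is a simplified illustration only, not part of the 500 Cities code: all variable names and prevalence values are hypothetical, and it assumes the model-based estimate does not move at all in response to a local program.

```python
# Hypothetical numbers, invented for illustration only.
model_expected_prevalence = 0.20  # prevalence the model predicts for the community
true_baseline_prevalence = 0.20   # assume the model matched reality before the program
true_followup_prevalence = 0.15   # assumed true prevalence after a local program

# Simplifying assumption: the model has no term for the local program,
# so its follow-up estimate stays at the statistically expected value.
model_followup_prevalence = model_expected_prevalence

# Effect measured with a local follow-up survey versus with model output alone.
true_effect = true_baseline_prevalence - true_followup_prevalence
model_based_effect = model_expected_prevalence - model_followup_prevalence

print(f"Effect seen in a local survey:    {true_effect:.2f}")
print(f"Effect seen in model output only: {model_based_effect:.2f}")
```

Under these assumptions the model output alone shows no change, while a local follow-up survey would show the reduction of five percentage points.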

The data can be used to:

  • Identify the health issues facing a city or neighborhoods within cities.
  • Identify emerging health problems.
  • Establish key health objectives.
  • Develop and implement effective and targeted prevention activities.

Because these are modeled rather than direct estimates, and because the 500 Cities Project does not provide a weighted composite score for the included cities, the data should not be used to rank the overall health of the 500 cities. However, cities can be compared on individual measures.

The current modeling procedure does not support using the estimates to track changes at the local level over time.

Estimates depend on two main components: 1) the survey responses in the given survey year; and 2) the detailed population distribution within the cities and tracts. Because we use the 2010 US Census as the poststratification dataset, we cannot incorporate year-to-year population change in the modeled results. The assumption for any given point-in-time estimate is therefore that the city and census tract populations in that year are the same as they were measured in 2010. We are exploring options with the US Census to derive detailed population estimates for the intercensal years (e.g., 2015), and we may incorporate such population estimates in future years’ modeled estimates.
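To make the role of the two components concrete, the following is a minimal sketch of a poststratification step in Python. It is not the 500 Cities SAE code: the cell definitions, predicted prevalences, and population counts are all hypothetical, and the real model uses a far more detailed set of demographic cells. The sketch only shows that the tract estimate is a population-weighted average of cell-level predictions, so the same predictions give a different estimate when the population counts change.

```python
# Hypothetical poststratification cells for one census tract.
# Each cell: (label, model-predicted prevalence, population count).
# All values are invented for illustration only.
cells_2010 = [
    ("age 18-44", 0.10, 1200),
    ("age 45-64", 0.20, 800),
    ("age 65+", 0.30, 500),
]

# Same predicted prevalences, but a hypothetical later-year population
# that has shifted toward older age groups.
cells_later = [
    ("age 18-44", 0.10, 1000),
    ("age 45-64", 0.20, 800),
    ("age 65+", 0.30, 700),
]


def poststratify(cells):
    """Return the population-weighted average of cell-level prevalence."""
    total_population = sum(count for _, _, count in cells)
    if total_population == 0:
        raise ValueError("no population in the poststratification cells")
    expected_cases = sum(prevalence * count for _, prevalence, count in cells)
    return expected_cases / total_population


estimate_2010 = poststratify(cells_2010)
estimate_later = poststratify(cells_later)

print(f"Estimate with 2010 counts:  {estimate_2010:.3f}")
print(f"Estimate with later counts: {estimate_later:.3f}")
```

With these invented numbers the estimate is 0.172 using the 2010 counts and 0.188 using the shifted counts. Holding the population fixed at 2010 values means a shift of this kind would not appear in the modeled results.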