Sampling Methodology

First Stage Sampling

In the first stage of CASPER sampling, 30 clusters are selected with probability proportional to the estimated number of households in each cluster. This requires a count of all households in the sampling frame, divided into sections, or clusters. For this reason, U.S. Census blocks are ideal. List all the census blocks in your sampling frame with the corresponding number of households; these data can be downloaded from the Census website. Then number the households consecutively across the whole sampling frame and select 30 clusters with probability proportional to the number of households within each cluster. To do this, randomly choose 30 numbers between 1 and the total number of households and, for each number, select the entire cluster in which that numbered household is located. Some clusters may be chosen twice. Maps should then be developed via the Census website or through Geographic Information System (GIS) software so teams can easily navigate to the selected clusters. For more information and detail on selecting clusters, please see the CASPER Toolkit, Section 2.4 [PDF – 24 MB].
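
The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CASPER Toolbox or an official CDC tool: the function name select_clusters, the block IDs, and the household counts are hypothetical, and in practice the counts would come from the Census data for your sampling frame.

```python
import bisect
import itertools
import random


def select_clusters(households_per_cluster, n_clusters=30, seed=None):
    """Select clusters with probability proportional to household count.

    households_per_cluster: list of (cluster_id, household_count) pairs.
    Returns n_clusters cluster ids; the same cluster can appear more than once.
    """
    rng = random.Random(seed)
    ids = [cluster_id for cluster_id, _ in households_per_cluster]
    counts = [count for _, count in households_per_cluster]
    # Running totals: cluster i holds household numbers
    # cumulative[i - 1] + 1 through cumulative[i].
    cumulative = list(itertools.accumulate(counts))
    total = cumulative[-1]
    selected = []
    for _ in range(n_clusters):
        household_number = rng.randint(1, total)  # inclusive on both ends
        # First cluster whose running total reaches this household number.
        index = bisect.bisect_left(cumulative, household_number)
        selected.append(ids[index])
    return selected


# Hypothetical sampling frame: (census block id, number of households).
frame = [("block-001", 120), ("block-002", 45), ("block-003", 300)]
selected = select_clusters(frame, n_clusters=30, seed=1)
print(selected)
```

Because each random number is drawn independently, a large cluster can be drawn more than once, which matches the note above that some clusters may be chosen twice.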

Second Stage Sampling

While there are many ways to conduct the second stage of sampling (selecting 7 households for interview), CDC’s Health Studies recommends systematic random sampling. Count (or estimate) the number of households within the selected cluster and divide that number by 7; the result is your n. Then start at a random point and travel through the cluster in a serpentine pattern, selecting every nth household for interview.
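
As a rough sketch of this systematic selection, the Python example below assumes the households in a cluster have already been listed in the order a team would pass them on a serpentine route. The function name systematic_sample and the household labels are hypothetical, and wrapping around to the start of the route after reaching its end is an assumption made for this illustration.

```python
import random


def systematic_sample(households, n_interviews=7, seed=None):
    """Pick every nth household along a serpentine route.

    households: list ordered along the route through the cluster.
    Returns n_interviews households, or all of them if the cluster is small.
    """
    rng = random.Random(seed)
    total = len(households)
    if total <= n_interviews:
        return list(households)
    interval = total // n_interviews  # the "n" in "every nth household"
    start = rng.randrange(total)  # random starting point on the route
    # Step through the route by the interval, wrapping past the end (assumption).
    return [households[(start + i * interval) % total] for i in range(n_interviews)]


# Hypothetical cluster with 50 households listed in route order.
route = [f"house-{i}" for i in range(1, 51)]
chosen = systematic_sample(route, seed=3)
print(chosen)
```

With 50 households, n is 50 // 7 = 7, so the team would approach every 7th household from the random starting point.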

While the most scientific and representative approach is to select the seven households and keep returning until each interview is completed, it is important to balance the scientific ideal against the real-world or disaster situation. Therefore, interview teams should attempt to visit each selected household three times, but may then replace the household if an interview was not successful (e.g., household refused, nobody answered after three attempts, language barrier).

Apartments should be treated as separate households. Interview teams can approximate the number of households by counting the units on one floor (or in one building) and multiplying by the number of floors (or buildings), then determine n as described above. However, in some situations, working with such a large n is not timely or efficient. For example, if your cluster is a large highrise building, it can be tedious for teams to snake up and down multiple floors, especially in a disaster situation where the power is out and teams must use the stairs. In such situations, teams can randomly select 7 floors in the highrise and then try to complete one interview on each floor. Overall, keeping the sample as complete and representative as possible requires sound judgment and quality training of interview teams.
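
The apartment arithmetic and the highrise shortcut can be illustrated with a short Python sketch. The function names and the building dimensions below are hypothetical, and the sketch assumes every floor has the same number of units, which a field team would need to verify.

```python
import random


def estimate_households(units_on_one_floor, floors):
    """Approximate the households in a building from a single-floor count."""
    return units_on_one_floor * floors


def pick_floors(floors, n_floors=7, seed=None):
    """Randomly choose floors to visit, one interview attempt per floor."""
    rng = random.Random(seed)
    k = min(n_floors, floors)  # a short building cannot supply 7 distinct floors
    return sorted(rng.sample(range(1, floors + 1), k))


# Hypothetical highrise: 12 units per floor, 20 floors.
estimated = estimate_households(12, 20)
interval = estimated // 7  # the n that serpentine sampling would require
floors_to_visit = pick_floors(20, seed=5)
print(estimated, interval, floors_to_visit)
```

Here the estimate is 240 households and n would be 34, which is why selecting 7 floors at random can be the more practical option.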

For more information, please see the CASPER Toolkit, Section 3.4.

Modifications

Modifications to the traditional 30 x 7 design may be warranted.

Increase cluster selection
If you are concerned that some clusters may be inaccessible due to storm damage or restricted entry, you may consider increasing the number of clusters selected a priori. This may only be done before clusters are selected, because clusters should be chosen without substitution, meaning that the clusters originally selected are the clusters that are assessed. For example, you could oversample clusters with a 35×7 design (sample size of 245). It is important to remember that this does not improve response rates, but it can increase the sample size. For more information, please see the CASPER toolkit, Section 2.5.1.

Another common problem, especially in more rural areas, is that clusters may have fewer than seven households, making it impossible for teams to interview the needed number from that cluster. Generally, this is not much of an issue, because smaller clusters have a lower probability of being selected and therefore selections with fewer than seven households will be kept to a minimum. However, if the sampling frame contains a large proportion of small clusters (i.e., fewer than 15-20 households), interview teams may have difficulty finding seven households to interview, resulting in a low completion rate. To avoid this, check the frequency of households within the chosen sampling frame to identify whether this may be a problem. If there appear to be many clusters with a small number of households, you may use the “block group” census variable or join census blocks together using GIS software to create larger clusters. For more information, please see the CASPER toolkit, Section 2.4.1.
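
One way to check the frequency of households in a sampling frame is sketched below in Python. The function name small_cluster_summary, the default threshold, and the household counts are hypothetical. The per-draw chance follows from selecting clusters with probability proportional to their household counts.

```python
def small_cluster_summary(household_counts, threshold=7):
    """Summarize how common small clusters are in a sampling frame.

    household_counts: number of households in each cluster of the frame.
    threshold: clusters with fewer households than this count as small.
    """
    small = [count for count in household_counts if count < threshold]
    return {
        "small_clusters": len(small),
        "share_of_clusters": len(small) / len(household_counts),
        # Under probability-proportional-to-size selection, the chance that a
        # single draw lands in a small cluster equals their share of households.
        "chance_per_draw": sum(small) / sum(household_counts),
    }


# Hypothetical frame of five clusters.
summary = small_cluster_summary([3, 5, 40, 52, 100], threshold=7)
print(summary)
```

In this made-up frame, two of the five clusters are small, yet they hold only 8 of the 200 households, so few draws would land in them. A frame where small clusters hold a large share of households is the case in which merging blocks or using block groups is worth considering.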

Housing unit vs occupied housing unit
Situations may occur in which the area to be sampled contains a high proportion of second homes or vacation rental properties. In these situations, you may consider using the “occupied housing unit” variable, because the U.S. Census defines an occupied housing unit as the usual place of residence of the person or group of people living in it at the time of enumeration. Therefore, vacation homes would be counted as “housing units” but not as “occupied housing units”.

For more information on modifying CASPER sampling, please see CASPER toolkit Section 2.5 or contact CASPER@cdc.gov or Amy Schnall (GHU5@cdc.gov) at Health Studies. You may also review Annex C from the World Health Organization’s Expanded Program on Immunization (EPI) Immunization coverage cluster survey reference manual [PDF – 818 KB].

Geographic Information System (GIS) CASPER Toolbox

Using GIS software, such as ArcGIS developed by the Environmental Systems Research Institute, Inc. (ESRI), provides more flexibility in the selection of the sampling frame by allowing the user to select portions of a county, or counties, to assess. Your sampling frame is then not limited to whole counties but can be delineated by zip codes, cities, key landmarks, storm tracks, highways, or many other options. CDC’s Health Studies, in conjunction with the Agency for Toxic Substances and Disease Registry’s (ATSDR’s) Geographic Research, Analysis and Services Program (GRASP), developed an ArcGIS CASPER Toolbox. This toolbox allows you to select any sampling frame within the United States and is faster than the traditional Census website method. If GIS capabilities are not available, CDC’s Health Studies is available to provide sampling and mapping using the ArcGIS CASPER Toolbox. Please contact CASPER@cdc.gov if you would like sampling and mapping assistance, or if GIS capabilities are available in your state and you would like to receive the GIS CASPER Toolbox.

Things to Avoid

Convenience sampling
Convenience sampling is a form of non-probability sampling that involves selection based on availability, opportunity, or convenience. Examples include going to households where people are outside, or to households that another interviewee recommended because the residents would be likely to answer.

Target sampling
Target sampling is a form of non-probability sampling that involves intentionally sampling a certain population or group. An example is going to the household that looks the most damaged or that seems likely to give the “best” results.

Sequential sampling