Task 2: Key Concepts about Analyzing Subgroups

Sometimes you may wish to analyze only a certain demographic subgroup of interest, such as a particular age range or gender, or only those survey participants who were tested for a particular diet-related lab analyte, such as serum carotenoids. 

As a general rule, when working in any survey analysis software package, such as SUDAAN or SAS, the dataset used as input to all procedures should contain all individuals in the sample with non-missing, non-zero values of the appropriate sample weighting variable.  That is, you should use the entire dataset (instead of creating a smaller subset of the data) and then use coding statements to select the subpopulation of interest.  Although estimates of descriptive statistics might be the same if you used a subset of the entire file, the estimated standard errors would not be appropriately calculated, because the software needs the complete sample design (every stratum and primary sampling unit) to estimate variance.  This is particularly true if the subset is based on a characteristic measured in the survey.  For example, it would not be appropriate to create a smaller data file consisting of only those who are diabetic or those who are hypertensive.
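
The point about standard errors can be illustrated with a minimal Python sketch. This is not SUDAAN or SAS code, and it is not how either package is implemented; it assumes a simplified with-replacement Taylor linearization for a subgroup mean. The toy records, the field names (stratum, psu, weight, y, diabetic), and the function domain_mean_se are all hypothetical and exist only for this illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical record layout and made-up toy data (not real survey data).
Row = namedtuple("Row", "stratum psu weight y diabetic")
TOY = [
    Row(1, 1, 1.0, 10.0, 1), Row(1, 1, 1.0, 5.0, 0),
    Row(1, 2, 1.0, 14.0, 1),
    Row(1, 3, 1.0, 7.0, 0),   # PSU with no diabetic respondents
    Row(2, 1, 1.0, 12.0, 1),
    Row(2, 2, 1.0, 16.0, 1),
    Row(2, 3, 1.0, 9.0, 0),   # PSU with no diabetic respondents
]

def domain_mean_se(rows, in_domain):
    """Weighted subgroup mean and its linearized standard error.

    Assumes PSUs sampled with replacement within strata (a simplification).
    """
    den = sum(r.weight for r in rows if in_domain(r))
    mean = sum(r.weight * r.y for r in rows if in_domain(r)) / den

    # Sum the linearized values within each PSU. Rows outside the subgroup
    # contribute zero, but their PSU is still counted in its stratum.
    totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r.weight * (r.y - mean) / den if in_domain(r) else 0.0
        totals[r.stratum][r.psu] += z

    variance = 0.0
    for psus in totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("a stratum needs at least two PSUs")
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, variance ** 0.5

# Recommended: keep every record and select the subgroup by a condition.
mean_full, se_full = domain_mean_se(TOY, lambda r: r.diabetic == 1)

# Not recommended: drop the other records first, then analyze what is left.
subset = [r for r in TOY if r.diabetic == 1]
mean_sub, se_sub = domain_mean_se(subset, lambda r: True)

print(mean_full, mean_sub)  # the two means agree
print(se_full, se_sub)      # the two standard errors do not
```

In this toy sample both approaches return the same mean, but the standard errors differ, because the subset file no longer contains the PSUs that happened to have no diabetic respondents. The size and direction of the difference depend on the data.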

The only time that you can create separate datasets for smaller subgroups is when those subgroups are based on specific values of the variables used in constructing the sample weight (e.g., gender, race/ethnicity, age). If a smaller dataset is created based on these demographic characteristics, the standard errors may not differ greatly from the standard errors from the full dataset.  However, as a general rule, the full dataset should be used, with the subgroups selected by coding statements rather than by subsetting the file.



