Task 2b: How to Identify Important Food Group Sources of Nutrients Using SAS

This section describes how to use SAS to identify food group sources of nutrients, along with their standard errors. To illustrate, food sources of calcium are identified for the whole population ages 2 and older for 2001-2004. For illustrative purposes, this example uses a simplistic food grouping scheme based on the first digit of the USDA food codes.


Step 1: Create Folder

Create a folder to save the datasets, list the contents of each dataset, and create a dataset that combines 4 years of data. (Program not shown. See the full program in Additional Resources for more information.)


Step 2: Sort and Merge Datasets

Sort and then merge the demographic and individual food intake datasets. Create new variables as needed. Note that the food groups are characterized simply by the first digit of the individual food code: milk and milk products; meat, poultry, fish and mixtures; eggs; legumes, nuts and seeds; grain products; fruits; vegetables; fats, oils and salad dressings; and sugar, sweeteners and beverages. (Program not shown. See the full program in Additional Resources for more information.)
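The sort, merge, and variable-creation steps described above can be sketched as follows. This is only a minimal illustration on made-up records, not the tutorial's full program. The dataset name FDSRC and the variables FOODGRP, INCOH, and WTD_CALC come from the sample code in Step 3, but their definitions here are assumptions: INCOH is taken to flag ages 2 and older, and WTD_CALC is taken to be the sample weight multiplied by the calcium amount of the food. The names DEMO, FOODS, AGE, WT, FOODCODE, and CALC are hypothetical stand-ins for the actual NHANES files and variables.

```sas
* Toy stand-in for the demographic file (hypothetical names and values). ;
data DEMO;
    input SEQN AGE WT;
    datalines;
1 34 25000
2 1 18000
3 60 31000
;
run;

* Toy stand-in for the individual food file, one row per food reported.  ;
* FOODCODE holds made-up 8-digit codes and CALC is calcium in mg.        ;
data FOODS;
    input SEQN FOODCODE CALC;
    datalines;
1 11111000 290
1 51101000 35
2 11111000 145
3 75100250 22
;
run;

* Both datasets must be sorted by the merge key before merging. ;
proc sort data=DEMO;  by SEQN; run;
proc sort data=FOODS; by SEQN; run;

data FDSRC;
    merge DEMO (in=inDemo) FOODS (in=inFood);
    by SEQN;
    if inDemo and inFood;                     * keep matched records only ;
    FOODGRP  = floor(FOODCODE / 10000000);    * first digit of 8-digit code ;
    INCOH    = (AGE >= 2);                    * assumed rule: 1 if age 2 or older ;
    WTD_CALC = WT * CALC;                     * assumed: weight times calcium ;
run;

proc print data=FDSRC;
run;
```

With this coding, FOODGRP takes the values 1 through 9, in the same order as the nine food groups listed above.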


Step 3: Calculate the Weighted Contribution of Calcium from Each Food Group

Calculate the weighted contribution of calcium from each food group using the SURVEYFREQ procedure in SAS.




Identifying Food Group Sources of Calcium

Sample Code

* The SURVEYFREQ procedure in SAS calculates the weighted contribution of ;
* calcium from each food group.                                           ;
*                                                                         ;
* Note that for this analysis, only the data for INCOH=1 is of interest.  ;
* However, this code will also generate data for INCOH=0.                 ;

proc surveyfreq data=FDSRC;
    strata SDMVSTRA;
    cluster SDMVPSU;
    weight WTD_CALC;
    tables FOODGRP*INCOH;
    title "Percent calcium by food group, using PROC SURVEYFREQ" ;
run ;


Output of Program

The SURVEYFREQ Procedure                            
                                 Data Summary                                  
             Number of Strata                                  30              
             Number of Clusters                                60              
             Number of Observations                        274168              
             Number of Observations Used                   257658              
             Number of Obs with Nonpositive Weights         16510              
             Sum of Weights                            2.49766E11              
              Broad food grp based on 1st digit of USDA food code
                                                     Weighted    Std Dev of                Std Error of
                          FOODGRP     Frequency     Frequency      Wgt Freq       Percent       Percent
             Milk & Milk Products         40207     1.1581E11    6226455948       46.3673        0.4185
   Meat, Poultry, Fish & Mixtures         29350    1.77625E10     994172188        7.1117        0.2178
                             Eggs          4137    4642738173     225930307        1.8588        0.0680
          Legumes, Nuts and Seeds          6102    4129696674     251569993        1.6534        0.0707
                   Grain Products         63548    7.35244E10    3425294440       29.4373        0.3696
                           Fruits         21721    8017836305     405334291        3.2101        0.1207
                       Vegetables         41477    1.20417E10     582319600        4.8212        0.1317
     Fats, Oils & Salad Dressings          9075     792221049      51116840        0.3172        0.0162
    Sugar, Sweeteners & Beverages         42041     1.3045E10     607625180        5.2229        0.1254
                            Total        257658    2.49766E11     1.1986E10       100.000

Highlights from the output include:


The frequency counts in this analysis represent the number of reports of foods that contain calcium, by food group. Note that the frequencies in the SAS output do not match those in the SUDAAN output because of the special procedures required in SAS to conduct this analysis (see Task 3 in “Module 11: Weighting” in the Continuous NHANES Tutorial for more information). However, the unweighted frequencies are not important to this analysis and do not represent an estimate for the U.S. population, so they can be ignored.
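As a quick arithmetic check on the output, the Percent column appears to be each food group's weighted frequency divided by the total weighted frequency. The sketch below recomputes the milk and milk products share from the two printed values. The variable names MILK_WGT, TOTAL_WGT, and PCT are hypothetical and do not come from the tutorial's program.

```sas
* Values copied from the Weighted Frequency column of the output. ;
* MILK_WGT, TOTAL_WGT, and PCT are hypothetical names.            ;
data _null_;
    MILK_WGT  = 1.1581E11;     * Milk and Milk Products row ;
    TOTAL_WGT = 2.49766E11;    * Total row ;
    PCT = 100 * MILK_WGT / TOTAL_WGT;
    put "Milk and milk products, percent of calcium: " PCT 8.2;
run;
```

The result written to the log is about 46.37, which agrees with the 46.3673 shown in the Percent column up to rounding of the printed weighted frequencies. If WTD_CALC is the sample weight multiplied by the calcium amount, as assumed here, each weighted frequency is a weighted sum of calcium, so the percentages can be read as shares of total calcium intake.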

