Task 1: How to Estimate the Distribution of Usual Intake for a Single Ubiquitously-consumed Dietary Constituent for One Population or Subpopulation using the NCI Method

The following example shows how the distribution of calcium from foods and beverages can be estimated for women ages 19 years and older.

This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to w0304_16 are the weights used in the analysis of 2003-2004 dietary data that require Balanced Repeated Replication (BRR) to calculate standard errors: w0304_0 is the dietary weight, and w0304_1 to w0304_16 are the BRR weights. The model is run 17 times: once with w0304_0 and 16 times with the BRR weights (see Module 18, Task 4 for more information).

 

IMPORTANT NOTE

Note: If 4 years of NHANES data are used, 32 BRR runs are required. Additional weights are found in the demoadv dataset.

 

A SAS macro is a useful technique for rerunning a block of code when you want only to change a few variables; the macro BRR201 is created and called in this example. The BRR201 macro calls the MIXTRAN macro and the DISTRIB macro, and calculates BRR standard errors of the parameter estimates.  The MIXTRAN macro obtains preliminary estimates for the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary reports of the model fit. 

Modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can account for differential weighting by using the REPLICATE statement. The use of BRR to calculate standard errors accounts for the correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates differential weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized survey procedures.

The DISTRIB macro estimates the distribution of usual intake, producing estimates of percentiles and the percent of the population below a cutpoint.

 

IMPORTANT NOTE

Note that the DISTRIB macro currently requires that at least 2 cutpoints be requested in order to calculate the percent of the population below a cutpoint.

 

The effect of recall sequence (Day 1 or Day 2 24-hour recall) is removed from the estimated nutrient intake distribution. An adjustment is also made for the day of the week on which the 24-hour recall was collected, dichotomized as weekend (Friday-Sunday) or weekday (Monday-Thursday). (See Module 18, Task 3 for more information on covariate adjustment.) BRR (Module 18, Task 4) is used to calculate standard errors.

The MIXTRAN and DISTRIB macros used in this example were downloaded from the NCI website.  Version 1.1 of the macros was used.  Check this website for macro updates before starting any analysis.  Additional details regarding the macros and additional examples also may be found on the website.

 

Step 1: Create a dataset so that each row corresponds to a single person day and define variables if necessary

Statements Explanation

data demoadv;
  set nh.demoadv;
if w0304_0 ne . ;
run ;

 

First, select only those people with dietary data by keeping the records for which the weight variable w0304_0 is not missing.

data day1;
set demoadv;
if riagendr= 2 and ridageyr>= 19 ;
DRTCALC=DR1TCALC;
day= 1 ;
run ;

 

data day2;
set demoadv;
if riagendr= 2 and ridageyr>= 19 ;
DRTCALC=DR2TCALC;
day= 2 ;
run ;

The variables DR1TCALC and DR2TCALC are NHANES variables representing total calcium consumed on days 1 and 2, respectively, from all foods and beverages (other than water). 

To create a dataset with 2 records per person, the demoadv dataset is set 2 times to create 2 datasets, one where day=1 and one where day=2.  The same variable name, DRTCALC, is used for calcium on both days.  It is created by setting it equal to DR1TCALC for day 1 and DR2TCALC for day 2.  Adult women ages 19 years and older are selected for analysis.

data calcium;
set day1 day2;
if DAY_WK in (1, 6, 7) then weekend = 1;
else if DAY_WK in (2, 3, 4, 5) then weekend = 0;
run ;

Finally, these datasets are appended, and a weekend indicator (dummy) variable is created from the day of the week. Because the NLMIXED procedure has no CLASS statement, categorical covariates must be coded as dummy variables.
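A quick check of the stacked dataset can confirm the structure described above. The following is a minimal sketch that is not part of the tutorial's own code; it assumes the calcium dataset created in this step and uses only standard SAS procedures.

```sas
/* Sketch: tabulate recall day by the weekend indicator,
   including any records where weekend is missing */
proc freq data=calcium;
  tables day*weekend / missing;
run;

/* Sketch: count non-missing and missing calcium values by recall day */
proc means data=calcium n nmiss mean;
  class day;
  var DRTCALC;
run;
```

The frequency table shows how many records were contributed on each recall day and whether any record lacks a weekend value.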

 

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and intake day (day 1 and 2) because the NLMIXED procedure uses this information to estimate the model parameters.
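The statement for this step is not shown in the text. A minimal sketch, assuming the calcium dataset from Step 1, the respondent identifier seqn, and the recall-day variable day, might look like this:

```sas
/* Sort by respondent identifier, then by recall day within respondent */
proc sort data=calcium;
  by seqn day;
run;
```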

 

Step 3: Create the BRR201 macro

The BRR201 macro calls the MIXTRAN macro and DISTRIB macro and computes standard errors of parameter estimates.  After creating this macro and running it 1 time, it may be called several times, each time changing the macro variables.

 

 

Statements Explanation

%include   'C:\NHANES\Macros\mixtran_macro_v1.1.sas' ;

%include   'C:\NHANES\Macros\distrib_macro_v1.1.sas' ;

 

This code reads the MIXTRAN and DISTRIB macros into SAS so that these macros may be called.

%macro BRR201(data=, response=, foodtype=, subject=, repeat=, covars_prob=, covars_amt=, outlib=, pred=, param=, modeltype=, lambda=, seq=, weekend=, vargroup=, numvargroups=, subgroup=, start_val1=, start_val2=, start_val3=, vcontrol=, nloptions=, titles=, printlevel=, cutpts=, ncutpts=, nsim_mc=, byvar=, final=);

The start of the BRR201 macro is defined.  All of the terms inside the parentheses are the macro variables that are used in the macro.  They are declared as keyword parameters (name=) so that the macro call can supply values by name and omit the parameters that are not needed.

%MIXTRAN (data=&data, response=&response, foodtype= &foodtype, subject=&subject, repeat=&repeat, covars_prob= &covars_prob, covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var= w0304_0, seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups=&numvargroups, subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2, start_val3=&start_val3, vcontrol=&vcontrol, nloptions= &nloptions, titles=&titles, printlevel=&printlevel)

 

Within the BRR201 macro, the MIXTRAN macro is called.  All of the variables preceded by & will be defined by the BRR201 macro call.  The only variable without an & is the replicate_var macro variable; it is set to w0304_0 for the first run.

%DISTRIB (seed= 0 , nsim_mc=&nsim_mc, modeltype=&modeltype, pred= &pred, param= &param, outlib=&outlib, cutpoints= &cutpts, ncutpnt=&ncutpts, byvar=&byvar, subgroup= &subgroup, subject=&subject, titles=&titles, food= &foodtype);

Within the BRR201 macro, the DISTRIB macro is called.  All of the variables preceded by & will be defined by the BRR201 macro call.  The seed for generating the distribution has been set to 0, which will use the clock to randomly start a sequence.  The datasets defined by the macro variables pred and param (_pred_unc_&foodtype and _param_unc_&foodtype) are created in the MIXTRAN run.

data dist;
  set &outlib..descript_&foodtype._w0304_0;
  mergeby = 1;
  keep &subgroup mergeby numsubjects mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&ncutpts.;
run;

The dataset descript_&foodtype._w0304_0 is created by the DISTRIB macro.  This data step keeps the estimates of interest from that dataset and defines a variable mergeby that will be used later.

%do run= 1 %to 16 ;

This code starts a loop to run the 16 BRR runs.

options nonotes;

Notes are turned off to save room in the log.

%put ~~~~~~~~~~~~~~~~~~~ Run &run ~~~~~~~~~~~~~~~~~~~~;

The run number is printed to the log.

%MIXTRAN (data=&data, response=&response, foodtype=&foodtype, subject=&subject, repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var=w0304_&run, seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups=&numvargroups, subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2, start_val3=&start_val3, vcontrol=&vcontrol, nloptions=&nloptions, titles=&titles, printlevel=&printlevel)

Within the BRR201 macro, the MIXTRAN macro is called.  All of the variables preceded by & will be defined by the BRR201 macro call.  The only variable without an & is the replicate_var macro variable; it is set to w0304_&run where &run equals 1 to 16.

%DISTRIB (seed=0, nsim_mc=&nsim_mc, modeltype=&modeltype, pred=&pred, param=&param, outlib=&outlib, cutpoints=&cutpts, ncutpnt=&ncutpts, byvar=&byvar, subgroup=&subgroup, subject=&subject, titles=&titles, food=&foodtype);

Within the BRR201 macro, the DISTRIB macro is called.  All of the variables preceded by & will be defined by the BRR201 macro call.  The seed for generating the distribution has been set to 0, which will use the clock to randomly start a sequence.  The datasets defined by the macro variables pred and param (_pred_unc_&foodtype and _param_unc_&foodtype) are created in the MIXTRAN run.

data distbrr;
  set &outlib..descript_&foodtype._w0304_&run;
  rename numsubjects=bnumsubjects mean_mc_t=bmean_mc_t
         tpercentile1-tpercentile99=btpercentile1-btpercentile99
         cutprob1-cutprob&ncutpts.=bcutprob1-bcutprob&ncutpts.;
  run = &run;
  mergeby = 1;
run;

data distbrr;
  set distbrr;
  keep &subgroup bnumsubjects bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob&ncutpts. mergeby;
run;

The dataset descript_&foodtype._w0304_&run is created by the DISTRIB macro.  The first data step renames the variables and defines a variable mergeby that will be used later; the second keeps the estimates of interest under their new names.

proc append base=brr_runs data=distbrr;
run;

The BRR datasets are appended into a dataset called brr_runs.

proc datasets nolist; delete distbrr; run;

After appending the information to brr_runs, distbrr can be deleted.

%end ;

The BRR runs end.

proc sort data=dist; by &subgroup mergeby;

proc sort data=brr_runs; by &subgroup mergeby;

The data are sorted before merging.

data distall;
  merge dist brr_runs;
  by &subgroup mergeby;
  array bvar (*) bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob&ncutpts.;
  array varo (*) mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&ncutpts.;
  array dsqr (*) dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&ncutpts.;
  do i = 1 to dim(bvar);
    dsqr[i] = (bvar[i] - varo[i])**2;
  end;
run;

The datasets dist and brr_runs are merged, and the squared difference between each BRR estimate and the corresponding estimate from the first run is computed.

proc means data=distall sum;
  by &subgroup mergeby;
  var dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&ncutpts.;
  output out=sums sum=sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99 sum_dbcutprob1-sum_dbcutprob&ncutpts.;
run;

The sum of squares is computed.

data brr;
  set sums;
  array sumt (*) sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99 sum_dbcutprob1-sum_dbcutprob&ncutpts.;
  array se (*) se_mean_mc_t se_tpercentile1-se_tpercentile99 se_cutprob1-se_cutprob&ncutpts.;
  do j = 1 to dim(sumt);
    se[j] = -1*sqrt(sumt[j]/(16*.49));
  end;
  mergeby = 1;
  keep se_mean_mc_t se_tpercentile1-se_tpercentile99 se_cutprob1-se_cutprob&ncutpts. &subgroup mergeby;
run;

The standard errors are computed. Each SE is multiplied by -1 to make it print out in parentheses in the final step.
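Written as a formula, the calculation in this data step (before the sign change used for printing) is shown below. The symbols are introduced here for illustration only: \hat\theta_0 denotes an estimate from the first run (weight w0304_0) and \hat\theta_r the same estimate from BRR run r. The divisor 16 × 0.49 is taken directly from the code; reading 0.49 as (1 − 0.3)^2, a Fay-type adjustment factor, is an assumption not stated in this section (see Module 18, Task 4).

```latex
\mathrm{SE}(\hat\theta) \;=\; \sqrt{\frac{\sum_{r=1}^{16}\left(\hat\theta_r - \hat\theta_0\right)^2}{16 \times 0.49}}
```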

data toprint1;
  set dist;
  line = 1; * These are the point estimates;
  keep &subgroup numsubjects mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&ncutpts. line;
run;

To create the final dataset, the point estimates are saved in a file called toprint1. The variable line will identify them as estimates.

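data toprint2;
  set brr (rename=(se_mean_mc_t=mean_mc_t
                   se_tpercentile1-se_tpercentile99=tpercentile1-tpercentile99
                   se_cutprob1-se_cutprob&ncutpts.=cutprob1-cutprob&ncutpts.));
  line = 2; * These are the standard errors;
  keep &subgroup mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&ncutpts. line;
run;

The standard errors are saved in a dataset called toprint2.  The se_ variables are renamed to the names used for the point estimates so that the two datasets can be stacked.  The variable line will identify them as standard errors.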

data &final;
set toprint1 toprint2;
run;

The final dataset is created by appending toprint1 and toprint2.

proc sort data=&final;
by &subgroup line;
run;

The final dataset is sorted.

proc print data=&final split=' ' noobs;
  var &subgroup line tpercentile5 tpercentile10 tpercentile25 tpercentile50
      tpercentile75 tpercentile90 tpercentile95;
  format line line. mean_mc_t tpercentile1-tpercentile99 negparen10.1 cutprob1-cutprob&ncutpts. negparen6.2;
  title 'Usual Intake of Calcium';
  title2 'NHANES 2003-04';
run;

The final dataset is printed. The format negparen will make the standard errors print in parentheses.

%mend BRR201;

The end of the BRR201 macro is indicated.
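The PROC PRINT step inside the macro applies a format called line. to the variable line, but its definition is not shown in this section. NEGPARENw.d is a built-in SAS format, so it needs no definition. The following is a minimal sketch of how the user-defined format might be created before the macro is called; the labels are hypothetical and can be changed.

```sas
/* Hypothetical definition of the user-defined LINE format.
   The label text is an illustrative assumption, not taken from the tutorial. */
proc format;
  value line
    1 = 'Estimate'
    2 = 'SE';
run;
```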

 

Step 4: Run the BRR201 macro to obtain parameter estimates for the covariates of interest from the model used in the NCI Method

Use the BRR201 macro to obtain parameter estimates.  It is possible to call the BRR201 macro several times, varying the values of the parameters each time. For example, the variables of interest could be changed.  This merely requires calling the macro again (using a call similar to that below), not redefining the macro each time.

 

Statements Explanation

%BRR201(data=calcium, response=DRTCALC, foodtype=Calcium, subject=seqn,
        repeat=day,
        seq=day,
        covars_amt=,
        weekend=weekend,
        outlib=work,
        pred=work._pred_unc_Calcium,
        param=work._param_unc_Calcium, modeltype=amount,
        titles=1,
        printlevel=2,
        cutpts=500 1000 1500,
        ncutpts=3,
        nsim_mc=100, final=nh.m20task1)

 

This code calls the BRR201 macro.  The dataset calcium defined in Step 1 is used; the macro variable response for which we want to model the distribution is DRTCALC.  The macro variable foodtype is used to label the pred and param datasets.  The variable seqn identifies the subject, and the macro variable repeat defines the variable that identifies the repeats on the subject, which is day.  No covariates are included in the model, although they could be specified with the covars_amt macro variable.

The weekend macro variable includes a weekend effect in the model, and the estimated distribution is weighted 4/7 for weekdays and 3/7 for weekends.  It must be set equal to a variable called weekend in the dataset.
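One way to read this weighting is as a weighted average of weekday and weekend usual intake for each simulated person. The notation below is introduced here for illustration and is an assumption about how the weights are applied, not a statement of the macro's internal code: T_i is the usual intake for pseudo-individual i, and the superscripts mark the weekday (Monday-Thursday) and weekend (Friday-Sunday) components.

```latex
T_i \;=\; \tfrac{4}{7}\,T_i^{\text{weekday}} \;+\; \tfrac{3}{7}\,T_i^{\text{weekend}}
```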

The macro variable outlib specifies the library where the data are to be stored.  In this case, the working directory, work, was used.  It is important to note that the macro variables pred and param must specify the outlib directory, and they must use foodtype to identify the food modeled. Because the example presented here is a ubiquitously-consumed dietary constituent and the amount model was used, the dataset from MIXTRAN has the term _unc_ as part of the dataset name for pred and param.

Because this is a ubiquitously-consumed dietary constituent, modeltype=amount is specified.  This fits the amount model.

The macro variable titles saves 1 line for a title supplied by the user.  The printlevel is 2, which prints the output from the NLMIXED runs and the summary.

By specifying the cutpoints (cutpts) of 500, 1000, and 1500 mg, the macro will produce an estimate of the proportion of the population below these values.  Because there are 3 cutpoints, this is specified in the ncutpts macro variable.


The macro variable nsim_mc is used to specify the number of pseudo-individuals for which the distribution is simulated per respondent.

The variable final specifies the name of the final dataset produced.
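One practical point when calling BRR201 more than once in the same SAS session: as the macro is written in Step 3, each BRR run is added to the dataset brr_runs with PROC APPEND, and brr_runs is not deleted at the end of the macro. A second call would therefore appear to add its 16 runs to those already stored. The following is a minimal sketch of a cleanup step to run before each additional call, assuming brr_runs is in the WORK library as in this example.

```sas
/* Sketch: remove the accumulated BRR results before calling BRR201 again.
   NOWARN avoids an error if brr_runs does not exist yet. */
proc datasets library=work nolist nowarn;
  delete brr_runs;
quit;
```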

 

Step 5: Interpret parameter estimates for the variable of interest

IMPORTANT NOTE

Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they would not be expected to vary by more than 1%.

 
