Task 2: How to Evaluate the Effects of Covariates on Usual Intake of a Single Episodically-Consumed Dietary Constituent

In this example, the effects of race/ethnicity and age on usual dairy intake in adult women (older than age 50 years) are modeled.

This example uses the demoadv dataset (download at Sample Code and Datasets).  The variables w0304_0 to w0304_16 are the weights (dietary weights and Balanced Repeated Replication [BRR] weights) used in the analysis of 2003-2004 dietary data that requires the use of BRR to calculate standard errors. The model is run 17 times, including 16 runs using BRR (see Module 18 "Model Usual Intake Using Dietary Recall Data", task 4 for more information).  BRR uses weights w0304_1 to w0304_16.

IMPORTANT NOTE

Note: if 4 years of NHANES data are used, 32 BRR runs are required.

 

A SAS macro is a useful technique for rerunning a block of code when the analyst only wants to change a few variables; the macro BRR192 is created and called in this example. The BRR192 macro calls the MIXTRAN macro, and calculates BRR standard errors of the parameter estimates.  The MIXTRAN macro obtains preliminary estimates for the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary reports of the model fit.

Recall that modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of individuals and the correlation among sample persons within a cluster.  The SAS procedure NLMIXED can account for differential weighting by using the replicate statement.  The use of BRR to calculate standard errors accounts for the correlation among sample persons in a cluster.  Therefore, NLMIXED (or any SAS procedure that incorporates differential weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized survey procedures.

The MIXTRAN macro used in this example was downloaded from the NCI website.  Version 1.1 of the macro was used.  We recommend that you check this website for macro updates before starting any analysis.  Additional details regarding the macro and additional examples also may be found on the website and in the users’ guide.

 

Step 1: Create a dataset so that each row corresponds to a single person day and define indicator variables if necessary

First, select only those people with dietary data by keeping the records with a nonmissing dietary weight (w0304_0).

data demoadv;

set nh.demoadv;

if w0304_0 ne . ;  

run ;

 

The variables d_milk_d1 and d_milk_d2 are derived variables representing total milk consumed (cup equivalents) on days 1 and 2, respectively, using the MyPyramid Equivalents (see Module 4 "Resources for Dietary Data Analysis" and Module 9 "Review Data and Create New Variables", Task 4).  To create a dataset with 2 records per person, the demoadv dataset is set 2 times to create 2 datasets, one where day=1 and one where day=2.  The same variable name, d_milk, is used for dairy on both days.  It is created by setting it equal to d_milk_d1 for day 1 and d_milk_d2 for day 2.  This code also selects women older than age 50 years.

 

data day1;
set demoadv;
if riagendr= 2 and ridageyr>= 51 ;
d_milk=d_milk_d1;
day= 1 ;
run ;

data day2;
set demoadv;
if riagendr= 2 and ridageyr>= 51 ;
d_milk=d_milk_d2;
day= 2 ;
run ;
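The two data steps above can also be collapsed into one. The following is a minimal sketch, not part of the original example; it assumes the demoadv dataset created above and writes to a hypothetical dataset name, milk_long, using explicit OUTPUT statements to emit two rows per woman.

```sas
/* Sketch only: milk_long is a hypothetical name, not used elsewhere in this example. */
data milk_long;
  set demoadv;
  if riagendr = 2 and ridageyr >= 51;   /* women aged 51 years and older */
  day = 1; d_milk = d_milk_d1; output;  /* first recall day */
  day = 2; d_milk = d_milk_d2; output;  /* second recall day */
run;
```

The rows come out grouped by person in the order of demoadv, with day 1 before day 2, and hold the same records as appending day1 and day2.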

 

Finally, these data sets are appended, and dummy variables are created.  The NLMIXED procedure has no CLASS statement to create dummy variables as other SAS procedures do, so they must be created in a data step.  This example uses the following code:

 

data milk;
set day1 day2;
eth1=(ridreth1= 1 );
eth2=(ridreth1= 2 );
eth3=(ridreth1= 3 );
eth4=(ridreth1= 4 );
run ;

 

Because ridreth1 has 5 levels, 4 dummy variables are needed.  This type of programming creates a variable, for example eth1, that is coded as 1 if ridreth1 is equal to 1 and as 0 otherwise.

IMPORTANT NOTE

Note: if the variable you are using has missing values, these will be coded to zero using the above code. Additional code would need to be added to set these to missing.  Also, if you use the “<” symbol in SAS to create a dummy variable, note that SAS treats missing values as smaller than any number, so a missing value always satisfies a “<” comparison and will be coded as 1 rather than as missing.
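The additional code mentioned in the note could look like the following. This is a minimal sketch under stated assumptions: it reuses the day1 and day2 datasets and the ridreth1 variable from above, and milk_chk is a hypothetical dataset name chosen so that the example's own milk dataset is not overwritten.

```sas
/* Sketch only: milk_chk is a hypothetical dataset name. */
data milk_chk;
  set day1 day2;
  array eth{4} eth1-eth4;
  do j = 1 to 4;
    if missing(ridreth1) then eth{j} = .;  /* keep missing as missing */
    else eth{j} = (ridreth1 = j);          /* 1 if level j, 0 otherwise */
  end;
  drop j;
run;
```

Testing for missing values first means that a missing ridreth1 is never silently coded as 0.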

 

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and day because the NLMIXED procedure uses this information to estimate the model parameters.
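No code is shown for this step. A minimal sketch, assuming the milk dataset from Step 1 and the variables seqn (respondent) and day that are passed to the macro in Step 4:

```sas
/* Sort so that each respondent's records are together, in day order. */
proc sort data=milk;
  by seqn day;
run;
```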

 

Step 3: Create the BRR192 macro

The BRR192 macro calls the MIXTRAN macro and computes standard errors of parameter estimates.  After creating this macro and running it one time, you may call it several times, each time changing the macro variables.
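The standard error that the macro computes (the brrse assignments in the listing that follows) can be written compactly. The notation is introduced here for illustration only: \hat\theta_0 is the estimate from the run that uses w0304_0, \hat\theta_r is the estimate from BRR run r, and R = 16 is the number of BRR runs. The divisor 0.49 in the code equals (1 - 0.3)^2, which is consistent with Fay's modification of BRR with a perturbation factor K = 0.3; that reading is an assumption, because the text does not name the method.

```latex
\[
\widehat{\mathrm{SE}}\bigl(\hat\theta\bigr)
  = \sqrt{\frac{\sum_{r=1}^{R}\bigl(\hat\theta_r - \hat\theta_0\bigr)^2}{R\,(1-K)^2}}
  = \sqrt{\frac{\sum_{r=1}^{16}\bigl(\hat\theta_r - \hat\theta_0\bigr)^2}{16 \times 0.49}}
\]
```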

 

Create the BRR192 Macro

Statements Explanation

%macro BRR192(data, response, foodtype, subject, repeat, covars_prob, covars_amt, outlib, modeltype, lambda,seq, weekend, vargroup, numvargroups, subgroup, start_val1, start_val2, start_val3, vcontrol, nloptions, titles, printlevel, final);

The start of the BRR192 macro is defined.  All of the terms inside the parentheses are the macro variables that are used in the macro.

%MIXTRAN

(data=&data, response=&response, foodtype=&foodtype, subject= &subject, repeat=&repeat, covars_prob=&covars_prob, covars_amt= &covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var=w0304_0, seq=&seq, weekend=&weekend, vargroup= &vargroup, numvargroups=&numvargroups, subgroup=&subgroup,

start_val1=&start_val1, start_val2=&start_val2, start_val3= &start_val3, vcontrol=&vcontrol, nloptions=&nloptions, titles= &titles, printlevel=&printlevel)

Within the BRR192 macro the MIXTRAN macro is called.  All of the variables preceded by “&” will be defined by the BRR192 macro call.  The only variable without an “&” is the replicate_var macro variable; it is set to w0304_0 for the first run.

data _null_;

format old varA $255. ;

%let I=1;

%let varamtu= %upcase (INTERCEPT &covars_amt);

%do %until ( %qscan (&varamtu,&I, %str ( ))= %str ());

%let varb&I= %qscan (&varamtu,&I, %str ( ));

%if %eval (&i) lt 9 %then %let znum = "0";

  %else %let znum= %str ();

num= %eval (&i);

varA= strip( 'A' ||strip(&znum)||strip(num)|| '_' || strip( "&&varb&i." ));

old =  trim(old)|| ' ' ||trim(varA);

%let I= %eval (&I+1);

%end ;

%let cnt= %eval (&I-1);

%if &covars_amt= %str () %then %let cnt=1;

call symput( 'old' ,old);

run;

This data step defines macro variables that will be used in the next step of the macro.

This code recreates the way that the MIXTRAN macro defines the parameter names, and makes a list of parameter names that are stored in the _param_&foodtype (called &old) for the amount part of the model. It also counts the number of parameters (&cnt).

data parms_amt;

set &outlib.._param_unc_&foodtype;

array old (&cnt) &old;

array new (&cnt) &varamtu;

do k= 1 to dim(new);

new[k]=old[k];

end;

keep &varamtu;

run;

The dataset _param_unc_&foodtype is defined in the MIXTRAN macro.  This data step sets the dataset _param_&foodtype and renames the amount parameters to their variable names.

data _null_;

format oldpr varP $255. ;

%let I=1;

%let varprobu= %upcase (INTERCEPT &covars_prob);

%do %until ( %qscan (&varprobu,&I, %str ( ))= %str ());

%let varp&I= %qscan (&varprobu,&I, %str ( ));

%if %eval (&i) lt 9 %then %let znum = "0"; %else %let znum= %str (); num= %eval (&i);

varP= strip( 'P' ||strip(&znum)||strip(num)|| '_' ||strip( "&&varp&i." ));

oldpr =  trim(oldpr)|| ' ' ||trim(varP);

%let I= %eval (&I+1);

%end ;

%let cntp= %eval (&I-1);

%if &covars_prob= %str () %then %let cntp=1;

call symput( 'oldpr' ,oldpr);

run;

This data step defines macro variables that will be used in the next step of the macro.

This code recreates the way that the MIXTRAN macro defines the parameter names, and makes a list of parameter names that are stored in the _param_&foodtype (called &oldpr) for the probability part of the model. It also counts the number of parameters (&cntp).

data parms_prob;

set &outlib.._param_&foodtype;

array old (&cntp) &oldpr;

array new (&cntp) &varprobu;

do k= 1 to dim(new);

new[k]=old[k];

end;

keep &varprobu;

run;

    

The dataset _param_&foodtype is defined in the MIXTRAN macro.  This data step sets the dataset _param_&foodtype and renames the probability parameters to their variable names.

*save lambda;

data _null_;

set &outlib.._param_&foodtype;

call symput ( 'lamb' ,a_lambda);

run;

 

Lambda (the Box-Cox transformation parameter) is fixed in the BRR runs.  The lambda value from the first run is saved in a macro variable called &lamb.

*start BRR runs;

%do run= 1 %to 16 ;

This code starts a loop to run the 16 BRR runs.

%MIXTRAN                                                          

(data=&data, response=&response, foodtype=&foodtype, subject= &subject, repeat=&repeat, covars_prob=&covars_prob, covars_amt= &covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lamb, replicate_var=w0304_&run, seq=&seq, weekend=&weekend, vargroup= &vargroup, numvargroups=&numvargroups, subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2, start_val3= &start_val3, vcontrol=&vcontrol, nloptions=&nloptions, titles=&titles, printlevel= 2 )

Within the BRR192 macro, the MIXTRAN macro is called for the BRR run.  Most of the variables preceded by “&” will be defined by the BRR192 macro call.  The replicate_var macro variable is set to w0304_&run, where &run=1 to 16.  Notice that lambda is fixed at &lamb, the value saved from the first run.

data _null_;

format old var new varA $255. ;

%let I=1;

%do %until ( %qscan (&varamtu,&I, %str ( ))= %str ());

%let varb&I= %qscan (&varamtu,&I, %str ( ));

%if %eval (&i) lt 9 %then %let znum = "0"; %else %let znum= %str () ;

num= %eval (&i);

varA= strip( 'A' ||strip(&znum)||strip(num)|| '_' ||strip( "&&varb&i." ));

old =  trim(old)|| ' ' ||trim(varA);

var= strip(strip( "&&varb&i." )|| '_' ||strip( "&run" ));

new = trim(new)|| ' ' ||trim(var);

%let I= %eval (&I+1);

%end ;

%let cnt= %eval (&I-1);

%if &covars_amt= %str () %then %let cnt=1;

call symput( 'old' ,old);

call symput( 'new' ,new);

run;

This data step defines macro variables that will be used in the next step of the macro.

As before, this code recreates the way that the MIXTRAN macro defines the parameter names, and makes a list of parameter names that are stored in the _param_&foodtype (called &old).  It also creates a list of the intercept and the other variables in the model, each with the BRR run number appended (called &new).

data parmsbrr_amt;

set &outlib.._param_&foodtype;

array old (&cnt) &old;

array new (&cnt) &new;

do k= 1 to dim(new);

new[k]=old[k];

end;

keep &new;

run;

The dataset  _param_&foodtype is from the MIXTRAN macro.  This data step sets the dataset _param_&foodtype and renames the amount parameters to their variable names with the run number.

data parms_amt;

merge parms_amt parmsbrr_amt;

run;

The point estimates of the parameters are merged with the BRR runs for the amount variables.

proc datasets nolist; delete parmsbrr_amt;

After merging, the dataset parmsbrr_amt can be deleted.

data _null_;

format oldpr varpr newpr varP $255. ;

%let I=1;

%do %until ( %qscan (&varprobu,&I, %str ( ))= %str ());

%let varp&I= %qscan (&varprobu,&I, %str ( ));

%if %eval (&i) lt 9 %then %let znum = "0"; %else %let znum= %str () ;

num= %eval (&i);

varP= strip( 'P' ||strip(&znum)||strip(num)|| '_' ||strip( "&&varp&i." ));

oldpr =  trim(oldpr)|| ' ' ||trim(varP);

varpr = strip(strip( "&&varp&i." )|| '_' ||strip( "&run" ));

newpr = trim(newpr)|| ' ' ||trim(varpr);

%let I= %eval (&I+1);

%end ;

%let cntp= %eval (&I-1);

%if &covars_prob= %str () %then %let cntp=1;

call symput( 'oldpr' ,oldpr);

call symput( 'newpr' ,newpr);

run;

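This data step defines macro variables that will be used in the next step of the macro.  As before, it recreates the way that the MIXTRAN macro defines the parameter names for the probability part of the model (called &oldpr).  It also creates a list of the intercept and the other variables in the probability model, each with the BRR run number appended (called &newpr).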

data parmsbrr_prob;

set &outlib.._param_&foodtype;

array old (&cntp) &oldpr;

array new (&cntp) &newpr;

do k= 1 to dim(new);

new[k]=old[k];

end;

keep &newpr;

run;

The dataset  _param_&foodtype is from the MIXTRAN macro.  This data step sets the dataset _param_&foodtype and renames the probability parameters to their variable names with the run number.

data parms_prob;

merge parms_prob parmsbrr_prob;

run;

 

The point estimates of the parameters are merged with the BRR runs for the probability variables.

proc datasets nolist; delete parmsbrr_prob;

 

After merging, the dataset parmsbrr_prob can be deleted.

%end ;

The end of the BRR runs.

%let I=1;

  %do %until ( %qscan (&varamtu,&I, %str ( ))= %str ());

    %let varb&I= %qscan (&varamtu,&I, %str ( ));

This code starts a loop where the following code is evaluated for the intercept and the other variables in the amount model one at a time until all variables are evaluated.

data _null_;

format var call $255. ;

  set parms_amt;

  call= "" ;

   %do r= 1 %to 16 ;

   var = strip(strip( "&&varb&i." )|| '_' ||strip( "&r" ));

   call = strip(strip(call)|| ' ' ||strip(var));

   %end ;

  call symput ( 'call' ,call);

run;

 

This code creates a macro variable with the BRR run number appended to the variable name.

data brr_amt;

format variable $32. ;

set parms_amt;

 array reps ( 16 ) &call;

   do m= 1 to 16 ;

    reps[m] = reps[m] - &&varb&i;

   end;

estimate=&&varb&i;

brrse=sqrt(uss(of &call)/( 16 * .49 ));

variable= "&&varb&i" ;

type= 'AMOUNT' ;

keep variable estimate brrse type;

run;

 

For the 16 BRR runs, the value of the point estimate is subtracted from the estimate of the parameter from the BRR run.  The standard error is calculated.

proc append base=amts data=brr_amt;

The dataset for each variable is appended to the dataset amts.

proc datasets nolist; delete brr_amt; run;

The dataset brr_amt is deleted.

%let I= %eval (&I+1);

%end ;

The variable I is incremented, and the end of the variable loop is defined.

%let I=1;

  %do %until ( %qscan (&varprobu,&I, %str ( ))= %str ());

    %let varp&I= %qscan (&varprobu,&I, %str ( ));

This code starts a loop where the following code is evaluated for the intercept and the other variables in the probability model one at a time until all variables are evaluated.

data _null_;

format var callp $255. ;

  set parms_prob;

  callp= "" ;

   %do r= 1 %to 16 ;

   var = strip(strip( "&&varp&i." )|| '_' ||strip( "&r" ));

   callp = strip(strip(callp)|| ' ' ||strip(var));

   %end ;

  call symput ( 'callp' ,callp);

run;

This code creates a macro variable with the BRR run number appended to the variable name.

data brr_prob;

  format variable $32. ;

  set parms_prob;

     array reps ( 16 ) &callp;

     do m= 1 to 16 ;

      reps[m] = reps[m] - &&varp&i;

     end;

     estimate=&&varp&i;

     brrse=sqrt(uss(of &callp)/( 16 * .49 ));

     variable= "&&varp&i" ;

     type= 'PROB' ;

     keep variable estimate brrse type;

run;

For the 16 BRR runs, the value of the point estimate is subtracted from the estimate of the parameter from the BRR run.  The standard error is calculated.

proc append base=probs data=brr_prob;

 

The dataset for each variable is appended to the dataset probs.

proc datasets nolist; delete brr_prob; run;

The dataset brr_prob is deleted.

%let I= %eval (&I+1);

%end ;

The variable I is incremented, and the end of the variable loop is defined.

data brr;

format type $6. ;

set probs amts;

run;

The probability and amount datasets are appended.

proc print data=brr; var variable type estimate brrse; run;

The final dataset is printed.

proc datasets nolist; delete parms_amt parms_prob; run;

The datasets parms_amt and parms_prob are deleted.

%mend BRR192;

The end of the BRR192 macro is indicated.
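The parameter-name lists that the macro builds (&old and &oldpr) follow a simple pattern: a letter (A for amount, P for probability), a position number, an underscore, and the upper-cased variable name. The following standalone step is a minimal sketch for previewing those names for the amount part; it is not part of the BRR192 macro, the covariate list is a shortened hypothetical one, and it mirrors the macro's rule of prefixing a zero only when the position is less than 9.

```sas
/* Sketch only: a shortened, hypothetical covariate list. */
%let covars_amt = ridageyr eth1;

data _null_;
  length name $40;
  list = upcase("INTERCEPT &covars_amt");
  do i = 1 to countw(list, ' ');
    /* 'A' + optional leading zero + position + '_' + variable name */
    name = cats('A', ifc(i < 9, '0', ''), i, '_', scan(list, i, ' '));
    put name=;
  end;
run;
```

With this list, the log should show A01_INTERCEPT, A02_RIDAGEYR, and A03_ETH1. Comparing these names with the variables in the _param_&foodtype dataset is a quick check that the renaming arrays will line up.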

 

 

Step 4: Run the BRR192 macro to obtain parameter estimates for the covariates of interest from the model used in the NCI method

Use the BRR192 macro to obtain parameter estimates.  Once the macro has been run, it is possible to call the macro multiple times, varying the values of the parameters each time. For example, the variables of interest could be changed.  This merely requires calling the macro again (using a call similar to that below), not redefining the macro each time.

 

Run the BRR192 Macro

Statements Explanation

%BRR192(data=milk, response=d_milk, foodtype=milk, subject=seqn, repeat=day, covars_amt=ridageyr eth1 eth2 eth3 eth4, covars_prob=eth1 eth2 eth3 eth4, outlib=work, modeltype=corr, titles= 1 ,printlevel= 2 ,final=nh.m19task2)  

 

This code calls the BRR192 macro.  The dataset milk defined in Step 1 is used; the macro variable response names the variable whose distribution is modeled, d_milk.  The macro variable foodtype is used to label the param dataset.  The variable seqn identifies the subject, and the macro variable repeat names the variable that identifies the repeated observations on each subject, which is day.  The covariates ridageyr eth1 eth2 eth3 eth4 are included in the amount part of the model, and the covariates eth1 eth2 eth3 eth4 are included in the probability part of the model.

The macro variable outlib specifies the library where the data are to be stored.  In this case, the temporary library, work, was used.

Because this is a food model, modeltype=corr is specified.  This fits the two-part model with correlated random effects.

The macro variable titles saves 1 line for a title supplied by the user.  The printlevel is 2, which prints the output from the NLMIXED runs and the summary.

The variable final specifies the name of the final dataset produced.
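As noted at the start of this step, the macro can be called again with different covariates. The following is a minimal sketch of such a second call; the covariate choice (age only in both parts) and the final= dataset name are hypothetical. Because the macro as listed appends its results to the datasets amts and probs and does not show them being deleted, the sketch deletes them first so that results from the earlier call are not carried over; this is an inference from the listing, not a documented requirement.

```sas
/* Sketch only: clear results left by the previous call (assumes they exist). */
proc datasets nolist; delete amts probs; run; quit;

/* Hypothetical second call: age only, in both parts of the model. */
%BRR192(data=milk, response=d_milk, foodtype=milk, subject=seqn, repeat=day,
        covars_amt=ridageyr, covars_prob=ridageyr, outlib=work,
        modeltype=corr, titles=1, printlevel=2, final=nh.m19task2_age)
```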

 

Step 5: Interpret parameter estimates for the covariates of interest
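One way to support interpretation is to compare each estimate with its BRR standard error. The following is a minimal sketch under stated assumptions: it reads the dataset brr that the macro leaves in the work library, writes to a hypothetical dataset brr_tests, and uses 15 degrees of freedom for the t reference distribution. The degrees of freedom are an assumed value, not one given in the text, and should be set to suit the survey design.

```sas
/* Sketch only: df = 15 is an assumed value, not taken from the text. */
%let df = 15;

data brr_tests;            /* hypothetical output dataset name */
  set brr;                 /* dataset created by the BRR192 macro */
  t = estimate / brrse;    /* estimate divided by its BRR standard error */
  pvalue = 2 * (1 - probt(abs(t), &df));  /* two-sided p-value */
run;

proc print data=brr_tests;
  var type variable estimate brrse t pvalue;
run;
```

Rows with type PROB refer to the probability part of the model, and rows with type AMOUNT refer to the amount part, which is estimated on the Box-Cox transformed scale.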

 
