In this example, you will assess the association between high density lipoprotein (HDL) cholesterol and selected covariates in NHANES 1999-2002. These covariates include gender (riagendr), race/ethnicity (ridreth1), age (ridageyr), body mass index (bmxbmi), smoking (smoker, derived from SMQ020 and SMQ040; smoker =1 if non-smoker, 2 if past smoker and 3 if current smoker) and education (dmdeduc).
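The smoker variable described above could be derived along the following lines. This is a minimal sketch, not the tutorial's own code: the dataset names nhanes_merged and with_smoker are hypothetical, and the response codes (SMQ020: 1 = yes, 2 = no; SMQ040: 1 = every day, 2 = some days, 3 = not at all) are assumptions that should be checked against the NHANES codebook for your survey cycle.

```sas
/* Sketch only: derive SMOKER from SMQ020 and SMQ040.                 */
/* NHANES_MERGED and WITH_SMOKER are hypothetical dataset names.      */
data with_smoker;
  set nhanes_merged;
  /* Assumed codes: SMQ020 1=yes, 2=no (smoked at least 100 cigarettes) */
  /*                SMQ040 1=every day, 2=some days, 3=not at all       */
  if SMQ020 eq 2 then SMOKER=1;                           /* non-smoker     */
  else if SMQ020 eq 1 and SMQ040 eq 3 then SMOKER=2;      /* past smoker    */
  else if SMQ020 eq 1 and SMQ040 in (1,2) then SMOKER=3;  /* current smoker */
  /* Any other combination (refused, don't know, missing) leaves SMOKER missing */
run;
```

Records that fall through all three conditions keep a missing value for smoker, so they can be excluded later when the analysis subpopulation is defined.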
Because version 9.1 of SAS Survey Procedures does not have a domain statement for subpopulation analyses (a domain statement is being added to proc surveyreg in SAS v9.2), you will need to use a macro provided on the SAS website. Download the file, save it to your computer, and make sure to note the location, as you will use SAS code to refer to this file later.
Unlike SUDAAN (see Task 1a: "How to Use SUDAAN to Perform Linear Regression"), the %sregsub macro for SAS Survey Procedures version 9.1 has no statement to change the reference level for a discrete variable. Therefore, to change the reference category, recode the variable so that the desired reference category has the highest value.
The variable riagendr was recoded to make men the reference category. The name of the recoded variable is sex.
If RIAGENDR EQ 1 then SEX=2;
Else if RIAGENDR EQ 2 THEN SEX=1;
The variable ridreth1 was recoded to make non-Hispanic Whites the reference group. The recoded variable is ethn.
ETHN= RIDRETH1;
If RIDRETH1 eq 3 then ETHN=5;
Else if RIDRETH1 eq 4 then ETHN=2;
Else if RIDRETH1 eq 2 then ETHN=3;
Else if RIDRETH1 eq 5 then ETHN=4;
The BMI category variable was recoded, using bmxbmi, to make normal weight the reference group. The recoded variable is bmicatf.
if 0 le BMXBMI lt 18.5 then BMICATF=1;
else if 18.5 le BMXBMI lt 25 then BMICATF=4;
else if 25 le BMXBMI lt 30 then BMICATF=2;
else if BMXBMI ge 30 then BMICATF=3;
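After recoding, it can help to confirm that each original code maps to exactly one new code and that the intended reference group carries the highest value. The following is a minimal sketch of such a check; the dataset name recoded is hypothetical and stands for whatever dataset holds the recoded variables.

```sas
/* Sketch only: RECODED is a hypothetical dataset that already        */
/* contains SEX, ETHN and BMICATF alongside the original variables.   */

/* Cross-tabulate original and recoded values; LIST prints one row    */
/* per combination and MISSING keeps missing values in the table.     */
proc freq data=recoded;
  tables RIAGENDR*SEX RIDRETH1*ETHN / list missing;
run;

/* Show the range of BMXBMI within each BMICATF level, including      */
/* records where BMICATF is missing.                                  */
proc means data=recoded n min max missing;
  class BMICATF;
  var BMXBMI;
run;
```

Each value of riagendr and ridreth1 should appear in a single row of the list output, and the minimum and maximum BMI within each bmicatf level should fall inside the cut points used in the recode.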
The dependent variable should be a continuous variable and will always appear on the left hand side of the equation. The variables on the right hand side of the equation are the independent variables and may be discrete or continuous.
When interactions are included in the model, they are denoted with an asterisk, *, between the two variables. An interaction can occur between a discrete and a continuous variable, or between two discrete variables. An interaction term will always appear on the right hand side of the equation.
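To illustrate the notation, the sketch below writes a model with one interaction term directly in proc surveyreg. The choice of a sex by BMI interaction is hypothetical and is not part of this tutorial's analysis; the sketch also omits the subpopulation restriction, so it shows syntax only and should not be used to produce estimates.

```sas
/* Sketch only: syntax for an interaction between a discrete variable */
/* (sex) and a continuous variable (bmxbmi). The interaction itself   */
/* is a hypothetical example, not a model from the tutorial.          */
proc surveyreg data=analysis_data;
  strata sdmvstra;
  cluster sdmvpsu;
  weight wtmec4yr;
  class sex;                                    /* discrete variable */
  /* Dependent variable on the left; main effects and the            */
  /* interaction term (sex*bmxbmi) on the right.                     */
  model lbdhdl = sex bmxbmi sex*bmxbmi / clparm solution;
run;
```

The solution option is included because proc surveyreg does not print parameter estimates by default when a class statement is present.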
The summary table below provides steps for performing linear regression analyses using SAS Survey procedures.
These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.
Statements | Explanation |
---|---|
%include 'C:\NHANES\fusion24985_1_sregsub_sas.txt'; | Use the %include statement to include the macro text file. In this example, the file is saved in the C:\NHANES\ directory. The file name in the statement must match the name under which you saved the macro; the downloaded file may carry a different name, such as sample00483_1_sregsub.sas.txt. |
%SREGSUB( | This statement names and opens the macro, %sregsub. |
DATA= analysis_data, | Use the data statement to call in the dataset (analysis_data). |
STRATA= sdmvstra, | Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification. |
CLUSTER= sdmvpsu, | Use the cluster statement to specify PSU (sdmvpsu) to account for design effects of clustering. |
WEIGHT= wtmec4yr, | Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used. |
MODEL= lbdhdl= bmxbmi/CLPARM, | Use a model statement to specify the dependent variable for HDL cholesterol (lbdhdl) as a function of the independent variable, body mass index (bmxbmi), which is treated as a continuous variable. The clparm option requests confidence limits for the parameters, which are not automatically provided by the SAS %sregsub macro. This model will show the relationship between a unit increase in BMI and HDL cholesterol level. |
SUBPOP= eligible=1, | Use the subpop=eligible=1 statement to restrict the analysis to individuals with complete data for all the variables used in the final multiple regression model. Because only those 20 years and older are of interest in this example, use the subpop statement to select this subgroup. Please note that for accurate estimates, it is preferable to use subpop in SAS Survey Procedures to select a subgroup for analysis, rather than select the study subgroup in the SAS program while preparing the data file. |
TITLE= 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002' ); | Use the title statement to label the output. |
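Assembled into a single program, the statements in the table might look like the sketch below. The data step that creates the eligible flag is an assumption about how such a flag could be built, not the tutorial's own code: the input dataset name recoded, the value 2 for records outside the subpopulation, and the list of variables checked for missing values are all hypothetical. The macro call itself only joins the statements from the table.

```sas
/* Sketch only: flag the analysis subpopulation instead of deleting   */
/* records, so the full design is retained for variance estimation.   */
/* RECODED is a hypothetical input dataset; the variable list and     */
/* the value 2 for ineligible records are assumptions.                */
data analysis_data;
  set recoded;
  if RIDAGEYR ge 20
     and nmiss(LBDHDL, SEX, ETHN, RIDAGEYR, DMDEDUC, SMOKER, BMICATF, BMXBMI) eq 0
    then ELIGIBLE=1;   /* aged 20+ with complete data */
  else ELIGIBLE=2;     /* everyone else stays in the file */
  /* Note: "refused" or "don't know" codes are not SAS missing values */
  /* and would need to be set to missing beforehand.                  */
run;

/* Load the macro; adjust the path and file name to where you saved it */
%include 'C:\NHANES\fusion24985_1_sregsub_sas.txt';

/* Simple linear regression of HDL cholesterol on continuous BMI */
%SREGSUB(
  DATA= analysis_data,
  STRATA= sdmvstra,
  CLUSTER= sdmvpsu,
  WEIGHT= wtmec4yr,
  MODEL= lbdhdl= bmxbmi/CLPARM,
  SUBPOP= eligible=1,
  TITLE= 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002' );
```

Keeping ineligible records in analysis_data and selecting with subpop, rather than subsetting in the data step, follows the recommendation in the table for obtaining accurate estimates.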
Statements | Explanation |
---|---|
%SREGSUB( DATA= analysis_data, STRATA= sdmvstra, CLUSTER= sdmvpsu, WEIGHT= wtmec4yr, MODEL= lbdhdl= bmicat/CLPARM, SUBPOP= eligible=1, TITLE= 'Linear regression model for high density lipoprotein and body mass index: NHANES 1999-2002' ); | Use the SAS Survey macro, %sregsub, to run linear regression. Because bmicat is not listed in a class statement, it is treated as a continuous variable. This model will show the relationship between each unit increase in BMI category and HDL cholesterol level. |
Statements | Explanation |
---|---|
%SREGSUB( DATA= analysis_data, STRATA= sdmvstra, CLUSTER= sdmvpsu, WEIGHT= wtmec4yr, CLASS= bmicatf, MODEL= lbdhdl= bmicatf/CLPARM, SUBPOP= eligible=1, TITLE= 'Linear regression model for high density lipoprotein and body mass index: NHANES 1999-2002' ); | Use the SAS Survey macro, %sregsub, to run linear regression. This model uses the normal BMI category as the reference category for HDL cholesterol level. Use the class statement to denote the discrete variables included in the model; all other variables are treated as continuous. In this example, bmicatf is treated as a discrete variable. |
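For readers on SAS 9.2 or later, where proc surveyreg has a domain statement, the same model could be sketched without the macro as shown below. This is an illustration under that assumption rather than part of the tutorial, and its output has not been compared with the macro's output here.

```sas
/* Sketch only: requires SAS 9.2 or later (domain statement).         */
/* Not part of the original tutorial, which uses the %sregsub macro.  */
proc surveyreg data=analysis_data;
  strata sdmvstra;                 /* stratification               */
  cluster sdmvpsu;                 /* primary sampling units       */
  weight wtmec4yr;                 /* 4-year MEC weight            */
  class bmicatf;                   /* discrete predictor           */
  domain eligible;                 /* subpopulation analysis       */
  model lbdhdl = bmicatf / clparm solution;
  title 'Linear regression model for high density lipoprotein and body mass index: NHANES 1999-2002';
run;
```

With a domain statement, the procedure reports results by level of eligible, so the estimates of interest are those for eligible=1. The highest level of bmicatf serves as the reference category, which is why normal weight was coded as 4.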
Highlights from the output include:
These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.
Statements | Explanation |
---|---|
%include 'C:\NHANES\fusion24985_1_sregsub_sas.txt'; | Use the %include statement to include the macro text file. In this example, the file is saved in the C:\NHANES\ directory. The file name in the statement must match the name under which you saved the macro; the downloaded file may carry a different name, such as sample00483_1_sregsub.sas.txt. |
%SREGSUB( | This statement names and opens the macro, %sregsub. |
DATA= analysis_data, | Use the data statement to call in the dataset (analysis_data). |
STRATA= sdmvstra, | Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification. |
CLUSTER= sdmvpsu, | Use the cluster statement to specify PSU (sdmvpsu) to account for design effects of clustering. |
WEIGHT= wtmec4yr, | Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used. |
CLASS= sex ethn smoker dmdeduc bmicatf, | Use the class statement to denote the discrete variables included in the model; all other variables are treated as continuous. In this example sex, ethnicity (ethn), smoking status (smoker), education (dmdeduc), and BMI (bmicatf) are treated as discrete variables. |
MODEL= lbdhdl= sex ethn ridageyr dmdeduc smoker bmicatf/CLPARM, | Use a model statement to specify the dependent variable for HDL cholesterol (lbdhdl) as a function of the independent variables (sex, race/ethnicity, age, education, smoking status, and BMI category). The clparm option requests confidence limits for the parameters, which are not automatically provided by the SAS %sregsub macro. This model will show the relationship between BMI category and HDL cholesterol level, adjusted for the other covariates. |
ESTIMATE = 'Never vs past smoker' smoker 1 -1 0, | Use the estimate statement to test for differences in HDL cholesterol between non-smokers and past smokers. |
SUBPOP= eligible=1, | Use the subpop=eligible=1 statement to restrict the analysis to individuals with complete data for all the variables used in the final multiple regression model. Because only those 20 years and older are of interest in this example, use the subpop statement to select this subgroup. Please note that for accurate estimates, it is preferable to use subpop in SAS Survey Procedures to select a subgroup for analysis, rather than select the study subgroup in the SAS program while preparing the data file. |
TITLE= 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002' ); | Use the title statement to label the output. |
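Put together, the statements in the table form the single macro call sketched below. Nothing is added beyond the table's own statements; the path in the %include line is the one shown in the table and should be adjusted to wherever you saved the macro file.

```sas
/* Load the macro; adjust the path and file name to your own copy */
%include 'C:\NHANES\fusion24985_1_sregsub_sas.txt';

/* Multiple linear regression of HDL cholesterol on sex, ethnicity,   */
/* age, education, smoking status and BMI category. The ESTIMATE      */
/* coefficients 1 -1 0 apply to smoker levels 1, 2 and 3 in order,    */
/* contrasting non-smokers (1) with past smokers (2).                 */
%SREGSUB(
  DATA= analysis_data,
  STRATA= sdmvstra,
  CLUSTER= sdmvpsu,
  WEIGHT= wtmec4yr,
  CLASS= sex ethn smoker dmdeduc bmicatf,
  MODEL= lbdhdl= sex ethn ridageyr dmdeduc smoker bmicatf/CLPARM,
  ESTIMATE = 'Never vs past smoker' smoker 1 -1 0,
  SUBPOP= eligible=1,
  TITLE= 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002' );
```

Because the reference category for each class variable is its highest level, the recoded variables sex, ethn and bmicatf place men, non-Hispanic Whites and normal weight in that position.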
The SAS Survey Procedures %sregsub macro prints the Wald statistic and its p-value. It does not produce the Satterthwaite chi-square or the Satterthwaite F statistics and their corresponding p-values. Additionally, the %sregsub macro does not have a least squares means statement, so you will not be able to obtain the means and their standard errors from this macro. For these reasons, we recommend that you use proc regress in SUDAAN for multiple linear regression.
In this step, the SAS Survey procedures output is reviewed.