## Task 2b: How to Use SAS 9.1 Survey Procedures to Perform Linear Regression

In this example, you will assess the association between high-density lipoprotein (HDL) cholesterol and selected covariates in NHANES 1999-2002. These covariates include gender (riagendr), race/ethnicity (ridreth1), age (ridageyr), body mass index (bmxbmi), smoking (smoker, derived from SMQ020 and SMQ040; smoker = 1 for non-smokers, 2 for past smokers, and 3 for current smokers), and education (dmdeduc).
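The derivation of smoker is not shown in this task. A minimal sketch of one way it could be built follows. It assumes the usual NHANES codings (SMQ020: 1 = has smoked at least 100 cigarettes, 2 = has not; SMQ040: 1 = smokes every day, 2 = some days, 3 = not at all), and nhanes_merged is a hypothetical name for the merged input dataset. Check both against the codebook for your survey cycle.

```sas
/* Sketch only: derive smoker from SMQ020 and SMQ040.              */
/* Assumed codings: SMQ020 1=yes, 2=no (ever smoked 100 or more    */
/* cigarettes); SMQ040 1=every day, 2=some days, 3=not at all.     */
/* nhanes_merged is a hypothetical input dataset name.             */
data analysis_data;
  set nhanes_merged;
  if SMQ020 eq 2 then smoker = 1;                            /* non-smoker     */
  else if SMQ020 eq 1 and SMQ040 eq 3 then smoker = 2;       /* past smoker    */
  else if SMQ020 eq 1 and SMQ040 in (1, 2) then smoker = 3;  /* current smoker */
  /* any other combination (refused, unknown, missing) leaves smoker missing   */
run;
```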

Because version 9.1 of SAS Survey Procedures does not have a domain statement for subpopulation analyses (a domain statement is being added to proc surveyreg in SAS v9.2), you will need to use a macro provided on the SAS website. Download the file, save it to your computer, and make sure to note the location, as you will use SAS code to refer to this file later.
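For readers who already have SAS 9.2 or later, the macro may not be needed. The sketch below shows how the same kind of subpopulation analysis might look with the domain statement mentioned above. It uses the dataset and variable names introduced later in this task and is not part of the 9.1 workflow described here.

```sas
/* Sketch for SAS 9.2 or later only; not part of the 9.1 workflow. */
proc surveyreg data=analysis_data;
  strata  sdmvstra;                   /* stratification variable              */
  cluster sdmvpsu;                    /* primary sampling unit                */
  weight  wtmec4yr;                   /* 4-year MEC exam weight               */
  domain  eligible;                   /* one analysis per level of eligible   */
  model   lbdhdl = bmxbmi / clparm;   /* clparm requests confidence limits    */
run;
```

With a domain statement, the procedure reports results for each level of the domain variable, so the output for eligible=1 is the subpopulation of interest.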

### Step 2: Recode Discrete Variables

Unlike SUDAAN (see Task 1a: "How to Use SUDAAN to Perform Linear Regression"), the %sregsub macro for SAS Survey Procedures version 9.1 has no statement to change the reference level for a discrete variable. Therefore, to change the reference category, recode the variable so that the desired reference category has the highest value; the highest level is the one used as the reference.

The variable riagendr was recoded to make men the reference category. The name of the recoded variable is sex.

If  RIAGENDR EQ 1 then SEX=2;

Else if RIAGENDR EQ 2 THEN SEX=1;

The variable ridreth1 was recoded to make non-Hispanic Whites the reference group. The recoded variable is ethn.

ETHN= RIDRETH1;

If RIDRETH1 eq 3 then ETHN=5;

Else if RIDRETH1 eq 4 then ETHN=2;

Else if RIDRETH1 eq 2 then ETHN=3;

Else if RIDRETH1 eq 5 then ETHN=4;
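A cross-tabulation of the original and recoded variables is one simple way to confirm that a recode behaved as intended: each original value should map to exactly one new value, and the intended reference category should carry the highest code. A minimal sketch, assuming the recoded variables have been saved in analysis_data:

```sas
/* Sketch: list each original/recoded pair to verify the recodes. */
proc freq data=analysis_data;
  tables RIAGENDR*SEX  / list missing;   /* expect 1->2 and 2->1          */
  tables RIDRETH1*ETHN / list missing;   /* expect one ETHN per RIDRETH1  */
run;
```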

The BMI category variable (bmicat) was recoded, using bmxbmi, to make normal weight the reference group. The recoded variable is bmicatf.

if 0 le BMXBMI lt 18.5 then BMICATF=1;

else if 18.5 le BMXBMI lt 25 then BMICATF=4;

else if 25 le BMXBMI lt 30 then BMICATF=2;

else if BMXBMI ge 30 then BMICATF=3;
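The statements above need to run inside a DATA step. The sketch below shows one way to wrap the BMI recode and then check the resulting ranges. Reading and overwriting analysis_data in place is an assumption, and the category labels in the comments follow the conventional BMI cut points.

```sas
/* Sketch: the BMI recode inside a DATA step, followed by a range check. */
data analysis_data;
  set analysis_data;
  if 0 le BMXBMI lt 18.5 then BMICATF=1;        /* underweight               */
  else if 18.5 le BMXBMI lt 25 then BMICATF=4;  /* normal weight (reference) */
  else if 25 le BMXBMI lt 30 then BMICATF=2;    /* overweight                */
  else if BMXBMI ge 30 then BMICATF=3;          /* obese                     */
  /* a missing BMXBMI sorts below 0 in SAS, so BMICATF stays missing         */
run;

/* The min and max of BMXBMI within each BMICATF should match the cut points. */
proc means data=analysis_data n min max;
  class BMICATF;
  var BMXBMI;
run;
```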

### Step 3: Set up SAS Survey Procedures Macro for Simple Linear Regression

The dependent variable should be a continuous variable and will always appear on the left hand side of the equation. The variables on the right hand side of the equation are the independent variables and may be discrete or continuous.
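Written out with the variable names used in this task, a simple linear regression of this kind has the standard form below. The symbols $\beta_0$, $\beta_1$, and $\varepsilon_i$ are notation introduced here for illustration.

$$
\text{lbdhdl}_i = \beta_0 + \beta_1\,\text{bmxbmi}_i + \varepsilon_i
$$

Here $\beta_1$ is the expected change in HDL cholesterol for a one-unit increase in BMI, and $\varepsilon_i$ is the error term for person $i$.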

When interactions are included in the model, they are denoted with an asterisk, *, between the two variables. An interaction can occur between a discrete and a continuous variable, or between two discrete variables. An interaction term will always appear on the right hand side of an equation.
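As an illustration of the asterisk notation, the sketch below adds a sex-by-BMI interaction to a model. This model is hypothetical and is not fitted in this task. It also assumes that the %sregsub macro passes the class and model text through to proc surveyreg unchanged.

```sas
/* Sketch only: a hypothetical model with a sex-by-BMI interaction. */
/* Assumes %sregsub forwards CLASS= and MODEL= text unchanged.      */
%SREGSUB(
  DATA=    analysis_data,
  STRATA=  sdmvstra,
  CLUSTER= sdmvpsu,
  WEIGHT=  wtmec4yr,
  CLASS=   sex,
  MODEL=   lbdhdl= sex bmxbmi sex*bmxbmi/CLPARM,
  SUBPOP=  eligible=1,
  TITLE=   'Sketch: HDL by sex and BMI with interaction'
);
```

The term sex*bmxbmi lets the slope of HDL on BMI differ between men and women.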

The summary table below provides steps for performing linear regression analyses using SAS Survey procedures.

 These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

#### Option 1. Use SAS %sregsub Macro for Simple Linear Regression

| Statements | Explanation |
| --- | --- |
| `%include 'C:\NHANES\fusion24985_1_sregsub_sas.txt';` | Use the %include statement to include the macro text file. In this example, the file `fusion24985_1_sregsub_sas.txt` is saved in the `C:\NHANES\` directory. |
| `%SREGSUB(` | This statement names and opens the macro, %sregsub. |
| `DATA= analysis_data,` | Use the data statement to call in the dataset (analysis_data). |
| `STRATA= sdmvstra,` | Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification. |
| `CLUSTER= sdmvpsu,` | Use the cluster statement to specify the PSU (sdmvpsu) and account for design effects of clustering. |
| `WEIGHT= wtmec4yr,` | Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used. |
| `MODEL= lbdhdl= bmxbmi/CLPARM,` | Use the model statement to specify the dependent variable for HDL cholesterol (lbdhdl) as a function of the independent variable, body mass index (bmxbmi), which is treated as a continuous variable. The clparm option requests confidence limits for the parameters, which are not automatically provided by the SAS %sregsub macro. This model shows the relationship between a one-unit increase in BMI and HDL cholesterol level. |
| `SUBPOP= eligible=1,` | Use the subpop=eligible=1 statement to restrict the analysis to individuals with complete data for all the variables used in the final multiple regression model. Because only those 20 years and older are of interest in this example, use the subpop statement to select this subgroup. Please note that for accurate estimates, it is preferable to use subpop in SAS Survey Procedures to select a subgroup for analysis, rather than selecting the study subgroup in the SAS program while preparing the data file. |
| `TITLE= 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002'`<br>`);` | Use the title statement to label the output. |
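The variable eligible is assumed to exist in analysis_data already. A minimal sketch of one way such a flag could be built is shown below, based on the description above (age 20 years and older, with complete data on all model variables). The value 2 for ineligible records is an assumption, as is the list of variables checked; any special codes for refused or unknown responses would need to be set to missing beforehand.

```sas
/* Sketch: one possible definition of the eligible flag.              */
/* Assumes the recoded variables (sex, ethn, smoker, bmicatf) already */
/* exist in analysis_data; the actual definition may differ.          */
data analysis_data;
  set analysis_data;
  if RIDAGEYR ge 20
     and nmiss(LBDHDL, BMXBMI, SEX, ETHN, SMOKER, DMDEDUC, BMICATF) eq 0
  then ELIGIBLE = 1;    /* in the analysis subpopulation               */
  else ELIGIBLE = 2;    /* kept in the file, excluded through subpop   */
run;
```

Note that no records are deleted: the full sample stays in the dataset and the subgroup is selected through subpop, which is the approach recommended above.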

#### Option 2. Use SAS %sregsub Macro for Simple Linear Regression with BMI Categorical Variable

| Statements | Explanation |
| --- | --- |
| `%SREGSUB(`<br>`DATA= analysis_data,`<br>`STRATA= sdmvstra,`<br>`CLUSTER= sdmvpsu,`<br>`WEIGHT= wtmec4yr,`<br>`MODEL= lbdhdl= bmicat/CLPARM,`<br>`SUBPOP= eligible=1,`<br>`TITLE= 'Linear regression model for high density lipoprotein and body mass index: NHANES 1999-2002'`<br>`);` | Use the SAS Survey macro, %sregsub, to run simple linear regression. Because bmicat is not listed in a class statement, it is treated as a continuous variable. This model shows the relationship between each one-category increase in BMI and HDL cholesterol level. |

#### Option 3. Use SAS %sregsub Macro for Simple Linear Regression with BMI Categorical Variable with Reference Level

| Statements | Explanation |
| --- | --- |
| `%SREGSUB(`<br>`DATA= analysis_data,`<br>`STRATA= sdmvstra,`<br>`CLUSTER= sdmvpsu,`<br>`WEIGHT= wtmec4yr,`<br>`CLASS= bmicatf,`<br>`MODEL= lbdhdl= bmicatf/CLPARM,`<br>`SUBPOP= eligible=1,`<br>`TITLE= 'Linear regression model for high density lipoprotein and body mass index: NHANES 1999-2002'`<br>`);` | Use the SAS Survey macro, %sregsub, to run simple linear regression. This model uses the normal weight BMI category as the reference category for HDL cholesterol level. Use the class statement to denote the discrete variables included in the model; all other variables are treated as continuous. In this example, bmicatf is treated as a discrete variable. |
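The difference between the model without a class statement (bmicat) and the model with one (bmicatf) can be written out as follows. This is a sketch in standard regression notation, assuming the highest level of bmicatf (4, normal weight) is the reference as described in Step 2; the $\beta$ and $\gamma$ symbols are introduced here for illustration.

$$
\text{Without class:}\quad \text{lbdhdl}_i = \beta_0 + \beta_1\,\text{bmicat}_i + \varepsilon_i
$$

$$
\text{With class:}\quad \text{lbdhdl}_i = \gamma_0 + \sum_{k=1}^{3} \gamma_k\,\mathbf{1}[\text{bmicatf}_i = k] + \varepsilon_i
$$

In the first form, a single slope $\beta_1$ forces the same HDL difference between every pair of adjacent categories. In the second, each $\gamma_k$ is the difference in mean HDL between category $k$ and normal weight, so the differences are free to vary.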

Highlights from the output include:

• The results from the first model indicate that for each 1 unit increase of BMI, on average, HDL decreases by 0.69 mg/dL.
• The results from the second model indicate that, on average, HDL decreases by 5.6 mg/dL for each one-category increase in BMI, for example from the underweight category to the normal weight category, or from the normal weight category to the overweight category.
• The results from the third model indicate that the relationship is not linear: the difference in HDL between underweight and normal weight is 3.2 mg/dL, compared with a difference of 7.5 mg/dL between normal weight and overweight.
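As a worked illustration of how the first two coefficients scale (the 5-unit BMI difference and the two-category step are hypothetical comparisons chosen here):

$$
5 \times 0.69 = 3.45 \text{ mg/dL}, \qquad 2 \times 5.6 = 11.2 \text{ mg/dL}
$$

Under the first model, two adults whose BMI differs by 5 units are expected to differ in HDL by about 3.45 mg/dL. Under the second model, moving two categories (for example, from underweight to overweight) implies a difference of 11.2 mg/dL, because that model assumes the same step between every pair of adjacent categories.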

### Step 4: Set up SAS Survey Procedures Macro for Multiple Linear Regression


#### Use SAS %sregsub Macro for Multiple Linear Regression

| Statements | Explanation |
| --- | --- |
| `%include 'C:\NHANES\fusion24985_1_sregsub_sas.txt';` | Use the %include statement to include the macro text file. In this example, the file `fusion24985_1_sregsub_sas.txt` is saved in the `C:\NHANES\` directory. |
| `%SREGSUB(` | This statement names and opens the macro, %sregsub. |
| `DATA= analysis_data,` | Use the data statement to call in the dataset (analysis_data). |
| `STRATA= sdmvstra,` | Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification. |
| `CLUSTER= sdmvpsu,` | Use the cluster statement to specify the PSU (sdmvpsu) and account for design effects of clustering. |
| `WEIGHT= wtmec4yr,` | Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used. |
| `CLASS= sex ethn smoker dmdeduc bmicatf,` | Use the class statement to denote the discrete variables included in the model; all other variables are treated as continuous. In this example, sex, ethnicity (ethn), smoking status (smoker), education (dmdeduc), and BMI (bmicatf) are treated as discrete variables. |
| `MODEL= lbdhdl= sex ethn ridageyr dmdeduc smoker bmicatf/CLPARM,` | Use the model statement to specify the dependent variable for HDL cholesterol (lbdhdl) as a function of the independent variables: sex, race/ethnicity (ethn), age (ridageyr), education (dmdeduc), smoking status (smoker), and BMI category (bmicatf). The clparm option requests confidence limits for the parameters, which are not automatically provided by the SAS %sregsub macro. This model shows the relationship between each covariate and HDL cholesterol level, adjusted for the other variables in the model. |
| `ESTIMATE = 'Never vs past smoker' smoker 1 -1 0,` | Use the estimate statement to test for differences in HDL cholesterol between non-smokers and past smokers. |
| `SUBPOP= eligible=1,` | Use the subpop=eligible=1 statement to restrict the analysis to individuals with complete data for all the variables used in the final multiple regression model. Because only those 20 years and older are of interest in this example, use the subpop statement to select this subgroup. Please note that for accurate estimates, it is preferable to use subpop in SAS Survey Procedures to select a subgroup for analysis, rather than selecting the study subgroup in the SAS program while preparing the data file. |
| `TITLE= 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002'`<br>`);` | Use the title statement to label the output. |
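Read from top to bottom, the statements above form one %include statement and one macro call. Assembled, they might look like the following; the indentation and comments are added here for readability.

```sas
/* Load the macro definition from the saved text file. */
%include 'C:\NHANES\fusion24985_1_sregsub_sas.txt';

/* Multiple linear regression of HDL on the selected covariates, */
/* restricted to the eligible=1 subpopulation.                   */
%SREGSUB(
  DATA=     analysis_data,
  STRATA=   sdmvstra,
  CLUSTER=  sdmvpsu,
  WEIGHT=   wtmec4yr,
  CLASS=    sex ethn smoker dmdeduc bmicatf,
  MODEL=    lbdhdl= sex ethn ridageyr dmdeduc smoker bmicatf/CLPARM,
  ESTIMATE= 'Never vs past smoker' smoker 1 -1 0,
  SUBPOP=   eligible=1,
  TITLE=    'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002'
);
```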

The SAS Survey Procedures %sregsub macro prints the Wald statistic and its p-value. It does not produce the Satterthwaite chi-square or the Satterthwaite F statistics and their corresponding p-values. Additionally, the macro does not have a least squares means statement, so you will not be able to obtain the means and their standard errors from it. For these reasons, we recommend that you use proc regress in SUDAAN for multiple linear regression.
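The estimate statement used in the macro call above ('Never vs past smoker' smoker 1 -1 0) can be read as a linear contrast of the smoker coefficients. The sketch below assumes the three coefficients are applied to the smoker levels in sorted order (1 = non-smoker, 2 = past smoker, 3 = current smoker); the $\hat\beta$ notation is introduced here for illustration.

$$
\hat\theta = (1)\,\hat\beta_{\text{smoker}=1} + (-1)\,\hat\beta_{\text{smoker}=2} + (0)\,\hat\beta_{\text{smoker}=3} = \hat\beta_{\text{smoker}=1} - \hat\beta_{\text{smoker}=2}
$$

The test of $\hat\theta = 0$ therefore asks whether adjusted mean HDL differs between non-smokers and past smokers; current smokers receive a weight of 0 and do not enter the comparison.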

### Step 5: Review Output and Highlights of the Results

In this step, the SAS Survey procedures output is reviewed.

• HDL cholesterol is 6.55 mg/dL higher for overweight adults compared to normal weight adults, as defined by BMI.
• HDL cholesterol is 12.00 mg/dL higher for obese adults compared to normal weight adults, as defined by BMI.
• HDL cholesterol is 2.30 mg/dL lower for underweight adults compared to normal weight adults, as defined by BMI.
• HDL cholesterol is 9.98 mg/dL higher for women than for men, after adjusting for all other variables in the model.
• HDL cholesterol is 4.95 mg/dL higher for non-Hispanic Blacks compared to non-Hispanic Whites, after adjusting for all other variables in the model.
• HDL cholesterol increases 0.11 mg/dL per unit increase in age.
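As a worked reading of the age coefficient, assuming age (ridageyr) is measured in years and taking a hypothetical 10-year age difference:

$$
10 \times 0.11 = 1.1 \text{ mg/dL}
$$

That is, two adults who differ in age by 10 years, but are alike on the other variables in the model, are expected to differ in HDL cholesterol by about 1.1 mg/dL.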