## Task 2b: How to Use SAS 9.2 Survey Procedures to Perform Linear Regression

In this example, you will assess the association between high density lipoprotein (HDL) cholesterol and selected covariates in NHANES 1999-2002. These covariates include gender (riagendr), race/ethnicity (ridreth1), age (ridageyr), body mass index (bmxbmi), smoking (smoker, derived from SMQ020 and SMQ040; smoker =1 if non-smoker, 2 if past smoker and 3 if current smoker) and education (dmdeduc).

### Step 1: Create Variable to Subset Population

In order to subset the data in SAS Survey Procedures, you will need to create a variable for the population of interest. You should not use a where clause or by-group processing in order to analyze a subpopulation with the SAS Survey Procedures.

In this example, the analysis is restricted to adults aged 20 years and older who have a positive MEC weight and complete data for all the variables used in the final multiple regression model. The variable eligible flags these individuals (eligible=1) and is later used in the domain statement to specify the population of interest.

if (LBDHDL^=. and RIAGENDR^=. and RIDRETH1^=. and SMOKER^=. and DMDEDUC^=. and BMXBMI^=.) and WTMEC4YR>0 and (RIDAGEYR>=20) then ELIGIBLE=1; else ELIGIBLE=2;
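A minimal sketch of how this flag might be created in a DATA step. The input dataset name `nhanes_merged` is hypothetical; `analysis_data` matches the dataset name used in the procedures later in this task. All records are kept, not just the eligible ones, so that the full set of strata and PSUs remains available for variance estimation.

```sas
/* Sketch only: nhanes_merged is a hypothetical input dataset that   */
/* already holds the merged NHANES 1999-2002 variables and SMOKER.   */
data analysis_data;
  set nhanes_merged;

  /* Flag adults aged 20+ with a positive MEC weight and no missing  */
  /* values on the model variables. Other records get ELIGIBLE=2     */
  /* and stay in the dataset.                                        */
  if (LBDHDL ^= . and RIAGENDR ^= . and RIDRETH1 ^= . and SMOKER ^= .
      and DMDEDUC ^= . and BMXBMI ^= .)
     and WTMEC4YR > 0 and RIDAGEYR >= 20
  then ELIGIBLE = 1;
  else ELIGIBLE = 2;
run;

/* Unweighted count of records in each domain, as a quick check. */
proc freq data=analysis_data;
  tables ELIGIBLE / missing;
run;
```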

### Step 2: Recode Discrete Variables

To change the reference level for a discrete variable, recode the variable so that the desired reference category has the highest level.

The variable riagendr was recoded to make men the reference category. The name of the recoded variable is sex.

If  RIAGENDR EQ 1 then SEX=2;

Else if RIAGENDR EQ 2 THEN SEX=1;

The variable ridreth1 was recoded to make non-Hispanic Whites the reference group. The recoded variable is ethn.

ETHN= RIDRETH1;

If RIDRETH1 eq 3 then ETHN=5;

Else if RIDRETH1 eq 4 then ETHN=2;

Else if RIDRETH1 eq 2 then ETHN=3;

Else if RIDRETH1 eq 5 then ETHN=4;

The variable bmicat was recoded to make normal weight the reference group. The recoded variable is bmicatf, which is derived from bmxbmi.

if 0 le BMXBMI lt 18.5 then BMICATF=1;

else if 18.5 le BMXBMI lt 25 then BMICATF=4;

else if 25 le BMXBMI lt 30 then BMICATF=2;

else if BMXBMI ge 30 then BMICATF=3;
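The recodes in this step could be applied together and then checked, as in the sketch below. It assumes the dataset `analysis_data` already exists, and it assumes that RIDRETH1 code 5 is assigned ETHN=4 so that no two source categories share a recoded value. The PROC FREQ cross-tabulation is an added check and is not part of the tutorial's own program.

```sas
/* Sketch only: applies the Step 2 recodes in one DATA step. */
data analysis_data;
  set analysis_data;

  /* Men (RIAGENDR=1) get the highest level and become the reference. */
  if RIAGENDR eq 1 then SEX = 2;
  else if RIAGENDR eq 2 then SEX = 1;

  /* Non-Hispanic Whites (RIDRETH1=3) get the highest level.          */
  /* Assumption: RIDRETH1=5 is moved to ETHN=4 to avoid a collision.  */
  ETHN = RIDRETH1;
  if RIDRETH1 eq 3 then ETHN = 5;
  else if RIDRETH1 eq 4 then ETHN = 2;
  else if RIDRETH1 eq 2 then ETHN = 3;
  else if RIDRETH1 eq 5 then ETHN = 4;

  /* Normal weight (18.5 <= BMI < 25) gets the highest level.         */
  /* Missing BMXBMI fails every condition, so BMICATF stays missing.  */
  if 0 le BMXBMI lt 18.5 then BMICATF = 1;
  else if 18.5 le BMXBMI lt 25 then BMICATF = 4;
  else if 25 le BMXBMI lt 30 then BMICATF = 2;
  else if BMXBMI ge 30 then BMICATF = 3;
run;

/* Cross-tabulate original and recoded values to confirm the mapping. */
proc freq data=analysis_data;
  tables RIAGENDR*SEX RIDRETH1*ETHN / list missing;
run;
```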

### Step 3: Set up SAS Survey Procedures for Simple Linear Regression

The dependent variable should be a continuous variable and will always appear on the left hand side of the equation. The variables on the right hand side of the equation are the independent variables and may be discrete or continuous.

When interactions are included in the model, they are denoted with an asterisk, *, between the two variables. An interaction can occur between a discrete and a continuous variable, or between two discrete variables. An interaction term will always appear on the right hand side of an equation.

The summary table below provides steps for performing linear regression analyses using SAS Survey procedures.

IMPORTANT NOTE: These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

#### Option 1. Use SAS Survey Procedures for Simple Linear Regression

| Statements | Explanation |
|---|---|
| `PROC SURVEYREG DATA=analysis_data nomcar;` | Use the SAS Survey procedure, proc surveyreg, to fit the linear regression model. Use the nomcar option so that observations with missing values are still read and accounted for in variance estimation (missing values are treated as not missing completely at random). |
| `STRATA sdmvstra;` | Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification. |
| `CLUSTER sdmvpsu;` | Use the cluster statement to specify PSU (sdmvpsu) to account for design effects of clustering. |
| `WEIGHT wtmec4yr;` | Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used. |
| `DOMAIN eligible;` | Use the domain statement to restrict the analysis to individuals with complete data for all the variables used in the final multiple regression model. WARNING: When using proc surveyreg, use a domain statement to select the population of interest. Do not use a where or by-group statement to analyze subpopulations with the SAS Survey Procedures. |
| `MODEL lbdhdl= bmxbmi/CLPARM VADJUST=none;` | Use a model statement to specify the dependent variable for HDL cholesterol (lbdhdl) as a function of the independent variable, body mass index (bmxbmi), which is treated as a continuous variable. The clparm option requests confidence limits for the parameters. The vadjust option specifies whether or not to use variance adjustment; vadjust=none requests no adjustment. This model will show the relationship between a one-unit increase in BMI and HDL cholesterol level. |
| `TITLE 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002';` | Use the title statement to label the output. |
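Assembled into a single procedure call, the Option 1 statements might look like the sketch below. It assumes `analysis_data` already contains the eligible flag from Step 1; the trailing `run;` and the comments are additions for illustration.

```sas
/* Sketch: the Option 1 statements assembled into one procedure call. */
proc surveyreg data=analysis_data nomcar;
  strata sdmvstra;     /* stratification variable         */
  cluster sdmvpsu;     /* primary sampling unit           */
  weight wtmec4yr;     /* 4-year MEC exam weight          */
  domain eligible;     /* subpopulation flag from Step 1  */
  model lbdhdl = bmxbmi / clparm vadjust=none;
  title 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002';
run;
```

With a domain statement, proc surveyreg reports an analysis for the full sample as well as one for each level of the domain variable, so the results of interest are those labeled eligible=1.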

#### Option 2. Use SAS Survey Procedures for Simple Linear Regression with BMI Categorical Variable

Statements:

PROC SURVEYREG DATA=analysis_data nomcar;
STRATA sdmvstra;
CLUSTER sdmvpsu;
WEIGHT wtmec4yr;
DOMAIN eligible;
MODEL lbdhdl= bmicat/CLPARM vadjust=none;
TITLE 'Linear regression model for high density lipoprotein and body mass index: NHANES 1999-2002';

Explanation:

Use proc surveyreg to perform linear regression. Use the nomcar option to read all observations. Because bmicat is not listed in a class statement, it is treated as a continuous variable, so this model will show the change in HDL cholesterol level for each one-category increase in BMI.

#### Option 3. Use SAS Survey Procedures for Simple Linear Regression with BMI Categorical Variable with Reference Level

Statements:

PROC SURVEYREG DATA=analysis_data nomcar;
STRATA sdmvstra;
CLUSTER sdmvpsu;
WEIGHT wtmec4yr;
CLASS bmicatf;
DOMAIN eligible;
MODEL lbdhdl= bmicatf/CLPARM vadjust=none;
TITLE 'Linear regression model for high density lipoprotein and body mass index: NHANES 1999-2002';

Explanation:

Use proc surveyreg to perform linear regression. Use the nomcar option to read all observations. This model uses the normal weight BMI category as the reference category when comparing HDL cholesterol levels.

Use the class statement to denote the discrete variables included in the model; all other variables are treated as continuous. In this example, bmicatf is treated as a discrete variable.

Highlights from the output include:

• The results from the first model indicate that, on average, HDL decreases by 0.69 mg/dL for each 1-unit increase in BMI.
• The results from the second model indicate that, on average, HDL decreases by 5.6 mg/dL for each one-category increase in BMI, for example from the underweight category to the normal weight category, or from the normal weight category to the overweight category.
• The results from the third model indicate that the relationship is not linear: the difference in HDL between the underweight and normal weight categories is 3.2 mg/dL, compared to a 7.5 mg/dL difference between the normal weight and overweight categories.
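As a worked illustration of how the first two slopes are read, a model with a single continuous predictor $x$ implies that the predicted change in HDL is the slope times the change in $x$. The notation below is introduced here for illustration and does not appear in the SAS output.

```latex
\Delta\widehat{\mathrm{HDL}} = \hat{\beta}_1 \,\Delta x

% First model: a 5-unit increase in BMI
\Delta\widehat{\mathrm{HDL}} = (-0.69)(5) = -3.45 \ \text{mg/dL}

% Second model: a two-category increase in BMI category
\Delta\widehat{\mathrm{HDL}} = (-5.6)(2) = -11.2 \ \text{mg/dL}
```

The second model forces every one-category step to have the same size, which is why the third model, with separate estimates per category, can reveal unequal steps.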

### Step 4: Set Up SAS Survey Procedures for Multiple Linear Regression

IMPORTANT NOTE: These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

#### Use SAS Survey Procedures for Multiple Linear Regression

| Statements | Explanation |
|---|---|
| `PROC SURVEYREG DATA=analysis_data nomcar;` | Use the SAS Survey procedure, proc surveyreg, to fit the linear regression model. Use the nomcar option so that observations with missing values are still read and accounted for in variance estimation (missing values are treated as not missing completely at random). |
| `STRATA sdmvstra;` | Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification. |
| `CLUSTER sdmvpsu;` | Use the cluster statement to specify PSU (sdmvpsu) to account for design effects of clustering. |
| `WEIGHT wtmec4yr;` | Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used. |
| `CLASS sex ethn smoker dmdeduc bmicatf;` | Use the class statement to denote the discrete variables included in the model; all other variables are treated as continuous. In this example sex, ethnicity (ethn), smoking status (smoker), education (dmdeduc), and BMI (bmicatf) are treated as discrete variables. |
| `DOMAIN eligible;` | Use the domain statement to restrict the analysis to individuals with complete data for all the variables used in the final multiple regression model. WARNING: When using proc surveyreg, use a domain statement to select the population of interest. Do not use a where or by-group statement to analyze subpopulations with the SAS Survey Procedures. |
| `MODEL lbdhdl= sex ethn ridageyr dmdeduc smoker bmicatf/CLPARM vadjust=none;` | Use a model statement to specify the dependent variable for HDL cholesterol (lbdhdl) as a function of the independent variables: sex, race/ethnicity (ethn), age (ridageyr), education (dmdeduc), smoking status (smoker), and BMI category (bmicatf). The clparm option requests confidence limits for the parameters. The vadjust option specifies whether or not to use variance adjustment; vadjust=none requests no adjustment. This model will show the relationship between BMI category and HDL cholesterol level after adjusting for the other covariates. |
| `ESTIMATE 'Never vs past smoker' smoker 1 -1 0;` | Use the estimate statement to test for differences in HDL cholesterol between non-smokers and past smokers. |
| `TITLE 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002';` | Use the title statement to label the output. |

WARNING: SAS Survey Procedures proc surveyreg prints the Wald statistic and its p-value. It does not produce the Satterthwaite chi square or the Satterthwaite F statistics and their corresponding p-values. For these reasons, we recommend that you use proc regress in SUDAAN for multiple linear regression.
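Put together, the multiple regression statements might be submitted as in the sketch below. It assumes `analysis_data` contains the eligible flag and the recoded variables from Steps 1 and 2. It also assumes that the smoker levels are ordered 1, 2, 3 when the estimate coefficients are applied, which may depend on the formats and ordering options in use. The trailing `run;` and the comments are additions for illustration.

```sas
/* Sketch: the Step 4 statements assembled into one procedure call. */
proc surveyreg data=analysis_data nomcar;
  strata sdmvstra;
  cluster sdmvpsu;
  weight wtmec4yr;
  /* Discrete predictors; per Step 2, each was recoded so that the   */
  /* desired reference category has the highest level.               */
  class sex ethn smoker dmdeduc bmicatf;
  domain eligible;
  model lbdhdl = sex ethn ridageyr dmdeduc smoker bmicatf / clparm vadjust=none;
  /* Assumed level order 1, 2, 3: +1 for non-smoker, -1 for past     */
  /* smoker, 0 for current smoker.                                   */
  estimate 'Never vs past smoker' smoker 1 -1 0;
  title 'Linear regression model for high density lipoprotein and selected covariates: NHANES 1999-2002';
run;
```

The three estimate coefficients sum to zero, so the statement compares two smoking categories directly and does not estimate either category's mean on its own.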

### Step 5: Review Output and Highlights of the Results

In this step, the SAS Survey procedures output is reviewed.

• HDL cholesterol is 6.55 mg/dL lower for overweight adults compared to normal weight adults, as defined by BMI.
• HDL cholesterol is 12.00 mg/dL lower for obese adults compared to normal weight adults, as defined by BMI.
• HDL cholesterol is 2.30 mg/dL higher for underweight adults compared to normal weight adults, as defined by BMI.
• HDL cholesterol is 9.98 mg/dL higher for women than for men, after adjusting for all other variables in the model.
• HDL cholesterol is 4.95 mg/dL higher for non-Hispanic Blacks compared to non-Hispanic Whites, after adjusting for all other variables in the model.
• HDL cholesterol increases 0.11 mg/dL per unit increase in age.