The SAS procedure proc univariate generates descriptive and summary statistics that are useful for describing the characteristics of a distribution. These statistics can also help you decide whether parametric tests (which assume a normal distribution) or nonparametric tests are appropriate for your analysis. As noted in the Clean & Recode Data module, it is advisable to check for extreme weights and outliers before starting any analysis.
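As a rough illustration of the moment statistics proc univariate reports, the Python sketch below (not SAS code) computes the unweighted mean, standard deviation, standard error of the mean, skewness, and kurtosis. It assumes the formulas we understand SAS to use by default: a divisor of n - 1 (VARDEF=DF) and bias-adjusted skewness and excess kurtosis. Values of skewness and kurtosis near 0 are consistent with, but do not prove, a normal distribution. The function name `describe` is hypothetical.

```python
import math

def describe(values):
    """Unweighted moments similar to PROC UNIVARIATE's Moments table.

    Assumes SAS defaults as we understand them: VARDEF=DF (divisor n - 1)
    and the bias-adjusted skewness and (excess) kurtosis formulas.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 non-missing values for kurtosis")
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z = [(x - mean) / sd for x in values]  # standardized values
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "n": n,
        "mean": mean,
        "std": sd,
        "stderr": sd / math.sqrt(n),  # standard error of the mean
        "skewness": skewness,
        "kurtosis": kurtosis,         # excess kurtosis: about 0 for normal data
    }

# Hypothetical total cholesterol values (mg/dL), for illustration only
print(describe([180, 195, 200, 210, 250, 320]))
```

A long right tail, common for laboratory measures, shows up as positive skewness.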
Use proc univariate to generate descriptive statistics. The frequency distribution can be presented in tabular or graphic form. The freq option on the proc univariate statement produces the frequency distribution as a table, listing the number of observations for each value of the variable. Because the sample is large and a continuous variable can take many distinct values, the freq option is practical only for nominal or ordinal variables. The plot option produces the frequency distribution in graphic form (a stem-and-leaf plot or horizontal bar chart, a box plot, and a normal probability plot), and the normal option produces statistics that test whether the distribution is normal.
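The normal option reports several goodness-of-fit tests (Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling). As a hedged illustration of one of them, the Python sketch below computes the Kolmogorov-Smirnov D statistic against a normal distribution whose mean and standard deviation are estimated from the data. It returns only D, not the p-value SAS reports; the function names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_d(values):
    """Kolmogorov-Smirnov D for normality, with mean and SD estimated from the data.

    Returns only the statistic; SAS also reports a p-value, which this sketch does not compute.
    """
    xs = sorted(values)
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    cdf = [normal_cdf(x, mu, sigma) for x in xs]
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf))   # empirical CDF above the fitted CDF
    d_minus = max(f - i / n for i, f in enumerate(cdf))        # fitted CDF above the empirical CDF
    return max(d_plus, d_minus)

# A symmetric sample fits the normal curve better (smaller D) than a skewed one
print(ks_d([-2, -1, 0, 1, 2]), ks_d([0, 0, 0, 0, 10]))
```

A larger D means a larger gap between the observed cumulative distribution and the fitted normal curve.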
These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial. 
Statements  Explanation  
proc sort data=analysis_data; by riagendr age; run; 
Use the sort procedure to sort the data by the same variables used in the by statement of the univariate procedure. In this example, the data are sorted by gender (riagendr) and age (age).


PROC UNIVARIATE PLOT NORMAL;

Use the univariate procedure to generate descriptive statistics, which include the number of missing values, the mean, the standard error of the mean, percentiles, and extreme values. The plot option produces stem-and-leaf (or bar chart), box, and normal probability plots, and the normal option produces statistics that test for normality. In this example, both plots (plot) and normality tests (normal) are requested, and the results are produced separately for each combination of the values of the variables in the by statement.

where ridageyr >= 20; 
Use the where statement to restrict the analysis to participants aged 20 years and older (ridageyr >= 20).

by riagendr age; 
The by statement defines the groups (all combinations of the values of the by variables) for which separate descriptive statistics are produced. It should match the by statement in the preceding sort procedure.

VAR lbxtc; 
Use the var statement to indicate variable(s) for which descriptive measures are requested. In this example, the total cholesterol variable (lbxtc) is used. 

FREQ wtmec4yr; run; 
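Use the freq statement with the appropriate sample weight so that each observation counts as many times as its weight (SAS truncates non-integer freq values to integers). The resulting estimates describe the population rather than the sample, and the denominator of the standard deviation is based on the estimated population size (the sum of the weights) rather than on the number of sampled persons. In this example, the 4-year examination weight (wtmec4yr) is used.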
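The Python sketch below is not SAS code; it illustrates what the where, by, and freq statements compute for one variable. It assumes a hypothetical tuple layout (ridageyr, riagendr, age, lbxtc, wtmec4yr) and the SAS defaults as we understand them: freq values truncated to integers, frequencies below 1 dropped, and a variance divisor equal to the sum of the frequencies minus 1 (VARDEF=DF). The result matches what you would get by literally repeating each observation int(weight) times. The function name `by_group_freq_stats` is hypothetical, and the tiny weights are made up for readability.

```python
import math
from itertools import groupby
from operator import itemgetter

def by_group_freq_stats(rows, min_age=20):
    """Freq-weighted N, mean, and SD of lbxtc per (riagendr, age) group.

    rows: (ridageyr, riagendr, age, lbxtc, wtmec4yr) tuples (hypothetical layout).
    Assumes every group keeps a total frequency of at least 2.
    """
    # where ridageyr >= 20;  (missing lbxtc values are skipped, as the VAR statement does)
    kept = [r for r in rows if r[0] >= min_age and r[3] is not None]
    # proc sort ... by riagendr age;  groupby only merges *adjacent* equal keys
    kept.sort(key=itemgetter(1, 2))
    result = {}
    for key, grp in groupby(kept, key=itemgetter(1, 2)):
        # FREQ wtmec4yr;  truncate weights to integers and drop frequencies below 1
        pairs = [(r[3], int(r[4])) for r in grp if int(r[4]) >= 1]
        total = sum(f for _, f in pairs)
        mean = sum(f * x for x, f in pairs) / total
        # Default VARDEF=DF: divisor is the sum of the frequencies minus 1
        var = sum(f * (x - mean) ** 2 for x, f in pairs) / (total - 1)
        result[key] = (total, mean, math.sqrt(var))
    return result

# Hypothetical records: (ridageyr, riagendr, age, lbxtc, wtmec4yr)
rows = [
    (18, 1, 1, 150, 5000.9),  # under 20: removed by the where filter
    (25, 1, 1, 180, 2.7),
    (34, 1, 1, 200, 1.2),
    (41, 1, 1, 240, 3.9),
    (45, 2, 2, 210, 2.0),
    (52, 2, 2, 230, 2.0),
]
print(by_group_freq_stats(rows))
```

Sorting first matters because, like BY-group processing, `groupby` starts a new group whenever the key changes; unsorted input would split one gender and age group into several pieces.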

The univariate procedure generates extensive descriptive statistics, including moments, percentiles, extremes, missing values, basic statistical measures, and tests for location. Below is a snapshot from the extensive output of the SAS program which shows the result of using the plot and normal options.
In some instances, you may not need all of the statistics generated by proc univariate. You can instead request only selected descriptive statistics and write them to a SAS dataset for viewing.
Statements  Explanation  
proc sort data=analysis_data; by riagendr age; run; 
Use the sort procedure to sort data by the same variables that will be used in the by statement of the univariate procedure. In the example, the data are sorted by gender (riagendr) and age (age). 

PROC UNIVARIATE NOPRINT; 
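Use the univariate procedure to generate descriptive statistics. The noprint option suppresses the detailed printed output that proc univariate produces by default; the statistics of interest are written to a dataset by the output statement instead.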

where ridageyr >= 20; 
Use the where statement to restrict the analysis to participants aged 20 years and older (ridageyr >= 20).

by riagendr age; 
The by statement defines the groups (all combinations of the values of the by variables) for which separate descriptive statistics are produced. It should match the by statement in the preceding sort procedure.

VAR lbxtc; 
Use the var statement to indicate variable(s) for which descriptive measures are requested. In this example, the total cholesterol variable (lbxtc) is used. 

FREQ wtmec4yr;

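Use the freq statement with the appropriate sample weight so that each observation counts as many times as its weight (SAS truncates non-integer freq values to integers). The resulting estimates describe the population rather than the sample, and the denominator of the standard deviation is based on the estimated population size (the sum of the weights) rather than on the number of sampled persons. In this example, the 4-year examination weight (wtmec4yr) is used.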


OUTPUT out=SASdataset mean=mean Q1=p_25 median=median Q3=p_75; run; 
Use the output statement to write the results to a new SAS dataset, SASdataset, which will contain the statistics of interest. Each requested statistic is stored in a variable whose name is given after the equal sign. In this example, the mean and the 25th, 50th (median), and 75th percentiles are requested. (For a complete list of statistics that can be requested, see the proc univariate entry in the SAS documentation.)

proc print DATA=SASdataset; run; 
Use proc print to view the results in the new SAS dataset, SASdataset. 
The output is sent to a SAS dataset, which is then printed for viewing. See the results below. Note that the new SAS dataset contains only the statistics requested in the output statement, with one observation per by group.
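To see what the output statement stores, the Python sketch below (an illustration, not SAS code) builds one row with the variables mean, p_25, median, and p_75 for a single group. It assumes the default percentile definition we understand proc univariate to use (PCTLDEF=5, the empirical distribution function with averaging: with n·p = j + g, take the average of the j-th and (j+1)-th ordered values when g = 0, otherwise the (j+1)-th value). Freq values are applied by repeating each value int(weight) times, which is practical only for small, hypothetical weights like those shown. The helper names `pctl_def5` and `output_row` are hypothetical.

```python
import math

def pctl_def5(values, p):
    """Percentile by the assumed PCTLDEF=5 rule (empirical distribution with averaging)."""
    xs = sorted(values)
    n = len(xs)
    j = math.floor(n * p)
    g = n * p - j
    if j >= n:            # p = 1: the maximum
        return xs[-1]
    if g > 0:
        return xs[j]      # the (j+1)-th smallest value (0-based index j)
    if j == 0:            # p = 0: the minimum
        return xs[0]
    return (xs[j - 1] + xs[j]) / 2  # average of the j-th and (j+1)-th values

def output_row(values, freqs):
    """One row like the OUTPUT statement writes per BY group (hypothetical helper).

    Freq values are applied by repeating each value int(f) times.
    """
    expanded = [x for x, f in zip(values, freqs) for _ in range(int(f))]
    return {
        "mean": sum(expanded) / len(expanded),
        "p_25": pctl_def5(expanded, 0.25),
        "median": pctl_def5(expanded, 0.50),
        "p_75": pctl_def5(expanded, 0.75),
    }

# Hypothetical lbxtc values and made-up weights for one by group
print(output_row([180, 200, 240], [2.7, 1.2, 3.9]))
```

Because the percentiles are computed on the weighted (replicated) data, a heavily weighted value can pull a quartile toward itself, just as it pulls the mean.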