Data and Documentation

GYTS uses a two-stage cluster sample design. To reflect this complex design, your data set includes two sample design variables: STRATUM and PSU (Primary Sampling Unit).

STRATUM

Each value of the variable STRATUM usually identifies a pair of schools with similar enrollment sizes. Sometimes, however, a stratum contains only one school. For example, a school that has a 100% chance of being on the selected school list (because of its large enrollment) is the only school in its stratum. This type of school is called a Certainty School.

PSU

In most cases, the Primary Sampling Unit (PSU) represents a school. If the school is a Certainty School, the PSUs are the classes within that school.

The sampling weight variable is named FINALWGT.

Each student in the data set is assigned a sampling weight, which accounts for the following:

Selection probability of the school
Selection probability of the class
Distribution of the population by grade and sex
Non-responding schools
Non-responding students
Non-responding classes
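
As a rough illustration of how the components listed above can combine into a single weight, the Python sketch below multiplies the inverse selection probabilities by a set of adjustment factors. The function names, the parameter names and the simple multiplicative form are assumptions made for illustration only. They are not taken from the GYTS weighting specification, and FINALWGT in your data set is already computed for you.

```python
def nonresponse_adjustment(n_selected, n_responding):
    """Hypothetical helper: inflate weights so responders stand in for non-responders."""
    if n_responding <= 0:
        raise ValueError("at least one responding unit is required")
    return n_selected / n_responding


def final_weight(p_school, p_class, school_adj=1.0, class_adj=1.0,
                 student_adj=1.0, poststrat_adj=1.0):
    """Illustrative weight: inverse selection probabilities times adjustment factors.

    p_school, p_class : selection probabilities of the school and the class
    school_adj, class_adj, student_adj : non-response adjustment factors
    poststrat_adj : adjustment toward the population distribution by grade and sex
    """
    if not (0 < p_school <= 1 and 0 < p_class <= 1):
        raise ValueError("selection probabilities must be in (0, 1]")
    base = 1.0 / (p_school * p_class)  # number of students each respondent represents
    return base * school_adj * class_adj * student_adj * poststrat_adj


# Example: school selected with probability 0.5, class with probability 0.25,
# and 32 of the 40 students in the class responded.
w = final_weight(0.5, 0.25, student_adj=nonresponse_adjustment(40, 32))
```

In this example each responding student would represent 10 students in the population: 8 from the selection probabilities, scaled up by 1.25 for student non-response.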

Point estimates and 95% confidence intervals can be calculated using several software packages for statistical analysis of correlated data. Below is sample code for EPIINFO, SUDAAN and STATA.
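
To show the kind of calculation these packages perform with STRATUM, PSU and FINALWGT, here is a minimal Python sketch of a weighted proportion with a Taylor-linearized standard error. It is not the code of any package named above. It assumes PSUs are treated as sampled with replacement within each stratum, applies no finite population correction, uses a normal approximation (1.96) for the 95% interval, and lets a stratum with a single PSU contribute nothing to the variance. The function name and the record layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def weighted_proportion(records, z=1.96):
    """Estimate a weighted proportion and its design-based confidence interval.

    records: iterable of (stratum, psu, weight, y) tuples, where y is 1 if the
    student has the characteristic of interest and 0 otherwise.
    Returns (estimate, standard_error, (lower, upper)).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p = sum(w * y for _, _, w, y in records) / total_w

    # Sum the linearized values w * (y - p) / total_w within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - p) / total_w

    # Between-PSU variance within each stratum, added up over strata.
    variance = 0.0
    for psus in psu_totals.values():
        n = len(psus)
        if n < 2:
            continue  # assumption: a single-PSU stratum contributes no variance
        mean = sum(psus.values()) / n
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in psus.values())

    se = sqrt(variance)
    return p, se, (max(0.0, p - z * se), min(1.0, p + z * se))


# Toy data: (STRATUM, PSU, FINALWGT, indicator). Values are made up.
toy = [
    (1, 1, 1.0, 1), (1, 1, 1.0, 0), (1, 2, 1.0, 1), (1, 2, 1.0, 1),
    (2, 1, 2.0, 0), (2, 2, 2.0, 1),
]
estimate, std_error, interval = weighted_proportion(toy)
```

Ignoring the design variables and treating the data as a simple random sample would generally give a different standard error, which is why the packages need to be told about STRATUM and PSU as well as the weight.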

EPIINFO Sample Code:

SUDAAN Sample Code:
