Data and Documentation

GYTS uses a two-stage cluster sample design. To reflect this complex design, the data set includes two sample design variables: STRATUM and PSU (short for Primary Sampling Unit).

STRATUM

A STRATUM usually consists of two schools that are paired because they have similar enrollment sizes. Sometimes, however, a STRATUM contains only one school. For example, a school that has a 100% chance of being on the list of selected schools (because of its large enrollment) is the only school in its stratum; this type of school is called a Certainty School.

PSU

In most cases, the Primary Sampling Unit (PSU) is a school. If the school is a Certainty School, the PSUs are the classes within that school.

The sampling weight variable is named FINALWGT.

Each student in the data set is assigned a sampling weight, which accounts for the following:

  • Selection probability of the school
  • Selection probability of the class
  • Distribution of the population by grade and sex
  • Non-responding schools
  • Non-responding students
  • Non-responding classes
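
The exact formula behind FINALWGT is not given here. As a rough illustration only, a weight of this kind is commonly built as the product of the inverse selection probabilities and one adjustment factor per item in the list above. The Python sketch below assumes that multiplicative form; the function name final_weight and all of its parameter names are hypothetical and do not come from the GYTS data set.

```python
def final_weight(p_school, p_class,
                 school_nr_adj, class_nr_adj, student_nr_adj,
                 poststrat_adj):
    """Combine the weight components into one sampling weight.

    ASSUMPTION: the components multiply together. This is a common
    construction, not a formula confirmed by this documentation.

    p_school, p_class : selection probabilities, each in (0, 1]
    *_nr_adj          : non-response adjustment factors (usually >= 1)
    poststrat_adj     : adjustment to the grade-by-sex population distribution
    """
    for name, p in (("p_school", p_school), ("p_class", p_class)):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {p}")

    # Base weight: inverse of the overall probability of selection.
    base = (1.0 / p_school) * (1.0 / p_class)

    # Each adjustment rescales the base weight.
    return base * school_nr_adj * class_nr_adj * student_nr_adj * poststrat_adj


# A Certainty School is selected with probability 1, so only the class
# selection probability contributes to its base weight.
certainty_example = final_weight(1.0, 0.25, 1.0, 1.0, 1.25, 0.8)
```

In practice the weight is already computed and stored in FINALWGT, so analysts use that variable directly rather than rebuilding it.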

Point estimates and 95% confidence intervals can be calculated with several statistical software packages that support the analysis of correlated (clustered) data. Below is sample code for EPIINFO, SUDAAN and STATA.
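
To show how STRATUM, PSU and FINALWGT enter such a calculation, the following Python sketch computes a weighted proportion and an approximate 95% confidence interval using Taylor series linearization, with PSUs treated as sampled with replacement within each stratum. It is an illustration under stated assumptions, not a replacement for the packages named above: the function name weighted_proportion and the example records are hypothetical, and the interval uses the normal critical value 1.96, whereas survey packages typically use a t-distribution with design-based degrees of freedom.

```python
from collections import defaultdict
from math import sqrt


def weighted_proportion(records, z_crit=1.96):
    """Estimate a proportion from (STRATUM, PSU, FINALWGT, y) records.

    y is 1 if the student has the characteristic of interest, else 0.
    Returns (estimate, standard_error, (lower, upper)).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    estimate = sum(w * y for _, _, w, y in records) / total_w

    # Linearized score of the ratio estimator, summed to the PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - estimate) / total_w

    # Between-PSU variance within each stratum, added across strata.
    variance = 0.0
    for stratum, psus in psu_totals.items():
        n = len(psus)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        mean = sum(psus.values()) / n
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in psus.values())

    se = sqrt(variance)
    return estimate, se, (estimate - z_crit * se, estimate + z_crit * se)


# Hypothetical records: (STRATUM, PSU, FINALWGT, outcome)
sample = [
    (1, "A", 1.0, 1), (1, "A", 1.0, 0),
    (1, "B", 1.0, 1), (1, "B", 1.0, 1),
    (2, "C", 2.0, 0), (2, "D", 2.0, 1),
]
estimate, se, (lower, upper) = weighted_proportion(sample)
```

The variance step requires at least two PSUs in every stratum. This is consistent with the design described above, where a stratum holding a single Certainty School uses the classes within that school as its PSUs.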

EPIINFO Sample Code:

SUDAAN Sample Code: