Analyzing Surveys

The analysis of employee health self-assessment surveys can range from simple comparisons of frequencies or means for each question to more complex analyses, such as using logistic regression to assess differences in the prevalence of particular health behaviors (e.g., tobacco use) by job type or between groups of individuals (e.g., men and women). Steps and considerations in this process include:

  • Create a data set from the survey responses (e.g., in Excel or in a data analysis package such as SAS, SPSS, or STATA), and assign codes to each variable measured by the survey
  • Generate a Response Rate table – by comparison groups and overall. This will allow assessment of the representativeness of the survey sample by answering the following questions:
    • How many employees completed the survey?
    • What percentage of employees completed the survey?
    • Were employees in some units/categories better represented than others?  (See table below for an example.)
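A minimal Python sketch of the coding step, assuming answers arrive as text labels. The codebook entries (`job_type`, `smoker`), their labels, and the use of 9 as a missing-data code are hypothetical choices for illustration; in practice the codebook would mirror the survey's own questions.

```python
# Hypothetical codebook: each survey variable maps answer text to a numeric code.
CODEBOOK = {
    "job_type": {"Management": 1, "Clerical": 2, "Manufacturing": 3},
    "smoker": {"No": 0, "Yes": 1},
}
MISSING = 9  # hypothetical code for a skipped or unrecognized answer


def code_response(raw: dict) -> dict:
    """Convert one respondent's raw text answers into numeric codes."""
    coded = {}
    for variable, codes in CODEBOOK.items():
        answer = raw.get(variable, "").strip()
        coded[variable] = codes.get(answer, MISSING)
    return coded


raw_responses = [
    {"job_type": "Clerical", "smoker": "No"},
    {"job_type": "Management", "smoker": "Yes"},
    {"job_type": "Clerical"},  # smoking question skipped
]
dataset = [code_response(r) for r in raw_responses]
print(dataset)
```

Keeping the codebook in one place means every respondent is coded the same way, and skipped questions end up with an explicit missing code rather than a blank.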

Response rate tables

Example: Response Rate Table

Job Type              Survey Responses   Employees in Group   Response Rate
Management                   20                  90               22.2%
(unlabeled)                  60                 100               60.0%
(unlabeled)                 180                 360               50.0%
Total Organization          260                 550               47.3%

Reviewing the response rate table helps gauge how much confidence to place in the estimates. The lower the response rates, and the less representative the sample, the less accurate and precise the estimates and comparisons will be. For example, if the employee population is largely male, or mostly within a particular age range, the survey responses should ideally reflect roughly the same proportions: if 70% of employees are male but only 20% of respondents are male, the sample is not representative. In the example above, respondents in the management job type may not be representative of all managers because their response rate was low (22.2%), although the overall response rate for the worksite was higher. Response rates of 80% are considered high, but reaching them usually requires personalized follow-up with non-respondents. For a survey administered anonymously, a response rate of about 50% is a more reasonable expectation for a well-promoted survey effort.1
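A small Python sketch of the response rate table, using the three job-type rows from the example above. Besides each group's rate, it compares the group's share of the workforce with its share of respondents, which is one simple way to spot under-representation. The placeholder names "Group B" and "Group C" are assumptions; only the management row is labeled in the example.

```python
# Responses and employees per job type, taken from the example response rate table.
# Only the management row is labeled there; "Group B" and "Group C" are placeholders.
groups = {
    "Management": (20, 90),
    "Group B": (60, 100),
    "Group C": (180, 360),
}

# Compute totals from the rows so the overall rate always matches its parts.
total_responses = sum(responses for responses, _ in groups.values())
total_employees = sum(employees for _, employees in groups.values())

shares = {}
for name, (responses, employees) in groups.items():
    rate = 100 * responses / employees
    staff_share = 100 * employees / total_employees   # group's share of the workforce
    sample_share = 100 * responses / total_responses  # group's share of respondents
    shares[name] = (staff_share, sample_share)
    print(f"{name:<11} rate {rate:5.1f}%  staff {staff_share:5.1f}%  sample {sample_share:5.1f}%")

overall = 100 * total_responses / total_employees
print(f"Overall: {total_responses}/{total_employees} = {overall:.1f}%")
```

With these counts, management makes up 16.4% of employees but only 7.7% of respondents, which is the under-representation described above. Deriving the total from the rows, rather than entering it separately, also catches a total line that does not add up.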

  • Define the variables clearly. For example, if the team is interested in alcohol consumption, does it want to know the number of drinks consumed (i.e., amount) or the number of times a person drinks (i.e., frequency)? What is the recall period (e.g., a week, a month)? If the data will be compared to a statewide or national survey, using CDC-accepted standards and definitions makes the comparisons more meaningful
  • Make meaningful comparisons, for instance to state or national estimates, and by gender, age, business unit/company division, or job type (e.g., management, clerical, manufacturing). Comparisons by age or job type may reveal a need to target tobacco cessation activities at younger employees, or at employees in a manufacturing unit rather than a management unit, because of higher prevalence in those groups. Heavier alcohol intake and cigarette smoking, for example, are often more common in younger age groups, so it is important to take age into account when making comparisons (e.g., comparing to state Behavioral Risk Factor Surveillance System [BRFSS] data for a similar age group)
    • Find an appropriate comparison data set with local, state, or national population samples (e.g., BRFSS; statewide surveys). Compare the age, gender and racial/ethnic make-up of such a sample to that of the organization to assess its appropriateness
    • Use meaningful cut-points for categories such as age (e.g., 18-29 years, 30-39 years). Match these cut-points to those in the national/state comparison data sets, or use categories that have meaning for the organization. In addition, if recommendations or guidelines apply to specific age groups (e.g., men and women are recommended to begin regular colorectal cancer screening at age 50), group ages accordingly. For example, colorectal cancer screening data for women aged 45-55 years would make it difficult to tell which women were meeting the screening guideline, because the group includes both women recommended for screening (ages 50-55) and younger women (ages 45-49) to whom the guideline does not yet apply. This would make the data hard to use for policy and program decisions
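A minimal Python sketch of guideline-aware age grouping. The 18-29 and 30-39 groups come from the text; the remaining cut-points (40-49, 50-64, 65+) are hypothetical, chosen so that a new group starts at 50, the screening age used in the colorectal cancer example. The helper `mixes_eligibility` is a hypothetical name that checks whether an age range straddles that start age.

```python
# Hypothetical age cut-points: lower bound and label for each group. The 18-29 and
# 30-39 groups follow the text; the rest are illustrative, with 50 starting a new
# group because the text's screening example begins at age 50.
AGE_GROUPS = [(18, "18-29"), (30, "30-39"), (40, "40-49"), (50, "50-64"), (65, "65+")]
SCREENING_START_AGE = 50  # from the colorectal screening example in the text


def age_group(age: int) -> str:
    """Return the label of the age group containing `age`."""
    for lower, label in reversed(AGE_GROUPS):
        if age >= lower:
            return label
    raise ValueError(f"age {age} is below the youngest group")


def mixes_eligibility(low: int, high: int) -> bool:
    """True if ages low..high include people both below and at/above the start age."""
    statuses = {age >= SCREENING_START_AGE for age in range(low, high + 1)}
    return len(statuses) > 1


print(age_group(24), age_group(49), age_group(50))  # 18-29 40-49 50-64
print(mixes_eligibility(45, 55))  # True: 45-49 not yet covered, 50-55 covered
print(mixes_eligibility(50, 64))  # False: everyone in the group is covered
```

Because 50 begins its own group, every respondent in a given group has the same screening status, so screening rates per group can be read directly against the guideline.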

The table below shows why it is important to break down the data, so that trends and rates across demographic variables such as age can be examined. Overall, 17.9% of workers reported binge drinking. Once the data are broken down by age, a negative association between age and binge drinking appears, and an alarming one in three workers aged 18-29 (32.4%) reports binge drinking. Consequently, interventions should be targeted to younger workers.

Example: Alcohol Consumption Behavior by Age Group

Age (in years): groups are ordered from youngest to oldest; the last column combines all ages.

Measure (last 30 days)                                18-29   Group 2   Group 3   Group 4   All ages
Had 1 or more alcoholic drinks                        82.9%   65.6%     63.1%     60.4%     75.4%
Engaged in binge drinking (5+ drinks in one night)    32.4%   18.0%     15.4%     10.4%     17.9%
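A breakdown like this starts from individual responses. The Python below is a minimal sketch, assuming each record holds a respondent's age group, the number of days they drank in the last 30 days, and the most drinks they had in one night; the records are made up for illustration. It also reflects the amount-versus-frequency distinction from the variable-definition step: "any drinking" is a frequency measure, while binge drinking is an amount measure.

```python
from collections import defaultdict

# Hypothetical individual-level records:
# (age group, days with any drink in last 30 days, most drinks in one night in last 30 days)
records = [
    ("18-29", 4, 6), ("18-29", 0, 0), ("18-29", 8, 3),
    ("30-39", 2, 2), ("30-39", 0, 0), ("30-39", 1, 4),
]

BINGE_THRESHOLD = 5  # "5+ drinks in one night", as defined in the example table


def prevalence_by_group(rows):
    """Return {group: (% with any drink, % binge drinking)} for the given records."""
    counts = defaultdict(lambda: [0, 0, 0])  # [respondents, any drinking, binge]
    for group, drink_days, max_drinks in rows:
        c = counts[group]
        c[0] += 1
        c[1] += int(drink_days > 0)                  # frequency-based measure
        c[2] += int(max_drinks >= BINGE_THRESHOLD)   # amount-based measure
    return {g: (100 * a / n, 100 * b / n) for g, (n, a, b) in counts.items()}


by_age = prevalence_by_group(records)
# Pool every record under one label to get the all-ages column.
overall = prevalence_by_group([("All ages", d, m) for _, d, m in records])

for group, (any_pct, binge_pct) in {**by_age, **overall}.items():
    print(f"{group}: any drinking {any_pct:.1f}%, binge drinking {binge_pct:.1f}%")
```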
  • Analyze and describe major findings by topic area, such as:
    • Health status
    • Use of preventive health services
    • Health behaviors such as tobacco use, diet and physical activity
    • Perceptions and use of existing health-enhancing activities offered by the organization
  • Generate percentages and means (with confidence intervals or standard deviations), depending on the survey item, and put them into tables for comparison. For example, the percentage of respondents who consume fruits and vegetables 5 or more times per day may be of interest, but the mean number of times per day they report eating fruits and vegetables (see table below) can also be examined
    • Software – use Excel (or similar spreadsheet programs) to generate descriptive information such as the means and/or frequencies
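A minimal Python sketch of producing a percentage with a 95% confidence interval and a mean with its standard deviation, using only the standard library. The responses are hypothetical. The interval is the Wilson score interval, one common choice among several, used here because it behaves better than the simple normal-approximation interval for small samples or percentages near 0%, such as the 3.3% in the table below.

```python
import math
import statistics


def wilson_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion, returned in percent."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return 100 * (center - half), 100 * (center + half)


# Hypothetical answers to "How many times per day do you eat fruits or vegetables?"
times_per_day = [1, 2, 2, 3, 5, 0, 4, 6, 2, 3]

n = len(times_per_day)
five_plus = sum(1 for t in times_per_day if t >= 5)
low, high = wilson_ci(five_plus, n)
mean = statistics.mean(times_per_day)
sd = statistics.stdev(times_per_day)  # sample standard deviation (n - 1 denominator)

print(f"5+ times/day: {100 * five_plus / n:.1f}% (95% CI {low:.1f}%-{high:.1f}%)")
print(f"Mean (SD): {mean:.2f} ({sd:.2f})")
```

With only 10 respondents the interval runs from roughly 6% to 51%, a reminder of how imprecise estimates from small groups can be.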

Example: Health Status Indicators as Percentages and/or Means by Job Type

                                                          Job Type
Health Status Indicator                                   Management    Clerical      Engineering
Consumes fruits & vegetables 5 or more times/day          3.3%          23.7%         7.4%
Mean (SD) times per day fruits & vegetables consumed      2.23 (1.28)   3.52 (2.53)   2.93 (3.38)
  • Assess the substantive and/or statistical significance of the differences between the comparison groups. Does a difference matter substantively (e.g., in terms of the number of times per day fruits and vegetables are consumed)? In the table above, for instance, clerical employees report eating fruits and vegetables about 1.3 more times per day than management employees (3.52 vs. 2.23); a formal test would show whether that difference is statistically significant

Software – Formal tests are needed to assess the statistical significance of group differences; these may include t-tests (difference in means between two groups), ANOVA (differences in means among more than two groups), or logistic regression (differences in odds or prevalence). Programs that perform these analyses include SAS, SPSS, and STATA.
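To make the significance question concrete, here is a minimal Python sketch of Welch's two-sample t-test computed from summary statistics, applied to the clerical and management means and SDs in the fruit and vegetable table. The table does not give group sizes, so n = 60 per group is an assumption, and the p-value uses a normal approximation because the Python standard library has no t distribution; a statistics package would report the exact t-based p-value.

```python
import math
from statistics import NormalDist


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Means and SDs from the fruit and vegetable table; group sizes (60 each) are assumed.
t, df = welch_t(3.52, 2.53, 60, 2.23, 1.28, 60)  # clerical vs. management

# Two-sided p-value from the normal approximation; adequate only for fairly large
# groups, since it ignores the heavier tails of the t distribution.
p = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"difference = {3.52 - 2.23:.2f} times/day, t = {t:.2f}, df = {df:.1f}, p ~ {p:.4f}")
```

Under these assumed group sizes the difference is clearly significant (t is about 3.5); with much smaller groups it might not be, which is why group sizes belong alongside the table. Comparing all three job types at once would call for ANOVA instead of pairwise t-tests.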

1. Groves RM, Dillman DA, Eltinge JL, Little RJA. Survey Nonresponse. New York: John Wiley & Sons; 2002.