Combining Data across NSFG File Releases from 2011-2019
This page is intended to provide information for users who wish to combine NSFG data from multiple 2-year file releases from 2011-2019 for their research. There are no public-use datasets specifically provided for this purpose. Please see below for technical guidance on using the specially designed case weights to create 4-, 6-, and 8-year NSFG files for analysis, and considerations for the use of these combined files.
For information on combinations that include data from 2006-2010 or older, refer to Appendix 2 from the 2013-2015 NSFG User’s Guide [PDF – 279 KB].
Based on the continuous NSFG fieldwork period from September 2011 through September 2019, separate four-, six-, and eight-year combined-file case weights have been provided for all possible combinations that data users may wish to analyze. These specially prepared combined-file case weights are designed to represent population totals for men and women aged 15-44 (or 15-49) at the approximate midpoint of data collection over these fieldwork periods, as shown in the table below:
|File Combinations||Final post-stratified, fully adjusted case weight||Reference point for population totals|
|2011 – 2015||WGT2011_2015||July 2013|
|2013 – 2017||WGT2013_2017||July 2015|
|2015 – 2019||WGT2015_2019||July 2017|
|2011 – 2017||WGT2011_2017||July 2014|
|2013 – 2019||WGT2013_2019||July 2016|
|2011 – 2019||WGT2011_2019||July 2015|
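As an illustration only, the table above can be held in a small lookup so that an analysis script picks the weight variable for a given span and fails loudly on a span that has no combined-file weight. The function name `combined_weight` and the dictionary layout are hypothetical; the weight names and reference points are copied from the table.

```python
# Combined-file weights and reference points, copied from the table above.
# Keys are (first year, last year) of the combined span.
COMBINED_WEIGHTS = {
    (2011, 2015): ("WGT2011_2015", "July 2013"),
    (2013, 2017): ("WGT2013_2017", "July 2015"),
    (2015, 2019): ("WGT2015_2019", "July 2017"),
    (2011, 2017): ("WGT2011_2017", "July 2014"),
    (2013, 2019): ("WGT2013_2019", "July 2016"),
    (2011, 2019): ("WGT2011_2019", "July 2015"),
}


def combined_weight(start_year, end_year):
    """Return (weight variable, reference point) for a combined span.

    Hypothetical helper: raises ValueError for spans not listed in the table,
    such as a single 2-year release.
    """
    try:
        return COMBINED_WEIGHTS[(start_year, end_year)]
    except KeyError:
        raise ValueError(
            f"no combined-file weight for {start_year}-{end_year}"
        ) from None


name, reference = combined_weight(2011, 2019)
```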
For some analyses of a combined file, the weights from each individual 2-year data release are more appropriate than the single prepared combined-file case weight. The choice depends on the goal of the analysis.
If your goal is to compare point estimates (such as percentages or means) between survey periods across your combined NSFG data span and not to estimate population sizes for the full span, use the individual 2-year file weights provided with each data release. For this approach, you will need to create a variable to indicate each of the 2-year survey periods, or their respective midpoints of 2012, 2014, 2016 or 2018.
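A minimal sketch of this approach, using toy records rather than NSFG data: each 2-year file is stacked with a survey-period indicator and its own weight, and a weighted percentage is computed within each period. The 2-year weight names (`WGT2011_2013`, `WGT2013_2015`), the `outcome` field, and the function names are assumptions made for the example, and the sketch covers point estimates only, not standard errors.

```python
# Assumed layout: each 2-year file carries its own final weight under the
# variable name shown here, and the midpoint year labels the survey period.
PERIODS = {
    "2011-2013": {"midpoint": 2012, "weight_var": "WGT2011_2013"},
    "2013-2015": {"midpoint": 2014, "weight_var": "WGT2013_2015"},
}


def stack_with_period(files):
    """Stack 2-year files, adding a period indicator and a common weight field."""
    stacked = []
    for period, rows in files.items():
        info = PERIODS[period]
        for row in rows:
            stacked.append({
                "CASEID": row["CASEID"],
                "outcome": row["outcome"],          # hypothetical 0/1 outcome
                "period": period,
                "midpoint": info["midpoint"],
                "weight": row[info["weight_var"]],  # that release's own weight
            })
    return stacked


def weighted_percent_by_period(stacked):
    """Weighted percentage with outcome == 1, separately for each period."""
    sums = {}
    for r in stacked:
        num, den = sums.get(r["period"], (0.0, 0.0))
        sums[r["period"]] = (num + r["weight"] * r["outcome"], den + r["weight"])
    return {p: 100.0 * num / den for p, (num, den) in sums.items()}


# Toy records, not real NSFG values.
files = {
    "2011-2013": [
        {"CASEID": 1, "outcome": 1, "WGT2011_2013": 3000.0},
        {"CASEID": 2, "outcome": 0, "WGT2011_2013": 1000.0},
    ],
    "2013-2015": [
        {"CASEID": 3, "outcome": 1, "WGT2013_2015": 2000.0},
        {"CASEID": 4, "outcome": 0, "WGT2013_2015": 2000.0},
    ],
}
stacked = stack_with_period(files)
percents = weighted_percent_by_period(stacked)
```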
If your goal is to estimate population sizes, as well as other point estimates, for the full span of your combined file, then use the combined-file weights, which treat the combined survey periods as a single survey period for purposes of statistical inference. Estimating population sizes for a combined file using the separate 2-year case weights will result in inflated, inaccurate population size estimates. The combined-file weights have been appropriately scaled and defined to represent the NSFG population of inference at the midpoint of the full span of years for each combined-file option.
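The inflation can be seen with toy numbers. In the sketch below, each hypothetical 2-year weight set sums to the same population of 100, so summing them over a stacked file counts that population once per release. The rescaled weights are a stand-in for illustration only; dividing by the number of releases is not a description of how the combined-file weights were actually derived.

```python
# Toy 2-year weights: each release on its own represents a population of 100.
two_year_weights = {
    "2011-2013": [40.0, 60.0],
    "2013-2015": [55.0, 45.0],
}

# Summing the separate 2-year weights over the stacked file counts the
# population once per release.
naive_total = sum(sum(ws) for ws in two_year_weights.values())

# Illustrative stand-in for a combined-file weight (an assumption, not the
# actual method): rescale so the stacked file represents one population.
n_releases = len(two_year_weights)
combined_weights = [w / n_releases for ws in two_year_weights.values() for w in ws]
combined_total = sum(combined_weights)
```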
While combining multiple 2-year NSFG data files may be beneficial for analyses of rare events that require a larger sample size, analysts should use caution in interpreting estimates based on data from the entire time period, as it may not be appropriate to estimate or interpret such “weighted averages” across broad spans of years. For example, if the estimates from separate 2-year data files vary significantly, the estimate derived from the combined data may be misleading. Also, variations seen across the separate 2-year data files may be due to changes in the outcome of interest and/or changes in the population composition over time (particularly with age, race, and other factors for which the weights have been adjusted or post-stratified).
Beginning in September 2015, the NSFG expanded its age range from 15-44 to 15-49. Users should note that when combining the 2015-2017 and 2017-2019 NSFG with older data files, respondents aged 45-49 will not have values on the four-year, six-year, and eight-year file weights. If your analysis requires use of NSFG data prior to 2015, you will be limited to the age range of 15-44.
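In practice this means restricting the analytic file to respondents who have a value on the chosen combined-file weight. A minimal sketch, assuming missing weights are represented as `None` after the merge and using a hypothetical `AGE` field and toy values:

```python
def with_valid_weight(rows, weight_var):
    """Keep only respondents that have a value on the chosen weight."""
    return [r for r in rows if r.get(weight_var) is not None]


# Toy records: the respondent aged 47 has no value on a weight for a span
# that includes pre-2015 data.
rows = [
    {"CASEID": 101, "AGE": 32, "WGT2011_2019": 5400.0},
    {"CASEID": 102, "AGE": 47, "WGT2011_2019": None},
    {"CASEID": 103, "AGE": 44, "WGT2011_2019": 6100.0},
]
analytic = with_valid_weight(rows, "WGT2011_2019")
```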
The table below shows the total numbers of respondents with weight values corresponding to each 4-, 6-, and 8-year file combination of NSFG data:
|File Combinations||Age range with valid value on weight||Total respondents||Total male respondents||Total female respondents|
|2011 – 2015||15-44||20,621||9,321||11,300|
|2013 – 2017||15-44||19,095||8,505||10,590|
|2015 – 2019||15-49||21,441||9,746||11,695|
|2011 – 2017||15-44||29,511||13,320||16,191|
|2013 – 2019||15-44||29,137||13,129||16,008|
|2011 – 2019||15-44||39,553||17,944||21,609|
When selecting the variables for your analyses, you may wish to consult Appendixes 4b [PDF – 471 KB] and 4c [PDF – 101 KB], which provide crosswalks of comparable recodes across female and male data for 2011-2013, 2013-2015, 2015-2017, and 2017-2019. You may also find helpful the summary of questionnaire changes made since the 2015-2017 NSFG (Appendix 5) [PDF – 150 KB].
Of note, when combining NSFG data from multiple 2-year file releases, SAS users may receive a warning about variable lengths differing for some variables. Please see Appendix 2 [PDF – 146 KB] from the 2015-2017 NSFG User’s Guide for further explanation about what may trigger this warning and how to handle a particular case with the QUARTER variables when combining 2011-2013 data with later file releases.
Below are links for ASCII files—one for male respondents and one for female respondents—containing weights for all available 4-, 6-, and 8-year combined files for the period 2011-2019. The variable CASEID is included for merging these weights into the combined data file a user has created.
• SAS Program Set-up Statement for Female Weights File (2011_2019_FemaleWgtSetup.sas)
• SAS Program Set-up Statement for Male Weights File (2011_2019_MaleWgtSetup.sas)
• SPSS Program Set-up Statement for Female Weights File (2011_2019_FemaleWgtSetup.sps)
• SPSS Program Set-up Statement for Male Weights File (2011_2019_MaleWgtSetup.sps)
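For users working outside SAS or SPSS, the merge on CASEID can be sketched as follows. This is a minimal illustration with toy records that assumes the weights file and the combined data file have already been read into lists of dictionaries; `merge_weights` and the `outcome` field are hypothetical names.

```python
def merge_weights(data_rows, weight_rows, weight_var):
    """Attach one combined-file weight to each data row by CASEID.

    Rows whose CASEID is absent from the weights file get None, so they can
    be identified and excluded before analysis.
    """
    lookup = {w["CASEID"]: w[weight_var] for w in weight_rows}
    merged = []
    for row in data_rows:
        out = dict(row)  # copy so the input rows are left unchanged
        out[weight_var] = lookup.get(row["CASEID"])
        merged.append(out)
    return merged


# Toy records, not real NSFG values.
data_rows = [
    {"CASEID": 60001, "outcome": 1},
    {"CASEID": 60002, "outcome": 0},
]
weight_rows = [
    {"CASEID": 60001, "WGT2013_2017": 4200.0},
]
merged = merge_weights(data_rows, weight_rows, "WGT2013_2017")
```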
For additional information or questions about materials on this page, please contact the NSFG staff at firstname.lastname@example.org.