Output Policies and Procedures
The RDC output policy and procedures are implemented to protect the confidentiality of the data. Researcher-generated output may be released if the risk of disclosure is low. The following procedures are designed to minimize disclosure risk:
- The creation of a data set is not appropriate output and will not be released.
- Absolutely NO output will leave the RDC facilities without first being reviewed by an RDC Analyst for possible disclosures of confidential information.
- Output submitted for review MUST match the research questions and the output described in the approved application. If the output described in your approved application is later determined to be intermediate output, then the output will not be released. Review and release of intermediate output is NOT allowed. Your output must be constrained to what is needed for a final research paper or journal article. See the sections below for more details.
- RDC Analysts may apply Cell Suppression Criteria. Guidelines may differ by data system and possibly by survey year because of sample size, sample design, and content. Sometimes specific projects have additional disclosure risk and additional or more stringent cell suppression may be applied.
- Approved output is usually returned to the researcher via email three (3) weeks after the date of request. Voluminous output not intended for a standard journal article or presentation may take more than 3 weeks to review. Please plan ahead to allow this amount of time to receive your approved output.
- Although the output files are reviewed by RDC staff for potential disclosures, it is your responsibility to use the output and statistics in a way that will not pose any additional disclosure risks to study participants.
Intermediate Output
Intermediate output poses a disclosure risk. As a result, your output must be constrained to what is needed for a final research paper or journal article. Intermediate output can be created and used onsite at the RDC, but the RDC does not allow intermediate output to be released.
Examples of intermediate output include:
- tables of unweighted n’s,
- tables of preliminary descriptive statistics,
- large volumes of numbers or estimates,
- large volumes of initial and intermediate regression models,
- and large volumes of tables based on different subsamples.*
*Similar tables based on different subsamples may cause complementary disclosure problems because comparison across tables could reveal information about the sample and individual characteristics.
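To see why similar tables based on different subsamples can create a complementary disclosure problem, consider the following minimal Python sketch. The counts, cell labels, and helper names are hypothetical and were invented for illustration only. The threshold of 5 is borrowed from the small-cell guideline elsewhere in this document. This is not an RDC tool and does not represent the review procedure itself.

```python
# Hypothetical counts, invented purely for illustration.
# Table A covers a full analytic sample; Table B covers a subsample of it.
full_sample = {"condition_present": 48, "condition_absent": 152}
subsample = {"condition_present": 46, "condition_absent": 139}


def implied_counts(larger, smaller):
    """Subtract a subsample table from the table that contains it.

    The result is the table for the records that were excluded from the
    subsample, even though that table was never released.
    """
    return {cell: larger[cell] - smaller[cell] for cell in larger}


def cells_below_threshold(table, threshold=5):
    """Return the cells whose count falls below the threshold."""
    return {cell: n for cell, n in table.items() if n < threshold}


excluded = implied_counts(full_sample, subsample)
risky = cells_below_threshold(excluded)

print(excluded)  # counts for the group left out of the subsample
print(risky)     # cells in that implied table with fewer than 5 records
```

Neither table contains a small cell on its own, yet subtracting one from the other exposes a cell with only two records. This is the kind of cross-table comparison the note above describes.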
Preparing Your Output for Review
The output you generate must match the output described in your approved application. If the output described in your approved application is later determined to be intermediate output, then the output will not be released. For review of your output, follow the steps outlined below and in the Disclosure Manual. Otherwise, you may be asked to return to the RDC to redo your output or amend your application so that your output conforms to RDC policy and your approved application.
Before submitting output to your RDC Analyst for review, you must do the following:
- Make sure it is in a form that can be released by the RDC. Your output must be in a human-readable plain text file format (i.e., files that can be opened and read in Windows Notepad, such as tab-delimited text [.txt] files or comma-separated values [.csv] files). Output in any other file format will not be accepted.
- You must populate the actual tables that will appear in your publication. They must match what was provided in your approved application.
- Remove any output that you feel could lead to the identification of an individual or institution. If you have questions, please ask your RDC Analyst.
- Output that has individual record level information is not permitted. Remove any individual level data from your output.
- Extreme values or values representing an individual must be removed. Examples include minima, maxima, medians, and modes. If a procedure such as Proc Univariate produces extreme observations or the 0, 1, 99, and 100 percentiles, those extreme values must also be removed.
- Recategorize variables where a category has a frequency less than 5. If you are unable to recategorize, then all cells with a frequency less than 5 should be asterisked before they are submitted to your RDC Analyst.
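Some of the checks above can be sketched in code before output is handed to your RDC Analyst. The Python example below is a minimal sketch, not an RDC-provided tool: the function names, field names, statistic labels, and example values are all hypothetical. It asterisks cells with a frequency of less than 5, drops extreme statistics from a summary, and renders the result as comma-separated plain text.

```python
import csv
import io

MIN_CELL_COUNT = 5  # threshold taken from the guideline above

# Hypothetical labels for statistics that must be removed; adjust them to
# match whatever names your own software uses.
EXTREME_STATS = {"min", "max", "median", "mode", "p0", "p1", "p99", "p100"}


def suppress_small_cells(rows, count_field="n"):
    """Return copies of the rows with counts below the threshold asterisked."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        if row[count_field] < MIN_CELL_COUNT:
            row[count_field] = "*"
        cleaned.append(row)
    return cleaned


def drop_extreme_stats(summary):
    """Remove minima, maxima, medians, modes, and extreme percentiles."""
    return {
        name: value
        for name, value in summary.items()
        if name.lower() not in EXTREME_STATS
    }


def to_csv_text(rows):
    """Render rows as comma-separated plain text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example values, invented for illustration.
frequency_rows = [
    {"age_group": "20-29", "n": 37},
    {"age_group": "30-39", "n": 3},
]
summary = {"mean": 27.4, "std": 3.1, "min": 18, "max": 64, "p99": 60}

csv_text = to_csv_text(suppress_small_cells(frequency_rows))
safe_summary = drop_extreme_stats(summary)
print(csv_text)
print(safe_summary)
```

This sketch handles only the mechanical steps. It does not recategorize variables, and it does not catch suppressed cells that could be recovered from row or column totals, so a manual check of the final tables is still needed.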
Submitting Output for Review
- Provide a description of the output. (This can be a title: a regression of…)
- Provide descriptions of any (sub)sample used in the analysis and output. (e.g., black males age 20-29)
- Send a request to your RDC Analyst stating that your output is ready for review.