Chapter 10 covers the National Health and Nutrition Examination Survey (NHANES). The NHANES program, run by the National Center for Health Statistics (NCHS), began in the early 1960s to monitor trends in the health and nutritional status of all individuals in the United States (US).1 Since 1999, the NHANES has been conducted as a continuous survey. Compared with other surveys, the NHANES is unusual in that it collects data both from subjective interviews and from objective physical examinations and laboratory tests. Objective measures are collected at mobile examination centers.1 This chapter describes how data are collected; how data are made publicly available as machine-actionable data files; which variables must be included to address the design features of the complex sample; the strengths and limitations of the survey; practical tips for conducting statistical analysis; and how to answer research questions using a case study. The practical tips for analyzing NHANES data are based on the author’s previous experience using NHANES data to examine associations between predisposing and enabling factors and health behaviors, morbidity, mortality, and health services use. The NHANES case study explores racial and ethnic differences in adherence to the 24-hour movement guidelines, specifically the sedentary behavior guideline. This objective is part of a series of research questions designed to evaluate how physical activity, sleep, and sedentary behavior are associated with cognitive health outcomes among adults in the US. Most of the chapter is devoted to Section 10.6, NHANES Case Study, which gives readers hands-on practice downloading and cleaning large databases and conducting basic categorical data analysis using PROC SURVEYFREQ and PROC SURVEYLOGISTIC. The syntax provided was written for SAS 9.4.
10.2 Data Collection
The NHANES uses a cross-sectional study design to collect data from personal interviews, physical examinations, and laboratory tests among noninstitutionalized adults and children. Data are compiled and released as public-use data files in 2-year cycles.2 From 2011 to 2014, the complex sample design included 13 major strata, 4 minor strata, and 8 primary sampling units (PSUs).2 The design oversampled Hispanic, non-Hispanic Black, and non-Hispanic Asian persons; persons with incomes below 130% of the federal poverty level; and persons aged 80 years and older. Data were collected from five US regions, with California treated as a separate group. Starting in 2015, the design changed to 14 major strata, 4 minor strata, and 4 PSUs.3 In the new design, persons with incomes below 185% (instead of 130%) of the federal poverty level were oversampled, and California was no longer separated from the other states.
10.3 Data Files
The most recent NHANES iterations (2017-2020) included demographic, dietary, medical examination, laboratory, questionnaire, and limited access data.4 Data were collected in participants’ households as well as mobile examination centers.
10.3.1 Demographic Data
The demographic data file includes each participant’s gender, age, marital status, language preference, race, and ethnicity.4 It also includes questions on place of birth; participants who report being born outside the US are asked about their citizenship status and how long they have lived in the US. Although the NHANES collects data on countries of birth outside the US, these details are not available to the public. The demographic data file also includes socioeconomic items, such as highest level of education and income, and questions about military service. Pregnancy status is reported for women aged 20 to 44 years. For the 2017-2020 pre-pandemic data files, the primary sampling unit, stratum, and weighting variables are located in the demographic data file.4,5
10.3.2 Dietary Data
The NHANES dietary interviews collect data on food and beverage intake during the 24 hours before the interview.6 The first dietary interview is conducted in person at the mobile examination center, and a follow-up interview is conducted by telephone. For participants younger than 6 years, dietary data are collected from a proxy adult. For children aged 6-8 and 9-11 years, interviews are conducted with the child and a proxy adult. Participants aged 12 years and older complete the interviews themselves. The in-person interview uses a set of household items to help participants estimate the amounts of food consumed, and participants are given these items to take home for the telephone interview conducted 3 to 10 days later.6 Several data files include variables from the dietary interviews. NHANES provides first- and second-day data files for individual foods and total nutrient intakes, as well as data files on dietary supplement use over the past 24 hours and 30 days.6 The individual foods files include detailed responses on the types of food combinations eaten (e.g., cereal, soup, salad, tortilla products), their source (e.g., store – grocery/supermarket, restaurant with waiter/waitress, K-12 school, or childcare center), and nutrients (e.g., total folic acid [mcg], potassium [mg], energy [kcal]).6
10.3.3 Examination Data
The NHANES examination files comprise data from multiple procedures that measure the health of participants.7 All examinations are conducted in the mobile examination centers. Examples include audiometry, anthropometry and body measures, balance, blood pressure, cardiovascular fitness, dermatology, muscle strength, and oral health.7
10.3.4 Laboratory Data
The NHANES laboratory files comprise results of multiple tests conducted on participants’ biological specimens.8 All specimens are collected in the mobile examination centers and sent to a laboratory for testing. Laboratory tests are conducted on blood, urine, and other biospecimens (e.g., hair, nasal swabs, plasma).8 Examples of laboratory tests include cholesterol, folate, glycohemoglobin, insulin, fasting plasma glucose, and mercury.8
10.3.5 Questionnaire Data
The NHANES questionnaire files include self-reported data on the health and wellness of participants.9 Questionnaire data are collected in the household, in the mobile examination centers, and by telephone. Topics range from socioeconomic status (e.g., occupation, income), acculturation, and weight history to preventive health behaviors (e.g., immunizations, smoking, physical activity), health conditions (e.g., diabetes, kidney conditions, osteoporosis), food security, and health care access and utilization.9
10.3.6 Limited Access Data
The NHANES limited access data files cover sensitive topics among youth and adults.10 Questionnaire data and biospecimens are collected with special precautions in place to ensure confidentiality. Examples of limited access data include biospecimens used to measure lead in blood and sexually transmitted diseases (e.g., chlamydia, herpes simplex viruses, HIV antibodies) and questionnaire data on drug use, alcohol use, and sexual behaviors.10 Limited access data files are not available for public use. Researchers must apply for access through the National Center for Health Statistics’ Research Data Centers.10
10.3.7 Linked Data
The NHANES can be linked to National Death Index (NDI) and Medicare data.11 Efforts are underway to link NHANES data with Housing and Urban Development (HUD) and Medicaid data.11 The purpose of linking NHANES and NDI data is to examine how multiple risk factors related to health and nutrition are associated with mortality. Linked NHANES/NDI data can be accessed as public-use data files or through a restricted data application. Public-use linked data files are only provided for adults and do not provide specific dates for birth, interviews and death, or specific causes of death beyond standard categories. Linkages with Medicare data allow for research focused on health status, health care costs, health care utilization, and prescription drug use among Medicare enrollees.11 To access more specific details of the linked data, researchers must apply for access through the National Center for Health Statistics’ Research Data Centers.11
10.3.8 Restricted Data
The NHANES restricts data on geography (Census 2010 Block ID), genetics (e.g., BRCA1-associated protein), and the exact dates of participants’ interviews and examinations. To use these data, researchers must apply for access through the National Center for Health Statistics’ Research Data Centers.12
10.4 Strengths and Limitations
There are several advantages to using the NHANES for research. First, self-reported measures from personal interviews can be validated against objective measurements from physical examinations. Second, laboratory values make it possible to identify undiagnosed diseases, such as diabetes mellitus. Third, acculturation variables are collected among Hispanic and non-Hispanic Asian participants. However, NHANES data also have limitations. First, the annual sample size is smaller than that of other national surveys, such as the National Health Interview Survey (NHIS). Second, the public-use place of birth variable distinguishes only US-born from foreign-born participants; because country of birth is not publicly released, foreign-born groups cannot be disaggregated. Third, the data are split across many component files, so each survey cycle requires multiple merges.
10.5 Design Features
Data analysts must use special procedures to account for the complex sample design of the NHANES. Survey procedures must include variables that identify the primary sampling units, strata, and weights for each cycle of the continuous data files. The complex design variables were changed for the 2017-2020 pre-pandemic data files because data collection for the 2019-2020 cycle was halted in March 2020 and that cycle was incomplete, so its data were combined with the 2017-2018 cycle.5 For 2017-2020, the complex sample design variables are available in the “Demographic Variables and Sample Weights” file. Researchers must choose a weight based on the aims of their study. For studies that use only interview data, the interview weight is most appropriate. For studies that use outcomes from the examination or laboratory data files, the mobile examination center (MEC) weight is most appropriate. The primary sampling unit, stratum, and weighting variables for the 2017-2020 pre-pandemic data files are summarized in Table 10.1.
Table 10.1. Overview of complex sample design variables for NHANES 2017-2020 pre-pandemic cycle
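|Design feature|Variable name|Variable label|
|---|---|---|
|Primary Sampling Unit|SDMVPSU|Masked variance pseudo-PSU|
|Stratum|SDMVSTRA|Masked variance pseudo-stratum|
|Interview weight|WTINTPRP|Full sample interview weight|
|Mobile Examination Center weight|WTMECPRP|Full sample MEC exam weight|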
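As a sketch of how these design variables enter a SAS survey procedure, the program below estimates mean BMI, an examination measure, and therefore uses the MEC weight. It assumes the folder C:\NHANES\2017-2020 and a hypothetical merged data set NH.NHANES_MERGED that combines P_DEMO and P_BMX by SEQN; for an estimate based only on interview data, WTINTPRP would replace WTMECPRP.

```sas
/* Assumed location of the permanent data sets */
libname nh "C:\NHANES\2017-2020";

/* Design-based mean BMI for the 2017-March 2020 pre-pandemic sample */
proc surveymeans data=nh.nhanes_merged mean stderr;
  strata  sdmvstra;   /* masked variance pseudo-stratum                  */
  cluster sdmvpsu;    /* masked variance pseudo-PSU                      */
  weight  wtmecprp;   /* MEC exam weight, because BMXBMI is an exam item */
  var     bmxbmi;     /* body mass index (kg/m**2) from P_BMX            */
run;
```

Dropping the STRATA and CLUSTER statements would still produce a weighted mean, but the standard error would be computed as if the data came from a simple random sample, which usually understates it for a clustered design such as the NHANES.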
10.6 NHANES Case Study
In 2018, the Canadian Society for Exercise Physiology convened to develop the Canadian 24-Hour Movement Guidelines for adults aged 18-64 years and adults aged 65 years and older. The guidelines integrate recommendations for sleep, physical activity, and sedentary behavior, on the rationale that the combination of these behaviors throughout the day is associated with improved health outcomes.13 Adults aged 18-64 years are advised to get 7 to 9 hours of good-quality sleep on a regular basis, with consistent bed and wake-up times. They are also advised to perform a variety of types and intensities of physical activity, including: 1) moderate to vigorous aerobic physical activities accumulating to at least 150 minutes per week; 2) muscle-strengthening activities using major muscle groups at least twice a week; and 3) several hours of light physical activities, including standing.13 Adults are advised to limit sedentary behavior to 8 hours (480 minutes) or less, including no more than 3 hours of recreational screen time, and to break up long periods of sitting as often as possible.13 Recommendations differ slightly for older adults. Adults aged 65 years and older are advised to get 7 to 8 hours of good-quality sleep on a regular basis, with consistent bed and wake-up times. In addition to the physical activity recommendations for adults aged 18-64 years, older adults are advised to engage in physical activities that challenge balance.13 The sedentary behavior recommendations are the same for adults aged 18-64 years and those aged 65 years and older. Although US exercise physiology and health organizations set goals and standards for each of these health behaviors, there has been no comparable effort to integrate these behaviors or to use them together to examine health disparities.
In this case study, we examine racial and ethnic differences in sedentary behavior among US- and foreign-born Hispanic, non-Hispanic White, non-Hispanic Black, and non-Hispanic Asian adults. Future research will incorporate physical activity and sleep behaviors.
10.6.1 Specific Aims
- Aim 10.1: Compare the prevalence of adherence to 24-hour sedentary behavior guidelines in US adults by race, ethnicity, and nativity status
- Aim 10.2: Determine associations between race, ethnicity, and nativity and sedentary guideline adherence among racially and ethnically diverse foreign-born adults compared to their US-born counterparts
Complete the following steps to download, clean, recode and analyze NHANES data to answer the specific aims.
Step 1: Download demographics, questionnaire, and examination datasets
Follow the steps below to download and store the necessary data files for the NHANES case study and import them into SAS 9.4.
- Create a folder in a permanent location to save your data files. I recommend creating a folder on the “C Drive” titled NHANES with a subfolder to identify the years (2017-2020). This will align with the examples in this textbook.
- Go to the NHANES 2017-March 2020 pre-pandemic data website.
- Under “Data, Documentation, Codebooks,” click “Demographics Data.”
- Click “P_DEMO Data [XPT – 3.4 MB]” under “Data File.” The file should download automatically and appear at the bottom of the browser or in your “Downloads” folder. It is a “SAS Xport Transport File,” which can be opened in SAS and saved as a standard SAS data set.
- Return to the NHANES 2017-March 2020 pre-pandemic data website.
- Under “Data, Documentation, Codebooks,” click “Questionnaire Data.”
- In the row for Physical Activity, click “P_PAQ Data [XPT – 1.3 MB]” under “Data File.” The file should download automatically and appear at the bottom of the browser or in your “Downloads” folder. It is also a “SAS Xport Transport File.”
- Return to the NHANES 2017-March 2020 pre-pandemic data website.
- Under “Data, Documentation, Codebooks,” click “Examination Data.”
- In the row for Body Measures, click “P_BMX Data [XPT – 2.4 MB]” under “Data File.” The file should download automatically and appear at the bottom of the browser or in your “Downloads” folder. It is also a “SAS Xport Transport File.” Move all three downloaded files into the NHANES folder created above.
Step 2: Open SAS transport files in SAS and save to permanent datasets for merge
In SAS, double-click each SAS Xport transport file saved in your NHANES folder on the C drive to open it.
- Three temporary work files should be created once these files are opened: P_DEMO, P_PAQ, and P_BMX.
- Open a new SAS program editor window by clicking “New” (the blank white page icon).
- Enter the syntax provided in Box 10.1 to merge the data files. The files are merged by SEQN, the respondent sequence number that identifies each participant.
Box 10.1. Sample SAS program to merge 2017-2020 Pre-Pandemic NHANES data files
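A minimal sketch of such a merge program is shown below. Instead of double-clicking the files, it reads them with the SAS XPORT engine, and it assumes the three files were saved under their default names (P_DEMO.XPT, P_PAQ.XPT, P_BMX.XPT) in C:\NHANES\2017-2020. The library names and the output data set NH.NHANES_MERGED are illustrative, not the author’s original program.

```sas
/* Permanent library in the folder created in Step 1 (assumed path) */
libname nh "C:\NHANES\2017-2020";

/* XPORT engine librefs pointing at each downloaded transport file */
libname xdemo xport "C:\NHANES\2017-2020\P_DEMO.XPT";
libname xpaq  xport "C:\NHANES\2017-2020\P_PAQ.XPT";
libname xbmx  xport "C:\NHANES\2017-2020\P_BMX.XPT";

/* Copy the members (P_DEMO, P_PAQ, P_BMX) into the permanent library */
proc copy in=xdemo out=nh memtype=data; run;
proc copy in=xpaq  out=nh memtype=data; run;
proc copy in=xbmx  out=nh memtype=data; run;

/* MERGE with BY requires each data set to be sorted by SEQN */
proc sort data=nh.p_demo; by seqn; run;
proc sort data=nh.p_paq;  by seqn; run;
proc sort data=nh.p_bmx;  by seqn; run;

/* Match-merge on SEQN; keep every participant in the demographic file */
data nh.nhanes_merged;
  merge nh.p_demo (in=in_demo)
        nh.p_paq
        nh.p_bmx;
  by seqn;
  if in_demo;
run;
```

Keeping every P_DEMO record, rather than only participants with both questionnaire and examination data, retains the full sample so that later analyses can restrict to subgroups through domain (subpopulation) statements instead of deleting observations.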
Step 3: Keep only the variables that you need
Once the data files are merged, it is recommended to keep only the variables needed for the analysis. Removing unneeded variables allows the SAS program to run and present results faster. In this case study, I have kept the variables listed in Table 10.2, which specify the survey design features and are used to create the independent variables, dependent variables, and selected covariates.
Table 10.2. Overview of variables used for NHANES case study
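|Variable name|Variable label|Data file|
|---|---|---|
|SEQN|Respondent sequence number|P_DEMO|
|WTINTPRP|Full sample interview weight|P_DEMO|
|SDMVPSU|Masked variance pseudo-PSU|P_DEMO|
|SDMVSTRA|Masked variance pseudo-stratum|P_DEMO|
|RIDRETH3|Race/Hispanic origin w/ NH Asian|P_DEMO|
|DMDBORN4|Country of birth|P_DEMO|
|PAD680|Minutes sedentary activity|P_PAQ|
|RIDAGEYR|Age in years at screening|P_DEMO|
|BMXBMI|Body Mass Index (kg/m^2)|P_BMX|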
A sample SAS program with a keep statement that includes only the variables needed is provided in Box 10.2.
Box 10.2. Sample SAS program to keep only variables needed for case study
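A minimal sketch, assuming the merged data set is named NH.NHANES_MERGED (a hypothetical name) in the library defined below. RIAGENDR (gender) is not listed in Table 10.2; it is included here only as an assumption, because Step 6 mentions gender as a covariate.

```sas
libname nh "C:\NHANES\2017-2020";   /* assumed folder from Step 1 */

/* Keep only the design variables, analysis variables, and covariates */
data nh.nhanes_cs;
  set nh.nhanes_merged (keep=seqn wtintprp sdmvpsu sdmvstra  /* design      */
                             ridreth3 dmdborn4                /* independent */
                             pad680                           /* dependent   */
                             ridageyr riagendr bmxbmi);       /* covariates  */
run;
```

Using KEEP= as a data set option on the input data set, rather than a KEEP statement, means SAS reads only the listed variables, which is what makes subsequent steps faster.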
Step 4: Recode and rename variables
Questionnaire and examination responses often need to be recoded, or response categories collapsed, before statistical analysis. For example, the NHANES uses the response options “7777=Refused” and “9999=Don’t know” for several questions. These responses are usually set to missing before analysis. In addition, the numeric codes for some responses may need to be changed so that results are easier to interpret. For example, NHANES yes/no questions use “1=Yes” and “2=No,” and it is common practice to recode “no” responses to 0 (“0=No”). It is best practice to store recoded values in a new variable with a new name rather than overwriting the original variable. Two or more variables may also need to be combined to create the independent, dependent, or other variables needed to answer the study aims. In this case study, we examine racial and ethnic differences in sedentary behavior by nativity status; therefore, we combine two variables: 1) race and ethnicity and 2) country of birth. The variables recoded and renamed for this case study are summarized in Table 10.3.
Table 10.3. Overview of NHANES variables and recodes
|Variable|NHANES values|Recoded values|
|---|---|---|
|Race/Hispanic origin w/ non-Hispanic (NH) Asian (RIDRETH3)| | |
|Country of birth (DMDBORN4)|1=Born in 50 US states/DC| |
|Minutes sedentary behavior (PAD680)|0-1320=Range of values| |
|Age in years at screening (RIDAGEYR)| | |
|Body Mass Index (kg/m^2) (BMXBMI)|11.9 to 92.3=Range of values|1=Healthy or underweight; Overweight (BMI>=25 and BMI<=29.9)|
A sample SAS program for recoding and renaming NHANES data for this case study is provided in Box 10.3. All recodes are available in the full syntax file provided on the course website.
Box 10.3. Sample SAS program to recode and rename NHANES variables
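A minimal sketch of these recodes follows. The new variable names (NATIVITY, RACE_ETH, RACE_NAT, SED_MIN, SED_ADHERE, BMI_CAT, ADULT), their numeric codes, and the obese category (BMI >= 30) are illustrative assumptions rather than the author’s coding. The 480-minute cut point follows the sedentary guideline described in Section 10.6, and the input data set NH.NHANES_CS is assumed to contain the Table 10.2 variables.

```sas
libname nh "C:\NHANES\2017-2020";   /* assumed folder from Step 1 */

data nh.nhanes_recode;
  set nh.nhanes_cs;

  /* Nativity from DMDBORN4 (1=born in 50 US states/DC, 2=other); 77/99 -> missing */
  if dmdborn4 = 1 then nativity = 1;        /* US-born      */
  else if dmdborn4 = 2 then nativity = 2;   /* foreign-born */
  else nativity = .;

  /* Race and ethnicity from RIDRETH3 (1=Mexican American, 2=Other Hispanic,
     3=NH White, 4=NH Black, 6=NH Asian, 7=Other/multiracial) */
  if ridreth3 in (1, 2) then race_eth = 1;  /* Hispanic */
  else if ridreth3 = 3 then race_eth = 2;   /* NH White */
  else if ridreth3 = 4 then race_eth = 3;   /* NH Black */
  else if ridreth3 = 6 then race_eth = 4;   /* NH Asian */
  else race_eth = 5;                        /* other race, not analyzed */

  /* Combined race/ethnicity by nativity: odd codes US-born, even codes foreign-born
     (1-2 Hispanic, 3-4 NH White, 5-6 NH Black, 7-8 NH Asian) */
  if race_eth <= 4 and nativity ne . then race_nat = (race_eth - 1)*2 + nativity;
  else race_nat = .;

  /* Sedentary minutes: keep 0-1320; 7777 (refused) and 9999 (don't know) -> missing */
  if 0 <= pad680 <= 1320 then sed_min = pad680;
  else sed_min = .;

  /* Guideline adherence: 480 minutes (8 hours) or less = 1, otherwise 0 */
  if sed_min = . then sed_adhere = .;
  else if sed_min <= 480 then sed_adhere = 1;
  else sed_adhere = 0;

  /* BMI category: 1=healthy or underweight (<25), 2=overweight (25-29.9), 3=obese (>=30) */
  if bmxbmi = . then bmi_cat = .;
  else if bmxbmi < 25 then bmi_cat = 1;
  else if bmxbmi < 30 then bmi_cat = 2;
  else bmi_cat = 3;

  /* Adult flag used to define the analytic domain instead of deleting records */
  adult = (ridageyr >= 18);
run;
```

Each recode writes to a new variable, so the original NHANES variables remain available for checking the recodes (for example, with PROC FREQ on PAD680 against SED_ADHERE).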
Step 5: Conduct Descriptive Statistical Analysis
Once all variables are recoded, collapsed, and renamed, they can be used for statistical analysis. Statistical analysis should always begin with descriptive analysis to characterize the study sample. Chi-square tests should then be conducted to examine bivariate associations between the independent variables, covariates, and dependent variable. Because of the complex sample design, all analyses of NHANES data must be conducted with SAS survey procedures. The weight (WTINTPRP for questionnaire data), primary sampling unit (SDMVPSU), and stratum (SDMVSTRA) variables must be included in the programming statements.
A sample SAS program for conducting chi-square tests using 2017-2020 pre-pandemic NHANES data for this case study is provided in Box 10.4.
Box 10.4. Sample SAS program for running descriptive statistics (chi-square)
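A minimal sketch, assuming a recoded data set NH.NHANES_RECODE that contains the hypothetical variables ADULT (1=aged 18 years or older), RACE_NAT (eight race/ethnicity-by-nativity groups), RACE_ETH (race and ethnicity group), NATIVITY (1=US-born, 2=foreign-born), and SED_ADHERE (1=480 minutes or less of sedentary time, 0=more). CHISQ in PROC SURVEYFREQ requests the design-adjusted Rao-Scott chi-square test.

```sas
libname nh "C:\NHANES\2017-2020";   /* assumed folder from Step 1 */

proc surveyfreq data=nh.nhanes_recode;
  strata  sdmvstra;   /* masked variance pseudo-stratum             */
  cluster sdmvpsu;    /* masked variance pseudo-PSU                 */
  weight  wtintprp;   /* interview weight (questionnaire outcome)   */

  /* ADULT is a layer variable: a separate table and Rao-Scott chi-square test
     are produced for ADULT=0 and ADULT=1; interpret the ADULT=1 table.
     ROW gives the weighted prevalence of adherence within each group (Aim 10.1). */
  tables adult*race_nat*sed_adhere / row chisq;

  /* US- vs foreign-born adherence within each racial and ethnic group */
  tables adult*race_eth*nativity*sed_adhere / row chisq;
run;
```

Using a layer variable rather than deleting participants younger than 18 keeps all strata and PSUs in the variance calculation, which is the usual recommendation for subpopulation analysis of complex survey data.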
Step 6: Conduct Inferential Statistical Analysis
After calculating descriptive statistics, inferential statistical analysis can be conducted. Crude and multivariable logistic regression models can be fitted to determine associations between race, ethnicity, nativity status, and sedentary guideline adherence. Crude (unadjusted) logistic regression models estimate the association between the independent and dependent variables without adjusting for other factors. Multivariable (adjusted) logistic regression models estimate this association after adjusting for potential covariates (e.g., age, gender, BMI). A reference category for the independent variable is needed. For this analysis, the reference group within each racial and ethnic group is the US-born group (e.g., US-born non-Hispanic Whites, US-born non-Hispanic Blacks). A sample SAS program for conducting logistic regression analysis using 2017-2020 pre-pandemic NHANES data for this case study is provided in Box 10.5.
Box 10.5. Sample SAS program for running inferential statistics (logistic regression)
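A minimal sketch, assuming a recoded data set NH.NHANES_RECODE with the hypothetical variables SED_ADHERE (1=adherent, 0=not), NATIVITY (1=US-born, 2=foreign-born), RACE_ETH (race and ethnicity group), ADULT (1=aged 18 years or older), and BMI_CAT (1=healthy or underweight, 2=overweight, 3=obese), plus the NHANES variables RIDAGEYR and RIAGENDR. The DOMAIN statement fits a separate model within each combination of ADULT and RACE_ETH, so each foreign-born group is compared with its US-born counterpart. The sketch keeps the interview weight used in the text; because the adjusted model includes BMI from the examination file, NCHS analytic guidance would generally point to the MEC weight (WTMECPRP) for that model.

```sas
libname nh "C:\NHANES\2017-2020";   /* assumed folder from Step 1 */

/* Crude model: nativity only, fit within each adult race/ethnicity domain */
proc surveylogistic data=nh.nhanes_recode;
  strata  sdmvstra;
  cluster sdmvpsu;
  weight  wtintprp;
  domain  adult*race_eth;
  class   nativity (ref='1') / param=ref;         /* US-born = reference     */
  model   sed_adhere (event='1') = nativity;      /* models odds of adhering */
run;

/* Adjusted model: add age, gender, and BMI category as covariates */
proc surveylogistic data=nh.nhanes_recode;
  strata  sdmvstra;
  cluster sdmvpsu;
  weight  wtintprp;
  domain  adult*race_eth;
  class   nativity (ref='1') riagendr (ref='1') bmi_cat (ref='1') / param=ref;
  model   sed_adhere (event='1') = nativity ridageyr riagendr bmi_cat;
run;
```

In the output, the odds ratio for NATIVITY 2 versus 1 within each ADULT=1 domain is the estimate that addresses Aim 10.2.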
This chapter provided an overview of the NHANES and ways to conduct basic statistical analysis using 2017-2020 pre-pandemic public-use data files. The NHANES case study explored differences in sedentary guideline adherence among US- and foreign-born adults by race and ethnicity. Sample SAS programming statements were provided for downloading and importing data files, merging data files, recoding and renaming variables, and conducting categorical descriptive and inferential statistical analysis. The dataset and full SAS programming statements for the NHANES case study are available in the chapter 10 folder in the Open ICPSR data repository.
- NHANES – About the National Health and Nutrition Examination Survey. Published January 8, 2020. Accessed November 3, 2021. https://www.cdc.gov/nchs/nhanes/about_nhanes.htm
- Chen TC, Parker JD, Clark J, Shin HC, Rammon JR, Burt VL. National Health and Nutrition Examination Survey: Estimation Procedures, 2011-2014. Vital Health Stat 2. 2018;(177):1-26.
- Chen TC, Clark J, Riddles MK, Mohadjer LK, Fakhouri THI. National Health and Nutrition Examination Survey, 2015-2018: Sample Design and Estimation Procedures. Vital Health Stat 2. 2020;(184):1-35.
- NHANES Questionnaires, Datasets, and Related Documentation. Accessed April 4, 2022. https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?Cycle=2017-2020
- Centers for Disease Control and Prevention. NHANES Analytic Guidance and Brief Overview for the 2017-March 2020 Pre-pandemic Data Files. https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/overviewbrief.aspx?Cycle=2017-2020
- NHANES 2017-March 2020 Pre-Pandemic Dietary Data. Accessed April 4, 2022. https://wwwn.cdc.gov/nchs/nhanes/search/DataPage.aspx?Component=Dietary&Cycle=2017-2020
- NHANES 2017-March 2020 Pre-Pandemic Examination Data. Accessed June 30, 2022. https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Examination&Cycle=2017-2020
- NHANES 2017-March 2020 Pre-Pandemic Laboratory Data. Accessed June 30, 2022. https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Laboratory&Cycle=2017-2020
- NHANES 2017-March 2020 Pre-Pandemic Questionnaire Data. Accessed June 30, 2022. https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Questionnaire&Cycle=2017-2020
- NHANES 2017-March 2020 Pre-Pandemic Limited Access Data. Accessed June 30, 2022. https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=LimitedAccess&Cycle=2017-2020
- NCHS Data Linkage – Activities. Published February 14, 2022. Accessed June 30, 2022. https://www.cdc.gov/nchs/data-linkage/index.htm
- RDC – Restricted Data – NHANES. Published August 27, 2021. Accessed June 23, 2022. https://www.cdc.gov/rdc/b1datatype/Dt1222.htm
- Ross R, Chaput JP, Giangregorio LM, et al. Canadian 24-Hour Movement Guidelines for Adults aged 18-64 years and Adults aged 65 years or older: an integration of physical activity, sedentary behaviour, and sleep. Appl Physiol Nutr Metab. 2020;45(10 (Suppl. 2)):S57-S102. doi:10.1139/apnm-2020-0467
- Paulose-Ram R, Graber JE, Woodwell D, Ahluwalia N. The National Health and Nutrition Examination Survey (NHANES), 2021-2022: Adapting Data Collection in a COVID-19 Environment. Am J Public Health. 2021;111(12):2149-2156. doi:10.2105/AJPH.2021.306517
10.9 COVID-19 Pandemic Changes
Sections 10.1 to 10.8 were written during the initial waves of the COVID-19 pandemic. Since the NHANES conducts in-person surveys at participants’ households and mobile examination units, there were significant disruptions to the regular methodology due to stay-at-home orders and safety concerns for both participants and the data collectors. All data collection procedures were halted in March 2020. Several changes were made to the design to: ensure safety of the staff and participants; reduce response burden by only collecting essential data; and provide additional COVID-19 specific content.14 Plans were made for data collection to begin in June 2021; however, data collection procedures continue to remain halted as of this writing. Full details of the changes are reported elsewhere.14