1 Introduction

1.1 Purpose

The purpose of this textbook is to teach future public health professionals, specifically Master of Public Health (MPH) students, how to conduct basic applied data analysis using secondary data collected from national health surveys. This textbook helps to close gaps in knowledge, skills, and analytical abilities that may prevent MPH graduates from succeeding in entry-level public health practice and research-focused positions. A recent study of local health departments demonstrated that entry-level public health professionals lacked the knowledge, skills, and abilities for data collection, database management, data cleaning, quantitative data analysis/statistics, and data analysis using SAS statistical software.1 Using publicly available data from national health surveys, this textbook will allow students to learn and practice data analytic skills with SAS statistical software to answer general surveillance and analytical research questions in preparation for their future public health practice.

1.2 Health Services Research Focus

The examples used in this textbook stem from previous studies and the current research laboratory focus of its primary author, Tiffany Kindratt, PhD, MPH. Established in Fall 2019, Dr. Kindratt’s Health Survey Research (HSR) Lab is housed in the Public Health Program, Department of Kinesiology, College of Nursing and Health Innovation at the University of Texas at Arlington. The goal of the HSR lab is to conduct epidemiologic research studies focused on evaluating predisposing and enabling factors that influence individuals’ health behaviors, morbidity, mortality and use of health services with big data methodologies. This includes the secondary analysis of large national health surveys that use complex samples, such as the Medical Expenditure Panel Survey (MEPS), National Health Interview Survey (NHIS), Health Information National Trends Survey (HINTS), American Community Survey (ACS), and others. Another goal of this lab is to collaborate with multidisciplinary teams and contribute to research studies designed to 1) train and mentor future public health and medical professionals and 2) implement community-based participatory research and quality improvement methodologies in community and clinical settings.

The HSR lab’s research focus was developed to cover a wide range of epidemiologic outcomes and contributing factors using Andersen’s model of health services as its guiding framework.2 Disparities by race, ethnicity, place of birth, and geographic context (urban or rural) are evaluated to determine how individual predisposing factors contribute to health outcomes. Dr. Kindratt’s research incorporates the examination of health disparities among Arab Americans, defined as individuals who were born in or trace their heritage to the Middle East or North Africa. This group is largely underrepresented in health research because its members are classified as non-Hispanic Whites by the United States (US) federal government.3 The lab’s research focus extends Andersen’s model by incorporating patient experiences as contextual enabling factors of health services utilization and evaluating morbidity and mortality outcomes. Patient experiences that are examined include self-reports of qualities and modes of patient-provider communication, patient-provider gender and race concordance, care coordination, and provider satisfaction.

1.3 Outline of Textbook Chapters

This textbook is organized into four sections: 1) introduction to national health surveys; 2) basic applied data analysis; 3) common national health surveys; and 4) dissemination and conclusions.

1.3.1 Textbook Section 1: Introduction to National Health Surveys

The first section includes three chapters. Chapter 1 provides an overview of the textbook by outlining its purpose: to train future public health professionals in the knowledge and skills needed to conduct applied secondary data analysis using national health surveys. Chapter 2 provides a general overview of the surveys used for the case studies presented in this textbook. The national surveys used for the case studies include the NHIS in Chapter 6, the MEPS in Chapter 7, the HINTS in Chapter 8, the Behavioral Risk Factor Surveillance System (BRFSS) in Chapter 9, and the National Health and Nutrition Examination Survey (NHANES) in Chapter 10. Chapter 3 includes a literature review of previous studies that have used national health surveys to answer public health and health services research questions that align with the case studies in each chapter.

1.3.2 Textbook Section 2: Basic Applied Data Analysis

The second section includes two chapters. Chapter 4 reviews basic statistical functions commonly used for public health and health services research questions. It is expected that students who use this textbook will have some background knowledge of research methods and study design; however, this chapter includes some basics for students who do not have a strong foundation in research methodology. Chapter 4 covers basic terminology on the types of data collected, as well as the descriptive statistics (frequencies, percentages, means, standard deviations) and analytical statistical procedures (chi-square, logistic regression) used for analysis of national health surveys. Chapter 5 details the additional survey design features that need to be considered when analyzing complex surveys. These include using survey procedures such as PROC SURVEYFREQ and specifying weights, primary sampling units, and stratum variables. SAS programming examples will be used with NHIS data in these chapters.
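
As a preview of the kind of syntax Chapter 5 works with, the following is a minimal sketch of a design-adjusted frequency table. It is illustrative only: the dataset name (work.survey_adults) and all variable names (design_stratum, design_psu, sample_weight, flu_vaccine) are hypothetical placeholders, not the actual NHIS variable names, which should be taken from the survey documentation.

```sas
/* Minimal sketch of a design-adjusted frequency table.                 */
/* All dataset and variable names below are hypothetical placeholders;  */
/* substitute the names given in the survey's documentation.            */
proc surveyfreq data=work.survey_adults;
    strata  design_stratum;   /* stratum variable                       */
    cluster design_psu;       /* primary sampling unit (PSU)            */
    weight  sample_weight;    /* sampling weight                        */
    tables  flu_vaccine;      /* weighted frequencies and percentages   */
run;
```

The STRATA, CLUSTER, and WEIGHT statements are what distinguish this from an ordinary PROC FREQ call: they pass the stratum, primary sampling unit, and weight variables to the procedure so that the estimates and their standard errors reflect the complex sample design.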

1.3.3 Textbook Section 3: Common National Health Surveys

The third section includes five chapters dedicated to common national health surveys used for secondary data analysis by public health and health services research professionals. Each chapter provides:

  1. A general overview of the survey and what it is used for;
  2. An overview of the data files available;
  3. Advantages of the survey;
  4. Disadvantages of the survey;
  5. Practical tips for conducting the analysis; and
  6. A case study using the national health survey.

Each case study presents 1) a brief description of the gap in the literature that the case study is attempting to address and 2) a research question. The case studies will outline the steps required to download and merge the data files, create recoded (dummy) variables, and analyze each dataset to answer the research question. Sample SAS syntax will be provided.
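
The merge and recode steps described above might look roughly like the following sketch. It assumes two already-downloaded files that share a single identifier; the dataset names (work.person_file, work.adult_file), the identifier (person_id), the raw variable (flu_raw), and its coding (1 = yes, 2 = no) are all hypothetical and will differ by survey.

```sas
/* Hypothetical names and codes throughout; check each survey's codebook. */

/* Sort both files by the shared identifier before a match-merge */
proc sort data=work.person_file; by person_id; run;
proc sort data=work.adult_file;  by person_id; run;

data work.analysis;
    merge work.person_file (in=in_person)
          work.adult_file  (in=in_adult);
    by person_id;
    if in_person and in_adult;   /* keep records found in both files */

    /* Recode the raw item into a 0/1 dummy variable.               */
    /* Assumed coding: 1 = yes, 2 = no; anything else set to missing */
    if flu_raw = 1 then flu_vaccine = 1;
    else if flu_raw = 2 then flu_vaccine = 0;
    else flu_vaccine = .;
run;
```

Real survey files often require more than one identifier variable in the BY statement, and response codes for refused or unknown answers vary by survey, so the recode logic should always be checked against the codebook.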

Chapter 6 covers the NHIS. The objective of the NHIS case study is to explore whether Arab American/Middle Eastern or North African (MENA) adults are more or less likely to receive an annual flu vaccine in comparison to other racial/ethnic groups, such as other non-Hispanic Whites. To answer this research question, 2018 NHIS person and sample adult files will be analyzed. Chapter 7 covers the MEPS. The objective of the MEPS case study is to explore whether adults who perceived that their health care provider communicated well during visits over the last 12 months are more or less likely to receive an annual flu vaccine in comparison to those who did not receive quality patient-provider communication. To answer this research question, 2017 and 2018 MEPS household-level in-person and self-administered questionnaire data will be analyzed. Chapter 8 covers the HINTS. The objective of the HINTS case study is to explore associations between electronic patient-provider communication and colon cancer screening uptake using HINTS 5 Cycle 3 data. Chapter 9 covers the BRFSS. The objective of the BRFSS case study is to explore how differences in caregiving experiences among urban and rural adults in Texas are moderated by race and ethnicity. To answer this research question, 2019 BRFSS state-level data will be analyzed. Chapter 10 covers the NHANES. The objective of the NHANES case study is to estimate and compare sedentary behavior guideline adherence among US- and foreign-born adults by race and ethnicity using 2017-2020 pre-pandemic data.

1.3.4 Textbook Section 4: Dissemination and Conclusions

The fourth section includes two final chapters. Chapter 11 covers the dissemination of research studies using secondary data from national health surveys. It includes examples of how to create poster presentations, oral presentations, abstracts, and full-length original research manuscripts. Chapter 12 provides a summary of what has been presented in the textbook and outlines potential recommendations for future editions.

1.4 Summary

In summary, this textbook provides instruction on how to conduct basic applied data analysis using secondary data collected from national health surveys. The textbook was developed from a previous course, PH 2999: Independent Study in Epidemiology. Dr. Kindratt developed this independent study course while receiving her PhD training at the University of Texas Health (UTHealth) School of Public Health Dallas Regional Campus. The content was originally created to meet the requirements of a breadth/concentration in large database analysis, because no other available courses offered the applied data analysis skills using secondary national health surveys that she needed to meet her professional goals and graduation requirements at that time. Dr. Kindratt later developed an enhanced version with additional content for the University of Texas at Arlington’s KINE 4352 Big Data for Epidemiology course. Learning objectives of the previous course were to:

  1. Review existing research conducted using selected national health surveys;
  2. Review sample designs and survey methods used when collecting national health survey data;
  3. Develop SAS and Stata programs for merging and analyzing selected national health surveys; and
  4. Create a teaching tool for each survey to summarize data analysis methods for future students.

The teaching tools developed for the course were used as the model for each of the chapters in this textbook on specific national health surveys. The course included analysis of MEPS, BRFSS, and NHANES surveys. Examples of the teaching tools developed for PH 2999 are provided in the corresponding Open ICPSR data repository. The examples and content have been updated to reflect changes in survey designs, data collection modalities, and the research interests of the primary author. NHIS and HINTS case studies have been added so that this open textbook better represents the national surveys students will encounter in the workforce. The textbook may also be used by students volunteering or working in UTA’s HSR lab.

1.5 COVID-19 Pandemic Changes

The initial version of this textbook was written from June through December 2020 during the early waves of the COVID-19 pandemic. The methods described for the national surveys in this textbook represent “pre-pandemic” methodologies. Many surveillance systems and surveys had to be modified from 2020 onward due to safety concerns, stay-at-home orders, and data collection needs.4 Some chapters include a brief section that discusses these changes for the respective survey.

1.6 References

  1. Ye J, Leep C, Robin N, Newman S. Perception of Workforce Skills Needed Among Public Health Professionals in Local Health Departments: Staff Versus Top Executives. J Public Health Manag Pract. 2015;21 Suppl 6:S151-158. doi:10.1097/PHH.0000000000000299
  2. Andersen RM. National health surveys and the behavioral model of health services use. Med Care. 2008;46(7):647-653. doi:10.1097/MLR.0b013e31817a835d
  3. Abuelezam NN, El-Sayed AM, Galea S. The Health of Arab Americans in the United States: An Updated Comprehensive Literature Review. Front Public Health. 2018;6:262. doi:10.3389/fpubh.2018.00262
  4. Lau DT, Sosa P, Dasgupta N, He H. Surveillance, Surveys, and COVID-19. Am J Public Health. 2021;111(12):2085. doi:10.2105/AJPH.2021.306553


Big Data for Epidemiology: Applied Data Analysis Using National Health Surveys Copyright © 2022 by Tiffany B. Kindratt is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
