10.4 Secondary data analysis
Learning Objectives
- Define secondary data analysis
- List the strengths and limitations of secondary data analysis
- Name at least two sources of publicly available quantitative data
- Name at least two sources of publicly available qualitative data
One type of unobtrusive research allows you to skip the data collection phase altogether. To many, skipping the data collection phase is preferable since it allows the researcher to proceed directly to answering their question through data analysis. When researchers analyze data originally gathered by another person or entity, they engage in secondary data analysis. Researchers gain access to data collected by other researchers, government agencies, and other unique sources by making connections with individuals engaged in primary research or accessing their data via publicly available sources.
Imagine you wanted to study whether race or gender influenced what major people chose at your college. You could do your best to distribute a survey to a representative sample of students, but perhaps a better idea would be to ask your college registrar for this information. Your college already collects this information on all of its students. Wouldn’t it be better to simply ask for access to this information, rather than collecting it yourself? Maybe.
Challenges in secondary data analysis
Some of you may be thinking, “I never gave my college permission to share my information with other researchers.” Depending on the policies of your university, this may or may not be true. In any case, secondary data is usually anonymized or does not contain identifying information. In our example, students’ names, student ID numbers, home towns, and other identifying details would not be shared with a secondary researcher. Instead, just the information on the variables—race, gender, and major—would be shared. Techniques to make data anonymous are not foolproof, however, and this is a challenge to secondary data analysis. Researchers have been able to identify individuals in “anonymized” data from credit card companies, Netflix, AOL, and online advertising companies (Bode, 2017; de Montjoye, Radaelli, Singh, & Pentland, 2015).
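To see why removing names and ID numbers does not guarantee anonymity, consider the minimal Python sketch below. It strips the direct identifiers from a handful of registrar records and then counts how many of the remaining records are unique on race, gender, and major. The records, field names, and function names are all hypothetical and invented for illustration; a real registrar file would be organized differently.

```python
from collections import Counter

# Hypothetical registrar records; every name and value is made up for illustration.
records = [
    {"name": "Student A", "student_id": "1001", "hometown": "Town 1",
     "race": "Black", "gender": "Woman", "major": "Social Work"},
    {"name": "Student B", "student_id": "1002", "hometown": "Town 2",
     "race": "White", "gender": "Woman", "major": "Social Work"},
    {"name": "Student C", "student_id": "1003", "hometown": "Town 3",
     "race": "White", "gender": "Woman", "major": "Social Work"},
    {"name": "Student D", "student_id": "1004", "hometown": "Town 4",
     "race": "White", "gender": "Man", "major": "Nursing"},
    {"name": "Student E", "student_id": "1005", "hometown": "Town 5",
     "race": "Latino", "gender": "Man", "major": "Social Work"},
]

# Fields the registrar would withhold, and fields it would share (assumed names).
DIRECT_IDENTIFIERS = {"name", "student_id", "hometown"}
SHARED_VARIABLES = ("race", "gender", "major")


def strip_identifiers(rows):
    """Return copies of the rows without the direct identifiers."""
    return [
        {key: value for key, value in row.items() if key not in DIRECT_IDENTIFIERS}
        for row in rows
    ]


def unique_combinations(rows, keys=SHARED_VARIABLES):
    """Return the value combinations that describe exactly one person."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


shared = strip_identifiers(records)
risky = unique_combinations(shared)
print(f"{len(risky)} of {len(shared)} shared records are unique on {SHARED_VARIABLES}")
```

Any record that is the only one with its combination of values could be matched to a person by someone who already knows those three facts about them, even though no name or ID number was shared.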
Another challenge with secondary data stems from the lack of control over the data collection process. Perhaps your university made a mistake on their forms or entered data incorrectly. If this were your data, you could correct errors like this right away. With secondary data, you are less able to correct for any errors made by the original source during data collection. More importantly, you may not know these errors exist and reach erroneous conclusions as a result. Researchers using secondary data should evaluate the procedures used to collect the data wherever possible, and data that lacks documentation on procedures should be used with caution.
Secondary researchers, particularly those conducting quantitative research, must also ensure that their conceptualization and operationalization of variables match those of the primary researchers. If your secondary analysis focuses on a variable that was not a major part of the original analysis, you may not have enough information about that variable to conduct a thorough analysis. For example, suppose you want to study whether depression is associated with income among students, and you find a dataset that includes both variables. If depression was not a focus of the dataset, the original researchers may have included only a question like, “Have you ever been diagnosed with major depressive disorder?” While answers to this question will give you some information about depression, they will not give you the depth that a scale like Beck’s Depression Inventory or the Hamilton Rating Scale for Depression would, nor will they provide information about indicators of severity such as hospitalization or suicide attempts. Without this level of depth, your analysis may lack validity. Even when the operationalization of your variables of interest is thorough, the original researchers may conceptualize variables differently than you do. Perhaps they were interested in whether a person was diagnosed with depression at any time in their life, whereas you are concerned with current symptoms of depression. For these reasons, understanding the original study thoroughly by reading the study documentation is a requirement for rigorous secondary data analysis.
The lack of control over the data collection process also hamstrings the research process itself. While some studies are created perfectly, most are refined through pilot testing and feedback before the full study is conducted (Engel & Schutt, 2016). Secondary data analysis does not allow you to engage in this process. For qualitative researchers in particular, this is an important challenge. Qualitative research, particularly from the interpretivist paradigm, uses emergent processes in which research questions, conceptualization of terms, and measures develop and change over the course of the study. Secondary data analysis inhibits this process from taking place because the data are already collected. Because qualitative methods often involve analyzing the context in which data are collected, secondary researchers may have difficulty authentically and accurately representing the original data in a new analysis.
Another challenge for research using secondary data can be getting access to the data. Researchers seeking access to data collected by universities (or hospitals, health insurers, human service agencies, etc.) must have the support of the administration. It may be important for researchers to form a partnership with the agency or university whose data is included in the secondary data analysis. Administrators will trust people whom they perceive as competent, reputable, and objective. They must trust you to engage in rigorous and conscientious research. Some secondary data are available in repositories where the researcher can have somewhat automatic access if she demonstrates her competence to complete the analysis, shares her data analysis plan, and receives ethical approval from an IRB. Administrators of data that are often accessed by researchers, such as Medicaid or Census data, may fall into this category.
Strengths of secondary data analysis
While the challenges associated with secondary data analysis are important, the strengths of secondary data analysis often outweigh these limitations. Most importantly, secondary data analysis is quicker and cheaper than a traditional study because the data are already collected. Once a researcher gains access to the data, completing the project is a matter of analyzing the data and writing up the results. Data can take a long time to gather and be quite resource-intensive, so avoiding this step is a significant strength of secondary data analysis. If the primary researchers had access to more resources, they may also have been able to engage in more rigorous data collection than a secondary researcher could. In this way, outsourcing the data collection to someone with more resources may make your design stronger, not weaker. Finally, secondary researchers ask new questions that the primary researchers may not have considered. In this way, secondary data analysis deepens our understanding of existing data in the field. Table 10.3 summarizes the strengths and limitations of existing data.
|Strengths||Limitations|
|Reduces the time needed to complete the project||Anonymous data may not be truly anonymous|
|Cheaper to conduct, in many cases||No control over data collection process|
|Primary researcher may have more resources to conduct a rigorous data collection than you||Cannot refine questions, measures, or procedure based on feedback or pilot tests|
|Helps us deepen our understanding of data already in the literature||May operationalize or conceptualize concepts differently than primary researcher|
|Useful for historical research||Missing qualitative context|
| ||Barriers to access and conflicts of interest|
Ultimately, you will have to weigh the strengths and limitations of using secondary data on your own. Engel and Schutt (2016, p. 327) propose six questions to ask before using secondary data:
- What were the agency’s or researcher’s goals in collecting the data?
- What data were collected, and what were they intended to measure?
- When was the information collected?
- What methods were used for data collection? Who was responsible for data collection, and what were their qualifications? Are they available to answer questions about the data?
- How is the information organized (by date, individual, family, event, etc.)? Are there identifiers used to identify different types of data available?
- What is known about the success of the data collection effort? How are missing data indicated and treated? What kind of documentation is available? How consistent are the data with data available from other sources?
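The question about how missing data are indicated and treated is often the easiest to check directly once you have the data and its documentation. The sketch below is a minimal illustration in Python, assuming a hypothetical codebook in which negative values mark missing income and a 9 marks a missing answer to a yes/no diagnosis question. The variable names, codes, and records are assumptions made up for this example; real datasets document their own codes.

```python
# Hypothetical codebook: the values that mean "missing" for each variable.
MISSING_CODES = {
    "income": {-9, -8},      # e.g., refused / don't know (assumed codes)
    "depression_dx": {9},    # e.g., not ascertained (assumed code)
}

# Made-up records standing in for a secondary dataset.
rows = [
    {"income": 42000, "depression_dx": 0},
    {"income": -9, "depression_dx": 1},
    {"income": 18000, "depression_dx": 9},
    {"income": -8, "depression_dx": 0},
]


def recode_missing(rows, missing_codes):
    """Replace documented missing-value codes with None."""
    return [
        {
            var: (None if value in missing_codes.get(var, set()) else value)
            for var, value in row.items()
        }
        for row in rows
    ]


def missing_rates(rows):
    """Return the share of records with a missing value, per variable."""
    variables = rows[0].keys()
    return {
        var: sum(row[var] is None for row in rows) / len(rows)
        for var in variables
    }


cleaned = recode_missing(rows, MISSING_CODES)
rates = missing_rates(cleaned)
for var, rate in rates.items():
    print(f"{var}: {rate:.0%} missing")
```

Skipping the recoding step would treat a code such as -9 as a real income, which is one way an undocumented or misread dataset can lead to erroneous conclusions.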
Sources of secondary data
Many sources of quantitative data are publicly available. The General Social Survey (GSS), which was discussed in Chapter 7, is one of the most commonly used sources of publicly available data among quantitative researchers. Data for the GSS have been collected regularly since 1972, thus offering social researchers the opportunity to investigate changes in Americans’ attitudes and beliefs over time. Questions on the GSS cover an extremely broad range of topics, from family life to political and religious beliefs to work experiences.
Other sources of quantitative data include Add Health, a study that was initiated in 1994 to learn about the lives and behaviors of adolescents in the United States, and the Wisconsin Longitudinal Study, a study that has, for over 40 years, surveyed a panel of 10,000 people who graduated from Wisconsin high schools in 1957. Quantitative researchers interested in studying social processes outside of the United States also have many options when it comes to publicly available data sets. Data from the British Household Panel Survey, a longitudinal, representative survey of households in Britain, are freely available to those conducting academic research (private entities are charged for access to the data). The International Social Survey Programme merges the GSS with its counterparts in other countries around the globe. These represent just a few of the many sources of publicly available quantitative data.
Unfortunately for qualitative researchers, far fewer sources of free, publicly available qualitative data exist. This is slowly changing, however, as technical sophistication grows and it becomes easier to digitize and share qualitative data. Although fewer sources exist than for quantitative data, a number of data sources are still available to qualitative researchers whose interests or resources limit their ability to collect data on their own. The Murray Research Archive, housed at the Institute for Quantitative Social Science at Harvard University, offers case histories and qualitative interview data. The Global Feminisms project at the University of Michigan offers interview transcripts and videotaped oral histories focused on feminist activism; women’s movements; and academic women’s studies in China, India, Poland, and the United States. At the University of Connecticut, the Oral History Office provides links to a number of other oral history sites. Not all the links offer publicly available data, but many do. Finally, the Southern Historical Collection at the University of North Carolina–Chapel Hill offers digital versions of many primary documents online, such as journals, letters, correspondence, and other papers that document the history and culture of the American South.
Keep in mind that the resources mentioned here represent just a snapshot of the many sources of publicly available data that can be easily accessed via the web. Table 10.4 summarizes the data sources discussed in this section.
|Organizational home||Focus/topic||Data||Web address|
|National Opinion Research Center||General Social Survey; demographic, behavioral, attitudinal, and special interest questions; national sample||Quantitative||http://www.norc.uchicago.edu/GSS+Website/|
|Carolina Population Center||Add Health; longitudinal social, economic, psychological, and physical well-being of cohort in grades 7–12 in 1994||Quantitative||http://www.cpc.unc.edu/projects/addhealth|
|Center for Demography of Health and Aging||Wisconsin Longitudinal Study; life course study of cohorts who graduated from high school in 1957||Quantitative||https://www.ssc.wisc.edu/wlsresearch/|
|Institute for Social & Economic Research||British Household Panel Survey; longitudinal study of British lives and well-being||Quantitative||https://www.iser.essex.ac.uk/bhps|
|International Social Survey Programme||International data similar to GSS||Quantitative||http://www.issp.org/|
|The Institute for Quantitative Social Science at Harvard University||Large archive of written data, audio, and video focused on many topics||Quantitative and qualitative||http://dvn.iq.harvard.edu/dvn/dv/mra|
|Institute for Research on Women and Gender||Global Feminisms Project; interview transcripts and oral histories on feminism and women’s activism||Qualitative||http://www.umich.edu/~glblfem/index.html|
|Oral History Office||Descriptions and links to numerous oral history archives||Qualitative||http://www.oralhistory.uconn.edu/links.html|
|UNC Wilson Library||Digitized manuscript collection from the Southern Historical Collection||Qualitative||http://dc.lib.unc.edu/ead/archivalhome.php?CISOROOT=/ead|
Spotlight on UTA School of Social Work
Secondary data analysis
Dr. Kathy Lee of the University of Texas at Arlington’s School of Social Work is interested in mental health and quality of life among vulnerable and marginalized older adults and their family caregivers. Dr. Lee is particularly interested in social participation interventions and psychosocial interventions that promote their health and well-being. The majority of Dr. Lee’s work has been conducted with panel survey data from the Health and Retirement Study (HRS). Some advantages of using secondary data include building evidence based on high-quality data (i.e., nationally representative data that are easily accessible) and allowing researchers to understand social trends over time. Although secondary analysis requires time and effort to become familiar with the dataset due to its complexity and breadth, researchers can answer a wide range of research questions, particularly with knowledge of survey statistics and methods.
HRS is the first and largest longitudinal study of its kind, consisting of over 37,000 individuals age 50 and over in 23,000 households in the United States. The purpose of HRS is to inform researchers and policymakers about important issues of retirement and health in aging populations and to promote discussion of how to respond to a rapidly aging society. Since 1992, a variety of content and data have been included, such as physical measures, biomarkers, and psychosocial factors, making the study multi-disciplinary. The data are collected through multiple modes: face-to-face, telephone, and mail. The survey is conducted biennially (every two years) with support from the National Institute on Aging and the Social Security Administration. Panel survey data from publicly available databases, like HRS, are essential for researchers to better understand the opportunities and challenges of aging.
Dr. Lee’s dissertation research (Lee, 2018) examined (1) the impact of volunteering on cognitive health among older adults with cognitive impairment, and (2) the complex relationships between volunteer behaviors, psychological well-being, and cognitive health. Using HRS data collected from 2004 to 2014, her study included older adults age 65 and older living with cognitive impairment based on the Telephone Interview for Cognitive Status (≤11 out of the total score of 27). With a focus on a description of change over time, Dr. Lee tested linear mixed effects models to examine growth or decline in cognitive health of older adults with cognitive impairment by volunteer and non-volunteer group. Dr. Lee also employed structural equation modeling to observe the snapshot of variables of interest – volunteer behaviors, psychological well-being, and cognitive health. The study data showed that (1) the level of cognitive functioning slightly increased over time only among those who volunteered, and (2) the relationship between psychological well-being and cognitive functioning was significantly greater than the relationship between volunteering and cognitive functioning, suggesting the importance of providing volunteer activities that can increase one’s psychological well-being.
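The inclusion criteria described above (age 65 or older and a Telephone Interview for Cognitive Status score of 11 or lower out of 27) can be expressed as a simple filter. The Python sketch below only illustrates that sample-selection step. The respondent records and field names are hypothetical and do not reflect actual HRS variable names, and the sketch does not reproduce the mixed effects or structural equation models used in the study.

```python
# Thresholds taken from the study description above.
MIN_AGE = 65
TICS_CUTOFF = 11  # scores at or below this (out of 27) indicate cognitive impairment

# Made-up respondents; field names are hypothetical, not HRS variable names.
respondents = [
    {"id": "r1", "age": 72, "tics_score": 10, "volunteers": True},
    {"id": "r2", "age": 68, "tics_score": 15, "volunteers": False},
    {"id": "r3", "age": 61, "tics_score": 9, "volunteers": True},
    {"id": "r4", "age": 80, "tics_score": 11, "volunteers": False},
]


def meets_inclusion_criteria(person):
    """True if the respondent is old enough and scores at or below the cutoff."""
    return person["age"] >= MIN_AGE and person["tics_score"] <= TICS_CUTOFF


# Keep only eligible respondents, then split them into the two comparison groups.
sample = [person for person in respondents if meets_inclusion_criteria(person)]
volunteer_ids = [person["id"] for person in sample if person["volunteers"]]
non_volunteer_ids = [person["id"] for person in sample if not person["volunteers"]]

print("volunteers:", volunteer_ids)
print("non-volunteers:", non_volunteer_ids)
```

Writing the criteria as named constants makes it easy to check the analytic sample against the study documentation before any modeling begins.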
Research involving secondary data can be an important contribution to improving the lives of social work clients. The value of Dr. Lee’s secondary data research was recognized by the Gerontological Society of America, which awarded her the Junior Scholar Award for Research Related to Disadvantaged Older Adults and the Emerging Scholar and Professional Organization Poster Award in 2018.
Dr. Lee is currently working on multiple other secondary analyses to broaden knowledge around social participation and depression among vulnerable aging populations.
Key Takeaways
- The strengths and limitations of secondary data analysis must be considered before a project begins.
- Previously collected data sources enable researchers to conduct secondary data analysis.
Glossary
- Anonymized data - data that does not contain identifying information
- Historical research - analyzing data from primary sources of historical events and proceedings
- Secondary data analysis - analyzing data originally gathered by another person or entity
Photo of Kathy Lee by Tim Siepker, CC BY-NC-ND