2.2 Raw data

Learning Objectives

Learners will be able to…

  • Identify potential sources of available data
  • Weigh the challenges and benefits of collecting your own data

In our previous section, we addressed some of the challenges researchers face in collecting and analyzing raw data. As a reminder, raw data are unprocessed, unanalyzed data that researchers analyze using social science research methods. They are not the statistics or qualitative themes reported in journal articles. They are the actual data from which those statistical outputs or themes are derived (e.g., interview transcripts or survey responses). There are two approaches to obtaining data. First, researchers can analyze data that someone else has collected, whether from public archives, agencies, or other researchers. Using secondary data like this can make projects more feasible, but you may not find existing data that are useful for answering your working question. Second, and for that reason, many researchers gather their own data.

Using secondary data

Within the agency setting, there are two main sources of raw data. One option is to examine client charts. For example, if you wanted to know whether substance use was related to parental reunification for youth in foster care, you could look at client files and compare how long it took families with differing levels of substance use to be reunified. You will have to negotiate with the agency how public your analysis can be. Agencies may be comfortable with you using client files for a class project but less comfortable with you presenting your findings at a city council meeting. When analyzing data from your agency, you will have to manage a stakeholder relationship.

A second source of agency-based raw data is program evaluation. If you are working with a grant-funded agency, administrators and clinicians are likely to produce data for grant reporting. The agency may allow you to examine the raw data and run your own analysis. Larger agencies may also conduct internal research, for example, surveying employees or clients about new initiatives. These, too, can be good sources of available data. Generally, if the agency has already collected the data, you can ask to use them. Again, it is important to be clear on the boundaries and expectations of the agency. And don't take it personally if they say no!

Some agencies, usually government agencies, publish their data in formal reports. You could browse the websites of county or state agencies to see whether they offer publicly available data relevant to your research topic. For example, annual reports from a state Department of Education may show that seclusion and restraint are disproportionately applied to Black children with disabilities, as students found in Virginia. In another example, one student matched public data from her city's map of criminal incidents with historically redlined neighborhoods. For this project, she is using two publicly available sources: Mapping Inequality, which digitized historical records of redlined neighborhoods, and the Roanoke, VA crime mapping webpage. By matching historical data on housing redlining with current crime records, she is testing whether an association between redlining and crime persists today.
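The matching step in the redlining example can be illustrated with a short sketch. The Python code below is a minimal, hypothetical example, not the student's actual method. It assumes both sources have already been exported to simple coordinates: each redlined neighborhood as a polygon with its historical grade, and each crime incident as a single point. The grades, coordinates, and function names are placeholders invented for illustration and are not taken from Mapping Inequality or the Roanoke crime map. The sketch assigns each incident to a neighborhood with a ray-casting point-in-polygon test and tallies incidents by grade.

```python
from collections import Counter

# Hypothetical neighborhoods: (historical grade, polygon as (x, y) vertices).
# Placeholder coordinates only; real boundaries would come from a source
# such as Mapping Inequality.
NEIGHBORHOODS = [
    ("A", [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ("D", [(4, 0), (8, 0), (8, 4), (4, 4)]),
]

# Hypothetical crime incidents as (x, y) points.
INCIDENTS = [(1, 1), (5, 1), (6, 3), (7, 2), (9, 9)]


def point_in_polygon(point, polygon):
    """Ray-casting test: cast a horizontal ray to the right of the point and
    count how many polygon edges it crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line at y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_incidents_by_grade(neighborhoods, incidents):
    """Assign each incident to the first neighborhood containing it and
    tally incidents per historical grade; others are counted as unmatched."""
    counts = Counter()
    for point in incidents:
        for grade, polygon in neighborhoods:
            if point_in_polygon(point, polygon):
                counts[grade] += 1
                break
        else:
            counts["unmatched"] += 1
    return counts


if __name__ == "__main__":
    print(count_incidents_by_grade(NEIGHBORHOODS, INCIDENTS))
```

In practice, a project like this would more likely rely on GIS software or a dedicated spatial library, which also handles map projections and points that fall exactly on a shared boundary. Note also that raw counts by grade are not yet a test of association; a real analysis would at least account for differences in neighborhood area or population.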

Not all public data are easily accessible, though. The student in the previous example was lucky that scholars had digitized the records of how Virginia cities were redlined by race. Historical data are often located in physical archives rather than digital ones. If your project uses historical data held in an archive, you will need to travel to the archive to review them. Unless you have a travel budget, you may be limited to the archival data in your local libraries and government offices. Similarly, government data may have to be requested from an agency, which can take time. If the data are particularly sensitive or the department would have to dedicate considerable time to your request, you may have to file a Freedom of Information Act request. This process can be time-consuming and, in some cases, will add financial costs to your study.

Researchers also share secondary data as part of the publication and review process. There is a growing trend in research to share data publicly so others can verify the results and attempt to replicate the study. In more recent articles, you may notice links to data provided by the researchers. These data sets have often been de-identified by removing information that could compromise participants' confidentiality. You can browse the data repositories in Table 2.2 to find raw data to analyze. Make sure that you pick a data set with thorough, easy-to-understand documentation. You may also want to use Google Dataset Search, which indexes some of the repositories below as well as many others and is intuitive and easy to use.

Table 2.2 Sources of publicly available data
Organizational home | Focus/topic | Data type | Web address
National Opinion Research Center | General Social Survey; demographic, behavioral, attitudinal, and special interest questions; national sample | Quantitative | https://gss.norc.org/
Carolina Population Center | Add Health; longitudinal social, economic, psychological, and physical well-being of a cohort in grades 7–12 in 1994 | Quantitative | http://www.cpc.unc.edu/projects/addhealth
Center for Demography of Health and Aging | Wisconsin Longitudinal Study; life course study of cohorts who graduated from high school in 1957 | Quantitative | https://www.ssc.wisc.edu/wlsresearch/
Institute for Social & Economic Research | British Household Panel Survey; longitudinal study of British lives and well-being | Quantitative | https://www.iser.essex.ac.uk/bhps
International Social Survey Programme | International data similar to GSS | Quantitative | http://www.issp.org/
The Institute for Quantitative Social Science at Harvard University | Large archive of written data, audio, and video focused on many topics | Quantitative and qualitative | http://dvn.iq.harvard.edu/dvn/dv/mra
Institute for Research on Women and Gender | Global Feminisms Project; interview transcripts and oral histories on feminism and women's activism | Qualitative | https://globalfeminisms.umich.edu/
Oral History Office | Descriptions and links to numerous oral history archives | Qualitative | https://archives.lib.uconn.edu/islandora/object/20002%3A19840025
UNC Wilson Library | Digitized manuscript collection from the Southern Historical Collection | Qualitative | http://dc.lib.unc.edu/ead/archivalhome.php?CISOROOT=/ead
Qualitative Data Repository | A repository of qualitative data that can be downloaded and annotated collaboratively with other researchers | Qualitative | https://qdr.syr.edu/

Ultimately, you will have to weigh the strengths and limitations of using secondary data for your own project. Engel and Schutt (2016, p. 327)[1] propose six questions to ask before using secondary data:

  1. What were the agency’s or researcher’s goals in collecting the data?
  2. What data were collected, and what were they intended to measure?
  3. When was the information collected?
  4. What methods were used for data collection? Who was responsible for data collection, and what were their qualifications? Are they available to answer questions about the data?
  5. How is the information organized (by date, individual, family, event, etc.)? Are identifiers used to indicate different types of data available?
  6. What is known about the success of the data collection effort? How are missing data indicated and treated? What kind of documentation is available? How consistent are the data with data available from other sources?

In this section, we've talked about data as though they are always collected by scientists and professionals. But that is not the case! Think more broadly about sources of data that are already out there in the world. Perhaps you want to examine the topics raised in the President's last 10 State of the Union addresses. Or maybe you want to examine whether the websites and public information of local health and mental health agencies use gender-inclusive language. People share their experiences through blogs, social media posts, videos, and performances, among countless other sources of data. When you think broadly about data, you may be surprised how many questions you can answer with available data.

Primary data collection

The main benefit of collecting your own data is that you can collect and analyze exactly the data you need, rather than relying on what other people have shared. You can make sure the right questions are asked of the right people. Remember, however, that collecting your own data requires access to research participants. Consent from gatekeepers may be necessary, and as we described earlier, an agency may be interested in collaborating on a project. Bringing an agency on board as a stakeholder in your project may give you access to email lists or time at staff meetings, as well as to practitioners or community members. Collaborating with agency partners in this way can be a challenge, as you must negotiate roles, get stakeholder buy-in, and manage potentially conflicting schedules. At the same time, it allows you to make your work immediately relevant to specific practices and client populations.


Key Takeaways

  • Research projects require analyzing data.
  • Researchers can use secondary data or collect their own data.

Post-awareness check (Environment)

In what environment are you most comfortable collecting data (phone calls, face-to-face recruitment, etc.)? Consider which method of data collection might suit both your personality and your target population.

Exercises

TRACK 1 (IF YOU ARE CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

  • Describe the difference between raw data and the results of research articles.
  • Identify potential sources of secondary data that might help you answer your working question.
    • Consider browsing around the data repositories in Table 2.2.
  • Identify a common type of project (e.g., surveys of practitioners) and explain how conducting a similar project might help you answer your working question.

TRACK 2 (IF YOU AREN’T CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

You are researching the impact of your city's recent harm reduction interventions for intravenous drug users (e.g., sterile injection kits, monitored use, overdose prevention, and naloxone provision).

  • Describe the difference between raw data and the results of research articles.
  • Identify potential sources of secondary data that might help you in your harm reduction study.
    • Consider browsing around the data repositories in Table 2.2.
  • What kind of raw data might you collect yourself for your study?

  1. Engel, R. J., & Schutt, R. K. (2016). The practice of research in social work (4th ed.). Washington, DC: SAGE Publishing.

License

Doctoral Research Methods in Social Work Copyright © by Mavs Open Press. All Rights Reserved.
