2.3 Data management and analysis

Mavs Open Press

Learning Objectives

Learners will be able to…

Define and construct a data analysis plan
Define key quantitative data management terms—variable name, data dictionary, and observations/cases
Differentiate between univariate and bivariate quantitative analysis
Explain when we might use quantitative bivariate analysis in social work research
Identify how your qualitative research question, research aim, and type of data may influence your choice of analytic methods
Outline the steps you will take in preparation for conducting qualitative data analysis

After you have your raw data, whether secondary data or data you collected yourself, you will need to analyze it. While the specific steps to follow in quantitative or qualitative data analysis are beyond the scope of this chapter, we are going to address some basic concepts in this section to help you create a data analysis plan. A data analysis plan is an ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact step-by-step analyses that you plan to run to answer your research question. If you look back at Table 2.1, you will see that creating a data analysis plan is a part of the study design process. The data analysis plan flows from the research question, is integral to the study design, and should be well conceptualized prior to beginning data collection. In this section, we will walk through the basics of quantitative and qualitative data analysis to help you understand the fundamentals of creating a data analysis plan.

When considering what primary data you might want to collect as part of your project, there are two important considerations. You might only get one chance to interact with your participants, so you must think comprehensively in your planning phase about what information you need and collect as much relevant data as possible. At the same time, though, especially when collecting sensitive information, you need to consider how onerous the data collection is for participants and whether you really need them to share that information. Just because something is interesting to us doesn’t mean it’s related enough to our research question to chase it down. Work with your research team early in your project to talk through these issues before you get to this point. And if you’re using secondary data, make sure you have access to all the information you need in that data before you use it.

Quantitative Data: Management

Once you’ve collected your quantitative data, you need to make sure it is well-organized in a database in a way that’s actually usable. “Database” can be kind of a scary word, but really, it can be as simple as an Excel spreadsheet or a data file in whatever program you’re using to analyze your data. You may want to avoid Excel and use a formal database such as Microsoft Access or MySQL if you’ve got a large or complicated data set. But if your data set is smaller and you plan to keep your analyses simple, you can definitely get away with Excel. A typical data set is organized with variables as columns and observations/cases as rows. For example, let’s say we did a survey on ice cream preferences and collected the following information in Table 2.3:

Table 2.3 Results of our ice cream survey
Name	Age	Gender	Hometown	Fav_Ice_Cream
Tom	54	0	1	Rocky Road
Jorge	18	2	0	French Vanilla
Melissa	22	1	0	Espresso
Amy	27	1	0	Black Cherry
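To make the rows-and-columns idea concrete, here is a minimal sketch in Python (our choice for illustration; any statistics program would do). It stores Table 2.3 as comma-separated text and reads it back. The names `SURVEY_CSV` and `load_survey` are our own. The only point is that each column is a variable and each row is one observation/case.

```python
import csv
import io

# Table 2.3 as comma-separated text: a header row of variable names
# (the columns), then one row per observation/case.
SURVEY_CSV = """Name,Age,Gender,Hometown,Fav_Ice_Cream
Tom,54,0,1,Rocky Road
Jorge,18,2,0,French Vanilla
Melissa,22,1,0,Espresso
Amy,27,1,0,Black Cherry
"""


def load_survey(text):
    """Parse CSV text into a list of rows, each a dict keyed by variable name.

    The csv module returns every value as a string; converting Age to a
    number is left to later steps.
    """
    return list(csv.DictReader(io.StringIO(text)))


rows = load_survey(SURVEY_CSV)
variables = list(rows[0].keys())  # the columns
sample_size = len(rows)           # number of observations/cases

print(variables)    # ['Name', 'Age', 'Gender', 'Hometown', 'Fav_Ice_Cream']
print(sample_size)  # 4
```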

There are a few key data management terms to understand:

- Variable name: The name of your variable. Make sure it is short and meaningful and, if you’re using something other than Excel, all one word. Most statistical programs will automatically rename variables that aren’t one word, but the resulting names can be long and a little ridiculous.
- Observations/cases: The rows in your data set. In social work, these are often your study participants (people), but they can be anything from census tracts to black bears to trains. When we talk about sample size, we’re talking about the number of observations/cases. In our mini data set, each person is an observation/case.
- Data dictionary (also called a code book or metadata): The document where you list your variable names, what each variable actually measures or represents, what each of its values means if the meaning isn’t obvious (e.g., when numbers are assigned to gender), its level of measurement, and anything special to know about it (for instance, its source if you mashed two data sets together). If you’re using secondary data, the researchers sharing the data should make the data dictionary available.
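Variable naming rules differ from program to program, so the helper below is hypothetical. It sketches the kind of automatic renaming described above under our own assumptions: a one-word name may contain only letters, digits, and underscores, must not start with a digit, and is capped at an arbitrary 32 characters.

```python
import re


def make_variable_name(label: str, max_length: int = 32) -> str:
    """Turn a survey question label into a short, one-word variable name.

    Hypothetical helper for illustration; real statistical programs each
    apply their own renaming rules and length limits.
    """
    # Replace every run of characters that isn't a letter or digit with "_",
    # then drop any underscores left at the start or end.
    name = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")
    # Assumed rule: a variable name may not start with a digit.
    if name and name[0].isdigit():
        name = "v_" + name
    return name[:max_length]


print(make_variable_name("Favorite ice cream flavor?"))  # Favorite_ice_cream_flavor
print(make_variable_name("2nd choice"))                  # v_2nd_choice
```

Choosing short names yourself, as the definition above recommends, avoids relying on any program’s automatic renaming.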

Let’s take that mini data set we’ve got up above and we’ll show you what your data dictionary might look like in Table 2.4.

Table 2.4 Sample data dictionary/code book
Variable name	Description	Values/Levels	Level of measurement	Notes
Name	Participant’s first name	open-ended response	Nominal	First name only. If a name appears more than once, a random number has been attached to the end of the name to distinguish the participants.
Age	Participant’s age at time of survey	integer, in years	Ratio	Self-reported
Gender	Participant’s self-identified gender	0=cisgender female; 1=cisgender male; 2=non-binary; 3=transgender female; 4=transgender male; 5=another gender	Nominal	Self-reported
Hometown	Participant’s hometown	0=This town; 1=Another town	Nominal	Self-reported
Fav_Ice_Cream	Participant’s favorite ice cream	open-ended response	Nominal	Self-reported
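Your analysis code can also use a data dictionary directly. The sketch below, in Python, stores the coded values from Table 2.4 as a lookup table and translates the numbers in Table 2.3 back into their meanings. `CODEBOOK` and `decode` are illustrative names, not part of any particular program. Raising an error on an undocumented code is our own design choice, meant to flag data-entry mistakes early.

```python
# Coded variables from the data dictionary (Table 2.4), stored as a
# mapping from variable name to {code: meaning}.
CODEBOOK = {
    "Gender": {
        0: "cisgender female",
        1: "cisgender male",
        2: "non-binary",
        3: "transgender female",
        4: "transgender male",
        5: "another gender",
    },
    "Hometown": {0: "this town", 1: "another town"},
}


def decode(variable: str, code: int) -> str:
    """Look up what a numeric code means; raise if the code is undocumented."""
    labels = CODEBOOK[variable]
    if code not in labels:
        raise ValueError(f"{code!r} is not a documented value of {variable}")
    return labels[code]


# Jorge's row in Table 2.3 has Gender = 2 and Hometown = 0.
print(decode("Gender", 2))    # non-binary
print(decode("Hometown", 0))  # this town
```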

Quantitative Data: Univariate Analysis

As part of planning for your research, you should come up with a data analysis plan. Remember, a data analysis plan is an ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact step-by-step analyses that you plan to run to answer your research question. A basic data analysis plan might look something like what you see in Table 2.5. Don’t panic if you don’t yet understand some of the statistical terms in the plan; we’re going to delve into some of them in this section, and others will be covered in more depth in your statistics courses. Note here also that this is what operationalizing your variables and moving through your research with them looks like on a basic level. We will cover operationalization in more depth in Chapter 10.

Table 2.5 A basic data analysis plan
Research question: What is the relationship between a person’s race and their likelihood to graduate from high school?
Data: Individual-level U.S. American Community Survey data for 2017 from IPUMS, which includes race/ethnicity and other demographic data (e.g., educational attainment, family income, employment status, citizenship, presence of both parents). Includes only individuals for whom race and educational attainment data are available.
Steps in Data Analysis Plan:
- Univariate and descriptive statistics, including mean, median, mode, range, distribution of interval/ratio variables, and missing values
- Bivariate statistical tests between the independent, control, and dependent variables. For instance, a chi-square test between race and high school graduation (both nominal variables), ANOVA on income and race, and correlations between interval/ratio variables
- Multivariate statistical analysis, like logistic regression, with high school graduation (yes/no) as my dependent variable, race as the independent variable, and multiple control variables I think are relevant based on my conceptual framework
- Interpreting and reporting logistic regression results

An important point to remember is that you should never get stuck on using a particular statistical method because you or one of your co-researchers thinks it’s cool or it’s the hot thing in your field right now. You should certainly go into your data analysis plan with ideas, but in the end, you need to let your research question guide what statistical tests you plan to use. Be prepared to be flexible if your plan doesn’t pan out because the data is behaving in unexpected ways.

You’ll notice that the first step in the quantitative data analysis plan is univariate and descriptive statistics. Univariate data analysis is a quantitative method in which a variable is examined individually to determine its distribution, or the way the scores are distributed across the levels, or values, of that variable. When we talk about levels, we mean the possible values of the variable, like a participant’s age, income, or gender. (Note that this is different from levels of measurement, which will be discussed in Chapter 11, but the level of measurement of your variables absolutely affects what kinds of analyses you can do with them.) Univariate analysis is non-relational, which just means that we’re not looking into how our variables relate to each other. Instead, we’re looking at variables in isolation to try to understand them better. For this reason, univariate analysis is used to answer descriptive research questions.

So, when do you use univariate data analysis? Always! It should be the first step you take with your quantitative data, whether you plan to move on to more complex statistical analyses or are conducting a study to describe a new phenomenon. You need to understand what the values of each variable look like—what if one of your variables has a lot of missing data because participants didn’t answer that question on your survey? What if there isn’t much variation in the gender of your sample? These are things you’ll learn through univariate analysis.
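As a rough sketch of what this first step can look like in code, the Python function below is a hypothetical helper built on the standard `statistics` and `collections` modules. For any variable, it counts missing values, builds a frequency distribution, and reports the mode. It adds the mean, median, and range only when you tell it the variable is interval/ratio, because level of measurement determines which statistics make sense: the mean of the gender codes in Table 2.3 would be meaningless. The ages come from Table 2.3. The `None` is a made-up fifth participant who skipped the age question.

```python
from collections import Counter
from statistics import mean, median, multimode


def describe(values, interval_ratio=False):
    """Univariate summary of one variable. None marks a missing answer.

    Set interval_ratio=True only for interval/ratio variables; for nominal
    or ordinal codes, a mean or range would not be meaningful.
    """
    present = [v for v in values if v is not None]
    summary = {
        "n": len(values),
        "missing": len(values) - len(present),
        "frequencies": dict(Counter(present)),
        "mode": multimode(present),  # every most-common value
    }
    if interval_ratio and present:
        summary["mean"] = mean(present)
        summary["median"] = median(present)
        summary["range"] = max(present) - min(present)
    return summary


# Ages from Table 2.3, plus a hypothetical fifth participant (None)
# who skipped the age question.
ages = describe([54, 18, 22, 27, None], interval_ratio=True)
# Gender codes from Table 2.3 are nominal: only frequencies and mode.
genders = describe([0, 2, 1, 1])

print(ages["missing"], ages["mean"], ages["median"], ages["range"])  # 1 30.25 24.5 36
print(genders["frequencies"], genders["mode"])  # {0: 1, 2: 1, 1: 2} [1]
```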

Quantitative Data: Bivariate Analysis

Did you know that ice cream causes shark attacks? It’s true! When ice cream sales go up in the summer, so does the rate of shark attacks. So you’d better put down that ice cream cone unless you want to make yourself look more delicious to a shark.

Photo of shark with open mouth emerging from water

OK, so it’s quite obviously not true that ice cream causes shark attacks. But if you looked at these two variables and how they’re related, you’d notice that during the times of year with the highest ice cream sales, there are also the most shark attacks. This is a classic example of the difference between correlation and causation: both variables rise in warm weather, when more people buy ice cream and more people swim in the ocean. Even though our conclusion about causation was wrong, it’s nonetheless true that these two variables are related, and researchers figured that out through the use of bivariate analysis.
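To see how bivariate analysis detects a relationship like this, the Python sketch below computes Pearson’s correlation coefficient, r, from its definition. The numerator is the sum of the products of each variable’s deviations from its mean. The denominator is the square root of the product of the two summed squared deviations. The monthly figures are invented for illustration; they are not real sales or shark attack data. A value of r near 1 means the two variables rise and fall together, but it says nothing about why.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mx, my = mean(x), mean(y)
    # Sum of products of deviations from each mean.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Square roots of the summed squared deviations.
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented monthly figures, January through December (NOT real data).
ice_cream_sales = [20, 22, 30, 45, 60, 80, 95, 90, 65, 40, 25, 21]
shark_attacks = [1, 1, 2, 3, 5, 7, 9, 8, 5, 3, 1, 1]

r = pearson_r(ice_cream_sales, shark_attacks)
# A strong positive r here would still not show that one causes the other;
# a third variable (warm weather) can drive both.
print(round(r, 2))
```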

Bivariate analysis consists of a group of statistical techniques that examine the association between two variables. We could look at how anti-depressant medications and appetite are related, whether there is a relation between having a pet and emotional well-being, or if a policy-maker’s level of education is related to how they vote on bills related to environmental issues.
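Correlation suits interval/ratio variables. For two nominal variables, a common starting point is a cross-tabulation, which is the basis of the chi-square test mentioned in Table 2.5. The Python sketch below (helper names are our own) cross-tabulates the Gender and Hometown codes from Table 2.3 and computes the Pearson chi-square statistic by hand. With only four participants, the result is purely illustrative, because the test is not valid at this sample size. In practice, you would get a p-value from statistical software (for example, SciPy’s `scipy.stats.chi2_contingency`) rather than stopping at the statistic.

```python
from collections import Counter


def crosstab(pairs):
    """Count each (row value, column value) combination in a list of pairs."""
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    return row_levels, col_levels, counts


def chi_square_statistic(pairs):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_levels, col_levels, counts = crosstab(pairs)
    n = len(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    stat = 0.0
    for r in row_levels:
        for c in col_levels:
            # Expected count in this cell if the two variables were unrelated.
            expected = row_totals[r] * col_totals[c] / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat


# (Gender, Hometown) codes for Tom, Jorge, Melissa, and Amy in Table 2.3.
pairs = [(0, 1), (2, 0), (1, 0), (1, 0)]
gender_levels, hometown_levels, counts = crosstab(pairs)
print("Gender", hometown_levels)
for g in gender_levels:
    print(g, [counts[(g, h)] for h in hometown_levels])
print("chi-square =", round(chi_square_statistic(pairs), 2))
```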

Bivariate analysis forms the foundation of multivariate analysis, which we don’t get to in this book. All you really need to know here is that there are steps beyond bivariate analysis, which you’ve undoubtedly seen in scholarly literature already! But before we can move forward with multivariate analysis, we need to understand the associations between the variables in our study.

Throughout your PhD program, you will learn much more about quantitative data analysis techniques, including more sophisticated multivariate analysis methods. Hopefully, this section has provided you with some initial insights into how data is analyzed, and the importance of creating a data analysis plan prior to collecting data. Next, we will discuss some basic strategies for creating a qualitative data analysis plan.

Resources for Quantitative Data Analysis

If you are affiliated with a university, you will likely have access to some kind of commercial statistics software. Examples in the previous section use SPSS, a common commercial statistical software package. SPSS is relatively easy to use due to its graphical user interface, which does not require researchers to learn basic computer programming. However, like its competitors SAS and Stata, SPSS is expensive, and the software license must be renewed every year (like a subscription).

We suggest getting familiar with open source statistical software packages such as JASP or R. JASP is a free and open-source alternative to SPSS developed and supported by the University of Amsterdam. Its user interface is similar to SPSS’s, so it should be similarly easy to learn. Moreover, usability improvements over SPSS, like generating APA-formatted tables, make it a compelling option. While a great many of my students will rely on statistical analyses of their programs and practices in reports to funders, it is unlikely that any will use SPSS. Browse JASP’s how-to guide or consult the textbook Learning Statistics with JASP: A Tutorial for Psychology Students and Other Beginners, written by Danielle J. Navarro, David R. Foxcroft, and Thomas J. Faulkenberry.

R (a.k.a. The R Project for Statistical Computing) uses a command line interface, so you will need some coding knowledge in order to use it. Luckily, R is one of the most widely used statistics programs in the world, and its user community is large, so support and guides for using R are easy to find online. For beginning researchers, consult the textbook Learning Statistics with R: A tutorial for psychology students and other beginners by Danielle J. Navarro.

While statistics software is sometimes needed to perform advanced statistical tests, most univariate and bivariate tests can be performed in spreadsheet software like Microsoft Excel, Google Sheets, or the free and open source LibreOffice Calc. Microsoft offers the Analysis ToolPak, an Excel add-in for more complex statistical analysis. For more information on using spreadsheet software to perform statistics, see the open textbook Collaborative Statistics Using Spreadsheets by Susan Dean, Irene Mary Duranczyk, Barbara Illowsky, Suzanne Loch, and Janet Stottlemyer.

Statistical analysis is performed in just about every discipline, and as a result, there are a lot of openly licensed, free resources to assist you with your data analysis. We have endeavored to provide you with the basics in the past few chapters, but ultimately, you will likely need additional support in completing quantitative data analysis from an instructor, textbook, or other resource. Browse the Open Textbook Library for statistics resources, or look for video tutorials from reputable instructors, such as the video textbook on statistics by Bryan Koenig.

Qualitative Data: Management

Qualitative research often involves human participants, and qualitative data can include recordings or transcripts of their words, photographs or images, or diaries and documents. Because qualitative data are so personal, they can contain sensitive information that makes individuals, communities, and places recognizable. If you choose this methodology for your research, you should familiarize yourself with policies, procedures, and rules to ensure the safety and security of data in the documentation and dissemination process.

In any research involving primary data, a researcher is not only entrusted with the responsibility of upholding the privacy of their participants but also accountable to them, making confidentiality and human subjects’ protection front and center of qualitative data management. Data such as audiotapes, videotapes, transcripts, notes, and other records should be stored and secured in locations where only authorized persons have access to them.

Sometimes in qualitative research, you will learn intimate details about people’s lives, and qualitative data often contain personal identifiers. A helpful practice to protect participants’ confidentiality is to replace personal information in transcripts with pseudonyms or descriptive language (e.g., “[the participant’s sister]” instead of the sister’s name). Once audio and video recordings have been accurately transcribed and the transcripts de-identified, the original recordings should be destroyed.
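Part of this replacement can be scripted. The Python sketch below assumes you have already built, by hand, a key that maps each identifier in a transcript to its pseudonym or bracketed description. The function name, the excerpt, and the names in it are invented. The function replaces whole words only and handles longer identifiers first. It cannot catch misspellings, nicknames, or indirect identifiers such as a unique job title, so a careful human read of every transcript is still required, and the key itself must be stored as securely as the original data.

```python
import re


def deidentify(transcript: str, replacements: dict[str, str]) -> str:
    """Replace each identifier with its pseudonym or description.

    Hypothetical helper. Matching is whole-word and case-sensitive; longer
    identifiers are replaced first so "Maria Lopez" is handled before "Maria".
    """
    for name in sorted(replacements, key=len, reverse=True):
        replacement = replacements[name]
        pattern = r"\b" + re.escape(name) + r"\b"
        # A function replacement keeps re.sub from treating "\" specially.
        transcript = re.sub(pattern, lambda _match: replacement, transcript)
    return transcript


# Invented excerpt and key, for illustration only.
excerpt = "Rosa drove me to Lakeview Clinic, and Rosa stayed all day."
key = {"Rosa": "[the participant's sister]", "Lakeview Clinic": "[a local clinic]"}
print(deidentify(excerpt, key))
```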

Qualitative Data: Analysis

There are many different types of qualitative data, including transcripts of interviews and focus groups, observational data, documents and other artifacts, and more. Your qualitative data analysis plan should be anchored in the type of data collected and the purpose of your study. Qualitative research can serve a range of purposes. Below is a brief list of general questions we might consider when using a qualitative approach.

- Are you trying to understand how a particular group is affected by an issue?
- Are you trying to uncover how people arrive at a decision in a given situation?
- Are you trying to examine different points of view on the impact of a recent event?
- Are you trying to summarize how people understand or make sense of a condition?
- Are you trying to describe the needs of your target population?

If you don’t see the general aim of your research question reflected in one of these areas, don’t fret! This is only a small sampling of what you might be trying to accomplish with your qualitative study. Whatever your aim, you need to have a plan for what you will do once you have collected your data.

Iterative or Linear

Some qualitative research is linear, meaning it follows more of a traditionally quantitative process: create a plan, gather data, and analyze data; each step is completed before we proceed to the next.

Example. I’ve recently been working on a project where my research team conducted several focus groups. We used a linear approach. First, we planned our study and got it approved. Then we arranged and conducted our focus groups. We proceeded to transcribe all our data and went on to analyze it. We didn’t start our analysis until we had all our data in front of us.

However, many times qualitative research is iterative, or evolving in cycles. An iterative approach means that once we begin collecting data, we also begin analyzing data as it is coming in. This early and ongoing analysis of our (incomplete) data then impacts our continued planning, data gathering, and future analysis.

As a comparison, think about the way this book is written and used. It is written linearly, but you will likely engage with it iteratively as you use it for future reference. You may revisit previous sections to understand how they fit together, continuously building and revising how you think about the concepts you are learning.

Example. To demonstrate an iterative approach, let’s say we are conducting interviews. After we have completed our first three interviews, we sit down and do some preliminary analysis and identify some early themes that seem important. In our next interviews, we add a couple of questions based on these themes to explore these ideas with new participants. These new participants agree that these topics are important, but they have a different take on them and share their unique experiences with us. This gives us a new understanding of the data we are gathering. If we are using an iterative approach like this, it also means that as new ideas emerge later in the process, we need to go back to data collected earlier and check whether it contains evidence of those ideas that we missed the first time.

As you may have guessed, there are benefits and challenges to both linear and iterative approaches. A linear approach is much more straightforward, with each step fairly well defined. However, that same rigidity presents challenges: a linear approach assumes that we know what we need to ask or look for at the very beginning of data collection, which is often not the case. Figure 2.1 contrasts the two approaches.

Figure 2.1 Comparison of linear and iterative systematic approaches. In the linear approach, three boxes joined by arrows run in a line: "create a plan," then "gather data," ending with "analyze data." In the iterative systematic approach, three boxes labeled "planning," "data gathering," and "analyzing the data" are joined by arrows in a circle.

With iterative research, we have more flexibility to adapt our approach as we learn new things. We still need to keep our approach systematic and organized, however, so that our work doesn’t become a free-for-all. As we adapt, we do not want to stray too far from the original premise of our study. It’s also important to remember that with an iterative approach, we risk raising ethical concerns if our work extends beyond the boundaries of our original informed consent and institutional review board (IRB) agreement (see Chapter 3 for more on IRBs). If you need to update your original research plan as you gain more insight into the topic, you can submit an addendum to modify your original IRB application. Make sure to keep detailed notes of the decisions you are making and what is informing these choices. This helps to support transparency and credibility throughout the research process.

Acquainting yourself with your data

As you begin your analysis, you need to get to know your data. This often means reading through your data prior to any attempt at breaking it apart and labeling it. It is common to read transcripts at least twice before beginning any analyses. This helps give you a more comprehensive feel for each piece of data and the data as a whole, again, before you start to break it down into smaller units or deconstruct it. This is especially important if others assisted in the data collection process. We often gather data as part of a team and everyone involved in the analysis needs to be very familiar with all of the data.

Capturing your emerging understanding of the data

As you review your data, you will begin to develop and refine your understanding of what it means. Coding is the part of the qualitative data analysis process where we begin to interpret and assign meaning to the data. It is one of the first steps in which we filter the data through our own subjective lens as researchers. Your understanding of the data should be dynamic and flexible, but you want a way to capture it as it evolves. One option is to track it in your qualitative codebook, recording the main ideas that are emerging and what they mean. Table 2.6 is an example of how your thinking about a code might change and how you can capture that change.

Table 2.6 Example of the evolution of a code in a codebook
Date	Code	Explanations
6/18/24	Experience of wellness	This code captures the different ways people describe wellness in their lives
6/22/24	Understanding of wellness	Changed the label of this code slightly to reflect that many participants emphasize the cognitive aspect of how they understand wellness—how they think about it in their lives, not only the act of ‘experiencing it’. This understanding seems like a precursor to experiencing. An evolving sense of how you think about wellness in your life.
6/25/24	Wellness experienced by developing personal awareness	A broader understanding of this category is developing. It involves building a personalized understanding of what makes up wellness in each person’s life and the role that they play in maintaining it. Participants have emphasized that this is a dynamic, personal and ongoing process of uncovering their own intimate understanding of wellness. They describe having to experiment, explore, and reflect to develop this awareness.
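If your team keeps its codebook in a spreadsheet or document, the format of Table 2.6 works well. As an alternative sketch, the Python below models a code as a dated history of revisions, so the current label is always the latest entry and earlier thinking is preserved rather than overwritten. The class names are hypothetical, the explanations are abbreviated from Table 2.6, and we assume the dates in the table fall in 2024.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CodeRevision:
    when: date
    label: str
    explanation: str


@dataclass
class Code:
    """One code in a qualitative codebook, keeping every dated revision."""
    history: list[CodeRevision] = field(default_factory=list)

    def revise(self, when: date, label: str, explanation: str) -> None:
        # Refuse out-of-order entries so the history reads chronologically.
        if self.history and when < self.history[-1].when:
            raise ValueError("revisions must be added in date order")
        self.history.append(CodeRevision(when, label, explanation))

    @property
    def current_label(self) -> str:
        return self.history[-1].label


# Revisions abbreviated from Table 2.6 (years assumed to be 2024).
wellness = Code()
wellness.revise(date(2024, 6, 18), "Experience of wellness",
                "Different ways people describe wellness in their lives")
wellness.revise(date(2024, 6, 22), "Understanding of wellness",
                "How participants think about wellness, not only experiencing it")
wellness.revise(date(2024, 6, 25), "Wellness experienced by developing personal awareness",
                "A dynamic, personal, ongoing process of building awareness")

print(wellness.current_label)
for rev in wellness.history:
    print(rev.when.isoformat(), rev.label)
```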

There are a variety of approaches to qualitative analysis, including thematic analysis, content analysis, grounded theory, phenomenology, photovoice, and more. The specific steps you will take to code your qualitative data and generate themes from these codes will vary based on the analytic strategy you are employing. You should choose an analytic approach while designing your qualitative study, and the one you select will depend on the type of data you have and what you want to accomplish with it. In Chapter 19, we will go into more detail about various types of qualitative data analysis. Each qualitative approach has specific techniques and methods that take substantial study and practice to master.

Key Takeaways

Getting organized at the beginning of your project with a data analysis plan will help keep you on track. Data analysis plans should include your research question, a description of your data, and a step-by-step outline of what you’re going to do with it. [chapter 14.1]
Be flexible with your data analysis plan—sometimes data surprises us and we have to adjust the statistical tests we are using. [chapter 14.1]
Always make a data dictionary or, if using secondary data, get a copy of the data dictionary so you (or someone else) can understand the basics of your data. [chapter 14.1]
Bivariate analysis is a group of statistical techniques that examine the relationship between two variables. [chapter 15.1]
You need to conduct bivariate analyses before you can begin to draw conclusions from your data, including in future multivariate analyses. [chapter 15.1]
There are a lot of high-quality and free online resources to learn and perform statistical analysis.
Qualitative research analysis requires preparation and careful planning. You will need to take time to familiarize yourself with the data in a general sense before you begin analyzing. [chapter 19.3]
The specific steps you will take to code your qualitative data and generate final themes will depend on the qualitative analytic approach you select.

Exercises

TRACK 1 (IF YOU ARE CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

Make a data analysis plan for your project. Remember this should include your research question, a description of the data you will use, and a step-by-step outline of what you’re going to do with your data once you have it, including statistical tests (non-relational and relational) that you plan to use. You can do this exercise whether you’re using quantitative or qualitative data! The same principles apply.
Make a data dictionary for the data you are proposing to collect as part of your study. You can use the example above as a template.

TRACK 2 (IF YOU AREN’T CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

You are researching the impact of your city’s recent harm reduction interventions for intravenous drug users (e.g., sterile injection kits, monitored use, overdose prevention, naloxone provision, etc.).

Make a draft quantitative data analysis plan for your project. Remember this should include your research question, a description of the data you will use, and a step-by-step outline of what you’re going to do with your data once you have it, including statistical tests (non-relational and relational) that you plan to use. It’s okay if you don’t yet have a complete idea of the types of statistical analyses you might use.

Organizational home	Focus/topic	Data	Web address
National Opinion Research Center	General Social Survey; demographic, behavioral, attitudinal, and special interest questions; national sample	Quantitative	https://gss.norc.org/
Carolina Population Center	Add Health; longitudinal social, economic, psychological, and physical well-being of cohort in grades 7–12 in 1994	Quantitative	http://www.cpc.unc.edu/projects/addhealth
Center for Demography of Health and Aging	Wisconsin Longitudinal Study; life course study of cohorts who graduated from high school in 1957	Quantitative	https://www.ssc.wisc.edu/wlsresearch/
Institute for Social & Economic Research	British Household Panel Survey; longitudinal study of British lives and well-being	Quantitative	https://www.iser.essex.ac.uk/bhps
International Social Survey Programme	International data similar to GSS	Quantitative	http://www.issp.org/
The Institute for Quantitative Social Science at Harvard University	Large archive of written data, audio, and video focused on many topics	Quantitative and qualitative	http://dvn.iq.harvard.edu/dvn/dv/mra
Institute for Research on Women and Gender	Global Feminisms Project; interview transcripts and oral histories on feminism and women’s activism	Qualitative	https://globalfeminisms.umich.edu/
Oral History Office	Descriptions and links to numerous oral history archives	Qualitative	https://archives.lib.uconn.edu/islandora/object/20002%3A19840025
UNC Wilson Library	Digitized manuscript collection from the Southern Historical Collection	Qualitative	http://dc.lib.unc.edu/ead/archivalhome.php?CISOROOT=/ead
Qualitative Data Repository	A repository of qualitative data that can be downloaded and annotated collaboratively with other researchers	Qualitative	https://qdr.syr.edu/
