10.8 Ethical and social justice considerations

Learning Objectives

Learners will be able to…

  • Identify potential cultural, ethical, and social justice issues in measurement

With your variables operationalized, it’s time to step back and look at how measurement in social science affects our daily lives. As we will see, how we measure things is shaped by the power arrangements in our society, and, more insidiously, by establishing what counts as scientifically true, measures exert a power of their own over the world. Just as with reification in the conceptual world, how we operationally define concepts can reinforce or fight against oppressive forces.

Flawed measures & missing measures

 


Until 2017, government mental health institutions and mental hospitals in India served as long-term residential facilities for persons with psychosocial disabilities who had been abandoned by their families. The institutions were restrictive with respect to freedom of movement and food choices. An NGO working with the residents wanted to study the change its interventions produced over five years, and it implemented a quality-of-life survey that had been administered in countries such as the United States and the United Kingdom, where hospital structures differ. The questionnaire included items such as “How do you rate your freedom to choose your own food?” and “How far can you travel independently?” which were not applicable in the Indian setting. The NGO nevertheless implemented the questionnaire unaltered, so the tool could not accurately measure the changes the NGO aimed to measure. This calls the validity of the tool into question.


At the end of every semester, students in just about every university classroom in the United States complete similar student evaluations of teaching (SETs). Since every student is likely familiar with these, we can recognize many of the concepts we discussed in the previous sections. There are a number of rating scale questions that ask you to rate the professor, class, and teaching effectiveness on a scale of 1-5. Scores are averaged across students and used to determine the quality of teaching delivered by the faculty member. SET scores are often a principal factor in how faculty are reappointed to teaching positions. Would it surprise you to learn that student evaluations of teaching are of questionable quality? If your instructors are assessed with a biased or incomplete measure, how might that impact your education?

Most often, student scores are averaged across questions and reported as a final average. This average is used as one factor, often the most important factor, in a faculty member’s reappointment to teaching roles. We learned in this chapter that rating scales are ordinal, not interval or ratio, and that the data are categories, not numbers. Although rating scales use a familiar 1-5 scale, the numbers 1, 2, 3, 4, and 5 are really just convenient labels for categories like “excellent” or “strongly agree.” If we relabeled these categories as letters (A-E) rather than as numbers (1-5), how would you average them?

Averaging ordinal data is methodologically dubious because the numbers are merely a convenient convention; taking the median value is what makes the most sense with ordinal data. Medians are also less sensitive to outliers. A single student with strongly negative or positive feelings toward the professor can pull the class’s mean SET score well above or below what the “average” student in the class would say, particularly in classes with few students or in which few students completed the evaluations.
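To make the arithmetic concrete, here is a minimal sketch in Python, using invented ratings, of how a single dissatisfied student moves the mean of a small class while the median barely moves:

from statistics import mean, median

# Hypothetical ratings from a small class: mostly 4s ("very good") and 5s ("excellent")
ratings = [4, 4, 5, 4, 5, 4, 4]
with_outlier = ratings + [1]  # one student with strongly negative feelings

print(mean(ratings), median(ratings))            # about 4.29, and 4
print(mean(with_outlier), median(with_outlier))  # about 3.88, and 4: the median is unmoved

Of course, even a median takes the 1-5 labels at face value; the deeper point is that these are ordered categories, not true numbers.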

We care about teaching quality because more effective teachers will produce more knowledgeable and capable students. However, student evaluations of teaching are not particularly good indicators of teaching quality and are not associated with independently measured student learning gains (e.g., test scores, final grades) (Uttl et al., 2017).[1] This speaks to a lack of criterion validity: higher teaching quality should be associated with better learning outcomes for students, but across multiple studies stretching back years, there is no association that cannot be better explained by other factors. To be fair, some scholars find that SETs are valid and reliable (for a thorough defense of SETs as well as a historical summary of the literature, see Benton & Cashin, 2014).[2]
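At its core, a criterion validity check is a correlation between the measure and an independent criterion. The sketch below uses invented class-level numbers, not data from Uttl et al. (2017), to show the logic: if SET scores tracked teaching quality, they should correlate with students’ learning gains, and a correlation near zero is evidence against criterion validity.

import numpy as np

# Invented data for illustration only
set_means = np.array([4.6, 3.2, 4.1, 2.8, 4.8, 3.5])       # class-average SET scores
learning_gains = np.array([0.5, 0.4, 0.6, 0.5, 0.4, 0.6])  # standardized test-score gains

r = np.corrcoef(set_means, learning_gains)[0, 1]
print(f"criterion correlation r = {r:.2f}")  # about -0.11 for these made-up numbers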

Even though student evaluations of teaching often contain dozens of questions, researchers often find the questions so highly interrelated that one concept (or factor, as it is called in a factor analysis) explains a large portion of the variance in teachers’ scores on student evaluations (Clayson, 2018).[3] Personally, based on completing SETs myself, I believe that factor is best conceptualized as student satisfaction, which is obviously worthwhile to measure but is conceptually quite different from teaching effectiveness or whether a course achieved its intended outcomes. The lack of a clear operational and conceptual definition for the variable or variables measured by student evaluations of teaching also speaks to a lack of content validity. Researchers check content validity by comparing the measurement method with the conceptual definition, but without a clear conceptual definition of what student evaluations of teaching measure, it’s not clear how we can know the measure is valid. Indeed, this lack of clarity also impairs students’ ability to provide reliable and valid evaluations. So, while many researchers argue that class-average SET scores are reliable, in that they are consistent over time and across classes, it is unclear what exactly is being measured, even if it is measured consistently (Clayson, 2018).[4]
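To illustrate the factor-analytic point with a hedged sketch (simulated data, not a reanalysis of Clayson’s work): if every item on a questionnaire is driven by a single underlying construct, a one-factor model will reproduce most of the variance in the items.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_students, n_items = 500, 10

# Simulate SET items that all load on one latent "satisfaction" factor
satisfaction = rng.normal(size=(n_students, 1))
loadings = rng.uniform(0.7, 0.9, size=(1, n_items))
noise = rng.normal(scale=0.4, size=(n_students, n_items))
responses = satisfaction @ loadings + noise

fa = FactorAnalysis(n_components=1).fit(responses)
shared = (fa.components_ ** 2).sum()  # variance attributed to the single factor
total = responses.var(axis=0).sum()
print(f"one factor accounts for roughly {shared / total:.0%} of item variance")

For these simulated data, the single factor accounts for around 80% of the variance, even though the questionnaire appears to ask ten different questions.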

As a faculty member, there are a number of things I can do to influence my evaluations and disrupt their validity and reliability. Since SET scores are associated with the grades students perceive they will receive (e.g., Boring et al., 2016),[5] guaranteeing everyone a final grade of A in my class would likely increase my SET scores and my chances at tenure and promotion. I could time an email reminder to complete SETs to coincide with releasing high grades on a major assignment to boost my evaluation scores. On the other hand, student evaluations might be coincidentally timed with poor grades or difficult assignments that bias them downward. Students may also infer that I am manipulating them and give me lower SET scores as a result. To maximize my SET scores and chances at promotion, I would also need to select the courses I teach carefully: more quantitatively oriented classes generally receive lower ratings than more qualitative and humanities-driven classes, which makes my decision to teach social work research a poor strategy (Uttl & Smibert, 2017).[6] The only manipulative strategy I will admit to using is bringing food (usually cookies or donuts) to class during the period in which students complete evaluations. Measurement is impacted by context.

As a white cisgender male educator, I am adversely impacted by SETs only insofar as their validity, reliability, and methodology are sketchy. The other flaws with student evaluations actually help me while disadvantaging teachers from oppressed groups. Heffernan (2021)[7] provides a comprehensive overview of the sexism, racism, ableism, and prejudice baked into student evaluations:

“In all studies relating to gender, the analyses indicate that the highest scores are awarded in subjects filled with young, white, male students being taught by white English first language speaking, able-bodied, male academics who are neither too young nor too old (approx. 35–50 years of age), and who the students believe are heterosexual. Most deviations from this scenario in terms of student and academic demographics equates to lower SET scores. These studies thus highlight that white, able-bodied, heterosexual, men of a certain age are not only the least affected, they benefit from the practice. When every demographic group who does not fit this image is significantly disadvantaged by SETs, these processes serve to further enhance the position of the already privileged” (p. 5).

The staggering consistency of studies documenting prejudice in SETs has led to some rather superficial reforms, like adding reminders not to submit racist or sexist responses to SETs’ written instructions. Yet even though we know that SETs are systematically biased against people from historically marginalized and oppressed groups, the overwhelming majority of universities in the United States continue to use them to evaluate faculty for promotion or reappointment. From a critical perspective, it is worth asking why university administrators continue to use such a biased and flawed instrument. SETs produce data that make it easy to compare faculty to one another and track faculty members over time. Furthermore, they offer students a direct opportunity to voice their concerns and highlight what went well.

As the people with the greatest knowledge of what happened in the classroom and whether it met their expectations, students are best positioned to answer open-ended questions, which are the most productive part of SETs. Personally, I have found focus groups written, facilitated, and analyzed by student researchers to be more insightful than SETs. Student activists and leaders may look for ways to evaluate faculty that are more methodologically sound and less systematically biased, creating institutional change by replacing or augmenting traditional SETs in their departments. Students very rarely have input on the criteria and methodology for teaching evaluations, yet they are the people most impacted by helpful or harmful teaching practices.

Students should fight for better assessment in the classroom because well-designed assessments provide documentation to support more effective teaching practices and discourage unhelpful or discriminatory ones. Flawed assessments like SETs can leave programs without information about problems with courses, instructors, or other aspects of the curriculum. Think critically about what data your program uses to gauge its effectiveness. How might you introduce areas of student concern into how your program evaluates itself? Are there issues with food or housing insecurity, mentorship of nontraditional and first-generation students, or other issues that faculty should consider when they evaluate their program? Finally, as you transition into practice, think about how your agency measures its impact and how it privileges or excludes client and community voices in the assessment process.

Consider an example from intervention research. Say you are doing research with a mental health organization that serves youth impacted by community violence. How should you measure the impact of the services on the clients and their community? Schools may be interested in reducing truancy, self-injury, or other behavioral concerns. However, by centering delinquent behaviors in how we measure our impact, we may be inattentive to the role of trauma, family dynamics, and other cognitive and social processes beyond “delinquent behavior.” Indeed, we may bias the interventions by focusing on things that are less important to clients’ needs. Social workers want to make sure their programs are improving over time, and we rely on our measures to indicate what to change and what to keep. If our measures present a partial or flawed view, we lose our ability to establish and act on scientific truths.

Take, for example, the recent findings on bias in ASWB licensing exams. Here is a statement on the subject written by members of NASW Illinois (link). Head on over and take a look!

You may also be interested in similar arguments against the standard grading scale (A-F), and why grades (numerical, letter, etc.) do not do a good job of measuring learning. Think critically about the role that grades play in your life as a student, your self-concept, and your relationships with teachers. Your test and grade anxiety is due in part to how your learning is measured. Those measurements end up becoming an official record of your scholarship and allow employers or funders to compare you to other scholars. The stakes for measurement are the same for participants in your research study.

 

Self-reflection and measurement

Student evaluations of teaching are just like any other measure. How we decide to measure what we are researching is influenced by our backgrounds, including our culture, implicit biases, and individual experiences. The decisions that researchers make about measurement often default to ones that make the most sense to them and others like them.

This doesn’t mean that standardized scales or indices, for instance, won’t work for diverse groups of people. What it means is that researchers must not ignore difference in deciding how to measure a variable in their research. Doing so may serve to push already marginalized people further into the margins of academic research and, consequently, social work intervention. Social work researchers, with our strong orientation toward celebrating difference and working for social justice, are obligated to keep this in mind for ourselves and encourage others to think about it in their research, too.

This involves reflecting on what we are measuring, how we are measuring it, and why we are measuring it. Do we have biases that impact how we operationalize our concepts? Did we include stakeholders and gatekeepers in the development of our concepts? Doing so can also be a way to gain access to vulnerable populations. What feedback did we receive on our measurement process, and how was it incorporated into our work? These are all questions we should ask as we think about measurement. Engaging in this intentionally reflective process will help us maximize the chances that our measurement is accurate and as free from bias as possible.

The NASW Code of Ethics discusses social work research and the importance of engaging in practices that do not harm participants. This is especially important considering that many of the topics studied by social workers are disproportionately experienced by marginalized and oppressed populations. Some of these populations have had negative experiences with the research process: historically, their stories have been viewed through lenses that reinforced the dominant culture’s standpoint. Thus, when thinking about measurement in research projects, we must remember that the way concepts or constructs are measured will impact how marginalized or oppressed persons are viewed. It is important that social work researchers examine current tools to ensure they are appropriate for their population(s). Sometimes researchers will be able to use existing tools as they are; other times they will need to adapt existing measures or develop completely new measures in collaboration with community stakeholders. In short, the measurement protocols selected should be tailored and attentive to the experiences of the communities being studied.

Unfortunately, social science researchers do not do a great job of sharing their measures in a way that allows social work practitioners and administrators to use them to evaluate the impact of interventions and programs on clients. Few scales are published under an open copyright license that allows other people to view them for free and share them with others. Instead, the best way to find a scale mentioned in an article is often to search for it in Google with “.pdf” or “.docx” in the query to see if someone posted a copy online (usually in violation of copyright law). As we discussed in Chapter 4, this is an issue of information privilege: the structuring impact of oppression and discrimination on groups’ access to and use of scholarly information. As a student at a university with a research library, you may be able to access the Mental Measurements Yearbook to look up scales and indexes that measure client or program outcomes, while researchers unaffiliated with university libraries, or affiliated with libraries with limited resources, may not. Similarly, the vast majority of scholarship in social work and allied disciplines does not openly share measures, data, or other research materials, a best practice in open and collaborative science. It is important to underscore these structural barriers to using valid and reliable scales in social work practice and research. An invalid or unreliable outcome measure may cause ineffective or harmful programs to persist, or may worsen the prejudice and oppression experienced by clients, communities, and practitioners.

But it’s not enough to reflect on and identify problems and biases in our measurement, operationalization, and conceptualization; what are we going to do about them? While oppressive and problematic research has already been disseminated, much of it published before your time, there are things you can do about this as a doctoral student at this moment. Continually educate yourself on topics around equity. Listen to, read about, and amplify the voices of historically oppressed communities. Decenter your own privileged identities to critically read and analyze research. While these steps may add an extra layer to your already full study schedule, they are important developmental tools for adequately and effectively incorporating equity practices into your work as a scholar. Doctoral students hold power, privilege, and responsibility, which they can employ to ensure their work does not add to the oppressive research out there but serves to deconstruct it.

A place to start: Stop oversimplifying the human experience


Race

One way to get started in bringing cultural awareness to scientific measurement is through a critical examination of how we analyze race quantitatively. There are many important methodological objections to how we measure the impact of race. We encourage you to watch Dr. Alyasah Ali Sewell’s three-part workshop series called “Nested Models for Critical Studies of Race & Racism” for the Inter-university Consortium for Political and Social Research (ICPSR). They discuss how to operationalize and measure inequality, racism, and intersectionality and critique researchers’ attempts to oversimplify or overlook racism when we measure concepts in social science. If you are interested in developing your social work research skills further, consider applying for financial support from your university to attend an ICPSR summer seminar like Dr. Sewell’s where you can receive more advanced and specialized training in using research for social change.

Another recommendation is to consider using the 2015 guidance from the U.S. Census Bureau when measuring race and ethnicity in survey research. The items on the questionnaire allow respondents to report multiple racial and ethnic identities and provide options for a wide range of nationalities and ethnicities.
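As a minimal sketch of what this looks like at the data level (the category labels below are illustrative placeholders, not the Census Bureau’s exact response options), each respondent’s identities can be stored as a multi-select set and expanded into indicator variables, rather than forced into a single category:

# Store each respondent's self-identified categories as a set (multi-select),
# then expand into indicator columns for analysis. Labels are placeholders.
responses = [
    {"Black or African American", "Hispanic or Latino"},
    {"White"},
    {"Asian", "White"},
]

categories = sorted(set().union(*responses))
indicator_rows = [{cat: int(cat in resp) for cat in categories} for resp in responses]
for row in indicator_rows:
    print(row)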

Key Takeaways

  • Social work researchers must be attentive to personal and institutional biases in the measurement process that affect marginalized groups.
  • What is measured and how it is measured is shaped by power, and social workers must be critical and self-reflective in their research projects.

Post-awareness check (Knowledge)

How has the new information presented in this chapter impacted your conceptual definition(s) of key terms in your study?

Exercises

TRACK 1 (IF YOU ARE CREATING A RESEARCH PROPOSAL FOR THIS CLASS):

Think about your current research question and the tool(s) that you will use to gather data. Even if you haven’t chosen your tools yet, think of some that you have encountered in the literature so far.

  • How do your positionality and experience shape what variables you are choosing to measure and how you measure them?
  • Evaluate the measures in your study for potential biases.
  • If you are using measures developed by another researcher, investigate whether they are valid and reliable across studies and cultures.

TRACK 2 (IF YOU AREN’T CREATING A RESEARCH PROPOSAL FOR THIS CLASS): 

You are interested in studying older adults’ social-emotional well-being. Specifically, you would like to research how an intervention that pairs older adults living in assisted living communities with university student volunteers for a weekly conversation affects older adults’ loneliness.

Think about your working research question and the tool you previously identified that you will use to gather data.

  • Evaluate this measure for potential biases.
  • Investigate whether this measure is valid and reliable in other studies across cultures.

  1. Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42.
  2. Benton, S. L., & Cashin, W. E. (2014). Student ratings of instruction in college and university courses. In Higher education: Handbook of theory and research (pp. 279-326). Springer, Dordrecht.
  3. Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability. Assessment & Evaluation in Higher Education, 43(4), 666-681.
  4. Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability. Assessment & Evaluation in Higher Education, 43(4), 666-681.
  5. Boring, A., Ottoboni, K., & Stark, P. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research.
  6. Uttl, B., & Smibert, D. (2017). Student evaluations of teaching: teaching quantitative courses can be hazardous to one’s career. PeerJ, 5, e3299.
  7. Heffernan, T. (2021). Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching. Assessment & Evaluation in Higher Education, 1-11.