One of the questions that comes up regularly in training courses on qualitative methods is how we should assess the quality of a qualitative study. At some point in their research career, qualitative researchers will inevitably experience the ‘apples versus oranges’ phenomenon, whereby their qualitative research is evaluated against quantitative principles and criteria rather than qualitative ones. The quality standards used in quantitative research do not directly translate to qualitative studies.
Should We Use Standardized Criteria to Evaluate Qualitative Research?
Over the years, many qualitative scholars have proposed frameworks and criteria for assessing qualitative research (see Guba and Lincoln 1989; Lather 1993; Schwandt 1996; Bochner 2000; Ritchie et al. 2003; Tracy 2010; Altheide and Johnson 2011). Some have also argued that standardized criteria are unhelpful in qualitative inquiry (see, for example, Schwandt 1996; Altheide and Johnson 2011). Bochner (2000), for instance, argues that ‘traditional empiricist criteria’ are ‘unhelpful’ when applied to new ethnographic approaches (cited in Tracy 2010: 838). As Altheide and Johnson (2011: 582) argue:
“There are many ways to use, practice, promote, and claim qualitative research, and in each there is a proposed or claimed relationship between some field of human experience, a form of representation, and an audience. Researchers and scholars in each of these areas have been grappling with issues of truth, validity, verisimilitude, credibility, trustworthiness, dependability, confirmability, and so on. What is valid for clinical studies or policy studies may not be adequate or relevant for ethnography or autoethnography or performance ethnography.”
Qualitative research is conducted within different research paradigms, which complicates the assessment of the quality of a particular study.
As Tracy (2010) notes, many of these critiques result in the development of new quality standards and criteria for evaluating qualitative inquiry, which are seen as more flexible than quantitative standards and more sensitive to the context-bound nature of qualitative research. Below, we explore the main criteria proposed for assessing qualitative research.
Criteria for Assessing Qualitative Research
In the 1980s, Guba and Lincoln (1989; see also Krefting 1991) developed criteria which can be used to determine rigor in a qualitative inquiry. Instead of ‘rigor’, they focus on the development of trustworthiness in qualitative inquiry through determining: credibility, transferability, dependability and confirmability.
|Qualitative criteria||Quantitative criteria|
|Credibility||Internal validity|
|Transferability||External validity or generalizability|
|Dependability||Reliability|
|Confirmability||Objectivity|
Credibility asks us to consider if the research findings are plausible and convincing. Questions to consider include:
- How well does the study capture and portray the world it is trying to describe?
- How well backed up are the claims made by the research?
- What is the evidential base for the research?
- How plausible are the findings?
As Stenfors et al. (2020) point out, there should be alignment between ‘theory, research question, data collection, analysis and results’ while the ‘sampling strategy, the depth and volume of data, and the analytical steps taken’ must be appropriate within that framework.
Transferability concerns how clear the basis is for drawing wider inference (Ritchie et al. 2003) from our study. Can the findings of our study be transferred to another group, context or setting?
As Ritchie et al. (2003) argue, the findings of qualitative research can be generalized but the framework within which this can occur needs greater clarification. Instead, we refer to the transferability of findings in a qualitative study. For example, in an empirical sense: can findings from qualitative research studies be applied to populations or settings beyond the particular sample of the study? We can also explore the generation of theoretical concepts or propositions which are deemed to be of wider, or universal, application from a qualitative study.
When attempting to extrapolate from a qualitative study, we should be conscious that meanings and behaviours are context-bound. Extrapolation may therefore be possible if it is offered as a working hypothesis that helps us to make sense of findings in other contexts.
Questions to consider include:
- Sample coverage: did the sample frame contain any known bias; were the criteria used for selection inclusive of the constituencies thought to be of importance?
- Capture of the phenomena: were the environment and the quality of questioning effective enough for participants to fully express their views?
- Identification or labelling: have the phenomena been identified, categorised and named in ways that reflect the meanings assigned by participants?
- Interpretation: is there sufficient internal evidence for the explanatory accounts that have been developed?
- Display: have the findings been portrayed in a way that remains true to the original data and allows others to see the analytic constructions which have occurred? (see Ritchie et al. 2003)
Dependability is ‘the extent to which the research could be replicated in similar conditions’ (Stenfors et al. 2020). The researcher should provide enough information on the design and conduct of their study that another researcher could follow the same steps in their own study. Given the context-specific nature of qualitative research, it can be difficult to demonstrate which features of the qualitative data should be expected to be consistent, dependable or reliable.
Questions to consider for dependability (or reliability) include:
- Was the sample design/selection without bias, ‘symbolically’ representative of the target population, comprehensive of all known constituencies; was there any known feature of non-response or attrition within the sample?
- Was the fieldwork carried out consistently, and did it allow respondents sufficient opportunities to cover relevant ground and to portray their experiences?
- Was the analysis carried out systematically and comprehensively, and were classifications and typologies confirmed by multiple assessment?
- Is the interpretation well supported by the evidence?
- Did the design/conduct allow equal opportunity for all perspectives to be identified or were there features that led to selective, or missing, coverage? (see Ritchie et al. 2003).
Confirmability requires a clear link between the data and the findings. For example, researchers should evidence their claims with quotes or excerpts from the data. Qualitative researchers should avoid the temptation to quantify findings with claims such as ‘70% of participants felt that xxx…’. In the Discussion, it is also important to demonstrate how the research findings relate to the wider body of literature and answer the research question. Any limitations of the study should also be flagged.
Stenfors et al. (2020) draw attention to reflexivity as another important criterion in assessing qualitative inquiry. For Guba and Lincoln (1989) the reflexive journal is a further means of helping to assess qualitative inquiry. A reflexive approach helps us to be aware of the social, ethical and political impact of our research, the central, fluid and changing nature/s of power relations (with participants, gatekeepers, research funders, etc.) and our relationships with the researched (Lumsden 2019).
We can ask whether the researcher has stepped back and critically reflected on their role in the research process, their relationships with the researched, and their social position. It should be clear how reflexivity has been embedded in the research process (Stenfors et al. 2020). As Altheide and Johnson (2011: 581) write:
‘Good qualitative research—and particularly ethnographies—shows the hand of the ethnographer. The effort may not always be successful, but there should be clear “tracks” that the attempt has been made.’
Additional Criteria: Ethics
Tracy (2010) also provides a useful overview of eight key criteria for excellent qualitative research: worthy topic, rich rigor, sincerity, credibility, resonance, significant contribution, ethical, meaningful coherence (p. 840). These overlap with the criteria above, and some elements are already covered in that discussion, so I will not go through them all here. However, it is important to draw attention to ethical considerations in qualitative studies. As Tracy notes, the research should consider:
- Procedural ethics (such as human subjects);
- Situational and culturally specific ethics;
- Relational ethics;
- Exiting ethics (leaving the scene and sharing the research) (see Tracy 2010: 840).
Strategies for Determining Trustworthiness (Rigor)
The strategies adopted to determine the trustworthiness of a qualitative study depend on a variety of factors, including the research paradigm, the specifics of each research design, the research methods used (e.g. interviews, ethnography, observation, focus groups, creative methods, visual methods, secondary data analysis, narratives) and the type of qualitative analysis being conducted.
Morse (2015) provides a useful evaluation of the use of various strategies for ensuring rigor in qualitative studies. Strategies which she evaluates as typically used in attempts to ensure validity and reliability include:
- Prolonged engagement in ethnographic research via time spent in the field to reduce researcher effect;
- Prolonged observation in ethnographic research to reduce researcher effect;
- Thick description;
- Development of a coding system and inter-rater reliability in semi-structured interviews;
- Researcher bias;
- Negative case analysis;
- Peer review debriefing (in team research);
- Member checks;
- External audits (viewed as problematic and not routinely used) (see pages 1217-1220).
She provides a useful evaluation of the appropriateness and success of these strategies for ensuring rigor, for those who wish to explore this further. Interestingly, through her critique of these strategies, Morse also suggests that ‘qualitative researchers return to the terminology of social sciences, using rigor, reliability, validity, and generalizability’ (p. 1212) instead of the terms proposed in the 1980s by Guba and Lincoln (1989).
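Of the strategies listed above, inter-rater reliability is the one that is usually expressed as a number. The following is a minimal sketch, not a method taken from any of the sources cited here: it assumes two coders have each assigned exactly one code to the same set of interview excerpts, and it uses Cohen's kappa, one common agreement statistic, to compare them. The function name, the code labels and the example data are all hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per excerpt."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of excerpts.")
    n = len(codes_a)

    # Observed agreement: share of excerpts given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Chance agreement: based on how often each coder used each code overall.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical code throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six interview excerpts (illustration only).
coder_1 = ["trust", "power", "trust", "ethics", "power", "trust"]
coder_2 = ["trust", "power", "ethics", "ethics", "power", "trust"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value close to 1 suggests the coding system is being applied consistently, while a value near 0 suggests agreement no better than chance. A figure like this describes agreement between coders only; it does not by itself establish the credibility of the interpretation.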
Awareness of the criteria used when assessing the quality of qualitative research is key for anyone conducting qualitative research. As we have seen, these criteria typically include: trustworthiness, credibility, transferability, dependability, confirmability, reflexivity and ethics.
However, the strategies which each researcher adopts to ensure the trustworthiness (rigor) of their study will depend on a variety of factors specific to each qualitative research project, including the research method they adopt and the research paradigm. As Morse (2015: 1219) writes: ‘…rigor, comprising both validity and reliability, is achieved primarily by researchers in the process of data collection and analysis’. In addition, the assessment criteria which are valid when assessing fields such as clinical studies may not be relevant for those working in areas such as ethnography or narrative studies (see Altheide and Johnson 2011). There is no easy route or ‘one size fits all’ approach for assessing the quality of qualitative research, but the above criteria give us a good starting point which we can refer to when designing and conducting our qualitative inquiries.
References and further reading
Altheide, D.L. and Johnson, J.M. (2011) ‘Reflections on Interpretive Adequacy in Qualitative Research.’ In N.K. Denzin and Y.S. Lincoln (eds) Handbook of Qualitative Research, Fourth Edition (pp. 581-594). London: Sage.
Bochner, A. (2000) ‘Criteria Against Ourselves.’ Qualitative Inquiry, 6: 266-272.
Braun, V. and Clarke, V. (2013) Successful Qualitative Research. London: Sage.
Guba, E. and Lincoln, Y. (1989) Fourth Generation Evaluation. Newbury Park, CA: Sage.
Krefting, L. (1991) ‘Rigor in Qualitative Research: The Assessment of Trustworthiness.’ American Journal of Occupational Therapy, 45: 214–222.
Lather, P. (1993) ‘Fertile Obsession: Validity after Poststructuralism.’ Sociological Quarterly, 34: 673-693.
Lingard, L. (2015) ‘Joining a Conversation: The Problem/Gap/Hook Heuristic.’ Perspectives on Medical Education, 4(5): 252–253.
Lumsden, K. (2019) Reflexivity: Theory, Method and Practice. London: Routledge.
Morse, J.M. (2015) ‘Critical Analysis of Strategies for Determining Rigor in Qualitative Inquiry.’ Qualitative Health Research, 25(9): 1212-1222.
Schwandt, T.A. (1996) ‘Farewell to Criteriology.’ Qualitative Inquiry, 2: 58-72.
Spencer, L., Ritchie, J., Lewis, J., and Dillon, L. (2003) Quality in Qualitative Evaluation: A Framework for Assessing Research Evidence, GCSRO. Available at: www.policyhub.gov.uk/publications
Stenfors, T., Kajamaa, A. and Bennett, D. (2020) ‘How to… Assess the Quality of Qualitative Research.’ The Clinical Teacher, https://doi.org/10.1111/tct.13242
Tracy, S.J. (2010) ‘Qualitative Quality: Eight “Big-Tent” Criteria for Excellent Qualitative Research.’ Qualitative Inquiry, 16: 837–851.