The problem with “gold standard” hierarchies of evidence
Four years ago I co-authored a paper on the rise of the “evidence base” (EB) in policing which drew attention to misguided attempts to replicate, in the context of policing, the “gold standard” hierarchies of evidence used in medicine, health, social care, education and other fields (see Lumsden and Goode 2016). The paper addressed the rise of evidence-based policy and practice as a dominant discourse in policing in the UK, and the implications this has for social scientists conducting research in this area, and for police officers and staff. Our paper was intended as an exploration and a conversation starter, drawing attention to the dangers and risks of a narrow focus on research ultimately driven by a construction of government and policy-makers (see also Lumsden 2017). Other scholars have made similar observations, drawing our attention to the false premise that we should even attempt an evidence-based approach in policing (see Thacher 2001).
The development of an evidence-base in policing has largely been driven by the Maryland Scale of Scientific Methods, imported from the USA to the UK and adopted and promoted by its proponents (e.g. Sherman). The Maryland Scale places systematic reviews, RCTs and positivist scientific methods at the top layers of the hierarchy, with qualitative methods at the bottom. Evidence-based policing (EBP) was later also developed in the form of the “Evidence-Based Policing Matrix” devised by Lum and colleagues (see Lum, Koper and Telep 2011): “…a research-to-practice translation tool which organizes moderate to very rigorous evaluations of police interventions visually, allowing agencies and researchers to view the field of research in this area…”, and which in that sense is more sympathetic to qualitative research.
We argued that adoption of evidence-based policing (EBP) and the related “gold standard” used to evaluate research (such as those measurable on the Maryland Scale) act as a “technology of power” (Foucault 1988) to draw boundaries (Gieryn 1983; Styhre 2011) around which methodologies and forms of knowledge are legitimate and useful for policing. We also drew attention to the risks posed to researchers entering the field of the loss of decades of seminal policing research if its utility for informing policing and criminal justice is to be judged using the “gold standard” criteria defined by the evidence-based movement more broadly.
Qualitative methods and the evidence-base
The general disregard of qualitative methods in evidence-based policy is not new, and the debate has been well rehearsed in social care, education, medicine and health care from the 1990s onwards (e.g. see Avby, Nilsen and Dahlgren 2014; Dixon-Woods, Bonas and Booth 2006). It is worth noting that the recent College of Policing definition of evidence-based policing in the UK has widened to refer to “best available evidence from appropriate methods”, and highlights the need for a clear theoretical basis and context of research:
“The ‘best available’ evidence will use appropriate research methods and sources for the question being asked. Research should be carefully conducted, peer reviewed and transparent about its methods, limitations, and how its conclusions were reached. The theoretical basis and context of the research should also be made clear. Where there is little or no formal research, other evidence such as professional consensus and peer review, may be regarded as the ‘best available’, if gathered and documented in a careful and transparent way.” (College of Policing 2017)
What I now want to address in this piece are the main problems concerning the use of hierarchies of evidence for judging the merits and rigour of qualitative research. Although I’m sympathetic to attempts to ensure qualitative research is included in EBP policy-making, and to the point that guidance is needed for policy-makers, “hierarchies of qualitative research” are misleading. They do not assist policy-makers in conducting a robust, fair and inclusive evaluation of qualitative studies, or in assessing whether those studies are “promising” enough to be included in the policing evidence-base.
First, I’ll respond to the very notion of “hierarchies of qualitative research”. Then, I’ll turn to some of the claims made in a related blog on the hierarchy (see Huey 2019), which discusses the problems with “pseudo-scientific”, “junky” qualitative research.
A “Hierarchy of Qualitative Research” for policy-making
In a response to tweets concerning the above example of a “hierarchy for qualitative research” for evidence-based policing (EBP), it is claimed that “rigour” is built into the hierarchy at every step. However, the very notion of a “Hierarchy of Qualitative Research” is itself misleading: it serves to legitimise certain forms of knowledge, types of qualitative research and methods, while dismissing others. It engages in boundary work (as noted above) around which particular methodologies, forms of knowledge and “voices” are viewed as legitimate and useful for policing (Gieryn 1983). We cannot use hierarchies to judge qualitative studies.
An abbreviated (and by no means complete) list of some of the problems I see with the use of hierarchies for judging qualitative research in EBP follows:
1. Systematic reviews sit at the top of the hierarchy (not surprising, given that it largely follows the tenets of previous hierarchies such as the Maryland Scale). Are these qualitative systematic reviews per se, or the use of qualitative data within a systematic review? Are the criteria for a “what works” systematic review still determined by quantitative principles (e.g. see Booth 2001)?
2. As stated in the blog which introduces the model, it makes one big assumption – “that the study being used was well-designed and well-executed” – but a “new category for 0: studies that are manifestly poorly designed and executed” was also added. The question, then, is how policy-makers are to know that a study is “poorly designed and executed” before they apply the model. Are we assuming they already have the skills to make these judgement calls in evaluating qualitative research, with the model itself then becoming step two of the process? If so, it is not really assisting them to evaluate qualitative research.
3. It fails to acknowledge the relationship between methods and methodologies…
4. …And fails to acknowledge how methodology and methods are always tied to the research question being posed. Will the method used help answer the question/s posed? More methods (i.e. “quadrangled studies”) do not ensure quality, as methods are ultimately tied to the research question and aims. As Murphy et al. write in their guidance on using qualitative methods in health technology assessment (HTA):
“The goal of all research in HTA should be to establish knowledge about which we can be reasonably confident, and to provide findings that are relevant to policy makers and practitioners. Therefore, decisions about whether qualitative or quantitative methods (or a combination of both) are most appropriate to a particular research problem should be made on the basis of which approach is likely to answer the question most effectively and efficiently.” (1998: iii, author’s emphasis)
5. Quantity over quality: The model gives the impression that more methods are better – i.e. “quadrangled studies” are “very promising” because they use “4 or more different methods or data sources”, which could include interviews, focus groups, field observations and media analysis. (Also see point 10 below.)
6. “Mixed methods” can refer to combining various forms of qualitative methods, or to combining quantitative and qualitative methods. Yet in the hierarchy, the “very promising” category is itself effectively mixed methods, even though it consists only of qualitative methods, whereas quant/qual mixed methods studies rank lower as only “promising”.
7. It doesn’t acknowledge that different qualitative methods also have specific standards which guide their design and use (e.g. focus groups versus participant observation), in addition to the general standards for qualitative research.
8. Context is key, both in qualitative research and in the study of policing and crime. The latter is not the same as, for example, health-based EBP (although health research has its own issues concerning context and individual differences in the design and implementation of various initiatives).
9. Case studies are situated at the bottom of the hierarchy alongside “anecdotes” and “expert opinion”, when in fact case studies vary widely and can involve the use of multiple methods (qualitative and/or quantitative) to explore research question/s (e.g. see the use of clinical case studies in nursing and health care).
10. It uses the standards of quantitative research to judge qualitative research (also see point 5 above). The overall message, whether intended or not, is the more qualitative methods the better, and the more interviewees the better, as you will then be able to “generalise” (when in fact generalisation is not necessarily the main aim of qualitative studies). (For a discussion of the number of participants in qualitative studies, see the insightful report “How many interviews is enough?” (Baker and Edwards 2013).)
In sum: undoubtedly, models such as this will appeal to some police and policy-makers who are looking for a “quick fix”, but they do not help them to evaluate “good” versus “bad” qualitative research. This is more complex, as reflected, for example, in the various checklists and tools used in health and medicine to evaluate qualitative studies. Such models will (like EBP models before them) result in the privileging of certain (qualitative) methods, types of research, and therefore forms of knowledge, at the expense of others. Ann Oakley wrote in 2002 about the need to dissolve the “methods war” in relation to education and the evidence-base, arguing that:
“A main danger ahead is that large areas of research, traditionally important in education, escape the evidence net either because no one can reach any consensus about how to sort out the reliable from the unreliable, or (which might be worse) because a new orthodoxy sets in according to which qualitative research is simply a world apart—nothing to do with evidence at all.” (Oakley 2002: 284)
I’ve been training non-academic groups and professionals in qualitative methods for the past five years (including police officers, NGOs, policy-makers, government, local authorities, health care workers and charities), and what is clear is that there is a desire to learn the intricacies of qualitative methods in order to evaluate research studies and strengthen in-house work. There is also a need for training on how to evaluate qualitative research – crucially, including why we should not use the principles of quantitative research to judge it. These hierarchies do not adequately equip policy-makers to assess the quality of qualitative research.
“Hierarchies of qualitative research” also risk reifying dominant discourses of the evidence-base in policing. There is a risk of creating expectations that police-researchers, “pracademics” and academics must mould their research so that it “fits” the model – so as to be, say, “very promising” and therefore used to build the evidence-base – rather than being robustly linked to questions of epistemology, ontology, methodology and the research question they wish to answer. For example, in the UK the Research Excellence Framework (REF) and the proposed Knowledge Exchange Framework (KEF) put pressure on researchers to demonstrate the “usefulness” of their work, and EBP aligns with this impetus (Lumsden and Goode 2018).
Claims of “pseudoscientific” “junky qualitative research”
This also relates to a point made in the blog launching the hierarchy, which describes an instance of what the author calls “pseudoscientific” and “junky qualitative research”. The argument is that the authors of one particular study, selected as an example of poor quality research, made grander claims than were appropriate from a small sample of only 13 participants, including making national policy recommendations in relation to criminal justice policy. (There is no reference to the particular report in the blog.) I agree with the more general point about not making national policy recommendations from such a small study (and with the other criticisms); however, the hierarchy does not help us address these issues. For example:
1. In the blog it is claimed that there has been “deep resistance among qualitative researchers to the idea of trying to set standards for their work”. Perhaps this is the case in Canada and the US, where this model originates, but it hasn’t been my experience. There are multiple examples of ongoing work and discussion in the social sciences, psychology and other fields regarding the need to improve transparency and set standards. One example is the influential work of Braun and Clarke (New Zealand/UK) on thematic analysis, their particular approach which they call “reflexive thematic analysis”, and their related calls for qualitative researchers to be more transparent about how they have analysed data. Their website also includes guidelines for reviewers and editors evaluating thematic analysis manuscripts. The point here is that qualitative research does not occur in a vacuum where “anything goes”.
2. There is a whole continuum of qualitative research styles and inquiry, with, for example, arts-based inquiries sitting at one end. Some arts-based projects have had impact in relation to policy-making (in the UK, for instance), and policy researchers are increasingly using these methods. Life history and narrative approaches might include only a small number of participants, but for good reasons, related to a host of factors including different disciplines, methodologies and philosophies of social science.
3. The blog also points to a lack of training in qualitative methods for graduate students. This is one point I do agree with, in terms of the need for training in both methodologies and methods. I’d also add training in the politics of the evidence-base, evaluation methods, and awareness of the impact agenda.
Researchers have a responsibility not to make grand claims from their research, and we do need standards for judging qualitative research. However, the hierarchy does not address these issues; it only exacerbates them. We can’t solve them by privileging studies which use “more” methods – the issue is more complex. Hierarchies of qualitative research, like those before them, home in on what their creators see as low standards in the field (cf. Lumsden and Goode 2016) and risk “disciplining” qualitative research(ers) (Denzin, Lincoln and Giardina 2006).
We might also look to, and learn from, the debates and work previously conducted in fields such as health and medical research on evaluating qualitative research for policy and judging what is “good qualitative research”. As Barbour (2001) writes in relation to “qualitative research checklists” in medical research: “Reducing qualitative research to a list of technical procedures … is overly prescriptive and results in ‘the tail wagging the dog’”. These checklists “can strengthen the rigour of qualitative research only if embedded in a broader understanding of qualitative research design and data analysis” (2001: 322, author’s emphasis).
Therefore, any practitioner-focused framework which aims to assess the rigour of qualitative research must attempt to be inclusive of a whole host of epistemological and ontological standpoints, and of the related methodologies and methods. Transparency in how we conduct our research is key, but evidence-based policing also needs to be inclusive rather than exclusive, and not kick qualitative research in the teeth.
Avby G, Nilsen P and Dahlgren MA (2014) Ways of understanding evidence-based practice in social work: A qualitative study. British Journal of Social Work 44: 1366–1383.
Baker SE and Edwards R (2013) How many qualitative interviews is enough? NCRM Review Paper. Accessed July 2019: http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.pdf
Barbour RS (2001) Checklists for improving rigour in qualitative research: a case of the tail wagging the dog? British Medical Journal 322: 115–117.
Denzin NK, Lincoln YS and Giardina MD (2006) Disciplining qualitative research. International Journal of Qualitative Studies in Education 19(6): 769–782.
Dixon-Woods M, Bonas S, Booth A, et al. (2006) How can systematic reviews incorporate qualitative research? A critical perspective. Qualitative Research 6(1): 27–44.
Foucault M (1988) Technologies of the self. In: Martin LH, Gutman H and Hutton PH (eds) Technologies of the Self. Amherst, MA: University of Massachusetts Press, 16–49.
Gieryn TF (1983) Boundary-work and the demarcation of science from non-science: strains and interests in professional ideologies of scientists. American Sociological Review 48(6): 781–795.
Huey L (2019) If we’re going to use qualitative research for public policy, then we need better standards. 22 July 2019. Accessed July 2019: https://www.lhuey.net/single-post/2019/07/22/If-We’re-Going-to-Use-Qualitative-Research-for-Public-Policy-then-We-Need-to-Better-Standards
Lum C, Koper CS and Telep CW (2011) The evidence-based policing matrix. Journal of Experimental Criminology 7(1): 3–26.
Lumsden K (2017) Police officer and civilian staff receptivity to research and evidence-based policing in England: providing a contextual understanding through qualitative interviews. Policing: A Journal of Policy and Practice 11(2): 157–167.
Lumsden K and Goode JE (2016) Policing research and the rise of the evidence-base: police officer and staff understandings of research, its implementation and “what works”. Sociology 52(4): 813–829.
Lumsden K and Goode JE (2018) Public criminology, reflexivity and the enterprise university: experiences of research, knowledge transfer work and co-option with police forces. Theoretical Criminology 22(2): 243–257.
Murphy E, Dingwall R, Greatbatch D, Parker S and Watson P (1998) Qualitative research methods in health technology assessment: a review of the literature. Health Technology Assessment 2(16).
Oakley A (2002) Social science and evidence-based everything: the case of education. Educational Review 54(3): 277–286.
Styhre A (2011) Knowledge Sharing in Professions. Surrey: Gower.
Thacher D (2001) Policing is not a treatment. Journal of Research in Crime and Delinquency 38(4): 387–415.
Dr Karen Lumsden is based at the University of Nottingham in the UK. She is a sociologist and criminologist with expertise in qualitative research methods and applied research with a range of audiences including police constabularies and victim organisations. She has held posts at the University of Leicester, Loughborough University, the University of Abertay Dundee, and the University of Aberdeen. Karen has a PhD in Sociology, Masters in Social Research, MA in Sociology, and PGCE in Higher Education Learning & Teaching, all from the University of Aberdeen. Karen has experience of teaching qualitative research methods at postgraduate level and to academics and practitioners via the Social Research Association, her own consultancy (The Qualitative Researcher), and at international summer schools and ESRC doctoral training centres. She is the author of over 40 publications including four books, is on the Editorial Board of the journal Sociology, and is currently the Chair of the Editorial Board of Sociological Research Online. She tweets at @karenlumsden2