Virtual School Meanderings

April 28, 2022

Article – After the Gold Rush: Questioning the “Gold Standard” and Reappraising the Status of Experiment and Randomized Controlled Trials in Education

Filed under: virtual school — Michael K. Barbour @ 8:34 am

We often hear about this notion of randomized controlled trials and other bench science-based research models as being the gold standard for education research.  In fact, in order for the federal government to promote things, or to add things to the What Works Clearinghouse, the research needs to meet one of these “standards.”  And readers of this space will know that I have been critical of this “gold standard.”

Recently I came across this article that I thought was useful to pass along to readers.

After the Gold Rush: Questioning the “Gold Standard” and Reappraising the Status of Experiment and Randomized Controlled Trials in Education 

Harvard Educational Review (2016) 86 (3): 390–411.

The past few years have seen a resurgence of faith in experimentation in education inquiry, and particularly in randomized controlled trials (RCTs). Proponents of such research have succeeded in bringing into common parlance the term gold standard, which suggests that research emerging from any other design frame fails to achieve the rigor or significance of RCT-based research in answering causal questions and cannot reliably tell us “what works.” In this article, Gary Thomas questions the reasoning behind this conclusion, resting his argument on the theory and practice of experimentation in education and on the limitations of RCTs in particular. He suggests that the arguments about the power of particular kinds of experiment reside in inappropriate ideas about generalization and induction and, indeed, what a scientific experiment needs to look like. Drawing from examples of systematic inquiry in education and other fields, Thomas argues for a restoration of respect for the heterogeneity of education inquiry.

Unfortunately the article is paywalled, but crafty folks might use a service like Sci-Hub to access it (if you don’t have an institutional library to lean on).  But I did want to share some of the quotes – including the first two paragraphs of the article – that I think are particularly instructive.

Thirty-five years ago, statistician Gene Glass said, after his major government sponsored assessment of experimental evaluations of compensatory education projects, that “the deficiencies of quantitative, experimental evaluation approaches are so thorough and irreparable as to disqualify their use.” He went on to recommend that the “NIE [National Institute of Education] should conduct evaluation emphasizing an ethnographic, principally descriptive case study approach” (Glass & Camilli, 1981, p. 1). Nearly thirty years later, distinguished evaluator Michael Scriven (2008) said something similar: “The RCT design . . . has essentially zero practical application to the field of human affairs” (p. 12).

But today it’s as though Glass and Scriven had never spoken. Since the beginning of the twenty-first century, educators have encountered an obsession—I use the word advisedly—among policy makers and many education researchers with the idea that education research, to be more useful, needs certain special forms of evidence and inquiry. The widely used term gold standard of evidence has cemented in place in discourse about education the idea that these special forms of inquiry are better than all the others. Law has even been enacted that demands certain kinds of evidence-based practice based on these forms of inquiry. In this essay, I look in more detail at what gold standard inquiry might mean and contend that, by elevating the status of a particular form of inquiry, we frame questions and answers about education in such a way that research is constrained and stunted. I argue for a reappraisal of some of the precepts of education research and a restoration of confidence in many and varied forms of inquiry. (pp. 390-391)

So for the last 40 years we have known that randomized controlled trials are flawed in the educational context – from some of the biggest educational research methodologists in the field – and yet for the last twenty years or more this is the research methodology that we have privileged.

Another quote that I thought was interesting…

Why has “experiment” come to have such a specialized, narrow, strictly applied meaning in education (and the social sciences generally)? In the natural sciences (chemistry and physics, for example), “to experiment” means to test an idea under controlled conditions to prove or falsify a conjecture or a hypothesis, and an experiment can take myriad shapes and forms. Robert Hooke in 1676 had a hunch about elasticity in springs, and he tested this idea systematically under controlled conditions, stretching the springs with weights to record the consequences. He was able to emerge from his experiments with a law which states that the extension of a spring is in direct proportion to the load added to it. His method, simple as it was, was unequivocally experimental. (p. 395)
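Hooke’s procedure, as described above, can be restated plainly: add known loads to a spring, record the extensions, and check whether extension is proportional to load (F = kx).  Here is a quick sketch of that check in Python – the measurements are made up purely for illustration, not Hooke’s actual data:

```python
# Hooke's law: load F is proportional to extension x, i.e. F = k * x.
# Hypothetical measurements (loads in newtons, extensions in metres).
loads = [1.0, 2.0, 3.0, 4.0]
extensions = [0.02, 0.04, 0.06, 0.08]

# Least-squares estimate of the spring constant k for a line through
# the origin: k = sum(F_i * x_i) / sum(x_i ** 2)
k = sum(f * x for f, x in zip(loads, extensions)) / sum(x * x for x in extensions)
print(f"estimated spring constant: {k:.1f} N/m")  # -> 50.0 N/m
```

The point is the same one Thomas makes: the experiment is “controlled” because the loads are chosen deliberately and the outcome is recorded systematically – no randomization or control group is involved, yet it is unequivocally experimental.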

As an example…  Jered Borup is a colleague who developed the Adolescent Community of Engagement framework based on testing an idea.  To quote some things from Jered’s website:

In the first two articles (Borup, Graham, & Davies, 2013a, b), my co-authors and I used student and parent surveys to measure learning interactions and correlated them with learning outcomes.

Although innovative, the findings were limited because we did not fully examine the intended purposes of those interactions. Unfortunately, the existing online learning frameworks had been developed in higher education and they did not account for the unique characteristics of adolescent learners. As a result, we developed the Adolescent Community of Engagement (ACE) framework to better describe how parents, teachers, and peers can influence K-12 online student engagement (Borup, West, Graham, & Davies, 2014).

The ACE framework has helped guide nearly all of my subsequent research examining K-12 online learning. More specifically, my co-authors and I have conducted a series of case studies examining perceptions and experiences of various stakeholders (e.g., students, parents, teachers, facilitators) in various models of online learning such as a full-time cyber high school, a large independent study program, and a state-run supplemental online program where students were assigned an on-site facilitator.

Following my dissertation, my co-authors and I conducted three rounds of data collection and analysis at the cyber charter school, each having similar but different purposes. First, we surveyed all of the teachers and conducted 22 one-hour-long interviews with 11 teachers to better understand their perceptions regarding the responsibilities and efforts of teachers, parents, and peers to fully engage students in learning activities (Borup, 2016a, b; Borup, Graham, & Drysdale, 2014; Borup & Stevens, 2014). Second, as part of Dr. Jeffery Drysdale’s dissertation we conducted five 75-minute focus groups with nearly all of the cyber school’s teachers, with follow-up interviews with 10 students and five teachers to better understand the school’s online facilitating program (Drysdale, Graham, & Borup, 2014; 2016). Lastly, we conducted additional interviews with the same 10 students and 19 interviews with nine of their parents. These interviews were similar to our teacher interviews and focused on participants’ perceptions and experiences of support provided to students by the online teacher, peers, and parents (Borup, Stevens, & Hasler Waters, 2015; Borup & Stevens, 2016, 2017). Three of the interviewed parents faced especially challenging situations, and we conducted narrative analyses using their interviews to better understand and share their experiences (Borup, Call Cummings, & Walters, accepted).

My co-authors and I have since shifted our focus to other models of online learning. For instance, as part of Dr. Darin Oviatt’s dissertation research we published two articles examining student support systems—both program-provided and student-curated—at a large independent study program. The first article focused on student perspectives at the start of the semester obtained from survey responses from over 1,000 students (Oviatt, Graham, Borup, & Davies, 2016). The second article reported on our analysis of a similar number of student survey responses and nine student-parent interviews collected at the end of the semester (Oviatt, Graham, Borup, & Davies, 2018). We have most recently been conducting and analyzing online teacher and on-site facilitator interviews and student focus groups to better understand the support that teachers, on-site facilitators, and parents provide students enrolled in a supplemental online program. The findings have highlighted the importance of online students working with an engaged facilitator. This research has resulted in four MVLRI reports, three published or accepted journal articles (Borup, Chambers, & Stimson, accepted; Borup & Stimson, accepted; Freidhoff, Borup, Stimson, & DeBruler, 2015), and an article under review.

Now this discussion is dated, but Jered hypothesized that the role of the parent was important in the full-time K-12 online learning environment.  He began his “experiment” by conducting a couple of case studies with a specific cyber charter school.  Based on that initial experiment, he developed a model and then conducted additional case studies to test that model and see if it worked in other full-time K-12 online learning environments, and expanded that exploration to other K-12 online learning environments.  Based upon those additional “experiments,” he refined his original model.  Isn’t that basically the scientific method?

I mean, didn’t we all learn this model in middle school?

Essentially, Jered’s initial interest was based on his personal experiences observing the phenomenon.  He conducted some case studies to identify what the important variables were and developed a model.  He conducted a bunch of other case studies to test and refine the model.  Based on the scientific method, the next step for Jered and his colleagues would be to develop a reliable and valid instrument to test that model.

How is this not the gold standard of scientific explanation?  Yet, because it isn’t based on randomized controlled trials, none of this work will ever be added to the What Works Clearinghouse.

The article’s conclusion is also quite instructive.

Cause, a complex idea, is legitimately sought and illuminated via a variety of complementary means, including experimentation. The danger of subscribing to a view of inquiry in which types of research are hierarchized, with gold standard inquiry at the top, is that it suggests that these kinds of research approaches are superior to all others in establishing cause, thereby edging out other ways of seeking and understanding complex, causal interrelationships. This view also offers a simplistic notion of cause-effect—“what works”—which perpetuates a model of inquiry being conducted in order that messages can be disseminated, top-down, to practitioners. Such a view disengages inquiry from the practice of the teacher, and it evades questions about how practitioners develop competence, skill, and fluency given the idiosyncrasies of their own situations. The idea of excellent practice moves from one that is personally cultivated by practitioners to one that is bestowed on practitioners by others.

Eclecticism and complementarity are surely central to the way that scientific inquiry works: inquiry cannot be formula driven. In every field, scientific inquiry seeks to answer questions and to solve puzzles. That is its purpose. It looks for explanations—clarification, illumination, enlightenment—about how and why things happen as they do. We link evidence, make connections, test hypotheses, recognize themes, cultivate ideas, and build models of the way the world works. We synthesize all of this using a mix of research methods. As Shaffer (2011) urges, “Rather than eulogizing one particular method, energies could more fruitfully be directed toward selecting the ‘optimal mix’ of research methods which address the key research questions in hand” (p. 1632).

Or, as Maslow (1966) put it, “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail” (pp. 15–16). We should be wary of being enticed by the notion that there exist particular methodological amulets—gold standard methods. For if these come to dominate the research enterprise, we may end up in a world where research tells us only about certain kinds of issues, and our understanding may be correspondingly restricted and distorted. (pp. 406-407)

It is also worth mentioning that this “gold standard” isn’t quite as rigorous as many would have us believe.  While the placebo effect is well documented, if you are experiencing a change in condition during a medical study, chances are you are taking the treatment and not the placebo – drugs come with side effects, tastes, and scents that placebos rarely replicate, so participants can often guess which group they are in.  While it is a dramatic production, this is a good illustration of the reality of medical research.

So let’s try to be realistic about the research process, and stop believing that one type of methodology or another is a measure of better research or that it can automatically tell us more about what works.
