Recently, I've been reading a book called Lifespan by David Sinclair. In it, Sinclair discusses a theory about why we age. Spoiler alert: it's information loss at the (epi)genetic level. One of the ways this information loss manifests, Sinclair claims, is that over time our cells forget what they are - they become senescent (or "zombie") cells. For instance, a healthy skin cell is 100% skin cell and thus does the job of being a skin cell quite well. A senescent cell, on the other hand, loses its identity: over time, it becomes only 80% skin cell, the rest being, say, 10% liver cell and 10% kidney cell. The problem with that - besides your skin not being a place to grow a kidney or liver - is that the senescent cell doesn't die but instead spreads its dysfunction to the neighboring cells, nudging them toward cancer.
In the field of psychology, as I'll argue, we have a few of these senescent cells lying around, too, and instead of aging and cancer, we are dealing with all sorts of crises - a replication crisis, a theory crisis, a generalizability crisis. Whatever name you slap in front of the word “crisis”, the result is the same: the field produces wonky results and can't, in all seriousness, be compared to the other natural sciences.
In this article, I'll go into some of the reasons why this is the case - mostly reiterating ideas much smarter people have said previously - and attempt to argue - again with their help - for what we should do to get back in the driver's seat. Now bear in mind, this article barely scratches the surface of the matter: a considerable amount of ink has been spilled over these issues over the years, and what I'll portray is but a smidgen. Nevertheless, a smidgen worth reading, I hope.
Crossing off the Letter “E”
To best illustrate the manifold problems plaguing psychology, we’ll use an example. Imagine yourself as a participant in a study. In this study, I - the mad scientist, muahaha (that's my evil laugh) - present you with a few hundred words and ask you to cross off the letter “E” from each word as quickly as you can. You, being a wonderful participant, duly comply and start crossing off the letter “E”, wondering all the while what demon possessed you to apply for this study. After a short while, you're finished and I ask you to solve some math problems for good measure. You hate math but do it anyway. You cherish the thought that you're not alone in this mess - I also asked other people to do the same.
Unbeknownst to you, another group of people toils away at my behest. But this group - to your great dismay - is asked only to solve math problems, without crossing off the letter “E” beforehand. They form the control group.
After I torture a few hundred participants with boring tasks and/or re-traumatize them with their scholastic fear of math, I now have a ton of data. Specifically, I have:
times, i.e. how long it took people in each group to solve the math problems, and
the number of solved math problems in each group.
Juicy stuff!
Now also unbeknownst to you - unless you're a psychology student, in which case we have a different kind of problem1 - I have a hypothesis! And the hypothesis goes like this: the people who have been forced to cross off the letter “E” - and also, likely, hate their miserable lives - will take more time to solve math problems and solve fewer of them, as compared to the control group. The rationale is that the people in the experimental group will have depleted some sort of finite mental resource by crossing off the letter “E”, the result being that they will perform worse on other mental tasks (such as solving math problems). So far so good, right? It's kind of intuitive and you can understand why this should work even if you don't have a background in psychology.
But here's where it gets wonky.
Let's suppose I experienced the most validating thing a researcher can experience2 - the data confirms the hypothesis. This means that you - and all the other people I tortured with crossing off the letter “E” in the experimental group - performed worse, i.e. took longer and solved fewer math problems, than the control group. Yay for me! Now I can put the data in a fancy table or plot it in a graph, write some word salad around it (not unlike this one, thanks for reading, by the way), and submit it to a journal. Depending on my credentials, how fancy the study looks, and a dozen other factors, the journal is likely to accept it, as I confirmed my hypothesis3.
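For the curious, the bare-bones version of such an analysis can be sketched in a few lines. Everything below is invented for illustration - the article describes no actual data or analysis method - but a two-group comparison like this would typically run something like a Welch's t-test on the solving times:

```python
# Hypothetical data and analysis for the made-up study: compare average
# math-solving times between the "cross off E" group and the control group.
# All numbers and the choice of Welch's t-test are my assumptions.
import math
import statistics as st

experimental = [62.1, 58.4, 71.0, 66.3, 69.8, 64.2, 73.5, 60.9]  # seconds per problem
control      = [51.3, 55.0, 49.8, 57.2, 53.6, 50.4, 56.1, 52.7]

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = st.mean(a), st.mean(b)
    var_a, var_b = st.variance(a), st.variance(b)  # sample variances
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

print(f"mean (experimental): {st.mean(experimental):.1f} s")
print(f"mean (control):      {st.mean(control):.1f} s")
print(f"Welch's t: {welch_t(experimental, control):.2f}")
```

A large positive t here would be exactly the kind of "data confirms the hypothesis" moment described above - and, as the rest of the article argues, it would still say nothing about *why* the groups differ.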
Interpretability of Evidence
So, what did I find? Let's interpret the results. Recall my hypothesis that people forced to cross off the letter “E” would deplete some finite mental resource. I call this mental phenomenon ego depletion. I operationalized - i.e. quantified my verbal statement in numbers - the state of ego depletion through the variables I measured:
higher times to solve math problems, and
solving fewer of them, as per my hypothesis.
Now for the fun part.
How do I know that the effects I found emerged because some finite mental resource was depleted? I don't. Nobody does. What if, instead, you solved fewer math problems because you weren't motivated enough after crossing off the letter “E” like a monkey? Plausible.
So, the first problem that we encounter is that any theory in psychology is often underdetermined by evidence: the objective data that I have gathered can be explained in many ways. The problem with psychology is that I can't really say - with the kind of certainty you get in, say, physics - that one explanation is more fitting than another, because of the inherent complexity of my subject matter (i.e. humans) and the underspecification of my hypotheses and theory.
Now, normally, different models would duke it out until one emerged on top - the one with the highest explanatory power. But that’s often not the case, as psychology isn’t driven by paradigm continuation and specification as much as it is by following trends. Thus, we have a bunch of theories lying around, and these theories - like senescent cells - spoil the whole bunch.
We All Are Special Snowflakes… Kind of
In line with there being alternate explanations for the objective data we have in front of us, consider also that we psychologists assume a priori that the underlying cause of the effect we found is unitary, e.g. that there's some sort of homogeneous structure somewhere within each individual that causes the effects I observed in my experiment. This means that - thanks to our assumptions - everyone possesses this unseen phantom structure, and the differences we observe are differences of degree (some people have "more" and some have "less"), not of kind (meaning that the effects we observe could be caused by a structure A in you, but a structure B in someone else). But let’s make this more concrete with our example.
This means that the mechanism behind ego depletion that causes mental fatigue and lowers the subsequent mental output could, in theory, be different in each and every one of the participants I subjected to my experiment. Since we can't see the assumed structures within you - and the other participants - we can't tell that they are different.
And they very well might be. A huge corpus of knowledge in psychology and neuroscience attests that, funnily enough, everyone is different. To remove the stench of platitude from the preceding sentence, consider a mechanism so well established that most people know it without reading any neuroscience paper - neuroplasticity. Our brain responds to the environment by changing its structures and wirings. Thus you, having been exposed to a certain set of environmental stimuli, might develop different adaptations in your skull than me, having experienced something entirely different. This goes meta, too, if you consider the epigenetic adaptations your parents (and their parents' parents, and… ad infinitum) have endowed you with. But what we psychologists do in our research is to say: “well, I'm kind of aware that everyone is probably different, but since that doesn't allow me to do any psychological research ever and I like my job and think it's worthwhile, I just sweep this under the rug - but hey! - it's okay that I do it because everyone else does it too.”
But say we found the structure or the mechanism responsible for the observed effects. In the example above, I can say that the process happening behind the curtains is that glucose - what we humans use as energy to fuel our (mental) actions - is being depleted by crossing off the letter “E”. Sure enough, there are studies that show that glucose is depleted after people have crossed off a few dozen Es or done some other menial task. Now I have found my cause! Ego depletion is a real thing! Not so fast. Turns out, if you tell people that ego depletion isn't a real thing - i.e. you manipulate their beliefs - the effects of depleted glucose on mental tasks disappear.
Anyway, you don't know this but I just saved you two days of reading through this 50-page behemoth of a paper by Richters that argues for what I crudely condensed in the above paragraph. If you want nuance, though, feel free to read it.
Moving on.
Unwarranted Inferences
Back to our experiment. Recall that I found that crossing off the letter “E” leads to higher solving times for math problems and that people solve fewer of them. Recall also that I called this phenomenon ego depletion. So, when writing a paper, I won't title it "crossing off Es causes fewer math problems solved, and higher solving times, compared to a control group that didn't do any crossing off"; I will title it "ego depletion is why you can't solve math". Now, I might not go quite as far as to claim that4, but I'll do a variation of the latter title because, remember, I had my assumptions and they happened to be “true”. And my assumptions were that the observed effects can be chalked up to the phenomenon I postulated - ego depletion.
But what if I got lucky with the task I chose, and thus also with the results I found? What if a slightly different task, such as crossing off the letters “A” or “B”, wouldn't produce the same effect? What if it facilitated solving math problems instead? In short, there's a nearly infinite number of stimulus variations out there that might or might not cause the observed effects. By slapping a name on the results I found and assuming that the mechanism behind the observed effect doesn't differ between all these variations of the stimuli, I inadvertently go way beyond what my research has actually found - namely, that crossing off the letter “E” leads to fewer solved math problems and higher solving times, not that there's a phenomenon called ego depletion.
This is a condensed version of what Tal Yarkoni presented in his paper. He claims that one of the reasons psychology is in crisis is because we make unsubstantiated sweeping generalizations and thus the results we have don't say as much as we assume they do. Again, for details and nuance, head over to his paper.
Solutions… Solutions?
Since this is not a textbook titled "Problems with Psychology, 7th Edition, Edited and Revised", but an online article, and since you're not being paid to read this and probably have a dozen other things you should be (and want to be) doing (thanks for reading, by the way), I will stick to the three issues I portrayed above. Suffice it to say that I will probably expand on them in the future. Having said that, what are some of the immediate implications of the problems I described above, and what can we do about them?
One thing we can do as psychologists - looking at the inherent complexity in the interpretability of results, individual differences, and problems with generalizing across stimuli - is to introduce epistemic humility into our practice. This means that when we discuss our results and make inferences, we mention that there are a number of factors that might be responsible for the results we've found. Not only that, but we also mention that we worked with averaged values and under the assumption that the underlying mechanism is the same in everyone. It might very well be that certain populations of people work completely differently, which we can't detect given the assumptions we made a priori. We can also mention that what we found might not generalize to other stimuli - that crossing off the letters “A” and ”B” instead of the letter “E” might produce different results.
On a more practical level, reducing the claims we make can also take the form of - gasp! - ditching inferential statistics, and displaying good old descriptives only. This way, we won't get the fancy asterisks behind p-values, because we won't assume that our findings generalize to the population at large. But I get this is a lot to ask, especially because inferential statistics is one of the major reasons psychology is even considered a natural science: it allows us to quantify our findings and generalize across populations. In other words, we psychologists “do math” like physicists, and thus - pretty please - can we also be taken as seriously as that?
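As a toy illustration of what "descriptives only" reporting might look like - all numbers invented, and Cohen's d is just one common standardized effect size I picked, not something the article prescribes:

```python
# Hypothetical "descriptives only" report: group means, standard deviations,
# and an effect size -- no p-values, no asterisks, no claim of generalizing
# to the population at large. Data is made up for illustration.
import math
import statistics as st

solved_experimental = [4, 5, 3, 6, 4, 5, 4, 3]  # math problems solved
solved_control      = [7, 6, 8, 6, 7, 9, 7, 8]

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return (st.mean(a) - st.mean(b)) / math.sqrt(pooled_var)

for name, data in [("experimental", solved_experimental), ("control", solved_control)]:
    print(f"{name}: mean={st.mean(data):.2f}, sd={st.stdev(data):.2f}")
print(f"Cohen's d: {cohens_d(solved_experimental, solved_control):.2f}")
```

The reader still sees how big the difference was and how noisy the data is - they just don't get the (often unwarranted) implication that the finding holds for everyone, everywhere.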
Anyway, if you don't conduct research yourself but enjoy reading about it - be it articles themselves or popular science books - be aware that what you read there might simply not apply to you, might not hold in different contexts, or might be entirely wrong. Psychology is not - and will never be - an exact science, no matter how formalized its models become or how much mathematics it uses.
On a practical note, the more unintuitive and interesting the finding in psychology seems to be, the more likely it is to be a fluke. Guilty until proven innocent.
Before I leave you disillusioned with the field of psychology and let you go on your merry way, let's recap a bit. The current way the research in psychology is conducted is flawed on many levels. I showed three.
First was the interpretability of the results: one finding can be explained in many ways. The state of ego depletion might be because mental resources are depleted, but also - maybe - because you aren't motivated to solve any math tasks anymore after having suffered through crossing off the letter “E” (or, you know, dozens of other things).
Second, we talked about how individuals are different (brain plasticity, epigenetics), and how we can't be sure that the mechanisms we assume to be present in each of us are the same for you and me, or whether they are completely and utterly different. As a result, we can't really claim causality, even if, experimentally, we did everything correctly.
Third, we talked about the problem of sweeping generalizations across variations of stimuli. We don't really know whether crossing off the letters “E” or “B” or “A” causes ego depletion, but we assume it does. In reality, we don't know unless we test it.
But let's finish on a hopeful note. The reason I know about all these problems is that there are smart people working on solving them (see the stuff I linked). And the field of psychology - be it because the evidence is incontrovertible by now or because its social norms have shifted and it's now en vogue to critique how we conduct research, whichever - is experiencing a renaissance: we are really trying to improve our epistemic standards, at least some of us.
If I were to really finish on the hopeful note - haha, you really thought I would do that? - I would omit the fact that we knew about most of these problems (and many more) for decades, and didn't do much about them. Similar to how we approach other complex and gargantuan issues (like climate change), we very much prefer doing things as we've always done them just because a change is scary and brings its own problems. Let's just hope - both in psychological research, as well as in respect to our environment - that the majority of us are able to get our shit together before it hits the fan. We sure know what to do, but the question is: will we do it?
A lot of the issues in the field of psychology also stem from the fact that we tend to be lazy and use convenience samples of students who are WEIRD (western, educated, industrialized, rich, and democratic). Needless to say, WEIRD samples form but a fraction of all people, and not a representative one at that. Further, the chances that a participant will know what the study is about - and thus be able to influence its results - are much higher with psychology students than with laypeople. Alas, I’ll not get into this problem in this post, hence the footnote.
Besides his mom telling him he's her favorite son - thanks, mom!
The story would probably look different had the hypothesis been disproved, but I will restrict myself to problems inherent to psychology, not systemic problems plaguing science in general... more on that probably in different articles.
Although the journalists picking up the results of my paper certainly won't have such qualms - but that’s another issue beside the immediate point.