The Magic Behind Lexical Hypothesis

And the story about how words became the real thing. Second post in the series.

Apr 22, 2022

In the previous article, we talked about why I became interested in personality psychology. We learned about the Big Five and looked into my personality profile. We then discussed the two biggest lessons that I learned studying personality psychology. First, “things always make sense if you think within the bounds of a theory or a framework.” A theory (about personality) is a story buffed up with numbers, but mostly a story. When you subscribe to this story, you'll begin to interpret reality through its lens. And second, whether you subscribe to any theory depends largely on your intuition: your gut feeling about whether it feels right or wrong.

I also mentioned what's to come in the upcoming posts — lexical hypothesis, factor analysis, Big Five, and a critical look at the entire paradigm.

In keeping with that outline, today we'll discuss the mother lode assumption, the guiding star, the number 42 of the modern factor-analytic approaches to personality: the lexical hypothesis. In particular, we'll take a look at:

the two postulates of the lexical hypothesis;
the three philosophical views (realist, constructivist, and functionalist) that try to explain the basis of personality characteristics (or attributes or qualities, terms I'll use interchangeably);
the pros and cons of using the lexical hypothesis and the final verdict.

So, why should you even bother reading this? Short answer: the lexical hypothesis is the basis for all of the modern personality theories based on factor analysis. And if you understand it, you'll be better able to interpret and think about not only your personality profile but also the ~~entire~~ most of the subfield of personality psychology that focuses on traits. It's like a master key that opens any lock.

But let's begin at the beginning.

The lexical hypothesis dates back to Francis Galton, a cousin of Charles Darwin, and consists of two postulates.

An important personality characteristic that is used to describe someone will eventually find its way into the language.
Over time, this description will crystallize into a single word.

An example. Let's say that a few thousand years ago, as the first states formed, people needed a way to describe someone, say his name is Uruk (sounds Mesopotamian enough to me), who didn't pay his debts, cheated on his taxes, and told his wife that yes, of course, she doesn't look fat in that dress. Imagine you're Uruk's neighbor and you're bothered by his behavior. Naturally, you tell your friends about Uruk and complain that he's cheating on his clay tablet taxes. And so on. The years go by and, instead of recounting all the ways Uruk cheats, lies, and weasels out of his obligations each time you talk about him to your friends[1], you come to use a single word - “dishonest” - to describe Uruk’s actions and the meaning embedded in them.

A sciency way of saying this would look like this:

The degree of representation of an attribute in language has some correspondence with the general importance of the attribute in real-world transactions. This key premise of the lexical approach links semantic representation directly with the social importance criterion. [...] If terms in a language are used as variables, an attribute that is represented by multiple terms will likely appear as a factor.

Translated back into normal language, this means that the presence of some personality characteristic (here, dishonesty) in the language is proportional to its importance in human interactions. Since it's quite important to be able to describe a dishonest person (for obvious reasons that I won't spell out here), we have so many similar adjectives that do the job, e.g.:

deceitful,
crooked,
corrupt,
unscrupulous,
untrustworthy.

The lexical hypothesis basically says that if we take all these different, but similar adjectives (nouns and phrases too), we should be able to uncover this latent personality characteristic (dishonesty).
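If you'd like to see that logic in action, here's a minimal sketch in Python. It is not factor analysis and it uses no real data: I simply assume a hidden “dishonesty” score for each simulated person, let the five adjectives above depend on it with a made-up weight of 0.8 plus noise, and add an unrelated control adjective (“tall”, my invention). The function names, sample size, and numbers are all assumptions for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_ratings(n_people=2000, seed=0):
    """Simulate adjective ratings driven by one hidden 'dishonesty' score.

    Everything here is invented for illustration: the weight of 0.8,
    the noise level, and the unrelated control adjective 'tall'.
    """
    rng = random.Random(seed)
    adjectives = ["deceitful", "crooked", "corrupt",
                  "unscrupulous", "untrustworthy"]
    ratings = {adj: [] for adj in adjectives}
    ratings["tall"] = []
    for _ in range(n_people):
        latent = rng.gauss(0, 1)  # the hidden quality we never observe directly
        for adj in adjectives:
            # each adjective = shared latent score + its own noise
            ratings[adj].append(0.8 * latent + rng.gauss(0, 0.6))
        ratings["tall"].append(rng.gauss(0, 1))  # independent of the latent score
    return ratings

ratings = simulate_ratings()
within = pearson(ratings["deceitful"], ratings["crooked"])
control = pearson(ratings["deceitful"], ratings["tall"])
print(f"deceitful vs crooked: {within:.2f}")
print(f"deceitful vs tall:    {control:.2f}")
```

On a run like this, the two dishonesty adjectives should correlate strongly with each other and barely at all with the control. That shared variation is what lets a cluster of similar words point back at one latent quality.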

In one sentence: we use a single word as a shorthand to describe some latent quality of a person, and this quality — in itself — consists of many different behaviors and situations.

Here’s a graphical representation in case I couldn’t squeeze any sense into my writing.

But now we're left with another question: what are these qualities of people, which are described by our vocabulary? Or: in what form do they exist? For instance:

Is there a biological basis for dishonesty?
Or is it perhaps a social phenomenon, emergent from interactions between people?
Or are we talking about something that comes to exist based on its usefulness within a certain context, to achieve a certain goal?

These three questions map onto three philosophical views: realist, constructivist, and functionalist. Let's look at each in turn.

The realist view dates back to Gordon Allport, the father of personality psychology, and it assumes that these personality dispositions emerge from within the individual. This view is reflected in the Big Five, for instance, which assumes there are endogenous biological dispositions that, on the psychological plane, translate into personality traits. Big Five assumes we can tap into these biological dispositions via (mostly) self-reports. Under this assumption, people's answers — on a scale from 1 = “disagree a lot” to 5 = “agree a lot” — to various items, such as:

I am always prepared.
I pay attention to details.
I get chores done right away.
I like order.

will reflect an underlying biological reality (of a conscientious person in this example). There's evidence that biology underlies traits. For instance, the two meta traits of stability (neuroticism (reversed), agreeableness, conscientiousness) and plasticity (extraversion, openness to experience) are assumed to be related to serotonergic and dopaminergic functioning. Broadly speaking, serotonin is the neuromodulator of satiety and contentment, and dopamine of drive and pursuit, which would fit the broad descriptions of the meta traits above. If you want to dig deeper for details, here’s a decent review.
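To make the questionnaire part concrete, here's a minimal sketch of how answers to the four conscientiousness items quoted above might be turned into a single score. The answers are hypothetical, and averaging the items is a common convention that I'm assuming here, not necessarily what any particular Big Five inventory does. The `reverse` helper is also my addition: inventories often include negatively worded items that need flipping first, although the four items quoted above are all positively worded.

```python
# Hypothetical answers on the 1-5 scale from the text
# (1 = "disagree a lot", 5 = "agree a lot").
answers = {
    "I am always prepared.": 4,
    "I pay attention to details.": 5,
    "I get chores done right away.": 2,
    "I like order.": 4,
}

def scale_score(responses, low=1, high=5):
    """Average the item responses after checking they are on the scale."""
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"{item!r}: {value} is outside {low}-{high}")
    return sum(responses.values()) / len(responses)

def reverse(value, low=1, high=5):
    """Flip a response to a negatively worded item (5 becomes 1, and so on)."""
    return low + high - value

conscientiousness = scale_score(answers)
print(conscientiousness)  # 3.75
```

The realist reading is that a number like this reflects something biological. The code can't tell you that, of course; it only shows how little machinery sits between the answers and the score.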

The constructivist view wholeheartedly disagrees. It argues that we can't simply translate words to underlying biological qualities, because words are biased by the perceptions of the one doing the evaluation[2]. Instead, a constructivist proposes that words cluster around traits because of the semantic qualities of the words themselves.

(In the strong version of the constructivist view, personality traits are simply confabulations of the researcher, arbitrary categories that have no substantial reality[3].)

So, instead of focusing on some latent biological quality, the constructivist view focuses

a) on the one doing the perceiving and;
b) on the phenotypic (surface) level of interpretation (rather than the genetic and biological underpinnings).

There’s also evidence for the constructivist view. For an exhaustive overview, and a study that investigated over N = 1.2k Norwegian twins, look here. Without getting bogged down in detail, the authors concluded that the model representing the constructivist view fits the data better, which suggests less influence of genetic components (as the realists would claim) and more environmental influence (as the constructivists propose).

Then we also have the functionalist view. Sciency quote:

According to the functionalist view, personality description (whether done by the self, by social others, or by scientists) is a perceptual process that cannot be fully separated from the perceiver’s goals and the context. When the realist says that a structural model of personality identifies the important dimensions of individual differences, the functionalist asks “Important to whom and for what purpose?” Personality attributes do not exist in language or in psychologists’ inventories merely for decontextualized and bloodless description.

Basically, this view stresses the function of the personality characteristic. Both

a) individually, or how it serves to achieve an individual's goals (i.e. means to an end), and;
b) socially, or how useful is this individual (with this particular personality characteristic) to others.

On the individual level, dishonesty might serve to obtain something valuable by inflating one’s credentials. Lying on a resume comes to mind. Socially, if we label someone as dishonest, the function is to deter our friends from dealing with this person. I haven’t stumbled upon any studies that investigated this view, but it also seems plausible.

Let's recap here.

What's important to remember for the upcoming discussion is that, in a nutshell, personality researchers use words to categorize people. The reasoning is that these words are descriptive of some underlying quality, which is socially important, and aggregate around it, like wasps around a popsicle. The underlying basis for these words is disputed. It can be that there's underlying biology (realist). It might also be that it's all emergent from how individuals view the world and construct their understanding of it (constructivist). Or it might be that the words we use reflect how instrumental the underlying quality is in achieving individual goals and in social interactions (functionalist).

Regardless of the etiology (= the 3 views we just discussed), what can we say about the validity of the lexical hypothesis?

Well, for one, I think it's quite an elegant and useful way of summarizing and describing something that's difficult to grasp: we can't really see the latent influences on someone’s behavior, inspiring one person to be agreeable and another to be assertive. Yet surely there is something, isn't there? Our everyday experience tells us that there are systematic differences between people: we are not blank slates but come preprogrammed with certain proclivities. And we can, obviously with some loss of information along the way, put people into different buckets and categorize them based on how much or how little of this underlying quality they possess.

Many researchers thought the same. That's why we have so many different personality models based on the lexical hypothesis (collectively called the “psycholexical approach” - i.e. deriving psychological qualities from words). We already mentioned the Big Five, which assumes there are five underlying dimensions that describe people. But there are various others. Some claim there are more than five dimensions. An example of that is the HEXACO model, which assumes there are six dimensions - the theory adds the "honesty/humility" factor to the mix. Yet others claim there are fewer dimensions. As mentioned earlier, the five factors from the Big Five can be explained by the meta traits called plasticity and stability. And then there are the aspects and facets, too. Recall from my personality profile that agreeableness, for instance, is thought to be constituted by compassion and politeness.

As you can see, it’s quite a jumbled-up hierarchy. Source
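The part of that hierarchy spelled out in this post can be written down as a small nested structure. This is only a sketch of what's named here, under my own assumptions about how to lay it out: two meta traits, the five factors beneath them, and the two aspects of agreeableness. The other aspects, the facets, and HEXACO's sixth factor are deliberately left out, and the helper names are hypothetical.

```python
# Only the pieces named in this post are filled in. Empty lists mean
# "aspects not covered here", not "this factor has no aspects".
HIERARCHY = {
    "stability": {
        "neuroticism (reversed)": [],
        "agreeableness": ["compassion", "politeness"],
        "conscientiousness": [],
    },
    "plasticity": {
        "extraversion": [],
        "openness to experience": [],
    },
}

def meta_trait_of(factor):
    """Return the meta trait that a given Big Five factor sits under."""
    for meta_trait, factors in HIERARCHY.items():
        if factor in factors:
            return meta_trait
    raise KeyError(f"unknown factor: {factor}")

def describe(hierarchy):
    """Render the hierarchy as indented lines, one level per indent step."""
    lines = []
    for meta_trait, factors in hierarchy.items():
        lines.append(meta_trait)
        for factor, aspects in factors.items():
            lines.append("  " + factor)
            lines.extend("    " + aspect for aspect in aspects)
    return lines

print("\n".join(describe(HIERARCHY)))
print(meta_trait_of("agreeableness"))  # stability
```

Seen this way, the disagreement between models is mostly about how many levels there should be and how many entries belong at each one.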

What I find interesting is that most researchers tend to agree that the lexical hypothesis can deliver insight into personality characteristics. Sure, they quibble about the structure - the factors, aspects, facets, or whatever. But they don’t quibble as much about whether using words to describe personality (= lexical hypothesis) is a valid way to obtain insights into human personality. Over the years, the “it's better than nothing” kind of mindset settled in. People are aware there are problems with this approach, some of which I'll discuss below, but their reasoning is that we ought to describe personality somehow and that using words to do it is better than nothing[4].

So, what are the problems with that?

Words are not objective, but evaluative. There's a bias toward sociality and negative emotion.
Condensing personality characteristics into single words or phrases leads to a substantial loss of information.
Words have different meanings for different people.
Language can only explain a minority of human experience and accounts for only a small portion of interpersonal communication.

Let's briefly discuss each.

You can understand the first critique from the standpoint that:

a) most of the words we use have something to do with describing others. This makes sense, since that's probably one of the major reasons language developed in the first place. Recall the story about Uruk. And;
b) the negative outweighs the positive: we are more focused on the bad things that can happen to us than on the things that are going well. This also makes sense: experiencing negative emotion makes us more likely to act to remedy its cause, whereas positive emotion is basically a signal that everything is going well.

What this means is that the language is biased: there's a disproportionate focus on social and negative aspects. Meaning: the factors within the Big Five (and presumably other psycholexical models), which we have access to through (mostly) self-reports, are not objective descriptions of some underlying reality but are systematically biased.

The second critique pertains to the fact that we're trying to condense a ton of information into a few descriptors (the shortest Big Five inventories have as few as 10 items, 2 for each factor). We might glean some important insights about people's personalities that way, sure, but we might also be leaving the best bits on the table[5].

The third critique, and the one that I find most intriguing, is that we differ in how we perceive words. And so, if you present the same words to different people (which is what happens with validated questionnaires), you are introducing bias into your results. Dubbed “projection through capacities”, the idea is that people view words differently based on their embodied experience. For instance, I might view the word “strength” differently than you (assuming you are the average woman, the average older person, or, bluntly, weaker than me), because I’m capable of doing a muscle-up or a one-arm pull-up.

Besides words having different meanings to different people, we also must consider that language is but a small subset of communication. The numbers vary, but the old adage that 90% of communication happens below the level of language (gestures, body language, mimicry, etc.) is not that far off the truth. So, if we assume that we can tap into latent personality characteristics through language, we must also keep in mind that we can't really verbalize everything (nor should we, to be honest). The resulting personality characteristics excavated via the lexical hypothesis are thus impoverished, in a sense, because they can't really reflect deeper realities that are not explicitly communicable.

As an example, try to describe your best friend. You might come up with plenty of descriptions: caring, kind, funny, has a stunning sense of fashion, and so on. But if you think about it, does that really describe the feel of the person? No, it doesn’t. It’s like trying to put together a cake using tools from a toolbox: screwdriver, wrench, hammer, and the like. Depending on your handiness with tools, there might be a cake-ish thing coming out of the oven. But most likely, it will be some sort of a Franken-cake, a poor approximation of the real thing. If you ever read the quote by Laozi and didn’t understand it, try reading it now:

“The Tao that can be told is not the eternal Tao.
The name that can be named is not the eternal name.
The nameless is the beginning of heaven and earth.
The named is the mother of ten thousand things.
Ever desireless, one can see the mystery.
Ever desiring, one can see the manifestations.
These two spring from the same source but differ in name;
this appears as darkness.
Darkness within darkness.
The gate to all mystery.”

So, what is the final verdict? Well, since I myself am a verbose person, I'd say — purely out of self-interest (and perhaps self-preservation) — that the lexical hypothesis can deliver some insight into the nature of human personality. The fact that we have words to describe certain personality characteristics means that there is some basis these words reflect. Now whether it's biological, social, or functional I can't say: as with everything in psychology, it's probably a bit of everything (and a lot of something we haven't even considered).

That said, I think we are often blind to the limitations of this method and the results it can deliver. We perceive the personality profiles based on some variation of the psycholexical approach as immutable reflections of a person, instead of crude approximations that are only a bit more informed than Hippocrates’ bile theory of personality, or tarot card reading. It’s honestly the same thing we do everywhere else:

we use GDP as the measure of the well-being and happiness of people;
we assume being healthy equals not being ill;
we believe the number of books read (or articles produced) to be a proxy for knowledge.

In other words, we mistake the map for the territory. And over time the abstract models used to represent the real thing will dethrone the thing itself in the minds of people. I get it, though: it’s very convenient to give people a short questionnaire and reduce the chaos of someone’s existence to a manageable personality profile.

What is the upshot of that? Well, the universal need to describe people, coupled with the ease of administering, say, a Big Five questionnaire, means that we use it indiscriminately and everywhere, from political campaigns, to trying to see whether someone is at risk of developing, say, alcohol addiction, to recruitment processes. The magical quality of the Big Five is that it’s always correlated with some variable of interest[6].

I think that the weight we ascribe to the results obtained with the psycholexical approach should be proportional to the strength of the method. And while the scientists themselves might be careful about how they interpret the results, someone who stumbles upon a free personality test online that has a patina of scientific sheen might not be as discerning of the nuances.

So, how would I summarize this article? I think it is reasonable to claim that words (the basis of the psycholexical approach built on the lexical hypothesis) can reflect some underlying personality characteristic. I mean, that's what they were “designed” for. However, we shouldn't put too much trust in results excavated through such methods because they a) are inherently biased and b) omit a large part of what actually constitutes a human personality.

Words do have meaning. But words are also empty.

In the next post in this series, we'll explore the method researchers use to test the lexical hypothesis - factor analysis[7].

Until then.

[1] “Friends always hate it when it takes you forever to get to the point” was probably as true then as it is now and always will be.

[2] In most cases, that's the individual themselves (self-report). But it may also be someone close to the individual, who knows them (other-report). Or it might be some semi-objective observer, such as an expert, who does the evaluations.

[3] There was a huge debate in the 70s spurred on by Walter Mischel (you might remember him from the post about delayed gratification), who argued that there's no personality and that what people do is guided by the situation they find themselves in. He based this argument on the fact that personality traits correlate only about r = .3 with actual behaviors. While that is mostly true, it turned out that this number isn't any worse than in any other field of psychology. Still, it stymied the field of personality psychology for quite a long time.

[4] Full disclosure, in case there’s confusion about the claims personality psychologists make based on my interpretation: in personality psychology, traits (what we discussed this entire time) are but one leg upon which the table stands. An established definition of personality also mentions characteristic adaptations and life stories. The former can be understood as the interaction of traits in situations, and the latter as the stories that an individual tells themselves to explain the events happening in their life. For instance, if you’re extraverted (trait), you’re more likely to go out to parties (characteristic adaptation) and see yourself as a social beast (life story, identity). All these interact to constitute your personality.

[5] Here, I'd briefly mention the work of Dan P. McAdams (e.g. here) that describes the Big Five as the “psychology of the stranger”.

[6] On this and more, see “The Ongoing Accomplishment of the Big Five” (Carcinisation).

[7] There’s actually another method, principal component analysis (PCA), that is mathematically different but yields quite similar results.

TheQuaintPickle
