Theories of Intelligence
From Spearman's g-factor to Gardner's multiple intelligences to modern computational views.
A few months ago, I was debugging a multi-agent system where seven AI agents had to coordinate on a complex audit. One agent was brilliant at mathematical reasoning but couldn’t synthesize findings across domains. Another could find obscure research papers in seven languages but fell apart when asked to evaluate whether a physics equation was numerically stable. A third excelled at spotting contradictions but couldn’t generate novel hypotheses.
I kept thinking: these agents each have a kind of intelligence. But none of them have the intelligence. And that made me wonder whether the century-long debate about what intelligence actually is might look different from the vantage point of someone who’s tried to build it.
Turns out, it does.
A century of arguments about a word nobody can define
Intelligence is one of those concepts where everybody knows what it means until you ask them to define it. Psychologists have been fighting about this since the early 1900s, and the fight has produced some genuinely useful frameworks alongside a spectacular amount of academic tribalism.
The story starts with Charles Spearman in 1904. Spearman was a British psychologist who noticed something that seems obvious in retrospect but was actually a significant insight: people who are good at one type of mental test tend to be good at other types of mental tests too. Good at vocabulary? You’re probably decent at spatial reasoning. Fast at arithmetic? You’ll likely do fine on pattern recognition.
Spearman formalized this observation using factor analysis (a statistical technique he essentially invented for the purpose) and proposed that there exists a single underlying factor, which he called g, that explains the positive correlations between different cognitive abilities. Each specific ability also has its own factor (s), but g is the common thread.
This is Spearman’s two-factor theory, and it has proven annoyingly hard to kill. Over a century of research, across dozens of countries, with millions of test-takers, the positive manifold (the fact that cognitive tests correlate positively with each other) keeps showing up. If you run factor analysis on any large battery of cognitive tests, you will find a general factor. Every time.
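The positive manifold is easy to see in simulation. Here is a toy sketch (illustrative numbers, not real test data): generate scores for six tests that each share a single latent factor g, then check that every pairwise correlation is positive and that the first eigenvector of the correlation matrix, the crude principal-axis version of a general factor, loads every test with the same sign.

```python
# Toy simulation of the positive manifold: six tests sharing one latent
# factor g. Loadings and sample size are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 5000, 6
loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.7, 0.6])  # each test's g loading

g = rng.standard_normal(n_people)                     # latent general factor
noise = rng.standard_normal((n_people, n_tests))
scores = g[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

corr = np.corrcoef(scores, rowvar=False)
offdiag = corr[~np.eye(n_tests, dtype=bool)]
print("all pairwise correlations positive:", bool((offdiag > 0).all()))

# First eigenvector of the correlation matrix ~ the general factor:
# every test loads on it with the same sign.
eigvals, eigvecs = np.linalg.eigh(corr)
first = eigvecs[:, -1]                                # largest eigenvalue
print("same-sign loadings on first factor:",
      bool((np.sign(first) == np.sign(first[0])).all()))
```

Of course, the simulation assumes the single factor it then "discovers," which is exactly the circularity critics of g point at; the empirical surprise is that real test batteries behave like this too.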
The g factor predicts job performance across occupations (correlation around 0.5-0.6 with job performance ratings). It predicts educational attainment. It predicts income, though less strongly. It even predicts longevity, with higher g scores correlating with longer lifespans, probably through a combination of better health decisions and socioeconomic advantage.
Here’s the uncomfortable part. Despite its predictive power, nobody knows what g actually is. It’s a statistical construct. Saying “she has high g” is like saying “this stock market index went up.” True, maybe useful, but it doesn’t tell you why. Is g processing speed? Working memory capacity? Neural efficiency? The ability to maintain and manipulate more information simultaneously? Over a century in, we still don’t have a consensus answer.
Splitting the atom: Cattell’s investment theory
Raymond Cattell, a student of Spearman’s, thought g was too blunt. In the 1940s and 1960s, he proposed splitting it into two components:
| Type | What it is | How it changes with age |
|---|---|---|
| Fluid intelligence (Gf) | The ability to reason about novel problems, spot patterns, think abstractly. Independent of prior knowledge. | Peaks in the mid-20s, declines steadily |
| Crystallized intelligence (Gc) | Accumulated knowledge, vocabulary, expertise. The product of applying fluid intelligence over time. | Increases throughout life, may plateau but rarely declines sharply |
Cattell’s key insight was what he called the investment theory: fluid intelligence gets “invested” in learning experiences to produce crystallized intelligence. You use your raw reasoning ability to learn things, and those learned things become a separate, stable cognitive resource.
This distinction has held up remarkably well. The developmental trajectories are real. Young adults smoke older adults on novel problem-solving tasks (Raven’s Progressive Matrices, for instance), while older adults outperform on vocabulary, general knowledge, and domain-specific reasoning. The crossover happens somewhere in the 30s-40s.
What makes this relevant to the AI conversation is that it maps almost perfectly onto the distinction between a pre-trained model and a fine-tuned one. A large language model fresh off pre-training has massive fluid intelligence (pattern matching, analogical reasoning, novel problem-solving) but limited crystallized intelligence in any specific domain. Fine-tuning, retrieval-augmented generation, and in-context learning are all forms of “investment,” converting general capability into domain-specific knowledge.
This isn’t a metaphor. It’s a structural parallel. And it suggests that Cattell was onto something fundamental about how intelligence works, whether the substrate is biological or silicon.
The Cattell-Horn-Carroll merger
Cattell’s framework didn’t stay static. His student John Horn expanded the model, arguing that fluid and crystallized intelligence were just two of many broad cognitive abilities. Horn identified about ten broad abilities, including visual processing, auditory processing, processing speed, and long-term retrieval.
In the 1990s, John Carroll published his massive meta-analysis of factor-analytic studies (literally hundreds of datasets across decades of research) and proposed a three-stratum model: narrow abilities at the bottom, broad abilities in the middle, and g at the top.
The eventual merger of these frameworks produced what’s now called the Cattell-Horn-Carroll (CHC) theory, which is the dominant framework in psychometric intelligence research today. Most modern IQ tests, including the Wechsler scales and the Stanford-Binet, are structured around CHC theory.
The CHC model looks something like this:
```
                          g (general intelligence)
                          │
     ┌─────────┬──────────┼─────────┬──────────┐
     Gf        Gc         Gv        Gs         Glr
  (fluid) (crystallized) (visual) (speed) (retrieval)
     │         │          │         │          │
  [narrow   [narrow    [narrow   [narrow    [narrow
 abilities] abilities] abilities] abilities] abilities]
```
This is a consensus model in the way that the Standard Model is a consensus in physics. Not everyone loves it. Not everyone agrees on the details. But it’s the best empirical fit to the data, and nobody has produced a serious competitor that does better across the full range of evidence.
Sternberg’s rebellion: intelligence is more than test scores
Robert Sternberg looked at the psychometric tradition and thought it was missing something important. His triarchic theory (1985) proposed three types of intelligence:
| Component | What it measures | The psychometric tradition’s coverage |
|---|---|---|
| Analytical | The ability to analyze, evaluate, compare, and contrast. Classic “school smart.” | Well covered by IQ tests |
| Creative | The ability to create, invent, discover, imagine, and suppose. Generating novel solutions. | Poorly covered |
| Practical | The ability to apply knowledge to real-world problems. “Street smart.” Tacit knowledge. | Almost entirely ignored |
Sternberg’s most provocative claim was about practical intelligence. He argued that tacit knowledge (the unwritten rules, heuristics, and contextual understanding that people acquire through experience) constitutes a genuine form of intelligence that’s largely independent of IQ.
The empirical picture is mixed. Studies of 3,252 students across the US, Finland, and Spain did find separable analytical, creative, and practical factors. Students taught using all three approaches outperformed those taught using only analytical methods. That’s real evidence.
But the practical intelligence construct has been challenged hard. Linda Gottfredson published a detailed critique arguing that Sternberg’s evidence for practical intelligence as independent from g was weaker than claimed, that tacit knowledge tests correlated with g more than Sternberg acknowledged, and that the distinction between “academic” and “practical” intelligence was blurrier than the theory implies.
Where I land on this: Sternberg was asking the right question (are IQ tests missing important aspects of cognitive ability?) even if his specific answer has problems. The fact that someone can have an IQ of 140 and be functionally incompetent at managing people, or that a person with an average IQ can build a successful business through accumulated practical wisdom, suggests that the psychometric tradition is measuring something real but not everything important.
Gardner’s multiple intelligences: the most popular theory that science doesn’t support
Howard Gardner’s theory of multiple intelligences (1983) is probably the most widely known theory of intelligence outside of academia. It’s also the most controversial among researchers.
Gardner proposed that intelligence isn’t a single thing or even a few things. Instead, there are (originally seven, now eight or nine) distinct intelligences:
- Linguistic
- Logical-Mathematical
- Spatial
- Musical
- Bodily-Kinesthetic
- Interpersonal
- Intrapersonal
- Naturalistic (added later)
- Existential (proposed but not fully committed to)
The theory resonated massively with educators because it validated what teachers intuitively felt: some kids are brilliant but don’t test well, and the kid who can’t sit still might be kinesthetically intelligent rather than learning-disabled.
The problem is that the empirical evidence doesn’t support the theory’s central claim of independent intelligences. When researchers actually test Gardner’s intelligences using factor analysis, the “purely cognitive” ones (linguistic, logical-mathematical, spatial, naturalistic) load heavily on a common g factor. The correlations between them are positive, not zero. People good at linguistic tasks tend to be good at logical tasks too, which is exactly what Spearman found a century ago.
The non-cognitive “intelligences” (bodily-kinesthetic, musical, interpersonal) do show lower g loadings, but this raises a different problem. If they’re not correlated with cognitive abilities, are they “intelligences” at all, or are they better described as talents, skills, or aptitudes? Gardner explicitly chose the word “intelligence” for rhetorical impact, acknowledging that calling them “talents” wouldn’t have the same cultural weight. That’s a branding decision, not a scientific one.
There’s a real cost to the theory’s popularity. Schools that restructured curricula around “learning styles” derived from multiple intelligences theory wasted enormous resources on an approach that systematic reviews have consistently failed to validate. The learning styles hypothesis (that students learn better when instruction matches their dominant intelligence/learning style) is one of the most persistent and well-debunked myths in education research.
That said, Gardner’s instinct that the psychometric tradition was too narrow isn’t wrong. Musical ability, athletic coordination, and social perception are all genuine human capacities that matter for life outcomes. The question is whether calling them “intelligences” helps or hinders our understanding.
The computational turn: intelligence as compression
The most interesting recent development in intelligence theory doesn’t come from psychology. It comes from computer science and information theory.
The computational view of intelligence, championed by researchers like Shane Legg and Marcus Hutter, defines intelligence as an agent’s ability to achieve goals across a wide range of environments. Their formal definition (Universal Intelligence, 2007) literally writes this as a mathematical formula: intelligence is the expected performance of an agent across all computable environments, weighted by their complexity.
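Their measure, usually written Υ, takes roughly this shape, where E is the set of computable environments, K(μ) is the Kolmogorov complexity of environment μ, and V<sub>μ</sub><sup>π</sup> is the expected value an agent (policy) π achieves in μ:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Simple environments get exponentially more weight than complex ones, so a high-Υ agent is one that does well on many environments, especially the compactly describable ones.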
This sounds abstract, but it has a concrete implication: intelligence is compression. An intelligent system is one that can find compact representations of complex patterns. The better you compress, the better you predict. The better you predict, the better you perform.
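A blunt way to feel the compression-prediction link, using nothing but the standard library: structured (and therefore predictable) data compresses far better than random data. The strings below are invented for illustration.

```python
# Structured data compresses well because a compressor can predict what
# comes next; random bytes leave it nothing to predict.
import random
import zlib

random.seed(0)
structured = ("the cat sat on the mat. " * 40).encode()
random_bytes = bytes(random.randrange(256) for _ in range(len(structured)))

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of original size."""
    return len(zlib.compress(data, 9)) / len(data)

print(f"structured: {ratio(structured):.2f}, random: {ratio(random_bytes):.2f}")
```

The repetitive text shrinks to a small fraction of its size; the random bytes actually grow slightly, since there is no structure to exploit. In this framing, a predictor and a compressor are two views of the same machinery.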
This maps surprisingly well onto what we know about the g factor. If g is fundamentally about the ability to extract patterns and regularities from information, then g is essentially a compression metric. High-g individuals compress more information more efficiently, which is why they perform well across diverse cognitive tasks.
It also maps onto how modern AI systems work. Transformers learn compressed representations of their training data. The quality of these representations (how well they capture the underlying structure rather than surface patterns) determines how well the model generalizes to new tasks. A model with better compression has, in a meaningful sense, higher fluid intelligence.
Predictive processing: the brain as a prediction machine
Karl Friston’s free energy principle and the predictive processing framework take the computational view in a different direction. Under this framework, the brain is fundamentally a prediction machine. It maintains a generative model of the world and constantly generates predictions about incoming sensory data. Intelligence, in this view, is the quality of the generative model.
The key ideas:
Prediction error minimization. The brain tries to minimize the difference between its predictions and actual sensory input. It can do this in two ways: update its model (perception/learning) or act on the world to make its predictions come true (active inference). Both count as intelligence.
Hierarchical generative models. The brain’s model of the world is hierarchical. Lower levels predict raw sensory features. Higher levels predict abstract patterns and regularities. The depth and accuracy of this hierarchy determine cognitive capability.
Surprise minimization. Organisms that survive are, by definition, ones that aren’t often surprised by their environment. They’ve built models good enough to predict the states they’ll encounter. This connects intelligence to evolutionary fitness in a formal way.
What makes predictive processing interesting for intelligence theory is that it provides a unified account. Fluid intelligence is the ability to rapidly build new predictive models for novel situations. Crystallized intelligence is the accumulated library of predictive models for familiar domains. Attention is the precision-weighting of prediction errors. Learning is model updating. Even creativity can be framed as generating predictions from unusual model combinations.
Friston’s framework has been applied to everything from visual perception to motor control to psychiatric disorders (schizophrenia as pathological prediction error, anxiety as chronic uncertainty about predicted states). It’s ambitious, maybe too ambitious, but it’s the closest thing we have to a unified theory of cognition.
What building AI reveals about intelligence
Here’s where I think the conversation gets genuinely interesting. A century of theorizing about intelligence was done from the outside: observe people, give them tests, run factor analyses, debate. Now we’re building systems that exhibit intelligent behavior, and the process of building them reveals things that pure observation couldn’t.
Lesson 1: Generalization is the hard problem.
Every AI researcher knows this viscerally. You can build a system that performs spectacularly on its training distribution and collapses on anything slightly different. GPT-4 can write legal briefs but struggles with novel spatial reasoning tasks that any five-year-old handles easily. This maps directly onto the fluid vs. crystallized distinction: the model has massive crystallized intelligence (absorbed from training data) but its fluid intelligence (genuine novel reasoning) is narrower than it appears.
The positive manifold that Spearman found, the fact that cognitive abilities correlate, might simply reflect the fact that biological brains have genuinely general learning algorithms that transfer across domains. Current AI architectures are getting closer to this (foundation models transfer well across tasks) but still fall short of human-level generalization.
Lesson 2: Scale reveals and hides different things.
One of the most surprising findings from the scaling era of AI is that capabilities emerge discontinuously. A model at 10 billion parameters can’t do multi-step arithmetic. The same architecture at 100 billion parameters can. A model at 70 billion parameters can’t reliably do chain-of-thought reasoning. At 540 billion, it can.
This is reminiscent of Spearman’s law of diminishing returns, which states that g explains less variance among high-ability individuals. At lower ability levels, everything correlates because you need a minimum threshold of general capability for any specific ability to manifest. Above that threshold, specific abilities start to diverge. Similarly, at smaller scales, AI models are uniformly limited. At larger scales, you start to see differentiation between capabilities.
Lesson 3: Gardner was wrong about independence but right about diversity.
When you actually build intelligent systems, you discover that Gardner’s “multiple intelligences” aren’t independent. They share infrastructure. A model that’s good at language is usually decent at reasoning. A model good at spatial understanding tends to be better at analogy. The positive correlations are real, both in humans and in AI.
But Gardner’s broader point, that intelligence manifests in qualitatively different ways depending on the domain, is also vindicated by AI research. The same model weights handle language, vision, and reasoning differently. The computational primitives are shared (attention, representation learning, pattern completion), but the way they combine is domain-dependent. This is closer to the CHC model: general intelligence at the top, broad abilities in the middle, narrow abilities at the bottom, all partially correlated.
Lesson 4: Practical intelligence might just be context compression.
Sternberg’s practical intelligence, the tacit knowledge that makes someone effective in real-world situations, might be the ability to compress contextual cues efficiently. Someone with high practical intelligence reads a room quickly, picks up unwritten rules, and adapts behavior accordingly. This is exactly what in-context learning does in language models: compress the context window to extract implicit patterns and adjust behavior.
If this mapping is right, then practical intelligence isn’t a separate type of intelligence. It’s g applied to social and situational contexts rather than abstract ones. The reason it doesn’t always correlate with IQ is that IQ tests present decontextualized problems, while practical intelligence is all about context. The underlying mechanism might be the same. The input distribution is different.
Lesson 5: The binding problem is still unsolved.
Here’s something that building AI makes vivid. Each component of intelligence, pattern recognition, memory retrieval, analogical reasoning, planning, can be implemented separately and can work well in isolation. The hard part is binding them together into a unified cognitive agent that deploys the right capability at the right time.
This is the orchestration problem I encountered building my multi-agent system. Seven specialized agents, each competent in its domain, are less useful than you’d expect without a coordination protocol that tells them when to activate, what information to share, and how to resolve conflicts.
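The "when to activate" part of such a protocol can be sketched as a simple skill-matching router. Everything here is invented for illustration (agent names, skills, the tie-breaking rule); a production coordinator would also handle information sharing and conflict resolution, which this toy ignores.

```python
# Toy coordination sketch: route a task to the specialist whose declared
# skills best overlap the task's tags, instead of broadcasting to everyone.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    skills: set[str]
    handle: Callable[[str], str]

def route(task: str, tags: set[str], agents: list[Agent]) -> str:
    # Pick the agent with the largest skill overlap; ties go to the first.
    best = max(agents, key=lambda a: len(a.skills & tags))
    if not best.skills & tags:
        return f"no specialist for {sorted(tags)}"
    return best.handle(task)

agents = [
    Agent("math", {"algebra", "numerics"}, lambda t: f"math agent solved: {t}"),
    Agent("search", {"retrieval", "multilingual"}, lambda t: f"search agent found: {t}"),
    Agent("critic", {"contradiction", "audit"}, lambda t: f"critic reviewed: {t}"),
]

print(route("check stability of this equation", {"numerics"}, agents))
print(route("find prior work on g loadings", {"retrieval"}, agents))
```

Even this crude dispatcher illustrates the point: the intelligence of the ensemble lives as much in the routing as in the specialists.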
The brain solves this binding problem (mostly) through mechanisms we don’t fully understand. Theories range from synchronized neural oscillations to thalamic gating to attentional selection. Whatever the mechanism, it’s what makes intelligence feel unified even when the underlying components are modular.
Which theory wins?
None of them. All of them. It depends what you’re trying to do.
If you want to predict performance across a range of cognitive tasks, the CHC model with g at the top is your best tool. It’s empirically validated, psychometrically robust, and practically useful. IQ tests built on this framework predict educational and occupational outcomes better than almost any other single measure.
If you want to understand the mechanism of intelligence, predictive processing and computational theories are more promising. They tell you what intelligence is doing (compressing, predicting, minimizing surprise) rather than just measuring how much of it someone has.
If you want to build intelligence, you’ll end up rediscovering most of these theories from the engineering side. You’ll build systems with fluid capability that gets invested into crystallized knowledge (Cattell). You’ll notice positive correlations between capabilities because your systems share infrastructure (Spearman). You’ll find that specialized components need general coordination to produce intelligent behavior (the binding problem). You’ll discover that practical effectiveness requires context-sensitivity beyond raw reasoning ability (Sternberg).
If you want to educate people, Sternberg’s emphasis on teaching for analytical, creative, and practical thinking simultaneously has genuine empirical support. Gardner’s specific theory doesn’t, but his intuition that people have different cognitive profiles and that education should develop multiple capacities is reasonable, as long as you don’t take it to the extreme of “learning styles” instruction.
The uncomfortable convergence
Here’s what I find most striking about the current state of intelligence research. The psychometric tradition, the cognitive science tradition, the computational tradition, and the AI engineering tradition are all converging on a similar picture:
Intelligence is the ability to build and deploy compressed models of the world that generalize across contexts. It has a general component (the quality of the compression/prediction machinery) and specialized components (domain-specific models and skills). The general component explains why abilities correlate. The specialized components explain why people (and AI systems) have profiles of strengths and weaknesses.
The general component can be invested in building specialized components (Cattell’s investment theory, pre-training followed by fine-tuning). The quality of the general component declines with biological aging but can be partially offset by accumulated specialized components (fluid intelligence declines, crystallized intelligence accumulates). The whole system is unified by something we don’t fully understand but which functions as a coordination mechanism that deploys the right capabilities at the right time.
Spearman found the statistical signature of this in 1904. Cattell split it into its developmental dynamics in the 1960s. Friston formalized the computational mechanism in the 2000s. And AI engineers are rebuilding it from scratch right now, discovering the same principles through the process of construction rather than observation.
That convergence from completely independent lines of research, using completely different methods, is probably telling us something true about the nature of intelligence itself.
The question we’re left with isn’t “which theory of intelligence is right?” It’s “what does it mean that all these theories, despite their disagreements, are pointing at the same underlying structure?” And that question, I suspect, will only become answerable once we’ve built systems intelligent enough to help us investigate it. There’s a nice circularity to that.