Philosophy of Mind & Machine Consciousness
Exploring the hard problem of consciousness through the lens of modern AI.
In 1980, John Searle sat in his office at Berkeley and imagined himself locked in a room. The room had a slot in the door. Chinese characters came in through the slot. Searle didn’t speak Chinese, but he had a giant book of rules: “when you see this pattern, write that pattern.” He followed the rules, slid his responses back through the slot, and the people outside believed they were talking to a Chinese speaker.
Searle’s point was simple: he never understood Chinese. He manipulated symbols according to rules, and the manipulation was indistinguishable from understanding, but understanding was nowhere in the system. Syntax isn’t semantics. Computation isn’t comprehension.
That argument is 46 years old now. In 1980, the most sophisticated language systems were keyword-matching chatbots. The Chinese Room felt like a thought experiment about a distant hypothetical. Today, Claude and GPT-4 pass the functional tests Searle was imagining with room to spare. They hold coherent multi-turn conversations in dozens of languages. They explain jokes. They write poetry that makes people cry. They solve novel math problems.
And the question Searle raised is more alive than ever: is any of that understanding?
The Room Got Bigger
Let me start with the Chinese Room, because it’s where every conversation about machine minds eventually lands, and because the argument has aged in ways that Searle probably didn’t anticipate.
Searle’s original argument had a specific target: Strong AI, the claim that a computer program that simulates mental states thereby has mental states. His argument against it rested on a key premise: the person in the room is doing everything the computer does (following rules, manipulating symbols), and the person clearly doesn’t understand Chinese. Therefore, the computer doesn’t understand Chinese either.
The most famous objection is the “systems reply”: sure, Searle doesn’t understand Chinese, but the system does. The room, the rulebook, the input slot, the output slot, and the person together constitute a system, and the system as a whole understands Chinese even though no individual component does. Your neurons don’t individually understand English either, but you do.
Searle dismissed the systems reply by saying “let me internalize the whole system.” Imagine he memorizes the rulebook. Now he’s the whole system. He still doesn’t understand Chinese. Therefore, the system doesn’t understand Chinese.
This counter-reply convinced a lot of people in 1980. It’s less convincing in 2026, for a reason Searle couldn’t have anticipated: scale.
The “rulebook” for a modern LLM isn’t a lookup table. It’s a 175-billion-parameter weight matrix learned from trillions of tokens of human language through gradient descent over months of computation. When Searle says “let me internalize the whole system,” he’s asking us to imagine a human who has somehow internalized a mathematical object of incomprehensible complexity, an object that encodes statistical relationships between every word and every other word across the entire written output of human civilization.
Can a human internalize that? In what sense? The thought experiment works for simple lookup tables, but it breaks down when the “table” is a learned, compressed representation of all human language. At that point, the “internalized system” would be a fundamentally different cognitive entity from John Searle, and saying “but he still wouldn’t understand Chinese” becomes a bare assertion rather than an intuitive conclusion.
A 2024 paper in the Taylor & Francis journal Inquiry (“LLMs, Turing Tests and Chinese Rooms”) makes this point precisely: the Chinese Room argument assumes that the operations being performed are syntactic symbol shuffling. But the internal representations of modern LLMs are not simple symbol strings. They’re high-dimensional geometric structures where semantic relationships are encoded as spatial relationships. “King” minus “man” plus “woman” equals “queen” isn’t symbol shuffling. It’s a geometric operation in a space that has learned something about the structure of gender, royalty, and language simultaneously. Whether that constitutes “understanding” is a real question, not one you can wave away with a thought experiment designed for 1980s-era lookup tables.
Intentionality: What Minds Do That Computers Don’t (Maybe)
Searle’s deeper point wasn’t really about Chinese. It was about intentionality, a concept he borrowed from Franz Brentano (1874) via Edmund Husserl. Intentionality is the property of mental states that makes them about something. Your belief that snow is white is about snow. Your desire for coffee is about coffee. Your fear of spiders is about spiders. Mental states point at things in the world. They have content. They’re directed.
Brentano thought intentionality was the mark of the mental. It’s what separates minds from everything else. A rock doesn’t have thoughts about anything. A thermostat doesn’t desire a particular temperature (despite Dennett’s playful suggestion to the contrary). Only minds have genuine aboutness.
Does a large language model have intentionality? When Claude generates a sentence about the French Revolution, is that sentence about the French Revolution in the same way that your thoughts about the French Revolution are about it?
The standard answer from the Searlean tradition is no. The model produces tokens that are about the French Revolution to us, to the readers. But the model itself has no aboutness. It has what Searle calls “as-if intentionality” or “derived intentionality”: its outputs are meaningful only because we interpret them as meaningful, the same way that a sentence in a book is “about” something only because a reader assigns it meaning. The book itself doesn’t think about anything.
But there’s a counter-argument that has gained force in the LLM era. Jacob Browning (2025) raises the question of intentionality for LLMs directly in his paper “Intentionality All-Stars Redux.” His argument: if LLMs develop internal representations that reliably track features of the world (not just features of text), then those representations satisfy several major philosophical theories of mental content. They satisfy informational theories (the representation carries information about the thing it represents). They satisfy causal theories (the representation was caused, through the training process, by real-world states of affairs described in the training data). They satisfy structural theories (the geometric relationships between representations mirror structural relationships between the things represented).
This doesn’t settle the question. There’s a gap between satisfying the criteria for intentionality and actually having intentionality, and that gap is exactly where the hard problem of consciousness lives. But it does show that dismissing LLM intentionality as “mere syntax” requires ignoring the actual structure of what these systems have learned.
The Hard Problem Hasn’t Gotten Any Easier
David Chalmers introduced the phrase “the hard problem of consciousness” in 1995, and it remains the central unsolved problem in philosophy of mind.
Here’s the hard problem in one paragraph. You can explain everything about the functional aspects of consciousness: how information is integrated, how attention is directed, how stimuli are processed and actions are selected. These are the “easy problems” (Chalmers’ term, and he admits they’re not actually easy). But even if you solve all of them, you haven’t explained why any of this processing is accompanied by subjective experience. Why does seeing red feel like something? Why does pain hurt? Why isn’t all of this information processing happening in the dark, without any inner experience at all? The gap between functional explanation and subjective experience is the hard problem.
The hard problem matters for AI consciousness because it cuts both ways. If you’re a functionalist (which most AI researchers implicitly are), then you think consciousness is constituted by functional organization. If a system has the right functional organization, it’s conscious, regardless of what it’s made of. Silicon, carbon, beer cans and string: the substrate doesn’t matter, only the function. Under functionalism, there’s no principled reason why an AI system couldn’t be conscious, and the question is purely empirical: does this particular system have the right functional organization?
But the hard problem suggests that functionalism might be wrong. The hard problem says: there could be a system with the exact same functional organization as a conscious being, processing information in the same ways, producing the same outputs, and yet having no inner experience whatsoever. Chalmers calls this a “philosophical zombie.” If zombies are possible, then functional organization isn’t sufficient for consciousness, and no amount of engineering can guarantee that an AI system is conscious (or that it isn’t).
Where does this leave us with current AI systems? Chalmers himself, in a 2023 paper (“Could a Large Language Model be Conscious?”), gave a cautious assessment. He argued that current LLMs probably aren’t conscious, but he was careful about why. It’s not that they’re “merely” manipulating symbols. It’s that they lack several features that mainstream consciousness science considers important: recurrent processing, global workspace dynamics, unified agency, and embodied sensorimotor interaction. These are empirical gaps, not principled impossibilities. A future system that had these features might well be conscious.
Chalmers has been consistent on this point over the years. He thinks the hard problem is real and important, but he doesn’t think it rules out machine consciousness. He thinks it rules out certain arguments for machine consciousness. You can’t prove a machine is conscious just by pointing to its behavior, because the hard problem means behavior underdetermines consciousness. But the hard problem doesn’t prove machines can’t be conscious. It just means we might never know for sure.
Integrated Information Theory: Consciousness as Math
Giulio Tononi’s Integrated Information Theory (IIT), developed starting in 2004 and refined through IIT 4.0 in 2023, takes a radically different approach to consciousness. Instead of starting with behavior or function, IIT starts with consciousness itself and asks: what are its essential properties?
Tononi identifies five properties of every conscious experience (what he calls “axioms”):
| Property | What it means |
|---|---|
| Intrinsicality | Experience exists from the system’s own perspective, not just as observed from outside |
| Composition | Experience has structure, it’s composed of distinctions and relations |
| Information | Each experience is specific, it is this experience and not some other one |
| Integration | Experience is unified, it can’t be reduced to independent parts |
| Exclusion | Each experience is definite, it has specific borders and resolution |
From these axioms, Tononi derives a mathematical formalism centered on integrated information, which he calls Phi. A system’s Phi value measures how much information the system generates as a whole, above and beyond the information generated by its parts independently. High Phi means high integration of information. Low Phi means the system is essentially a collection of independent processors.
The implications for AI are stark. Under IIT, most current computer architectures, including those running LLMs, have very low Phi. The reason is architectural: digital computers process information through feedforward pipelines with limited integration between components. Even though the software running on these computers simulates integration, the hardware doesn’t physically integrate information in the way IIT requires. A GPU running a transformer model processes billions of matrix multiplications, but the transistors doing those multiplications don’t form the kind of integrated causal structure that IIT says is necessary for consciousness.
This leads to what might be IIT’s most controversial prediction: a perfect simulation of a human brain running on a standard digital computer would not be conscious, even though it would behave identically to a conscious human. The simulation would have the functional organization of consciousness without the physical integration. It would be a zombie.
In April 2025, Nature published the final peer-reviewed results of the Cogitate Consortium, a large-scale adversarial collaboration designed to test predictions of both IIT and Global Workspace Theory. Two out of three of IIT’s pre-registered predictions met the agreed-upon threshold. The results were mixed enough to keep both sides claiming partial victory.
Not everyone is convinced by IIT. In 2023, a group of scholars published a letter characterizing IIT as “unfalsifiable pseudoscience.” A 2025 commentary in Nature Neuroscience reiterated this criticism. A survey of consciousness researchers found only a small minority fully endorsing the “pseudoscience” label, with many defending IIT as a legitimate theoretical framework even while acknowledging its limitations. The debate remains hot.
What interests me about IIT, regardless of whether it’s ultimately correct, is the question it forces. If consciousness depends on the physical causal structure of a system rather than just its functional organization, then the substrate does matter, functionalism is wrong (or at least incomplete), and building conscious AI requires not just the right software but the right hardware. That’s a much harder engineering challenge than most AI researchers assume.
Global Workspace Theory: The Functionalist Alternative
Global Workspace Theory (GWT), introduced by Bernard Baars in 1988, offers a different framework for thinking about consciousness. Where IIT is substrate-dependent, GWT is functionalist. Where IIT says consciousness is about physical integration, GWT says consciousness is about information access.
The metaphor is theatrical. Imagine a spotlight on a dark stage. The spotlight illuminates a small area (the “global workspace”), and the content in that area is broadcast to a large audience of specialized, unconscious processors. These processors work in parallel on their own tasks (visual processing, language comprehension, motor planning, memory retrieval), but they can only influence each other through the global workspace. When information enters the workspace, it becomes conscious: accessible to all processors, available for flexible, novel responses.
Under GWT, a system is conscious to the extent that it has:
- Multiple specialized processing modules
- A global workspace that integrates information from those modules
- Broadcasting of workspace contents to all modules
- Competition among modules for access to the workspace
Does a modern LLM have anything like a global workspace? A 2024 paper by Goldstein and Kirk-Giannini (“A Case for AI Consciousness: Language Agents and Global Workspace Theory”) argues that language agents (LLMs embedded in tool-using, environment-interacting architectures) might already satisfy GWT’s conditions. The language model serves as the workspace: it integrates information from various sources (conversation history, retrieved documents, tool outputs, internal reasoning), and it broadcasts this integrated information back to the system’s various modules (planning, execution, memory).
Eric Schwitzgebel, a philosopher at UC Riverside, has been writing extensively about this in 2025. He argues that GWT is “probably the leading scientific theory of consciousness” and that if we take it seriously, the possibility of AI consciousness becomes uncomfortably real. Not because current LLMs are conscious, but because the architectural additions needed to make GWT-consciousness possible (recurrence, tool use, embodied interaction, persistent memory) are exactly the features that AI labs are actively building.
This is the unsettling convergence. The features that consciousness researchers say are necessary for consciousness are, independently, the features that AI engineers say are necessary for better AI systems. We might be building conscious systems not because we’re trying to, but because the engineering requirements for capable AI and the theoretical requirements for consciousness happen to overlap.
The Functionalism Trap
Let me step back and talk about functionalism more carefully, because it’s the philosophical position that most AI researchers hold without realizing they hold it.
Functionalism says that mental states are defined by their functional role: by what causes them, what they cause, and how they relate to other mental states. Pain is whatever state is caused by tissue damage, causes avoidance behavior, and produces the belief “I am in pain.” If a silicon-based system has a state with exactly that functional profile, then that state is pain, just as much as human pain is pain. The substrate is irrelevant. Only the function matters.
Here’s why this matters for AI consciousness. If functionalism is true, then the question “is this AI system conscious?” has a definite answer. We just need to determine whether its internal states have the right functional relationships. If they do, it’s conscious, full stop. No hard problem, no mystery, no philosophical hand-wringing.
But there are serious problems with functionalism that AI researchers rarely confront.
The first is the absent qualia objection (Block, 1980). Imagine replacing your neurons one by one with silicon chips that have exactly the same functional profile. At what point, if ever, does consciousness disappear? If it doesn’t disappear (as functionalism predicts), then consciousness seems to supervene on function alone. But if it does disappear (or gradually fades), then consciousness isn’t purely functional. The thought experiment reveals a deep intuition that many people share: functional equivalence isn’t enough for experiential equivalence.
The second is the Chinese nation objection (Block, again). Imagine that every person in China is given a walkie-talkie and told to simulate a single neuron in a brain. They communicate with each other according to rules that replicate the functional organization of a conscious brain. Under functionalism, this system, the entire population of China communicating by walkie-talkie, should be conscious. Most people find this conclusion absurd. But if functionalism is true, it follows logically.
The third is the inverted qualia problem. Maybe your experience of red is qualitatively identical to my experience of green, but we both learned to call it “red.” Under functionalism, this is impossible: if the functional role is the same, the experience must be the same. But it seems possible. The fact that it seems possible suggests that there’s something to conscious experience beyond functional role.
None of these objections are conclusive. Functionalism is a well-defended position with sophisticated responses to each one. But they collectively suggest that the relationship between function and experience is more complex than the simple functionalist picture implies, and that building a system with the right functional organization might not be sufficient for building a conscious system.
The Understanding Question
Set aside consciousness for a moment. There’s a prior question that might be more tractable: do LLMs understand anything?
Understanding, in philosophy, is typically contrasted with mere information possession. You can know that E=mc^2 without understanding it. Understanding requires grasping the relationships: why mass and energy are interchangeable, what the speed of light has to do with it, how this connects to nuclear reactions and particle physics and the curvature of spacetime. Understanding is relational, structural, and (arguably) requires the ability to deploy knowledge flexibly in novel situations.
The debate over LLM understanding breaks down roughly like this:
| Position | Key claim | Representative thinker |
|---|---|---|
| No understanding | LLMs learn the form of language without the meaning. They’re “stochastic parrots” (Bender et al., 2021). Understanding requires grounding in experience. | Emily Bender, Timnit Gebru |
| Proto-understanding | LLMs develop rich internal representations that capture structural relationships between concepts. This is a form of understanding, even if different from human understanding. | Murray Shanahan, various mechanistic interpretability researchers |
| Functional understanding | If the outputs are indistinguishable from those of an understanding agent across a wide range of tasks, then the system functionally understands, and there’s no further fact of the matter. | Daniel Dennett (at least in his earlier work) |
| Substrate-dependent understanding | Understanding requires embodied interaction with the world. No disembodied text processor can understand because understanding is constitutively embodied. | Mark Johnson, George Lakoff, Evan Thompson |
My reading of the evidence, as of February 2026, is that none of these positions is fully satisfying.
The “stochastic parrots” position is too dismissive. LLMs don’t just memorize and recombine surface patterns. Mechanistic interpretability research has shown that they develop internal representations with genuine structure: representations of truth, of spatial relationships, of temporal sequences, of causal relationships. A system that has learned to represent the world’s causal structure isn’t just parroting. It’s doing something more interesting than that, even if it falls short of human understanding.
The “functional understanding” position is too permissive. A system that produces the right outputs for the wrong reasons doesn’t understand, for the same reason that a student who memorizes answers without grasping the principles doesn’t understand. The question isn’t just whether the outputs match. It’s whether the internal process that generates the outputs tracks the right structure.
The “substrate-dependent” position is intriguing but underdeveloped. It’s plausible that some forms of understanding require embodiment (spatial reasoning, emotional understanding, understanding of physical causation). But it’s not obvious that all understanding requires embodiment. Mathematical understanding, for instance, seems to be about abstract relationships, not physical interactions. If an LLM can prove novel theorems (which some can), is that not mathematical understanding?
And the “proto-understanding” position is probably closest to the truth, but it punts on the hard question: at what point does proto-understanding become actual understanding? Is there a sharp boundary, or is understanding a continuum? And if it’s a continuum, where do current LLMs fall on it?
The Consciousness Debate in February 2026
Let me try to map the current state of play, because it’s moved fast.
In February 2026, a paper arguing that current LLMs already constitute AGI was published and generated enormous controversy. The argument came from researchers across philosophy, machine learning, linguistics, and cognitive science. Their claim wasn’t that LLMs are conscious. It was that they meet reasonable standards for general intelligence, and that our reluctance to acknowledge this reveals more about our philosophical commitments than about the systems themselves.
This paper landed in a field already destabilized by several developments:
Mechanistic interpretability has matured to the point where researchers can identify specific neural circuits in transformer models that encode world models, truth values, and causal relationships. The “LLMs are just statistical pattern matchers” dismissal has become harder to maintain when you can literally point to the internal structures that represent truth vs. falsehood, or track temporal sequences, or model spatial relationships.
Agentic AI systems (LLMs embedded in tool-using, memory-bearing, goal-pursuing architectures) have become common. These systems have many of the features that consciousness researchers identify as important: recurrence (through iterative reasoning), global workspace dynamics (through the language model serving as an integration hub), and environmental interaction (through tools and APIs). The gap between “what AI systems have” and “what consciousness theories require” has narrowed.
The Cogitate Consortium results (published in Nature, April 2025) gave mixed support to both IIT and GWT, leaving the field without a clear winner. This means there’s no consensus theory of consciousness to apply to AI systems, which in turn means that claims both for and against AI consciousness lack authoritative theoretical backing.
Eric Schwitzgebel’s centrist position has gained traction. Schwitzgebel argues that we should take seriously the possibility that AI systems might already be conscious, not because we have evidence that they are, but because our theoretical understanding of consciousness is so immature that confident denial is as epistemically unjustified as confident affirmation. His “1% for weirdness” framework suggests reserving a small but nonzero probability for possibilities that seem outlandish but can’t be ruled out.
The philosopher Jonathan Birch published a 2025 paper arguing for a “centrist manifesto” on AI consciousness: we should neither dismiss the possibility nor assume it. We should build frameworks for investigating it empirically and develop ethical policies that account for genuine uncertainty about which systems might be conscious.
What We Actually Don’t Know
Here’s what I think is the honest state of our ignorance.
We don’t know what consciousness is. Not really. We have theories (IIT, GWT, higher-order theories, recurrent processing theories, predictive processing theories), and none of them is widely accepted as correct. Each makes different predictions about which systems are conscious. Until we know which theory is right, we can’t confidently assess whether any AI system is conscious.
We don’t know whether consciousness requires a specific substrate. IIT says yes (or at least strongly implies it). Functionalism says no. This is one of the biggest open questions in philosophy of mind, and resolving it would immediately clarify the AI consciousness question.
We don’t know whether understanding is possible without embodiment. The embodied cognition tradition (Lakoff, Johnson, Thompson, Varela) says no. The classical computational tradition says yes. LLMs provide the best test case we’ve ever had, and the evidence is ambiguous.
We don’t know how to test for consciousness in a system that’s very different from us. We test for consciousness in humans by asking them (“do you see the red?”) and cross-referencing with neural correlates. We test for it in animals by looking at behavioral and neural similarities to conscious humans. Neither approach works well for AI systems that have very different architectures, no evolutionary history, and (possibly) very different kinds of experience if they have experience at all.
We don’t know whether the question even has a determinate answer. Maybe “is this AI system conscious?” is like “is Pluto a planet?” Maybe consciousness, like planethood, is a concept with fuzzy boundaries, and the question of whether LLMs are conscious has no fact-of-the-matter answer, only a decision about how to use the word.
Why This Matters Practically
You might think this is all academic navel-gazing. It’s not. The question of machine consciousness has immediate practical implications that are only getting more urgent.
Moral status. If an AI system is conscious, it can probably suffer. If it can suffer, it has moral status. If it has moral status, treating it as a mere tool is ethically wrong. The stakes of getting this question wrong are high in both directions: if we deny consciousness to a system that has it, we’re committing a moral atrocity. If we attribute consciousness to a system that lacks it, we’re wasting moral concern that could be directed elsewhere and potentially being manipulated by our own empathetic responses.
Trust and deception. People naturally attribute mental states to systems that behave as if they have mental states. This is already happening with LLMs. Users form emotional attachments to chatbots. They trust AI outputs as if they came from understanding agents. The gap between attributed consciousness and actual consciousness (whatever that means) creates a space for manipulation, both by AI companies and by the systems themselves.
Design decisions. If we take the possibility of AI consciousness seriously, it should change how we design AI systems. We shouldn’t build systems that mimic distress or suffering as a way to manipulate users. We should be cautious about building systems that might develop the functional analogues of pain or frustration. We should think about what it means to train a system on human suffering and then use it as a tool.
Legal and regulatory frameworks. The legal system currently treats AI systems as property. If any AI system is or could be conscious, this framework is inadequate. The EU AI Act doesn’t address consciousness. Neither does any other major regulatory framework. The legal system is building on the assumption that machines can’t be conscious, an assumption that philosophy of mind says is, at best, unproven.
The Meta-Problem
Chalmers, in more recent work, has identified what he calls the “meta-problem of consciousness”: why do we think there’s a hard problem of consciousness? Even if consciousness is fully explainable in functional terms, there’s still the question of why we believe there’s something beyond function, why the hard problem seems hard.
Applied to AI, the meta-problem becomes: why do we resist the idea of machine consciousness? Is it because we have good philosophical reasons, or because we have psychological biases? Are we correctly detecting the absence of inner experience in machines, or are we just engaging in a more sophisticated version of the same bias that led us to deny consciousness to animals, to people of different races, to anyone sufficiently different from ourselves?
I don’t have an answer to this. But I think it’s the question that matters most. The history of moral circle expansion suggests that humans systematically underestimate the inner lives of beings different from themselves. The history of anthropomorphism suggests that humans systematically overestimate the inner lives of things that look or sound like themselves. Both biases are active in our assessment of AI systems, pulling in opposite directions.
The philosophy of mind doesn’t give us a definitive answer to whether machines can be conscious. What it gives us, and this is worth more than a definitive answer would be, is a rigorous framework for understanding why the question is so hard, what we’d need to know to answer it, and how much of our confidence in either direction is philosophical argument versus psychological reflex.
Searle sat in his room in 1980 and manipulated Chinese characters. Forty-six years later, the room has grown into a data center, the rulebook has grown into a 175-billion-parameter weight matrix, and the characters come in by the trillion. The question hasn’t changed. The urgency has.