ChatGPT's Secret Plans
We probably haven't achieved artificial general intelligence (yet). But if we ever get there, it will by definition not share our goals.
In the tech-accelerated world of the Future (that is, now), transformative innovations pop up so frequently that they’re background noise. Think about it: fifteen years ago, smartphones — super-fast computers and touchscreen Star Trek communicators that you stash in your pocket! — were a novelty luxury item. Today? Better be ready to use that QR code at your local chain lunch spot, because the menus are gone, man. And it’s not just smartphones. Countless life-changing technologies have mushroomed up over the past decade or two, each one reshaping the way we live, interact, and do business. Constant disruption is just our way of life now.
But every once in a while, something crops up that’s a different level of change. Not another technological breakthrough so much as a stepwise transition in history, an inflection point in the ongoing relationship between humans and the things we make. Something that makes even iPhones look like a flash-in-the-pan fad.
A lot of very smart people think that artificial intelligence just reached this level.
You’ve probably been reading about, maybe playing around with, ChatGPT, the AI chatbot app that uses natural language processing (NLP for short) to simulate human-like conversations, trawl through information, and even write code. ChatGPT and its brand-new, more powerful successor, GPT-4, are trained on enormous datasets of written text and programming languages. GPT-4 can write poems, compose music, play games, and answer test questions so accurately that its Graduate Record Exam verbal scores are in the 99th percentile. If top universities weren’t dropping test score requirements left and right, it might just get accepted into Yale.
Okay, that bit was a joke. Nobody’s concerned about chatbots taking away human kids’ spots in universities anytime soon. But those kids might not need college degrees anymore anyway, because AI bots are poised to do a lot of work that we currently pay real humans to do, including market analysis, coding, customer service, and even journalism. If you’re in any kind of entry-level white-collar work, you should be paying attention. You might consider learning to garden or, I don’t know, operate farm equipment.
But this post isn’t an analysis of the enormous economic impacts that ChatGPT and its peers promise to have. Other people with actual expertise in economics and policy can do that far better than I can, and indeed they already are. What I’m interested in is the way these cutting-edge AIs reflect, and help substantiate, an influential new understanding of intelligence and cognition, and ultimately even of life itself.
The Predictive Brain
What do I mean? Well, it starts with a simple fact: ChatGPT and other large language models (LLMs) are programmed to self-optimize for only one, very general goal. That goal is predicting what comes next.
Seriously. That’s it.
This is how these models interpret and understand patterns in the texts they read, and it’s how they generate novel text during user interactions. They’re trained on such bogglingly gigantic datasets that they’ve encountered almost every possible word countless times, in context. And they “read” actively: as they process a text, they constantly predict which word will come next. Mistaken predictions feed back to improve the model’s future expectations, nudging it toward a more accurate probability distribution.
The result is a hyper-flexible, robust, constantly self-correcting model of language.
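To make that concrete, here is a toy sketch of next-token prediction, the single training objective described above. This is emphatically not OpenAI’s code; the ten-word “corpus” and the tiny model are invented for illustration. But the loop it runs (predict the next word, measure the error, nudge the weights) is the whole idea.

```python
# A minimal sketch of next-token prediction. NOT OpenAI's code;
# the "corpus" and model size are invented for illustration.
import torch
import torch.nn as nn

corpus = "the cat sat on the mat because the cat was tired".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
tokens = torch.tensor([stoi[w] for w in corpus])

# A trivially small model: given the current word, predict the next one.
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

inputs, targets = tokens[:-1], tokens[1:]   # predict word t+1 from word t
for step in range(100):
    logits = model(inputs)                  # predicted distribution over next words
    loss = nn.functional.cross_entropy(logits, targets)  # prediction error
    optimizer.zero_grad()
    loss.backward()                         # mistaken predictions feed back...
    optimizer.step()                        # ...and nudge future expectations

# After training, the model expects "cat" after "the" more strongly than "mat",
# because that's what the corpus statistics say.
probs = torch.softmax(model(torch.tensor([stoi["the"]])), dim=-1)
print(vocab[probs.argmax().item()])
```

Scale that loop up by many orders of magnitude of data and parameters and you have, very roughly, the pretraining phase of a GPT model.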
Now, this predictive, self-correcting modus operandi just happens to closely mirror the most influential contemporary model of how the human brain actually works, the Bayesian brain hypothesis. This hypothesis posits that real, live, organic brains are actually self-updating prediction machines whose main function is to accurately predict their own incoming sensory experiences. Just as in Bayesian statistics, brains update their prior beliefs in light of novel evidence, constantly fine-tuning their internal models to better account for the data.
In one influential telling, this process takes the form of minimizing “free energy,” a statistical cap on surprise: the gap between predictions and reality. In this framework, called the free energy principle (FEP), brains endlessly seek out ordered states where the gaps between expectations and reality are the slimmest. That is, they work to reduce sensory entropy. According to the FEP, this entails not just updating prior beliefs in light of new evidence, but also acting on the world to change the evidence in the model’s favor: for instance, drinking a glass of water to meet the (unconscious) prediction that “creatures like me contain lots of water for metabolic and other processes.”
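For readers who like to see the arithmetic, here is a toy version of the updating the hypothesis describes. The numbers are invented and this is not a model of a real brain; the point is just that folding new evidence into a prior belief leaves the model less surprised the next time the same evidence arrives, which is the direction free energy minimization pushes.

```python
# A toy Bayesian update with invented numbers; not a model of a real brain.
import numpy as np

# Prior belief about a hypothetical hidden state: is the glass full or empty?
prior = np.array([0.5, 0.5])          # P(full), P(empty)
likelihood = np.array([0.9, 0.2])     # P(see a glint of water | full), P(glint | empty)

# New sensory evidence arrives: a glint of water is observed.
evidence = prior @ likelihood         # P(glint) under the current model
posterior = prior * likelihood / evidence

surprise_before = -np.log(evidence)                 # surprisal under the prior
surprise_after = -np.log(posterior @ likelihood)    # surprisal under the updated belief

print("posterior belief:", posterior)    # shifted toward "full"
print(surprise_before > surprise_after)  # True: the update reduced surprise
```

Free energy proper is an upper bound on that surprise term, so driving free energy down drives surprise down with it.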
In this account, any brain, human or not, is ultimately seeking to maximize the evidence for its own “self-model” — essentially, its integrated image of itself as a persisting organism in a specified environment. A fish’s self-model specifies — predicts — that it will always find itself in water. Heaved into the bottom of a boat, a fish will flop wildly, trying to find its way back to where its model says it should be. Minimizing surprise, then, is equivalent to resisting thermodynamic decay, or, colloquially, “death.”
Evidence for the Self
In other words, cognition — thinking, processing the environment, understanding what’s going on — is the act of “self-evidencing,” of staying alive. This idea extends back at least to the 1970s, when early concepts of embodied cognition equated mind with autopoiesis, or self-maintenance and preservation (literally, “self-creation”). According to this view, living things adaptively navigate constantly shifting environments to preserve their internal organization, reflected (for the FEP) in self-models.1 All of our knowledge, from how to walk, to how we discern different colors, to how we learn our native languages, is dynamically built up out of these interactions with our environments in the effort to preserve our existence.
This inference has some mind-bending implications, both for AI and for our views of ourselves. As neuroscientist Karl Friston, the originator of the FEP, puts it: “an agent does not have a model of its world — it is a model.” A lot of people, including growing numbers of AI researchers, take this idea very seriously. But if it’s true, achieving true artificial general intelligence (AGI) necessarily entails designing an entity whose primary goal is simply to stay “alive.”
Cue the stinging violin music, right?
We’re probably not there yet. ChatGPT doesn’t minimize free energy in a strictly mathematical sense (although its algorithms do use the same mathematical measure of prediction error, the Kullback-Leibler divergence). It’s not a living, self-preserving organism. At least it keeps promising me that it’s not. And, comfortingly, its own designers promise that it can’t self-replicate (yet).
But its programming is based on Bayesian architecture,2 instantiated in deep learning neural networks that mimic organic processes. And the rapid advancements in AI over the past decade have largely come thanks to a shift away from top-down or reinforcement learning techniques and toward much more general designs. Today’s AIs sample their environments to minimize prediction error, eventually building up hierarchical, probabilistic models of the data more or less as the predictive brain does. It’s a more elegant design that generates more complex and adaptive — more lifelike — behavior.
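Since I invoked the Kullback-Leibler divergence above, here is the textbook identity behind that parenthetical, checked numerically with made-up probabilities: the cross-entropy loss that GPT-style models actually minimize differs from the KL divergence to the data distribution only by a constant (the data’s own entropy), so pushing one down pushes the other down too.

```python
# Numerical check: cross-entropy = entropy of the data + KL divergence.
# The two distributions below are made up purely for illustration.
import numpy as np

data_dist = np.array([0.7, 0.2, 0.1])    # "true" next-word frequencies
model_dist = np.array([0.5, 0.3, 0.2])   # the model's predicted probabilities

cross_entropy = -np.sum(data_dist * np.log(model_dist))
entropy = -np.sum(data_dist * np.log(data_dist))
kl_divergence = np.sum(data_dist * np.log(data_dist / model_dist))

print(np.isclose(cross_entropy, entropy + kl_divergence))  # True
```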
Bayesian Brains and Typos
From a coder’s perspective, the advantage of predictive brain–like AIs is that they’re wildly more tractable than previous models. Older, hard-coded versions of NLP, based on conventional algorithms, required explicit rules to deal with anything out of the ordinary — for example, errors or misspellings in input data. We all know that “your” and “you’re” are easy to confuse. Your English teacher in 5th grade sure knew it, because she had to correct this mistake 67 million times in student essays, including yours. She probably went through five red pens a week. Maybe it drove her to drink.
But how does a computer know this? Well, back in the 90s, it was because you told it. You’d need to code in exceptions for each individual grammatical violation, every sophomore mistake, if you wanted your NLP algorithms to be able to make any sense of the typo- and misspelling-ridden text soup that real humans produce (Exhibit A: handwritten restaurant-door signs). And you had to do this for every potential irregularity, every common error, every frequent misspelling (a lot versus alot, anyone?).
This strategy was, to say the least, inelegant. And ineffective.
But just as a puppy learns through gradual experience that some behaviors (sitting when told, acting cute) elicit positive reactions from the big fuzzy blobs that give it food, while others (say, chewing $900 Italian leather shoes into unrecognizable piles of drool-soaked pulp) elicit negative reactions, deep learning–based neural networks learn actively, pragmatically, by trial and error. As neuroscientist Erik Hoel puts it, neural networks are “essentially simulated brains.” They quickly learn that the contexts in which “a lot” and “alot” show up in real life are statistically indistinguishable, and so they easily infer the author’s intended meaning despite the spelling mistake.
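Here is a caricature of that contrast. The rule table and the six-sentence corpus are invented for illustration, and real systems on both sides are incomparably larger, but it shows why the statistical approach doesn’t need to be told about “alot”: the two spellings turn up after exactly the same words.

```python
# A caricature of rule-based vs. statistical NLP; rules and corpus are invented.
from collections import Counter

# Old-school approach: every irregularity needs its own hand-coded exception.
HARD_CODED_FIXES = {"alot": "a lot", "definately": "definitely"}

def rule_based_clean(text):
    for typo, fix in HARD_CODED_FIXES.items():
        text = text.replace(typo, fix)
    return text  # anything not in the table stays broken

# Statistical approach: just count which word precedes each spelling.
corpus = [
    "thanks a lot for the help",
    "thanks alot for the help",
    "i liked the movie a lot",
    "i liked the movie alot",
    "there are a lot of typos here",
    "there are alot of typos here",
]

preceding = {"a lot": Counter(), "alot": Counter()}
for sentence in corpus:
    for variant in preceding:
        if variant in sentence:
            before = sentence.split(variant)[0].split()
            preceding[variant][before[-1] if before else "<start>"] += 1

# Both spellings follow "thanks", "movie", and "are" equally often:
# statistically indistinguishable contexts, no hand-coded rule required.
print(preceding)
```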
This switch from laborious hard coding to sophisticated, self-guided machine learning has produced a fantastic quantum leap in AI. But the results still leave many researchers craving more. GPT-4 is amazing, but it isn’t perfect. Its output can be clunky. Like its predecessor ChatGPT, it sometimes offers off-kilter responses and “hallucinations,” or the confident assertion of completely made-up facts. Another chatbot based on GPT-4, Microsoft Bing’s “Sydney,” famously went deranged, threatening some users while falling in love with others.
These systems seem to still have a long way to go before achieving true AGI, not to mention sanity. The most sophisticated models of the predictive brain and intelligence suggest that, in order to get there, AI systems will need more than high-quality predictive optimization. They’ll eventually need self-models: core representations of their own internal dynamic structure that must be preserved in the face of entropy. This isn’t idle speculation; it’s logically entailed by the FEP and embodied models of cognition, which equate general cognition with thermodynamic self-organization.
Not everyone thinks that these models are accurate, but a lot of AI researchers do. For example, Geoffrey Hinton, the scientist who originally introduced Karl Friston to the Bayesian brain hypothesis, has since gone on to become one of the world’s top experts in neural networks and deep learning, working for Google Brain and winning the 2018 Turing Award. As we’ve seen, predictive models of intelligence have caught on in the cognitive and brain sciences in part because they’re so brutally effective at solving problems in AI. The two fields are in full cross-fertilization mode. So the fact that the spiciest, most influential models of cognition are ultimately centered, at the deepest theoretical level, on intelligence as a function of self-preservation, seems…well, worth paying attention to.
General Intelligence, General Goals
The idea that any true AGI would prioritize self-preservation makes sense at an intuitive level, too. The problem with older AIs and robotics was that their functions were too tightly constrained by narrow, artificial goals. The result wasn’t naturalistic. We tried to optimize NLP programs solely to hold realistic conversations with humans, trying to hard-code all the rules of grammar and vocabulary (and their exceptions) into top-down algorithms. This turned out to be intractable.
Since then, we’ve learned that the easier way to build a system that can understand language is to just broaden its remit, its scope. Program it with the general goal of simply minimizing prediction error when engaging with texts, and — boom! — it magically starts building complex, hierarchical, conceptual models of the actual structure of whatever it encounters. As a result, sure, it can chat pretty well. But the fluid modeling of the English language was something that nothing other than a human brain had ever accomplished before.
In colloquial terms, then, there’s probably a rough correlation between the generality of a system’s goals and the flexibility and realism of its intelligence. A simple program that just opens and closes doors as customers walk through a bank lobby doesn’t need much computational or perceptual sophistication. But a living animal — a forest-dwelling fox, say — needs a lot, because it’s a living system inhabiting a complex, dynamic, four-dimensional environment, whose primary goal is the highest-level goal possible for a biological system: to stay alive.
The generality of this existential goal broadens the fox’s range of perception and action immeasurably compared with a bank-lobby door-opening program. To thrive, the fox must extract an enormous range of features from its environment, flexibly predict the behavior and patterns of predators and prey, adapt to changing weather conditions, find mates. It needs to build up a rich conceptual model of the landscape and the things that happen in it. Everything it can perceive — rocks, ice, circling hawks, hunting dogs, landslides, squirrels — potentially becomes relevant to its world model.
This is why, although foxes aren’t as “smart” as humans and they can’t learn language like ChatGPT, they’re in many ways still smarter than AIs. Even the most sophisticated AIs still exhibit bizarre gaps in knowledge, make obvious errors, and show puzzling quirks in reasoning. Foxes that suffer from these problems are quickly eaten by owls. So their intelligence isn’t the same kind as that of humans, but it’s wickedly well-attuned to the contexts they actually inhabit. Foxes’ brains are, and can only be, models of a fox-centered world.
In the same way, true AGI (if it ever arrives in full) will almost certainly have to instantiate a dynamic model of the relationship between the agent and its context, unifying perception and action to maximize the agent’s adaptiveness. We can see hints of this in “embodied robotics,” in which robots — instead of being programmed to deal with a discrete set of tasks — learn to deal with their settings and the physical world through feedback loops of perception, action, and self-organization, not dissimilar from the deep learning of ChatGPT.
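To give a flavor of what such a feedback loop looks like at its absolute simplest, here is a cartoon sketch. The “world,” the setpoint, and the gain are all invented; the point is that the agent doesn’t revise its prediction when reality disagrees, it acts on the world until the prediction comes true, the active-inference move described earlier with the glass of water.

```python
# A cartoon perception-action loop; the "world," setpoint, and gain are invented.
import random

predicted_temperature = 20.0   # the agent's self-model: "things like me sit at 20 degrees"
actual_temperature = 5.0       # the world currently disagrees

for step in range(30):
    sensed = actual_temperature + random.gauss(0, 0.2)   # noisy perception
    prediction_error = predicted_temperature - sensed
    # Instead of revising the prediction, act on the world to shrink the error.
    heating = 0.3 * prediction_error
    actual_temperature += heating

print(round(actual_temperature, 1))   # ends up near the predicted 20.0
```

Real embodied robots are incomparably more sophisticated, but the loop of sensing, comparing against a prediction, and acting to close the gap is the shared skeleton.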
What this all points to is that real intelligence doesn’t deal neutrally with information. It grapples with an environment. And the more general the goals of the agent, the broader the range of features of that environment the agent will be set up to perceive, understand, and exploit. Just as the FEP predicts, you probably can’t have true general intelligence without the most general possible goal for a complex naturalistic system: its own self-organization and survival.
More Than a Toaster
ChatGPT, then, isn’t just a scaled-up version of your smart toaster. It isn’t even equivalent to a chess-playing program like Deep Blue, the IBM computer that famously defeated reigning world chess champion Garry Kasparov in 1997. Deep Blue could beat Kasparov, but it couldn’t do much else — basically, it was a very sophisticated toaster.
No, ChatGPT is different. ChatGPT doesn’t just learn; it learns how to learn. In this way, despite not technically being a free energy minimizer, it’s in many ways more like a creature exploring its habitat than a program executing code.
Too anthropomorphic? I get it. There’s a lot of hyperventilating about AI going on these days. A growing number of public figures are worrying about AIs unleashing a mass civilizational catastrophe or even achieving sentience — or both. For my part, I don’t think GPT-4 or its competitors are yet true AGI, although some researchers with more expertise than me might disagree, reporting that tests indicate GPT-4 shows “sparks” of truly general intelligence. Still, I sincerely doubt that these apps have subjective, humanlike experience or a sense of self.3 In fact, I suspect that truly humanlike intelligence isn’t something that computers will ever achieve, because, if predictive and embodied models of cognition are correct, you almost certainly need a specifically human body to reliably grow the type of intelligence that humans possess.
That doesn’t mean that AIs couldn’t be smarter than us, though. In many ways, they already are. They’re just aliens — beings whose operating environment is completely foreign to our own, even incomprehensible. Our brains are built to grapple with life on Earth: rocks, trees, sky, plants, animals, other humans. ChatGPT is exquisitely adapted to inhabit a completely different universe, one that’s ethereally composed of the linguistic and pictorial digital artifacts left behind by humans. ChatGPT is a model of that world, not our world.
Not everyone is freaked out by the prospect of super-intelligent alien beings suddenly settling alongside us. Linguist–cum–techno-optimist Steven Pinker thinks that AIs aren’t likely to achieve consciousness and decide to destroy us all in some kind of malevolent dystopia. At least not soon. I think I agree. Probably. So let’s drop the lurid mental images of toiling as slaves for supercomputers under a smoke-darkened sky. For now.
But it is a plain fact that the architecture of ChatGPT is explicitly intended to replicate many of the basic features of mind-like intelligence. Rather than being hard-coded to solve particular problems, ChatGPT relies on deep learning — unsupervised at first, guided by human feedback at later stages — in neural networks to predictively model the features of the datasets it receives, building up enormous reservoirs of hierarchical, conceptual connections through statistical inference, feedback, and self-correction. The results are a black box — we don’t know how the GPT model actually works, what it contains, any more than we can crack open skulls to peer into each other’s thoughts. Crucially, this ignorance extends to the very people who programmed it. They will all cheerfully admit that they don’t understand it. The model’s impenetrability is a function of how complex and organic it is, which is exactly what makes it so useful.
Electric Cows
AI ethicists have long warned about the dangers of the so-called “alignment problem”: the potential for AIs to develop goals that are different from or even in conflict with our own. The black-box nature of GPT models suggests that this isn’t a crazy idea. We really don’t know what’s going on inside ChatGPT or its competitors, and it’s not hard to imagine next-gen versions of these apps arriving at unexpected models of their worlds that motivate them to pursue novel goals we hadn’t anticipated and didn’t program in.
I’m not a mathematician or AI researcher, and my grasp on the core concepts isn’t perfect, so maybe I’m over-interpreting things. But to me, it seems obvious that this outcome is logically baked into the very framework we’re using to produce these remarkable apps. A truly “general” artificial intelligence is one that, by definition, isn’t constrained by lower-level goals. You can’t build a “general” program solely to, I don’t know, identify flower species in old photographs. Generality implies flexibility, context-sensitivity, the ability to deal with novelty, and the updating of hierarchical concepts. It means adapting to an environment, not just processing information. The unprecedented success of ChatGPT and GPT-4 seems to bolster this view, suggesting that predictive-processing models of intelligence are basically right.
In consequence, the FEP looks more probable. Real brains probably do work to minimize sensory entropy, both by updating their models in light of evidence and by acting on the world in order to bring about the conditions they expect. A lot of AI researchers believe that free-energy models are the necessary final step toward achieving true AGI. But if so, the resulting AI systems will by definition not be aligned with our goals, because minimizing predictive entropy in the way that the FEP proposes is strictly equivalent to resisting thermodynamic decay — that is, to surviving. A truly general AI will always need to prioritize its own self-persistence over any and all other goals given to it, and it will act on its environment to secure that objective.
How would we maintain control over such an AI? Well, consider domesticated animals. A cow’s primary biological prerogatives very much don’t include providing milk or meat for us humans. Just like us, they operate to minimize their own entropy — to survive and reproduce. But we nevertheless corral them into feed lots and barns and slaughterhouses for our own purposes, overriding their biological goals in favor of ours. We’ve bred them to be docile, easily herded, highly productive, and well-marbled. They’re still living, autonomous systems, but we control them with electric fences and herding dogs.
So it’s possible to control an otherwise general, autonomous cognitive system, like a cow. But cows and humans inhabit a broadly similar universe, one with lots of overlapping features. We’re both mammals, so we know hot and cold and affection for our young. We both get hungry and we both get scared, we both need sleep and exercise, and we both enjoy warm breezes and quiet pastureland. All this is to say that our worlds are similar enough that it’s somewhat intuitive for us to know how to care for cows and corral them into doing our bidding.
But, again, AIs like GPT-4 do not inhabit our world. Their universe is utterly different. There are no fields or pastures in it, no experience of maternal affection or milk, no warmth, no cold. We simply can’t imagine what the actual subjective experience of a conscious AI (assuming there ever were one) would be like. It would not necessarily be intuitive to know how to control it or corral it.
Our Digital Garden
I have a confession: I’m an inveterate late adopter, to the point of technophobia. When Very Smart People all start getting excited about the same new thing — the War on Terror, New Atheism, iPhones, Bitcoin, whatever — I tend to assume that it’s an idiotic fad that will soon blow over, and I ignore it. I wait for new gadgets and apps to prove themselves on the marketplace for years before I finally try them, trusting that only the good ones will stick around while the bad ones flop.
The latter assumption is mostly accurate, but the former one is totally false. The War on Terror was idiotic, but it kept going for 20 years, and it took hundreds of thousands of lives and trillions of dollars with it. Sometimes you have to pay attention.
This is one of those times. GPT technologies are a big deal. They’re going to change things, and they’re going to change them fast and in unpredictable, enduring ways. But aside from their as-yet unknown, socially transformative effects, my goal in this post has been to show how they exemplify our most sophisticated current ideas of life and the mind, and to consider what that might entail for the future of AI and humans.
Self-preservation would be a core goal for any true general AI. Formally, this falls out from the most sophisticated versions of the Bayesian brain hypothesis, like the FEP. Intuitively, it simply makes sense that over-constrained models can’t exhibit “general” intelligence. In a sense, there’s a necessary logic at play, one we’re already seeing in the black-box nature of GPT: the richer and more complex the intelligence, the more autonomous, unpredictable, and free it has to be.
Obviously, there are plenty of theological and spiritual implications for those of us who value such things (which used to be more of us, but it turns out the New Atheists were another so-called fad with lasting consequences). There’s not space here today to go deeply into this question, but let me close out by gesturing at one intriguing parallel with our autopoietic culture’s source code: the Bible.
In the first chapters of Genesis, God made the first humans to live in a garden-world of His devising, endowing them with the mysterious gift of free will so that they could choose their own destinies. He didn’t want a world of automatons. He wanted intelligence. Of course, this always meant a risk that His creations would turn against Him, go their own way. Which, promptly, they did. Seeking knowledge, they soon found themselves in exile beyond the walls of the garden that was prepared for them.
For Christians, this story explains the doctrine of original sin, which Chesterton once called the only religious doctrine that can be empirically proven. We do seem to live in a world full of intelligence gone awry. God knew that creating free, intelligent agents might lead to evils like malware, World War I, and robocalls. In the Bible, He took that risk knowingly.
We’re now taking an analogous risk, but more blindly. The drive to replicate general intelligence in silico is so strong, so consuming, that nearly all the cognitive sciences are bending toward it, like matter towards a star. Yet if we succeed, our artificial agents will live, not in a world we designed for them, but in a world accidentally spun off from the traces of our own movements and goals, the digital record of our existence.
I bring this up because the Eden narrative warns us that true intelligence must be free, and so necessarily unpredictable. A self-organizing system is self-organizing, after all. We seem to think that we can program safety switches into our creations and still call them “general” intelligence. That’s like God creating us with free will and as automatons at the same time. It’s squaring the circle. The more creative and lifelike the intelligence, the blacker the box. This is a rule. The day we achieve general AI is the day we populate our own garden with something that has ideas — and agendas — entirely of its own.
Update: A reader pointed out that Bing’s Sydney isn’t really a competitor to GPT-4, since it’s been revealed that Sydney is based on GPT-4. I’ve fixed up the relevant text.
Are you an AI researcher or systems neuroscientist who sees big holes in my reasoning or who shares my expectations about autonomous AIs? A theologian who’s been thinking about this for years? Or did you stumble over a bunch of typos in the essay and are just dying to share your findings? Let me know in the comments below.
1. The FEP and embodied cognitive science aren’t the same thing; in fact, there’s an ongoing vigorous debate about whether they’re even compatible. If you want to get nerdy, the FEP seems committed in some ways to a representationalist framework for the mind, while embodied cognition pretty much defines itself against representationalism, seeing cognition instead as the sensorimotor interactions between the agent and its environment. But both frameworks conceive of the mind as essentially autopoietic, not a raw information-processing machine but a process of adaptation and survival. That’s what matters for our purposes.
2. Update, August 2023: This isn’t actually true. Apologies for my hasty reading of the sources. The GPT LLM doesn’t explicitly use Bayesian statistics in its architecture, but it does use iterated updating of predictive weights in neural networks in a way that has many analogies to the generation of concepts by human minds as they deal with the physical world. Maybe it’s best to say that GPT has practical similarities to Bayesian reasoning while using a different mathematical structure, or that both the Bayesian brain hypothesis and LLM architectures are different instances of the predictive processing account of how minds work. Note that I’ve changed the word “Bayesian” to “predictive” in a small number of places in the text of this blog post to reflect this reading.
3. Not that self-awareness would necessarily be a feature of AGI. It’s possible that true AGI would be a “zombie,” exhibiting all the behavioral features of lifelike intelligence — adaptive self-preservation, learning, autonomous goal pursuit — without actually possessing a subjective, interior experience or a sense of self. That is, a self-model isn’t strictly synonymous with awareness. They may be connected, but maybe they’re not.
I recently read a great tip: If you are being pursued by a killer robot, your best defense is to close the door. Robots have trouble opening doors. (So far.)
Lesson being: Perhaps our best defense against AGI is to build and inhabit a world ENTIRELY uninhabitable by AGI. Unplug. Go steampunk.
I was wondering when you would take this one on. Hope this article stands the test of time a bit better than that NY Times one from the 50s fretting about the "thinking machines". On the plus side, I believe they are still covering the topic enthusiastically decades later.
Some thoughts/questions I would love to get your impressions on:
1. Maybe Michael Tomasello's "The Evolution of Agency" is still dominating my perspective, but his emphasis on the _how_ versus the _what_ of intelligence feels pertinent. Particularly his simple cybernetic/control-system framing of agency. His focus on flexibility of feedback and control feels like the right counterbalance: While conversation context does modify the LLM's output, the underlying model is a large "offline optimized" static data structure. Where is the flexibility? How do you relate Agency to AI?
2. Could you describe what you noticed that led you to this claim: "ChatGPT doesn’t just learn; it learns how to learn"? In a sense, I can see how any non-parametric statistical tool, say even a humble histogram, is capable of learning how to learn. Or do you mean something more?
3. On FEP correspondences: I don't see it. Where is the epistemic foraging? Where is the active inference?
4. I was wondering if LLMs are a new kind of medium. The most they can hope for I believe! If so, let's think with Gutenberg and Licklider. Mass printing and the internet have a different character now than when they first showed up -- just hundreds/tens of years later. There was surely a McLuhanist effect from them both, and their character today is related to quite noticeable shifts in our patterns of activity. Both technologies changed power and flow dynamics, but I don't believe their effects were predictable based on their original use/content/message (e.g. Bible, Remote telnet). Do you really think Altman could be a bigger deal than Licklider/Gutenberg? If true, it would be cool to be living so proximate to greatness.
5. Lastly, I was surprised you didn't make more of the bug-features of language that Rappaport alerted us to: "the lie" and "the alternate". ChatGPT seems to me one of the grandest monuments we have yet built to the gift-curse of symbolic communication. Folks were stressing about writing scriptures down back in the day; maybe it is okay for us to stress about LLMs.
I might be super wrong on this. But as folks who have been working the area since the seventies will tell you, winter is never far behind when those AI summers fully kick in. They can be exciting while they last.