100 years of artificial intelligence: from Enigma to Claude Code

September 2012. In a bedroom at his parents' house, twenty-six-year-old Alex Krizhevsky is teaching a program to recognize photographs. His entire setup is two gaming graphics cards. To him it's just a side project — friends talked him into entering something in a competition. He doesn't yet know that he's about to lay the foundation for one of the most important inventions in history.

I'll walk you to that bedroom — and to what came after. This is a hundred-year story of how machines learned to think: from breaking codes in wartime, through two winters in which the whole field nearly died, to a tool that today writes software for people with no programming background.

War and the first thinking machine

1939. Britain is losing the Battle of the Atlantic. German submarines are sinking ships faster than the Allies can replace them. The Allies, however, get hold of information about a German cipher machine that relays military orders and tactical instructions. It's called Enigma. If they can break its cipher, the Allies will know the positions of the U-boats and the plans of attack.

But the Allies aren't starting from scratch. Enigma had already fallen back in 1932 — broken by a team at Poland's Cipher Bureau: the mathematicians Marian Rejewski, Jerzy Różycki, and Henryk Zygalski. Drawing on permutation theory, Rejewski reconstructed the machine's internal wiring without ever having one in his hands; the team built its own code-breaking tools — Zygalski sheets and an electromechanical cryptologic bomb, from which Turing would later borrow both the name and the starting point for his own, more powerful machine. In July 1939, a few weeks before the invasion, the Poles handed their methods and working Enigma replicas to British and French intelligence. Bletchley Park set off with the Polish work in hand.

The challenge doesn't go away, though: Enigma has more than a hundred quintillion possible settings — far too many for any team of humans to break by hand. The British call in one of the country's greatest mathematical minds: Alan Turing. His sole task is to break Enigma.

Over the next year Turing designs an electromechanical machine, the Bombe. The Bombe steps rapidly through thousands of Enigma settings, using guessed fragments of messages to catch contradictions and rule out impossible keys. That narrows millions of possibilities down to a handful the team can actually work through. By the end of the war, more than 200 Bombes are running across Britain, breaking over 4,000 German messages a day. The broken cipher reveals strategic intelligence that helps turn the Battle of the Atlantic around and, by some historians' estimates, shortens the war by two to four years.
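The elimination idea, using a guessed fragment to throw out impossible keys, can be sketched with a toy. This is not the Bombe's logic and not Enigma: it swaps in a simple 26-key shift cipher purely for illustration, and the names `shift_encrypt`, `surviving_keys`, and `SECRET_KEY` are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(plaintext, key):
    """Toy stand-in for a cipher machine: shift every letter by `key` places."""
    return "".join(ALPHABET[(ALPHABET.index(c) + key) % 26] for c in plaintext)


def surviving_keys(ciphertext, crib, position):
    """Keep only the keys that turn the guessed fragment (the crib) into the
    ciphertext actually observed at `position`; every other key contradicts
    the guess and is ruled out."""
    segment = ciphertext[position:position + len(crib)]
    return [key for key in range(26) if shift_encrypt(crib, key) == segment]


SECRET_KEY = 7  # unknown to the codebreaker in this toy scenario
ciphertext = shift_encrypt("WEATHERREPORT", SECRET_KEY)

# The codebreaker guesses that the message starts with "WEATHER".
keys = surviving_keys(ciphertext, "WEATHER", 0)
print(keys)
```

With a correct guess, the 26 candidate keys collapse to one. A wrong guess leaves no surviving key at all, which is itself a signal to try a different fragment or position.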

The Bombe, however, has the weaknesses typical of machines of that era. It's electromechanical, built from rotating drums and relays rather than electronics, so its moving parts wear and need constant upkeep. Its mechanical switches are slow, and you can't reprogram the device without rewiring the connections by hand. After the war, most Bombes were dismantled or scrapped.

Turing didn't abandon the question of what a thinking machine might be, though. In a 1950 paper he proposed something he called the imitation game. In his view, science should stop asking whether machines can think and start asking what would prove that they can. If a machine communicating purely through text can convince a person that they're talking to another human, that can count as intelligence. Turing died in 1954, at 41, and never got to take the idea any further. But the seed had been planted.

How the field got its name

After his death, the idea of thinking machines lived on in scattered labs. Research sprang up across mathematics, psychology, and electrical engineering. That was a problem: without a shared name there's no shared community. And without a community there's no funding, no university programs, no way to draw in new researchers. A field without a name effectively doesn't exist.

A young professor named John McCarthy set out to change that. He believed that if you put the right people in one room for a whole summer, they'd settle on a single name for the field together. In 1955 he proposed the project and secured funding from the Rockefeller Foundation, gathering signatures from researchers at the world's top institutions — Harvard, IBM, Bell Labs. Among the signatories was Claude Shannon. Remember that name; it comes back to this story in the 2020s.

In the summer of 1956, about ten people met at Dartmouth to name the field that studied thinking machines. They had a few options. They chose "artificial intelligence" — because it sounded ambitious and like something worth funding.

Two ideas for a thinking machine

By the late 1950s the field had a name and money. But it was split by a disagreement over how to actually build a thinking machine. The dispute went all the way back to high school, between two students at the Bronx High School of Science: Marvin Minsky and Frank Rosenblatt. They argued about it constantly, and the disagreement only sharpened with age.

Minsky's idea was a set of rules. To create a thinking machine, you give it rules: if you see this, do that; if you see something else, do something else. The chain of rules grows until the machine can handle every possible situation. The assumption is that intelligence is logic, so if you write down enough logic, you get intelligence. This approach was called symbolic — the machine manipulates symbols according to hand-written rules.

Rosenblatt argued exactly the opposite. An intelligent machine shouldn't follow orders; it should work the way the brain does: billions of neurons wired together, switching on and off as we think. So the machine should be built from artificial neurons that tune themselves by looking at thousands of examples and work out the rules on their own. This approach was called a neural network — and it was the first to build something that learned.

In July 1958, Rosenblatt built a working version of his idea and called it the perceptron. It's the simplest possible neural network. It ran on a room-sized IBM 704 and used a 20-by-20 grid of light sensors wired to adjustable connections. Those connections worked like motorized volume knobs that the machine turned up or down on its own as it learned. After roughly 50 attempts, the perceptron taught itself to tell two kinds of punch cards apart. Think about that: in 1958 a machine learned it on its own. Simple as it was, it bordered on a miracle.
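The learning described above can be sketched in a few lines. This is a minimal illustration of the classic perceptron update rule, not a reconstruction of Rosenblatt's hardware: the four-sensor "cards", the labels, and the function names are made up for the example.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: adjust the connections only when the guess is wrong."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            total = sum(w * xi for w, xi in zip(weights, x)) + bias
            guess = 1 if total > 0 else 0
            error = target - guess  # +1, 0, or -1
            if error != 0:
                mistakes += 1
                # Turn each "knob" up or down in proportion to its input.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # a full pass with no errors: learning is done
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Toy cards seen by 4 light sensors: label 0 = mark on the left, 1 = mark on the right.
CARDS = [
    ([1, 0, 0, 0], 0),
    ([0, 1, 0, 0], 0),
    ([1, 1, 0, 0], 0),
    ([0, 0, 1, 0], 1),
    ([0, 0, 0, 1], 1),
    ([0, 0, 1, 1], 1),
]

weights, bias = train_perceptron(CARDS)
print([predict(weights, bias, x) for x, _ in CARDS])
```

Nobody writes a rule saying "right-hand sensors mean class 1". The weights end up encoding that on their own, after a handful of corrected mistakes.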

The US Navy funded the project and held a press conference. One of the most famous lines came out of it — a quote in The New York Times: the Navy expected the device to be the embryo of a computer that would learn to walk, talk, see, write, reproduce itself, and be conscious of its own existence.

Two winters

For the next 11 years, Rosenblatt and Minsky argued at conferences in front of researchers and graduate students. Rosenblatt claimed neural networks could do almost anything; Minsky, almost nothing. And in 1969 Minsky backed his claim with mathematics. Together with his MIT colleague Seymour Papert he published the book "Perceptrons." In it they proved that Rosenblatt's single-layer machine had a hard ceiling: there are basic patterns it will never recognize, no matter how long you train it. The math was correct, and it made the entire neural-network research program look like a dead end.
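The text does not name the patterns, but the standard textbook example is XOR ("one or the other, but not both"), and that is the assumption used here. The sketch below searches a grid of weights for a single threshold unit that reproduces a truth table. It is a demonstration, not a proof, and the names `classify`, `find_unit`, and `GRID` are illustrative.

```python
from itertools import product


def classify(w1, w2, b, x1, x2):
    """One threshold neuron with two inputs."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


def find_unit(truth_table, grid):
    """Return the first (w1, w2, b) on the grid that matches every row, else None."""
    for w1, w2, b in product(grid, repeat=3):
        if all(classify(w1, w2, b, x1, x2) == t
               for (x1, x2), t in truth_table.items()):
            return (w1, w2, b)
    return None


GRID = [i / 2 for i in range(-10, 11)]  # -5.0 to 5.0 in steps of 0.5

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

and_unit = find_unit(AND, GRID)
or_unit = find_unit(OR, GRID)
xor_unit = find_unit(XOR, GRID)
print(and_unit is not None, or_unit is not None, xor_unit is None)
```

The search finds settings for AND and OR but none for XOR. A finer grid would not help: XOR needs `b <= 0`, `w1 + b > 0`, `w2 + b > 0`, and `w1 + w2 + b <= 0` at once. Adding the two middle conditions gives `w1 + w2 + 2b > 0`, so `w1 + w2 + b > -b >= 0`, which contradicts the last one.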

Rosenblatt never got to defend his idea: he died in 1971. For years his approach lay dormant. Within a few months the US government cut funding for neural networks and redirected it to Minsky's symbolic camp.

But discrediting neural networks didn't prove the symbolic approach worked. Less than two years later, the British government sent a mathematician to check whether AI research was producing anything useful. It wasn't. Speech recognition was a joke, and translation systems didn't translate. The mathematician wrote a report saying that the whole promise of human-level artificial intelligence was an illusion. Funding from Britain and the US collapsed, and the first AI winter set in — years of silence in which no one wanted to fund something that produced no results.

In 1980 the industry changed course. It stopped tackling the problems governments set and turned to commercial ones. Carnegie Mellon University unveiled the first large commercial AI system: XCON. It had one job — to do one tedious thing exceptionally well. When a customer ordered a custom DEC computer, someone had to work out which of the millions of possible component combinations fit together. Humans do that slowly and make mistakes; XCON did it flawlessly in seconds. By 1986 it was saving DEC tens of millions of dollars a year. It was called an expert system — a program that imitates a human expert at one narrow task, guided by thousands of hand-written rules. Minsky's symbolic approach was finally good for something.
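How "thousands of hand-written rules" turn an order into a configuration can be sketched with a tiny forward-chaining loop. This is only an illustration of the general expert-system idea, not XCON's actual rule base or engine: every component name and rule below is invented.

```python
# Each rule: if all the conditions are known facts, add the conclusion.
# The rules and part names are hypothetical.
RULES = [
    ({"cpu:model-a"}, "needs:cabinet-large"),
    ({"needs:cabinet-large"}, "needs:power-supply-heavy"),
    ({"disk:type-b", "needs:cabinet-large"}, "needs:disk-controller"),
]


def forward_chain(facts, rules):
    """Keep firing rules until no rule can add anything new."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


order = {"cpu:model-a", "disk:type-b"}
configuration = forward_chain(order, RULES)
print(sorted(configuration))

# An order the rule authors never anticipated: no rule fires.
unknown_order = forward_chain({"cpu:model-z"}, RULES)
print(sorted(unknown_order))
```

The second call returns the order unchanged. The system has nothing to say about a part nobody wrote a rule for, so each new case needs a person to add one.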

Through the 1980s the whole industry tried to clone XCON across every field at once. One expert system diagnosed bacterial infections, another analyzed chemical compounds, another helped geologists find mineral deposits. Many of them ran on Lisp machines, specialized computers built to run Lisp, the language these systems were typically written in. By 1985, Fortune 500 companies were spending over a billion dollars a year on them.

Two years later that same rapid growth collapsed just as fast. Expert systems were brittle. They were excellent at the narrow task they were built for, but failed outside it. Every new, odd situation required a new rule, and maintaining the rules required a whole team. And even when the team added a rule, it could clash with another one, and the whole system fell apart. You could keep pouring money into experts, but that became pointless the moment ordinary workstations, from Sun Microsystems for instance, hit the market. By 1987 they did the same thing as Lisp machines for a fraction of the price. There was no reason to spend $70,000 on a Lisp machine when a $10,000 Sun workstation ran the same program. The half-billion-dollar AI hardware market collapsed within months, and the company Lisp Machines, Inc. went bankrupt. The symbolic approach went from the future of AI to its own downfall. So began the second AI winter.

The return of neural networks: backpropagation, GPUs, and data

A year before the symbolic approach fell, neural networks began to come back. In 1986, when expert systems were at their peak, neural networks were still seen as career-ending research. Three people believed they were still the best path, though. Geoffrey Hinton and two co-authors published a paper showing that the problem Minsky had pointed out could be solved.

Minsky's argument went like this: if you stack many layers of neurons and the network answers wrong, there's no way to tell which neuron in which layer was at fault. And if you don't know what to correct, you can't train the network. The solution turned out to be simple — you work backward. When the network makes a mistake, you propagate that error back through all the layers of connections. Each connection takes a share of the blame in proportion to how much it contributed to the error. And once you know who to blame, you know what to adjust — and you can train networks with any number of layers. This is backpropagation. Rosenblatt never got to fix his idea; Hinton and the others carried his work forward. The backpropagation paper later became one of the most-cited in the history of AI.
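The "work backward and share out the blame" step can be written down for a very small network. What follows is a minimal sketch, not the setup from the 1986 paper: it assumes one hidden layer of three sigmoid units, a squared-error loss, and XOR as the training pattern, a standard example that a single-layer perceptron cannot represent. All names and sizes are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    W1, b1, W2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, out


def loss(params, data):
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in data)


def backprop(params, x, t):
    """Gradients of 0.5 * (out - t)**2 for one example, from output to input."""
    W1, b1, W2, b2 = params
    hidden, out = forward(params, x)
    delta_out = (out - t) * out * (1.0 - out)      # blame at the output unit
    grad_W2 = [delta_out * h for h in hidden]
    # A hidden unit's blame is the output blame scaled by its connection weight.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, delta_hidden, grad_W2, delta_out


def train(params, data, lr=0.5, epochs=5000):
    W1, b1, W2, b2 = params
    for _ in range(epochs):
        for x, t in data:
            gW1, gb1, gW2, gb2 = backprop((W1, b1, W2, b2), x, t)
            W1 = [[w - lr * g for w, g in zip(row, grow)]
                  for row, grow in zip(W1, gW1)]
            b1 = [b - lr * g for b, g in zip(b1, gb1)]
            W2 = [w - lr * g for w, g in zip(W2, gW2)]
            b2 = b2 - lr * gb2
    return W1, b1, W2, b2


XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
HIDDEN = 3
rng = random.Random(0)  # fixed seed so runs are repeatable
initial = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)],
    [rng.uniform(-1, 1) for _ in range(HIDDEN)],
    [rng.uniform(-1, 1) for _ in range(HIDDEN)],
    rng.uniform(-1, 1),
)
trained = train(initial, XOR)
print(loss(initial, XOR), loss(trained, XOR))
```

The only new ingredient compared with a single-layer perceptron is `delta_hidden`: the output error is passed back through `W2`, so each hidden unit receives a share of the blame proportional to its contribution.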

The networks still weren't producing results that would change the field's fortunes, though. The reason was simple: the mathematical solution worked, but the hardware to compute it didn't exist yet. Training a multilayer network took the computers of the day weeks or months. The gaming industry solved that problem. Nvidia's graphics cards became astonishingly powerful in the 2000s — and it turned out that the kind of computation a graphics card does is exactly the kind a neural network needs. Twenty years on, the computing power had finally arrived. (A GPU is a graphics processor — a chip that computes thousands of simple operations at once, ideal for multiplying the numbers in a network.) Networks could now be trained on GPUs in days instead of months.

The second problem remained: data. To teach a network to recognize, say, a cat in a photo, you have to show it hundreds of thousands of examples: different angles, lighting, breeds, backgrounds. The networks of the 2000s never saw enough of them to learn anything. The computer scientist Fei-Fei Li gathered a group of graduate students and built the largest set of labeled images in history. By 2009 the set she called ImageNet had over 3 million labeled photographs, and by 2010, 14 million. ImageNet solved the data problem for computer vision. Its competition subset gave researchers, for the first time, 1.2 million labeled images across a thousand categories, enough to actually train a deep network to recognize real-world objects.

AlexNet, AlphaGo, and the transformer

The power problem, solved. The data problem, solved. The age of machine learning had begun. An annual competition grew up around ImageNet, in which labs from around the world tested their best systems on the same 1.2 million images. In 2010 the best system was wrong 28% of the time. In 2011 another team got it down to 26%. Progress was slow.

And here we come back to that bedroom. Alex Krizhevsky, a graduate student in Toronto, entered the competition, but he didn't take the well-worn path. Other teams hand-coded rules telling their systems what to look for: edges, corners, textures, shapes. Krizhevsky wrote no rules. He fed the network the competition's ImageNet training images and let it work out for itself which features mattered. The machine developed its own theory of vision. He called his system AlexNet.

When the competition ended, AlexNet was wrong 15% of the time — 11 points better than the previous year's winner (here lower is better, because it's the error rate). It didn't just win. It made every other approach look obsolete, and it proved that machines really can learn. The result spread through the field almost instantly. Within a year, every serious image-recognition lab on Earth was using neural networks. Within another year, Google, Facebook, and Microsoft had poached most of the top deep-learning talent from universities. The AlexNet architecture helped rebuild products like Google Photos, Google Lens, and Google Search. Overnight, the field went from forgotten to being the future of the big technology companies.

That was only the beginning. A small London lab, DeepMind, caught the world's attention by building a system that taught itself to play Atari games from scratch. Both Facebook and Google wanted it — Google won, acquiring DeepMind in January 2014 for around $500 million. Two years later, DeepMind turned the idea of a thinking machine into reality. In March 2016 its program AlphaGo played a five-game match against the professional Go player Lee Sedol. A typical game of Go has more possible board configurations than there are atoms in the observable universe, and Lee was an eighteen-time world champion. In the second game, AlphaGo did something no one could explain: it placed a stone in a spot no professional would ever have considered. Commentators first assumed the system had glitched. It took Lee more than 12 minutes to find the right response. AlphaGo won that game. That one move showed that AI can generate new ideas — it had developed its own theory of what a good position in Go looks like. Three years later, Lee retired from professional Go, saying that even if he became number one, there is an entity that cannot be beaten.

And just a year after AlphaGo came the biggest change in the history of AI. In June 2017, eight Google researchers published a paper titled "Attention Is All You Need." In it they proposed a new neural-network design they called the transformer. Its core idea is this: the old language networks read text the way a human does, word by word, left to right, remembering what came before. That's slow and loses context over long passages. The transformer reads all the words in a sentence at once, in parallel, and lets every word weigh its relevance to every other word. Instead of turning a book page by page, it reads every page simultaneously. The transformer was built to speed up translation between languages. Its eight authors didn't know they had just created the architecture behind what most people now call AI.
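Reading all the words at once comes down to one operation, scaled dot-product attention: every word scores itself against every other word, and the scores decide how much of each word's vector gets mixed into the result. The sketch below is a stripped-down illustration under stated assumptions: the two-number word vectors are made up, and the learned query, key, and value projections of a real transformer are left out, so each word vector plays all three roles.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """For each word, compute softmax(q . k / sqrt(d)) over all words and mix."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:  # each word is handled independently of the others
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, vectors))
                 for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


TOKENS = ["the", "cat", "sat"]
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # invented toy vectors

outputs, weights = self_attention(EMBEDDINGS)
for token, row in zip(TOKENS, weights):
    print(token, [round(w, 3) for w in row])
```

No iteration of the outer loop depends on another one, so all words can be processed simultaneously. That independence is what lets the computation run in parallel instead of word by word.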

The race: ChatGPT, Claude, and Claude Code

Researchers at OpenAI noticed that you could take half of the transformer — the half good at generating text — and train it on a single task: read a passage of text and predict the next word. They fed it enormous datasets from the web, books, and code, and had it predict billions of times. In June 2018 they released their first model, GPT-1, then GPT-2 in 2019 and GPT-3 in 2020. These systems could write code, summarize documents, draft emails, and answer questions from a single prompt. Two years later, OpenAI wrapped the technology in a simple chat window and released it to the world as ChatGPT.
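The training task, "read some text, predict the next word", can be shown with a far simpler model than a transformer. The sketch below swaps in a bigram counter: it tallies which word follows which and predicts the most frequent follower. It illustrates the objective only, not how GPT models work internally, and the corpus and function names are invented.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def predict_next(table, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = table.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


CORPUS = "the cat sat on the mat . the cat ate . the dog sat on the rug ."
model = train_bigram(CORPUS)

print(predict_next(model, "the"))
print(predict_next(model, "sat"))
print(predict_next(model, "zebra"))
```

A bigram model sees only one word of context. The language models in this story are trained on the same kind of objective, but condition on long passages and on vastly more text.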

That was the moment the field changed for good — and probably the moment many of you first encountered AI. ChatGPT reached a million users in 5 days, a hundred million in 2 months, and became the fastest-growing consumer app in history. Microsoft announced a $10 billion investment, and Google declared an internal "code red," panicking that its core — search — might be dethroned. For the first time since 1956, AI had reached ordinary people. The gold rush had begun.

On March 14, 2023, Anthropic publicly launched Claude, its first assistant, reportedly named after Claude Shannon, whom you met earlier in this story, at the table in Dartmouth. Nearly nine months later, in December 2023, Google answered with Gemini. And the money started moving. By the mid-2020s, Microsoft had committed around $13 billion to OpenAI, Amazon had given Anthropic about $5 billion, and Google another $2 billion or so. Three companies set off in a race for dominance, but each bet on something different. OpenAI leaned harder into everyday users: ChatGPT got voice, vision, memory, image generation. Google wired Gemini into its entire ecosystem. Anthropic took a completely different path: it focused on developers.

In June 2024, Claude 3.5 Sonnet shipped with Artifacts, a side panel that showed generated code live, as it was being written. Within a few months Claude had become the model of choice for many serious developers. In February 2025, Anthropic released Claude Code as a research preview: a command-line tool that can read a project, edit files, run commands, and build software locally. That made OpenAI and Google realize the market they were losing. OpenAI launched its own coding tool, Codex, but Claude Code stayed ahead of the competition and by November 2025 was bringing in over a billion dollars a year, one of the fastest revenue jumps in the history of software, just half a year after its general release in May 2025. Even Microsoft, after restructuring its partnership with OpenAI, committed up to $5 billion to Anthropic. Google tried to keep up with tools like Antigravity. But by the end of 2025 the picture was clear: ChatGPT still ruled the consumer market, but among the people who were building something with AI, Claude Code dominated. People with no programming background started assembling complete software in a single weekend, and that's when the term vibe coding entered the mainstream. In April 2026, Amazon committed up to another $25 billion to Anthropic, and Google added a further $40 billion.

A hundred years separate the bedroom in Toronto from the room at Bletchley where the Bombe spun — and what connects them is simpler than it seems. Every time AI moved forward, it wasn't a cleverer rule, dreamed up in advance, that won. It was the machine that was allowed to learn from examples — as soon as there was enough power and enough data. Symbolic rules promised a thinking machine twice and failed twice. Data won. If you're wondering today where this story goes next, don't watch who writes the smartest rules; watch who has the most good examples to learn from — and who finally has enough power to learn from them. The rest of the story is being written right now.