From "vibe coding" to agentic engineering: how building with AI is maturing, per Karpathy

Andrej Karpathy co-founded OpenAI, led AI at Tesla (including the computer vision behind Autopilot) and coined the term "vibe coding" a year ago. Yet he admits he has never felt so far behind as a programmer, which is a surprising confession coming from him. His explanation: working with AI has changed so much in a few months that old habits no longer suffice. Below are his main points in plain language, because they concern more than just programmers. (Karpathy is an AI researcher. When he says "model" or "LLM," he means a large language model: a program trained on vast amounts of internet text that generates text in response to a prompt.)

The moment he stopped correcting

Karpathy had been using agentic tools for about a year. These are tools where the AI doesn't just answer but carries out the next steps of the work itself. They could write useful chunks of code, but they sometimes got things wrong and needed correcting. The turning point for him was December. The newest models started returning chunks of code that simply worked. He asked for more, and they kept delivering. At some point he realized he couldn't remember the last time he had corrected anything, and he simply started trusting the system. That's how he fell into "vibe coding": building software by feel, where you describe what you want, the model does the rest, and you don't dig into the details.

Karpathy stresses that this was a sharp, distinct shift, and one that many people missed. Many formed their picture of AI a year earlier from "something like ChatGPT": a window to chat in. Around December, in his view, something changed fundamentally: coherent, self-directed agent workflows actually started working. As he puts it, his folder of side projects is filled to the brim these days. That's the first important signal I take from this. It's worth looking at these tools again, because a picture formed a year ago is already out of date.

"Software 3.0," or a computer you talk to

Karpathy's most important idea is this: a language model isn't just better software, it's a new kind of computer. He lays out three eras. "Software 1.0" is classic programming: a human writes precise rules. "Software 2.0" is machine learning: instead of writing rules, you prepare data and train a neural network on it. "Software 3.0" is the present era: the program becomes an instruction written in plain language, and the model acts as the computer that runs it.
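The three eras can be made concrete with a toy sketch in Python. It does not come from Karpathy. The task (deciding whether a temperature feels hot), the made-up data and all names (`DATA`, `is_hot_v1`, `fit_threshold`, `is_hot_v2`, `PROGRAM_V3`) are illustrative assumptions. Fitting a single threshold is also a drastic stand-in for training a neural network.

```python
# Toy illustration of the three "software eras" on one tiny task:
# deciding whether a temperature (in °C) feels hot. All data is made up.

# Labeled examples: (temperature, did people call it "hot"?)
DATA = [(12, False), (18, False), (22, False), (26, True), (30, True), (35, True)]


# Software 1.0: a human writes the rule explicitly.
def is_hot_v1(temp_c: float) -> bool:
    return temp_c > 25


# Software 2.0: the rule's parameter is learned from data instead of written
# by hand. One threshold is a drastic simplification of training a neural
# network, but the idea carries over: data in, behavior out.
def fit_threshold(data: list[tuple[float, bool]]) -> float:
    def errors(threshold: float) -> int:
        # Count examples that the rule "temp > threshold" gets wrong.
        return sum((temp > threshold) != label for temp, label in data)

    # Try each observed temperature as a threshold; keep the one with fewest errors.
    return min((temp for temp, _ in data), key=errors)


LEARNED_THRESHOLD = fit_threshold(DATA)


def is_hot_v2(temp_c: float) -> bool:
    return temp_c > LEARNED_THRESHOLD


# Software 3.0: the "program" is a plain-language instruction. It would be
# sent, together with the input, to a language model acting as the computer.
# No model is called here; the string itself is the whole program.
PROGRAM_V3 = (
    "Answer 'yes' if a temperature of {temp} °C would feel hot "
    "to most people, otherwise answer 'no'."
)

if __name__ == "__main__":
    print(is_hot_v1(30), is_hot_v2(24))  # True True
    print(PROGRAM_V3.format(temp=30))
```

The 3.0 version contains no logic of its own. Any judgment would come entirely from the model that reads the instruction, and this is the sense in which the model becomes the computer.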

Karpathy illustrates this with two examples. The first is installing a particular tool. The installer used to be a complicated script that had to anticipate every type of computer, so it kept growing more convoluted. Today the instruction is a snippet of text you paste into your agent. The agent then inspects your environment itself, adapts the steps and fixes any stumbles along the way. You don't have to spell out every detail, because the agent supplies the intelligence.


The second example went further and, as he says, floored him. He had built an app called MenuGen: you photograph a restaurant menu, and the app generates pictures of the dishes you don't recognize. Then he saw the "3.0" version of the same idea. You simply hand the model the photo and ask it to draw the dishes straight onto the menu image. The model returned that same photo with the dishes painted in. Karpathy puts it bluntly: his entire app turned out to be redundant, a relic of the old way of thinking. Hence his appeal not to treat AI merely as a way to speed up what we already do. Entirely new things are becoming possible, such as building a knowledge base straight from a pile of loose documents, which no program could simply do before. That possibility excites him the most.

Why AI is "jagged"

If the models are so powerful, why do they trip over trivial things? Karpathy explains it through the notion of verifiability. A classic computer easily automates whatever can be written down precisely in code. Today's models easily automate whatever can be checked, because much of their training consists of rewarding them for results that can be verified as correct. That's why they're strongest where an answer is easy to check, such as math, code and related fields. Outside those areas they can be rough around the edges.
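The Python sketch below shows what "checkable" can mean in practice. It gives two automatic graders of the kind that reward-based training depends on. It assumes that answers arrive as plain strings. The names `math_reward` and `code_reward` are hypothetical, and real training pipelines are far more elaborate (they also run model-written code only inside a sandbox). What matters is that both functions score an answer with no human involved. An everyday judgment call has no such grader.

```python
def math_reward(model_answer: str, reference: str, tol: float = 1e-9) -> float:
    """Return 1.0 if the model's final answer equals the reference number, else 0.0."""
    try:
        return 1.0 if abs(float(model_answer) - float(reference)) <= tol else 0.0
    except (ValueError, TypeError):
        # The answer isn't a number at all, so there is nothing to reward.
        return 0.0


def code_reward(source: str, func_name: str, tests: list[tuple[tuple, object]]) -> float:
    """Return the fraction of unit tests passed by a model-written function."""
    if not tests:
        return 0.0
    namespace: dict = {}
    try:
        # WARNING: exec runs arbitrary code; only do this inside a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception:
        # Syntax errors or a missing function earn no reward.
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed / len(tests)


if __name__ == "__main__":
    tests = [((1, 2), 3), ((2, 2), 4)]
    print(math_reward("42", "42.0"))                                       # 1.0
    print(code_reward("def add(a, b):\n    return a + b", "add", tests))  # 1.0
    print(code_reward("def add(a, b):\n    return a * b", "add", tests))  # 0.5
```

The buggy `a * b` version scores 0.5 because it happens to pass the `(2, 2)` test. This is a reminder that a verifier can only be as good as the tests behind it.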

He calls this unevenness "jagged intelligence." His favorite example: the same advanced model that can rebuild an enormous project or find a security hole will, when asked whether to walk or drive to a car wash 50 meters away, advise walking because "it's close, after all." The whole point of the trip, though, is to get the car washed. For Karpathy this is an important practical lesson. You still have to stay somewhat "in the loop," treat the model as a tool and know what it's doing. He adds a sober note: what a model can do for you depends on what the labs trained it on. If your task falls in an area it was trained for, you're flying. If not, you'll struggle, and sometimes you have to "top up" the model with your own data.

Two different crafts: vibe coding versus agentic engineering

Here Karpathy draws the cleanest distinction of all. Vibe coding, he says, raises the floor: anyone can now build a simple program, and that's wonderful. Agentic engineering is something else. It's about maintaining the quality bar we expect from professional software. You don't get to introduce security holes just because you worked "by feel." You're as responsible for your product as ever. The only question is how to move faster while still doing it properly.

He calls it engineering because it's a discipline. An agent, meaning an AI that carries out tasks on its own, can be unreliable and unpredictable while also being extraordinarily powerful. The art lies in steering it so that you go faster without losing quality. Karpathy notes that people used to talk about the "10x engineer," someone ten times as productive as an average colleague. In his view, people who are genuinely good at working with agents now go well beyond that. The clearest consequence is in hiring, where he thinks puzzle-style interview tasks belong to the old world. A better test is to give a candidate a large project, such as a secure service, and watch how they build it and whether they can defend it against an attempted break-in.

What stays on the human side


If agents do ever more of the work, what becomes more valuable in a human? Karpathy's answer is taste, judgment and oversight. He gives a vivid example from MenuGen. Users signed in with a Google account and paid through a separate payment service, and the agent tried to link each payment to a user by email address. The trouble is that the email someone pays with can differ from the one they sign in with, so a purchase didn't always reach the right account. A human sees at once that you don't design it that way, because you need a stable user identifier. That's the human's role: to set a sensible plan and specification.
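The MenuGen pitfall fits in a few lines of Python. This is a hypothetical illustration and not Karpathy's code. The `User` class, the `credit_by_email` and `credit_by_id` functions and the payment fields are all invented for the example. The sketch also assumes that the app can attach its own user ID to a payment when checkout starts. Many payment services let you pass such a reference, but check your provider's documentation.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str       # stable ID from the login provider; never changes
    login_email: str   # the email the user signed in with
    credits: int = 0


def credit_by_email(users: dict[str, User], payer_email: str, amount: int) -> bool:
    """Fragile design: match a payment to a user by email address."""
    for user in users.values():
        if user.login_email.lower() == payer_email.lower():
            user.credits += amount
            return True
    return False  # payment received, but nobody was credited


def credit_by_id(users: dict[str, User], user_id: str, amount: int) -> bool:
    """Robust design: the app attached its own stable user ID when checkout started."""
    user = users.get(user_id)
    if user is None:
        return False
    user.credits += amount
    return True


if __name__ == "__main__":
    users = {"u-123": User("u-123", "ann@gmail.com")}
    # Hypothetical payment event: Ann signed in with Gmail but paid with her work email.
    payment = {"payer_email": "ann@work.example", "user_id": "u-123", "amount": 10}

    print(credit_by_email(users, payment["payer_email"], payment["amount"]))  # False
    print(credit_by_id(users, payment["user_id"], payment["amount"]))         # True
    print(users["u-123"].credits)                                               # 10
```

With Google sign-in, a natural stable ID is the `sub` claim of the ID token. Google documents this claim as unique among Google accounts and never reused. The email, by contrast, is a user attribute that can change.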

Karpathy describes the new division of labor with the image of an intern. The small stuff, including dozens of technical details he no longer remembers himself, goes to the agent, because it has an excellent memory. But the human still has to understand what's happening underneath, both to avoid wasting resources and to ask for the right things. You're responsible for the design, the meaning and the taste, and the agent fills in the gaps. Will taste stop mattering once the models mature? Karpathy honestly doesn't know. He notes that code from today's models can be bloated and clumsy: it works, but reading it can give you a heart attack. Nothing fundamental stops this from improving, though. The labs just haven't gotten to it yet.

Where this is heading, and what follows from it

Karpathy thinks tools and their documentation are still written for people, when they ought to be written "for the agent." His favorite gripe is that documentation makes him click around and read, when all he wants to know is which snippet of text to paste into his agent. So he expects infrastructure in which data is legible to models. Ultimately he expects a world where people and companies have their own agents, and "my agent will sort it out with your agent" when it comes to the details of a meeting. He leaves open how far this will go and tempers his own enthusiasm.

To close, Karpathy returns to what is still worth genuinely learning as intelligence becomes cheap. He quotes a line that, he says, comes back to him every few days: "You can outsource thinking, but not understanding." These days he feels like the bottleneck himself. He's the one who has to know what we're actually building and why, in order to steer the agents well. Steering requires understanding, and understanding is precisely what the models handle worst, so it stays with the human. For ordinary readers and users of these tools, a simple conclusion follows. The tools do ever more for us, but our understanding determines where we steer them, and that is exactly what remains most valuable.