Codex by OpenAI: build your first project step by step

You open the app for the first time and it looks exactly like ChatGPT: a list of chats on the left, a conversation field in the middle. Except this time the AI doesn't just answer. It reads your files, creates spreadsheets, clicks around in the browser and runs code on your computer. This is Codex, the coding agent from OpenAI (the same company that built ChatGPT). I'll show it to you the way I teach it: one project, from an empty folder to a working dashboard on the web, and the whole interface along the way. I'll walk you through five steps: setting up the project and its context file, connecting your data, turning a first result into a skill, publishing the dashboard to the web, and an automation that runs without you.

One caveat up front, because it'll spare you some disappointment: this isn't magic. Codex won't "do everything by itself on the first try." It works like a colleague you guide, correct, and trust more and more over time. The best way to think of it, in my view, is not as a wish-granting machine but as someone you're onboarding right now. And one more thing: it's not the only tool of its kind. On the other side there's Claude Code from Anthropic — a different engine, different strengths. The point isn't which one wins, but which one fits the task in hand.

Before you set anything up, take a look at what's in front of you. Codex is an app where you talk to an AI model just like in ChatGPT, but you work in projects — specific folders on your computer. The model (the AI's "brain," here one of OpenAI's GPT models) gets "hands" in the process: it reads and creates files, browses Excel spreadsheets, drives the mouse and the browser, runs code. A chat in the browser can't do that — there you just type and get text back. Here the agent genuinely acts on your data.

Before you build anything, it helps to know where the dials are. At the top you pick the model, its speed and the "effort" level — from low, through medium and high, up to very high. The rule I'd give you: for planning and simple tasks set medium, switch to high for a large project, and save the very highest for a hard bug nothing else can crack. Why not always go maximum? For two reasons. First, on a trivial task the highest level can overthink it — making a mountain out of a molehill. Second, each level costs a different number of tokens (a token is a small fragment of text, the unit the model is billed in). Low costs the least, the highest the most. In the settings, under the limits section, you can see how much of your current session is left and when it resets — worth checking there.

To get started, a ChatGPT account is all you need. There's a free tier with limited access, but for real work a paid plan is better, because you'll hit the usage limits far less quickly. Download the app for your system, and that's it. Codex also runs inside a code editor or in the terminal, with fuller capabilities there, but to begin with the app will take you a very long way.

A word more on Claude Code, because the difference is practical. From what I see across the market: Claude tends to be better at free-form thinking and planning, and Codex at the pragmatic execution of a long plan and at hunting down bugs. If you want to go deeper, I have a separate piece on why Claude Code can be the most powerful tool. In this piece you're staying with Codex.

Step 1: set up the project and give it context about you

Most people's first instinct is to delegate something straight away. Stop. Before anything gets built, you give the tool context — who you are and where you're heading. It's a few minutes that pay for themselves many times over.

You start with a new project: you click "new chat," then add a project and point it to the folder where it's to live. It can be empty — call it "YouTube Analytics Demo," say, because that's exactly the project you'll build: a system that pulls the comments from a YouTube channel, analyzes them, gathers them in a spreadsheet and shows them on a dashboard you can open from your phone. From an empty folder to a working result.

Now the most important habit: you create an AGENTS.md file. It's a plain text document that Codex reads at the start of every new conversation — its "onboarding document." In it you write who you are, what the project's goal is and where you're heading. You don't have to write it by hand. Just ask: "Create an AGENTS.md file with context about me and the goal of this project — I'm building a dashboard analyzing YouTube comments that I want to publish to the web." Codex will lay it out itself, and you review it.
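If you're curious what such a file might contain, here is a minimal sketch in Python that assembles one. The section headings, the function name render_agents_md and the sample wording are my own assumptions for illustration. AGENTS.md is ordinary text, so treat this as one plausible layout, not a required format.

```python
def render_agents_md(who: str, goal: str, next_steps: list[str]) -> str:
    """Return the text of a small AGENTS.md onboarding file.

    The three headings are an illustrative choice, not a fixed schema.
    """
    lines = [
        "# AGENTS.md",
        "",
        "## Who I am",
        who,
        "",
        "## Project goal",
        goal,
        "",
        "## Where this is heading",
    ]
    lines += [f"- {step}" for step in next_steps]
    return "\n".join(lines) + "\n"


example = render_agents_md(
    who="A YouTube creator with no programming background.",
    goal="A dashboard analyzing YouTube comments, published to the web.",
    next_steps=[
        "Connect to the YouTube data",
        "Build the spreadsheet and the dashboard",
        "Refresh everything weekly",
    ],
)
print(example)
```

Saving the printed text as AGENTS.md in the project folder gives you the same kind of file Codex would draft for you. Either way, you still review it.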

Why all this? Because without a file like that the model's knowledge vanishes between conversations. What you settle in one window may no longer exist in a new one — a single chat's memory doesn't carry over on its own. Saving context to a file makes the project "remember" itself, so you're not starting from zero every time. This is the first of several shifts in thinking I want to show you: knowledge gets saved in files, not in your head and not in a single conversation.

Step 2: connect your data and set permissions sensibly

The first real obstacle in any project is access to the data. Here you need YouTube comments — and here comes a great moment for the second shift in thinking: when you don't know whether something is even possible, don't go looking for the answer elsewhere first. Ask the tool itself. Write plainly: "Help me work out how to connect to my YouTube data to pull the comments, and explain it step by step." This is precisely the mechanism by which most people learn these tools fastest — you ask the agent and have it explain, instead of guessing.

Codex comes with ready-made plugins (also called connectors) — links to popular services like Google Drive, Slack, SharePoint and GitHub. You log in there just as you log in to email: you click, enter the password, done. That's the simplest route, and if your data sits in one of those services, use it. The trouble is that for YouTube there's no ready-made plugin. When one's missing, you ask Codex how else to connect, and it walks you through setting up an access key (an API key — a kind of digital entry pass a program presents to a service to get data). In this case you set up a project in the provider's cloud panel, enable access to YouTube data, generate the key and paste it into your own setup.

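And here's a point you have to remember once and for all: you paste the key only into a file named .env (with a dot at the start). The dot makes it a hidden file by convention, but it doesn't protect anything on its own. What keeps the file out of a public repository is listing .env in the project's .gitignore file, so ask Codex to confirm it's there. The key is the password to your data. You don't paste it into just any document or send it out into the world.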
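To make this concrete, here is a minimal sketch of how a script might read the key out of .env, plus a check that the file is listed in .gitignore (the file Git consults to decide what stays out of a repository). The variable name YOUTUBE_API_KEY and both helper functions are hypothetical. The .gitignore check only looks for an exact line, which is a simplification of the real matching rules.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple NAME=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def is_ignored(gitignore_text: str, filename: str = ".env") -> bool:
    """True if the file name appears as its own line in .gitignore.

    Simplified on purpose: real .gitignore patterns are richer than this.
    """
    return any(line.strip() == filename for line in gitignore_text.splitlines())


env = parse_env('# secrets, never commit\nYOUTUBE_API_KEY="abc123"\n')
print(env["YOUTUBE_API_KEY"])
print(is_ignored(".env\nnode_modules/\n"))
```

If the check comes back False, that's your cue to stop and add .env to .gitignore before anything gets sent to the cloud.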

The second thread is permissions — how much Codex can do without asking for approval. By default it often asks permission for each step — "should I access the web?", "should I overwrite this file?" That's safe, but it slows things down. In the general settings you can loosen this all the way to full access, where the agent acts without asking. My advice is sensible and dull: at the start, leave the defaults. Switch full access on only once you understand what the tool is doing — then it really does save time, but it requires trust you have to earn. You've probably heard stories of agents that wiped a database or fired off an avalanche of emails. That's almost always the result of vague instructions and handing over full control too early, not the tool's malice.

Once the key is in place, you have Codex check the connection: "Test whether this key works." It runs an attempt, sometimes hits a minor error, tries another route on its own and finally reports that it's pulling the comments. If something went wrong along the way, do the thing that builds a smarter system: ask it to record that knowledge in the project, "so this error never happens again." Every failure is data. Recorded, it doesn't return.
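For the curious, a connection test like this could look roughly as follows. It's a sketch that assumes the key is used against the commentThreads endpoint of the YouTube Data API v3 for a single video. The function names, the 200-comment limit and the sample payload are mine, and what Codex actually writes for you may look quite different.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_url(video_id, api_key, page_token=None, max_results=100):
    """Build the request URL for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,  # the API allows at most 100 per page
        "order": "time",            # newest first
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{API_URL}?{urlencode(params)}"


def extract_comments(payload):
    """Pull the comment text out of one decoded API response."""
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in payload.get("items", [])
    ]


def fetch_comments(video_id, api_key, limit=200):
    """Follow nextPageToken until `limit` comments are collected."""
    comments, token = [], None
    while len(comments) < limit:
        with urlopen(build_url(video_id, api_key, token)) as response:
            payload = json.load(response)
        comments += extract_comments(payload)
        token = payload.get("nextPageToken")
        if not token:
            break
    return comments[:limit]
```

Calling fetch_comments with a real video ID and your key either returns a list of comment texts or raises an error. That error is exactly the kind of thing worth having Codex record in the project.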

Step 3: make a first result and turn it into a skill

You have the data, so it's time for the first concrete result. Switch on plan mode — a toggle in which Codex executes nothing, only lays out a plan and asks exactly what should be built. Only after you accept the plan does it set to work. I always start with planning, because it's here, in the conversation, that mistakes are cheapest to make and fix.

You ask roughly like this: "Pull around 200 of my newest comments, find the patterns in them and show everything in an Excel spreadsheet — with charts and conclusions that help me make the content people are waiting for." Codex will ask about the details (from how many videos, how to classify the comments), present a plan, and once you accept it, assemble the spreadsheet: the share of questions, the most-mentioned tools, comment categories, content ideas, ready opportunities to reply, and the raw data on a separate tab. The first result can be quite decent.

But I'll be honest about what's easy to leave unsaid: with a general request the result tends to be general. If the request is vague, so is the output. The more precisely you say what you're looking for — which metrics interest you, which comments you're hunting for — the more on-target the result. This is the third shift in thinking: input quality decides output quality, and most of the work happens in the precision of the request, not in the power of the model.
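One way to see what "precise" means is to write the metrics down as rules. The sketch below computes the share of each comment category from plain keyword cues. The categories and cue words are assumptions I made up for illustration, and a model will classify far more subtly, but spelling out what you count is the same act of precision.

```python
from collections import Counter

# Illustrative categories and cue phrases; adjust them to your own channel.
CATEGORIES = {
    "question": ("?", "how do", "how to", "can you"),
    "request": ("please make", "tutorial on", "video about"),
    "praise": ("thanks", "thank you", "great", "love"),
}


def categorize(comment):
    """Return the first category whose cue appears in the comment."""
    text = comment.lower()
    for name, cues in CATEGORIES.items():
        if any(cue in text for cue in cues):
            return name
    return "other"


def summarize(comments):
    """Return each category's share of all comments, rounded to 2 places."""
    counts = Counter(categorize(c) for c in comments)
    total = len(comments) or 1  # avoid dividing by zero on an empty list
    return {name: round(count / total, 2) for name, count in counts.items()}


sample = [
    "How do I install this?",
    "Thanks, great video",
    "Please make a tutorial on skills",
    "First",
]
print(summarize(sample))
```

A request that names its categories this explicitly leaves the model much less to guess.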

And now the thing that really makes this step worth taking — a skill. It's a saved recipe: a plain text file with the instruction "how to do this well, step by step." Once you've worked out a good result, you say: "Turn what you just did into a skill, so that every time I ask for an analysis of YouTube comments, the same process is repeated." From then on you just say "do it" in plain language, and Codex reproduces the whole procedure — which source to pull the data from, how to analyze it, how to build the spreadsheet.

I like the image of a cooking recipe here, because it captures the essence well. When someone asks you for pancakes, you open the recipe and stick to the proportions, ingredients and timing — and they come out the same every time. Without a recipe you guess, and the result jumps around. What's more, a recipe can be improved: one time you add more chocolate, another you shorten the frying — and you update the recipe. Same with a skill: every use is a chance to refine it. You can keep a skill globally (it works in every project) or locally (only in this one) — just ask for one or the other. Building your own library of recipes like that is, to me, the heart of all the work with Codex.

Step 4: build the dashboard and publish it to the web

Raw data in a spreadsheet is only half of it. Now you turn it into a dashboard — a page of charts and conclusions that's pleasant to look at. You point Codex to the finished spreadsheet (you can "tag" a specific file so it knows what it's using) and ask for a nice page that visualizes the data. You can add that it should first use the image generator and propose a look and a logo — a small thing that makes a difference, because the agent sketches the concept first and only then builds the rest.

Here you'll see something genuinely worth appreciating: Codex has a built-in verification loop. Before it hands over the result, it inspects its own page, catches flaws and fixes them. It can report "the test passed, doing one more visual review," then find a few problems, fix them, check again, and only then deliver the finished dashboard. You don't have to police every pixel; the tool makes several passes itself.

There's a catch, though, that trips everyone up at first. The page it builds works only locally — on your computer, at an address starting with "localhost." Copy that address and paste it to a friend, and they'll see nothing, because the page lives only on your machine. To open it from your phone or show it to anyone, you have to publish it.
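Why does a "localhost" address show nothing to anyone else? Because it always points back at the machine that opens it. Here is a small sketch that tells such an address apart from a public one. The function name is_local_only and the sample addresses are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_only(url):
    """True if the URL points back at the machine that opens it."""
    host = urlparse(url).hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # Loopback IPs such as 127.0.0.1 or ::1 never leave the computer.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # an ordinary domain name, not an IP address


print(is_local_only("http://localhost:3000"))
print(is_local_only("https://example.vercel.app"))
```

When your friend opens a localhost link, their browser looks for the page on their own computer, where nothing is running.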

Two tools, free to start, do this. The first is GitHub — cloud storage for files and code with a change history. There you create a repository (simply a collection of your files and folders in the cloud) and send the project to it. It's like moving a document from your local drive to the cloud so it's available from any device. The second is Vercel — a service that turns that code into a working address on the internet, the kind anyone can open. Codex helps connect one to the other: it sets up the repository, links your account, sends the files. You just ask it: "Help me connect this project to GitHub and publish it."

The most convenient part is that the two tools "talk" to each other. Every change sent to GitHub goes automatically to the public version of the page on Vercel. It sounds like three places to keep track of — Codex, GitHub, Vercel — but in practice you manage only one: Codex. And one more advantage of this setup: you test changes locally, and they reach the public version only once you approve them. You have it change the background to red, you view it on your own machine, and if you don't like it — the public page never moves. A clean separation between the working version and the one the world sees.

Step 5: set up the automation and learn to check it

The last piece of the puzzle is automation — a scheduled task that Codex runs itself, without your involvement. You open a new chat in the project and describe what you want: "Every Sunday at 5:00 p.m. run the comment-analysis skill, add new rows to the spreadsheet, refresh the statistics, then send the changes to the page so the dashboard updates itself." The prompt can sound a bit chaotic — that's fine, Codex will ask about the missing details and set up a weekly task that pulls the comments, recomputes the data and publishes a fresh dashboard on its own.
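Underneath, a weekly schedule comes down to one question: when is the next Sunday at 5:00 p.m.? The sketch below shows that calculation with a hypothetical next_run function. It's an illustration of the idea only, and I'm not claiming this is how Codex stores or computes its schedules.

```python
from datetime import datetime, timedelta


def next_run(now, weekday=6, hour=17):
    """Return the next occurrence of the given weekday and hour after `now`.

    Weekdays follow Python's convention: Monday is 0, Sunday is 6.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # This week's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


print(next_run(datetime(2024, 1, 3, 9, 0)))  # a Wednesday morning
```

Notice what the calculation can't do: it only tells you when the task is due. Something still has to be awake at that moment to run it.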

There are two traps here I want you to know about before you hit them. First: by default the automation can set itself to a different model than you want — usually a weaker one. Look into its settings and pick the right model and effort level, because on the default the task can crawl or lose its way. Second, and more important: a task like this runs locally. Turn off the computer or close the app, and the automation stops running. For it to work truly around the clock, you have to move it to the cloud — that's a separate, later step.

Etch one rule deep: don't expect the automation to be perfect the first time. That's unrealistic. I like the comparison to teaching a child to ride a bike. You don't let go of the handlebars at once — first you walk alongside, steady them, correct, and over time you hand over more and more independence, finally taking off the training wheels. Same here: every run is data you use to improve the task. When something jams — and it can jam on something as small as an open file the agent can't overwrite — don't just watch helplessly. Stop the task, ask "what's going on?", fix the small thing yourself and carry on. That saves both time and tokens, which would otherwise burn away on wandering.

That's also where browser use comes in handy: you ask Codex to open the finished page itself, click around it, try to "break" it and report the flaws. As it does, you can watch it move the cursor around the screen. It's useful in two ways — for testing your own results (it finds things you'd overlook) and where there's no ready connection and the data has to be clicked through on a page by hand. The best part is that you can write this test permanently into the skill: "don't hand me the result until you've checked it in the browser yourself." That way the tool polices its own work better with each passing day.

What you take away

Step back a moment and look at the whole. You've built a project from an empty folder: set it up, gave it context, connected the data, made a first result and turned it into a skill, published the dashboard to the web and set up an automation that refreshes it every week on its own. That's a fair distance for one short project.

There's one conclusion, though, that ties everything else together, and to my mind it's the most important one. The whole project is just a folder of files. Because the files are arranged in a legible structure, any agentic tool can work in them, whether Codex, Claude Code or another, since they all read the same directory. You can even combine them: plan with one, execute with another, catch bugs with a third. That's why the question is never "which tool is best," but "which fits this task best." If you're starting from scratch and want to get the basics in order first, I have a separate piece on how to put together a simple AI stack to begin with.

One quiet rule remains to close, the same one that came back at every step: these systems aren't meant to be perfect from the start — they're meant to get better, because every use is data you put into them. And if you're looking for the first move, it's dead simple. Write out the boring, repetitive chores in your week — the ones you'd want to happen while you sleep. Pick one. Turn it into a skill. Then the next one.