What is OpenAI Codex App and why it matters
On February 2, 2026, OpenAI released something they'd been building internally for months: Codex App, a native macOS application that Sam Altman described as "the most loved internal product we've ever had." It's not an editor extension or a chatbot that writes code. It's a command center where multiple AI agents work in parallel on your codebase.
The key is understanding what problem it solves. Until now, AI coding tools (Claude Code, Cursor, Copilot) followed a simple pattern: you ask, the AI responds. One agent, one task. Codex App breaks that pattern: think of it as a team of virtual programmers, each working on its own isolated Git branch, none of them stepping on the others' toes.
And this isn't theoretical: over one million developers are already using it monthly, and usage doubled since GPT-5.2-Codex arrived in December 2025.
GPT-5.2-Codex: the brain behind the app
What most guides won't tell you is that Codex App would be nothing special without the model powering it. GPT-5.2-Codex is the engine that makes the difference, and the numbers speak for themselves:
| Specification | Value |
|---|---|
| Maximum context | 400,000 tokens (~100K lines of code) |
| Maximum output | 128,000 tokens |
| SWE-Bench Verified | 80.0% |
| HumanEval | 89.2% |
| Terminal-Bench 2.0 | 64.0% (leader) |
| Supported languages | 50+ |
To put this in perspective: 400K tokens of context means you can load an entire project (frontend, backend, database) and the model understands how the pieces connect. You don't need to explain the architecture — it figures it out on its own.
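If you want to sanity-check whether your own project actually fits, here's a quick back-of-envelope sketch. It uses the rough "~4 characters per token" heuristic rather than the model's real tokenizer, so treat the result as a ballpark; the file extensions are just an example.

```python
# Rough estimate of whether a repo fits in a 400K-token context window.
# Uses the common ~4 characters per token heuristic -- a ballpark figure,
# not the model's actual tokenizer.
from pathlib import Path

CONTEXT_BUDGET = 400_000   # GPT-5.2-Codex max context, per the table above
CHARS_PER_TOKEN = 4        # rough heuristic for source code

def estimate_tokens(repo_root: str, extensions=(".py", ".ts", ".sql")) -> int:
    total_chars = 0
    for path in Path(repo_root).rglob("*"):
        if path.is_file() and path.suffix in extensions:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_tokens(".")
print(f"~{tokens:,} tokens -- {'fits' if tokens <= CONTEXT_BUDGET else 'needs compaction'}")
```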
The "context compaction" system allows the model to work coherently across millions of tokens in a single task without losing track. In practice, this means you can ask it to refactor an entire module and it will maintain consistency from the first file to the last.
Benchmark performance
On SWE-Bench Verified, the industry standard for measuring real bug-fixing ability, GPT-5.2-Codex hits 80.0%. Is it the best? Almost: Anthropic's Claude Opus 4.5 edges it out at 80.9%. But on Terminal-Bench 2.0, which measures terminal and scripting tasks, GPT-5.2-Codex leads with 64.0%.
The practical takeaway: the two models are effectively tied at the top. The difference comes down to user experience, not raw model capability.
The 4 features that define Codex App
1. Parallel agents with Git worktrees
This is the headline feature. You can launch multiple agents simultaneously, each working in an isolated Git worktree. While one agent implements OAuth authentication, another can optimize database queries, and a third writes unit tests.
Each agent has its own branch. When it's done, you review the diff and decide whether to merge. No conflicts, no collisions.
In practice, this radically changes your workflow. Instead of waiting for one agent to finish before assigning the next task, you can delegate 5 tasks at once and review results when they're ready.
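Under the hood this leans on a standard Git feature: worktrees. As an illustration of the mechanism (not the app's API; the repo path and task names below are made up), this is roughly what spawning an isolated workspace per task looks like:

```python
# Illustration of the Git mechanism Codex App builds on, not the app's API:
# each task gets its own worktree and branch, so parallel edits can't collide.
# The repo path and task names are placeholders for the example.
import subprocess
from pathlib import Path

def spawn_worktree(repo: str, task: str, base: str = "main") -> Path:
    """Create an isolated worktree + branch for one agent/task."""
    worktree = Path(repo).parent / f"agent-{task}"
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", f"agent/{task}", str(worktree), base],
        check=True,
    )
    return worktree

for task in ["oauth-login", "query-optimization", "unit-tests"]:
    path = spawn_worktree("/path/to/repo", task)
    print(f"{task}: agent works in {path} on branch agent/{task}")

# When a task is done, you review its diff and merge (or discard) its branch:
#   git -C /path/to/repo merge agent/oauth-login
#   git -C /path/to/repo worktree remove ../agent-oauth-login
```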
2. Skills: beyond code generation
Skills are packages that extend Codex's capabilities beyond writing code. They include instructions, resources, and scripts that agents can use automatically. OpenAI already offers Skills for:
- Information gathering (researching APIs, documentation)
- Problem solving (complex debugging)
- Technical writing (documentation, READMEs)
- Code analysis (security audits, performance reviews)
The interesting part is you can create your own Skills. If your team has a specific deploy process, you package the instructions into a Skill and any agent can execute it.
3. Automations: scheduled tasks
Perhaps the most underrated feature. Automations are tasks that run automatically on a schedule you define. OpenAI uses them internally for:
- Daily issue triage: every morning, an agent reviews new issues and classifies them
- CI failure summaries: after each merge, it analyzes which tests failed and why
- Release briefs: generates a daily summary of production changes
- Bug hunting: scans code for problematic patterns
Results go into a review queue. You decide when to check them.
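Conceptually, a daily issue-triage automation boils down to a scheduled job like the sketch below. This is a generic mental model, not the Codex Automations configuration; the repo name and labeling heuristic are placeholders, and a real Codex agent would do the actual classification.

```python
# Mental model of a "daily issue triage" automation: a scheduled job that
# gathers new issues and queues suggested classifications for review.
# Generic sketch, not Codex's format; OWNER/REPO and the label rule are placeholders.
import requests

def triage_new_issues(repo: str = "OWNER/REPO") -> list[dict]:
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/issues",
        params={"state": "open", "sort": "created", "direction": "desc"},
        timeout=10,
    )
    resp.raise_for_status()
    review_queue = []
    for issue in resp.json():
        # Placeholder heuristic; an agent would do the real classification.
        label = "bug" if "error" in issue["title"].lower() else "needs-triage"
        review_queue.append({"number": issue["number"], "suggested_label": label})
    return review_queue

# Run it from a scheduler every morning; results land in a queue you review
# when you choose, mirroring how Codex surfaces Automation results.
print(triage_new_issues())
```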
4. Configurable personality
The /personality command lets you choose the agent's communication style. Prefer concise, direct responses? Or a more conversational style that explains the reasoning? It syncs across the app, CLI, and IDE extension.
It's a small detail, but it makes a real difference when you spend hours working with the agent.
Codex App vs Claude Code vs Cursor: the real comparison
Let me break this down with data, not opinions:
| Aspect | Codex App | Claude Code | Cursor |
|---|---|---|---|
| Type | Native macOS app | Terminal CLI | Full IDE |
| Parallel agents | Yes (main advantage) | No (one at a time) | No (one at a time) |
| Model | GPT-5.2-Codex | Claude Opus 4.5 | Multiple |
| Context | 400K tokens | 200K tokens | Variable |
| SWE-Bench | 80.0% | 80.9% | N/A |
| Automations | Yes (scheduled) | No | No |
| Skills/Plugins | Yes (Skills) | Yes (Plugins + Hooks) | Yes (Extensions) |
| Plan Mode | No | Yes | No |
| Rewind | No | Yes | No |
| Starting price | Free (temporary) | ~$20/month | $20/month |
| Platform | macOS only | Mac, Linux, Windows | Mac, Linux, Windows |
| Revenue | Not disclosed | $1B ARR | Not disclosed |
Which one should you pick?
Choose Codex App if: you need to launch multiple tasks in parallel and work on macOS. The simultaneous agent management is genuinely unique; no competitor offers an equivalent.
Choose Claude Code if: you prioritize precision on complex individual tasks, need mature features like Plan Mode and Rewind, or work on Windows/Linux.
Choose Cursor if: you want an integrated IDE experience with built-in AI assistance, without switching between tools.
The reality is that many developers are using two or more of these tools simultaneously. They're not mutually exclusive.
Pricing and availability
OpenAI made an aggressive pricing move:
| Plan | Access | Price |
|---|---|---|
| ChatGPT Free | Yes (temporary) | Free |
| ChatGPT Go | Yes (temporary) | Free |
| ChatGPT Plus | Yes + double limits | $20/month |
| ChatGPT Pro | Yes + double limits | $200/month |
| Business/Enterprise | Yes + double limits | Custom |
The key point: right now, Codex App is free for all ChatGPT users, including free tier. OpenAI hasn't confirmed when this promotion ends, but the strategy is clear: capture developers before they consolidate on Claude Code (which already generates $1 billion annualized).
Key limitation: only available on macOS (Apple Silicon, macOS 14+). The Windows version is in development.
The good, the bad, and what needs work
Pros
- Real parallel agents with Git isolation: the most differentiating feature on the market
- Scheduled automations that eliminate repetitive tasks
- Temporarily free for all ChatGPT users
- 90% first-attempt resolution with GPT-5.2 according to OpenAI
- Secure sandbox with granular permissions and internet disabled by default
- Extensible Skills that go beyond code generation
Cons
- macOS only: Windows and Linux users will have to wait
- Heavy Electron app: consumes around 8 GB of RAM for managing chats and diffs
- No internet access by default: can't install packages or resolve dependencies automatically
- Issues with complex refactoring: tends to want to open a new PR for each iteration
- Frontend framework struggles: React and complex components remain a weak spot
- Fewer mature features than Claude Code: missing Hooks, Rewind, and Plan Mode
- Code in the cloud: your code runs on OpenAI's servers, raising privacy concerns
The developer verdict
The community is split. Those working on large projects with multiple independent modules love the parallel agents. Those who need surgical precision on complex individual tasks prefer Claude Code. And those who want an integrated experience without leaving their editor stick with Cursor.
Sam Altman admitted feeling "a little useless" after watching Codex improve on his own ideas while building an app. It's a powerful statement, but take it in context: it's the CEO selling his product.
Frequently asked questions
Does Codex App replace my IDE?
No. Codex App is complementary to your IDE, not a replacement. You still need VS Code, Cursor, or another editor for direct file editing. Codex App manages agents that work on your repository.
Is my code safe in Codex App?
OpenAI runs each task in an isolated sandbox with internet disabled by default. Secrets are encrypted and removed before execution. However, your code does travel to OpenAI's servers, which may be a concern for companies with strict security policies.
How much does it really cost?
Right now it's free for all ChatGPT users (temporarily). When the promotion ends, it's expected to require at least the Plus plan ($20/month). Pro users ($200/month) will get higher limits.
Does it work on Windows or Linux?
No. At launch, Codex App is only available for macOS (Apple Silicon, macOS 14+). OpenAI has confirmed the Windows version is in development, but there's no date.
Is it better than Claude Code?
It depends on the use case. For parallel tasks and automations, Codex App wins. For precision on complex individual tasks and advanced features, Claude Code has the edge. Both use models with virtually identical SWE-Bench performance.
Conclusion: is Codex App worth trying?
The short answer: yes, especially now that it's free.
Codex App isn't perfect. It lacks features Claude Code already has, consumes too much RAM, and only works on Mac. But the parallel agent management is a genuine innovation that changes how you work with AI-assisted code.
If you're a developer on macOS, there's no reason not to try it while it's free. The worst that can happen is you discover you prefer your current tool. The best case is you find a multi-agent workflow that multiplies your productivity.
The AI coding tools war is just getting started. With Claude Code generating $1 billion annualized and Codex App capturing one million users, 2026 will be the year that defines who dominates the developer's desktop.