OpenAI has released a standalone Codex app for macOS that integrates coding agents based on GPT-5.2 deeply into the operating system. The tool relies on isolated Git worktrees to solve complex tasks in parallel in the background without blocking the developer's active workflow in the main editor. We analyze how this asynchronous "manager" approach compares directly with Anthropic's CLI competition.
- GPT-5.2 Codex Engine: The new model holds up to 400,000 tokens in context memory and generates 128,000 output tokens at a time, enabling the refactoring of entire modules without interruption.
- Git Worktree Isolation: Instead of changing code live in the editor („ghost typing“), the app invisibly creates an isolated repository copy for each task, allowing developers to continue working asynchronously and without disruption in the main branch.
- Platform exclusivity: Despite being based on Electron, the release is limited to macOS 14 (Apple Silicon); Windows and Linux are not initially supported.
- Included pricing: Use is included in existing plans without a separate app subscription, starting at $20/month (ChatGPT Plus) with a rate limit of approximately 160 messages every 3 hours.
- Automation via AGENTS.md: A configuration file can be used to define fixed rules („skills“) and schedules that autonomously wake agents, e.g., @Daily at 3:00 a.m. for security audits.
The evolution to „Command Center“: Specs and architecture
The launch of the OpenAI Codex app on February 2, 2026, marks the end of the era in which developers had to manually copy code snippets back and forth between IDE and browser. The application no longer sees itself as a passive chatbot, but as a native command center for agentic workflows on the desktop.
The engine: GPT-5.2 Codex
At the heart of the architecture is the new model derivative GPT-5.2 Codex. Unlike the generic GPT-4o, this model has been specifically trained to understand complex software architectures and file dependencies. The key technical specifications define a new standard for local development agents:
- 400k context window: The model holds up to 400,000 tokens in memory simultaneously. This enables the app to take in not just individual files but the structure of entire repositories (a rough size estimate follows below).
- 128k output tokens: This allows the model to generate not only small functions but entire modules or extensive refactorings in a single pass, without interrupting the process.
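To put the 400k figure into perspective, a rough back-of-the-envelope check can estimate whether a repository fits into a single context window. The ~4 characters per token ratio used below is a common rule of thumb, not the tokenizer's exact behavior (code usually tokenizes denser), and the file suffixes are assumptions:

```python
from pathlib import Path

CONTEXT_TOKENS = 400_000
CHARS_PER_TOKEN = 4  # rough rule of thumb; real tokenizers vary, and code is often denser

def estimated_repo_tokens(repo: Path, suffixes: tuple[str, ...] = (".py", ".ts", ".md")) -> int:
    """Very rough token estimate for all source files with the given suffixes."""
    total_chars = sum(
        len(path.read_text(errors="ignore"))
        for path in repo.rglob("*")
        if path.is_file() and path.suffix in suffixes
    )
    return total_chars // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimated_repo_tokens(Path("."))
    print(f"~{tokens:,} tokens; fits in one 400k context: {tokens <= CONTEXT_TOKENS}")
```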
Architecture shift: From chat to „manager & worker“
The most fundamental change in UX design is the shift from synchronous conversation to asynchronous delegation. While classic AI coding tools often act as synchronous „pair programmers“ (user waits for AI), Codex positions itself as an autonomous employee.
This manager & worker model enables true multitasking:
- Delegation: The developer (manager) defines tasks in the command center.
- Parallel agent threads: The app starts multiple threads simultaneously. One agent fixes bugs in the background while a second writes unit tests in parallel.
- Non-blocking: Since the agents work independently, the IDE remains freely available for the developer to use.
System requirements & pricing
Although the architecture is partly based on web technologies (Electron), hardware support at launch is very limited. OpenAI is initially targeting the high-end segment of web and app development.
Hardware requirements:
- Operating system: macOS 14 (Sonoma or newer).
- Processor: Exclusively for Apple Silicon (M1/M2/M3/M4 chips).
- Incompatibility: Windows and Linux are not supported at the time of release.
The pricing structure does not include separate app subscriptions and is integrated into the existing OpenAI tiers:
| Tier | Cost | Rate Limits (Codex App) |
|---|---|---|
| ChatGPT Plus | $20/month | ~160 messages every 3 hours (or 30-150 complex agent tasks) |
| ChatGPT Pro | $200/month | ~300–1,500 messages every 5 hours |
Note: To accelerate adoption, the limits have been temporarily doubled for the launch and free users have been granted limited access.
The biggest problem with previous AI coding assistants in IDEs was „ghost typing“: while the developer types, the AI inserts code fragments asynchronously, moves the cursor, or causes syntax errors in the live build. The Codex app for macOS radically solves this architectural problem by using Git worktrees.
Architecture of isolation
Instead of operating directly in the user’s open editor window, Codex uses a headless instance of the repository. When a task is handed over to an agent, the app performs the following steps invisibly in the background:
- Worktree creation: A temporary Git worktree (a copy of the repository in a separate folder) is created.
- Branching: The agent checks out a new feature branch (e.g., `fix-auth-bug`) within this worktree.
- Execution: All file operations, test runs, and commits take place in this isolated environment (a minimal Git sketch of these steps follows below).
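Under the hood, this corresponds to standard Git worktree mechanics. The following minimal sketch reproduces the three steps manually from Python; it is an illustration of the concept, not OpenAI's actual implementation, and the repository path, branch name, and test command are assumptions:

```python
import subprocess
from pathlib import Path

def create_isolated_worktree(repo: Path, branch: str) -> Path:
    """Create a separate checkout in which an agent can work without touching the main one."""
    worktree_dir = repo.parent / f"{repo.name}-agent-{branch}"

    # Steps 1 + 2: one command creates the folder copy and checks out a fresh branch in it.
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree_dir)],
        check=True,
    )

    # Step 3: anything executed with cwd=worktree_dir (tests, edits, commits)
    # stays isolated from the developer's active editor and build.
    subprocess.run(["pytest", "-q"], cwd=worktree_dir, check=False)
    return worktree_dir

if __name__ == "__main__":
    create_isolated_worktree(Path.home() / "projects" / "demo-repo", "fix-auth-bug")
```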
The „Async Feature Branching“ Workflow
This separation is what makes the parallel agent threads approach practical. A developer can continue to work on the main feature without files changing „as if by magic.“
A concrete scenario from practice illustrates the stability of this workflow:
- Foreground (user): You are actively working on the `main.py` file. The build is running stably.
- Background (agent): You start a task: "Investigate why the login token expires after 5 minutes and fix it."
- Process: The agent isolates itself. It can cause tests to fail, delete files, or refactor without disrupting your local development environment.
- Merge: Only after completion does the agent report back with a finished diff. You actively decide on the merge instead of watching code being written live.
Comparison: Isolation vs. Direct Edit
This approach fundamentally distinguishes the Codex app from CLI tools such as Anthropic Claude Code, which act more like „pair programmers.“
| Feature | OpenAI Codex App | Claude Code (CLI) / IDE Plugins |
|---|---|---|
| Editing mode | Asynchronous & isolated: Agent works in a separate worktree. | Synchronous & direct: Agent edits the "real" files live. |
| User experience | Manager perspective: You delegate tasks and continue working in parallel. | Pair programming: You watch or wait until the agent is finished. |
| Risk | No ghost typing: Your editor state remains unaffected until the merge. | Potential for conflict: Simultaneous typing can lead to invalid code. |
| Build stability | Background tests may fail without blocking the user. | An AI error can crash the local dev server. |
The use of Git Worktrees makes the Codex app less of a chatbot and more of an autonomous employee who works in their own „office“ (branch) until the result is ready to be presented.
Practical guide: Asynchronous bug fixing with AGENTS.md
The biggest productivity killer in development is context switching. The Codex app addresses this with async feature branching. Instead of interrupting work on the current feature to fix a bug, you delegate the repair to a background agent. The core of this architecture is strict Git Worktree isolation.
The workflow: fixing without interruption
The Codex app does not act as a pair programmer (like GitHub Copilot), but as an autonomous employee. A typical scenario for parallel agent threads:
- Discovery: While working on `main.py`, you notice a bug in the auth module but don't want to stop your current flow.
- Delegation (prompt): You give the command to Codex: "Create a new branch `fix-auth-bug`. Investigate why the login token expires after 5 minutes and fix it. Write a test for this." (A hypothetical example of such a test is sketched after this list.)
- Isolation: Codex invisibly creates a Git worktree (a copy of the repo in a separate folder) in the background. Your active editor remains untouched; there is no "ghost typing" that suddenly changes your code.
- Execution & merge: The agent checks out the branch, runs tests, edits the code, and commits the fix. Once finished, Codex reports: "Fix is ready and tested. Here is the diff." One click on "Merge" integrates the solution.
Configuration via AGENTS.md
To ensure that the agent does not fix „blindly“ but adheres to project standards, configuration via an AGENTS.md file in the root directory is essential. Here you define „skills“ and hard rules.
For the bug fixing scenario mentioned above, AGENTS.md prevents untested code from being merged or security vulnerabilities from arising during the fix.
Example configuration for quality assurance:
# AGENTS.md (Quality Control Skill)
## Rules
- NEVER commit a fix without running the relevant unit test suite first.
- If tests fail, summarize the error log in the chat and abort the commit.
- Use explicit variable names (no single letters like `x` or `i`).
## Background Task Schedule
- @Daily 03:00 AM: Run full regression test suite and create bug reports for failures.
These rules force GPT-5.2 Codex to maintain discipline. The result is a workflow in which the developer acts as the "manager" while the agent works through the time-consuming debugging tasks in isolation.
Setup: Configuring AGENTS.md
The heart of the customization in the Codex app is AGENTS.md. This file acts as a persistent rulebook and task planner for the underlying GPT-5.2 model. Instead of having to explain to the agent how to test or deploy with each prompt, developers store project-specific skills and security guidelines here.
Defining rules as „hard constraints“
Within AGENTS.md, explicit rules can be defined that the agent prioritizes as guardrails. This is particularly critical for preventing hallucinated or unsafe code operations. The system scans this file before executing each task.
A typical setup includes deployment locks for failed tests or specific formatting requirements:
# AGENTS.md (Deployment Skill)
## Rules
- NEVER deploy to production without running `npm run test:e2e` first.
- If tests fail, summarize the error log and abort. Do NOT attempt to brute-force fix without approval.
- Always use TypeScript rigid typing, avoid `any`.
## Background Task Schedule
- @Daily 08:00 AM: Run dependency audit and create PR for security updates.
- @Every 3h: Check for new bug reports in issue tracker and draft fix proposals.
Automation via schedules
A unique feature of the Codex app compared to pure chat interfaces (such as Claude Code CLI) is the native integration of cron-like background tasks. As can be seen in the code snippet under Background Task Schedule, repetitive maintenance tasks can be delegated.
- Syntax: Commands such as `@Daily` or `@Every [time]` trigger the agent (see the parsing sketch below).
- Execution: These tasks run in parallel agent threads. Thanks to Git Worktree Isolation, the agent checks out a separate branch for this scheduled task, performs the dependency audit, and creates a pull request.
- User impact: The developer is not disturbed ("zero friction") because the main branch remains untouched in the editor until the PR is ready for review.
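OpenAI has not published a formal grammar for these schedule entries; the examples in this article are all we have to go on. Purely as an illustration of how such `@Daily` / `@Every` lines could be read programmatically, here is a minimal parsing sketch:

```python
import re
from pathlib import Path

# Hypothetical parser for the "## Background Task Schedule" section of AGENTS.md.
# The @Daily/@Every syntax follows the examples in this article; the real app's
# grammar may differ.
SCHEDULE_LINE = re.compile(r"^- @(?P<kind>Daily|Every)\s+(?P<when>.+?):\s+(?P<task>.+)$")

def parse_schedule(agents_md: Path) -> list[dict]:
    entries = []
    for line in agents_md.read_text().splitlines():
        match = SCHEDULE_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries

if __name__ == "__main__":
    # Example output: [{'kind': 'Daily', 'when': '08:00 AM', 'task': 'Run dependency audit ...'}, ...]
    print(parse_schedule(Path("AGENTS.md")))
```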
This configuration transforms the app from a mere „assistant“ to a proactive team member that independently monitors code hygiene.
„Manager mode“: Delegation via asynchronous threads
In the Codex app for macOS, the prompting paradigm changes fundamentally compared to classic chat interfaces. The user no longer acts as a pair programmer who watches line by line, but as a manager who delegates tasks to a worker. The prompt aims to complete a task entirely in the background.
A typical command for bug fixing would look like this:
Prompt: "Create a new branch `fix-auth-bug`. Investigate why the login token expires after 5 minutes and fix it. Write a test for this."
Execution via „Git Worktree Isolation“
As soon as this command is issued (sent with the Enter key), the Codex app starts a parallel agent thread. Unlike synchronous CLIs (such as Claude Code), which could block the current editor, Codex uses Git Worktree Isolation for this.
The process in detail:
- Isolation: The agent invisibly creates a Git Worktree—a copy of the repository in a separate folder.
- Analysis & fix: The model (GPT-5.2 Codex) checks out the `fix-auth-bug` branch, reads the code, and makes changes.
- No ghost typing: While the agent writes unit tests and fixes the auth bug in the background, the developer can continue working on another feature in the main window without interruption. There are no "magically" changing lines in the active editor.
- Completion: The agent only reports back when the task is complete: "Fix is done and tested. Here is the diff." The user only has to confirm the merge (the sketch below shows how to inspect such hidden checkouts with plain Git).
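How the app names and cleans up these hidden checkouts is not documented. Standard Git commands can at least make them visible; here is a small sketch, assuming the agent's worktrees are registered with the repository you point it at:

```python
import subprocess

def list_worktrees(repo: str) -> str:
    """Show every worktree attached to the repository, including agent checkouts."""
    return subprocess.run(
        ["git", "-C", repo, "worktree", "list", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout

def remove_worktree(repo: str, worktree_path: str) -> None:
    """Delete a finished worktree; Git refuses if it still has uncommitted changes."""
    subprocess.run(["git", "-C", repo, "worktree", "remove", worktree_path], check=True)

if __name__ == "__main__":
    print(list_worktrees("."))
```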
Controlling prompt quality via AGENTS.md
To ensure that the agent adheres to the correct standards when „fixing and testing“ (as required in the prompt), the Codex app uses configuration files in the repository. Prompts are implicitly enriched by an AGENTS.md file that specifies „skills“ and rules for the agent.
Here’s an example of such a rule definition that ensures that the auth fix doesn’t break production:
# AGENTS.md (Deployment Skill)
## Rules
- NEVER deploy to production without running `npm run test:e2e` first.
- If tests fail, summarize the error log and abort.
## Background Task Schedule
- @Daily 08:00 AM: Run dependency audit and create PR for security updates.
This combination of natural language commands and a fixed policy in the repo minimizes the risk of hallucinations or faulty fixes in asynchronous processing.
The core promise of the new Codex app on macOS is the shift away from synchronous "pair programming" to an asynchronous manager-worker model. This is based on the GPT-5.2 Codex model in combination with an architectural decision that fundamentally respects the developer's workflow: Git Worktree Isolation.
Technical basis: Git Worktree Isolation
Previous AI coding tools often write directly to the active editor („ghost typing“), forcing the developer to watch. Codex, on the other hand, invisibly creates an isolated Git Worktree for each task—a copy of the repository in a separate folder that operates on its own branch.
This enables true multitasking:
- The user works on feature A (`main.py`) in the main window.
- The agent works in the background on bug fix B (in an isolated instance).
- There are no file locks or merge conflicts during the writing phase.
Practical example: „Async feature branching“
A specific scenario from the beta phase illustrates the workflow: While working on a new feature, a developer discovers an authentication error. Instead of changing context, they delegate the fix.
- Prompt: "Create a new branch `fix-auth-bug`. Investigate why the login token expires after 5 minutes, fix it, and write a test."
- Background process: The agent checks out the branch in the isolated worktree, runs tests, and commits the fix.
- User status: The developer continues typing undisturbed. No lines of code pop up in the field of vision.
- Merge: Codex reports: „Fix is ready and tested.“ The user reviews the diff and clicks „Merge.“
Architecture comparison: synchronous vs. asynchronous
The difference to market competitors such as Anthropic Claude Code lies in the philosophy of collaboration:
| Feature | OpenAI Codex App (asynchronous) | Classic AI tools / Claude Code (synchronous) |
|---|---|---|
| Role of AI | Worker: Processes tasks in the background. | Pair programmer: Works with you in the terminal/editor. |
| Editor status | Static: Your code only changes when merged. | Dynamic: You can see how code is being edited in real time. |
| Context | Parallel threads: Multiple agents can solve different issues simultaneously. | Single thread: Focus on one problem at a time. |
| Ideal for | Refactoring, test writing, bug fixing in the background. | Complex debugging that requires human intuition in real time. |
Automation via AGENTS.md
In addition to direct interaction, scheduled background tasks can be defined via a configuration file (AGENTS.md). These run completely autonomously, for example at night when the computing load is lower.
# AGENTS.md (example: Nightly Security Audit)
## Background Task Schedule
- @Daily 03:00 AM: Run `npm audit`.
- If critical vulnerabilities are found:
1. Create a new branch `security-fix-[date]`.
2. Attempt to update packages.
3. Run `npm test`.
4. Only create PR if tests pass.
These automations transform the Codex app from a pure chatbot into a CI/CD-like agent that actively maintains the code base, even when the developer is offline.
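For readers who want to understand what honoring such a schedule actually involves, the nightly audit above maps onto ordinary tooling. The following sketch is a manual equivalent, not OpenAI's implementation; it assumes npm and the GitHub CLI (`gh`) are installed, a remote named `origin` exists, and the project defines an `npm test` script:

```python
import json
import subprocess
from datetime import date

def run(cmd: list[str], **kwargs) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, check=False, **kwargs)

# 1. Run `npm audit` and count critical vulnerabilities.
audit = run(["npm", "audit", "--json"], capture_output=True, text=True)
report = json.loads(audit.stdout or "{}")
critical = report.get("metadata", {}).get("vulnerabilities", {}).get("critical", 0)

if critical:
    branch = f"security-fix-{date.today().isoformat()}"
    # 2. Create a branch and attempt to update packages.
    run(["git", "checkout", "-b", branch])
    run(["npm", "update"])
    # 3. Only open a PR if the test suite still passes.
    tests = run(["npm", "test"])
    if tests.returncode == 0:
        run(["git", "commit", "-am", f"chore: security updates ({critical} critical findings)"])
        run(["git", "push", "-u", "origin", branch])
        run(["gh", "pr", "create", "--fill", "--head", branch])
```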
The merging process in the Codex app is fundamentally different from previous AI coding tools. Since the app acts as an asynchronous agent rather than a synchronous „pair programmer,“ the developer’s role shifts from co-author to code reviewer.
Worktree isolation as a safety net
The technical core of this workflow is the use of Git Worktrees. While conventional copilots often write directly to the open file buffer („ghost typing“), the Codex agent operates in an isolated environment.
- No conflicts: The agent invisibly creates a copy of the repository in a separate folder and checks out a temporary branch (e.g., for a bug fix).
- Uninterrupted flow: The developer can continue working on `main.py` in the main window without files suddenly changing or the cursor jumping.
- Safety: Faulty loops, as occasionally reported with GPT-5.2 Codex (endless loops without output), do not destroy the current work state in the editor, as they remain confined to the worktree in a sandbox-like manner.
The review workflow
Once the agent has completed a task (e.g., „Fix Login Token Timeout“) and run tests in the background, the status in the app changes.
- Notification: The app reports „Fix is complete and tested.“
- Diff inspection: One click opens a dedicated diff view. Here, not only the code change is displayed, but often also the context of the passed tests.
- Decision: The user has two options:
  - Refinement: Return to the agent with feedback ("Add another error log").
  - Merge: Transfer the changes to the main branch with a click of the mouse (a manual Git equivalent is sketched below).
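The review step itself maps onto ordinary Git operations. For anyone who prefers to inspect the agent's branch outside the app, a manual equivalent might look like the sketch below; the branch name comes from this article's running example, and whether the app's merge button performs exactly these commands is an assumption:

```python
import subprocess

REPO = "."               # path to your main checkout
BRANCH = "fix-auth-bug"  # branch the agent worked on (example from this article)

def git(*args: str) -> None:
    subprocess.run(["git", "-C", REPO, *args], check=True)

# 1. Diff inspection: what did the agent change relative to main?
git("diff", f"main...{BRANCH}")

# 2. Decision: either go back to the agent with feedback, or merge.
#    --no-ff keeps the agent's work visible as its own branch in the history.
git("merge", "--no-ff", BRANCH)
```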
Comparison: Isolation vs. Live Edit
The Codex app’s approach differs drastically from command line tools such as Claude Code, which has a direct impact on the user experience (UX) during review:
| Feature | OpenAI Codex App (macOS) | Anthropic Claude Code (CLI) |
|---|---|---|
| Edit mode | Asynchronous / isolated: Agent codes in the background (worktree). No visual interference with the current editor. | Synchronous / direct: Agent edits files live. Changes are immediately visible in the editor. |
| Control | Review-first: Code only enters the workspace after an explicit "merge." | Monitor-first: You watch the agent write and must intervene if there are errors. |
| Commit strategy | Agent often commits granularly in the feature branch; user merges the finished branch/PR. | Agent modifies files directly; user must commit the changes afterwards. |
| Risk | Low risk for the current context (context switching). | High risk of distraction if files are changed while being read. |
This „manager & worker“ approach clearly positions the Codex app for tasks that should run in parallel – such as fixing bugs overnight or refactoring legacy code while the developer builds new features during the day.
Two completely different developer philosophies collide here. While Anthropic relies on direct interaction in the terminal with Claude Code (Sonnet 3.7/Opus), OpenAI attempts to elevate the developer from a pure coder to a manager with the Codex app.
A direct comparison of the architecture
The main difference lies in how—and where—the AI agent intervenes in the code.
| Feature | OpenAI Codex app | Anthropic Claude Code (CLI) |
|---|---|---|
| Philosophy | Manager & worker: The agent works autonomously in the background (asynchronously). You delegate, it executes. | Pair programmer: You debug code together in the terminal (synchronously). Direct dialogue "line by line." |
| Code isolation | Git worktrees: Codex uses invisible, isolated copies of the repository. Your editor stays clean until you merge the fix. | Direct edit: Claude edits your files live. Changes happen right before your eyes. |
| Platform | macOS native (Electron core). Windows and Linux are not supported at this time. | Terminal CLI (platform-agnostic). Runs natively on macOS, Windows, and Linux. |
| Pricing | Flat rate: Included in ChatGPT Plus ($20/month). Predictable costs for power users. | Pay per token (API) or Pro subscription. Power users often report significantly higher costs. |
| Best case | „Build this feature overnight and create a PR.“ | „Help me understand this complex race condition bug _now_.“ |
Asynchrony instead of „ghost typing“
The Codex app solves a UX problem that many AI coding tools fail at: distraction.
By using Git Worktrees, the app creates an isolated environment for each task (e.g., „Refactor Auth Modules“). While the agent changes files, writes tests, and pushes commits, your main window in VS Code or Xcode remains untouched. You work on branch A, the agent works on branch B. There is no „ghost typing“ where lines of code suddenly appear in the editor and interrupt your flow.
In contrast, Claude Code is designed for maximum transparency. You can see exactly which file is open and being edited in the CLI. This is ideal for deep debugging, where you need to validate the AI’s thought process step by step, but it often blocks the workflow for parallel tasks.
The platform war: Electron vs. CLI
A major criticism of OpenAI from the community is its exclusivity. Although the Codex app is technically based on web technologies (Electron), it is artificially restricted to macOS (Apple Silicon).
Claude Code, on the other hand, wins here thanks to its flexibility: as a pure CLI tool, it integrates seamlessly into any existing Linux or Windows pipeline. Those who develop in WSL2 or on a remote server are currently at a disadvantage with the Codex app.
Decision aid:
- Choose the Codex app if you want to automate standard tickets („update dependencies,“ „write tests“) and focus on architecture decisions.
- Go with Claude Code if you work in a cross-platform environment or need an intelligent partner for complex live debugging.
Community verdict: „Fake exclusivity“ and teething problems
Since its launch on February 2, platforms such as Reddit (r/codex, r/LocalLLaMA) and HackerNews have been dominated less by admiration for the new agent features and more by frustration over technical decisions and usability hurdles. Criticism focuses on three core areas: platform policy, UI design, and model stability.
The „Electron lie“
Probably the loudest criticism concerns the lack of support for Windows and Linux. Although the Codex app is technically based on the Electron framework (i.e., it uses platform-independent web technologies), OpenAI artificially restricts access to macOS (Apple Silicon).
The developer community sees this as arrogance. A much-quoted comment on Reddit sums up the mood: „The real work is done in Linux. They know this… [but] it’s prioritized for macOS.“ The fact that an app that is not natively written in Swift/Objective-C is nevertheless released exclusively for Mac fuels the suspicion that this is purely a marketing decision rather than a technical necessity. In a direct comparison, Claude Code (CLI), which runs independently of the system as a terminal tool, scores highly here.
UI/UX: Chat client vs. IDE
Opinions are also divided when it comes to interface design. While the minimalist design is intended to be visually appealing, power users criticize massive breaks in the workflow:
- Muscle memory conflict: In every IDE, the `Enter` key creates a new line. In the Codex app, `Enter` sends the command immediately. The lack of standard IDE shortcuts (e.g., `Shift+Enter` for line breaks) leads to unwanted "misfires" of prompts.
- Readability: The chosen font is often criticized as being too thin ("too thin/light font weight"), which tires the eyes, especially during longer code reviews.
Quality regression: The „infinite loops“
However, the most serious technical problem seems to affect the underlying GPT-5.2 Codex model. Users report so-called "inference loops," in which the agent gets stuck repeatedly reading files ("Reading main.py…") without ever writing productive code or committing changes.
In direct comparison with the competition, there is currently a stability gap here:
| Problem area | OpenAI Codex (GPT-5.2) | Anthropic Claude Code (Sonnet 3.7) |
|---|---|---|
| Looping | Frequent: Agent gets stuck in "read mode" without output. | Rare: Acts more directly and aborts more quickly when errors occur. |
| Context handling | Occasionally loses track during long sessions (despite 400k window). | Currently considered more stable for complex refactorings („needle in a haystack“). |
| Output | Tends to rewrite files completely (higher token consumption). | More precise in surgical interventions (diff-based edits). |
These teething problems suggest that although the Codex app is designed to be a powerful „command center,“ in daily practice it still lags behind the stability of pure CLI solutions such as Claude Code.
Conclusion
The OpenAI Codex app is a fascinating promise that stumbles on its own execution. The architectural shift from synchronous „pair programming“ to an asynchronous „manager & worker“ model using Git Worktrees is revolutionary. Finally, the AI no longer fiddles around live in the editor („ghost typing“), but delivers finished results via pull requests. This is the workflow we always wanted.
However, the product itself seems arrogant and unfinished. The fact that an app based on web technologies (Electron) is artificially restricted to macOS is a slap in the face for the developer community. Added to this are massive teething problems such as „inference loops“ and questionable UI decisions that betray its beta status. In a direct comparison, the Claude Code CLI is currently the more robust, honest tool – less „magic in the background,“ but reliable and platform-independent.
Our recommendation:
- Install it if: You develop on an Apple Silicon Mac, already have ChatGPT Plus, and want to completely delegate repetitive tasks (writing tests, refactoring, dependency updates). The asynchronous workflow is unbeatable for staying in the „flow.“
- Stick with Claude Code (or Copilot) if: You use Windows/Linux, need maximum stability, or do deep debugging where you need to control the AI’s thought process live. If you have deadlines, you can’t afford „endless loops.“
Next step:
Consider the Codex app as a testing ground for the future of work, but don’t rely on it for critical projects just yet. The architecture (worktrees) will become the industry standard—but it remains to be seen who will implement it best. OpenAI has taken the lead, but now urgently needs to make technical (stability) and political (platform openness) improvements.

Florian Schröder is an online marketing expert with a focus on PPC (pay-per-click) campaigns. He not only recognizes the revolutionary possibilities of AI but has already firmly integrated them into his daily work to develop innovative and effective marketing strategies.
He is convinced that the future of marketing is inseparably linked to the further development and use of artificial intelligence, and is committed to staying on the pulse of these technological developments.