OpenAI has introduced a new developer-focused model, GPT-5.3-Codex-Spark, designed to deliver near-instant code generation and edits within its Codex environment. The model, released on February 12, 2026, marks the company’s first system optimized specifically for real-time programming workflows.
Positioned as a lightweight counterpart to the more powerful GPT-5.3-Codex, the new model focuses on speed and rapid iteration rather than long-duration tasks.
Built for instant feedback
Codex-Spark is engineered for developers who need tight feedback loops: making small logic changes, adjusting interfaces, or refining functionality and seeing the results immediately. The model operates with a streamlined working style, prioritizing minimal, targeted edits and avoiding automated test execution unless users explicitly request it.
The system currently supports text-only interactions and offers a 128,000-token context window. It runs on ultra-low-latency infrastructure capable of generating more than 1,000 tokens per second.
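To put that throughput figure in perspective, a back-of-envelope estimate shows why 1,000-plus tokens per second feels effectively instant for small edits. The token count and time-to-first-token below are illustrative assumptions, not published benchmarks; only the throughput lower bound comes from the announcement.

```python
# Back-of-envelope latency estimate for a small code edit.
# TOKENS_PER_SECOND is the stated lower bound; the other figures
# are illustrative assumptions, not published numbers.
TOKENS_PER_SECOND = 1_000
EDIT_TOKENS = 300        # assumed size of a small, targeted diff
TTFT_SECONDS = 0.15      # assumed time-to-first-token

total_seconds = TTFT_SECONDS + EDIT_TOKENS / TOKENS_PER_SECOND
print(f"Estimated end-to-end time: {total_seconds:.2f}s")  # ~0.45s
```

At well under half a second per edit, the round trip stays inside the threshold most developers perceive as instantaneous, which is the point of the model.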
Access is initially limited to ChatGPT Pro subscribers through the latest Codex app, command-line interface, and Visual Studio Code extension. API availability has been restricted to a small group of design partners as OpenAI evaluates real-world integration performance.
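For the design partners who do have API access, a request could plausibly look like the sketch below. The request shape follows OpenAI's published Python SDK for the Responses API; the model identifier is an assumption, since OpenAI has not documented one for Codex-Spark.

```python
# Hypothetical sketch: streaming a quick edit over the Responses API.
# The model id "gpt-5.3-codex-spark" is assumed, not a documented
# identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.responses.create(
    model="gpt-5.3-codex-spark",  # assumed identifier
    input="Rename the variable `cnt` to `count` in this function: ...",
    stream=True,
)

for event in stream:
    # Print text deltas as they arrive to show the low-latency stream.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```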
Infrastructure redesigned for speed
OpenAI said the performance gains come not only from the model itself but also from a reworked delivery pipeline (a sketch of the persistent-connection idea follows the list below). Enhancements include:
- Persistent WebSocket connections enabled by default
- 80% reduction in roundtrip overhead
- 30% lower per-token processing overhead
- 50% faster time-to-first-token
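OpenAI has not published the wire protocol, but the sketch below illustrates the idea behind the first item: with a persistent WebSocket, the TCP/TLS handshake is paid once per session rather than once per request, so each follow-up prompt starts streaming sooner. The endpoint URL and message schema here are invented for illustration.

```python
# Hypothetical sketch of a persistent-connection streaming client; the
# endpoint and message format are assumptions, not OpenAI's protocol.
import asyncio
import json
import time

import websockets  # pip install websockets

WS_URL = "wss://example.invalid/codex-spark"  # hypothetical endpoint

async def stream_session(prompts: list[str]) -> None:
    # One handshake for the whole session instead of one per request.
    async with websockets.connect(WS_URL) as ws:
        for prompt in prompts:
            start = time.perf_counter()
            await ws.send(json.dumps({"prompt": prompt}))
            first_token = True
            while True:
                event = json.loads(await ws.recv())
                if event["type"] == "token" and first_token:
                    ttft_ms = (time.perf_counter() - start) * 1000
                    print(f"time-to-first-token: {ttft_ms:.0f} ms")
                    first_token = False
                elif event["type"] == "done":
                    break

asyncio.run(stream_session(["fix the off-by-one in loop.py", "add a docstring"]))
```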
The company indicated that this low-latency delivery architecture will eventually be extended to other models.
New hardware partnership
A key component of the rollout is the use of the Cerebras Wafer-Scale Engine 3, marking the first operational milestone in OpenAI’s partnership with Cerebras Systems, announced earlier this year. The wafer-scale processors are being deployed as a dedicated serving tier for latency-sensitive tasks, complementing OpenAI’s existing GPU infrastructure.
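OpenAI has not described its routing logic, but a dedicated serving tier generally implies something like the sketch below: interactive, latency-sensitive requests go to the low-latency hardware while long-running work stays on the existing GPU fleet. The tier names and routing criterion are assumptions.

```python
# Illustrative two-tier routing sketch; tier labels and the routing
# criterion are assumptions, not OpenAI's actual implementation.
from dataclasses import dataclass

LOW_LATENCY_TIER = "cerebras-wse3"  # hypothetical tier label
GPU_TIER = "gpu-cluster"            # hypothetical tier label

@dataclass
class InferenceRequest:
    prompt: str
    interactive: bool  # e.g. an editor feedback loop vs a long agent run

def route(request: InferenceRequest) -> str:
    """Send latency-sensitive traffic to the dedicated low-latency tier."""
    return LOW_LATENCY_TIER if request.interactive else GPU_TIER

print(route(InferenceRequest("rename this variable", interactive=True)))
# -> cerebras-wse3
```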
Executives from both companies said the preview will help identify new developer behaviors and use cases enabled by high-speed inference.
Safety and deployment approach
OpenAI stated that Codex-Spark inherits the same safety training and evaluation standards as its primary models, including safeguards related to cybersecurity risks. The company said the system is not expected to reach high-risk capability thresholds under its internal preparedness framework.
Users may encounter temporary queuing during peak demand as data center capacity is expanded.
Two-mode development strategy
The launch reflects OpenAI’s broader plan to offer Codex in two distinct modes: a fast, collaborative environment for real-time editing and a separate long-horizon mode designed for extended reasoning and multi-step execution.
Over time, the company aims to merge both capabilities into a unified workflow that allows developers to shift seamlessly between rapid iteration and deeper autonomous tasks.
Broader access to Codex-Spark is expected in the coming weeks as infrastructure scales and performance is refined.
