OpenAI has introduced a new developer-focused model, GPT-5.3-Codex-Spark, designed to deliver near-instant code generation and edits within its Codex environment. The model, released on February 12, 2026, marks the company’s first system optimized specifically for real-time programming workflows.
Positioned as a lightweight counterpart to the more powerful GPT-5.3-Codex, the new model focuses on speed and rapid iteration rather than long-duration tasks.
Built for instant feedback
Codex-Spark is engineered for developers who need tight feedback loops: making small logic changes, adjusting interfaces, or refining functionality and seeing the results immediately. The model operates with a streamlined working style, prioritizing minimal, targeted edits and avoiding automated test execution unless users explicitly request it.
The system currently supports text-only interactions and offers a 128,000-token context window. It runs on ultra-low-latency infrastructure capable of generating more than 1,000 tokens per second.
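To put that throughput figure in perspective, a back-of-envelope estimate shows why 1,000-plus tokens per second feels effectively instant for small edits. The token count and time-to-first-token below are illustrative assumptions, not published benchmarks; only the throughput lower bound comes from the announcement.

```python
# Back-of-envelope latency estimate for a small code edit.
# TOKENS_PER_SECOND is the stated lower bound; the other figures
# are illustrative assumptions, not published numbers.
TOKENS_PER_SECOND = 1_000
EDIT_TOKENS = 300        # assumed size of a small, targeted diff
TTFT_SECONDS = 0.15      # assumed time-to-first-token

total_seconds = TTFT_SECONDS + EDIT_TOKENS / TOKENS_PER_SECOND
print(f"Estimated end-to-end time: {total_seconds:.2f}s")  # ~0.45s
```

At well under half a second per edit, the round trip stays inside the threshold most developers perceive as instantaneous, which is the point of the model.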
Access is initially limited to ChatGPT Pro subscribers through the latest Codex app, command-line interface, and Visual Studio Code extension. API availability has been restricted to a small group of design partners as OpenAI evaluates real-world integration performance.
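For the design partners who do have API access, a request could plausibly look like the sketch below. The request shape follows OpenAI's published Python SDK for the Responses API; the model identifier is an assumption, since OpenAI has not documented one for Codex-Spark.

```python
# Hypothetical sketch: streaming a quick edit over the Responses API.
# The model id "gpt-5.3-codex-spark" is assumed, not a documented
# identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.responses.create(
    model="gpt-5.3-codex-spark",  # assumed identifier
    input="Rename the variable `cnt` to `count` in this function: ...",
    stream=True,
)

for event in stream:
    # Print text deltas as they arrive to show the low-latency stream.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```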
Infrastructure redesigned for speed
OpenAI said the performance gains come not only from the model itself but also from a reworked delivery pipeline (a sketch of the persistent-connection idea follows the list below). Enhancements include:
- Persistent WebSocket connections enabled by default
- 80% reduction in roundtrip overhead
- 30% lower per-token processing overhead
- 50% faster time-to-first-token
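OpenAI has not published the wire protocol, but the sketch below illustrates the idea behind the first item: with a persistent WebSocket, the TCP/TLS handshake is paid once per session rather than once per request, so each follow-up prompt starts streaming sooner. The endpoint URL and message schema here are invented for illustration.

```python
# Hypothetical sketch of a persistent-connection streaming client; the
# endpoint and message format are assumptions, not OpenAI's protocol.
import asyncio
import json
import time

import websockets  # pip install websockets

WS_URL = "wss://example.invalid/codex-spark"  # hypothetical endpoint

async def stream_session(prompts: list[str]) -> None:
    # One handshake for the whole session instead of one per request.
    async with websockets.connect(WS_URL) as ws:
        for prompt in prompts:
            start = time.perf_counter()
            await ws.send(json.dumps({"prompt": prompt}))
            first_token = True
            while True:
                event = json.loads(await ws.recv())
                if event["type"] == "token" and first_token:
                    ttft_ms = (time.perf_counter() - start) * 1000
                    print(f"time-to-first-token: {ttft_ms:.0f} ms")
                    first_token = False
                elif event["type"] == "done":
                    break

asyncio.run(stream_session(["fix the off-by-one in loop.py", "add a docstring"]))
```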
The company indicated that this low-latency delivery architecture will eventually be extended to other models.
New hardware partnership
A key component of the rollout is the use of the Cerebras Wafer-Scale Engine 3, marking the first operational milestone in OpenAI’s partnership with Cerebras Systems, announced earlier this year. The wafer-scale processors are being deployed as a dedicated serving tier for latency-sensitive tasks, complementing OpenAI’s existing GPU infrastructure.
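OpenAI has not described its routing logic, but a dedicated serving tier generally implies something like the sketch below: interactive, latency-sensitive requests go to the low-latency hardware while long-running work stays on the existing GPU fleet. The tier names and routing criterion are assumptions.

```python
# Illustrative two-tier routing sketch; tier labels and the routing
# criterion are assumptions, not OpenAI's actual implementation.
from dataclasses import dataclass

LOW_LATENCY_TIER = "cerebras-wse3"  # hypothetical tier label
GPU_TIER = "gpu-cluster"            # hypothetical tier label

@dataclass
class InferenceRequest:
    prompt: str
    interactive: bool  # e.g. an editor feedback loop vs a long agent run

def route(request: InferenceRequest) -> str:
    """Send latency-sensitive traffic to the dedicated low-latency tier."""
    return LOW_LATENCY_TIER if request.interactive else GPU_TIER

print(route(InferenceRequest("rename this variable", interactive=True)))
# -> cerebras-wse3
```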
Executives from both companies said the preview will help identify new developer behaviors and use cases enabled by high-speed inference.
Safety and deployment approach
OpenAI stated that Codex-Spark inherits the same safety training and evaluation standards as its primary models, including safeguards related to cybersecurity risks. The company said the system is not expected to reach high-risk capability thresholds under its internal preparedness framework.
Users may encounter temporary queuing during peak demand as data center capacity is expanded.
Two-mode development strategy
The launch reflects OpenAI’s broader plan to offer Codex in two distinct modes: a fast, collaborative environment for real-time editing and a separate long-horizon mode designed for extended reasoning and multi-step execution.
Over time, the company aims to merge both capabilities into a unified workflow that allows developers to shift seamlessly between rapid iteration and deeper autonomous tasks.
Broader access to Codex-Spark is expected in the coming weeks as infrastructure scales and performance is refined.
