ChatGPT Images 2.0 is Here: Everything New in OpenAI’s Latest Model
AI


Exploring ChatGPT’s new image model: Better resolution and reasoning.

April 22, 2026
7 min read

The Problem With Text in AI Images Is Finally Being Addressed

For most of AI image generation's short history, text has been its most visible failure. Ask any image model to produce a poster, a menu, a diagram, or an infographic with readable words, and the result has been predictably broken: misspelled labels, garbled fonts, letters that blur into each other, and invented characters that look like language but are not. The limitation was so well-established that the AI image community developed workarounds as a standard practice, generating images without text and adding it manually in post-production.

On April 21, 2026, OpenAI announced ChatGPT Images 2.0, powered by a new underlying model called gpt-image-2. Text rendering is the headline capability, but the release is broader than that. It introduces native reasoning for image generation, support for up to 2K resolution, a flexible aspect ratio system from 3:1 wide to 1:3 tall, multi-image consistency across up to eight outputs from a single prompt, and improved multilingual support for non-Latin scripts. It also marks the retirement of DALL-E 2 and DALL-E 3, which OpenAI has scheduled for May 12, 2026.

OpenAI describes the release as "a step change" in image models.

Two Modes With Different Purposes

ChatGPT Images 2.0 operates in two distinct modes that serve different needs.

Instant mode is the default for all ChatGPT users, including free tier accounts. It produces fast outputs at the base quality level, prioritizing speed over deliberation. OpenAI quietly tested this mode under the internal codename "duct tape" on LMArena before the public launch. Instant mode delivers the core quality improvements in text rendering, instruction following, and visual fidelity to every user regardless of their subscription plan.

Thinking mode is reserved for ChatGPT Plus, Pro, and Business subscribers. It applies explicit reasoning to the structure of an image before rendering, allowing the model to plan layout, verify typography placement, and confirm object positioning before the final output is produced. OpenAI describes this as the model acting as a "visual thought partner" rather than a rendering tool. In practice, Thinking mode reduces the need for iterative re-prompting on complex requests because the model self-verifies its work before presenting results.

Thinking mode also unlocks web search during generation, allowing the model to pull live information and incorporate it into the image. It also enables multi-image generation: up to eight consistent images from a single prompt, with character and object continuity maintained across the set. These capabilities open specific workflows that were previously impractical, including entire manga sequences, storyboards, consistent social media graphics, and product marketing families where visual identity must remain coherent across multiple outputs.

The tradeoff is latency. Thinking mode takes longer than Instant mode to produce outputs, which is expected given the additional reasoning steps involved.

What Improved Text Rendering Actually Means

The text rendering improvement is the change with the most direct commercial impact. TechCrunch's review noted that asking the model to generate a restaurant menu now produces something that could be used immediately, contrasting it with the broken output that was standard just two years ago. VentureBeat observed that the model handles infographics, slide layouts, maps, and even manga with readable text coherently integrated into the design.

The improvement extends beyond English. ChatGPT Images 2.0 delivers notably stronger support for Japanese, Korean, Chinese, Hindi, and Bengali, languages with non-Latin scripts that have historically been particularly difficult for diffusion-based image models. For teams producing localized marketing materials, educational content, or documentation across multiple languages, this removes a significant manual step from the workflow: text no longer needs to be generated separately and composited in after the image is produced.

OpenAI positions the model as capable of composing full layouts from a single prompt, including headlines, body text, hero imagery, and pull quotes, without requiring separate asset generation and layout work. The practical framing is that ChatGPT Images 2.0 moves image generation from a tool for visual experimentation into a tool for producing usable design outputs.

Resolution and Aspect Ratio Flexibility

Two technical specifications have practical implications for how the outputs are used.

Resolution has increased to up to 2K on the long edge through the API, with 4K support currently in beta for API developers. This makes gpt-image-2 outputs viable for print-ready marketing assets and technical documentation without requiring upscaling, which has been a standard post-processing step for professional use of AI-generated images.

Aspect ratio support has expanded significantly. gpt-image-2 supports a continuous range from 3:1 ultra-wide to 1:3 ultra-tall, including standard ratios such as 1:1, 3:2, 2:3, 16:9, and 9:16. Users can specify the ratio in the prompt or select it through the interface. The practical effect is that assets for banners, social media posts, vertical mobile formats, presentation slides, and print compositions can all be generated at the correct dimensions without a separate cropping or reformatting step.
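To illustrate how the ratio and resolution constraints interact, the pixel dimensions for any supported ratio at a fixed long edge reduce to simple arithmetic. The helper below is our own illustrative sketch: the function name, the 2,048-pixel default, and the rounding behavior are assumptions for demonstration, not documented behavior of the gpt-image-2 API.

```python
def dimensions_for_ratio(ratio_w: int, ratio_h: int, long_edge: int = 2048) -> tuple[int, int]:
    """Compute (width, height) for a target aspect ratio, scaling the
    longer side to `long_edge`. Purely illustrative: gpt-image-2's actual
    supported sizes and rounding rules may differ."""
    if ratio_w >= ratio_h:
        # Landscape or square: the width is the long edge.
        width = long_edge
        height = round(long_edge * ratio_h / ratio_w)
    else:
        # Portrait: the height is the long edge.
        height = long_edge
        width = round(long_edge * ratio_w / ratio_h)
    return width, height

# A few of the ratios mentioned above, at a 2K long edge:
print(dimensions_for_ratio(16, 9))  # widescreen
print(dimensions_for_ratio(1, 1))   # square
print(dimensions_for_ratio(1, 3))   # ultra-tall
```

At a 2K long edge, a 16:9 request works out to 2048×1152 and a 1:3 request to roughly 683×2048, which is why the continuous ratio range matters for formats like vertical mobile video frames.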

The End of DALL-E

The ChatGPT Images 2.0 launch comes with a concrete end date for its predecessors. DALL-E 2 and DALL-E 3 are scheduled to be retired on May 12, 2026, approximately three weeks after the Images 2.0 launch. gpt-image-2 replaces them across every surface OpenAI controls, including ChatGPT, Codex, and the API.

The retirement is operationally significant for developers who have built products on the DALL-E API endpoints. OpenAI has not announced backward compatibility for those endpoints after May 12. Developers currently using DALL-E 3 in production applications should evaluate the migration to gpt-image-2 before the cutoff date. API pricing for gpt-image-2 is $8 per million image input tokens and $30 per million image output tokens, with text tokens at $5 input and $10 output per million.
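To make the published rates concrete, the cost of a request is just the per-million-token prices applied to the token counts the API reports back. The sketch below uses the rates stated above; the token counts in the example are made up for illustration, since actual token usage per image depends on resolution and is not specified here.

```python
# Published gpt-image-2 rates, in dollars per million tokens.
RATES = {
    "image_input": 8.00,
    "image_output": 30.00,
    "text_input": 5.00,
    "text_output": 10.00,
}

def request_cost(tokens: dict[str, int]) -> float:
    """Estimate the dollar cost of one API request, given token counts
    keyed by the rate categories above."""
    return sum(RATES[kind] * count / 1_000_000 for kind, count in tokens.items())

# Hypothetical request: a 200-token text prompt producing one image
# that consumes 4,000 image output tokens (an assumed figure).
example = {"text_input": 200, "image_output": 4_000}
print(f"${request_cost(example):.4f}")
```

Running the same arithmetic over a month of logged usage gives a straightforward way to compare projected gpt-image-2 spend against current DALL-E 3 spend before the May 12 cutoff.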

The model's knowledge cutoff is December 2025, which OpenAI acknowledges could affect the accuracy of prompts involving events or developments after that date.

How It Compares to the Competition

OpenAI is releasing Images 2.0 into an actively competitive market. On the LMArena text-to-image leaderboard as of early April, Google's Gemini image generation model held first place, with OpenAI's predecessor gpt-image-1.5 in second. Google's image model, which also introduced improved in-image text support, set a benchmark that Images 2.0 is directly responding to.

VentureBeat's hands-on testing found that ChatGPT Images 2.0's fidelity in reproducing user interfaces, screenshots, and multi-image outputs appeared to exceed Google's current image model capabilities. TechCrunch noted that the model's thinking capabilities specifically, including web search integration during generation, represent a meaningful differentiation that Google's current offering does not match.

The competitive lead in AI image generation has shifted between providers multiple times in the past year. OpenAI's Studio Ghibli-style outputs generated significant viral attention earlier in the year. Google's Nano Banana model scored well at launch. ChatGPT Images 2.0 is OpenAI's response to the current competitive state, and it is specifically designed to address the text rendering and reasoning limitations that have held AI image generation back from professional workflow adoption.

Availability

ChatGPT Images 2.0 with Instant mode is available today to all ChatGPT users at no additional cost. Thinking mode, multi-image generation, and web search during generation are available to ChatGPT Plus, Pro, and Business subscribers. API access is available to OpenAI developer accounts under the model ID gpt-image-2. Pro users receive additional access to ImageGen Pro for more advanced generation.

The model is also available in Codex, OpenAI's code-focused product, expanding its reach into development workflows that involve generating visual assets programmatically.

If you are building web or mobile products that integrate AI image generation, working on marketing automation pipelines that require programmatic asset creation, or evaluating how ChatGPT Images 2.0 fits into a professional creative workflow, please reach out to MonkDA. We work with development teams building AI-integrated products and creative tools at every stage.

Ready to take your idea to market?

Let's talk about how MonkDA can turn your vision into a powerful digital product.