OpenAI GPT Image 2 Edit API: Complete Developer Guide
If you’re evaluating whether to migrate image editing workflows to gpt-image-1 via the Images Edit endpoint, this guide covers the full technical picture — parameters, benchmarks, pricing, and honest limitations — so you can make that call without reading five separate docs pages.
What Changed vs. the Previous Version (DALL-E 3 / DALL-E 2)
The gpt-image-1 model, accessible through the POST /v1/images/edits endpoint, represents a meaningful capability shift from the prior DALL-E-based edit pipeline. Here’s what’s concretely different:
| Capability | DALL-E 2 (images/edits) | gpt-image-1 (images/edits) |
|---|---|---|
| Prompt instruction following | Limited, often ignores fine-grained text | Substantially improved; follows multi-clause prompts |
| Inpainting coherence | Visible seams common on complex scenes | Better edge blending on masked regions |
| Text rendering in output | Unreliable | Markedly improved legibility for short strings |
| Style consistency across edits | Inconsistent across iterations | More stable across multiple edits of the same image |
| Multi-image input (compositing) | Not supported | Supported (up to 16 reference images) |
| Mask requirement | Required for targeted edits | Optional — the model can infer edit regions from the prompt |
The optional mask is arguably the most impactful change for developer ergonomics. Previously you had to programmatically generate a PNG mask with transparent regions for every edit call. Now you can pass a prompt like "remove the logo from the shelf" without a mask and the model will attempt to locate and edit the correct region. Results are not always perfect, but for bulk automation workflows this reduces preprocessing overhead significantly.
OpenAI has not published FID or VBench scores for gpt-image-1 directly. Claims of “best image generation” come from internal evals, and third-party benchmarks are still emerging. Treat any specific score comparisons as preliminary until independent evaluations are published.
Full Technical Specifications
| Parameter | Value / Detail |
|---|---|
| Endpoint | POST https://api.openai.com/v1/images/edits |
| Model identifier | gpt-image-1 |
| Supported input formats | PNG (required for mask), WEBP, JPEG, non-animated GIF |
| Max input file size | 25 MB per image |
| Max images per request | Up to 16 (for compositing / reference) |
| Output sizes | 1024x1024, 1536x1024, 1024x1536, auto |
| Output format | PNG (default), JPEG, WEBP |
| Output compression | Configurable (0–100 for JPEG/WEBP) |
| Response format | url (expires after 1 hour) or b64_json |
| Mask | Optional PNG with alpha channel transparency on edit region |
| Prompt max length | 32,000 characters |
| n parameter (variants) | 1–10 per request |
| Quality setting | low, medium, high, auto |
| Access tier | Any paid developer tier; ID verification required via OpenAI API dashboard |
| Rate limits | Tier-dependent; check your organization dashboard |
The quality parameter directly affects both output fidelity and token cost: high-quality edits generate more image output tokens and cost more per image. For bulk workflows with less critical output (e.g., A/B test thumbnail variants), medium is usually the right balance.
API Parameters Reference
The edit endpoint accepts a multipart/form-data request. Key parameters:
- `model` (required): `"gpt-image-1"`
- `image` (required): The source image file(s). For multi-image compositing, pass multiple image fields.
- `prompt` (required): Instruction describing the desired edit. Precise, specific prompts outperform vague ones.
- `mask` (optional): PNG where transparent pixels indicate areas to edit. When omitted, the model infers the region.
- `size`: Output dimensions. Defaults to `auto`.
- `quality`: `low` / `medium` / `high` / `auto`.
- `n`: Number of output variants (1–10).
- `response_format`: `url` or `b64_json`. Use `b64_json` for server-side processing without relying on expiring URLs.
- `output_format`: `png`, `jpeg`, or `webp`.
- `output_compression`: Integer 0–100; only applies to lossy formats.
Minimal Working Code Example
```python
import openai, base64, pathlib

client = openai.OpenAI()  # uses OPENAI_API_KEY from environment

with open("product_photo.png", "rb") as img:
    response = client.images.edit(
        model="gpt-image-1",
        image=img,
        prompt="Replace the background with a clean white studio backdrop",
        size="1024x1024",
        quality="medium",
        n=1,
        response_format="b64_json",  # avoids relying on expiring URLs
    )

# Decode the base64 payload and write the edited image to disk
image_data = base64.b64decode(response.data[0].b64_json)
pathlib.Path("edited_output.png").write_bytes(image_data)
```
This writes the edited image directly to disk. Swap response_format to url if you just need a quick preview link (note: URLs expire after 1 hour and should not be stored as permanent references).
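For multi-image compositing, the same endpoint accepts several input files. The sketch below assumes the current openai Python SDK accepts a list of file objects for the image parameter; the file names and prompt are placeholders, not tested output.

```python
import base64, pathlib
from openai import OpenAI

client = OpenAI()

# Hypothetical inputs: a product shot, a lifestyle background, and a brand asset
inputs = [open(p, "rb") for p in ("product.png", "background.png", "logo.png")]

try:
    response = client.images.edit(
        model="gpt-image-1",
        image=inputs,  # assumed: SDK accepts a list of files for multi-image input
        prompt=(
            "Place the product from the first image on the table in the second "
            "image, and add the logo from the third image to the top-right corner"
        ),
        size="1536x1024",
        quality="high",
        response_format="b64_json",
    )
finally:
    for f in inputs:
        f.close()

pathlib.Path("composite.png").write_bytes(
    base64.b64decode(response.data[0].b64_json)
)
```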
Benchmark Comparison vs. Alternatives
Standardized, third-party image editing benchmarks across these models are limited as of mid-2025. The comparison below uses available community evaluations, EditBench-style qualitative assessments, and documented capabilities rather than claimed vendor scores.
| Model | Inpainting Coherence | Prompt Adherence | Text in Image | Multi-image Input | Mask Required |
|---|---|---|---|---|---|
| gpt-image-1 (OpenAI) | Strong | High | Improved | Yes (up to 16) | Optional |
| DALL-E 2 (OpenAI, legacy) | Moderate | Moderate | Poor | No | Yes |
| Stable Diffusion XL Inpaint (open source) | Variable (model/LoRA dependent) | Moderate | Poor | No (base model) | Yes |
| Adobe Firefly Image 3 (Edit) | Strong | High | Strong | No | Yes |
| Imagen 3 (Google, Edit) | Strong | High | Strong | Limited | Yes |
Honest caveat: Without a single controlled benchmark environment, these ratings reflect documented capabilities and developer community consensus, not a single standardized test run. If precise selection criteria matter for your use case, run your own eval on 20–30 representative images before committing.
Where gpt-image-1 clearly leads the field: multi-image compositing (no direct competitor at API level offers up to 16 reference images), optional masking, and the 32,000-character prompt window that lets you encode detailed style instructions.
Where it does not lead: open-source SDXL pipelines running on your own infrastructure will be cheaper at scale, and Adobe Firefly has better text rendering fidelity for design-heavy use cases where legal IP clearance on training data matters.
Pricing vs. Alternatives
OpenAI prices gpt-image-1 edits on a per-image basis, with cost varying by quality tier.
| Model / Service | Low / Draft Quality | Standard Quality | High Quality | Notes |
|---|---|---|---|---|
| gpt-image-1 (OpenAI) | ~$0.02 / image | ~$0.07 / image | ~$0.19 / image | Input tokens also billed separately |
| DALL-E 2 (OpenAI) | — | $0.016–$0.020 / image | — | Fixed pricing by size |
| Stable Diffusion XL (self-hosted) | ~$0.001–$0.003 / image | Same | Same | Compute cost only; depends on GPU |
| Adobe Firefly API | Varies by plan | ~$0.08–$0.10 / image | — | Enterprise licensing; IP-safe training data |
| Imagen 3 (Google Vertex AI) | ~$0.02 / image | ~$0.04 / image | ~$0.08 / image | Vertex AI credit structure |
Prices as of Q2 2025; always verify against the current OpenAI pricing page and vendor pricing pages before budgeting a production system.
At high volume (100,000+ edits/month), the cost gap between gpt-image-1 at high quality ($19,000/month) versus self-hosted SDXL ($100–$300/month on reserved GPU instances) becomes a serious architectural decision. The API wins on zero infrastructure overhead and faster iteration — the self-hosted route wins on unit economics once you’ve validated your pipeline.
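To sanity-check that break-even math for your own volumes, here is a minimal cost model using the approximate per-image prices from the table above — these are estimates, not current list prices, so verify before budgeting.

```python
# Approximate gpt-image-1 edit prices per image (USD), from the table above.
# Verify against the current OpenAI pricing page before relying on these.
PRICE_PER_IMAGE = {"low": 0.02, "medium": 0.07, "high": 0.19}

def monthly_api_cost(edits_per_month: int, quality: str = "medium") -> float:
    """Estimate monthly edit spend; ignores separately billed prompt tokens."""
    return edits_per_month * PRICE_PER_IMAGE[quality]

# 100,000 high-quality edits/month -> ~$19,000, matching the figure above
print(f"${monthly_api_cost(100_000, 'high'):,.0f}")
```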
Best Use Cases
1. E-commerce product photo editing at scale
Background removal and replacement, lighting normalization, and shadow addition are well-suited to the API. Prompt: "Replace the background with a flat white studio surface, add a subtle drop shadow beneath the product." Mask optional if the product has clear edges.
2. Marketing creative variation
Generating 5–10 variants of a base ad image with different color treatments, seasonal overlays, or CTA badge placements. The n parameter handles this in a single API call. Useful for A/B testing pipelines where creative ops teams would otherwise spend hours in Photoshop.
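A minimal sketch of the single-call variant workflow, assuming a hypothetical base creative file; each element of response.data is one variant.

```python
import base64, pathlib
from openai import OpenAI

client = OpenAI()

# Hypothetical base ad creative; `n` produces several variants in one request
with open("base_ad.png", "rb") as img:
    response = client.images.edit(
        model="gpt-image-1",
        image=img,
        prompt="Apply a warm autumn color treatment and add falling leaves",
        n=5,
        quality="medium",
        response_format="b64_json",
    )

# Save each returned variant for the A/B test pipeline
for i, variant in enumerate(response.data):
    pathlib.Path(f"variant_{i}.png").write_bytes(base64.b64decode(variant.b64_json))
```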
3. Multi-image compositing
Combining a product image, a lifestyle background, and a brand asset into a single coherent output. The 16-image input limit enables workflows that previously required a multi-step chain of separate edit and generation calls.
4. Automated localization of visual assets
Swapping text overlays, logos, or region-specific compliance badges across a batch of images. The improved text rendering in gpt-image-1 makes this more viable than with DALL-E 2, though for precision typographic work you should still validate outputs.
5. Prototyping and design mockups
Quickly testing how a UI element, a piece of furniture, or a product looks in different environments without a full 3D render pipeline.
Limitations and When NOT to Use This Model
Do not use for legal or medical imagery requiring exact fidelity. The model can introduce subtle hallucinated details in complex scenes. If an image edit needs to be legally defensible (insurance documentation, medical imaging, architectural drawings), do not use generative inpainting.
Do not use when IP provenance of training data is a hard requirement. If your legal team requires certified IP-safe training data (as some enterprise publishers and agencies do), Adobe Firefly’s commercially licensed training corpus is the better choice. OpenAI’s training data sourcing does not come with the same explicit clearance guarantees.
Do not use for complex typography or logos. Despite improvements over DALL-E 2, gpt-image-1 still struggles with multi-line text, precise font matching, and reproducing existing logos accurately. For these tasks, composite the text in post-processing rather than asking the model to render it.
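A minimal sketch of that post-processing approach using Pillow; the font path, coordinates, and text are placeholders for your own brand assets.

```python
from PIL import Image, ImageDraw, ImageFont

# Composite exact typography onto the model's output instead of asking
# the model to render it. Font path and wording are hypothetical.
img = Image.open("edited_output.png").convert("RGBA")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("/path/to/BrandFont.ttf", size=48)  # placeholder path
draw.text((40, 40), "SUMMER SALE", font=font, fill=(255, 255, 255, 255))
img.save("final_with_text.png")
```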
Do not use for real-time applications. API latency is typically 5–20 seconds per image depending on quality settings and load. This is not suitable for interactive, sub-second editing experiences.
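If you need a responsive UI on top of this latency, move edits off the request path into a worker. A minimal sketch using a local thread pool, with hypothetical file names and prompts:

```python
from concurrent.futures import ThreadPoolExecutor
import base64, pathlib
from openai import OpenAI

client = OpenAI()

def edit_one(path: str, prompt: str) -> str:
    """Run a single edit; call from a background worker, not a request handler."""
    with open(path, "rb") as img:
        response = client.images.edit(
            model="gpt-image-1",
            image=img,
            prompt=prompt,
            quality="medium",
            response_format="b64_json",
        )
    out = path.replace(".png", "_edited.png")
    pathlib.Path(out).write_bytes(base64.b64decode(response.data[0].b64_json))
    return out

# Fan a batch out across a small pool instead of blocking interactively
jobs = [("photo_1.png", "remove background"), ("photo_2.png", "remove background")]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda j: edit_one(*j), jobs))
```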
Cost at high volume. Above roughly 50,000 high-quality edits per month, the API cost (~$9,500+) warrants serious evaluation of a self-hosted or fine-tuned alternative.
Mask precision limitations. The optional mask is convenient but not always accurate. For surgical edits — removing a specific small object from a cluttered scene — a precisely generated mask still outperforms prompt-only region inference.
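For those surgical cases, a mask is easy to build programmatically. A minimal sketch with Pillow, assuming a known bounding box (in practice the box might come from an object detector); per the spec above, transparent pixels mark the editable region.

```python
from PIL import Image

# Opaque everywhere (preserved), then punch a transparent hole over the
# region to edit. The bounding box coordinates are hypothetical.
src = Image.open("product_photo.png")
mask = Image.new("RGBA", src.size, (0, 0, 0, 255))
x0, y0, x1, y1 = 120, 80, 360, 240  # placeholder edit region
mask.paste(Image.new("RGBA", (x1 - x0, y1 - y0), (0, 0, 0, 0)), (x0, y0))
mask.save("mask.png")

# Pass both files to the edit call for a mask-targeted edit
# (see the minimal example above for the remaining parameters).
```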
Verdict
The gpt-image-1 edit API is a production-ready option for teams that need scalable image editing without managing infrastructure, with the optional mask and multi-image input genuinely reducing implementation complexity for common workflows. The unit economics make it competitive for low-to-mid volume use cases, but teams processing hundreds of thousands of images monthly should model costs carefully against self-hosted alternatives before committing.
Note: If you’re integrating multiple AI models into one pipeline, AtlasCloud provides unified API access to 300+ models including Kling, Flux, Seedance, Claude, and GPT — one API key, no per-provider setup. New users get a 25% credit bonus on first top-up (up to $100).
Frequently Asked Questions
How much does the GPT Image 1 Edit API cost per image compared to DALL-E 2?
gpt-image-1 via the Images Edit endpoint is priced per image by quality tier — roughly $0.02 (low), $0.07 (medium), and $0.19 (high) at 1024x1024 — compared to DALL-E 2 edit pricing of $0.016–$0.020 per image. That makes gpt-image-1 roughly 3–10x more expensive depending on quality tier, but it delivers substantially better prompt adherence and inpainting coherence. Prompt input tokens are billed separately, so verify totals against the current pricing page.
What is the typical API latency for the gpt-image-1 edit endpoint and how does it compare to DALL-E 3?
The gpt-image-1 Images Edit endpoint typically returns responses in 5–20 seconds per image depending on quality settings and load, with high-quality requests at the slower end. DALL-E 3 generation (not edit) averaged around 8–15 seconds, so gpt-image-1 edits can be noticeably slower. For latency-sensitive applications, developers should implement async request patterns using background jobs rather than blocking synchronous calls.
What image formats and mask specifications does the gpt-image-1 edit API accept?
With gpt-image-1, the POST /v1/images/edits endpoint accepts PNG, WEBP, JPEG, and non-animated GIF source images up to 25 MB each, with output sizes of 1024x1024, 1536x1024, 1024x1536, or auto. The mask, when provided, must be a PNG with an alpha channel where transparent pixels (alpha=0) indicate the regions to be edited and opaque pixels preserve the original content. Unlike DALL-E 2, the mask is optional — the model can infer the edit region from the prompt alone.
How does gpt-image-1 edit API benchmark on prompt adherence and inpainting quality vs competitors like Stability AI?
OpenAI has not published standardized TIFA or FID scores for gpt-image-1 edit tasks, so no rigorous apples-to-apples benchmark comparison exists yet. Available community evaluations and documented capabilities suggest gpt-image-1 leads DALL-E 2 and base Stable Diffusion XL Inpaint on prompt adherence and masked-region edge coherence, while tuned SDXL pipelines can close the gap in narrow domains. If benchmark numbers matter for your selection, run your own evaluation on 20–30 representative images, as recommended above.