Grok Imagine

All image models

Image model · xAI

Grok Imagine Text to Image

Text → Image generation.

Specifications

Input mode: Text → Image
Pricing: 3 credits / image — higher resolutions cost more
Typical generation time: ~12s
Free tier: No
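
The pricing and timing figures above allow a rough batch estimate. The sketch below is a minimal Python illustration, assuming the 3-credit base rate applies to every image and that images are generated one after another at about 12 seconds each. The function name and return keys are hypothetical. Because higher resolutions cost more, the credit figure is a lower bound rather than an exact price.

```python
# Figures taken from the specifications above; both are approximate.
BASE_CREDITS_PER_IMAGE = 3       # base rate; higher resolutions cost more
TYPICAL_SECONDS_PER_IMAGE = 12   # typical generation time


def estimate_batch(num_images: int) -> dict:
    """Return a lower-bound credit cost and a rough sequential duration.

    Assumption: images are generated one at a time at the base rate.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return {
        "min_credits": num_images * BASE_CREDITS_PER_IMAGE,
        "approx_seconds": num_images * TYPICAL_SECONDS_PER_IMAGE,
    }


if __name__ == "__main__":
    # Ten images at the base rate: at least 30 credits, roughly 2 minutes.
    print(estimate_batch(10))
```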

Text → Image examples

Maximalist surreal landscape built entirely from ramen: noodle rivers, broth lakes, chashu islands and egg-yolk suns under god-ray light through steam

Surreal ramen dreamscape — noodle rivers and broth lakes

A wide-angle digital illustration of an ancient, mossy forest filled with glowing exotic flowers in pink, blue, and yellow. Small, winged fairies with iridescent wings flutter among the trees, surrounded by floating, iridescent bubbles and soft rays of sunlight filtering through the canopy, generated by Grok Imagine at 1408x768 resolution.

Enchanted Fairy Forest — Grok Imagine

An underwater photograph of a vibrant coral reef teeming with colorful tropical fish, created using Grok Imagine at 1408x768 resolution. Brain corals, staghorn corals, and pink sea fans rest on a sandy seabed under clear blue water with sunlight filtering down from the surface.

Vibrant Underwater Coral Reef — Grok Imagine

A low-angle shot of a mountain biker wearing a helmet and a colorful red, yellow, and blue jersey, riding a black mountain bike down a dirt trail in a dense forest. Dust kicks up behind the rear wheel. Generated by Grok Imagine at 1408x768 resolution.

Mountain Biker on Forest Trail, by Grok Imagine

A vertical image of two cats, an orange tabby and a tuxedo cat, sitting on a wooden floor and looking down curiously at a small tooth placed on a crumpled piece of white paper. Generated using the Grok Imagine model at 768x1408 resolution.

Two Curious Cats Inspecting a Tooth — Grok Imagine

A digital illustration of Zeus, a muscular man with a long white beard, raising a bright, crackling lightning bolt. He stands before a massive ancient Greek temple with classical marble columns under a dark, stormy sky. Several robed figures bow in reverence. Generated via Grok Imagine at 1408x768 resolution.

Zeus Summoning Lightning — Grok Imagine

A high-resolution 1408x768 text-to-image generation from Grok Imagine showing a crowded city street market during golden hour. Multicolored townhouses line the street, while pedestrians walk past food stalls with bright umbrellas and steam rising from food carts. A vibrant graffiti mural decorates a brick wall on the right.

Bustling City Street Market — Grok Imagine

A Doberman and a Pitbull snarling and biting a gold bar in a dark stone vault. Gold and silver coins are scattered on the stone floor around them. This roughly 16:9 image was generated at 1408x768 resolution using the Grok Imagine text-to-image model.

Grok Imagine: Dogs Fighting Over Gold Bar

Holographic Logo on Coffee Mug — Grok Imagine

A wide-angle digital image of a futuristic night cityscape, generated by Grok Imagine at 1408x768 resolution. Tall obsidian towers are adorned with spherical glass greenhouses filled with vibrant green and purple tropical foliage. Multiple quadcopter drones with glowing red lights fly through the hazy, atmospheric air between the structures.

Futurist Solarpunk City with Drones — Grok Imagine

What is Grok Imagine best for?

Powered by xAI's proprietary Aurora engine, Grok Imagine excels at generating highly aesthetic, cinematic compositions with rapid turnaround times. Provider documentation and community tests highlight its strong prompt adherence and accurate text rendering. It is particularly useful for rapid iteration on concept art, marketing visuals, and stylized imagery where fast inference speed is a priority.

What is the lineage and release date of Grok Imagine?

Grok Imagine is xAI's flagship image generation model, with its API officially announced on January 28, 2026. It marks a significant architectural shift for xAI, moving away from its earlier reliance on Black Forest Labs' diffusion models, such as Flux 1.1 Pro, toward its own Aurora engine, an autoregressive mixture-of-experts transformer. It integrates natively with Grok Video for image-to-video workflows.

How can I get the most realistic results from Grok Imagine?

Because Grok Imagine uses an autoregressive architecture, it responds better to natural-language descriptions than to traditional keyword stacking. To avoid the "plastic" or CGI aesthetic sometimes reported by users, explicitly prompt for details like "photorealistic, 35mm film, natural skin texture" and use negative prompts against "CGI" or "smooth skin". When using its image editing features, describe only the specific change you want (e.g., "Make the sky a dramatic sunset") rather than re-describing the entire scene.
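
As a small illustration of the realism advice above, the Python sketch below appends the suggested cues to a natural-language description and collects the terms to steer away from. The function name and the dictionary keys ("prompt", "negative_prompt") are hypothetical and are not documented xAI API fields. How a negative prompt is actually passed depends on the platform you are using.

```python
# Cues and avoid-terms quoted from the guidance above.
REALISM_CUES = ["photorealistic", "35mm film", "natural skin texture"]
AVOID_TERMS = ["CGI", "smooth skin"]


def with_realism(description: str) -> dict:
    """Append realism cues to a natural-language description.

    The returned keys are illustrative only, not real API parameters.
    """
    base = description.strip().rstrip(".")
    prompt = f"{base}, {', '.join(REALISM_CUES)}."
    return {"prompt": prompt, "negative_prompt": ", ".join(AVOID_TERMS)}


if __name__ == "__main__":
    result = with_realism(
        "A portrait of an elderly fisherman mending a net at dawn."
    )
    print(result["prompt"])
    print(result["negative_prompt"])
```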

Prompt tips

Be exhaustively descriptive: Specify the subject, action, environment, lighting, and camera angle (e.g., "cinematic depth of field, 40mm lens") to compensate for the model's literal interpretation.
Use realism keywords: To avoid cartoonish or "uncanny valley" looks in human subjects, include phrases like "life-like" or "normal people".
Iterate in fast modes: When testing concepts, use the faster default generation modes to find your composition before switching to the stricter "Quality Mode" for final renders.
Leverage image-to-image for consistency: When editing, use a strong base image and only prompt for the specific changes you want; the model natively understands the content and preserves the overall composition.
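
The first tip above can be made routine with a small template. The Python sketch below is one possible way to do that, assuming you want the five elements (subject, action, environment, lighting, camera) joined into a single natural-language sentence. The Scene class, the build_prompt function, and the sentence template are all hypothetical conveniences, not part of any Grok Imagine tooling.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """The five elements the prompt tips recommend specifying."""
    subject: str
    action: str
    environment: str
    lighting: str
    camera: str


def build_prompt(scene: Scene) -> str:
    """Join the elements into one sentence rather than a keyword stack."""
    return (
        f"{scene.subject} {scene.action} in {scene.environment}, "
        f"lit by {scene.lighting}, shot with {scene.camera}."
    )


if __name__ == "__main__":
    scene = Scene(
        subject="A mountain biker",
        action="riding down a dirt trail",
        environment="a dense forest",
        lighting="soft morning sun",
        camera="a 40mm lens and cinematic depth of field",
    )
    print(build_prompt(scene))
```

Keeping the elements as separate fields makes it easy to vary one of them, such as the lighting, while iterating on a composition.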
