Grok Video

Video model by xAI

Grok Video Image to Video

Image → Video — generates video.

Specifications

Input mode: Image → Video
Accepts: start frame
Aspect ratios: 16:9, 9:16, 1:1
Resolutions: 480p, 720p
Durations: 5s, 6s, 7s, 8s, 9s, 10s, 11s, 12s, 13s, 14s, 15s
Max duration: 15s
Native audio: No
Pricing: 7 credits / second — longer clips and higher resolutions cost more
Typical generation time: ~2 min
Free tier: Yes
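
As a rough illustration of the pricing line above, the sketch below multiplies clip length by the listed 7 credits per second. It assumes that figure is a flat base rate. The listing says longer clips and higher resolutions cost more but gives no multiplier, so the result is a ballpark figure rather than a quote. The function name is hypothetical.

```python
CREDITS_PER_SECOND = 7  # rate listed in the specifications above
MIN_DURATION_S = 5      # shortest listed duration
MAX_DURATION_S = 15     # listed max duration


def estimate_base_credits(duration_s: int) -> int:
    """Estimate credits for one clip, assuming a flat 7 credits per second.

    Any resolution surcharge is not published in the listing and is ignored here.
    """
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
        )
    return duration_s * CREDITS_PER_SECOND


if __name__ == "__main__":
    for seconds in (5, 10, 15):
        print(f"{seconds}s clip: about {estimate_base_credits(seconds)} credits")
```

Under this assumption a 5-second clip comes to 35 credits and a 15-second clip to 105.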

Image → Video examples

A medium close-up shot of a smiling young woman with long wavy brown hair wearing a cream-colored knit sweater. She is captured in a bedroom setting, looking directly at the camera in a vlog style. This 1280x720 video was generated using Grok Video on Hedra.

Grok Video: Woman Vlogging in Her Bedroom

A 1950s style diner interior featuring a waitress in a mint green uniform carrying a tray of milkshakes. Classic turquoise and red cars are parked outside the large windows at sunset. Generated by Grok Video at 1280x720 resolution from an initial image.

Retro 1950s Diner Waitress Serving Milkshakes — Grok Video

First frame of a 1280x720 video generated by Grok Video, showing an older scholar with silver hair and a burgundy velvet robe. He leans over a wooden desk covered in ancient maps and brass navigation instruments, reaching his hand toward the viewer. The background is a dimly lit, classic study.

Scholar Reaching Over Map — Grok Video

A vertical video frame showing a woman with braided hair in a high bun leaning against a vintage green car in a concrete tunnel covered in graffiti. Generated by Grok Video at 720x1280 resolution, the scene features overhead fluorescent lighting reflecting off wet ground.

Woman Leaning on Vintage Car in Graffiti Tunnel — Grok Video

A 1280x720 video frame generated by Grok Video showing a crew of breakdancers made of glossy chocolate and vanilla pudding. The figures perform dynamic handstands in a narrow city alleyway covered in colorful graffiti, with the ground flooded by reflective liquid chocolate.

Pudding Breakdancers in Graffiti Alley — Grok Video

A close-up video frame showing a blonde man in ornate gold armor with red glowing accents. He stands inside a dark, circular metallic corridor lit by red warning lights. This 1280x720 resolution video was generated using the Grok Video model.

Grok Video: Golden Armored Sci-Fi Soldier

A low-angle shot from behind shows a warrior's legs walking in desert sand. The character wears crimson metallic greaves and a white tunic, holding a curved blade. Generated at 1280x720 resolution with Grok Video, the frame captures detailed sand ripples and harsh shadows under bright sunlight.

Warrior Walking in Desert Sand — Grok Video

A medium close-up of an animated girl with bright orange hair and wide green eyes, showing a shocked expression. She has mechanical, metallic joints on her shoulders. This vertical 720x1280 video frame was generated from an input image using the Grok Video model.

Animated Character Transforming — Grok Video

A close-up shot of a blonde man with closed eyes, wearing detailed sci-fi armor with red glowing accents, inside a metallic capsule. This first frame of a 1280x720 video generated by Grok Video displays soft lighting and a reflective metallic background.

Astronaut in Stasis Chamber — Grok Video

A low-angle shot of a chaotic city street where several domestic cats are leaping and running forward alongside a large chimpanzee. In the background, skyscrapers rise under a bright sky, with smoke rising from a flipped car. This video frame was generated using the Grok Video model at 1280x720 resolution.

Cats and Chimps Clashing in City Street — Grok Video

Grok Video Text to Video

Text → Video — generates video.

Specifications

Input mode: Text → Video
Aspect ratios: 16:9, 9:16, 1:1
Resolutions: 480p, 720p
Durations: 5s, 6s, 7s, 8s, 9s, 10s, 11s, 12s, 13s, 14s, 15s
Max duration: 15s
Native audio: No
Pricing: 7 credits / second — longer clips and higher resolutions cost more
Typical generation time: ~44s
Free tier: Yes
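
The options listed above can be checked before a request is submitted. The following is a minimal sketch: the option values are copied from the specifications, while the ClipSettings class and validation_errors function are hypothetical helpers for illustration and not part of any official SDK.

```python
from dataclasses import dataclass

# Values copied from the specifications above.
ASPECT_RATIOS = ("16:9", "9:16", "1:1")
RESOLUTIONS = ("480p", "720p")
MIN_DURATION_S, MAX_DURATION_S = 5, 15


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for the settings of one generation request."""
    aspect_ratio: str
    resolution: str
    duration_s: int


def validation_errors(settings: ClipSettings) -> list[str]:
    """Return one message for each setting that falls outside the listed options."""
    errors = []
    if settings.aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    if settings.resolution not in RESOLUTIONS:
        errors.append(f"unsupported resolution: {settings.resolution}")
    if not MIN_DURATION_S <= settings.duration_s <= MAX_DURATION_S:
        errors.append(
            f"duration must be {MIN_DURATION_S} to {MAX_DURATION_S} seconds, "
            f"got {settings.duration_s}"
        )
    return errors


if __name__ == "__main__":
    print(validation_errors(ClipSettings("16:9", "720p", 10)))   # no problems
    print(validation_errors(ClipSettings("4:3", "1080p", 20)))   # three problems
```

Returning a list of messages instead of raising on the first failure lets a caller report every invalid setting at once.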

Text → Video examples

A close-up shot of a male desert wanderer wearing brass goggles on his forehead and a scarf blowing in the wind. The background features a vast desert landscape under a purple twilight sky with glowing crystal structures, generated by Grok Video at 1280x720 resolution.

Desert Wanderer with Goggles — Grok Video

A video still showing a majestic deity with metallic golden skin, glowing white eyes, and flowing light-colored hair. The figure stands inside a translucent crystal structure against a deep blue cosmic background featuring galaxies and stars. This 1280x720 text-to-video generation was created using Grok Video.

Golden Cosmic Deity — Grok Video

A low-angle shot of a man in a white tank top, black headband, and large diamond chain gesturing at the camera. He stands in front of a black SUV on a wet city street at night. Generated by Grok Video at 848x480 resolution.

Grok Video: Rapper in Front of Black SUV

A wide shot of a Sufi whirling dervish spinning in a traditional white skirt and tall hat inside an ancient stone courtyard with arched pillars. Warm sunlight beams through the background. This 1280x720 video was generated using the Grok Video model.

Whirling Dervish in Stone Courtyard — Grok Video

An extreme close-up of a smooth, white wax sculpture of a human face with stylized droplets under the eyes. Generated by Grok Video at 1280x720 resolution, the frame features soft studio lighting against a dark, neutral background.

Melting Wax Face Sculpture — Grok Video

What is Grok Video especially good for?

Grok Video can generate natively synchronized audio, including dialogue with accurate lip-sync, ambient noise, and sound effects, in a single pass alongside the video. The specifications above, however, list native audio as unavailable for both modes offered here. Powered by xAI's Aurora engine, it excels at producing realistic object interactions, fluid motion, and cinematic physics. The community highlights its strong instruction following, making it effective for rapid prototyping, social media content, and dynamic scene creation without needing secondary audio tools.

Who developed Grok Video and what is its release history?

Grok Video is part of the generative media suite developed by xAI. The API officially launched on January 28, 2026, followed by the Grok Imagine 1.0 release on February 3, 2026. In early June 2026, xAI released the Grok Imagine Video 1.5 Preview, which quickly topped community leaderboards, surpassing competitors like ByteDance's Seedance 2.0. It shares its underlying architecture with the image-focused Grok Imagine model.

What are the best prompting practices for Grok Video?

When prompting Grok Video, focus on directing motion rather than describing static elements, as the model reads prompts autoregressively from left to right. The community recommends a sequential structure: start with subject movement, follow with camera trajectory, and end with atmospheric shifts. Because Grok largely ignores negative prompts, you should rely entirely on explicit positive instructions. You can read more in the official xAI docs.
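
The sequential structure described above (subject movement, then camera trajectory, then atmospheric shifts) can be sketched as a small prompt builder. This is an illustrative helper only: the function name and the choice to join the parts as plain sentences are assumptions, not a format the model requires.

```python
def build_motion_prompt(subject_motion: str, camera_move: str, atmosphere: str) -> str:
    """Join the three parts in the community-recommended order.

    Order: subject movement first, camera trajectory second, atmosphere last.
    Empty parts are skipped and each part ends with exactly one period.
    """
    parts = [subject_motion, camera_move, atmosphere]
    cleaned = [part.strip().rstrip(".") for part in parts if part.strip()]
    if not cleaned:
        return ""
    return ". ".join(cleaned) + "."


if __name__ == "__main__":
    print(
        build_motion_prompt(
            "A dancer spins across the courtyard",
            "The camera slowly pushes in.",
            "Dust drifts through warm sunlight",
        )
    )
```

Each argument should be phrased as a positive instruction, since the text above notes that negative prompts are largely ignored.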

How can I create longer videos with consistent characters?

While a single generation is capped at 15 seconds, you can create longer sequences using Grok's video extension feature, often called the End Frame Method by the community. Feed the final frame of your clip back into the model as the start frame, along with your original prompt, to chain generations together. Recent updates ensure that audio context and character identity remain consistent across these extended cuts.
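
When planning a chained sequence, it can help to decide up front how long each clip should be. The sketch below splits a target running time into the fewest clips that each fit the listed 5 to 15 second range. The function name and the even-split strategy are assumptions for illustration, and the code does not call any generation API.

```python
import math

MIN_CLIP_S = 5   # shortest listed duration
MAX_CLIP_S = 15  # listed max duration


def plan_segments(total_s: int) -> list[int]:
    """Split total_s seconds into the fewest clips of 5 to 15 seconds each.

    Clip lengths are spread as evenly as possible, longer clips first.
    """
    if total_s < MIN_CLIP_S:
        raise ValueError(f"total must be at least {MIN_CLIP_S} seconds")
    clip_count = math.ceil(total_s / MAX_CLIP_S)
    base, remainder = divmod(total_s, clip_count)
    return [base + 1] * remainder + [base] * (clip_count - remainder)


if __name__ == "__main__":
    for target in (15, 16, 40):
        print(target, plan_segments(target))
```

Each planned clip after the first would then use the final frame of the previous clip as its start frame.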

Prompt tips

Lead with camera intent: Place camera tracking instructions (e.g., "arc shot circling the subject") early in your prompt to establish framing before subject motion peaks.
Chain clips using end frames: To bypass the 15-second limit, use the final frame of a generated clip as the starting image for your next prompt to create seamless, extended videos.
Keep the camera still for cuts: When planning to extend a video, instruct the camera to remain completely still during the final second to ensure invisible transitions between chained clips.
Compress source files: When using image-to-video, compress your source files to stay under the ~50 MB payload limit, as exceeding this triggers immediate rejection.
Use explicit positive instructions: Because the model struggles with negative prompts, avoid telling it what not to do; instead, describe exactly what you want the subject to do in detail.
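
The payload tip above can be checked locally before uploading. This minimal sketch assumes "~50 MB" means 50 decimal megabytes and that the limit applies to the raw file size. Both are assumptions, since the tip gives only an approximate figure, and any upload encoding overhead would reduce the real headroom. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: "~50 MB" is read as 50 decimal megabytes.
PAYLOAD_LIMIT_BYTES = 50 * 1000 * 1000


def within_payload_limit(size_bytes: int, limit_bytes: int = PAYLOAD_LIMIT_BYTES) -> bool:
    """Return True when a payload of size_bytes fits under the assumed limit."""
    return size_bytes <= limit_bytes


def image_within_limit(path: str) -> bool:
    """Check a source image on disk against the assumed payload limit."""
    return within_payload_limit(Path(path).stat().st_size)


if __name__ == "__main__":
    print(within_payload_limit(12_000_000))  # a 12 MB image fits
    print(within_payload_limit(64_000_000))  # a 64 MB image should be compressed first
```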

