Grok-1.5 is xAI's latest video generation model, released in May 2026. It is an image-to-video model that generates video together with native audio (dialogue, ambient sounds, and sound effects) in a single pass. It also supports multi-shot prompting, so one prompt can describe several shots within a single generation.
Key Features
- Multi-Shot: Prompt for multiple shots in a single generation.
- Native Audio Generation: Produces synchronized audio (dialogue, ambient sounds, effects) alongside the video in one generation. No separate audio step is needed.
- Image-to-Video: Animates a still image from a text prompt. The output preserves the look of the original image while adding motion and audio.
- Configurable Duration & Resolution: Set the duration from 1 to 15 seconds and choose between 480p and 720p per request.
Technical Capabilities
- Inputs: A still image plus a text prompt (image-to-video)
- Resolution: 480p and 720p
- Duration: 1 to 15 seconds
- Audio: Natively generated with the video
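The limits listed here can be checked client-side before a request is submitted. The following is a minimal sketch, not xAI's API: the `VideoRequest` class, its field names, and the `validate` helper are hypothetical, and it assumes a text prompt is always required. Only the limits themselves (1 to 15 seconds, 480p or 720p, image input) come from this description.

```python
from dataclasses import dataclass

# Limits taken from the description; every name below is hypothetical.
ALLOWED_RESOLUTIONS = ("480p", "720p")
MIN_DURATION_S = 1
MAX_DURATION_S = 15


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for one image-to-video request."""
    image_path: str
    prompt: str
    duration_s: int
    resolution: str


def validate(req: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if not req.image_path:
        problems.append("an input image is required (image-to-video only)")
    if not req.prompt.strip():
        # Assumption: the text prompt is mandatory.
        problems.append("a text prompt is required")
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds, got {req.duration_s}"
        )
    if req.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(
            f"resolution must be one of {ALLOWED_RESOLUTIONS}, got {req.resolution!r}"
        )
    return problems
```

For example, `validate(VideoRequest("cat.png", "a cat yawns", 20, "1080p"))` reports two problems: the duration exceeds 15 seconds and 1080p is not offered.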
Limitations
- Resolution: 720p maximum; there is no 1080p option.
- Duration: 15 seconds maximum, which suits social clips but is limiting for longer-form content.
- Aspect Ratio: Determined by the uploaded image.
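Because the aspect ratio follows the uploaded image, the output frame size depends on the input. The sketch below estimates it under two assumptions that this description does not confirm: that "480p" and "720p" refer to the length of the shorter side, and that dimensions are rounded to even numbers. The `output_size` helper is hypothetical.

```python
def output_size(image_w: int, image_h: int, resolution: str) -> tuple[int, int]:
    """Estimate output (width, height) while keeping the input image's aspect ratio.

    Assumption: the resolution label gives the length of the shorter side.
    """
    short_side = {"480p": 480, "720p": 720}[resolution]
    shortest_input = min(image_w, image_h)

    def even(x: float) -> int:
        # Assumption: frame dimensions are rounded to the nearest even number.
        return max(2, 2 * round(x / 2))

    return (
        even(image_w * short_side / shortest_input),
        even(image_h * short_side / shortest_input),
    )
```

Under these assumptions, a 1920x1080 landscape image at 720p maps to 1280x720, and a 1080x1920 portrait image at 480p maps to 480x854.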