Grok-1.5 – Artlist

Grok-1.5 is xAI's latest video generation model, released in May 2026. It generates video with native audio (dialogue, ambient sounds, and sound effects) in a single pass, and it supports multi-shot prompting. It is an image-to-video model.

Key Features

Multi-Shot – Prompt for multiple shots in a single generation.
Native Audio Generation — Produces synchronized audio (dialogue, ambient sounds, effects) alongside the video in one generation. No separate audio step needed.
Image-to-Video — Animate a still image with a text prompt. The output preserves the look of the original image while adding motion and audio.
Configurable Duration & Resolution — Set duration from 1–15 seconds and choose between 480p and 720p per request.

Technical Capabilities

Inputs: Image plus text prompt (image-to-video)
Resolution: 480p and 720p
Duration: 1 to 15 seconds
Audio: Natively generated with the video
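
The limits above can be checked before a request is submitted. The following is a minimal sketch, not an actual Artlist or xAI API: the names VideoRequest and validate_request are hypothetical, and it assumes duration is given in seconds and resolution as one of the two labels listed above.

```python
from dataclasses import dataclass

# Limits copied from the capabilities listed above.
# Every name in this sketch is hypothetical, chosen for illustration only.
ALLOWED_RESOLUTIONS = ("480p", "720p")
MIN_DURATION_S = 1
MAX_DURATION_S = 15


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical request shape; not a real Artlist or xAI object."""
    image_path: str    # the still image to animate (image-to-video only)
    prompt: str        # text prompt describing motion, shots, and audio
    duration_s: float  # assumed to be expressed in seconds
    resolution: str    # assumed to be "480p" or "720p"


def validate_request(req: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the documented limits."""
    problems = []
    if not req.image_path:
        problems.append("an input image is required (image-to-video only)")
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds, got {req.duration_s}"
        )
    if req.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(
            f"resolution must be one of {ALLOWED_RESOLUTIONS}, got {req.resolution!r}"
        )
    return problems
```

Calling validate_request on a 10-second 720p request with an image returns an empty list, while a 20-second 1080p request returns one problem for the duration and one for the resolution.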

Limitations

Max 720p — no 1080p option.
15-second max duration — fine for social clips, limiting for longer-form content.
Aspect ratio is determined by the user-uploaded image.
Credits are not refunded for generations that fail due to an NSFW/content-policy error.