Seedance 2.0 Family – Artlist

Seedance 2.0 — One Pager

Video Model

Seedance 2.0 is ByteDance's latest video model and a top contender alongside Veo 3.1 and Kling 3. Its main draw is that it generates video and synchronized audio together in a single pass — sound effects, ambient audio, and lip-synced dialogue, with no post-production layering. It's strong on realistic physics, multi-shot editing within a single generation, and director-level camera control. It takes text, image, and multi-reference (image / video / audio) inputs. It originally shipped in two tiers — a Standard tier for maximum quality and a Fast tier for lower-latency production work — and the family has since added Mini (lightweight) and 4K (high-resolution) variants.

Variants at a glance

UPDATE — June 2026: Two variants have been added to the Seedance 2.0 family — Mini (lightweight, speed/cost-optimized) and 4K (high-resolution output).

Variant	Type	Modalities	Max Res.	Talking points
Seedance 2.0	Video	Text / Image / Reference → Video	4K	Max-quality tier; supports up to 4K; higher latency.
Seedance 2.0 Fast	Video	Text / Image / Reference → Video	720p	Lower latency and cost for production workloads; capped at 720p.
Seedance 2.0 Mini	Video	Text / Image / Reference → Video	720p	Lightweight tier for speed- and cost-sensitive work. Early access.
Seedance 2.0 4K	Video	Text / Image / Reference → Video	4K	High-resolution variant; outputs up to 4K. Early access.
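
To make the table easier to query, here is a minimal Python sketch that takes the Max Res. column at face value and returns the variants able to reach a requested resolution. The names `MAX_RESOLUTION`, `RESOLUTION_ORDER`, and `variants_supporting` are illustrative only and not part of any Seedance or Artlist API. The resolution ladder (480p, 720p, 1080p, 4K) is assumed from the values mentioned elsewhere in this one-pager.

```python
# Assumed ordering of output resolutions, lowest to highest.
RESOLUTION_ORDER = ["480p", "720p", "1080p", "4K"]

# Copied from the "Max Res." column of the table above.
MAX_RESOLUTION = {
    "Seedance 2.0": "4K",
    "Seedance 2.0 Fast": "720p",
    "Seedance 2.0 Mini": "720p",
    "Seedance 2.0 4K": "4K",
}


def variants_supporting(resolution: str) -> list[str]:
    """Return the variants whose listed maximum is at least `resolution`."""
    needed = RESOLUTION_ORDER.index(resolution)
    return [
        name
        for name, cap in MAX_RESOLUTION.items()
        if RESOLUTION_ORDER.index(cap) >= needed
    ]


print(variants_supporting("1080p"))
```

Under these assumptions, a 1080p request narrows the choice to the two variants listed with a 4K maximum.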

Key Features

Audio & Output

Native audio: Generates SFX, ambient sound, and lip-synced dialogue jointly with the video in a single pass — no post layering. Wrap dialogue in double quotes for lip-sync.
Adaptive duration & aspect: Set duration or aspect ratio to “auto” and the model picks the optimal length and framing for the inputs.

Inputs

Three input modes: Text-to-video; image-to-video (start frame plus optional end frame); and reference-to-video.
Multimodal references: Combine up to 9 images, 3 video clips, and 3 audio files (12 files max) in one generation; reference them in-prompt as @Image1 / @Video1 / @Audio1.
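
The reference limits above can be checked before a job is submitted. The following Python sketch is one way to do that, and it also derives the in-prompt tags in upload order. `ReferenceSet` and its methods are hypothetical helpers, not part of any Seedance SDK. Note that the per-type limits add up to 15, so the 12-file total needs its own check.

```python
from dataclasses import dataclass, field

# Limits quoted in the text above.
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO, MAX_TOTAL = 9, 3, 3, 12


@dataclass
class ReferenceSet:
    """Hypothetical container for the files attached to one generation."""
    images: list[str] = field(default_factory=list)
    videos: list[str] = field(default_factory=list)
    audio: list[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any per-type or total limit is exceeded."""
        if len(self.images) > MAX_IMAGES:
            raise ValueError(f"at most {MAX_IMAGES} reference images")
        if len(self.videos) > MAX_VIDEOS:
            raise ValueError(f"at most {MAX_VIDEOS} reference videos")
        if len(self.audio) > MAX_AUDIO:
            raise ValueError(f"at most {MAX_AUDIO} reference audio files")
        total = len(self.images) + len(self.videos) + len(self.audio)
        if total > MAX_TOTAL:
            raise ValueError(f"at most {MAX_TOTAL} reference files in total")

    def tags(self) -> dict[str, str]:
        """Map each in-prompt tag (@Image1, @Video1, @Audio1, ...) to its file."""
        tagged = {}
        groups = (("Image", self.images), ("Video", self.videos), ("Audio", self.audio))
        for prefix, files in groups:
            for number, path in enumerate(files, start=1):
                tagged[f"@{prefix}{number}"] = path
        return tagged


refs = ReferenceSet(images=["hero.png", "set.png"], audio=["voice.mp3"])
refs.validate()
print(refs.tags())
```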

Motion & Camera

Director-level camera control: Dolly zooms, rack focuses, tracking shots, POV switches, and handheld movement, controlled via prompt.
Realistic physics & motion: Collisions, cloth/fabric, fluid, and character motion; handles complex action like sports, dancing, and fight scenes.
Multi-shot editing: Natural cuts and multiple shots within a single generation.
Editing & extension: Provide a reference video and describe changes (replace an object, swap a background, alter style), or describe what happens next to extend the clip while keeping characters and style consistent.

Technical Capabilities

Modalities	Text-to-video, image-to-video, reference-to-video (images + video + audio)
Duration	“auto”, or 4–15 seconds
Resolution / quality	480p / 720p / 1080p: Standard up to 1080p, Fast up to 720p. The new 4K variant outputs up to 4K (early access). Note: 4K output uses H.265/HEVC encoding and is delivered directly at 10-bit depth.
Aspect ratios	auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16
Audio	Native, synchronized — SFX, ambient, and lip-synced dialogue; on by default
Reference inputs	Up to 9 images, 3 videos, and 3 audio files (12 total)
Input image formats	jpg, jpeg, png, webp, gif, avif
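
The duration, aspect-ratio, and image-format rows above lend themselves to a simple pre-flight check. The Python sketch below is illustrative only: `check_settings` is a hypothetical helper, and whether duration must be a whole number of seconds is not stated here, so any number in the 4 to 15 range is accepted.

```python
from pathlib import Path

# Values copied from the Technical Capabilities table above.
ASPECT_RATIOS = {"auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}
IMAGE_FORMATS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}
MIN_SECONDS, MAX_SECONDS = 4, 15


def check_settings(duration, aspect_ratio, image_paths=()):
    """Return a list of problems; an empty list means the settings fit the table."""
    problems = []
    if duration != "auto":
        is_number = isinstance(duration, (int, float))
        if not is_number or not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(
                f"duration must be 'auto' or {MIN_SECONDS}-{MAX_SECONDS} s, got {duration!r}"
            )
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio!r}")
    for path in image_paths:
        # Compare the file extension case-insensitively, without the dot.
        extension = Path(path).suffix.lower().lstrip(".")
        if extension not in IMAGE_FORMATS:
            problems.append(f"unsupported image format: {path}")
    return problems


print(check_settings("auto", "16:9", ["start_frame.PNG"]))
print(check_settings(20, "2:1", ["clip.bmp"]))
```

The first call passes cleanly. The second reports three problems: the duration, the aspect ratio, and the image format.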

Limitations

Fast tier is capped at 720p — 1080p output requires the Standard tier.
Maximum clip length is 15 seconds.
Reference-to-video constraints: reference videos must total 2–15 s, each at 480p–720p; reference audio must total ≤ 15 s and must be accompanied by at least one image or video.
Only Seedance 2.0 supports 4K resolution output; Seedance 2.0 Fast and Seedance 2.0 Mini do not.
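
The reference-to-video constraints are easy to get wrong by hand, so a short check may help. The Python sketch below rests on a few assumptions: `VideoRef` and `check_reference_media` are hypothetical names, "480p–720p" is read as a frame height between 480 and 720 pixels, and the audio limit is read as 15 seconds.

```python
from dataclasses import dataclass


@dataclass
class VideoRef:
    """Hypothetical description of one reference clip."""
    seconds: float
    height: int  # vertical resolution in pixels, e.g. 480 or 720


def check_reference_media(n_images, videos, audio_seconds):
    """Return a list of violated reference-to-video constraints."""
    problems = []
    if videos:
        total_video = sum(v.seconds for v in videos)
        if not 2 <= total_video <= 15:
            problems.append(f"reference videos total {total_video} s, expected 2-15 s")
        for number, video in enumerate(videos, start=1):
            if not 480 <= video.height <= 720:
                problems.append(f"@Video{number} is {video.height}p, expected 480p-720p")
    if audio_seconds:
        total_audio = sum(audio_seconds)
        if total_audio > 15:
            problems.append(f"reference audio totals {total_audio} s, expected at most 15 s")
        if n_images == 0 and not videos:
            problems.append("reference audio needs at least one image or video")
    return problems


print(check_reference_media(1, [VideoRef(6.0, 720)], [10.0]))
print(check_reference_media(0, [], [5.0]))
```

The second call shows the audio-only case, which is rejected because no image or video accompanies the audio.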

Prompting Tips

Be specific — describe camera movement, lighting, mood, and the exact actions you want.
Wrap spoken lines in double quotes for lip-synced speech (e.g., he says: “Remember this moment.”).
Label reference assets explicitly in the prompt: @Image1, @Video1, @Audio1.
For edits, state both what to change and what to preserve.
Start with 5-second generations to nail the style, then increase the duration.
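
The tips above can be folded into a small prompt builder. The Python sketch below is one possible approach, not an official template: `build_prompt` and its parameters are hypothetical, the sentence wording is arbitrary, and it uses straight double quotes for dialogue on the assumption that they are treated the same as the curly quotes in the example above.

```python
def _sentence(text):
    """Trim the text and add a period unless it already ends a sentence or quote."""
    text = text.strip()
    return text if text.endswith((".", "!", "?", '"')) else text + "."


def build_prompt(scene, camera=None, dialogue=(), references=None, preserve=None):
    """Assemble one prompt string from the pieces the tips call out."""
    sentences = [_sentence(scene)]
    if camera:
        sentences.append(_sentence(f"Camera: {camera}"))
    for tag, role in (references or {}).items():
        # Label each reference asset explicitly, e.g. @Image1.
        sentences.append(_sentence(f"Use {tag} as {role}"))
    for speaker, line in dialogue:
        # Spoken lines go inside double quotes to request lip-synced speech.
        sentences.append(_sentence(f'{speaker} says: "{line}"'))
    if preserve:
        # For edits, say what must stay the same as well as what changes.
        sentences.append(_sentence(f"Keep unchanged: {preserve}"))
    return " ".join(sentences)


prompt = build_prompt(
    scene="A boxer rests in a dim locker room under warm tungsten light",
    camera="slow dolly in, then rack focus to his gloves",
    references={"@Image1": "the boxer's face"},
    dialogue=[("He", "Remember this moment.")],
)
print(prompt)
```

Keeping the pieces separate makes it easier to hold the style constant across short test generations and then change only the duration.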