Seedance 2.0 is a premier multimodal generation engine designed to turn users into "AI Directors." It goes beyond simple prompt-to-video generation by letting users combine a text prompt with image, audio, and video references.
Key Features
- 1 main model: Standard
- State-of-the-art generation:
  - Excellent prompt adherence and visual quality
  - Great character consistency between generations
  - Realistic physics
  - Great audio sync
- Multi-Shot: Capable of generating "one-take" tracking shots and complex "multi-shot" sequences just by prompting.
- Supports auto duration settings
- First & Last Frame Control: Specify start and end frames. The model generates a smooth transition video between them with full control over the motion process.
- Style & Motion Transfer: Transfer the editing rhythm, camera technique, or visual style of a reference video to newly generated content.
- "All-in-One" Reference (@System): Use an @ tag for each input. For example, @Image1 as the first frame, @Video1 as the camera movement reference, and @Audio1 as background music.
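As a minimal sketch of how the @ tag convention might be applied when assembling a prompt, the Python helper below numbers each reference per type and appends the tags to a scene description. The function name `build_prompt` and the `(kind, role)` pair format are illustrative assumptions, not part of any Seedance interface; only the tag format (@Image1, @Video1, @Audio1) comes from the description above.

```python
from collections import Counter


def build_prompt(description, references):
    """Compose a prompt that names each reference with an @ tag.

    `references` is a list of (kind, role) pairs, e.g. ("Image", "first frame").
    This helper is hypothetical; only the @Image1 / @Video1 / @Audio1 tag
    format is taken from the feature description.
    """
    counts = Counter()
    clauses = []
    for kind, role in references:
        counts[kind] += 1  # number tags per kind: @Image1, @Image2, ...
        clauses.append(f"@{kind}{counts[kind]} as {role}")
    return f"{description} Use {', '.join(clauses)}."


prompt = build_prompt(
    "A slow tracking shot through a rainy neon street.",
    [
        ("Image", "first frame"),
        ("Video", "camera movement reference"),
        ("Audio", "background music"),
    ],
)
print(prompt)
```

Numbering tags per type keeps each reference unambiguous when several images or clips are supplied in the same prompt.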
Technical Capabilities
- Inputs:
  - Text to Video
  - Start/End Frame
  - Reference to Video (references can be images, videos, and/or audio)
- Resolution: 480p, 720p
- Ratios: 9:16, 16:9, 1:1, 3:4, 4:3, 21:9
- Durations: Supports 4–15 seconds per generation and Auto Duration
- Asset Limits: Up to 9 reference images, 3 video clips (each 2–15s, total ≤ 15s), and 3 audio files. At least one multimodal input is required.
- Models: Seedance 2.0, Seedance 2.0 Fast
- Language Support: English, Chinese, Japanese, Korean, Spanish, Indonesian, Portuguese, French, German
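The output settings listed above can be checked before a request is submitted. The following is a rough sketch under stated assumptions: the function `check_settings`, its parameter names, and the use of the string "auto" for Auto Duration are all hypothetical. Only the allowed values themselves come from the list above.

```python
# Allowed values copied from the capability list; everything else here is
# an illustrative assumption, not a documented interface.
ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_RATIOS = {"9:16", "16:9", "1:1", "3:4", "4:3", "21:9"}
MIN_SECONDS, MAX_SECONDS = 4, 15


def check_settings(resolution, ratio, duration):
    """Return a list of problems; an empty list means the settings fit.

    `duration` is a number of seconds, or the string "auto" (assumed
    spelling) to request Auto Duration.
    """
    problems = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if ratio not in ALLOWED_RATIOS:
        problems.append(f"unsupported ratio: {ratio}")
    if duration != "auto":
        is_number = isinstance(duration, (int, float))
        if not is_number or not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(f"duration must be 4-15 seconds or 'auto': {duration}")
    return problems


print(check_settings("720p", "16:9", 10))
print(check_settings("1080p", "2:1", 20))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid setting at once.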
Limitations
- No real human face uploads: Uploading images or videos containing real, recognizable human faces is not supported due to platform compliance requirements. The system will automatically block such content.
- Audio-only input not yet supported: At least one image or video must be provided if referencing audio. Audio references are described via prompt text rather than direct upload.
- Video reference constraints: Each reference video must be 2–15 seconds; combined total of all reference videos cannot exceed 15 seconds.
- Recommended shot limit: Keep prompts to 3–5 shots for best per-shot detail quality; more shots reduce individual scene fidelity.
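The reference limits described in this document (asset counts, per-clip and total video length, and the rule that audio cannot be the only reference) can be expressed as a small pre-flight check. This is a sketch only: `check_references` and its parameters are hypothetical names, and the real platform may enforce these rules differently.

```python
# Limits taken from the asset limits and limitations described in the text.
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO = 9, 3, 3
MIN_CLIP_S, MAX_CLIP_S, MAX_TOTAL_S = 2, 15, 15


def check_references(num_images, video_lengths, num_audio):
    """Return a list of problems with a set of reference assets.

    `video_lengths` holds the length in seconds of each reference video.
    Hypothetical helper; not part of any official interface.
    """
    problems = []
    if num_images > MAX_IMAGES:
        problems.append(f"too many images: {num_images} (max {MAX_IMAGES})")
    if len(video_lengths) > MAX_VIDEOS:
        problems.append(f"too many videos: {len(video_lengths)} (max {MAX_VIDEOS})")
    for i, secs in enumerate(video_lengths, start=1):
        if not MIN_CLIP_S <= secs <= MAX_CLIP_S:
            problems.append(f"video {i} is {secs}s; each clip must be 2-15s")
    if sum(video_lengths) > MAX_TOTAL_S:
        problems.append(f"videos total {sum(video_lengths)}s (max {MAX_TOTAL_S}s)")
    if num_audio > MAX_AUDIO:
        problems.append(f"too many audio files: {num_audio} (max {MAX_AUDIO})")
    # Audio-only input is not supported: audio needs an image or a video.
    if num_audio > 0 and num_images == 0 and not video_lengths:
        problems.append("audio requires at least one image or video reference")
    return problems


print(check_references(2, [5, 6], 1))
print(check_references(0, [], 1))
```

Note that two clips can each be within the 2–15 second range and still fail the combined 15-second cap, which is why the total is checked separately from the per-clip length.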