VEED's Fabric 1.0 AI Avatar model transforms static images into talking videos, prioritizing lip-sync precision over general motion capabilities. The model is audio-driven: it accepts any image and audio as inputs, and animates the image to match the speech.
It is built for avatar creation and video use cases where realistic speech animation matters more than general motion and camera movement.
Fabric is the most life-like avatar model currently available.
Key Features
- Image to Avatar: Converts any static image into a dynamic video where the subject appears to speak the provided audio.
- Audio-Driven Lip Synchronization: Precisely matches mouth movements to the phonetics and timing of the input audio, ensuring natural speech animation.
- Full One-minute Outputs: Creates up to one full minute of continuous video.
- Studio-Grade Output: Produces high-quality, realistic videos suitable for professional use, especially talking heads, product commercials, and user-generated content (UGC).
- Non-Human Faces: Brings animals, animated characters, and other abstract faces to life.
Technical Capabilities
- Modalities: Image-and-audio-to-Video (I2V)
- Resolution: 480p, 720p
- Durations: Up to 60 seconds
- Asset Limits: Requires one image and one audio file
- Language Support: All languages
- Input Formats (Images): JPG, JPEG, PNG, WebP, GIF, AVIF
- Input Formats (Audio): MP3, OGG, WAV, M4A, AAC
- Output Formats: MP4 video
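The limits listed above can be checked before a request is submitted. The following is a minimal sketch of such a pre-flight check, not part of VEED's API: the function name `validate_request`, its parameters, and the constant names are hypothetical, and it assumes the 60-second limit applies to the input audio because the output is audio-driven. It only inspects file extensions, so it does not confirm that a file is actually encoded in the named format.

```python
from pathlib import Path

# Values copied from the capability list above.
# All names here are illustrative and do not come from VEED's API.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}
AUDIO_EXTENSIONS = {".mp3", ".ogg", ".wav", ".m4a", ".aac"}
RESOLUTIONS = {"480p", "720p"}
MAX_DURATION_SECONDS = 60.0  # assumption: the limit applies to the audio length


def validate_request(image_path, audio_path, resolution, audio_seconds):
    """Return a list of problems; an empty list means the inputs fit the listed limits."""
    problems = []
    # Extension checks are case-insensitive and look at the file name only.
    if Path(image_path).suffix.lower() not in IMAGE_EXTENSIONS:
        problems.append(f"unsupported image format: {image_path}")
    if Path(audio_path).suffix.lower() not in AUDIO_EXTENSIONS:
        problems.append(f"unsupported audio format: {audio_path}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if not 0 < audio_seconds <= MAX_DURATION_SECONDS:
        problems.append(f"audio must be over 0 and at most 60 seconds, got {audio_seconds}")
    return problems
```

Calling `validate_request("face.png", "speech.mp3", "720p", 42.0)` returns an empty list, while an unsupported extension, resolution, or an over-length clip each add one message to the result.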
Limitations
- Camera and general movements are static: the model specializes in audio-driven lip synchronization and does not offer broad animation capabilities like camera movements or scene dynamics.
- Max resolution is 720p
- Supports a single speaker only
- No text input: the output cannot be directed with prompts and is driven only by the audio input