Choose the language the avatar will speak. The AI voice and lip-sync follow this choice.
Clear front-facing face photo, good lighting. Product photo optional.
Longer durations use a higher quality tier. Script and prompt limits scale with duration.
How should the avatar's voice be produced?
Voice script = what the avatar says. Video prompt = scene & actions. Limits depend on duration.
Keys are stored server-side only. Both are used: SiliconFlow for voice (TTS), WaveSpeed for the talking-head video.