Every Clipus video runs through three phases. You can watch each one progress in the dashboard.
The three phases
- Phase 1: Strategy + Script (target ≤ 45s p95). DOM analysis, marketing strategy, script + evaluator.
- Phase 2: Voice + Music + Subtitles + Blueprint (target ≤ 20s p95). ElevenLabs voiceover, background music, Whisper subtitles, render blueprint.
- Edge: Client Render (target ≤ 25s p95). Client-side WebCodecs render in the browser.
Total target: ≤ 90s p95.
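The per-phase targets add up to the 90s total (45 + 20 + 25). The following is a minimal TypeScript sketch of how you might check a batch of runs against those targets. It assumes you have per-run timings in seconds for each phase; the `PhaseTimings` shape and all function names are hypothetical, not a Clipus API. Percentiles are not additive. The end-to-end p95 is computed over each run's total, and it can differ from the sum of the per-phase p95s.

```typescript
// Per-phase p95 targets in seconds, taken from the phase list.
const PHASE_TARGETS_S = {
  phase1: 45, // Strategy + Script
  phase2: 20, // Voice, music, subtitles, blueprint
  edge: 25, // Client render
} as const;

type Phase = keyof typeof PHASE_TARGETS_S;
// Hypothetical shape: one measured run, seconds spent in each phase.
type PhaseTimings = Record<Phase, number>;

const PHASES = Object.keys(PHASE_TARGETS_S) as Phase[];
const TOTAL_TARGET_S = PHASES.reduce((sum: number, p) => sum + PHASE_TARGETS_S[p], 0); // 90

// Nearest-rank p95: the smallest sample with at least 95% of samples <= it.
function p95(samples: number[]): number {
  if (samples.length === 0) throw new Error("p95 needs at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // 1-based rank, integer math
  return sorted[rank - 1];
}

// Phases whose p95 across all runs exceeds that phase's target.
function phasesOverTarget(runs: PhaseTimings[]): Phase[] {
  return PHASES.filter((p) => p95(runs.map((r) => r[p])) > PHASE_TARGETS_S[p]);
}

// End-to-end p95: computed over each run's total, not by adding phase p95s.
function endToEndP95(runs: PhaseTimings[]): number {
  return p95(runs.map((r) => PHASES.reduce((sum: number, p) => sum + r[p], 0)));
}
```

Here is an example with 20 runs, where Edge is fixed at 20s. One run has a slow Phase 1 (60s instead of 30s), and a different run has a slow Phase 2 (30s instead of 10s). Each phase's p95 stays at its baseline (30s + 10s + 20s = 60s), but the end-to-end p95 is 80s.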
Phase 1 — Strategy + Script
A planner agent reads your page's DOM and proposes three marketing strategies. An evaluator agent scores them against a rubric (hook strength, specificity, pacing, CTA clarity), and the script for the winning strategy enters Phase 2.
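You can picture the selection step as a weighted score over the four rubric criteria. This TypeScript sketch assumes a 0-10 score per criterion and equal weights. The scale, the weights, the tie-breaking rule, and all names here are hypothetical, not Clipus's actual evaluator.

```typescript
// Rubric criteria from the text; the 0-10 scale and weights are assumptions.
type Criterion = "hookStrength" | "specificity" | "pacing" | "ctaClarity";

interface Candidate {
  strategy: string; // hypothetical label for the proposed strategy
  script: string;
  scores: Record<Criterion, number>; // assumed 0-10 per criterion
}

// Equal weights as a placeholder; a real evaluator may weight criteria differently.
const WEIGHTS: Record<Criterion, number> = {
  hookStrength: 1,
  specificity: 1,
  pacing: 1,
  ctaClarity: 1,
};

// Weighted sum of the rubric scores for one candidate.
function totalScore(c: Candidate): number {
  return (Object.keys(WEIGHTS) as Criterion[]).reduce(
    (sum: number, k) => sum + WEIGHTS[k] * c.scores[k],
    0,
  );
}

// Highest total wins; on a tie the earlier candidate is kept.
function pickWinner(candidates: Candidate[]): Candidate {
  if (candidates.length === 0) throw new Error("no candidates to evaluate");
  return candidates.reduce((best, c) => (totalScore(c) > totalScore(best) ? c : best));
}
```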
Phase 2 — Voice + Music + Subtitles + Blueprint
The voice step generates the voiceover. On the Scale plan and above, AI Music Supervisor can generate a voiceover-safe instrumental background track for the video. Lower plans, and any run that hits a quota cap, use Clipus static or curated music instead. The subtitle step transcribes the voiceover back into timed subtitles. A blueprint compiler then emits the render plan: scene list, durations, transitions, and audio mix.
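To make the render plan concrete, here is a hypothetical TypeScript shape for a blueprint that holds the four parts named above (scene list, durations, transitions, audio mix). It also includes helpers that lay the scenes out on a timeline. The field names, the transition set, and the gain values are illustrative assumptions. The timeline math also assumes that scenes play back to back with no transition overlap.

```typescript
// Hypothetical blueprint types; not Clipus's real schema.
type Transition = "cut" | "crossfade" | "slide";

interface Scene {
  id: string;
  durationMs: number;
  transitionIn: Transition; // transition into this scene
}

interface AudioMix {
  voiceoverGainDb: number;
  musicGainDb: number; // illustrative: music mixed under the voiceover
  musicSource: "ai-generated" | "static-or-curated";
}

interface RenderBlueprint {
  scenes: Scene[];
  audio: AudioMix;
}

// Start offset of each scene, assuming back-to-back playback.
function sceneStartsMs(bp: RenderBlueprint): number[] {
  const starts: number[] = [];
  let t = 0;
  for (const scene of bp.scenes) {
    starts.push(t);
    t += scene.durationMs;
  }
  return starts;
}

function totalDurationMs(bp: RenderBlueprint): number {
  return bp.scenes.reduce((sum: number, s) => sum + s.durationMs, 0);
}

// Basic sanity checks a compiler might run before handing the plan to the renderer.
function validateBlueprint(bp: RenderBlueprint): string[] {
  const problems: string[] = [];
  if (bp.scenes.length === 0) problems.push("blueprint has no scenes");
  for (const s of bp.scenes) {
    if (!(s.durationMs > 0)) problems.push(`scene ${s.id} has non-positive duration`);
  }
  return problems;
}
```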
For report-backed videos, Studio also shows an Output Capability panel before generation. Use it to confirm the proof posture, allowed polish, audio source, plan status, and risk label.
Phase Edge — Client Render
Your browser renders the final video with WebCodecs, which can use hardware-accelerated encoding. On browsers that support WebCodecs, no server-side FFmpeg sits in the critical path. The finished video lands in your dashboard, ready to publish.
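A client-side renderer typically feature-detects WebCodecs before committing to it. This TypeScript sketch looks for the standard `VideoEncoder` global and calls its `isConfigSupported()` static method, leaving hardware acceleration at the browser's default. Several details are illustrative assumptions: the codec string, the resolution, the function names, and the `"server-ffmpeg"` label. That label stands in for the server worker fallback described under "What can slow it down". The encoder is passed in as a parameter so the sketch compiles without DOM typings and can be tested with a stub.

```typescript
type RenderPath = "webcodecs" | "server-ffmpeg"; // hypothetical labels

// Minimal slice of the WebCodecs VideoEncoder static API used here.
interface VideoEncoderCtor {
  isConfigSupported(config: {
    codec: string;
    width: number;
    height: number;
  }): Promise<{ supported?: boolean }>;
}

// Returns the browser's VideoEncoder if WebCodecs is present, else undefined.
function lookupVideoEncoder(): VideoEncoderCtor | undefined {
  return (globalThis as unknown as { VideoEncoder?: VideoEncoderCtor }).VideoEncoder;
}

async function pickRenderPath(encoder: VideoEncoderCtor | undefined): Promise<RenderPath> {
  // No WebCodecs at all (for example, an older browser): use the server worker.
  if (encoder === undefined) return "server-ffmpeg";
  try {
    const support = await encoder.isConfigSupported({
      codec: "avc1.42001f", // H.264 Baseline, level 3.1 (illustrative choice)
      width: 1280,
      height: 720,
    });
    return support.supported === true ? "webcodecs" : "server-ffmpeg";
  } catch {
    // isConfigSupported rejects on invalid configs; treat that as unsupported.
    return "server-ffmpeg";
  }
}

// In the browser: pickRenderPath(lookupVideoEncoder()).then((path) => console.log(path));
```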
What can slow it down
- Heavy SaaS pages (Datadog, HubSpot) take longer in Phase 1 because their DOMs are larger.
- Voice generation queue spikes can add 5-15 seconds to Phase 2.
- AI-generated music is plan-limited. If a plan or quota cap blocks generation, Clipus falls back to static background music instead of blocking the video.
- Browsers without WebCodecs support (typically older ones) fall back to a server-side FFmpeg worker, which is slower but still completes the render.
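The music fallback described above reduces to a small decision that always returns a usable source rather than failing. In this TypeScript sketch, only the Scale tier comes from the text. The other plan names, their ordering, and the track quota are hypothetical placeholders.

```typescript
// Hypothetical plan tiers, lowest to highest; only "scale" is named in the docs.
const PLAN_TIERS = ["starter", "growth", "scale", "enterprise"] as const;
type Plan = (typeof PLAN_TIERS)[number];

type MusicSource = "ai-generated" | "static-or-curated";

// Decide the background-music source for one video. This never throws:
// any plan or quota block degrades to Clipus static or curated music,
// so the video still renders.
function chooseMusicSource(plan: Plan, aiTracksUsed: number, aiTrackQuota: number): MusicSource {
  const planAllowsAi = PLAN_TIERS.indexOf(plan) >= PLAN_TIERS.indexOf("scale");
  const quotaRemaining = aiTracksUsed < aiTrackQuota;
  return planAllowsAi && quotaRemaining ? "ai-generated" : "static-or-curated";
}
```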