Every Clipus video runs through three phases. You can watch each one progress in the dashboard.
The three phases
- Phase 1 — Strategy + Script (target ≤ 45s p95)
  DOM analysis, marketing strategy, script + evaluator.
- Phase 2 — Voice + Subtitles + Blueprint (target ≤ 20s p95)
  ElevenLabs VO, Whisper subtitles, render blueprint.
- Edge — Client Render (target ≤ 25s p95)
  WebCodecs client-side render in browser.
Total target: ≤ 90s p95.
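The per-phase targets add up to the total (45 + 20 + 25 = 90). As a rough illustration, the budgets can be written down and compared against measured timings. This is a minimal TypeScript sketch; the names `PHASE_TARGETS_S` and `phasesOverBudget` are hypothetical and not part of any Clipus API.

```typescript
// Per-phase p95 targets in seconds, copied from the list above.
const PHASE_TARGETS_S = {
  phase1: 45, // Strategy + Script
  phase2: 20, // Voice + Subtitles + Blueprint
  edge: 25,   // Client Render
};

type PhaseName = keyof typeof PHASE_TARGETS_S;

// Sum of the three per-phase targets.
const TOTAL_TARGET_S: number =
  PHASE_TARGETS_S.phase1 + PHASE_TARGETS_S.phase2 + PHASE_TARGETS_S.edge;

// Hypothetical helper: returns the phases whose measured duration (seconds)
// is strictly greater than its target.
function phasesOverBudget(measured: Record<PhaseName, number>): PhaseName[] {
  const names: PhaseName[] = ["phase1", "phase2", "edge"];
  return names.filter((name) => measured[name] > PHASE_TARGETS_S[name]);
}
```

For example, a run with a 32-second Phase 2 (say, during an ElevenLabs queue spike) would be reported as over budget for `phase2` only, even if the other two phases finished on target.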
Phase 1 — Strategy + Script
A planner agent reads your DOM and proposes three marketing strategies. An evaluator agent scores them against a rubric (hook strength, specificity, pacing, CTA clarity). The winning script enters Phase 2.
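The selection step can be pictured as scoring each candidate on the four rubric criteria and keeping the best one. The sketch below is illustrative only: the 0-10 scale, the equal weighting, and the names `ScoredStrategy`, `totalScore`, and `pickWinner` are assumptions, since the actual rubric weights are not documented here.

```typescript
// The four rubric criteria named above.
type Criterion = "hookStrength" | "specificity" | "pacing" | "ctaClarity";

const CRITERIA: Criterion[] = ["hookStrength", "specificity", "pacing", "ctaClarity"];

// Hypothetical shape for one evaluated strategy.
interface ScoredStrategy {
  name: string;
  scores: Record<Criterion, number>; // assumed 0-10 per criterion
}

// Assumption: every criterion carries equal weight.
function totalScore(strategy: ScoredStrategy): number {
  return CRITERIA.reduce((sum, c) => sum + strategy.scores[c], 0);
}

// Returns the highest-scoring candidate; on a tie the earlier candidate wins.
function pickWinner(candidates: ScoredStrategy[]): ScoredStrategy {
  if (candidates.length === 0) {
    throw new Error("at least one candidate is required");
  }
  return candidates.reduce((best, current) =>
    totalScore(current) > totalScore(best) ? current : best,
  );
}
```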
Phase 2 — Voice + Subtitles + Blueprint
ElevenLabs generates the voiceover. Whisper transcribes it back to timed subtitles. A blueprint compiler emits the render plan (scene list, durations, transitions).
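To make the render plan more concrete, here is one way a blueprint with a scene list, durations, and transitions could be modeled. The real Clipus blueprint format is not shown in this document, so every field name and the set of transition values below are assumptions.

```typescript
// Assumed transition names; the real set is not documented here.
type Transition = "cut" | "fade" | "slide";

// Hypothetical scene entry in the render plan.
interface Scene {
  id: string;
  durationMs: number;
  transitionIn: Transition;
}

// Hypothetical blueprint: an ordered list of scenes.
interface Blueprint {
  scenes: Scene[];
}

// Total video length is the sum of the scene durations.
function totalDurationMs(bp: Blueprint): number {
  return bp.scenes.reduce((sum, scene) => sum + scene.durationMs, 0);
}

// Start offset of each scene on the timeline, in milliseconds.
function sceneStartTimes(bp: Blueprint): number[] {
  const starts: number[] = [];
  let offset = 0;
  for (const scene of bp.scenes) {
    starts.push(offset);
    offset += scene.durationMs;
  }
  return starts;
}

// Index of the scene playing at time tMs, or -1 if tMs is outside the video.
// Useful for mapping a timed subtitle cue onto the scene it appears in.
function sceneIndexAt(bp: Blueprint, tMs: number): number {
  if (tMs < 0) return -1;
  let end = 0;
  let index = 0;
  for (const scene of bp.scenes) {
    end += scene.durationMs;
    if (tMs < end) return index;
    index++;
  }
  return -1;
}
```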
Phase Edge — Client Render
Your browser renders the final video with WebCodecs, which uses hardware acceleration where the device supports it. Server-side FFmpeg is not in the critical path. The result lands in your dashboard, ready to publish.
What can slow it down
- Heavy SaaS pages (for example Datadog or HubSpot) take longer in Phase 1 because the DOM is larger.
- ElevenLabs queue spikes can add 5-15 seconds to Phase 2.
- Older browsers without WebCodecs support fall back to the FFmpeg server worker, which is slower but still completes the render.
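The browser fallback in the last point comes down to feature detection. A minimal sketch of how a page might decide between the two render paths is shown below. `VideoEncoder.isConfigSupported` is the standard WebCodecs check; the function name `chooseRenderPath`, the `RenderPath` labels, and the H.264 720p probe configuration are assumptions for illustration, not Clipus's actual logic.

```typescript
type RenderPath = "webcodecs" | "ffmpeg-server";

// Minimal structural type for the part of WebCodecs used here, so the
// sketch type-checks without DOM typings and can be exercised with a stub.
interface EncoderLike {
  isConfigSupported(config: {
    codec: string;
    width: number;
    height: number;
  }): Promise<{ supported?: boolean }>;
}

interface ScopeLike {
  VideoEncoder?: EncoderLike;
}

// Picks the client render path when WebCodecs exists and accepts the probe
// configuration; otherwise falls back to the server worker.
async function chooseRenderPath(
  scope: ScopeLike = globalThis as unknown as ScopeLike,
): Promise<RenderPath> {
  const encoder = scope.VideoEncoder;
  if (!encoder) return "ffmpeg-server"; // no WebCodecs in this browser

  try {
    // Assumed probe: H.264 Baseline at 1280x720.
    const result = await encoder.isConfigSupported({
      codec: "avc1.42001f",
      width: 1280,
      height: 720,
    });
    return result.supported ? "webcodecs" : "ffmpeg-server";
  } catch {
    // Treat any error from the support check as "not supported".
    return "ffmpeg-server";
  }
}
```

Passing the scope in as a parameter keeps the check testable outside a browser; in a page, calling `chooseRenderPath()` with no arguments inspects the real global object.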