Reference · Updated 2026-05-23 · 2 min read

Video Generation Pipeline Overview

What happens between pasting a URL and getting a ready-to-publish video: Strategy, Voice, Music, Subtitles, and Edge Render.

Every Clipus video runs through three phases. You can watch each one progress in the dashboard.

The three phases

  • Phase 1: Strategy + Script (target ≤ 45s p95)

    DOM analysis, marketing strategy, script writing, and evaluation.

  • Phase 2: Voice + Music + Subtitles + Blueprint (target ≤ 20s p95)

    ElevenLabs voiceover, background music, Whisper subtitles, and the render blueprint.

  • Edge: Client Render (target ≤ 25s p95)

    Client-side WebCodecs render in your browser.

Total target: ≤ 90s p95.
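The per-phase targets add up to the total (45 + 20 + 25 = 90s). One subtlety: percentiles are not additive, so the total should be checked against each video's end-to-end time rather than by summing the per-phase p95 values. The sketch below shows such a check, assuming each run records the seconds spent in every phase. `RunTiming`, `p95`, and `overBudget` are hypothetical names, not part of Clipus.

```typescript
// Hypothetical budget check; the target values come from this article.
type Phase = "phase1" | "phase2" | "edge";
type RunTiming = Record<Phase, number>; // seconds spent in each phase for one video

const PHASE_TARGETS_S: Record<Phase, number> = { phase1: 45, phase2: 20, edge: 25 };
const TOTAL_TARGET_S = 90;
const PHASES: Phase[] = ["phase1", "phase2", "edge"];

// Nearest-rank 95th percentile of a non-empty list.
function p95(values: number[]): number {
  if (values.length === 0) throw new Error("p95 needs at least one value");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Returns the names of every budget whose p95 exceeds its target.
function overBudget(runs: RunTiming[]): string[] {
  const failures: string[] = PHASES.filter(
    (phase) => p95(runs.map((run) => run[phase])) > PHASE_TARGETS_S[phase],
  );
  // The total is measured per run, not by adding the per-phase p95 values.
  const totals = runs.map((run) => PHASES.reduce((sum, phase) => sum + run[phase], 0));
  if (p95(totals) > TOTAL_TARGET_S) failures.push("total");
  return failures;
}
```

For example, a single run of 30s, 40s, and 15s reports only `phase2`: the voice phase is over its 20s target, while the 85s total stays within 90s.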

Phase 1 — Strategy + Script

A planner agent reads your page's DOM and proposes three marketing strategies. An evaluator agent scores them against a rubric (hook strength, specificity, pacing, CTA clarity), and the winning strategy's script moves on to Phase 2.
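The selection step can be sketched as a rubric score followed by an arg-max over the candidates. This sketch assumes every criterion is scored on the same numeric scale and weighted equally; the article does not say how the evaluator scales or weights its rubric. `ScoredStrategy`, `rubricScore`, and `pickWinner` are hypothetical names.

```typescript
// Criteria names come from the rubric above; equal weighting is an assumption.
type Criterion = "hookStrength" | "specificity" | "pacing" | "ctaClarity";
type Scores = Record<Criterion, number>;

interface ScoredStrategy {
  name: string;
  scores: Scores;
}

const CRITERIA: Criterion[] = ["hookStrength", "specificity", "pacing", "ctaClarity"];

// Mean score across all rubric criteria.
function rubricScore(scores: Scores): number {
  const total = CRITERIA.reduce((sum, criterion) => sum + scores[criterion], 0);
  return total / CRITERIA.length;
}

// Returns the highest-scoring strategy; on a tie, the earlier candidate wins.
function pickWinner(candidates: ScoredStrategy[]): ScoredStrategy {
  if (candidates.length === 0) throw new Error("no strategies to evaluate");
  return candidates.reduce((best, candidate) =>
    rubricScore(candidate.scores) > rubricScore(best.scores) ? candidate : best,
  );
}
```

A real evaluator might weight hook strength more heavily or reject any script below a floor on CTA clarity. Either rule fits the same shape by changing `rubricScore`.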

Phase 2 — Voice + Music + Subtitles + Blueprint

The voice step generates the voiceover with ElevenLabs. On the Scale plan and above, AI Music Supervisor can generate a voiceover-safe instrumental background track for the video. Lower plans, and videos that hit a quota cap, use Clipus static or curated music instead. The subtitle step transcribes the voiceover with Whisper into timed subtitles. A blueprint compiler emits the render plan: the scene list, durations, transitions, and audio mix.
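At its core, a render plan is a timeline. The sketch below assumes a simple blueprint shape, since the article does not document the real format. It lays scenes end to end and records where each one starts. `SceneInput`, `Blueprint`, `compileBlueprint`, the gain values, and the "music below voiceover" check are all hypothetical. Crossfades are treated as labels at scene boundaries rather than overlaps.

```typescript
// Hypothetical blueprint shape: scene list, durations, transitions, audio mix.
type Transition = "cut" | "crossfade";

interface SceneInput {
  id: string;
  durationS: number;
  transitionIn: Transition;
}

interface PlacedScene extends SceneInput {
  startS: number; // offset from the start of the video, in seconds
}

interface AudioMix {
  voiceoverGain: number;
  musicGain: number;
}

interface Blueprint {
  scenes: PlacedScene[];
  totalDurationS: number;
  audio: AudioMix;
}

// Places scenes back to back and validates the (assumed) audio mix rule.
function compileBlueprint(scenes: SceneInput[], audio: AudioMix): Blueprint {
  // Assumed rule: background music must sit below the voiceover.
  if (audio.musicGain >= audio.voiceoverGain) {
    throw new Error("music gain must be lower than voiceover gain");
  }
  const placed: PlacedScene[] = [];
  let cursor = 0;
  for (const scene of scenes) {
    if (!(scene.durationS > 0)) throw new Error(`scene ${scene.id} needs a positive duration`);
    placed.push({ ...scene, startS: cursor });
    cursor += scene.durationS;
  }
  return { scenes: placed, totalDurationS: cursor, audio };
}
```

For scenes of 3s, 4.5s, and 2.5s, the compiled plan starts them at 0s, 3s, and 7.5s, with a total of 10s.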

For report-backed videos, Studio also shows an Output Capability panel before generation. Use it to confirm the proof posture, allowed polish, audio source, plan status, and risk label.

Phase Edge — Client Render

Your browser renders the final video with hardware-accelerated WebCodecs. In supported browsers, server-side FFmpeg stays out of the critical path. The result lands in your dashboard, ready to publish.
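Before rendering locally, a client can check that WebCodecs is present and accepts the intended encoder settings. If either check fails, it uses the server-side FFmpeg worker that older browsers fall back to. `VideoEncoder.isConfigSupported` is part of the WebCodecs API. The rest of this sketch is an assumption: H.264 at 720p30 stands in for whatever settings Clipus actually uses, and `chooseRenderPath`, `VideoEncoderLike`, and `ASSUMED_CONFIG` are hypothetical names.

```typescript
type RenderPath = "webcodecs" | "server-ffmpeg";

// Minimal structural view of the static VideoEncoder method used here.
interface VideoEncoderLike {
  isConfigSupported(config: Record<string, unknown>): Promise<{ supported?: boolean }>;
}

// Assumed output settings: H.264 Baseline (level 3.1) at 1280x720, 30 fps.
const ASSUMED_CONFIG = {
  codec: "avc1.42001f",
  width: 1280,
  height: 720,
  bitrate: 5_000_000,
  framerate: 30,
  hardwareAcceleration: "prefer-hardware",
};

// Picks the client-side path only when the browser confirms the config is supported.
async function chooseRenderPath(
  encoder: VideoEncoderLike | undefined = (globalThis as unknown as { VideoEncoder?: VideoEncoderLike })
    .VideoEncoder,
): Promise<RenderPath> {
  if (encoder === undefined) return "server-ffmpeg"; // no WebCodecs in this browser
  try {
    const { supported } = await encoder.isConfigSupported(ASSUMED_CONFIG);
    return supported === true ? "webcodecs" : "server-ffmpeg";
  } catch {
    return "server-ffmpeg"; // an invalid config rejects; treat it as unsupported
  }
}
```

Taking the encoder as a parameter, with the browser global as the default, lets the same check run against a stub outside the browser.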

What can slow it down

  • Large SaaS pages (Datadog, HubSpot) take longer in Phase 1 because their DOM is bigger.
  • Voice generation queue spikes can add 5-15 seconds to Phase 2.
  • AI-generated music is plan-limited. If a plan or quota cap blocks generation, Clipus falls back to static background music instead of blocking the video.
  • Older browsers without WebCodecs support fall back to the server-side FFmpeg worker, which is slower but still completes the render.
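The music fallback rule never blocks a video. It only changes where the background track comes from. The sketch below assumes the plan entitlement and the remaining quota are both known before Phase 2 starts. `MusicContext`, `chooseMusicSource`, and the source labels are hypothetical.

```typescript
// Hypothetical labels for the two music paths described in this article.
type MusicSource = "ai-music-supervisor" | "clipus-library";

interface MusicContext {
  planIncludesAiMusic: boolean; // true on the Scale plan and above
  aiMusicQuotaRemaining: number; // generations left in the current period (assumed unit)
}

// Uses AI-generated music only when the plan and quota both allow it.
function chooseMusicSource(ctx: MusicContext): MusicSource {
  if (ctx.planIncludesAiMusic && ctx.aiMusicQuotaRemaining > 0) {
    return "ai-music-supervisor";
  }
  // Plan or quota cap: use a Clipus static or curated track instead of failing.
  return "clipus-library";
}
```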