Frequently asked
How does ViralMint pick which moments to clip?
After Whisper produces a word-level transcript, an AI scorer evaluates every candidate window against a virality rubric (hook strength, emotional peak, actionable tip, contrarian claim, story loop, number promise, shocking claim, curiosity gap, etc.) and assigns each a 0–10 score. The picker prefers high-score clips while spreading hook types so you don't end up with five curiosity_gaps in a row. Each produced clip carries its score, its hook_type label, and a one-line virality_reason explaining the pick.
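The trade-off between high scores and varied hook types can be sketched as a greedy pick with a repeat penalty. This is only an illustration, not ViralMint's actual picker: the Candidate class, the pick_clips function, and the repeat_penalty knob are hypothetical, and only the score, hook_type, and virality_reason fields come from the description above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    start: float          # seconds into the source
    end: float            # seconds into the source
    score: float          # 0-10 rubric score from the AI scorer
    hook_type: str        # e.g. "curiosity_gap", "contrarian"
    virality_reason: str  # one-line explanation of the pick


def pick_clips(candidates, count, repeat_penalty=1.5):
    """Greedily pick `count` clips, discounting hook types already used.

    repeat_penalty is a hypothetical knob: the number of points
    subtracted for each already-chosen clip with the same hook_type.
    """
    remaining = list(candidates)
    chosen = []
    used = Counter()
    while remaining and len(chosen) < count:
        # Adjusted score = raw score minus a penalty per repeated hook type.
        best = max(
            remaining,
            key=lambda c: c.score - repeat_penalty * used[c.hook_type],
        )
        chosen.append(best)
        used[best.hook_type] += 1
        remaining.remove(best)
    return chosen
```

With a penalty of 1.5, a second curiosity_gap clip scored 8.5 loses to a contrarian clip scored 8.0, because its adjusted score drops to 7.0. The sketch ignores overlapping time windows, which a real picker would also need to handle.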
What length does the AI pick clips at?
Default is 15–60 seconds, the sweet spot for TikTok / Shorts / Reels. You can override the min and max in the Clip Studio settings (min_duration ≥ 10s, max_duration ≥ 15s, min < max). For podcasts you may want max_duration: 75 to keep a complete argument intact; for fast-paced talking-head content, max_duration: 30 is often enough.
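The three constraints on the duration settings can be expressed as a small check. This is a minimal sketch: the validate_durations function is hypothetical, and only the min_duration and max_duration names, their limits, and the 15–60 second defaults come from the answer above.

```python
def validate_durations(min_duration=15, max_duration=60):
    """Return a list of problems with the given settings (empty if valid).

    Both values are in seconds. The defaults mirror the stated
    15-60 second default range.
    """
    errors = []
    if min_duration < 10:
        errors.append("min_duration must be at least 10 seconds")
    if max_duration < 15:
        errors.append("max_duration must be at least 15 seconds")
    if min_duration >= max_duration:
        errors.append("min_duration must be less than max_duration")
    return errors
```

For example, validate_durations(15, 75) returns an empty list for the podcast setting, while validate_durations(30, 30) reports that min must be less than max.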
Can it handle landscape source videos?
Yes. Enable force_vertical to convert a 16:9 source to 9:16, with a blur-fill backdrop on each produced clip. Without that flag, ViralMint preserves the source aspect ratio. Most podcast hosts record with 16:9 cameras and want 9:16 output, which is exactly the case force_vertical handles.
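A blur-fill conversion of this kind is commonly built from FFmpeg's split, scale, crop, boxblur, and overlay filters. The sketch below shows one way to assemble such a command; it is not necessarily the filter graph ViralMint uses, and the blur_fill_command function, the 1080x1920 output size, and the blur radius are assumptions for illustration.

```python
def blur_fill_command(src, dst, width=1080, height=1920, blur_radius=20):
    """Build an FFmpeg command that letterboxes `src` over a blurred copy of itself.

    width/height/blur_radius are assumed values, not ViralMint settings.
    """
    filter_graph = (
        # Duplicate the video stream: one copy for the backdrop, one for the foreground.
        "[0:v]split=2[bg][fg];"
        # Backdrop: scale up until the frame is covered, crop to size, then blur.
        f"[bg]scale={width}:{height}:force_original_aspect_ratio=increase,"
        f"crop={width}:{height},boxblur={blur_radius}[blurred];"
        # Foreground: scale down until the whole source fits inside the frame.
        f"[fg]scale={width}:{height}:force_original_aspect_ratio=decrease[sharp];"
        # Center the sharp foreground on top of the blurred backdrop.
        "[blurred][sharp]overlay=(W-w)/2:(H-h)/2[out]"
    )
    return [
        "ffmpeg", "-y", "-i", src,
        "-filter_complex", filter_graph,
        "-map", "[out]",
        "-map", "0:a?",   # keep the audio track if the source has one
        "-c:a", "copy",
        dst,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine with FFmpeg installed. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.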
How long does extraction actually take?
Whisper transcription dominates: ~1 minute of CPU time per 5–8 minutes of source on a typical laptop. After that, clip selection is fast (seconds), and rendering each clip with captions runs ~5–15 seconds per produced clip. End-to-end for a 30-minute podcast producing 10 clips: about 8–12 minutes total. You can walk away — the desktop app pushes a notification when it's done.
Does Clip Studio cost anything per clip?
No. Whisper transcription runs locally, the AI clip picker runs against the cloud chat API (about $0.01 of cloud usage per extraction job, regardless of how many clips you produce), and rendering uses FFmpeg locally. The only billable parts of ViralMint are the optional paid AI voice (gpt-4o-mini-tts), AI music (Lyria 3 Pro), and AI video clips (Sora 2 Pro / Veo 3.1 / etc.). Clip Studio uses none of them.
Can I extract clips from a YouTube URL I don't own?
Technically, yes: yt-dlp downloads any public video, and ViralMint processes it. Whether that's appropriate is up to you. Respect the source creator's preferences, fair-use rules in your jurisdiction, and the target platform's rules around third-party clips. For your own content, this is the workflow most creators use to repurpose long-form videos into shorts.
How does ViralMint compare to OpusClip?
OpusClip is a great SaaS tool with strong AI clip selection. The differences worth knowing: OpusClip is subscription-based ($19–$59/mo with monthly minute caps), watermarks free output, and runs transcription in its cloud. ViralMint's Clip Studio is free in the desktop app with no minute cap, no watermark, and no per-clip cost; it runs transcription locally on your machine, and its code is AGPL-3.0 open source. Quality-wise, both ship strong virality scoring; ViralMint adds hook-type classification (curiosity_gap / contrarian / etc.) and pairs naturally with our Multi-Platform Export for one-click cross-posting.
Is the AI clip picker open source too?
The desktop side (Whisper integration, clip extraction pipeline, caption rendering, FFmpeg orchestration) is AGPL-3.0 at github.com/openclaw-easy/ViralMint. The AI scoring uses an LLM prompt that runs against our cloud chat handler — the prompt itself is in the repo; only the cloud routing is closed.