YouTube Clip Extractor

Paste a long YouTube URL and ViralMint transcribes it locally with Whisper, scores viral moments with AI, and exports 9:16 clips with word-by-word captions burned in. The local pipeline is free, with no subscription.

ViralMint — Clip Studio
Channel analysis → Clip Studio. The channel analysis page shows a grid of video cards with virality scores and outlier badges, and it feeds Clip Studio: pick a long video from the grid, and ViralMint downloads it, transcribes it locally with Whisper, and runs the AI viral-clip picker to score every 30-60 second segment as a clip candidate.

What's a YouTube clip extractor?

A YouTube clip extractor takes a long video (typically 10 minutes to 3 hours: podcasts, interviews, lectures, earnings calls) and finds the 30-60 second moments most likely to perform as short-form clips on TikTok, YouTube Shorts, and Instagram Reels. The naive way is manual: scrub through the video, find a good moment, mark in and out points, export, and add captions. The AI way: a model reads the transcript, scores every candidate segment for "viral fit", and you pick from the ranked list. Good clip extractors handle transcription, scoring, cutting, reframing to 9:16, and captioning in one pipeline.

ViralMint's Clip Studio runs the heavy parts of the pipeline locally (yt-dlp download, Whisper transcription with word-level timestamps, FFmpeg cutting and reframing, ASS caption rendering) and routes only the AI viral-clip picker to the cloud. That means a 90-minute podcast doesn't incur the per-minute transcription fees Opus Clip and Submagic charge: local transcription costs the same (nothing) at any length, and the one paid step, the AI call, runs over the transcript text rather than the audio. Typical end-to-end cost on a 90-minute video is under 20 cents.

How the clip extractor works

  1. Paste a YouTube URL. Any public video. Works on long-form podcasts, interviews, lectures, earnings calls, news segments, channel back-catalog — anywhere the long-form content has discrete moments worth surfacing.
  2. Download. yt-dlp pulls the video to your desktop. Its universal downloader works on YouTube, YouTube Shorts, TikTok, Douyin, Bilibili, X (Twitter), Instagram, Reddit, Vimeo, Twitch, and 1800+ other sites.
  3. Whisper transcribes locally. faster-whisper runs on your CPU (int8 quantized) — no cloud cost, no per-minute pricing. Word-level timestamps mean every clip's caption sync is exact, not the 2-second drift typical of cloud transcribers.
  4. AI viral-clip picker scores segments. The transcript gets scored segment-by-segment for hook strength, tension structure, emotional peaks, and standalone interpretability. Cloud AI call (gpt-5.4-mini or Sonnet 4.6 depending on length); typical cost is a few cents per long video.
  5. Export. Top-scoring segments are cut into standalone MP4s, reframed to 9:16 (with face tracking when a person is on screen), and captioned word by word with the viral, classic, or bold preset of your choice. Bulk export downloads N clips at once as a ZIP.
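The cut, reframe, and caption-burn steps above map naturally onto one FFmpeg invocation per clip. The sketch below only builds the argument list; `build_clip_command` is a hypothetical helper, not ViralMint's code, and it assumes a fixed horizontal crop offset (a face-tracked reframe would vary it over time) and an ASS caption file whose timestamps start at the clip's beginning.

```python
def build_clip_command(src: str, start: float, end: float, crop_x: int,
                       ass_path: str, out_path: str) -> list[str]:
    """Build an ffmpeg argv that cuts [start, end) from `src`, crops a
    full-height 9:16 window whose left edge is `crop_x` px, scales it to
    1080x1920 and burns in the captions from `ass_path`."""
    if end <= start:
        raise ValueError("end must be after start")
    video_filter = ",".join([
        f"crop=ih*9/16:ih:{crop_x}:0",  # full-height 9:16 window
        "scale=1080:1920",               # standard vertical resolution
        f"ass={ass_path}",               # burn in captions (needs libass; paths
                                         # containing ':' or '\' need escaping)
    ])
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",           # input seek: output timeline starts at 0
        "-i", src,
        "-t", f"{end - start:.3f}",      # clip duration
        "-vf", video_filter,
        "-c:v", "libx264", "-c:a", "aac",
        out_path,
    ]


# Example: a 30.5 s clip starting 12 s in, crop window 420 px from the left.
cmd = build_clip_command("podcast.mp4", 12.0, 42.5, 420, "clip_01.ass", "clip_01.mp4")
print(" ".join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it. Seeking with `-ss` before `-i` is fast and resets the output timeline to zero, which is why the caption file is assumed to be clip-relative.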

Why ViralMint's clip extractor is different

Local Whisper transcription

faster-whisper (int8) runs on your CPU: no per-minute transcription fee and no audio upload to a third party. A 90-minute video costs the same to transcribe as a 5-minute one.
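A minimal sketch of the local transcription step, assuming the `faster-whisper` Python package and its `WhisperModel` class. The `base` model size matches the one quoted in the FAQ; the `Word` record and both helper names are illustrative, not ViralMint's internals.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the source
    end: float


def flatten_words(segments) -> list[Word]:
    """Flatten Whisper segments into one list of word timings.
    Whisper prefixes most words with a space, so strip it."""
    words = []
    for seg in segments:
        for w in seg.words or []:  # .words is None without word_timestamps
            words.append(Word(w.word.strip(), w.start, w.end))
    return words


def transcribe_words(path: str, model_size: str = "base") -> list[Word]:
    """Transcribe locally on the CPU with int8 weights.
    Requires `pip install faster-whisper`; imported lazily so the helper
    above can be used without it."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, word_timestamps=True)
    # `segments` is a lazy generator: decoding happens while we iterate it.
    return flatten_words(segments)
```

`transcribe_words("podcast.mp4")` returns a flat list of word timings that the scoring, reframing, and caption steps can share.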

AI viral-clip picker

Scores every 30-60 second segment on hook strength, tension structure, emotional peaks, and standalone interpretability, then exports the top N. You can override the picks if you have specific moments in mind.
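The scoring itself is a cloud model call, but the bookkeeping around it can be sketched: slide over the word timestamps to build 30-60 second candidates that start and end on word boundaries, score each one, and keep the best non-overlapping N. Here `score_fn` is a hypothetical stand-in for the AI call, and the 10-word stride and greedy overlap rule are assumptions, not ViralMint's documented behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Word:
    text: str
    start: float
    end: float


@dataclass
class Candidate:
    start: float
    end: float
    text: str
    score: float = 0.0


def candidate_windows(words: list[Word], min_len: float = 30.0,
                      max_len: float = 60.0, stride: int = 10) -> list[Candidate]:
    """Every `stride` words, open a window and extend it word by word while
    it stays <= max_len seconds; keep it if it reaches min_len."""
    out = []
    for i in range(0, len(words), stride):
        start = words[i].start
        j = i
        while j + 1 < len(words) and words[j + 1].end - start <= max_len:
            j += 1
        if words[j].end - start >= min_len:
            text = " ".join(w.text for w in words[i:j + 1])
            out.append(Candidate(start, words[j].end, text))
    return out


def pick_top(cands: list[Candidate], score_fn: Callable[[str], float],
             n: int = 5) -> list[Candidate]:
    """Score every candidate, then greedily keep the highest-scoring ones
    that don't overlap a clip already chosen. Returned in timeline order."""
    for c in cands:
        c.score = score_fn(c.text)
    chosen = []
    for c in sorted(cands, key=lambda c: c.score, reverse=True):
        if all(c.end <= k.start or c.start >= k.end for k in chosen):
            chosen.append(c)
            if len(chosen) == n:
                break
    return sorted(chosen, key=lambda c: c.start)
```

In practice `score_fn` would send the candidate text to the model and parse a number back; overriding a pick amounts to adding or removing a `Candidate` by hand.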

Face-tracking reframe

Reframing from 16:9 to 9:16 keeps the speaker centered. An OpenCV face detector plus smoothed tracking interpolation keeps the framing from jumping, even when the speaker moves across the frame. Manual frame-by-frame fixup is available if needed.
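One way to get a crop that follows the speaker without jumping is to ease the crop center toward the detected face on each frame and clamp the window inside the frame. This sketch assumes per-frame face centers are already available (for example from an OpenCV detector) and uses exponential smoothing as a stand-in for whatever smoothing ViralMint actually applies; `smooth_crop_offsets` and `alpha` are illustrative names.

```python
def smooth_crop_offsets(face_x: list, frame_w: int, frame_h: int,
                        alpha: float = 0.15) -> list[int]:
    """Left edge (px) of a full-height 9:16 crop window for each frame.

    face_x: horizontal face center per frame in pixels, or None where no
            face was detected (the previous position is held).
    alpha:  smoothing factor in (0, 1]; smaller means steadier framing.
    """
    crop_w = int(frame_h * 9 / 16)
    crop_w -= crop_w % 2                 # H.264 wants even dimensions
    center = frame_w / 2                 # start from a centered crop
    offsets = []
    for x in face_x:
        if x is not None:
            center += alpha * (x - center)   # ease toward the detected face
        left = round(center - crop_w / 2)
        offsets.append(max(0, min(frame_w - crop_w, left)))  # keep inside frame
    return offsets
```

The resulting offsets can feed a per-frame crop before encoding; lowering `alpha` trades responsiveness for steadier framing.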

Word-by-word captions

Whisper word-level timestamps plus ASS subtitle rendering. Three presets: viral (yellow highlight, Montserrat Bold 56pt), classic (Arial 42pt), and bold (Impact 64pt, green). Presets are customizable per channel.
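Word-by-word highlighting maps onto ASS karaoke tags: each `{\kNN}` holds a word for NN centiseconds, and the word switches from the style's secondary color to its primary color when its turn comes. The sketch below builds a clip-relative ASS file with a style loosely modeled on the viral preset (Montserrat Bold 56pt, yellow highlight); the margins, outline width, and helper names are assumptions.

```python
def ass_time(t: float) -> str:
    """Seconds -> ASS timestamp H:MM:SS.cc (centiseconds)."""
    cs = int(round(t * 100))
    h, cs = divmod(cs, 360_000)
    m, cs = divmod(cs, 6_000)
    s, cs = divmod(cs, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


# ASS colors are &HAABBGGRR: primary = yellow (spoken), secondary = white (not yet).
STYLE_FORMAT = ("Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
                "OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, "
                "ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, "
                "Alignment, MarginL, MarginR, MarginV, Encoding")
VIRAL_STYLE = ("Style: Viral,Montserrat,56,&H0000FFFF,&H00FFFFFF,&H00000000,"
               "&H00000000,-1,0,0,0,100,100,0,0,1,3,0,2,60,60,200,1")


def karaoke_line(words: list, clip_start: float = 0.0) -> str:
    """One Dialogue event; `words` is a list of (text, start, end) tuples in
    source seconds, shifted so the clip begins at 0."""
    start, end = words[0][1] - clip_start, words[-1][2] - clip_start
    parts = []
    for i, (text, w_start, w_end) in enumerate(words):
        # Time each word until the next one starts, so pauses don't make
        # later highlights drift early.
        nxt = words[i + 1][1] if i + 1 < len(words) else w_end
        parts.append(f"{{\\k{max(1, round((nxt - w_start) * 100))}}}{text}")
    return (f"Dialogue: 0,{ass_time(start)},{ass_time(end)},Viral,,0,0,0,,"
            + " ".join(parts))


def build_ass(events: list) -> str:
    """Assemble a minimal ASS document sized for 1080x1920 output."""
    return "\n".join([
        "[Script Info]", "ScriptType: v4.00+", "PlayResX: 1080", "PlayResY: 1920", "",
        "[V4+ Styles]", STYLE_FORMAT, VIRAL_STYLE, "",
        "[Events]",
        "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
        *events,
    ]) + "\n"
```

Write the result of `build_ass` to a `.ass` file and burn it in with FFmpeg's `ass` filter. Timing each `\k` to the next word's start, rather than to the word's own end, keeps pauses from shifting later highlights.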

Universal downloader

yt-dlp handles 1800+ sites — YouTube, YouTube Shorts, TikTok, Douyin, Bilibili, X, Instagram, Reddit, Vimeo, Twitch, Dailymotion, Facebook. Same pipeline for non-YouTube sources.
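Because yt-dlp picks the right extractor from the URL, the download step looks the same for every supported site. A minimal sketch of the command line, assuming the yt-dlp CLI is installed; the output directory and helper name are illustrative, and the duration filter mirrors the 2-hour source cap described in the FAQ.

```python
MAX_SOURCE_SECONDS = 7200  # the 2-hour cap on source videos


def build_download_command(url: str, out_dir: str = "downloads") -> list[str]:
    """yt-dlp argv: best video+audio merged to MP4, single video only,
    skipping anything longer than the cap."""
    return [
        "yt-dlp",
        "--no-playlist",                       # a URL with &list=... fetches one video
        "-f", "bv*+ba/b",                      # best video + best audio, else best single file
        "--merge-output-format", "mp4",        # remux the merged streams into MP4
        "--match-filter", f"duration <= {MAX_SOURCE_SECONDS}",
        "-o", f"{out_dir}/%(id)s.%(ext)s",     # e.g. downloads/<video id>.mp4
        url,
    ]
```

`--match-filter` skips (rather than truncates) anything longer than the cap. A source that reports no duration also fails this filter unless the comparison is written as `duration <=? 7200`.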

Bulk export

Generate 5, 10, or 20 clips at once from a single source video. Bulk export bundles all clips into a ZIP with AI-drafted Shorts titles and TikTok captions ready to copy-paste when posting.
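The bulk-export bundle can be produced with Python's standard `zipfile` module. This sketch assumes a hypothetical per-clip record with `path`, `title`, and `caption` keys and writes the drafted copy to a `metadata.json` alongside the MP4s; ViralMint's actual archive layout may differ.

```python
import json
import zipfile


def bundle_clips(clips: list, dest) -> None:
    """Write every clip plus a metadata.json into one ZIP.

    clips: dicts with hypothetical keys 'path', 'title' (Shorts title) and
           'caption' (TikTok caption).
    dest:  a file path or a writable binary file object.
    """
    meta = []
    with zipfile.ZipFile(dest, "w") as zf:
        for i, clip in enumerate(clips, start=1):
            name = f"clip_{i:02d}.mp4"
            # MP4 is already compressed, so store it as-is.
            zf.write(clip["path"], arcname=name, compress_type=zipfile.ZIP_STORED)
            meta.append({"file": name,
                         "shorts_title": clip["title"],
                         "tiktok_caption": clip["caption"]})
        zf.writestr("metadata.json",
                    json.dumps(meta, indent=2, ensure_ascii=False),
                    compress_type=zipfile.ZIP_DEFLATED)
```

Storing the MP4s uncompressed avoids spending CPU on data that won't shrink; only the small JSON file is deflated.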

How it compares to Opus Clip and Submagic

Opus Clip and Submagic are the two best-known YouTube clip extractors. Both run their entire pipeline (download, transcription, AI scoring, captioning) in the cloud and charge monthly subscriptions with caps on minutes processed: Opus Clip's Starter plan is $19/mo for 60 minutes, and Submagic's Pro plan is $23/mo for similar limits. For high-volume users (50+ long videos per month) the subscriptions can be cost-effective; for everyone else, a fixed monthly fee with minute caps penalizes occasional use.

ViralMint's hybrid local + cloud architecture inverts those economics. Download (yt-dlp), transcription (Whisper), cutting (FFmpeg), reframing, and captioning all run locally for free, with no per-minute charge. Only the AI viral-clip picker is a cloud call, and it's cheap because it runs over the transcript text, not the audio. Typical end-to-end cost on a 90-minute podcast is under 20 cents at ViralMint; a comparable run on Opus Clip Starter consumes about 1.5x your monthly minute allotment (90 minutes against 60). For a detailed comparison, see /alternatives/opus-clip/ and /alternatives/submagic/.

Frequently asked

How does ViralMint's YouTube clip extractor pick viral moments?

ViralMint's Clip Studio transcribes the video with Whisper (local, free, word-level timestamps), then runs an AI viral-clip picker over the transcript that scores each 30-60 second segment on hook strength (does the opening line earn the next 5 seconds?), tension structure (problem → setup → payoff), emotional peak density, and standalone interpretability (does the clip make sense without the surrounding context?). The top-scoring segments get cut, reframed to 9:16, and exported as mp4s with word-by-word captions burned in. The whole pipeline runs from a single YouTube URL.

Is the YouTube clip extractor really free?

Yes, for the local pipeline. Downloading (yt-dlp), transcription (Whisper, local), clipping (FFmpeg), reframing to 9:16, and caption burn-in all run on your desktop with no per-video cost. The AI viral-clip picker uses cloud AI, covered by a free daily allowance plus prepaid per-use top-ups; typical cost per long video is under 10 cents. Compare that with Opus Clip at $19/mo or Submagic at $23/mo: ViralMint is pay-per-use with no subscription, and it works on private or unlisted videos you have access to.

Can I extract clips from a YouTube video I didn't film?

Technically yes — ViralMint's yt-dlp downloader works on any public YouTube URL, and the AI viral-clip picker doesn't care who filmed the source. But: respect copyright. Clipping someone else's video for fair-use commentary (reaction, criticism, education) is generally permissible; reuploading their footage as your own original short is not. ViralMint also supports public-domain footage, your own filmed content, and Creative-Commons-licensed videos. The tool is the same; the ethics are your call.

What about clip extraction from non-YouTube videos?

Same pipeline. yt-dlp supports 1800+ sites including TikTok, Douyin, Bilibili, X (Twitter), Instagram, Reddit, Vimeo, Twitch, Dailymotion and Facebook. Paste the URL from any of those and Clip Studio runs the same download → transcribe → score → export pipeline.

How long can the source video be?

ViralMint caps source videos at 2 hours (7200 seconds) for the download step — that covers most podcasts, interviews, lectures and earnings calls. Whisper transcription scales linearly with length; a 90-minute video takes 8-15 minutes to transcribe on a typical laptop CPU (int8 quantized base model). The AI viral-clip picker runs on the transcript text, so it's fast regardless of source length.
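The length limits above reduce to a simple guard plus a linear estimate. This sketch extrapolates linearly from the single quoted data point (90 minutes of audio transcribed in 8-15 minutes on a typical laptop CPU), so treat its output as a rough range rather than a benchmark; the function names are illustrative.

```python
MAX_SOURCE_SECONDS = 7200  # 2-hour cap on the download step


def check_source_length(duration_s: float) -> None:
    """Reject sources longer than the cap before downloading them."""
    if duration_s > MAX_SOURCE_SECONDS:
        raise ValueError(f"source is {duration_s / 60:.0f} min; "
                         f"the cap is {MAX_SOURCE_SECONDS // 60} min")


def estimate_transcription_minutes(duration_min: float) -> tuple[float, float]:
    """Linear extrapolation from 90 min of audio -> 8-15 min of CPU time
    (int8 base model on a typical laptop)."""
    low_rate, high_rate = 8 / 90, 15 / 90  # compute minutes per audio minute
    return duration_min * low_rate, duration_min * high_rate
```

At the 2-hour cap this gives roughly 10.7 to 20 minutes of transcription time.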

Extract your first clip

Signing up takes 30 seconds. Clip extraction ships in the free desktop app, and the heavy pipeline (download, Whisper, FFmpeg, reframe, caption burn) is all local.