Smart Video — AI Video from Stock + Your Clips

Type a topic and ViralMint writes the script, voices it, matches Pexels and Pixabay stock footage clip-by-clip, blends in your own b-roll, and burns word-by-word captions into a finished mp4. No camera, no editing, no subscription.

Smart Video at a glance

{
  "tool": "ViralMint Smart Video",
  "input": "topic prompt OR custom script (≤10000 chars) + optional b-roll",
  "stock_source": "Pexels + Pixabay (keyword-matched clip-by-clip)",
  "mixed_clip_assembly": true,
  "voice_options": "13 OpenAI voices + 400+ Edge TTS voices",
  "caption_presets": ["viral", "classic", "bold"],
  "music": "bring-your-own OR Lyria 3 Pro AI",
  "aspect_ratios": ["9:16", "16:9", "1:1"],
  "pipeline_steps": 11,
  "subscription": false,
  "watermark": false
}

The pipeline lives in generator_video.py. See also the faceless video generator and AI clipper.

Screenshot: the ViralMint Smart Video page, showing the topic input, aspect ratio picker, voice selector, music dropdown and a script panel mid-generation. Type a topic on the left and ViralMint assembles script, voice, stock footage, captions and music into a finished mp4 on the right.

What is Smart Video?

Smart Video is ViralMint's stock-footage assembly pipeline. You give it a topic (or paste your own script) and it writes the script, generates a voiceover, transcribes that voiceover for word-accurate caption timing, then sources keyword-matched stock footage from Pexels and Pixabay and stitches everything into a finished mp4 with animated captions and background music. Unlike a pure text-to-video model, it builds from real stock clips — so it's fast, costs little, and looks natural rather than uncanny.

The differentiator is Mixed Clip Assembly: import your own b-roll and ViralMint blends it with the matched stock so the final cut reads as one cohesive piece, normalizing aspect ratio and pacing across your footage and the stock. Use your real product shots or talking-head segments where they count and let stock fill the rest. A single generation run produces 9:16, 16:9 and 1:1 versions in one ZIP.

How to make a Smart Video in 5 steps

  1. Enter a topic or script. Type a niche and let the AI script writer draft it (it can use search-demand data and competitor transcripts), or paste your own script up to 10,000 characters.
  2. Add your own clips. Optional — import b-roll and Mixed Clip Assembly blends it with keyword-matched stock so the cuts don't look stitched together.
  3. Pick voice, captions and music. 13 OpenAI voices or 400+ Edge TTS voices in 100+ languages; viral / classic / bold caption presets; a track of your own or AI music from Lyria 3 Pro.
  4. Generate. The 11-step pipeline writes the script, voices it, transcribes for timing, matches stock clip-by-clip, stitches with FFmpeg, mixes music at −20 dB and burns word-by-word captions — about 3–6 minutes for a 60-second short.
  5. Export every aspect ratio. Download the finished mp4 in 9:16, 16:9 and 1:1 in one ZIP, along with an AI-drafted YouTube title, description, tags and a TikTok caption. Posting is manual.
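
The choices made in the steps above can be pictured as one small request object that is checked before generation starts. This is a minimal sketch, not the code in generator_video.py: the class name `SmartVideoRequest`, its field names and the `validate` helper are hypothetical, and only the limits themselves (10,000 characters, three caption presets, three aspect ratios) come from this page.

```python
from dataclasses import dataclass, field

# Limits taken from this page; every name below is hypothetical.
MAX_SCRIPT_CHARS = 10_000
CAPTION_PRESETS = ("viral", "classic", "bold")
ASPECT_RATIOS = ("9:16", "16:9", "1:1")


@dataclass
class SmartVideoRequest:
    topic: str = ""                      # used when no script is pasted
    script: str = ""                     # optional custom script
    caption_preset: str = "viral"
    aspect_ratios: tuple = ASPECT_RATIOS
    broll_paths: list = field(default_factory=list)  # optional own clips

    def validate(self) -> list:
        """Return a list of human-readable problems (empty means OK)."""
        problems = []
        if not self.topic.strip() and not self.script.strip():
            problems.append("provide a topic or a script")
        if len(self.script) > MAX_SCRIPT_CHARS:
            problems.append(f"script exceeds {MAX_SCRIPT_CHARS} characters")
        if self.caption_preset not in CAPTION_PRESETS:
            problems.append(f"unknown caption preset: {self.caption_preset}")
        for ratio in self.aspect_ratios:
            if ratio not in ASPECT_RATIOS:
                problems.append(f"unsupported aspect ratio: {ratio}")
        return problems
```

Returning a list of problems rather than raising on the first one lets a form show every issue at once.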

What's in Smart Video

AI script writer

Turns a niche or competitor URL into a retention-tuned script. It uses Whisper transcripts of reference videos and YouTube search-demand data to shape the hook.

Keyword-matched stock

Pexels and Pixabay footage matched to your script clip-by-clip, so each line gets relevant b-roll automatically — no manual library hunting, no per-clip cost.
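
As a rough illustration of keyword matching, the sketch below reduces one script line to a few keywords and builds a Pexels video search URL from them. The stopword list, the `keywords` and `pexels_search_url` helpers and the three-keyword limit are assumptions made for this example, not ViralMint's actual matching logic. The endpoint and the `query`, `orientation` and `per_page` parameters are part of the public Pexels API, which also expects your API key in the `Authorization` header when the request is sent.

```python
import re
from urllib.parse import urlencode

# Tiny illustrative stopword list (an assumption, not ViralMint's).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on",
             "for", "with", "that", "this", "it", "you", "your"}

# Pexels accepts landscape, portrait or square as orientation values.
ORIENTATION = {"9:16": "portrait", "16:9": "landscape", "1:1": "square"}


def keywords(line, limit=3):
    """Pick the first few distinct non-stopword words from a script line."""
    picked = []
    for word in re.findall(r"[a-z']+", line.lower()):
        if word not in STOPWORDS and len(word) > 2 and word not in picked:
            picked.append(word)
    return picked[:limit]


def pexels_search_url(line, aspect="9:16", per_page=5):
    """Build (but do not send) a Pexels video search request URL."""
    params = {
        "query": " ".join(keywords(line)),
        "orientation": ORIENTATION[aspect],
        "per_page": per_page,
    }
    return "https://api.pexels.com/videos/search?" + urlencode(params)
```

For example, `pexels_search_url("Glowing jellyfish drift in the ocean")` yields a portrait search for "glowing jellyfish drift".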

Mixed Clip Assembly

Blend your own b-roll with stock. ViralMint normalizes aspect ratio and pacing so a video built from mixed sources looks like one coherent piece, not a stitched collage.
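
One common way to normalize mixed sources with FFmpeg is to scale each clip until it covers the target canvas, center-crop the overflow, and force a shared frame rate. The sketch below only builds the command; the function names, the 30 fps default, the optional length cap and the choice to drop clip audio with `-an` are assumptions for illustration, not a description of how Mixed Clip Assembly is implemented.

```python
def normalize_filter(width, height, fps=30):
    """FFmpeg -vf chain: cover the canvas, center-crop, unify frame rate."""
    return (
        f"scale={width}:{height}:force_original_aspect_ratio=increase,"
        f"crop={width}:{height},fps={fps},setsar=1"
    )


def normalize_cmd(src, dst, width=1080, height=1920, fps=30, max_seconds=None):
    """Build an ffmpeg argument list that conforms one clip to the canvas."""
    cmd = ["ffmpeg", "-y", "-i", src, "-vf", normalize_filter(width, height, fps)]
    if max_seconds is not None:
        # Cap clip length so own footage and stock share a similar pacing.
        cmd += ["-t", str(max_seconds)]
    # -an drops the clip's own audio (assumed: the voiceover is added later).
    cmd += ["-an", dst]
    return cmd
```

Running every clip, stock or your own, through the same filter chain is what keeps cuts from jumping between resolutions or frame rates.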

Word-by-word captions

Whisper word-level timestamps plus ASS rendering, in three presets: viral (yellow highlight, Montserrat Bold), classic (Arial) and bold (green highlight, Impact). This is the caption style widely used to hold attention in Shorts.
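
To show how word-level timestamps can turn into highlighted captions, here is a minimal sketch that emits one ASS `Dialogue` event per word, with the current word tinted yellow. The input shape (a list of `(word, start, end)` tuples), the helper names and the `Default` style name are assumptions. The `H:MM:SS.cc` time format, the `\c` colour override in `&HBBGGRR&` order and the `\r` reset are standard ASS syntax.

```python
HIGHLIGHT = r"{\c&H00FFFF&}"  # yellow; ASS colours are written blue-green-red
RESET = r"{\r}"               # return to the style's default colour


def ass_time(seconds):
    """Format seconds as the H:MM:SS.cc timestamp used by ASS."""
    centis = round(seconds * 100)
    hours, rest = divmod(centis, 360000)
    minutes, rest = divmod(rest, 6000)
    secs, cs = divmod(rest, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def word_events(words, style="Default"):
    """One Dialogue line per word, highlighting the word being spoken."""
    events = []
    for i, (_, start, end) in enumerate(words):
        parts = [
            f"{HIGHLIGHT}{text}{RESET}" if j == i else text
            for j, (text, _, _) in enumerate(words)
        ]
        line = " ".join(parts)
        events.append(
            f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{line}"
        )
    return events
```

In practice the word list would be split into short phrases first so only a few words are on screen at a time.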

AI or your own music

Drop in a royalty-free track or generate one with Lyria 3 Pro. Auto-mixed at −20 dB with fade in/out so the voiceover stays front and center.
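
For a sense of what −20 dB means: it corresponds to an amplitude gain of 10^(−20/20) = 0.1, so the music plays at one tenth of its original amplitude. The sketch below computes that gain and builds an FFmpeg `-filter_complex` string that attenuates the music, fades it in and out, and mixes it under the voiceover. The one-second fades, the input order (0 = voiceover, 1 = music) and the stream labels are assumptions, not ViralMint's actual settings.

```python
def db_to_gain(db):
    """Convert decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def music_mix_filter(duration, music_db=-20.0, fade=1.0):
    """Build a filter_complex that ducks and fades music under a voiceover.

    Assumes input 0 is the voiceover and input 1 is the music track.
    """
    fade_out_start = max(duration - fade, 0.0)
    bed = (
        f"[1:a]volume={music_db:g}dB,"
        f"afade=t=in:st=0:d={fade:g},"
        f"afade=t=out:st={fade_out_start:g}:d={fade:g}[bed]"
    )
    # normalize=0 keeps amix from rescaling the inputs (FFmpeg 4.4 or newer).
    mix = "[0:a][bed]amix=inputs=2:duration=first:normalize=0[mix]"
    return f"{bed};{mix}"
```

`duration=first` ends the mix when the voiceover ends, so a longer music track is cut to the length of the narration.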

Multi-aspect export

9:16 for Shorts/TikTok, 16:9 for long-form and 1:1 for Instagram: all three from a single generation run, in one ZIP, with no re-rendering.
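
The geometry behind the three formats is simple arithmetic. As a sketch only (the page does not say how ViralMint derives each version), the hypothetical `center_crop` helper below finds the largest centered rectangle of a given aspect ratio inside a source frame, rounded down to even dimensions because common video encoders expect them.

```python
def center_crop(src_w, src_h, ratio_w, ratio_h):
    """Largest centered crop of ratio_w:ratio_h inside a src_w x src_h frame.

    Returns (crop_w, crop_h, x, y) with even width and height.
    """
    if src_w * ratio_h >= src_h * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * ratio_w // ratio_h
    else:
        # Source is taller than the target: keep full width, trim top/bottom.
        crop_w = src_w
        crop_h = src_w * ratio_h // ratio_w
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    return crop_w, crop_h, (src_w - crop_w) // 2, (src_h - crop_h) // 2


# Crop rectangles for a 1920x1080 source in each export format.
for name, (rw, rh) in {"9:16": (9, 16), "16:9": (16, 9), "1:1": (1, 1)}.items():
    print(name, center_crop(1920, 1080, rw, rh))
```

From a 1920x1080 source, a 9:16 crop keeps only a 606-pixel-wide strip, which is why vertical versions are often assembled from portrait footage rather than cropped from landscape.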

Frequently asked

What is Smart Video?

Smart Video is ViralMint's stock-footage assembly pipeline: you type a topic (or paste a script) and it writes the script, generates a voiceover, transcribes it for caption timing, sources keyword-matched Pexels and Pixabay footage and stitches everything into a finished mp4 with word-by-word captions and music. It builds from real stock clips — and can blend in your own b-roll — so it's fast, cheap and looks natural.

Can I use my own footage?

Yes. Mixed Clip Assembly imports your clips and blends them with keyword-matched stock so the final video reads as one cohesive piece — it normalizes aspect ratio and pacing across your footage and the stock so the cuts don't look stitched together.

How much does Smart Video cost?

The stock tier runs with no per-video cloud cost: Pexels/Pixabay stock, Edge TTS voiceover (400+ voices, 100+ languages), word-by-word captions and FFmpeg assembly all run locally with no watermark. Optional cloud upgrades — premium voices, Lyria 3 Pro AI music, or swapping stock for AI-generated clips — cost a few cents per video from prepaid credits. No subscription.

How is it different from a faceless video generator?

They're the same engine viewed two ways. "Faceless video generator" describes the outcome — a video with no person on camera. Smart Video is the specific feature that produces it by assembling AI-matched stock (and your own b-roll) with script, voice, captions and music. See the faceless video generator page for the concept and niche guidance.

What languages does Smart Video support?

Edge TTS covers 100+ languages — Spanish, Portuguese, Mandarin, Japanese, German, French, Korean, Arabic and more. The AI script writer accepts any target language, and the captions render Unicode cleanly (Chinese, Japanese, Arabic, Hebrew). Whisper caption timing supports 99 languages.

Make your first Smart Video

Signing up takes 30 seconds. The browser version covers AI chat, image, voice and music; the full Smart Video assembly pipeline ships in the open-source desktop app.
