igny8/v2/V2-Execution-Docs/02I-video-creator.md
IGNY8 Phase 2: Video Creator (02I)

AI Video Creation Pipeline — Stage 9

Document Version: 1.0
Date: 2026-03-23
Phase: IGNY8 Phase 2 — Feature Expansion
Status: Build Ready
Source of Truth: Codebase at /data/app/igny8/
Audience: Claude Code, Backend Developers, Architects


1. CURRENT STATE

Video Today

IGNY8 currently has no video creation capability: no TTS, no FFmpeg pipeline, no video publishing. Images, however, already exist (generated by pipeline Stages 5-6) and can feed into video as visual assets.

What Exists

  • Images model (writer app) — generated images from pipeline, usable as video visual assets
  • SocialAccount model (02H) — provides OAuth connections to YouTube, Instagram, TikTok for video publishing
  • Self-hosted AI infrastructure (Phase 0F) — provides GPU for TTS and AI image generation
  • Content generation pipeline (01E) — content records provide source material for video scripts
  • Celery infrastructure with multiple queues — supports dedicated video queue for long-running renders

What Does Not Exist

  • No video app or models
  • No script generation from articles
  • No TTS (text-to-speech) voiceover generation
  • No FFmpeg/MoviePy video composition pipeline
  • No subtitle generation
  • No video publishing to platforms
  • No Stage 9 pipeline integration

2. WHAT TO BUILD

Overview

Build Stage 9 of the automation pipeline: an AI video creation system that converts published content into videos. The pipeline has 5 stages: script generation → voiceover → visual assets → composition → publishing. Videos publish to YouTube, Instagram Reels, and TikTok.

2.1 Video Types

| Type | Duration | Aspect Ratio | Primary Platform |
|---|---|---|---|
| Short | 30-90s | 9:16 vertical | YouTube Shorts, Instagram Reels, TikTok |
| Medium | 60-180s | 9:16 or 16:9 | TikTok, YouTube |
| Long | 5-15m | 16:9 horizontal | YouTube |
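A project's type plus its target platforms determines which render presets apply. A small mapping sketch (the preset keys match the render presets defined in Stage 4; the mapping and function name are illustrative, not an existing IGNY8 API):

```python
# Illustrative: which render presets are compatible with each video type.
TYPE_PRESETS = {
    'short':  {'youtube_short', 'instagram_reel', 'tiktok'},
    'medium': {'tiktok', 'youtube_long'},
    'long':   {'youtube_long'},
}

def presets_for(project_type, target_platforms):
    """Intersect the project's target platforms with presets valid for its type."""
    return sorted(TYPE_PRESETS[project_type] & set(target_platforms))
```

For example, a short-form project targeting TikTok and long-form YouTube would only render the TikTok preset, since `youtube_long` is incompatible with the short type.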

2.2 Platform Specs

| Platform | Max Duration | Resolution | Encoding | Max File Size |
|---|---|---|---|---|
| YouTube Long | Up to 12h | 1920×1080 | MP4 H.264, AAC audio | 256GB |
| YouTube Shorts | ≤60s | 1080×1920 | MP4 H.264, AAC audio | 256GB |
| Instagram Reels | ≤90s | 1080×1920 | MP4 H.264, AAC audio | 650MB |
| TikTok | ≤10m | 1080×1920 | MP4 H.264, AAC audio | 72MB |
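These limits can be enforced before upload with a small validation helper. A minimal sketch: the limit values mirror the table above, and the constant/function names are illustrative, not an existing IGNY8 API:

```python
# Illustrative pre-upload validation against the platform limits above.
PLATFORM_LIMITS = {
    # max_duration in seconds, max_size in bytes, resolution as (width, height)
    'youtube_long':   {'max_duration': 12 * 3600, 'max_size': 256 * 1024**3, 'resolution': (1920, 1080)},
    'youtube_short':  {'max_duration': 60,        'max_size': 256 * 1024**3, 'resolution': (1080, 1920)},
    'instagram_reel': {'max_duration': 90,        'max_size': 650 * 1024**2, 'resolution': (1080, 1920)},
    'tiktok':         {'max_duration': 600,       'max_size': 72 * 1024**2,  'resolution': (1080, 1920)},
}

def validate_render(preset, duration, file_size, width, height):
    """Return a list of human-readable violations (empty list = OK to upload)."""
    limits = PLATFORM_LIMITS[preset]
    errors = []
    if duration > limits['max_duration']:
        errors.append(f"duration {duration}s exceeds {limits['max_duration']}s")
    if file_size > limits['max_size']:
        errors.append(f"file size {file_size} exceeds {limits['max_size']} bytes")
    if (width, height) != limits['resolution']:
        errors.append(f"resolution {width}x{height} != required {limits['resolution']}")
    return errors
```

Running this check before queuing an upload gives an early, actionable failure instead of a rejected platform API call.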

2.3 Five-Stage Video Pipeline

Stage 1 — Script Generation (AI)

Input: Content record (title, content_html, meta_description, keywords, images)

AI extracts key points and produces:

{
    "hook": "text (3-5 sec)",
    "intro": "text (10-15 sec)",
    "points": [
        {"text": "...", "duration_est": 20, "visual_cue": "show chart", "text_overlay": "Key stat"}
    ],
    "cta": "text (5-10 sec)",
    "chapter_markers": [{"time": 0, "title": "Intro"}],
    "total_estimated_duration": 120
}

SEO: AI generates platform-specific title, description, tags for each target platform.
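The chapter_markers produced in this stage map directly onto YouTube's description-timestamp convention used later in Stage 5. A minimal formatting sketch (the helper name is illustrative):

```python
def format_chapters(chapter_markers):
    """Render [{time, title}] markers as YouTube-style 'MM:SS Title' lines.

    Note: YouTube requires the first chapter to start at 00:00.
    """
    lines = []
    for marker in chapter_markers:
        minutes, seconds = divmod(int(marker['time']), 60)
        lines.append(f"{minutes:02d}:{seconds:02d} {marker['title']}")
    return "\n".join(lines)
```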

Stage 2 — Voiceover (TTS)

Cloud Providers:

| Provider | Cost | Quality | Features |
|---|---|---|---|
| OpenAI TTS | $15-30/1M chars | High | Voices: alloy, echo, fable, onyx, nova, shimmer |
| ElevenLabs | Plan-based | Highest | Voice cloning, ultra-realistic |

Self-Hosted (via 0F GPU):

| Model | Quality | Speed | Notes |
|---|---|---|---|
| Coqui XTTS-v2 | Good | Medium | Multi-language, free |
| Bark | Expressive | Slow | Emotional speech |
| Piper TTS | Moderate | Fast | Lightweight |

Features: voice selection, speed control, multi-language support
Output: WAV/MP3 audio file + word-level timestamps (for subtitle sync)
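Not every provider returns word-level timestamps natively. When they are missing, a usable approximation distributes the known audio duration across words proportionally to their character length. A rough fallback sketch (this is an estimation heuristic, not any provider's API):

```python
def estimate_word_timestamps(text, audio_duration):
    """Approximate [{word, start, end}] by weighting each word by its character count.

    Good enough for subtitle sync on evenly paced narration; real
    provider timestamps should be preferred whenever available.
    """
    words = text.split()
    total_chars = sum(len(w) for w in words) or 1
    timestamps, cursor = [], 0.0
    for word in words:
        span = audio_duration * len(word) / total_chars
        timestamps.append({'word': word, 'start': round(cursor, 3), 'end': round(cursor + span, 3)})
        cursor += span
    return timestamps
```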

Stage 3 — Visual Assets

Sources:

  • Article images from Images model (already generated by pipeline Stages 5-6)
  • AI-generated scenes (Runware/DALL-E/Stable Diffusion via 0F)
  • Stock footage APIs: Pexels, Pixabay (free, API key required)
  • Text overlay frames (rendered via Pillow)
  • Code snippet frames (via Pygments syntax highlighting)

Effects:

  • Ken Burns effect on still images (zoom/pan animation)
  • Transition effects between scenes (fade, slide, dissolve)
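The Ken Burns effect reduces to computing a slowly shrinking (zoom-in) crop box per frame, which is then scaled back up to the output resolution. A library-agnostic geometry sketch (MoviePy resize/crop calls or an FFmpeg crop filter could consume these boxes; the function name is illustrative):

```python
def ken_burns_box(t, duration, src_w, src_h, zoom_start=1.0, zoom_end=1.15):
    """Crop box (x, y, w, h) at time t for a linear zoom-in centered on the image."""
    progress = min(max(t / duration, 0.0), 1.0)
    zoom = zoom_start + (zoom_end - zoom_start) * progress
    w, h = src_w / zoom, src_h / zoom
    # Keep the crop centered; a pan effect would add a time-varying offset here.
    x, y = (src_w - w) / 2, (src_h - h) / 2
    return x, y, w, h
```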

Stage 4 — Video Composition (FFmpeg + MoviePy)

Libraries: FFmpeg (encoding), MoviePy (high-level composition), Pillow (text overlays), pydub (audio processing)

Process:

  1. Create visual timeline from script sections
  2. Assign visuals to each section (image/video clip per point)
  3. Add text overlays at specified timestamps
  4. Mix voiceover audio with background music (royalty-free, 20% volume)
  5. Apply transitions between sections
  6. Render to target resolution/format
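Step 1 above, building the visual timeline, is a cumulative sum over the script's per-section duration estimates. A minimal sketch, assuming the section shape from the Stage 1 script JSON:

```python
def build_timeline(sections):
    """Turn [{text, duration_est, ...}] script sections into (start, end, section) tuples in seconds."""
    timeline, cursor = [], 0.0
    for section in sections:
        end = cursor + section['duration_est']
        timeline.append((cursor, end, section))
        cursor = end
    return timeline
```

Each tuple then drives steps 2-4: the section's assigned visual fills the (start, end) window, and its text_overlay is placed at the window's start time.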

Render Presets:

| Preset | Resolution | Duration Range | Encoding |
|---|---|---|---|
| youtube_long | 1920×1080 | 3-15m | H.264/AAC |
| youtube_short | 1080×1920 | 30-60s | H.264/AAC |
| instagram_reel | 1080×1920 | 30-90s | H.264/AAC |
| tiktok | 1080×1920 | 30-180s | H.264/AAC |

Stage 5 — SEO & Publishing

  • Auto-generate SRT subtitle file from TTS word-level timestamps
  • AI thumbnail: hero image with title text overlay
  • Platform-specific metadata: title (optimized per platform), description (with timestamps for YouTube), tags, category
  • Publishing via platform APIs (reuses OAuth from 02H SocialAccount)
  • Confirmation logging with platform video ID
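SRT generation from the word-level timestamps is straightforward: group words into short cues and emit numbered blocks with `-->` time ranges. A minimal sketch, assuming the {word, start, end} timestamp shape from Stage 2 (cue grouping by word count is a simplification; a production version would also break cues at sentence boundaries):

```python
def srt_timestamp(seconds):
    """Format seconds as SRT's HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def build_srt(timestamps, words_per_cue=7):
    """Group word timestamps into cues and render an SRT document string."""
    blocks = []
    for i in range(0, len(timestamps), words_per_cue):
        chunk = timestamps[i:i + words_per_cue]
        idx = i // words_per_cue + 1
        header = f"{srt_timestamp(chunk[0]['start'])} --> {srt_timestamp(chunk[-1]['end'])}"
        text = " ".join(w['word'] for w in chunk)
        blocks.append(f"{idx}\n{header}\n{text}")
    return "\n\n".join(blocks) + "\n"
```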

2.4 User Flow

  1. Select content → choose video type (short/medium/long) → target platforms
  2. AI generates script → user reviews/edits script
  3. Select voice → preview audio → approve
  4. Auto-assign visuals → user can swap images → preview composition
  5. Render video → preview final → approve
  6. Publish to selected platforms → track performance

2.5 Dedicated Celery Queue

Video rendering is CPU/GPU intensive and requires isolation:

  • Dedicated video queue: celery -A igny8_core worker -Q video --concurrency=1
  • Long-running tasks: 5-30 minutes per video render
  • Progress tracking: via Celery result backend (task status updates)
  • Temp file cleanup: after publish, clean up intermediate files (audio, frames, raw renders)
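The cleanup step can be a small helper that removes a project's intermediate directory once publishing succeeds. A sketch, assuming intermediates live under a per-project temp directory (the path layout and function name here are illustrative):

```python
import shutil
from pathlib import Path

def cleanup_intermediates(temp_root, project_id):
    """Delete intermediate files (audio, frames, raw renders) for a project.

    Returns the number of files removed. Final renders are assumed to live
    in media storage, so removing the temp dir is safe after publish.
    """
    project_dir = Path(temp_root) / f"project_{project_id}"
    if not project_dir.exists():
        return 0
    count = sum(1 for p in project_dir.rglob('*') if p.is_file())
    shutil.rmtree(project_dir)
    return count
```

Calling this at the end of the publish task (rather than the render task) preserves intermediates for re-renders until the video is actually live.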

3. DATA MODELS & APIS

3.1 New Models

All models in a new video app.

VideoProject (video app)

class VideoProject(SiteSectorBaseModel):
    """
    Top-level container for a video creation project.
    Links to source content and tracks overall progress.
    """
    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='video_projects',
        help_text='Source content (null for standalone video)'
    )
    project_type = models.CharField(
        max_length=10,
        choices=[
            ('short', 'Short (30-90s)'),
            ('medium', 'Medium (60-180s)'),
            ('long', 'Long (5-15m)'),
        ]
    )
    target_platforms = models.JSONField(
        default=list,
        help_text='List of target platform strings: youtube_long, youtube_short, instagram_reel, tiktok'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('draft', 'Draft'),
            ('scripting', 'Script Generation'),
            ('voiceover', 'Voiceover Generation'),
            ('composing', 'Visual Composition'),
            ('rendering', 'Rendering'),
            ('review', 'Ready for Review'),
            ('published', 'Published'),
            ('failed', 'Failed'),
        ],
        default='draft'
    )
    settings = models.JSONField(
        default=dict,
        help_text='{voice_id, voice_provider, music_track, transition_style}'
    )

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_projects'

PK: BigAutoField (integer) — inherits from SiteSectorBaseModel

VideoScript (video app)

class VideoScript(models.Model):
    """
    Script for a video project — generated by AI, editable by user.
    """
    project = models.OneToOneField(
        'video.VideoProject',
        on_delete=models.CASCADE,
        related_name='script'
    )
    script_text = models.TextField(help_text='Full narration text')
    sections = models.JSONField(
        default=list,
        help_text='[{text, duration_est, visual_cue, text_overlay}]'
    )
    hook = models.TextField(blank=True, default='')
    cta = models.TextField(blank=True, default='')
    chapter_markers = models.JSONField(
        default=list,
        help_text='[{time, title}]'
    )
    total_estimated_duration = models.IntegerField(
        default=0,
        help_text='Total estimated duration in seconds'
    )
    seo_metadata = models.JSONField(
        default=dict,
        help_text='{platform: {title, description, tags}}'
    )
    version = models.IntegerField(default=1)

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_scripts'

PK: BigAutoField (integer) — standard Django Model

VideoAsset (video app)

class VideoAsset(models.Model):
    """
    Individual asset (image, footage, music, overlay, subtitle) for a video project.
    """
    project = models.ForeignKey(
        'video.VideoProject',
        on_delete=models.CASCADE,
        related_name='assets'
    )
    asset_type = models.CharField(
        max_length=15,
        choices=[
            ('image', 'Image'),
            ('footage', 'Footage'),
            ('music', 'Background Music'),
            ('overlay', 'Text Overlay'),
            ('subtitle', 'Subtitle File'),
        ]
    )
    source = models.CharField(
        max_length=20,
        choices=[
            ('article_image', 'Article Image'),
            ('ai_generated', 'AI Generated'),
            ('stock_pexels', 'Pexels Stock'),
            ('stock_pixabay', 'Pixabay Stock'),
            ('uploaded', 'User Uploaded'),
            ('rendered', 'Rendered'),
        ]
    )
    file_path = models.CharField(max_length=500, help_text='Path in media storage')
    file_url = models.URLField(blank=True, default='')
    duration = models.FloatField(
        null=True, blank=True,
        help_text='Duration in seconds (for video/audio assets)'
    )
    section_index = models.IntegerField(
        null=True, blank=True,
        help_text='Which script section this asset belongs to'
    )
    order = models.IntegerField(default=0)

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_assets'

PK: BigAutoField (integer) — standard Django Model

RenderedVideo (video app)

class RenderedVideo(models.Model):
    """
    A rendered video file for a specific platform preset.
    One project can have multiple renders (one per target platform).
    """
    project = models.ForeignKey(
        'video.VideoProject',
        on_delete=models.CASCADE,
        related_name='rendered_videos'
    )
    preset = models.CharField(
        max_length=20,
        choices=[
            ('youtube_long', 'YouTube Long'),
            ('youtube_short', 'YouTube Short'),
            ('instagram_reel', 'Instagram Reel'),
            ('tiktok', 'TikTok'),
        ]
    )
    resolution = models.CharField(
        max_length=15,
        help_text='e.g. 1920x1080 or 1080x1920'
    )
    duration = models.FloatField(help_text='Duration in seconds')
    file_size = models.BigIntegerField(help_text='File size in bytes')
    file_path = models.CharField(max_length=500)
    file_url = models.URLField(blank=True, default='')
    subtitle_file_path = models.CharField(max_length=500, blank=True, default='')
    thumbnail_path = models.CharField(max_length=500, blank=True, default='')
    render_started_at = models.DateTimeField()
    render_completed_at = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('queued', 'Queued'),
            ('rendering', 'Rendering'),
            ('completed', 'Completed'),
            ('failed', 'Failed'),
        ],
        default='queued'
    )

    class Meta:
        app_label = 'video'
        db_table = 'igny8_rendered_videos'

PK: BigAutoField (integer) — standard Django Model

PublishedVideo (video app)

class PublishedVideo(models.Model):
    """
    Tracks a rendered video published to a social platform.
    Uses SocialAccount from 02H for OAuth credentials.
    """
    rendered_video = models.ForeignKey(
        'video.RenderedVideo',
        on_delete=models.CASCADE,
        related_name='publications'
    )
    social_account = models.ForeignKey(
        'social.SocialAccount',
        on_delete=models.CASCADE,
        related_name='published_videos'
    )
    platform = models.CharField(max_length=15)
    platform_video_id = models.CharField(max_length=255, blank=True, default='')
    published_url = models.URLField(blank=True, default='')
    title = models.CharField(max_length=255)
    description = models.TextField()
    tags = models.JSONField(default=list)
    thumbnail_url = models.URLField(blank=True, default='')
    published_at = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('publishing', 'Publishing'),
            ('published', 'Published'),
            ('failed', 'Failed'),
            ('removed', 'Removed'),
        ],
        default='publishing'
    )

    class Meta:
        app_label = 'video'
        db_table = 'igny8_published_videos'

PK: BigAutoField (integer) — standard Django Model

VideoEngagement (video app)

class VideoEngagement(models.Model):
    """
    Engagement metrics for a published video.
    Fetched periodically from platform APIs.
    """
    published_video = models.ForeignKey(
        'video.PublishedVideo',
        on_delete=models.CASCADE,
        related_name='engagement_records'
    )
    views = models.IntegerField(default=0)
    likes = models.IntegerField(default=0)
    comments = models.IntegerField(default=0)
    shares = models.IntegerField(default=0)
    watch_time_seconds = models.IntegerField(default=0)
    avg_view_duration = models.FloatField(default=0.0)
    raw_data = models.JSONField(default=dict, help_text='Full platform API response')
    fetched_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_engagement'

PK: BigAutoField (integer) — standard Django Model

3.2 New App Registration

Create video app:

  • App config: igny8_core/modules/video/apps.py with app_label = 'video'
  • Add to INSTALLED_APPS in igny8_core/settings.py

3.3 Migration

igny8_core/migrations/XXXX_add_video_models.py

Operations:

  1. CreateModel('VideoProject', ...) — with indexes on content, status
  2. CreateModel('VideoScript', ...) — OneToOne to VideoProject
  3. CreateModel('VideoAsset', ...) — with index on project
  4. CreateModel('RenderedVideo', ...) — with index on project, status
  5. CreateModel('PublishedVideo', ...) — with indexes on rendered_video, social_account
  6. CreateModel('VideoEngagement', ...) — with index on published_video

3.4 API Endpoints

All endpoints under /api/v1/video/:

Project Management

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/video/projects/ | Create video project. Body: {content_id, project_type, target_platforms}. |
| GET | /api/v1/video/projects/?site_id=X | List projects with filters (status, project_type). |
| GET | /api/v1/video/projects/{id}/ | Project detail with script, assets, renders. |

Script

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/video/scripts/generate/ | AI-generate script from content. Body: {project_id}. |
| PUT | /api/v1/video/scripts/{project_id}/ | Edit script (user modifications). |

Voiceover

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/video/voiceover/generate/ | Generate TTS audio. Body: {project_id, voice_id, provider}. |
| POST | /api/v1/video/voiceover/preview/ | Preview voice sample (short clip). Body: {text, voice_id, provider}. |

Assets

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/video/assets/{project_id}/ | List project assets. |
| POST | /api/v1/video/assets/{project_id}/ | Add/replace asset. |

Rendering & Publishing

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/video/render/ | Queue video render. Body: {project_id, presets: []}. |
| GET | /api/v1/video/render/{id}/status/ | Render progress (queued/rendering/completed/failed). |
| GET | /api/v1/video/rendered/{project_id}/ | List rendered videos for project. |
| POST | /api/v1/video/publish/ | Publish to platform. Body: {rendered_video_id, social_account_id}. |

Analytics

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/video/analytics/?site_id=X | Aggregate video analytics across projects. |
| GET | /api/v1/video/analytics/{published_video_id}/ | Single video analytics with engagement timeline. |

Permissions: All endpoints use SiteSectorModelViewSet permission patterns.

3.5 AI Functions

GenerateVideoScriptFunction

Registry key: generate_video_script Location: igny8_core/ai/functions/generate_video_script.py

class GenerateVideoScriptFunction(BaseAIFunction):
    """
    Generates video script from content record.
    Produces hook, intro, body points, CTA, chapter markers,
    and platform-specific SEO metadata.
    """
    function_name = 'generate_video_script'

    def validate(self, project_id, **kwargs):
        # Verify project exists, has linked content with content_html
        pass

    def prepare(self, project_id, **kwargs):
        # Load VideoProject + Content
        # Extract key points, images, meta_description
        # Determine target duration from project_type
        pass

    def build_prompt(self):
        # Include: content title, key points, meta_description
        # Target duration constraints
        # Per target_platform: SEO metadata requirements
        pass

    def parse_response(self, response):
        # Parse script structure: hook, intro, points[], cta, chapter_markers
        # Parse seo_metadata per platform
        pass

    def save_output(self, parsed):
        # Create/update VideoScript record
        # Update VideoProject.status = 'scripting' → 'voiceover'
        pass

3.6 TTS Service

Location: igny8_core/business/tts_service.py

class TTSService:
    """
    Text-to-speech service. Supports cloud providers and self-hosted models.
    Returns audio file + word-level timestamps.
    """

    PROVIDERS = {
        'openai': OpenAITTSProvider,
        'elevenlabs': ElevenLabsTTSProvider,
        'coqui': CoquiTTSProvider,        # Self-hosted via 0F
        'bark': BarkTTSProvider,          # Self-hosted via 0F
        'piper': PiperTTSProvider,        # Self-hosted via 0F
    }

    def generate(self, text, voice_id, provider='openai'):
        """
        Generate voiceover audio.
        Returns {audio_path, duration, timestamps: [{word, start, end}]}
        """
        pass

    def preview(self, text, voice_id, provider='openai', max_chars=200):
        """Generate short preview clip."""
        pass

    def list_voices(self, provider='openai'):
        """List available voices for provider."""
        pass

3.7 Video Composition Service

Location: igny8_core/business/video_composition.py

class VideoCompositionService:
    """
    Composes video from script + audio + visual assets using FFmpeg/MoviePy.
    """

    PRESETS = {
        'youtube_long': {'width': 1920, 'height': 1080, 'min_dur': 180, 'max_dur': 900},
        'youtube_short': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 60},
        'instagram_reel': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 90},
        'tiktok': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 180},
    }

    def compose(self, project_id, preset):
        """
        Full composition:
        1. Load script + audio + visual assets
        2. Create visual timeline from script sections
        3. Assign visuals to sections
        4. Add text overlays at timestamps
        5. Mix voiceover + background music (20% volume)
        6. Apply transitions
        7. Render to preset resolution/format
        8. Generate SRT subtitles from TTS timestamps
        9. Generate thumbnail
        10. Create RenderedVideo record
        Returns RenderedVideo instance.
        """
        pass

    def _apply_ken_burns(self, image_clip, duration):
        """Apply zoom/pan animation to still image."""
        pass

    def _generate_subtitles(self, timestamps, output_path):
        """Generate SRT file from word-level timestamps."""
        pass

    def _generate_thumbnail(self, project, output_path):
        """Create thumbnail: hero image + title text overlay."""
        pass

3.8 Video Publisher Service

Location: igny8_core/business/video_publisher.py

class VideoPublisherService:
    """
    Publishes rendered videos to platforms via their APIs.
    Reuses SocialAccount OAuth from 02H.
    """

    def publish(self, rendered_video_id, social_account_id):
        """
        Upload and publish video to platform.
        1. Load RenderedVideo + SocialAccount
        2. Decrypt OAuth tokens
        3. Upload video file via platform API
        4. Set metadata (title, description, tags, thumbnail)
        5. Create PublishedVideo record
        """
        pass

3.9 Celery Tasks

Location: igny8_core/tasks/video_tasks.py

All video tasks run on the dedicated video queue:

@shared_task(name='generate_video_script', queue='video')
def generate_video_script_task(project_id):
    """AI script generation from content."""
    pass

@shared_task(name='generate_voiceover', queue='video')
def generate_voiceover_task(project_id):
    """TTS audio generation."""
    pass

@shared_task(name='render_video', queue='video')
def render_video_task(project_id, preset):
    """
    FFmpeg/MoviePy video composition. Long-running (5-30 min).
    Updates RenderedVideo.status through lifecycle.
    """
    pass

@shared_task(name='generate_thumbnail', queue='video')
def generate_thumbnail_task(project_id):
    """AI thumbnail creation with title overlay."""
    pass

@shared_task(name='generate_subtitles', queue='video')
def generate_subtitles_task(project_id):
    """SRT generation from TTS timestamps."""
    pass

@shared_task(name='publish_video', queue='video')
def publish_video_task(rendered_video_id, social_account_id):
    """Upload video to platform API."""
    pass

@shared_task(name='fetch_video_engagement')
def fetch_video_engagement_task():
    """Periodic metric fetch for published videos. Runs on default queue."""
    pass

@shared_task(name='video_pipeline_stage9', queue='video')
def video_pipeline_stage9(content_id):
    """
    Full pipeline: script → voice → render → publish.
    Triggered after Stage 8 (social) or directly after Stage 7 (publish).
    """
    pass

Beat Schedule Additions:

| Task | Schedule | Notes |
|---|---|---|
| fetch_video_engagement | Every 12 hours | Fetches engagement metrics for published videos |

Docker Configuration:

# Add to docker-compose.app.yml:
celery_video_worker:
  build: ./backend
  command: celery -A igny8_core worker -Q video --concurrency=1 --loglevel=info
  # Requires FFmpeg installed in Docker image

Dockerfile Addition:

# Add to backend/Dockerfile:
RUN apt-get update && apt-get install -y ffmpeg

4. IMPLEMENTATION STEPS

Step 1: Create Video App

  1. Create igny8_core/modules/video/ directory with __init__.py and apps.py
  2. Add video to INSTALLED_APPS in settings.py
  3. Create 6 models: VideoProject, VideoScript, VideoAsset, RenderedVideo, PublishedVideo, VideoEngagement

Step 2: Migration

  1. Create migration for 6 new models
  2. Run migration

Step 3: System Dependencies

  1. Add FFmpeg to Docker image (apt-get install -y ffmpeg)
  2. Add to requirements.txt: moviepy, pydub, Pillow (already present), pysrt
  3. Add docker-compose service for celery_video_worker with -Q video --concurrency=1

Step 4: AI Function

  1. Implement GenerateVideoScriptFunction in igny8_core/ai/functions/generate_video_script.py
  2. Register generate_video_script in igny8_core/ai/registry.py

Step 5: Services

  1. Implement TTSService in igny8_core/business/tts_service.py (cloud + self-hosted providers)
  2. Implement VideoCompositionService in igny8_core/business/video_composition.py
  3. Implement VideoPublisherService in igny8_core/business/video_publisher.py

Step 6: Pipeline Integration

Add Stage 9 trigger:

# After Stage 8 (social posts) or Stage 7 (publish):
def post_social_or_publish(content_id):
    content = Content.objects.get(id=content_id)
    config = AutomationConfig.objects.get(site=content.site)
    if config.settings.get('video_enabled'):
        video_pipeline_stage9.delay(content_id)

Step 7: API Endpoints

  1. Create igny8_core/urls/video.py with project, script, voiceover, asset, render, publish, analytics endpoints
  2. Create views extending SiteSectorModelViewSet
  3. Register URL patterns under /api/v1/video/

Step 8: Celery Tasks

  1. Implement 8 tasks in igny8_core/tasks/video_tasks.py
  2. Add fetch_video_engagement to beat schedule
  3. Ensure render tasks target video queue

Step 9: Serializers & Admin

  1. Create DRF serializers for all 6 models
  2. Register models in Django admin

Step 10: Credit Cost Configuration

Add to CreditCostConfig (billing app):

| operation_type | default_cost | description |
|---|---|---|
| video_script_generation | 5 | AI script generation from content |
| video_tts_standard | 10/min | Cloud TTS (OpenAI) — per minute of audio |
| video_tts_selfhosted | 2/min | Self-hosted TTS (Coqui/Piper via 0F) |
| video_tts_hd | 20/min | HD TTS (ElevenLabs) — per minute |
| video_visual_generation | 15-50 | AI visual asset generation (varies by count) |
| video_thumbnail | 3-10 | AI thumbnail creation |
| video_composition | 5 | FFmpeg render |
| video_seo_metadata | 1 | SEO metadata per platform |
| video_short_total | 40-80 | Total for short-form video |
| video_long_total | 100-250 | Total for long-form video |
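For planning, the per-operation costs above can be combined into a rough per-video estimate. A sketch using the table's defaults; note the flat values for visuals (30) and thumbnail (6) are mid-range picks from the ranged entries, chosen here only for illustration:

```python
# Illustrative cost estimate from the CreditCostConfig defaults above.
# Ranged entries (visuals, thumbnail) use assumed mid-range flat values.
CREDIT_COSTS = {
    'script': 5,
    'tts_per_min': {'standard': 10, 'selfhosted': 2, 'hd': 20},
    'visuals': 30,        # assumed midpoint of 15-50
    'thumbnail': 6,       # assumed midpoint of 3-10
    'composition': 5,
    'seo_per_platform': 1,
}

def estimate_video_credits(audio_minutes, tts_tier, num_platforms):
    """Rough total credit cost for one video project."""
    c = CREDIT_COSTS
    return (c['script']
            + c['tts_per_min'][tts_tier] * audio_minutes
            + c['visuals'] + c['thumbnail'] + c['composition']
            + c['seo_per_platform'] * num_platforms)
```

A 2-minute short with self-hosted TTS targeting 3 platforms lands around 53 credits under these assumptions, consistent with the 40-80 range for video_short_total.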

5. ACCEPTANCE CRITERIA

Script Generation

  • AI generates structured script from content with hook, intro, body points, CTA
  • Script includes chapter markers with timestamps
  • Platform-specific SEO metadata generated (title, description, tags)
  • Script duration estimates match project_type constraints
  • User can edit script before proceeding

Voiceover

  • OpenAI TTS generates audio with voice selection
  • ElevenLabs TTS works as premium option
  • Self-hosted TTS (Coqui XTTS-v2) works via 0F GPU
  • Word-level timestamps generated for subtitle sync
  • Voice preview endpoint allows testing before full generation

Visual Assets

  • Article images from Images model used as visual assets
  • Ken Burns effect applied to still images
  • Text overlay frames rendered via Pillow
  • Transitions applied between scenes
  • User can swap assets before rendering

Rendering

  • FFmpeg/MoviePy composition produces correct resolution per preset
  • Audio mix: voiceover at 100% + background music at 20%
  • SRT subtitle file generated from TTS timestamps
  • AI thumbnail generated with title text overlay
  • Render runs on dedicated video Celery queue with concurrency=1
  • Render progress trackable via status endpoint

Publishing

  • Video uploads to YouTube via API (reusing 02H SocialAccount)
  • Video uploads to Instagram Reels
  • Video uploads to TikTok
  • Platform video ID and URL stored on PublishedVideo
  • Engagement metrics fetched every 12 hours

Pipeline Integration

  • Stage 9 triggers automatically when video_enabled in AutomationConfig
  • Full pipeline (script → voice → render → publish) runs as single Celery chain
  • VideoObject schema (02G) generated for published video content

6. CLAUDE CODE INSTRUCTIONS

File Locations

igny8_core/
├── modules/
│   └── video/
│       ├── __init__.py
│       ├── apps.py                    # app_label = 'video'
│       └── models.py                  # 6 models
├── ai/
│   └── functions/
│       └── generate_video_script.py   # GenerateVideoScriptFunction
├── business/
│   ├── tts_service.py                 # TTSService (cloud + self-hosted)
│   ├── video_composition.py           # VideoCompositionService
│   └── video_publisher.py            # VideoPublisherService
├── tasks/
│   └── video_tasks.py                 # Celery tasks (video queue)
├── urls/
│   └── video.py                       # Video endpoints
└── migrations/
    └── XXXX_add_video_models.py

Conventions

  • PKs: BigAutoField (integer) — do NOT use UUIDs
  • Table prefix: igny8_ on all new tables
  • App label: video (new app)
  • Celery app name: igny8_core
  • Celery queue: video for all render/composition tasks (default queue for engagement fetch)
  • URL pattern: /api/v1/video/...
  • Permissions: Use SiteSectorModelViewSet permission pattern
  • Docker: FFmpeg must be installed in Docker image; dedicated celery_video_worker service
  • AI functions: Extend BaseAIFunction; register as generate_video_script
  • Frontend: .tsx files with Zustand stores

Cross-References

| Doc | Relationship |
|---|---|
| 02H | Socializer provides SocialAccount model + OAuth for YouTube/Instagram/TikTok publishing |
| 0F | Self-hosted AI infrastructure provides GPU for TTS + image generation |
| 01E | Pipeline Stage 9 integration — hooks after Stage 8 (social) or Stage 7 (publish) |
| 02G | VideoObject schema generated for content with published video |
| 04A | Managed services may include video creation as premium tier |

Key Decisions

  1. New video app — Separate app because video has 6 models and complex pipeline logic distinct from social posting
  2. Dedicated Celery queue — Video rendering is CPU/GPU intensive (5-30 min); isolated video queue with concurrency=1 prevents blocking other tasks
  3. VideoScript, VideoAsset as plain models.Model — Not SiteSectorBaseModel because they're children of VideoProject which carries the site/sector context
  4. Multiple RenderedVideo per project — One project can target multiple platforms; each gets its own render at the correct resolution
  5. Reuse 02H OAuth — PublishedVideo references SocialAccount from 02H; no duplicate OAuth infrastructure for video platforms
  6. Temp file cleanup — Intermediate files (raw audio, image frames, non-final renders) cleaned up after successful publish to manage disk space

System Requirements

  • FFmpeg installed on server (add to Docker image via apt-get install -y ffmpeg)
  • Python packages: moviepy, pydub, Pillow, pysrt
  • Sufficient disk space for video temp files (cleanup after publish)
  • Self-hosted GPU (from 0F) for TTS + AI image generation (optional — cloud fallback available)
  • Dedicated Celery worker for video queue: celery -A igny8_core worker -Q video --concurrency=1