# IGNY8 Phase 2: Video Creator (02I)

## AI Video Creation Pipeline — Stage 9

**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** IGNY8 Phase 2 — Feature Expansion
**Status:** Build Ready
**Source of Truth:** Codebase at `/data/app/igny8/`
**Audience:** Claude Code, Backend Developers, Architects

---

## 1. CURRENT STATE

### Video Today

There is **no** video creation capability in IGNY8. No TTS, no FFmpeg pipeline, no video publishing. Images exist (generated by pipeline Stages 5-6) and can feed into video as visual assets.

### What Exists

- `Images` model (writer app) — generated images from pipeline, usable as video visual assets
- `SocialAccount` model (02H) — provides OAuth connections to YouTube, Instagram, TikTok for video publishing
- Self-hosted AI infrastructure (Phase 0F) — provides GPU for TTS and AI image generation
- Content generation pipeline (01E) — content records provide source material for video scripts
- Celery infrastructure with multiple queues — supports dedicated `video` queue for long-running renders

### What Does Not Exist

- No video app or models
- No script generation from articles
- No TTS (text-to-speech) voiceover generation
- No FFmpeg/MoviePy video composition pipeline
- No subtitle generation
- No video publishing to platforms
- No Stage 9 pipeline integration

---

## 2. WHAT TO BUILD

### Overview

Build Stage 9 of the automation pipeline: an AI video creation system that converts published content into videos. The pipeline has 5 stages: script generation → voiceover → visual assets → composition → publishing. Videos publish to YouTube, Instagram Reels, and TikTok.

### 2.1 Video Types

| Type | Duration | Aspect Ratio | Primary Platform |
|------|----------|--------------|------------------|
| **Short** | 30-90s | 9:16 vertical | YouTube Shorts, Instagram Reels, TikTok |
| **Medium** | 60-180s | 9:16 or 16:9 | TikTok, YouTube |
| **Long** | 5-15m | 16:9 horizontal | YouTube |

### 2.2 Platform Specs

| Platform | Max Duration | Resolution | Encoding | Max File Size |
|----------|--------------|------------|----------|---------------|
| YouTube Long | Up to 12h | 1920×1080 | MP4 H.264, AAC audio | 256GB |
| YouTube Shorts | ≤60s | 1080×1920 | MP4 H.264, AAC audio | 256GB |
| Instagram Reels | ≤90s | 1080×1920 | MP4 H.264, AAC audio | 650MB |
| TikTok | ≤10m | 1080×1920 | MP4 H.264, AAC audio | 72MB |

### 2.3 Five-Stage Video Pipeline

#### Stage 1 — Script Generation (AI)

**Input:** Content record (title, content_html, meta_description, keywords, images)

AI extracts key points and produces:

```json
{
  "hook": "text (3-5 sec)",
  "intro": "text (10-15 sec)",
  "points": [
    {"text": "...", "duration_est": 20, "visual_cue": "show chart", "text_overlay": "Key stat"}
  ],
  "cta": "text (5-10 sec)",
  "chapter_markers": [{"time": 0, "title": "Intro"}],
  "total_estimated_duration": 120
}
```

SEO: AI generates platform-specific title, description, tags for each target platform.

#### Stage 2 — Voiceover (TTS)

**Cloud Providers:**

| Provider | Cost | Quality | Features |
|----------|------|---------|----------|
| OpenAI TTS | $15-30/1M chars | High | Voices: alloy, echo, fable, onyx, nova, shimmer |
| ElevenLabs | Plan-based | Highest | Voice cloning, ultra-realistic |

**Self-Hosted (via 0F GPU):**

| Model | Quality | Speed | Notes |
|-------|---------|-------|-------|
| Coqui XTTS-v2 | Good | Medium | Multi-language, free |
| Bark | Expressive | Slow | Emotional speech |
| Piper TTS | Moderate | Fast | Lightweight |

**Features:** Voice selection, speed control, multi-language support
**Output:** WAV/MP3 audio file + word-level timestamps (for subtitle sync)

#### Stage 3 — Visual Assets

**Sources:**

- Article images from `Images` model (already generated by pipeline Stages 5-6)
- AI-generated scenes (Runware/DALL-E/Stable Diffusion via 0F)
- Stock footage APIs: Pexels, Pixabay (free, API key required)
- Text overlay frames (rendered via Pillow)
- Code snippet frames (via Pygments syntax highlighting)

**Effects:**

- Ken Burns effect on still images (zoom/pan animation)
- Transition effects between scenes (fade, slide, dissolve)
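
The Ken Burns effect is just a time-varying scale plus a recentered crop window. A minimal sketch of the underlying math, independent of any video library; the `zoom_end` value of 1.15 is an illustrative assumption, not a project setting:

```python
def ken_burns_window(frame_w, frame_h, t, duration, zoom_end=1.15):
    """Return (scale, x, y) for a slow zoom toward the frame centre.

    scale grows linearly from 1.0 to zoom_end over `duration`;
    (x, y) is the top-left corner of the crop window in the scaled
    image, chosen so the view stays centred.
    """
    progress = min(max(t / duration, 0.0), 1.0)
    scale = 1.0 + (zoom_end - 1.0) * progress
    # The scaled image is larger than the frame; centre the crop window.
    x = (frame_w * scale - frame_w) / 2
    y = (frame_h * scale - frame_h) / 2
    return scale, x, y
```

A composition library would evaluate this per frame and feed the result to its resize/crop primitives.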

#### Stage 4 — Video Composition (FFmpeg + MoviePy)

**Libraries:** FFmpeg (encoding), MoviePy (high-level composition), Pillow (text overlays), pydub (audio processing)

**Process:**

1. Create visual timeline from script sections
2. Assign visuals to each section (image/video clip per point)
3. Add text overlays at specified timestamps
4. Mix voiceover audio with background music (royalty-free, 20% volume)
5. Apply transitions between sections
6. Render to target resolution/format
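
The voiceover/music mix in step 4 maps directly to an FFmpeg filter graph. A hedged sketch of how the command might be assembled; the file paths and output codec here are illustrative:

```python
def build_mix_command(voice_path, music_path, out_path, music_volume=0.2):
    """Build an ffmpeg argv that keeps the voiceover at full volume and
    ducks the background music to `music_volume` (0.2 = the 20% level
    from the process above)."""
    filter_graph = (
        f"[1:a]volume={music_volume}[bg];"       # attenuate music only
        "[0:a][bg]amix=inputs=2:duration=first[aout]"  # mix, end with voice
    )
    return [
        "ffmpeg", "-y",
        "-i", voice_path,            # input 0: voiceover
        "-i", music_path,            # input 1: background music
        "-filter_complex", filter_graph,
        "-map", "[aout]",
        "-c:a", "aac",
        out_path,
    ]
```

The same pattern extends to the video side by adding the visual inputs and mapping a video stream alongside `[aout]`.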

**Render Presets:**

| Preset | Resolution | Duration Range | Encoding |
|--------|------------|----------------|----------|
| `youtube_long` | 1920×1080 | 3-15m | H.264/AAC |
| `youtube_short` | 1080×1920 | 30-60s | H.264/AAC |
| `instagram_reel` | 1080×1920 | 30-90s | H.264/AAC |
| `tiktok` | 1080×1920 | 30-180s | H.264/AAC |

#### Stage 5 — SEO & Publishing

- Auto-generate SRT subtitle file from TTS word-level timestamps
- AI thumbnail: hero image with title text overlay
- Platform-specific metadata: title (optimized per platform), description (with timestamps for YouTube), tags, category
- Publishing via platform APIs (reuses OAuth from 02H SocialAccount)
- Confirmation logging with platform video ID
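
The SRT step can be sketched from the `[{word, start, end}]` timestamp shape alone; the grouping size (7 words per caption) is an illustrative assumption:

```python
def words_to_srt(timestamps, max_words=7):
    """Render word-level TTS timestamps as an SRT string.

    `timestamps` is the [{word, start, end}] list (seconds) that the
    TTS stage produces; words are grouped into captions of up to
    `max_words` words each.
    """
    def fmt(seconds):
        # SRT timestamps look like HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i in range(0, len(timestamps), max_words):
        group = timestamps[i:i + max_words]
        text = " ".join(w["word"] for w in group)
        blocks.append(
            f"{len(blocks) + 1}\n"
            f"{fmt(group[0]['start'])} --> {fmt(group[-1]['end'])}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)
```

A production version would also break captions at sentence boundaries and cap line length, but the timestamp arithmetic is the whole trick.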

### 2.4 User Flow

1. Select content → choose video type (short/medium/long) → target platforms
2. AI generates script → user reviews/edits script
3. Select voice → preview audio → approve
4. Auto-assign visuals → user can swap images → preview composition
5. Render video → preview final → approve
6. Publish to selected platforms → track performance

### 2.5 Dedicated Celery Queue

Video rendering is CPU/GPU intensive and requires isolation:

- **Dedicated `video` queue:** `celery -A igny8_core worker -Q video --concurrency=1`
- **Long-running tasks:** 5-30 minutes per video render
- **Progress tracking:** via Celery result backend (task status updates)
- **Temp file cleanup:** after publish, clean up intermediate files (audio, frames, raw renders)
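
Progress tracking through the result backend might look like the sketch below. The stage names and percentages are hypothetical; `update_state` is Celery's standard mechanism for publishing custom task states:

```python
# Hypothetical coarse progress map for a render task. Real percentages
# between 20 and 90 would come from parsing FFmpeg's progress output.
RENDER_STAGES = {
    "loading_assets": 5,
    "building_timeline": 15,
    "rendering": 20,       # 20-90% while FFmpeg runs
    "subtitles": 90,
    "thumbnail": 95,
    "done": 100,
}


def report_progress(task, stage):
    """Push a coarse progress update through the Celery result backend,
    so the render status endpoint can relay it to the UI."""
    task.update_state(
        state="PROGRESS",
        meta={"stage": stage, "percent": RENDER_STAGES[stage]},
    )
```

Inside a bound task (`@shared_task(bind=True)`), this would be called as `report_progress(self, "rendering")`.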

---

## 3. DATA MODELS & APIS

### 3.1 New Models

All models in a new `video` app.

#### VideoProject (video app)

```python
class VideoProject(SiteSectorBaseModel):
    """
    Top-level container for a video creation project.
    Links to source content and tracks overall progress.
    """
    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='video_projects',
        help_text='Source content (null for standalone video)'
    )
    project_type = models.CharField(
        max_length=10,
        choices=[
            ('short', 'Short (30-90s)'),
            ('medium', 'Medium (60-180s)'),
            ('long', 'Long (5-15m)'),
        ]
    )
    target_platforms = models.JSONField(
        default=list,
        help_text='List of target platform strings: youtube_long, youtube_short, instagram_reel, tiktok'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('draft', 'Draft'),
            ('scripting', 'Script Generation'),
            ('voiceover', 'Voiceover Generation'),
            ('composing', 'Visual Composition'),
            ('rendering', 'Rendering'),
            ('review', 'Ready for Review'),
            ('published', 'Published'),
            ('failed', 'Failed'),
        ],
        default='draft'
    )
    settings = models.JSONField(
        default=dict,
        help_text='{voice_id, voice_provider, music_track, transition_style}'
    )

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_projects'
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

#### VideoScript (video app)

```python
class VideoScript(models.Model):
    """
    Script for a video project — generated by AI, editable by user.
    """
    project = models.OneToOneField(
        'video.VideoProject',
        on_delete=models.CASCADE,
        related_name='script'
    )
    script_text = models.TextField(help_text='Full narration text')
    sections = models.JSONField(
        default=list,
        help_text='[{text, duration_est, visual_cue, text_overlay}]'
    )
    hook = models.TextField(blank=True, default='')
    cta = models.TextField(blank=True, default='')
    chapter_markers = models.JSONField(
        default=list,
        help_text='[{time, title}]'
    )
    total_estimated_duration = models.IntegerField(
        default=0,
        help_text='Total estimated duration in seconds'
    )
    seo_metadata = models.JSONField(
        default=dict,
        help_text='{platform: {title, description, tags}}'
    )
    version = models.IntegerField(default=1)

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_scripts'
```

**PK:** BigAutoField (integer) — standard Django Model

#### VideoAsset (video app)

```python
class VideoAsset(models.Model):
    """
    Individual asset (image, footage, music, overlay, subtitle) for a video project.
    """
    project = models.ForeignKey(
        'video.VideoProject',
        on_delete=models.CASCADE,
        related_name='assets'
    )
    asset_type = models.CharField(
        max_length=15,
        choices=[
            ('image', 'Image'),
            ('footage', 'Footage'),
            ('music', 'Background Music'),
            ('overlay', 'Text Overlay'),
            ('subtitle', 'Subtitle File'),
        ]
    )
    source = models.CharField(
        max_length=20,
        choices=[
            ('article_image', 'Article Image'),
            ('ai_generated', 'AI Generated'),
            ('stock_pexels', 'Pexels Stock'),
            ('stock_pixabay', 'Pixabay Stock'),
            ('uploaded', 'User Uploaded'),
            ('rendered', 'Rendered'),
        ]
    )
    file_path = models.CharField(max_length=500, help_text='Path in media storage')
    file_url = models.URLField(blank=True, default='')
    duration = models.FloatField(
        null=True, blank=True,
        help_text='Duration in seconds (for video/audio assets)'
    )
    section_index = models.IntegerField(
        null=True, blank=True,
        help_text='Which script section this asset belongs to'
    )
    order = models.IntegerField(default=0)

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_assets'
```

**PK:** BigAutoField (integer) — standard Django Model

#### RenderedVideo (video app)

```python
class RenderedVideo(models.Model):
    """
    A rendered video file for a specific platform preset.
    One project can have multiple renders (one per target platform).
    """
    project = models.ForeignKey(
        'video.VideoProject',
        on_delete=models.CASCADE,
        related_name='rendered_videos'
    )
    preset = models.CharField(
        max_length=20,
        choices=[
            ('youtube_long', 'YouTube Long'),
            ('youtube_short', 'YouTube Short'),
            ('instagram_reel', 'Instagram Reel'),
            ('tiktok', 'TikTok'),
        ]
    )
    resolution = models.CharField(
        max_length=15,
        help_text='e.g. 1920x1080 or 1080x1920'
    )
    duration = models.FloatField(help_text='Duration in seconds')
    file_size = models.BigIntegerField(help_text='File size in bytes')
    file_path = models.CharField(max_length=500)
    file_url = models.URLField(blank=True, default='')
    subtitle_file_path = models.CharField(max_length=500, blank=True, default='')
    thumbnail_path = models.CharField(max_length=500, blank=True, default='')
    render_started_at = models.DateTimeField()
    render_completed_at = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('queued', 'Queued'),
            ('rendering', 'Rendering'),
            ('completed', 'Completed'),
            ('failed', 'Failed'),
        ],
        default='queued'
    )

    class Meta:
        app_label = 'video'
        db_table = 'igny8_rendered_videos'
```

**PK:** BigAutoField (integer) — standard Django Model

#### PublishedVideo (video app)

```python
class PublishedVideo(models.Model):
    """
    Tracks a rendered video published to a social platform.
    Uses SocialAccount from 02H for OAuth credentials.
    """
    rendered_video = models.ForeignKey(
        'video.RenderedVideo',
        on_delete=models.CASCADE,
        related_name='publications'
    )
    social_account = models.ForeignKey(
        'social.SocialAccount',
        on_delete=models.CASCADE,
        related_name='published_videos'
    )
    platform = models.CharField(max_length=15)
    platform_video_id = models.CharField(max_length=255, blank=True, default='')
    published_url = models.URLField(blank=True, default='')
    title = models.CharField(max_length=255)
    description = models.TextField()
    tags = models.JSONField(default=list)
    thumbnail_url = models.URLField(blank=True, default='')
    published_at = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('publishing', 'Publishing'),
            ('published', 'Published'),
            ('failed', 'Failed'),
            ('removed', 'Removed'),
        ],
        default='publishing'
    )

    class Meta:
        app_label = 'video'
        db_table = 'igny8_published_videos'
```

**PK:** BigAutoField (integer) — standard Django Model

#### VideoEngagement (video app)

```python
class VideoEngagement(models.Model):
    """
    Engagement metrics for a published video.
    Fetched periodically from platform APIs.
    """
    published_video = models.ForeignKey(
        'video.PublishedVideo',
        on_delete=models.CASCADE,
        related_name='engagement_records'
    )
    views = models.IntegerField(default=0)
    likes = models.IntegerField(default=0)
    comments = models.IntegerField(default=0)
    shares = models.IntegerField(default=0)
    watch_time_seconds = models.IntegerField(default=0)
    avg_view_duration = models.FloatField(default=0.0)
    raw_data = models.JSONField(default=dict, help_text='Full platform API response')
    fetched_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'video'
        db_table = 'igny8_video_engagement'
```

**PK:** BigAutoField (integer) — standard Django Model

### 3.2 New App Registration

Create the video app:

- **App config:** `igny8_core/modules/video/apps.py` with `app_label = 'video'`
- **Add to INSTALLED_APPS** in `igny8_core/settings.py`
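
A minimal sketch of that app config, assuming the module layout from Section 6; `default_auto_field` matches the BigAutoField convention:

```python
# igny8_core/modules/video/apps.py (sketch)
from django.apps import AppConfig


class VideoConfig(AppConfig):
    # BigAutoField keeps integer PKs, per the conventions in Section 6.
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'igny8_core.modules.video'
    label = 'video'
```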

### 3.3 Migration

```
igny8_core/migrations/XXXX_add_video_models.py
```

**Operations:**

1. `CreateModel('VideoProject', ...)` — with indexes on content, status
2. `CreateModel('VideoScript', ...)` — OneToOne to VideoProject
3. `CreateModel('VideoAsset', ...)` — with index on project
4. `CreateModel('RenderedVideo', ...)` — with indexes on project, status
5. `CreateModel('PublishedVideo', ...)` — with indexes on rendered_video, social_account
6. `CreateModel('VideoEngagement', ...)` — with index on published_video

### 3.4 API Endpoints

All endpoints under `/api/v1/video/`:

#### Project Management

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/video/projects/` | Create video project. Body: `{content_id, project_type, target_platforms}`. |
| GET | `/api/v1/video/projects/?site_id=X` | List projects with filters (status, project_type). |
| GET | `/api/v1/video/projects/{id}/` | Project detail with script, assets, renders. |

#### Script

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/video/scripts/generate/` | AI-generate script from content. Body: `{project_id}`. |
| PUT | `/api/v1/video/scripts/{project_id}/` | Edit script (user modifications). |

#### Voiceover

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/video/voiceover/generate/` | Generate TTS audio. Body: `{project_id, voice_id, provider}`. |
| POST | `/api/v1/video/voiceover/preview/` | Preview voice sample (short clip). Body: `{text, voice_id, provider}`. |

#### Assets

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/video/assets/{project_id}/` | List project assets. |
| POST | `/api/v1/video/assets/{project_id}/` | Add/replace asset. |

#### Rendering & Publishing

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/video/render/` | Queue video render. Body: `{project_id, presets: []}`. |
| GET | `/api/v1/video/render/{id}/status/` | Render progress (queued/rendering/completed/failed). |
| GET | `/api/v1/video/rendered/{project_id}/` | List rendered videos for project. |
| POST | `/api/v1/video/publish/` | Publish to platform. Body: `{rendered_video_id, social_account_id}`. |

#### Analytics

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/video/analytics/?site_id=X` | Aggregate video analytics across projects. |
| GET | `/api/v1/video/analytics/{published_video_id}/` | Single video analytics with engagement timeline. |

**Permissions:** All endpoints use `SiteSectorModelViewSet` permission patterns.

### 3.5 AI Functions

#### GenerateVideoScriptFunction

**Registry key:** `generate_video_script`
**Location:** `igny8_core/ai/functions/generate_video_script.py`

```python
class GenerateVideoScriptFunction(BaseAIFunction):
    """
    Generates video script from content record.
    Produces hook, intro, body points, CTA, chapter markers,
    and platform-specific SEO metadata.
    """
    function_name = 'generate_video_script'

    def validate(self, project_id, **kwargs):
        # Verify project exists, has linked content with content_html
        pass

    def prepare(self, project_id, **kwargs):
        # Load VideoProject + Content
        # Extract key points, images, meta_description
        # Determine target duration from project_type
        pass

    def build_prompt(self):
        # Include: content title, key points, meta_description
        # Target duration constraints
        # Per target_platform: SEO metadata requirements
        pass

    def parse_response(self, response):
        # Parse script structure: hook, intro, points[], cta, chapter_markers
        # Parse seo_metadata per platform
        pass

    def save_output(self, parsed):
        # Create/update VideoScript record
        # Update VideoProject.status = 'scripting' → 'voiceover'
        pass
```
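
For illustration, the parsing step might reduce to something like the sketch below, assuming the model returns the Stage 1 JSON structure verbatim; the validation rules are assumptions, not project requirements:

```python
import json

# Top-level keys of the Stage 1 script payload.
REQUIRED_KEYS = {
    "hook", "intro", "points", "cta",
    "chapter_markers", "total_estimated_duration",
}


def parse_script_response(raw: str) -> dict:
    """Parse and minimally validate the AI script payload."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"script response missing keys: {sorted(missing)}")
    for point in data["points"]:
        # text_overlay is optional per point; normalise it.
        point.setdefault("text_overlay", "")
    return data
```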

### 3.6 TTS Service

**Location:** `igny8_core/business/tts_service.py`

```python
class TTSService:
    """
    Text-to-speech service. Supports cloud providers and self-hosted models.
    Returns audio file + word-level timestamps.
    """

    PROVIDERS = {
        'openai': OpenAITTSProvider,
        'elevenlabs': ElevenLabsTTSProvider,
        'coqui': CoquiTTSProvider,    # Self-hosted via 0F
        'bark': BarkTTSProvider,      # Self-hosted via 0F
        'piper': PiperTTSProvider,    # Self-hosted via 0F
    }

    def generate(self, text, voice_id, provider='openai'):
        """
        Generate voiceover audio.
        Returns {audio_path, duration, timestamps: [{word, start, end}]}
        """
        pass

    def preview(self, text, voice_id, provider='openai', max_chars=200):
        """Generate short preview clip."""
        pass

    def list_voices(self, provider='openai'):
        """List available voices for provider."""
        pass
```
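
One practical detail `generate()` will have to handle: cloud TTS endpoints cap input length (OpenAI's speech endpoint accepts at most 4096 characters per request), so long narration must be synthesized in chunks and the audio concatenated. A sentence-boundary chunker sketch; the naive split on `.` is illustrative only:

```python
def chunk_for_tts(text, max_chars=4096):
    """Split narration into chunks no longer than max_chars,
    breaking only at sentence boundaries."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        # +1 accounts for the joining space.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```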

### 3.7 Video Composition Service

**Location:** `igny8_core/business/video_composition.py`

```python
class VideoCompositionService:
    """
    Composes video from script + audio + visual assets using FFmpeg/MoviePy.
    """

    PRESETS = {
        'youtube_long': {'width': 1920, 'height': 1080, 'min_dur': 180, 'max_dur': 900},
        'youtube_short': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 60},
        'instagram_reel': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 90},
        'tiktok': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 180},
    }

    def compose(self, project_id, preset):
        """
        Full composition:
        1. Load script + audio + visual assets
        2. Create visual timeline from script sections
        3. Assign visuals to sections
        4. Add text overlays at timestamps
        5. Mix voiceover + background music (20% volume)
        6. Apply transitions
        7. Render to preset resolution/format
        8. Generate SRT subtitles from TTS timestamps
        9. Generate thumbnail
        10. Create RenderedVideo record
        Returns RenderedVideo instance.
        """
        pass

    def _apply_ken_burns(self, image_clip, duration):
        """Apply zoom/pan animation to still image."""
        pass

    def _generate_subtitles(self, timestamps, output_path):
        """Generate SRT file from word-level timestamps."""
        pass

    def _generate_thumbnail(self, project, output_path):
        """Create thumbnail: hero image + title text overlay."""
        pass
```
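
Before queueing a 5-30 minute render, `compose()` could cheaply reject scripts that cannot fit the target preset. A sketch using the PRESETS table above; the helper name is hypothetical:

```python
PRESETS = {
    'youtube_long': {'width': 1920, 'height': 1080, 'min_dur': 180, 'max_dur': 900},
    'youtube_short': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 60},
    'instagram_reel': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 90},
    'tiktok': {'width': 1080, 'height': 1920, 'min_dur': 30, 'max_dur': 180},
}


def validate_duration(preset, estimated_duration):
    """Raise if the script's estimated duration can't fit the preset;
    otherwise return the target resolution string."""
    spec = PRESETS[preset]
    if not spec['min_dur'] <= estimated_duration <= spec['max_dur']:
        raise ValueError(
            f"{preset}: estimated {estimated_duration}s outside "
            f"{spec['min_dur']}-{spec['max_dur']}s"
        )
    return f"{spec['width']}x{spec['height']}"
```

Running this check at script-approval time (against `VideoScript.total_estimated_duration`) surfaces the problem before any credits are spent on rendering.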

### 3.8 Video Publisher Service

**Location:** `igny8_core/business/video_publisher.py`

```python
class VideoPublisherService:
    """
    Publishes rendered videos to platforms via their APIs.
    Reuses SocialAccount OAuth from 02H.
    """

    def publish(self, rendered_video_id, social_account_id):
        """
        Upload and publish video to platform.
        1. Load RenderedVideo + SocialAccount
        2. Decrypt OAuth tokens
        3. Upload video file via platform API
        4. Set metadata (title, description, tags, thumbnail)
        5. Create PublishedVideo record
        """
        pass
```

### 3.9 Celery Tasks

**Location:** `igny8_core/tasks/video_tasks.py`

All video tasks run on the dedicated `video` queue:

```python
@shared_task(name='generate_video_script', queue='video')
def generate_video_script_task(project_id):
    """AI script generation from content."""
    pass


@shared_task(name='generate_voiceover', queue='video')
def generate_voiceover_task(project_id):
    """TTS audio generation."""
    pass


@shared_task(name='render_video', queue='video')
def render_video_task(project_id, preset):
    """
    FFmpeg/MoviePy video composition. Long-running (5-30 min).
    Updates RenderedVideo.status through lifecycle.
    """
    pass


@shared_task(name='generate_thumbnail', queue='video')
def generate_thumbnail_task(project_id):
    """AI thumbnail creation with title overlay."""
    pass


@shared_task(name='generate_subtitles', queue='video')
def generate_subtitles_task(project_id):
    """SRT generation from TTS timestamps."""
    pass


@shared_task(name='publish_video', queue='video')
def publish_video_task(rendered_video_id, social_account_id):
    """Upload video to platform API."""
    pass


@shared_task(name='fetch_video_engagement')
def fetch_video_engagement_task():
    """Periodic metric fetch for published videos. Runs on default queue."""
    pass


@shared_task(name='video_pipeline_stage9', queue='video')
def video_pipeline_stage9(content_id):
    """
    Full pipeline: script → voice → render → publish.
    Triggered after Stage 8 (social) or directly after Stage 7 (publish).
    """
    pass
```

**Beat Schedule Additions:**

| Task | Schedule | Notes |
|------|----------|-------|
| `fetch_video_engagement` | Every 12 hours | Fetches engagement metrics for published videos |

**Docker Configuration:**

```yaml
# Add to docker-compose.app.yml:
celery_video_worker:
  build: ./backend
  command: celery -A igny8_core worker -Q video --concurrency=1 --loglevel=info
  # Requires FFmpeg installed in the Docker image
```

**Dockerfile Addition:**

```dockerfile
# Add to backend/Dockerfile:
RUN apt-get update && apt-get install -y ffmpeg
```

---

## 4. IMPLEMENTATION STEPS

### Step 1: Create Video App

1. Create `igny8_core/modules/video/` directory with `__init__.py` and `apps.py`
2. Add `video` to `INSTALLED_APPS` in settings.py
3. Create 6 models: VideoProject, VideoScript, VideoAsset, RenderedVideo, PublishedVideo, VideoEngagement

### Step 2: Migration

1. Create migration for 6 new models
2. Run migration

### Step 3: System Dependencies

1. Add FFmpeg to Docker image (`apt-get install -y ffmpeg`)
2. Add to `requirements.txt`: `moviepy`, `pydub`, `Pillow` (already present), `pysrt`
3. Add docker-compose service for `celery_video_worker` with `-Q video --concurrency=1`

### Step 4: AI Function

1. Implement `GenerateVideoScriptFunction` in `igny8_core/ai/functions/generate_video_script.py`
2. Register `generate_video_script` in `igny8_core/ai/registry.py`

### Step 5: Services

1. Implement `TTSService` in `igny8_core/business/tts_service.py` (cloud + self-hosted providers)
2. Implement `VideoCompositionService` in `igny8_core/business/video_composition.py`
3. Implement `VideoPublisherService` in `igny8_core/business/video_publisher.py`

### Step 6: Pipeline Integration

Add Stage 9 trigger:

```python
# After Stage 8 (social posts) or Stage 7 (publish):
def post_social_or_publish(content_id):
    content = Content.objects.get(id=content_id)  # load the source content
    config = AutomationConfig.objects.get(site=content.site)
    if config.settings.get('video_enabled'):
        video_pipeline_stage9.delay(content_id)
```

### Step 7: API Endpoints

1. Create `igny8_core/urls/video.py` with project, script, voiceover, asset, render, publish, analytics endpoints
2. Create views extending `SiteSectorModelViewSet`
3. Register URL patterns under `/api/v1/video/`

### Step 8: Celery Tasks

1. Implement 8 tasks in `igny8_core/tasks/video_tasks.py`
2. Add `fetch_video_engagement` to beat schedule
3. Ensure render tasks target the `video` queue

### Step 9: Serializers & Admin

1. Create DRF serializers for all 6 models
2. Register models in Django admin

### Step 10: Credit Cost Configuration

Add to `CreditCostConfig` (billing app):

| operation_type | default_cost | description |
|----------------|--------------|-------------|
| `video_script_generation` | 5 | AI script generation from content |
| `video_tts_standard` | 10/min | Cloud TTS (OpenAI) — per minute of audio |
| `video_tts_selfhosted` | 2/min | Self-hosted TTS (Coqui/Piper via 0F) |
| `video_tts_hd` | 20/min | HD TTS (ElevenLabs) — per minute |
| `video_visual_generation` | 15-50 | AI visual asset generation (varies by count) |
| `video_thumbnail` | 3-10 | AI thumbnail creation |
| `video_composition` | 5 | FFmpeg render |
| `video_seo_metadata` | 1 | SEO metadata per platform |
| `video_short_total` | 40-80 | Total for short-form video |
| `video_long_total` | 100-250 | Total for long-form video |
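
As a sanity check on the totals row, a back-of-envelope estimator using the per-operation costs above, taking the low end where the table gives a range; the helper and its default arguments are illustrative:

```python
def estimate_short_video_credits(audio_minutes=1.0, selfhosted_tts=False,
                                 platforms=2, visuals=15, thumbnail=3):
    """Estimate total credits for a short-form video from the
    per-operation costs in the CreditCostConfig table."""
    tts_rate = 2 if selfhosted_tts else 10   # credits per minute of audio
    return (
        5                                    # video_script_generation
        + tts_rate * audio_minutes           # video_tts_* per minute
        + visuals                            # video_visual_generation (low end)
        + thumbnail                          # video_thumbnail (low end)
        + 5                                  # video_composition
        + 1 * platforms                      # video_seo_metadata per platform
    )
```

With the defaults (one minute of cloud TTS, two platforms, low-end visuals) this lands on the bottom of the `video_short_total` 40-80 range, which is a useful cross-check when the per-operation values change.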

---

## 5. ACCEPTANCE CRITERIA

### Script Generation

- [ ] AI generates structured script from content with hook, intro, body points, CTA
- [ ] Script includes chapter markers with timestamps
- [ ] Platform-specific SEO metadata generated (title, description, tags)
- [ ] Script duration estimates match project_type constraints
- [ ] User can edit script before proceeding

### Voiceover

- [ ] OpenAI TTS generates audio with voice selection
- [ ] ElevenLabs TTS works as premium option
- [ ] Self-hosted TTS (Coqui XTTS-v2) works via 0F GPU
- [ ] Word-level timestamps generated for subtitle sync
- [ ] Voice preview endpoint allows testing before full generation

### Visual Assets

- [ ] Article images from Images model used as visual assets
- [ ] Ken Burns effect applied to still images
- [ ] Text overlay frames rendered via Pillow
- [ ] Transitions applied between scenes
- [ ] User can swap assets before rendering

### Rendering

- [ ] FFmpeg/MoviePy composition produces correct resolution per preset
- [ ] Audio mix: voiceover at 100% + background music at 20%
- [ ] SRT subtitle file generated from TTS timestamps
- [ ] AI thumbnail generated with title text overlay
- [ ] Render runs on dedicated `video` Celery queue with concurrency=1
- [ ] Render progress trackable via status endpoint

### Publishing

- [ ] Video uploads to YouTube via API (reusing 02H SocialAccount)
- [ ] Video uploads to Instagram Reels
- [ ] Video uploads to TikTok
- [ ] Platform video ID and URL stored on PublishedVideo
- [ ] Engagement metrics fetched every 12 hours

### Pipeline Integration

- [ ] Stage 9 triggers automatically when video_enabled in AutomationConfig
- [ ] Full pipeline (script → voice → render → publish) runs as single Celery chain
- [ ] VideoObject schema (02G) generated for published video content

---

## 6. CLAUDE CODE INSTRUCTIONS

### File Locations

```
igny8_core/
├── modules/
│   └── video/
│       ├── __init__.py
│       ├── apps.py                      # app_label = 'video'
│       └── models.py                    # 6 models
├── ai/
│   └── functions/
│       └── generate_video_script.py     # GenerateVideoScriptFunction
├── business/
│   ├── tts_service.py                   # TTSService (cloud + self-hosted)
│   ├── video_composition.py             # VideoCompositionService
│   └── video_publisher.py               # VideoPublisherService
├── tasks/
│   └── video_tasks.py                   # Celery tasks (video queue)
├── urls/
│   └── video.py                         # Video endpoints
└── migrations/
    └── XXXX_add_video_models.py
```

### Conventions

- **PKs:** BigAutoField (integer) — do NOT use UUIDs
- **Table prefix:** `igny8_` on all new tables
- **App label:** `video` (new app)
- **Celery app name:** `igny8_core`
- **Celery queue:** `video` for all render/composition tasks (default queue for engagement fetch)
- **URL pattern:** `/api/v1/video/...`
- **Permissions:** Use `SiteSectorModelViewSet` permission pattern
- **Docker:** FFmpeg must be installed in the Docker image; dedicated `celery_video_worker` service
- **AI functions:** Extend `BaseAIFunction`; register as `generate_video_script`
- **Frontend:** `.tsx` files with Zustand stores

### Cross-References

| Doc | Relationship |
|-----|--------------|
| **02H** | Socializer provides SocialAccount model + OAuth for YouTube/Instagram/TikTok publishing |
| **0F** | Self-hosted AI infrastructure provides GPU for TTS + image generation |
| **01E** | Pipeline Stage 9 integration — hooks after Stage 8 (social) or Stage 7 (publish) |
| **02G** | VideoObject schema generated for content with published video |
| **04A** | Managed services may include video creation as premium tier |

### Key Decisions

1. **New `video` app** — Separate app because video has 6 models and complex pipeline logic distinct from social posting
2. **Dedicated Celery queue** — Video rendering is CPU/GPU intensive (5-30 min); isolated `video` queue with concurrency=1 prevents blocking other tasks
3. **VideoScript, VideoAsset as plain models.Model** — Not SiteSectorBaseModel because they're children of VideoProject, which carries the site/sector context
4. **Multiple RenderedVideo per project** — One project can target multiple platforms; each gets its own render at the correct resolution
5. **Reuse 02H OAuth** — PublishedVideo references SocialAccount from 02H; no duplicate OAuth infrastructure for video platforms
6. **Temp file cleanup** — Intermediate files (raw audio, image frames, non-final renders) cleaned up after successful publish to manage disk space

### System Requirements

- FFmpeg installed on server (add to Docker image via `apt-get install -y ffmpeg`)
- Python packages: `moviepy`, `pydub`, `Pillow`, `pysrt`
- Sufficient disk space for video temp files (cleanup after publish)
- Self-hosted GPU (from 0F) for TTS + AI image generation (optional — cloud fallback available)
- Dedicated Celery worker for `video` queue: `celery -A igny8_core worker -Q video --concurrency=1`