# Image Generation Implementation Plan

## Complete Plan for Generating Images from Prompts

**Date:** 2025-01-XX

**Scope:** Implement an image generation AI function that follows the existing AI framework patterns

---

## Table of Contents

1. [System Understanding](#1-system-understanding)
2. [Architecture Overview](#2-architecture-overview)
3. [Implementation Plan](#3-implementation-plan)
4. [Technical Details](#4-technical-details)
5. [Frontend Integration](#5-frontend-integration)
6. [Testing Strategy](#6-testing-strategy)
7. [Implementation Checklist](#7-implementation-checklist)
8. [Key Considerations](#8-key-considerations)
9. [Alternative Approaches Considered](#9-alternative-approaches-considered)
10. [Success Criteria](#10-success-criteria)
11. [Next Steps](#11-next-steps)

---

## 1. System Understanding

### 1.1 Current AI Framework Architecture

The system uses a unified AI framework with the following components:

**Core Flow:**

```
Frontend API Call
    ↓
views.py (@action endpoint)
    ↓
run_ai_task (ai/tasks.py) - Unified Celery task entrypoint
    ↓
AIEngine (ai/engine.py) - Orchestrator (6 phases: INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
    ↓
BaseAIFunction implementation
    ↓
AICore (ai/ai_core.py) - Centralized AI request handler
    ↓
AI Provider (OpenAI/Runware)
```

**Existing AI Functions:**

1. **AutoClusterFunction** (`auto_cluster.py`) - Groups keywords into clusters
2. **GenerateIdeasFunction** (`generate_ideas.py`) - Generates content ideas from clusters
3. **GenerateContentFunction** (`generate_content.py`) - Generates article content from ideas
4.
   **GenerateImagePromptsFunction** (`generate_image_prompts.py`) - Extracts image prompts from content

**Key Components:**

- **BaseAIFunction** - Abstract base class with methods: `get_name()`, `validate()`, `prepare()`, `build_prompt()`, `parse_response()`, `save_output()`
- **AIEngine** - Manages lifecycle, progress tracking, cost tracking, error handling
- **PromptRegistry** - Centralized prompt management with hierarchy (task → DB → default)
- **AICore** - Handles API calls to OpenAI/Runware for both text and image generation
- **IntegrationSettings** - Stores account-specific configurations (models, API keys, image settings)

### 1.2 Image Generation System (WordPress Plugin Reference)

**Key Learnings from WP Plugin:**

1. **Queue-Based Processing:**
   - Images are processed sequentially in a queue
   - Each image has its own progress bar (0-50% in 7s, 50-75% in 5s, 75-95% incrementally)
   - Progress modal shows all images being processed with individual status
2. **Image Types:**
   - Featured image (1 per content)
   - In-article images (configurable: 1-5 per content)
   - Desktop images (if enabled)
   - Mobile images (if enabled)
3. **Settings from IntegrationSettings:**
   - `provider`: 'openai' or 'runware'
   - `model`: Model name (e.g., 'dall-e-3', 'runware:97@1')
   - `image_type`: 'realistic', 'artistic', 'cartoon'
   - `max_in_article_images`: 1-5
   - `image_format`: 'webp', 'jpg', 'png'
   - `desktop_enabled`: boolean
   - `mobile_enabled`: boolean
4. **Prompt Templates:**
   - `image_prompt_template`: Template for formatting prompts (uses {post_title}, {image_prompt}, {image_type})
   - `negative_prompt`: Negative prompt for Runware (OpenAI does not support negative prompts)
5.
   **Progress Tracking:**
   - Real-time progress updates via Celery
   - Individual image status tracking
   - Success/failure per image

### 1.3 Current Image Generation Function

**Existing:** `GenerateImagesFunction` (`generate_images.py`)

- **Status:** Partially implemented, uses old pattern
- **Issues:**
  - Still references `Tasks` instead of `Content`
  - Doesn't follow the new unified framework pattern
  - Uses legacy `generate_images_core()` wrapper
  - Doesn't properly queue multiple images

**What We Need:**

- New function: `GenerateImagesFromPromptsFunction`
- Should work with `Images` model (which now has `content` relationship)
- Should process images in queue (one at a time)
- Should use progress modal similar to other AI functions
- Should use prompt templates and negative prompts from Thinker/Prompts

---

## 2. Architecture Overview

### 2.1 New Function: `GenerateImagesFromPromptsFunction`

**Purpose:** Generate actual images from existing image prompts stored in `Images` model

**Input:**

- `ids`: List of Image IDs (or Content IDs) to generate images for
- Images must have `prompt` field populated (from `GenerateImagePromptsFunction`)

**Output:**

- Updates `Images` records with:
  - `image_url`: Generated image URL
  - `status`: 'generated' (or 'failed' on error)

**Flow:**

1. **INIT (0-10%)**: Validate image IDs, check prompts exist
2. **PREP (10-25%)**: Load images, get settings, prepare queue
3. **AI_CALL (25-70%)**: Generate images sequentially (one per AI_CALL phase)
4. **PARSE (70-85%)**: Parse image URLs from responses
5. **SAVE (85-98%)**: Update Images records with URLs
6.
   **DONE (98-100%)**: Complete

### 2.2 Key Differences from Other Functions

**Unlike text generation functions:**

- **Multiple AI calls**: One AI call per image (not one call for all)
- **Sequential processing**: Images must be generated one at a time (rate limits)
- **Progress per image**: Need to track progress for each individual image
- **Different API**: Uses `AICore.generate_image()` instead of `AICore.run_ai_request()`

**Similarities:**

- Uses same `BaseAIFunction` pattern
- Uses same `AIEngine` orchestrator
- Uses same progress tracking system
- Uses same error handling

---

## 3. Implementation Plan

### Phase 1: Backend AI Function

#### 3.1 Create `GenerateImagesFromPromptsFunction`

**File:** `backend/igny8_core/ai/functions/generate_images_from_prompts.py`

**Class Structure:**

```python
class GenerateImagesFromPromptsFunction(BaseAIFunction):
    def get_name(self) -> str:
        return 'generate_images_from_prompts'

    def get_metadata(self) -> Dict:
        return {
            'display_name': 'Generate Images from Prompts',
            'description': 'Generate actual images from image prompts',
            'phases': {
                'INIT': 'Validating image prompts...',
                'PREP': 'Preparing image generation queue...',
                'AI_CALL': 'Generating images with AI...',
                'PARSE': 'Processing image URLs...',
                'SAVE': 'Saving image URLs...',
                'DONE': 'Images generated!'
            }
        }

    def validate(self, payload: dict, account=None) -> Dict:
        """Validate image IDs and check prompts exist"""
        # Check for 'ids' array
        # Check images exist and have prompts
        # Check images have status='pending'
        # Check account matches

    def prepare(self, payload: dict, account=None) -> Dict:
        """Load images and settings"""
        # Load Images records by IDs
        # Get IntegrationSettings for image_generation
        # Extract: provider, model, image_type, image_format, etc.
        # Get prompt templates from PromptRegistry
        # Return: {
        #     'images': [Image objects],
        #     'settings': {...},
        #     'image_prompt_template': str,
        #     'negative_prompt': str
        # }

    def build_prompt(self, data: Dict, account=None) -> Dict:
        """Format prompt using template"""
        # For each image in queue:
        # - Get content title (from image.content)
        # - Format prompt using image_prompt_template
        # - Return formatted prompt + image_type
        # Note: This is called once per image (AIEngine handles iteration)

    def parse_response(self, response: Dict, step_tracker=None) -> Dict:
        """Parse image URL from response"""
        # Response from AICore.generate_image() has:
        # - 'url': Image URL
        # - 'revised_prompt': (optional)
        # - 'cost': (optional)
        # Return: {'url': str, 'revised_prompt': str, 'cost': float}

    def save_output(self, parsed: Dict, original_data: Dict, account=None, ...) -> Dict:
        """Update Images record with URL"""
        # Get image from original_data
        # Update Images record:
        # - image_url = parsed['url']
        # - status = 'generated'
        # - updated_at = now()
        # Return: {'count': 1, 'images_generated': 1}
```

**Key Implementation Details:**

1. **Multiple AI Calls Handling:**
   - `AIEngine` will call `build_prompt()` → `AI_CALL` → `parse_response()` → `SAVE` for each image
   - Need to track which image is being processed
   - Use `step_tracker` to log progress per image
2. **Prompt Formatting:**
   ```python
   # Get template from PromptRegistry
   template = PromptRegistry.get_image_prompt_template(account)

   # Format with content title and prompt
   formatted = template.format(
       post_title=image.content.title or image.content.meta_title,
       image_prompt=image.prompt,
       image_type=settings['image_type']
   )
   ```
3.
   **Image Generation:**
   ```python
   # Use AICore.generate_image()
   result = ai_core.generate_image(
       prompt=formatted_prompt,
       provider=settings['provider'],
       model=settings['model'],
       size='1024x1024',  # Default or from settings
       # Negative prompts are only passed to Runware
       negative_prompt=negative_prompt if settings['provider'] == 'runware' else None,
       function_name='generate_images_from_prompts'
   )
   ```
4. **Progress Tracking:**
   - Track total images: `len(images)`
   - Track completed: Increment after each SAVE
   - Update progress: `(completed / total) * 100`

#### 3.2 Update AIEngine for Multiple AI Calls

**File:** `backend/igny8_core/ai/engine.py`

**Changes Needed:**

- Detect if function needs multiple AI calls (check function name or metadata)
- For `generate_images_from_prompts`:
  - Loop through images in PREP data
  - For each image:
    - Call `build_prompt()` with single image
    - Call `AI_CALL` phase (generate image)
    - Call `parse_response()`
    - Call `SAVE` phase
    - Update progress: `(current_image / total_images) * 100`
  - After all images: Call DONE phase

**Alternative Approach (Simpler):**

- Process all images in `save_output()` method
- Make AI calls directly in `save_output()` (not through AIEngine phases)
- Update progress manually via `progress_tracker.update()`
- This is simpler but less consistent with the framework

**Recommended Approach:**

- Use AIEngine's phase system
- Add metadata flag: `requires_multiple_ai_calls: True`
- AIEngine detects this flag and loops through items

#### 3.3 Register Function

**File:** `backend/igny8_core/ai/registry.py`

```python
def _load_generate_images_from_prompts():
    from igny8_core.ai.functions.generate_images_from_prompts import GenerateImagesFromPromptsFunction
    return GenerateImagesFromPromptsFunction

register_lazy_function('generate_images_from_prompts', _load_generate_images_from_prompts)
```

**File:** `backend/igny8_core/ai/functions/__init__.py`

```python
from .generate_images_from_prompts import GenerateImagesFromPromptsFunction

__all__ = [
    # ... existing exports ...
    'GenerateImagesFromPromptsFunction',
]
```

#### 3.4 Add Model Configuration

**File:** `backend/igny8_core/ai/settings.py`

```python
MODEL_CONFIG = {
    # ... existing entries ...
    'generate_images_from_prompts': {
        'model': 'dall-e-3',      # Default, overridden by IntegrationSettings
        'max_tokens': None,       # Not used for images
        'temperature': None,      # Not used for images
        'response_format': None,  # Not used for images
    },
}

FUNCTION_TO_PROMPT_TYPE = {
    # ... existing entries ...
    'generate_images_from_prompts': None,  # Uses image_prompt_template, not text prompt
}
```

#### 3.5 Update Progress Messages

**File:** `backend/igny8_core/ai/engine.py`

```python
def _get_prep_message(self, function_name: str, count: int, data: Any) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        total_images = len(data.get('images', []))
        return f"Preparing to generate {total_images} image{'s' if total_images != 1 else ''}"

def _get_ai_call_message(self, function_name: str, count: int, total: int = 0) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        # `total` is an optional argument added for this function;
        # fall back to a shorter message when it is not passed
        if total:
            return f"Generating image {count} of {total} with AI"
        return f"Generating image {count} with AI"

def _get_parse_message_with_count(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"{count} image{'s' if count != 1 else ''} generated"

def _get_save_message(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"Saving {count} image{'s' if count != 1 else ''}"
```

### Phase 2: API Endpoint

#### 3.6 Add API Endpoint

**File:** `backend/igny8_core/modules/writer/views.py`

**Add to `ImagesViewSet`:**

```python
@action(detail=False, methods=['post'], url_path='generate_images', url_name='generate_images')
def generate_images(self, request):
    """Generate images from prompts for image records"""
    from igny8_core.ai.tasks import run_ai_task

    account = getattr(request, 'account', None)
    ids = request.data.get('ids', [])

    if not ids:
        return Response({
            'error': 'No IDs provided',
            'type': 'ValidationError'
        }, status=status.HTTP_400_BAD_REQUEST)

    account_id = account.id if account else None

    # Queue Celery task
    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            return Response({
                'success': True,
                'task_id': str(task.id),
                'message': 'Image generation started'
            }, status=status.HTTP_200_OK)
        else:
            # Fallback to synchronous execution
            result = run_ai_task(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            if result.get('success'):
                return Response({
                    'success': True,
                    'images_generated': result.get('count', 0),
                    'message': 'Images generated successfully'
                }, status=status.HTTP_200_OK)
            else:
                return Response({
                    'error': result.get('error', 'Image generation failed'),
                    'type': 'TaskExecutionError'
                }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
    except Exception as e:
        return Response({
            'error': str(e),
            'type': 'ExecutionError'
        }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
```

### Phase 3: Frontend Integration

#### 3.7 Add API Function

**File:** `frontend/src/services/api.ts`

```typescript
export async function generateImages(imageIds: number[]): Promise<any> {
  return fetchAPI('/v1/writer/images/generate_images/', {
    method: 'POST',
    body: JSON.stringify({ ids: imageIds }),
  });
}
```

#### 3.8 Add Generate Images Button

**File:** `frontend/src/config/pages/images.config.tsx`

**Add to row actions or status column:**

- Add "Generate Images" button in status column
- Only show if status is 'pending' and prompt exists
- Button should trigger generation for all images for that content

**File:** `frontend/src/pages/Writer/Images.tsx`

**Add handler:**

```typescript
const handleGenerateImages = useCallback(async (contentId: number) => {
  try {
    // Get all pending images for this content
    const contentImages = images.find(g => g.content_id === contentId);
    if (!contentImages) return;

    // Collect all pending image IDs that have a prompt
    const imageIds: number[] = [];
    if (
      contentImages.featured_image?.id &&
      contentImages.featured_image.status === 'pending' &&
      contentImages.featured_image.prompt
    ) {
      imageIds.push(contentImages.featured_image.id);
    }
    contentImages.in_article_images.forEach(img => {
      if (img.id && img.status === 'pending' && img.prompt) {
        imageIds.push(img.id);
      }
    });

    if (imageIds.length === 0) {
      toast.info('No pending images with prompts found');
      return;
    }

    const result = await generateImages(imageIds);
    if (result.success) {
      if (result.task_id) {
        // Open progress modal
        progressModal.openModal(
          result.task_id,
          'Generate Images',
          'ai-generate-images-from-prompts-01-desktop'
        );
      } else {
        toast.success(`Images generated: ${result.images_generated || 0} image${(result.images_generated || 0) === 1 ? '' : 's'} created`);
        loadImages();
      }
    } else {
      toast.error(result.error || 'Failed to generate images');
    }
  } catch (error: any) {
    toast.error(`Failed to generate images: ${error.message}`);
  }
}, [toast, progressModal, loadImages, images]);
```

#### 3.9 Update Progress Modal

**File:** `frontend/src/components/common/ProgressModal.tsx`

**Add support for image generation:**

- Update step labels for `generate_images_from_prompts`
- Show progress per image
- Display generated images in modal (optional, like WP plugin)

**Step Labels:**

```typescript
if (funcName.includes('generate_images_from_prompts')) {
  return [
    { phase: 'INIT', label: 'Validating image prompts' },
    { phase: 'PREP', label: 'Preparing image generation queue' },
    { phase: 'AI_CALL', label: 'Generating images with AI' },
    { phase: 'PARSE', label: 'Processing image URLs' },
    { phase: 'SAVE', label: 'Saving image URLs' },
  ];
}
```

**Success Message:**

```typescript
if (funcName.includes('generate_images_from_prompts')) {
  const imageCount = extractCount(/(\d+)\s+image/i, stepLogs || []);
  if (imageCount) {
    return `${imageCount} image${imageCount !== '1' ? 's' : ''} generated successfully`;
  }
  return 'Images generated successfully';
}
```

---

## 4. Technical Details

### 4.1 Image Generation API

**AICore.generate_image()** already exists and handles:

- OpenAI DALL-E (dall-e-2, dall-e-3)
- Runware API
- Negative prompts (Runware only)
- Cost tracking
- Error handling

**Usage:**

```python
result = ai_core.generate_image(
    prompt=formatted_prompt,
    provider='openai',               # or 'runware'
    model='dall-e-3',                # or 'runware:97@1'
    size='1024x1024',
    negative_prompt=negative_prompt, # Only for Runware
    function_name='generate_images_from_prompts'
)
```

**Response:**

```python
{
    'url': 'https://...',     # Image URL
    'revised_prompt': '...',  # OpenAI may revise prompt
    'cost': 0.04,             # Cost in USD
    'error': None             # Error message if failed
}
```

### 4.2 Settings Retrieval

**From IntegrationSettings:**

```python
# .get() raises IntegrationSettings.DoesNotExist when no active config exists;
# treat that as the "No settings" validation error described in 4.4
integration = IntegrationSettings.objects.get(
    account=account,
    integration_type='image_generation',
    is_active=True
)
config = integration.config

provider = config.get('provider') or config.get('service', 'openai')
if provider == 'runware':
    model = config.get('model') or config.get('runwareModel', 'runware:97@1')
else:
    model = config.get('model', 'dall-e-3')

image_type = config.get('image_type', 'realistic')
image_format = config.get('image_format', 'webp')
```

### 4.3 Prompt Templates

**From PromptRegistry:**

```python
image_prompt_template = PromptRegistry.get_image_prompt_template(account)
negative_prompt = PromptRegistry.get_negative_prompt(account)
```

**Formatting:**

```python
formatted = image_prompt_template.format(
    post_title=content.title or content.meta_title,
    image_prompt=image.prompt,
    image_type=image_type  # 'realistic', 'artistic', 'cartoon'
)
```

### 4.4 Error Handling

**Per-Image Errors:**

- If one image fails, continue with others
- Mark failed image: `status='failed'`
- Log error in `Images` record or separate error field
- Return success with partial count: `{'success': True, 'images_generated': 3, 'images_failed': 1}`

**Validation Errors:**

- No prompts: Skip image, log warning
- No settings: Return error, don't
  start generation
- Invalid provider/model: Return error

---

## 5. Frontend Integration

### 5.1 Images Page Updates

**File:** `frontend/src/pages/Writer/Images.tsx`

**Changes:**

1. Add "Generate Images" button in status column (or row actions)
2. Button only enabled if:
   - Status is 'pending'
   - Prompt exists
   - Content has at least one pending image
3. On click: Collect all pending image IDs for that content
4. Call API: `generateImages(imageIds)`
5. Open progress modal if async
6. Reload images on completion

### 5.2 Progress Modal Updates

**File:** `frontend/src/components/common/ProgressModal.tsx`

**Changes:**

1. Add step definitions for `generate_images_from_prompts`
2. Update progress messages
3. Show image count in messages
4. Optional: Display generated images in modal (like WP plugin)

### 5.3 Table Actions Config

**File:** `frontend/src/config/pages/table-actions.config.tsx`

**Add row action (optional):**

```typescript
'/writer/images': {
  rowActions: [
    {
      key: 'generate_images',
      label: 'Generate Images',
      icon: ,
      variant: 'primary',
    },
  ],
}
```

---

## 6. Testing Strategy

### 6.1 Unit Tests

**Test Function Methods:**

- `validate()`: Test with valid/invalid IDs, missing prompts, wrong status
- `prepare()`: Test settings retrieval, prompt template loading
- `build_prompt()`: Test prompt formatting
- `parse_response()`: Test URL extraction
- `save_output()`: Test Images record update

### 6.2 Integration Tests

**Test Full Flow:**

1. Create Images records with prompts
2. Call API endpoint
3. Verify Celery task created
4. Verify progress updates
5. Verify Images records updated with URLs
6. Verify status changed to 'generated'

### 6.3 Error Scenarios

**Test:**

- Missing IntegrationSettings
- Invalid provider/model
- API errors (rate limits, invalid API key)
- Partial failures (some images succeed, some fail)
- Missing prompts
- Invalid image IDs

---

## 7. Implementation Checklist

### Backend

- [ ] Create `GenerateImagesFromPromptsFunction` class
- [ ] Implement `validate()` method
- [ ] Implement `prepare()` method
- [ ] Implement `build_prompt()` method
- [ ] Implement `parse_response()` method
- [ ] Implement `save_output()` method
- [ ] Register function in `registry.py`
- [ ] Add to `__init__.py` exports
- [ ] Add model config in `settings.py`
- [ ] Update `AIEngine` progress messages
- [ ] Add API endpoint in `ImagesViewSet`
- [ ] Test with OpenAI provider
- [ ] Test with Runware provider
- [ ] Test error handling

### Frontend

- [ ] Add `generateImages()` API function
- [ ] Add "Generate Images" button to Images page
- [ ] Add click handler
- [ ] Integrate progress modal
- [ ] Update progress modal step labels
- [ ] Update success messages
- [ ] Test UI flow
- [ ] Test error handling

### Documentation

- [ ] Update AI_MASTER_ARCHITECTURE.md
- [ ] Add function to AI_FUNCTIONS_AUDIT_REPORT.md
- [ ] Document API endpoint
- [ ] Document settings requirements

---

## 8. Key Considerations

### 8.1 Rate Limiting

- **Issue:** Image generation APIs have rate limits
- **Solution:** Process images sequentially (one at a time)
- **Implementation:** AIEngine loops through images and waits for each to complete before starting the next

### 8.2 Cost Tracking

- **Issue:** Need to track costs per image
- **Solution:** AICore already tracks costs; store them in AITaskLog
- **Implementation:** Cost is returned from `generate_image()`; log it in `step_tracker`

### 8.3 Progress Updates

- **Issue:** Need granular progress (per image)
- **Solution:** Update progress after each image: `(completed / total) * 100`
- **Implementation:** Track in `save_output()`, update via `progress_tracker.update()`

### 8.4 Error Recovery

- **Issue:** If one image fails, processing should continue with the others
- **Solution:** Catch errors per image, mark as failed, continue
- **Implementation:** Per-image `try`/`except` around the generation and `save_output()` step

### 8.5 Image Display

- **Issue:** Should generated images be shown in the progress modal?
- **Solution:** Optional enhancement, can add later
- **Implementation:** Store image URLs in step logs, display in modal

---

## 9. Alternative Approaches Considered

### 9.1 Process All in save_output()

**Pros:**

- Simpler implementation
- Direct control over loop

**Cons:**

- Doesn't use AIEngine phases properly
- Harder to track progress per image
- Less consistent with framework

**Decision:** Use AIEngine phases with loop detection

### 9.2 Separate Function Per Image

**Pros:**

- Each image is independent task
- Better error isolation

**Cons:**

- Too many Celery tasks
- Harder to track overall progress
- More complex frontend

**Decision:** Single function processes all images sequentially

---

## 10. Success Criteria

- ✅ Function follows `BaseAIFunction` pattern
- ✅ Uses `AIEngine` orchestrator
- ✅ Integrates with progress modal
- ✅ Uses prompt templates from Thinker/Prompts
- ✅ Uses settings from IntegrationSettings
- ✅ Handles errors gracefully
- ✅ Tracks progress per image
- ✅ Updates Images records correctly
- ✅ Works with both OpenAI and Runware
- ✅ Frontend button triggers generation
- ✅ Progress modal shows correct steps
- ✅ Success message shows image count

---

## 11. Next Steps

1. **Start with Backend Function**
   - Create `GenerateImagesFromPromptsFunction`
   - Implement all methods
   - Test with single image
2. **Add API Endpoint**
   - Add to `ImagesViewSet`
   - Test endpoint
3. **Frontend Integration**
   - Add button
   - Add handler
   - Test flow
4. **Progress Modal**
   - Update step labels
   - Test progress updates
5. **Error Handling**
   - Test error scenarios
   - Verify graceful failures
6. **Documentation**
   - Update architecture docs
   - Add API docs

---

**End of Plan**
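
**Supplementary sketch (sections 4.4 and 8.1 to 8.4):** The sequential, failure-tolerant processing described in those sections can be illustrated with a plain loop. This is a minimal sketch under stated assumptions, not the AIEngine implementation: `ImageRecord`, `format_prompt`, `generate_sequentially`, and the `generate_fn` callback are hypothetical names standing in for the `Images` model, the PromptRegistry template, the engine loop, and `AICore.generate_image()`. The progress value counts processed images (including failed ones), which is one possible reading of `(completed / total) * 100`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ImageRecord:
    """Hypothetical stand-in for one row of the Images model."""
    id: int
    prompt: str
    post_title: str
    status: str = 'pending'
    image_url: Optional[str] = None
    error: Optional[str] = None


def format_prompt(template: str, image: ImageRecord, image_type: str) -> str:
    """Fill the {post_title}, {image_prompt}, {image_type} placeholders."""
    return template.format(
        post_title=image.post_title,
        image_prompt=image.prompt,
        image_type=image_type,
    )


def generate_sequentially(
    images: List[ImageRecord],
    generate_fn: Callable[[str], Dict],
    template: str,
    image_type: str = 'realistic',
    on_progress: Optional[Callable[[int, int, int], None]] = None,
) -> Dict:
    """Generate one image at a time; a failure marks that image and moves on."""
    total = len(images)
    generated = 0
    failed = 0
    for index, image in enumerate(images, start=1):
        try:
            # generate_fn is assumed to return a dict shaped like section 4.1
            result = generate_fn(format_prompt(template, image, image_type))
            if result.get('error') or not result.get('url'):
                raise RuntimeError(result.get('error') or 'No image URL returned')
            image.image_url = result['url']
            image.status = 'generated'
            generated += 1
        except Exception as exc:  # isolate the failure to this image
            image.status = 'failed'
            image.error = str(exc)
            failed += 1
        if on_progress:
            # index of the image just processed, total, integer percentage
            on_progress(index, total, int(index / total * 100))
    return {
        'success': True,
        'images_generated': generated,
        'images_failed': failed,
    }
```

In a real integration, `generate_fn` would wrap the provider call and `on_progress` would forward to the progress tracker; passing a fake `generate_fn` that returns an `error` for some prompts is a cheap way to exercise the partial-failure scenario listed in section 6.3.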