igny8/docs/IMAGE_GENERATION_IMPLEMENTATION_PLAN.md

Image Generation Implementation Plan

Complete Plan for Generating Images from Prompts

Date: 2025-01-XX
Scope: Implement image generation AI function following existing AI framework patterns


Table of Contents

  1. System Understanding
  2. Architecture Overview
  3. Implementation Plan
  4. Technical Details
  5. Frontend Integration
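  6. Testing Strategy
  7. Implementation Checklist
  8. Key Considerations
  9. Alternative Approaches Considered
  10. Success Criteria
  11. Next Steps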

1. System Understanding

1.1 Current AI Framework Architecture

The system uses a unified AI framework with the following components:

Core Flow:

Frontend API Call
  ↓
views.py (@action endpoint)
  ↓
run_ai_task (ai/tasks.py) - Unified Celery task entrypoint
  ↓
AIEngine (ai/engine.py) - Orchestrator (6 phases: INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
  ↓
BaseAIFunction implementation
  ↓
AICore (ai/ai_core.py) - Centralized AI request handler
  ↓
AI Provider (OpenAI/Runware)

Existing AI Functions:

  1. AutoClusterFunction (auto_cluster.py) - Groups keywords into clusters
  2. GenerateIdeasFunction (generate_ideas.py) - Generates content ideas from clusters
  3. GenerateContentFunction (generate_content.py) - Generates article content from ideas
  4. GenerateImagePromptsFunction (generate_image_prompts.py) - Extracts image prompts from content

Key Components:

  • BaseAIFunction - Abstract base class with methods: get_name(), validate(), prepare(), build_prompt(), parse_response(), save_output()
  • AIEngine - Manages lifecycle, progress tracking, cost tracking, error handling
  • PromptRegistry - Centralized prompt management with hierarchy (task → DB → default)
  • AICore - Handles API calls to OpenAI/Runware for both text and image generation
  • IntegrationSettings - Stores account-specific configurations (models, API keys, image settings)

1.2 Image Generation System (WordPress Plugin Reference)

Key Learnings from WP Plugin:

  1. Queue-Based Processing:

    • Images are processed sequentially in a queue
    • Each image has its own progress bar (0-50% in 7s, 50-75% in 5s, 75-95% incrementally)
    • Progress modal shows all images being processed with individual status
  2. Image Types:

    • Featured image (1 per content)
    • In-article images (configurable: 1-5 per content)
    • Desktop images (if enabled)
    • Mobile images (if enabled)
  3. Settings from IntegrationSettings:

    • provider: 'openai' or 'runware'
    • model: Model name (e.g., 'dall-e-3', 'runware:97@1')
    • image_type: 'realistic', 'artistic', 'cartoon'
    • max_in_article_images: 1-5
    • image_format: 'webp', 'jpg', 'png'
    • desktop_enabled: boolean
    • mobile_enabled: boolean
  4. Prompt Templates:

    • image_prompt_template: Template for formatting prompts (uses {post_title}, {image_prompt}, {image_type})
    • negative_prompt: Negative prompt for Runware (OpenAI does not support negative prompts)
  5. Progress Tracking:

    • Real-time progress updates via Celery
    • Individual image status tracking
    • Success/failure per image
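
The per-image progress bar timing from item 1 can be read as a piecewise function of elapsed time. The sketch below is only an illustration, not the plugin's code: the name `simulated_progress` is hypothetical, and the rate of 1% per second for the 75-95% stretch is an assumption, since the plan only says that stretch advances incrementally.

```python
def simulated_progress(elapsed_seconds: float) -> float:
    """Return a simulated progress percentage for a single image."""
    t = max(0.0, elapsed_seconds)
    if t <= 7.0:
        # 0-50% spread linearly over the first 7 seconds
        return 50.0 * t / 7.0
    if t <= 12.0:
        # 50-75% spread linearly over the next 5 seconds
        return 50.0 + 25.0 * (t - 7.0) / 5.0
    # 75-95%: assumed 1% per second, held at 95 until the real result arrives
    return min(95.0, 75.0 + (t - 12.0))
```

The bar would jump to 100% only when the image actually finishes, which is why the simulated value stops at 95.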

1.3 Current Image Generation Function

Existing: GenerateImagesFunction (generate_images.py)

  • Status: Partially implemented, uses old pattern
  • Issues:
    • Still references Tasks instead of Content
    • Doesn't follow the new unified framework pattern
    • Uses legacy generate_images_core() wrapper
    • Doesn't properly queue multiple images

What We Need:

  • New function: GenerateImagesFromPromptsFunction
  • Should work with Images model (which now has content relationship)
  • Should process images in queue (one at a time)
  • Should use progress modal similar to other AI functions
  • Should use prompt templates and negative prompts from Thinker/Prompts

2. Architecture Overview

2.1 New Function: GenerateImagesFromPromptsFunction

Purpose: Generate actual images from existing image prompts stored in Images model

Input:

  • ids: List of Image IDs (or Content IDs) to generate images for
  • Images must have prompt field populated (from GenerateImagePromptsFunction)

Output:

  • Updates Images records with:
    • image_url: Generated image URL
    • status: 'generated' (or 'failed' on error)

Flow:

  1. INIT (0-10%): Validate image IDs, check prompts exist
  2. PREP (10-25%): Load images, get settings, prepare queue
  3. AI_CALL (25-70%): Generate images sequentially (one per AI_CALL phase)
  4. PARSE (70-85%): Parse image URLs from responses
  5. SAVE (85-98%): Update Images records with URLs
  6. DONE (98-100%): Complete
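
The phase ranges above can be kept in one lookup table so that a per-phase completion fraction (for example, images completed divided by total images) maps onto the overall percentage. This is a minimal sketch; `PHASE_RANGES` and `overall_progress` are hypothetical names, not part of the existing AIEngine.

```python
# Percentage window for each phase, taken from the flow listed above
PHASE_RANGES = {
    "INIT": (0, 10),
    "PREP": (10, 25),
    "AI_CALL": (25, 70),
    "PARSE": (70, 85),
    "SAVE": (85, 98),
    "DONE": (98, 100),
}


def overall_progress(phase: str, fraction: float) -> float:
    """Map a 0..1 completion fraction within a phase to an overall percentage."""
    start, end = PHASE_RANGES[phase]
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range input
    return start + (end - start) * fraction
```

For example, `overall_progress("AI_CALL", 2 / 4)` places a job that has generated two of four images at 47.5%.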

2.2 Key Differences from Other Functions

Unlike text generation functions:

  • Multiple AI calls: One AI call per image (not one call for all)
  • Sequential processing: Images must be generated one at a time (rate limits)
  • Progress per image: Need to track progress for each individual image
  • Different API: Uses AICore.generate_image() instead of AICore.run_ai_request()

Similarities:

  • Uses same BaseAIFunction pattern
  • Uses same AIEngine orchestrator
  • Uses same progress tracking system
  • Uses same error handling

3. Implementation Plan

Phase 1: Backend AI Function

3.1 Create GenerateImagesFromPromptsFunction

File: backend/igny8_core/ai/functions/generate_images_from_prompts.py

Class Structure:

class GenerateImagesFromPromptsFunction(BaseAIFunction):
    def get_name(self) -> str:
        return 'generate_images_from_prompts'
    
    def get_metadata(self) -> Dict:
        return {
            'display_name': 'Generate Images from Prompts',
            'description': 'Generate actual images from image prompts',
            'phases': {
                'INIT': 'Validating image prompts...',
                'PREP': 'Preparing image generation queue...',
                'AI_CALL': 'Generating images with AI...',
                'PARSE': 'Processing image URLs...',
                'SAVE': 'Saving image URLs...',
                'DONE': 'Images generated!'
            }
        }
    
    def validate(self, payload: dict, account=None) -> Dict:
        """Validate image IDs and check prompts exist"""
        # Check for 'ids' array
        # Check images exist and have prompts
        # Check images have status='pending'
        # Check account matches
    
    def prepare(self, payload: dict, account=None) -> Dict:
        """Load images and settings"""
        # Load Images records by IDs
        # Get IntegrationSettings for image_generation
        # Extract: provider, model, image_type, image_format, etc.
        # Get prompt templates from PromptRegistry
        # Return: {
        #   'images': [Image objects],
        #   'settings': {...},
        #   'image_prompt_template': str,
        #   'negative_prompt': str
        # }
    
    def build_prompt(self, data: Dict, account=None) -> Dict:
        """Format prompt using template"""
        # For each image in queue:
        # - Get content title (from image.content)
        # - Format prompt using image_prompt_template
        # - Return formatted prompt + image_type
        # Note: This is called once per image (AIEngine handles iteration)
    
    def parse_response(self, response: Dict, step_tracker=None) -> Dict:
        """Parse image URL from response"""
        # Response from AICore.generate_image() has:
        # - 'url': Image URL
        # - 'revised_prompt': (optional)
        # - 'cost': (optional)
        # Return: {'url': str, 'revised_prompt': str, 'cost': float}
    
    def save_output(self, parsed: Dict, original_data: Dict, account=None, ...) -> Dict:
        """Update Images record with URL"""
        # Get image from original_data
        # Update Images record:
        # - image_url = parsed['url']
        # - status = 'generated'
        # - updated_at = now()
        # Return: {'count': 1, 'images_generated': 1}

Key Implementation Details:

  1. Multiple AI Calls Handling:

    • AIEngine will call build_prompt() → AI_CALL → parse_response() → SAVE for each image
    • Need to track which image is being processed
    • Use step_tracker to log progress per image
  2. Prompt Formatting:

    # Get template from PromptRegistry
    template = PromptRegistry.get_image_prompt_template(account)
    
    # Format with content title and prompt
    formatted = template.format(
        post_title=image.content.title or image.content.meta_title,
        image_prompt=image.prompt,
        image_type=settings['image_type']
    )
    
  3. Image Generation:

    # Use AICore.generate_image()
    result = ai_core.generate_image(
        prompt=formatted_prompt,
        provider=settings['provider'],
        model=settings['model'],
        size='1024x1024',  # Default or from settings
        negative_prompt=negative_prompt if settings['provider'] == 'runware' else None,
        function_name='generate_images_from_prompts'
    )
    
  4. Progress Tracking:

    • Track total images: len(images)
    • Track completed: Increment after each SAVE
    • Update progress: (completed / total) * 100

3.2 Update AIEngine for Multiple AI Calls

File: backend/igny8_core/ai/engine.py

Changes Needed:

  • Detect if function needs multiple AI calls (check function name or metadata)
  • For generate_images_from_prompts:
    • Loop through images in PREP data
    • For each image:
      • Call build_prompt() with single image
      • Call AI_CALL phase (generate image)
      • Call parse_response()
      • Call SAVE phase
      • Update progress: (current_image / total_images) * 100
    • After all images: Call DONE phase

Alternative Approach (Simpler):

  • Process all images in save_output() method
  • Make AI calls directly in save_output() (not through AIEngine phases)
  • Update progress manually via progress_tracker.update()
  • This is simpler but less consistent with framework

Recommended Approach:

  • Use AIEngine's phase system
  • Add metadata flag: requires_multiple_ai_calls: True
  • AIEngine detects this and loops through items
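
The recommended approach could look roughly like the loop below. It is a sketch under stated assumptions: `FakeImageFunction` stands in for the real BaseAIFunction subclass, `run_per_item` is a hypothetical helper rather than an existing AIEngine method, images are plain dicts instead of model instances, and the method signatures are simplified.

```python
from typing import Any, Callable, Dict


class FakeImageFunction:
    """Stand-in for a BaseAIFunction subclass (not the real class)."""

    def get_metadata(self) -> Dict[str, Any]:
        # The flag the engine would check before looping
        return {"requires_multiple_ai_calls": True}

    def build_prompt(self, data: Dict[str, Any]) -> str:
        return f"prompt for image {data['image']['id']}"

    def parse_response(self, response: Dict[str, Any]) -> Dict[str, Any]:
        return {"url": response["url"]}

    def save_output(self, parsed: Dict[str, Any], original_data: Dict[str, Any]) -> Dict[str, int]:
        original_data["image"]["image_url"] = parsed["url"]
        original_data["image"]["status"] = "generated"
        return {"count": 1}


def run_per_item(
    fn: FakeImageFunction,
    prepared: Dict[str, Any],
    ai_call: Callable[[str], Dict[str, Any]],
    on_progress: Callable[[int, int], None],
) -> int:
    """Run build_prompt -> AI call -> parse_response -> save_output once per image."""
    if not fn.get_metadata().get("requires_multiple_ai_calls"):
        raise ValueError("function does not request per-item processing")
    images = prepared["images"]
    saved = 0
    for index, image in enumerate(images, start=1):
        item_data = {**prepared, "image": image}  # PREP data plus the current image
        prompt = fn.build_prompt(item_data)
        response = ai_call(prompt)
        parsed = fn.parse_response(response)
        saved += fn.save_output(parsed, item_data)["count"]
        on_progress(index, len(images))  # reported after each SAVE
    return saved
```

Keeping the loop in the engine means the function class stays unaware of iteration, which matches the note in build_prompt() that AIEngine handles it.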

3.3 Register Function

File: backend/igny8_core/ai/registry.py

def _load_generate_images_from_prompts():
    from igny8_core.ai.functions.generate_images_from_prompts import GenerateImagesFromPromptsFunction
    return GenerateImagesFromPromptsFunction

register_lazy_function('generate_images_from_prompts', _load_generate_images_from_prompts)

File: backend/igny8_core/ai/functions/__init__.py

from .generate_images_from_prompts import GenerateImagesFromPromptsFunction

__all__ = [
    ...
    'GenerateImagesFromPromptsFunction',
]

3.4 Add Model Configuration

File: backend/igny8_core/ai/settings.py

MODEL_CONFIG = {
    ...
    'generate_images_from_prompts': {
        'model': 'dall-e-3',  # Default, overridden by IntegrationSettings
        'max_tokens': None,  # Not used for images
        'temperature': None,  # Not used for images
        'response_format': None,  # Not used for images
    },
}

FUNCTION_TO_PROMPT_TYPE = {
    ...
    'generate_images_from_prompts': None,  # Uses image_prompt_template, not text prompt
}

3.5 Update Progress Messages

File: backend/igny8_core/ai/engine.py

def _get_prep_message(self, function_name: str, count: int, data: Any) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        total_images = len(data.get('images', []))
        return f"Preparing to generate {total_images} image{'s' if total_images != 1 else ''}"

def _get_ai_call_message(self, function_name: str, count: int, total: int = 0) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        # total is the number of queued images; the caller must pass it in
        return f"Generating image {count} of {total} with AI"

def _get_parse_message_with_count(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"{count} image{'s' if count != 1 else ''} generated"

def _get_save_message(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"Saving {count} image{'s' if count != 1 else ''}"

Phase 2: API Endpoint

3.6 Add API Endpoint

File: backend/igny8_core/modules/writer/views.py

Add to ImagesViewSet:

@action(detail=False, methods=['post'], url_path='generate_images', url_name='generate_images')
def generate_images(self, request):
    """Generate images from prompts for image records"""
    from igny8_core.ai.tasks import run_ai_task
    
    account = getattr(request, 'account', None)
    ids = request.data.get('ids', [])
    
    if not ids:
        return Response({
            'error': 'No IDs provided',
            'type': 'ValidationError'
        }, status=status.HTTP_400_BAD_REQUEST)
    
    account_id = account.id if account else None
    
    # Queue Celery task
    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            return Response({
                'success': True,
                'task_id': str(task.id),
                'message': 'Image generation started'
            }, status=status.HTTP_200_OK)
        else:
            # Fallback to synchronous execution
            result = run_ai_task(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            if result.get('success'):
                return Response({
                    'success': True,
                    'images_generated': result.get('count', 0),
                    'message': 'Images generated successfully'
                }, status=status.HTTP_200_OK)
            else:
                return Response({
                    'error': result.get('error', 'Image generation failed'),
                    'type': 'TaskExecutionError'
                }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
    except Exception as e:
        return Response({
            'error': str(e),
            'type': 'ExecutionError'
        }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
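
The endpoint above only checks that `ids` is non-empty. A stricter check could normalise the payload before the task is queued. The helper below is a hypothetical sketch (`clean_ids` does not exist in the codebase) and assumes IDs are positive integers.

```python
from typing import Any, List


def clean_ids(raw: Any) -> List[int]:
    """Return unique positive integer IDs in their original order, or raise ValueError."""
    if not isinstance(raw, list) or not raw:
        raise ValueError("'ids' must be a non-empty list")
    cleaned: List[int] = []
    for value in raw:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise ValueError(f"Invalid ID: {value!r}")
        try:
            number = int(value)
        except ValueError:
            raise ValueError(f"Invalid ID: {value!r}") from None
        if number <= 0:
            raise ValueError(f"Invalid ID: {value!r}")
        if number not in cleaned:  # drop duplicates so an image is not queued twice
            cleaned.append(number)
    return cleaned
```

In the view, a ValueError from this helper would map to the existing 400 ValidationError response.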

Phase 3: Frontend Integration

3.7 Add API Function

File: frontend/src/services/api.ts

export async function generateImages(imageIds: number[]): Promise<any> {
  return fetchAPI('/v1/writer/images/generate_images/', {
    method: 'POST',
    body: JSON.stringify({ ids: imageIds }),
  });
}

3.8 Add Generate Images Button

File: frontend/src/config/pages/images.config.tsx

Add to row actions or status column:

  • Add "Generate Images" button in status column
  • Only show if status is 'pending' and prompt exists
  • Button should trigger generation for all images for that content

File: frontend/src/pages/Writer/Images.tsx

Add handler:

const handleGenerateImages = useCallback(async (contentId: number) => {
  try {
    // Get all pending images for this content
    const contentImages = images.find(g => g.content_id === contentId);
    if (!contentImages) return;
    
    // Collect all image IDs with prompts
    const imageIds: number[] = [];
    if (contentImages.featured_image?.id && contentImages.featured_image.status === 'pending' && contentImages.featured_image.prompt) {
      imageIds.push(contentImages.featured_image.id);
    }
    contentImages.in_article_images.forEach(img => {
      if (img.id && img.status === 'pending' && img.prompt) {
        imageIds.push(img.id);
      }
    });
    
    if (imageIds.length === 0) {
      toast.info('No pending images with prompts found');
      return;
    }
    
    const result = await generateImages(imageIds);
    if (result.success) {
      if (result.task_id) {
        // Open progress modal
        progressModal.openModal(
          result.task_id,
          'Generate Images',
          'ai-generate-images-from-prompts-01-desktop'
        );
      } else {
        toast.success(`Images generated: ${result.images_generated || 0} image${(result.images_generated || 0) === 1 ? '' : 's'} created`);
        loadImages();
      }
    } else {
      toast.error(result.error || 'Failed to generate images');
    }
  } catch (error: any) {
    toast.error(`Failed to generate images: ${error.message}`);
  }
}, [toast, progressModal, loadImages, images]);

3.9 Update Progress Modal

File: frontend/src/components/common/ProgressModal.tsx

Add support for image generation:

  • Update step labels for generate_images_from_prompts
  • Show progress per image
  • Display generated images in modal (optional, like WP plugin)

Step Labels:

if (funcName.includes('generate_images_from_prompts')) {
  return [
    { phase: 'INIT', label: 'Validating image prompts' },
    { phase: 'PREP', label: 'Preparing image generation queue' },
    { phase: 'AI_CALL', label: 'Generating images with AI' },
    { phase: 'PARSE', label: 'Processing image URLs' },
    { phase: 'SAVE', label: 'Saving image URLs' },
  ];
}

Success Message:

if (funcName.includes('generate_images_from_prompts')) {
  const imageCount = extractCount(/(\d+)\s+image/i, stepLogs || []);
  if (imageCount) {
    return `${imageCount} image${imageCount !== '1' ? 's' : ''} generated successfully`;
  }
  return 'Images generated successfully';
}

4. Technical Details

4.1 Image Generation API

AICore.generate_image() already exists and handles:

  • OpenAI DALL-E (dall-e-2, dall-e-3)
  • Runware API
  • Negative prompts (Runware only)
  • Cost tracking
  • Error handling

Usage:

result = ai_core.generate_image(
    prompt=formatted_prompt,
    provider='openai',  # or 'runware'
    model='dall-e-3',  # or 'runware:97@1'
    size='1024x1024',
    negative_prompt=negative_prompt,  # Only for Runware
    function_name='generate_images_from_prompts'
)

Response:

{
    'url': 'https://...',  # Image URL
    'revised_prompt': '...',  # OpenAI may revise prompt
    'cost': 0.04,  # Cost in USD
    'error': None  # Error message if failed
}
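
Given that response shape, parse_response() might be sketched as follows. The standalone function name `parse_image_response` is hypothetical, and treating a missing URL as an error is an assumption about the desired behaviour rather than something the plan states.

```python
from typing import Any, Dict


def parse_image_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a generate_image() result and normalise its optional fields."""
    error = response.get("error")
    if error:
        raise ValueError(f"Image generation failed: {error}")
    url = response.get("url")
    if not url:
        raise ValueError("Image generation returned no URL")
    return {
        "url": url,
        # revised_prompt and cost are optional in the response
        "revised_prompt": response.get("revised_prompt") or "",
        "cost": float(response.get("cost") or 0.0),
    }
```

Raising here lets the caller mark only the affected image as failed while the rest of the queue continues.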

4.2 Settings Retrieval

From IntegrationSettings:

integration = IntegrationSettings.objects.get(
    account=account,
    integration_type='image_generation',
    is_active=True
)
config = integration.config

provider = config.get('provider') or config.get('service', 'openai')
if provider == 'runware':
    model = config.get('model') or config.get('runwareModel', 'runware:97@1')
else:
    model = config.get('model', 'dall-e-3')

image_type = config.get('image_type', 'realistic')
image_format = config.get('image_format', 'webp')

4.3 Prompt Templates

From PromptRegistry:

image_prompt_template = PromptRegistry.get_image_prompt_template(account)
negative_prompt = PromptRegistry.get_negative_prompt(account)

Formatting:

formatted = image_prompt_template.format(
    post_title=content.title or content.meta_title,
    image_prompt=image.prompt,
    image_type=image_type  # 'realistic', 'artistic', 'cartoon'
)
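
Because the template is user-editable, `str.format` can fail on an unknown placeholder or an unbalanced brace. The sketch below wraps the call so that such a template produces one clear error. The function name and the sample template are hypothetical; only the three placeholders come from this plan.

```python
def format_image_prompt(
    template: str,
    title: str,
    meta_title: str,
    image_prompt: str,
    image_type: str,
) -> str:
    """Fill the image prompt template, falling back to meta_title when title is empty."""
    post_title = title or meta_title or ""
    try:
        return template.format(
            post_title=post_title,
            image_prompt=image_prompt,
            image_type=image_type,
        )
    except (KeyError, IndexError, ValueError) as exc:
        # KeyError: unknown {name}; IndexError: positional {} or {0}; ValueError: stray brace
        raise ValueError(f"Invalid image prompt template: {exc}") from exc


# Hypothetical template, for illustration only
SAMPLE_TEMPLATE = "Create a {image_type} image for '{post_title}': {image_prompt}"
```

With the sample template, `format_image_prompt(SAMPLE_TEMPLATE, "", "Meta", "a red barn", "realistic")` falls back to the meta title for `{post_title}`.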

4.4 Error Handling

Per-Image Errors:

  • If one image fails, continue with others
  • Mark failed image: status='failed'
  • Log error in Images record or separate error field
  • Return success with partial count: {'success': True, 'images_generated': 3, 'images_failed': 1}

Validation Errors:

  • No prompts: Skip image, log warning
  • No settings: Return error, don't start generation
  • Invalid provider/model: Return error
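
The per-image rules above could be combined into a single loop like the one sketched here. It works under assumptions: images are plain dicts rather than Images model instances, `generate` stands in for a call to AICore.generate_image(), and the `error` key on the image is a hypothetical place to store the failure message.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def generate_with_partial_failures(
    images: List[Dict[str, Any]],
    generate: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Generate images one at a time; a failure marks that image and moves on."""
    generated = 0
    failed = 0
    for image in images:
        if not image.get("prompt"):
            # No prompt: skip the image and log a warning
            logger.warning("Image %s has no prompt, skipping", image.get("id"))
            continue
        try:
            result = generate(image["prompt"])
            if result.get("error") or not result.get("url"):
                raise RuntimeError(result.get("error") or "no URL returned")
        except Exception as exc:  # one failed image must not stop the queue
            image["status"] = "failed"
            image["error"] = str(exc)
            failed += 1
            continue
        image["image_url"] = result["url"]
        image["status"] = "generated"
        generated += 1
    return {"success": True, "images_generated": generated, "images_failed": failed}
```

Note that `success` stays True even when every image fails, following the partial-count rule above; callers that need a hard failure would have to inspect `images_failed`.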

5. Frontend Integration

5.1 Images Page Updates

File: frontend/src/pages/Writer/Images.tsx

Changes:

  1. Add "Generate Images" button in status column (or row actions)
  2. Button only enabled if:
    • Status is 'pending'
    • Prompt exists
    • Content has at least one pending image
  3. On click: Collect all pending image IDs for that content
  4. Call API: generateImages(imageIds)
  5. Open progress modal if async
  6. Reload images on completion

5.2 Progress Modal Updates

File: frontend/src/components/common/ProgressModal.tsx

Changes:

  1. Add step definitions for generate_images_from_prompts
  2. Update progress messages
  3. Show image count in messages
  4. Optional: Display generated images in modal (like WP plugin)

5.3 Table Actions Config

File: frontend/src/config/pages/table-actions.config.tsx

Add row action (optional):

'/writer/images': {
  rowActions: [
    {
      key: 'generate_images',
      label: 'Generate Images',
      icon: <BoltIcon className="w-5 h-5" />,
      variant: 'primary',
    },
  ],
}

6. Testing Strategy

6.1 Unit Tests

Test Function Methods:

  • validate(): Test with valid/invalid IDs, missing prompts, wrong status
  • prepare(): Test settings retrieval, prompt template loading
  • build_prompt(): Test prompt formatting
  • parse_response(): Test URL extraction
  • save_output(): Test Images record update

6.2 Integration Tests

Test Full Flow:

  1. Create Images records with prompts
  2. Call API endpoint
  3. Verify Celery task created
  4. Verify progress updates
  5. Verify Images records updated with URLs
  6. Verify status changed to 'generated'

6.3 Error Scenarios

Test:

  • Missing IntegrationSettings
  • Invalid provider/model
  • API errors (rate limits, invalid API key)
  • Partial failures (some images succeed, some fail)
  • Missing prompts
  • Invalid image IDs
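
The partial-failure and API-error scenarios can be exercised without network access by replacing AICore with a `unittest.mock.Mock` whose `side_effect` list mixes return values and exceptions. In the sketch below, `generate_all` is a hypothetical stand-in for the code under test, not the planned function itself.

```python
from typing import Any, Dict, List
from unittest.mock import Mock


def generate_all(ai_core: Any, prompts: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical code under test: one generate_image() call per prompt."""
    results = []
    for prompt in prompts:
        try:
            results.append(ai_core.generate_image(prompt=prompt))
        except Exception as exc:
            results.append({"url": None, "error": str(exc)})
    return results


def test_partial_failure() -> None:
    ai_core = Mock()
    # Each call consumes the next entry; exception instances are raised, not returned
    ai_core.generate_image.side_effect = [
        {"url": "https://example.invalid/1.webp", "error": None},
        RuntimeError("rate limit exceeded"),
        {"url": "https://example.invalid/3.webp", "error": None},
    ]
    results = generate_all(ai_core, ["a", "b", "c"])
    assert [r["error"] for r in results] == [None, "rate limit exceeded", None]
    assert ai_core.generate_image.call_count == 3
```

The same pattern would cover an invalid API key by raising a different exception from the mock.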

7. Implementation Checklist

Backend

  • Create GenerateImagesFromPromptsFunction class
  • Implement validate() method
  • Implement prepare() method
  • Implement build_prompt() method
  • Implement parse_response() method
  • Implement save_output() method
  • Register function in registry.py
  • Add to __init__.py exports
  • Add model config in settings.py
  • Update AIEngine progress messages
  • Add API endpoint in ImagesViewSet
  • Test with OpenAI provider
  • Test with Runware provider
  • Test error handling

Frontend

  • Add generateImages() API function
  • Add "Generate Images" button to Images page
  • Add click handler
  • Integrate progress modal
  • Update progress modal step labels
  • Update success messages
  • Test UI flow
  • Test error handling

Documentation

  • Update AI_MASTER_ARCHITECTURE.md
  • Add function to AI_FUNCTIONS_AUDIT_REPORT.md
  • Document API endpoint
  • Document settings requirements

8. Key Considerations

8.1 Rate Limiting

Issue: Image generation APIs have rate limits.
Solution: Process images sequentially (one at a time).
Implementation: AIEngine loops through images and waits for each to complete.

8.2 Cost Tracking

Issue: Need to track costs per image.
Solution: AICore already tracks costs; store them in AITaskLog.
Implementation: Cost is returned from generate_image(); log it in step_tracker.

8.3 Progress Updates

Issue: Need granular progress (per image).
Solution: Update progress after each image: (completed / total) * 100.
Implementation: Track in save_output(), update via progress_tracker.update().

8.4 Error Recovery

Issue: If one image fails, generation should continue with the others.
Solution: Catch errors per image, mark the image as failed, and continue.
Implementation: try/except in save_output() per image.

8.5 Image Display

Issue: Should generated images be shown in the progress modal?
Solution: Optional enhancement; can be added later.
Implementation: Store image URLs in step logs and display them in the modal.


9. Alternative Approaches Considered

9.1 Process All in save_output()

Pros:

  • Simpler implementation
  • Direct control over loop

Cons:

  • Doesn't use AIEngine phases properly
  • Harder to track progress per image
  • Less consistent with framework

Decision: Use AIEngine phases with loop detection

9.2 Separate Function Per Image

Pros:

  • Each image is independent task
  • Better error isolation

Cons:

  • Too many Celery tasks
  • Harder to track overall progress
  • More complex frontend

Decision: Single function processes all images sequentially


10. Success Criteria

  • Function follows BaseAIFunction pattern
  • Uses AIEngine orchestrator
  • Integrates with progress modal
  • Uses prompt templates from Thinker/Prompts
  • Uses settings from IntegrationSettings
  • Handles errors gracefully
  • Tracks progress per image
  • Updates Images records correctly
  • Works with both OpenAI and Runware
  • Frontend button triggers generation
  • Progress modal shows correct steps
  • Success message shows image count


11. Next Steps

  1. Start with Backend Function

    • Create GenerateImagesFromPromptsFunction
    • Implement all methods
    • Test with single image
  2. Add API Endpoint

    • Add to ImagesViewSet
    • Test endpoint
  3. Frontend Integration

    • Add button
    • Add handler
    • Test flow
  4. Progress Modal

    • Update step labels
    • Test progress updates
  5. Error Handling

    • Test error scenarios
    • Verify graceful failures
  6. Documentation

    • Update architecture docs
    • Add API docs

End of Plan