IGNY8 VPS (Salman) 5638ea78df Add Image Generation from Prompts: Implement new functionality to generate images from prompts, including backend processing, API integration, and frontend handling with progress modal. Update settings and registry for new AI function.

2025-11-11 20:49:11 +00:00

Image Generation Implementation Plan

Complete Plan for Generating Images from Prompts

Date: 2025-01-XX
Scope: Implement image generation AI function following existing AI framework patterns

System Understanding
Architecture Overview
Implementation Plan
Technical Details
Frontend Integration
Testing Strategy

1. System Understanding

1.1 Current AI Framework Architecture

The system uses a unified AI framework with the following components:

Core Flow:

Frontend API Call
  ↓
views.py (@action endpoint)
  ↓
run_ai_task (ai/tasks.py) - Unified Celery task entrypoint
  ↓
AIEngine (ai/engine.py) - Orchestrator (6 phases: INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
  ↓
BaseAIFunction implementation
  ↓
AICore (ai/ai_core.py) - Centralized AI request handler
  ↓
AI Provider (OpenAI/Runware)

Existing AI Functions:

AutoClusterFunction (auto_cluster.py) - Groups keywords into clusters
GenerateIdeasFunction (generate_ideas.py) - Generates content ideas from clusters
GenerateContentFunction (generate_content.py) - Generates article content from ideas
GenerateImagePromptsFunction (generate_image_prompts.py) - Extracts image prompts from content

Key Components:

BaseAIFunction - Abstract base class with methods: get_name(), validate(), prepare(), build_prompt(), parse_response(), save_output()
AIEngine - Manages lifecycle, progress tracking, cost tracking, error handling
PromptRegistry - Centralized prompt management with hierarchy (task → DB → default)
AICore - Handles API calls to OpenAI/Runware for both text and image generation
IntegrationSettings - Stores account-specific configurations (models, API keys, image settings)

1.2 Image Generation System (WordPress Plugin Reference)

Key Learnings from WP Plugin:

Queue-Based Processing:
- Images are processed sequentially in a queue
- Each image has its own progress bar (0-50% in 7s, 50-75% in 5s, 75-95% incrementally)
- Progress modal shows all images being processed with individual status
Image Types:
- Featured image (1 per content)
- In-article images (configurable: 1-5 per content)
- Desktop images (if enabled)
- Mobile images (if enabled)
Settings from IntegrationSettings:
- provider: 'openai' or 'runware'
- model: Model name (e.g., 'dall-e-3', 'runware:97@1')
- image_type: 'realistic', 'artistic', 'cartoon'
- max_in_article_images: 1-5
- image_format: 'webp', 'jpg', 'png'
- desktop_enabled: boolean
- mobile_enabled: boolean
Prompt Templates:
- image_prompt_template: Template for formatting prompts (uses {post_title}, {image_prompt}, {image_type})
- negative_prompt: Negative prompt, sent to Runware only (OpenAI does not support negative prompts)
Progress Tracking:
- Real-time progress updates via Celery
- Individual image status tracking
- Success/failure per image
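
The per-image progress bar timing described above can be sketched as a pure function of elapsed time. This is a minimal illustration, not the WP plugin's code: the function name `simulated_progress` and the `tail_rate_per_s` value are assumptions, since the notes only say the 75-95% range advances "incrementally".

```python
def simulated_progress(elapsed_s: float, tail_rate_per_s: float = 1.0) -> float:
    """Return a simulated progress percentage for a single image.

    0-50% over the first 7 s, 50-75% over the next 5 s, then a slow
    climb capped at 95% until the real result arrives.
    tail_rate_per_s is an assumed value, not taken from the plugin.
    """
    if elapsed_s <= 0:
        return 0.0
    if elapsed_s <= 7:
        return 50.0 * elapsed_s / 7
    if elapsed_s <= 12:
        return 50.0 + 25.0 * (elapsed_s - 7) / 5
    # Never report more than 95% from the timer alone.
    return min(95.0, 75.0 + tail_rate_per_s * (elapsed_s - 12))
```

The bar would jump to 100% only when the backend reports that the image was actually saved.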

1.3 Current Image Generation Function

Existing: GenerateImagesFunction (generate_images.py)

Status: Partially implemented, uses old pattern
Issues:
- Still references Tasks instead of Content
- Doesn't follow the new unified framework pattern
- Uses legacy generate_images_core() wrapper
- Doesn't properly queue multiple images

What We Need:

New function: GenerateImagesFromPromptsFunction
Should work with Images model (which now has content relationship)
Should process images in queue (one at a time)
Should use progress modal similar to other AI functions
Should use prompt templates and negative prompts from Thinker/Prompts

2. Architecture Overview

2.1 New Function: `GenerateImagesFromPromptsFunction`

Purpose: Generate actual images from existing image prompts stored in Images model

Input:

ids: List of Image IDs (or Content IDs) to generate images for
Images must have prompt field populated (from GenerateImagePromptsFunction)

Output:

Updates Images records with:
- image_url: Generated image URL
- status: 'generated' (or 'failed' on error)

Flow:

INIT (0-10%): Validate image IDs, check prompts exist
PREP (10-25%): Load images, get settings, prepare queue
AI_CALL (25-70%): Generate images sequentially (one AI call per image)
PARSE (70-85%): Parse image URLs from responses
SAVE (85-98%): Update Images records with URLs
DONE (98-100%): Complete
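
One way to turn the phase bands above into a single percentage is a small lookup plus linear interpolation inside the band. The sketch below is an assumption about how this could be done, not the AIEngine implementation; `PHASE_RANGES` and `overall_progress` are hypothetical names.

```python
# Phase bands copied from the flow above: (start percent, end percent).
PHASE_RANGES = {
    "INIT": (0, 10),
    "PREP": (10, 25),
    "AI_CALL": (25, 70),
    "PARSE": (70, 85),
    "SAVE": (85, 98),
    "DONE": (98, 100),
}


def overall_progress(phase: str, completed: int = 0, total: int = 1) -> int:
    """Map 'completed of total' items onto the percentage band of a phase."""
    start, end = PHASE_RANGES[phase]
    if total <= 0:
        return start
    # Clamp so a miscounted item can never push progress outside the band.
    fraction = min(max(completed / total, 0.0), 1.0)
    return round(start + (end - start) * fraction)
```

With four images queued, finishing all four AI calls would report 70%, the upper edge of the AI_CALL band.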

2.2 Key Differences from Other Functions

Unlike text generation functions:

Multiple AI calls: One AI call per image (not one call for all)
Sequential processing: Images must be generated one at a time (rate limits)
Progress per image: Need to track progress for each individual image
Different API: Uses AICore.generate_image() instead of AICore.run_ai_request()

Similarities:

Uses same BaseAIFunction pattern
Uses same AIEngine orchestrator
Uses same progress tracking system
Uses same error handling

3. Implementation Plan

Phase 1: Backend AI Function

3.1 Create `GenerateImagesFromPromptsFunction`

File: backend/igny8_core/ai/functions/generate_images_from_prompts.py

Class Structure:

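from typing import Dict

# BaseAIFunction comes from the existing AI framework; its import path is not shown in this plan.
class GenerateImagesFromPromptsFunction(BaseAIFunction):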
    def get_name(self) -> str:
        return 'generate_images_from_prompts'
    
    def get_metadata(self) -> Dict:
        return {
            'display_name': 'Generate Images from Prompts',
            'description': 'Generate actual images from image prompts',
            'phases': {
                'INIT': 'Validating image prompts...',
                'PREP': 'Preparing image generation queue...',
                'AI_CALL': 'Generating images with AI...',
                'PARSE': 'Processing image URLs...',
                'SAVE': 'Saving image URLs...',
                'DONE': 'Images generated!'
            }
        }
    
    def validate(self, payload: dict, account=None) -> Dict:
        """Validate image IDs and check prompts exist"""
        # Check for 'ids' array
        # Check images exist and have prompts
        # Check images have status='pending'
        # Check account matches
    
    def prepare(self, payload: dict, account=None) -> Dict:
        """Load images and settings"""
        # Load Images records by IDs
        # Get IntegrationSettings for image_generation
        # Extract: provider, model, image_type, image_format, etc.
        # Get prompt templates from PromptRegistry
        # Return: {
        #   'images': [Image objects],
        #   'settings': {...},
        #   'image_prompt_template': str,
        #   'negative_prompt': str
        # }
    
    def build_prompt(self, data: Dict, account=None) -> Dict:
        """Format prompt using template"""
        # For each image in queue:
        # - Get content title (from image.content)
        # - Format prompt using image_prompt_template
        # - Return formatted prompt + image_type
        # Note: This is called once per image (AIEngine handles iteration)
    
    def parse_response(self, response: Dict, step_tracker=None) -> Dict:
        """Parse image URL from response"""
        # Response from AICore.generate_image() has:
        # - 'url': Image URL
        # - 'revised_prompt': (optional)
        # - 'cost': (optional)
        # Return: {'url': str, 'revised_prompt': str, 'cost': float}
    
    def save_output(self, parsed: Dict, original_data: Dict, account=None, **kwargs) -> Dict:
        """Update Images record with URL"""
        # **kwargs stands in for the remaining framework arguments, which this plan elides
        # Get image from original_data
        # Update Images record:
        # - image_url = parsed['url']
        # - status = 'generated'
        # - updated_at = now()
        # Return: {'count': 1, 'images_generated': 1}

Key Implementation Details:

Multiple AI Calls Handling:
- AIEngine will call build_prompt() → AI_CALL → parse_response() → SAVE for each image
- Need to track which image is being processed
- Use step_tracker to log progress per image

Prompt Formatting:

# Get template from PromptRegistry
template = PromptRegistry.get_image_prompt_template(account)

# Format with content title and prompt
formatted = template.format(
    post_title=image.content.title or image.content.meta_title,
    image_prompt=image.prompt,
    image_type=settings['image_type']
)

Image Generation:

# Use AICore.generate_image()
result = ai_core.generate_image(
    prompt=formatted_prompt,
    provider=settings['provider'],
    model=settings['model'],
    size='1024x1024',  # Default or from settings
    negative_prompt=negative_prompt if settings['provider'] == 'runware' else None,
    function_name='generate_images_from_prompts'
)

Progress Tracking:
- Track total images: len(images)
- Track completed: Increment after each SAVE
- Update progress: (completed / total) * 100

3.2 Update AIEngine for Multiple AI Calls

File: backend/igny8_core/ai/engine.py

Changes Needed:

Detect if function needs multiple AI calls (check function name or metadata)
For generate_images_from_prompts:
- Loop through images in PREP data
- For each image:
  - Call build_prompt() with single image
  - Call AI_CALL phase (generate image)
  - Call parse_response()
  - Call SAVE phase
  - Update progress: (current_image / total_images) * 100
- After all images: Call DONE phase

Alternative Approach (Simpler):

Process all images in save_output() method
Make AI calls directly in save_output() (not through AIEngine phases)
Update progress manually via progress_tracker.update()
This is simpler but less consistent with framework

Recommended Approach:

Use AIEngine's phase system
Add metadata flag: requires_multiple_ai_calls: True
AIEngine detects this and loops through items
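
The recommended approach can be sketched as a loop that the engine enters only when the metadata flag is set. Everything below is an illustration under assumptions: `FakeImageFunction` and `run_multi_call` are hypothetical stand-ins, images are plain dicts rather than model instances, and the real AIEngine phase and tracker calls are not shown.

```python
from typing import Any, Callable, Dict, List


class FakeImageFunction:
    """Stand-in for a BaseAIFunction subclass (hypothetical, for illustration)."""

    def get_metadata(self) -> Dict[str, Any]:
        return {"requires_multiple_ai_calls": True}

    def build_prompt(self, image: Dict[str, Any]) -> str:
        return f"Prompt for: {image['prompt']}"

    def parse_response(self, response: Dict[str, Any]) -> Dict[str, Any]:
        return {"url": response["url"]}

    def save_output(self, parsed: Dict[str, Any], image: Dict[str, Any]) -> None:
        image["image_url"] = parsed["url"]
        image["status"] = "generated"


def run_multi_call(
    fn: FakeImageFunction,
    images: List[Dict[str, Any]],
    generate: Callable[[str], Dict[str, Any]],
    on_progress: Callable[[int, int], None],
) -> int:
    """Run build_prompt -> AI call -> parse -> save once per image, in order."""
    if not fn.get_metadata().get("requires_multiple_ai_calls"):
        raise ValueError("function does not request per-item AI calls")
    total = len(images)
    for index, image in enumerate(images, start=1):
        prompt = fn.build_prompt(image)
        response = generate(prompt)  # one provider call per image
        parsed = fn.parse_response(response)
        fn.save_output(parsed, image)
        on_progress(index, total)  # report after each image is saved
    return total
```

Passing `generate` and `on_progress` in as callables keeps the loop testable without a provider or a Celery worker.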

3.3 Register Function

File: backend/igny8_core/ai/registry.py

def _load_generate_images_from_prompts():
    from igny8_core.ai.functions.generate_images_from_prompts import GenerateImagesFromPromptsFunction
    return GenerateImagesFromPromptsFunction

register_lazy_function('generate_images_from_prompts', _load_generate_images_from_prompts)
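
For readers unfamiliar with the lazy registration pattern used here, the sketch below shows one way such a registry can work: the loader is stored at registration time and only called on first lookup. This is not the code in registry.py; the module-level dicts and `get_function` are assumptions made for illustration.

```python
from typing import Callable, Dict

_LOADERS: Dict[str, Callable[[], type]] = {}
_CACHE: Dict[str, type] = {}


def register_lazy_function(name: str, loader: Callable[[], type]) -> None:
    """Remember how to load a function class without importing it yet."""
    _LOADERS[name] = loader


def get_function(name: str) -> type:
    """Load the class on first use and reuse the cached class afterwards."""
    if name not in _CACHE:
        if name not in _LOADERS:
            raise KeyError(f"AI function not registered: {name}")
        _CACHE[name] = _LOADERS[name]()
    return _CACHE[name]
```

Deferring the import this way avoids loading every AI function module, and its dependencies, when the registry itself is imported.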

File: backend/igny8_core/ai/functions/__init__.py

from .generate_images_from_prompts import GenerateImagesFromPromptsFunction

__all__ = [
    ...
    'GenerateImagesFromPromptsFunction',
]

3.4 Add Model Configuration

File: backend/igny8_core/ai/settings.py

MODEL_CONFIG = {
    ...
    'generate_images_from_prompts': {
        'model': 'dall-e-3',  # Default, overridden by IntegrationSettings
        'max_tokens': None,  # Not used for images
        'temperature': None,  # Not used for images
        'response_format': None,  # Not used for images
    },
}

FUNCTION_TO_PROMPT_TYPE = {
    ...
    'generate_images_from_prompts': None,  # Uses image_prompt_template, not text prompt
}

3.5 Update Progress Messages

File: backend/igny8_core/ai/engine.py

def _get_prep_message(self, function_name: str, count: int, data: Any) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        total_images = len(data.get('images', []))
        return f"Preparing to generate {total_images} image{'s' if total_images != 1 else ''}"

def _get_ai_call_message(self, function_name: str, count: int, total: int = 0) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        # total is a new optional argument; the per-image loop must pass it in
        return f"Generating image {count} of {total} with AI"

def _get_parse_message_with_count(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"{count} image{'s' if count != 1 else ''} generated"

def _get_save_message(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"Saving {count} image{'s' if count != 1 else ''}"

Phase 2: API Endpoint

3.6 Add API Endpoint

File: backend/igny8_core/modules/writer/views.py

Add to ImagesViewSet:

@action(detail=False, methods=['post'], url_path='generate_images', url_name='generate_images')
def generate_images(self, request):
    """Generate images from prompts for image records"""
    from igny8_core.ai.tasks import run_ai_task
    
    account = getattr(request, 'account', None)
    ids = request.data.get('ids', [])
    
    if not ids:
        return Response({
            'error': 'No IDs provided',
            'type': 'ValidationError'
        }, status=status.HTTP_400_BAD_REQUEST)
    
    account_id = account.id if account else None
    
    # Queue Celery task
    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            return Response({
                'success': True,
                'task_id': str(task.id),
                'message': 'Image generation started'
            }, status=status.HTTP_200_OK)
        else:
            # Fallback to synchronous execution
            result = run_ai_task(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            if result.get('success'):
                return Response({
                    'success': True,
                    'images_generated': result.get('count', 0),
                    'message': 'Images generated successfully'
                }, status=status.HTTP_200_OK)
            else:
                return Response({
                    'error': result.get('error', 'Image generation failed'),
                    'type': 'TaskExecutionError'
                }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
    except Exception as e:
        return Response({
            'error': str(e),
            'type': 'ExecutionError'
        }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)

Phase 3: Frontend Integration

3.7 Add API Function

File: frontend/src/services/api.ts

export async function generateImages(imageIds: number[]): Promise<any> {
  return fetchAPI('/v1/writer/images/generate_images/', {
    method: 'POST',
    body: JSON.stringify({ ids: imageIds }),
  });
}

3.8 Add Generate Images Button

File: frontend/src/config/pages/images.config.tsx

Add to row actions or status column:

Add "Generate Images" button in status column
Only show if status is 'pending' and prompt exists
Button should trigger generation for all images for that content

File: frontend/src/pages/Writer/Images.tsx

Add handler:

const handleGenerateImages = useCallback(async (contentId: number) => {
  try {
    // Get all pending images for this content
    const contentImages = images.find(g => g.content_id === contentId);
    if (!contentImages) return;
    
    // Collect all pending image IDs that have prompts
    const imageIds: number[] = [];
    const featured = contentImages.featured_image;
    if (featured?.id && featured.status === 'pending' && featured.prompt) {
      imageIds.push(featured.id);
    }
    contentImages.in_article_images.forEach(img => {
      if (img.id && img.status === 'pending' && img.prompt) {
        imageIds.push(img.id);
      }
    });
    
    if (imageIds.length === 0) {
      toast.info('No pending images with prompts found');
      return;
    }
    
    const result = await generateImages(imageIds);
    if (result.success) {
      if (result.task_id) {
        // Open progress modal
        progressModal.openModal(
          result.task_id,
          'Generate Images',
          'ai-generate-images-from-prompts-01-desktop'
        );
      } else {
        toast.success(`Images generated: ${result.images_generated || 0} image${(result.images_generated || 0) === 1 ? '' : 's'} created`);
        loadImages();
      }
    } else {
      toast.error(result.error || 'Failed to generate images');
    }
  } catch (error: any) {
    toast.error(`Failed to generate images: ${error.message}`);
  }
}, [toast, progressModal, loadImages, images]);

3.9 Update Progress Modal

File: frontend/src/components/common/ProgressModal.tsx

Add support for image generation:

Update step labels for generate_images_from_prompts
Show progress per image
Display generated images in modal (optional, like WP plugin)

Step Labels:

if (funcName.includes('generate_images_from_prompts')) {
  return [
    { phase: 'INIT', label: 'Validating image prompts' },
    { phase: 'PREP', label: 'Preparing image generation queue' },
    { phase: 'AI_CALL', label: 'Generating images with AI' },
    { phase: 'PARSE', label: 'Processing image URLs' },
    { phase: 'SAVE', label: 'Saving image URLs' },
  ];
}

Success Message:

if (funcName.includes('generate_images_from_prompts')) {
  const imageCount = extractCount(/(\d+)\s+image/i, stepLogs || []);
  if (imageCount) {
    return `${imageCount} image${imageCount !== '1' ? 's' : ''} generated successfully`;
  }
  return 'Images generated successfully';
}

4. Technical Details

4.1 Image Generation API

AICore.generate_image() already exists and handles:

OpenAI DALL-E (dall-e-2, dall-e-3)
Runware API
Negative prompts (Runware only)
Cost tracking
Error handling

Usage:

result = ai_core.generate_image(
    prompt=formatted_prompt,
    provider='openai',  # or 'runware'
    model='dall-e-3',  # or 'runware:97@1'
    size='1024x1024',
    negative_prompt=negative_prompt,  # Only for Runware
    function_name='generate_images_from_prompts'
)

Response:

{
    'url': 'https://...',  # Image URL
    'revised_prompt': '...',  # OpenAI may revise prompt
    'cost': 0.04,  # Cost in USD
    'error': None  # Error message if failed
}
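
A parse_response() implementation has to cope with the error case of this response shape. The helper below is a minimal sketch that assumes only the four keys listed above; the name `parse_image_response` and the choice to raise `ValueError` are assumptions, not the framework's convention.

```python
from typing import Any, Dict


def parse_image_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a generate_image() result or raise if it carries no image."""
    error = response.get("error")
    if error:
        raise ValueError(f"Image generation failed: {error}")
    url = response.get("url")
    if not url:
        raise ValueError("Image generation returned no URL")
    return {
        "url": url,
        # Optional keys are filled with neutral defaults.
        "revised_prompt": response.get("revised_prompt") or "",
        "cost": float(response.get("cost") or 0.0),
    }
```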

4.2 Settings Retrieval

From IntegrationSettings:

integration = IntegrationSettings.objects.get(
    account=account,
    integration_type='image_generation',
    is_active=True
)
config = integration.config

provider = config.get('provider') or config.get('service', 'openai')
if provider == 'runware':
    model = config.get('model') or config.get('runwareModel', 'runware:97@1')
else:
    model = config.get('model', 'dall-e-3')

image_type = config.get('image_type', 'realistic')
image_format = config.get('image_format', 'webp')

4.3 Prompt Templates

From PromptRegistry:

image_prompt_template = PromptRegistry.get_image_prompt_template(account)
negative_prompt = PromptRegistry.get_negative_prompt(account)

Formatting:

formatted = image_prompt_template.format(
    post_title=content.title or content.meta_title,
    image_prompt=image.prompt,
    image_type=image_type  # 'realistic', 'artistic', 'cartoon'
)
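
Because the template is user-editable, `str.format` can raise on an unknown placeholder or a stray brace. The sketch below shows one defensive option; the helper name, the fallback wording, and the sample template in the usage note are all assumptions, not values from PromptRegistry.

```python
def format_image_prompt(template: str, post_title: str, image_prompt: str, image_type: str) -> str:
    """Fill the image prompt template, falling back to a plain prompt on bad templates."""
    try:
        return template.format(
            post_title=post_title,
            image_prompt=image_prompt,
            image_type=image_type,
        )
    except (KeyError, IndexError, ValueError):
        # Unknown placeholder, positional field, or unbalanced brace in the template.
        return f"{image_type} image for '{post_title}': {image_prompt}"
```

For example, with the hypothetical template `"Create a {image_type} image for '{post_title}': {image_prompt}"`, a title of `Bikes`, and a prompt of `a red bike`, the result is `Create a realistic image for 'Bikes': a red bike`.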

4.4 Error Handling

Per-Image Errors:

If one image fails, continue with others
Mark failed image: status='failed'
Log error in Images record or separate error field
Return success with partial count: {'success': True, 'images_generated': 3, 'images_failed': 1}
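
The per-image rules above amount to a loop that isolates each failure and tallies the outcome. A minimal sketch, assuming images are plain dicts and `generate_one` is a caller-supplied function that returns a URL or raises; both are hypothetical and stand in for the model and the AICore call.

```python
from typing import Any, Callable, Dict, List


def generate_all(
    images: List[Dict[str, Any]],
    generate_one: Callable[[Dict[str, Any]], str],
) -> Dict[str, Any]:
    """Generate every image in order; a failure marks that image and moves on."""
    generated = 0
    failed = 0
    for image in images:
        try:
            image["image_url"] = generate_one(image)
            image["status"] = "generated"
            generated += 1
        except Exception as exc:  # one bad image must not stop the queue
            image["status"] = "failed"
            image["error"] = str(exc)
            failed += 1
    return {"success": True, "images_generated": generated, "images_failed": failed}
```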

Validation Errors:

No prompts: Skip image, log warning
No settings: Return error, don't start generation
Invalid provider/model: Return error
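
The validation rules can be sketched without the ORM by checking IDs against a mapping of records. The return shape (`valid`, `error`, `ids`, `skipped`) and the name `validate_images` are assumptions for illustration; the framework's actual validate() contract is not shown in this plan.

```python
from typing import Any, Dict, List


def validate_images(ids: Any, records: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Check the payload and split IDs into ready and skipped images."""
    if not isinstance(ids, list) or not ids:
        return {"valid": False, "error": "No IDs provided"}
    missing: List[int] = [i for i in ids if i not in records]
    if missing:
        return {"valid": False, "error": f"Images not found: {missing}"}
    # An image is ready only if it has a prompt and is still pending.
    ready = [
        i for i in ids
        if records[i].get("prompt") and records[i].get("status") == "pending"
    ]
    skipped = [i for i in ids if i not in ready]
    if not ready:
        return {"valid": False, "error": "No pending images with prompts"}
    return {"valid": True, "ids": ready, "skipped": skipped}
```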

5. Frontend Integration

5.1 Images Page Updates

File: frontend/src/pages/Writer/Images.tsx

Changes:

Add "Generate Images" button in status column (or row actions)
Button only enabled if:
- Status is 'pending'
- Prompt exists
- Content has at least one pending image
On click: Collect all pending image IDs for that content
Call API: generateImages(imageIds)
Open progress modal if async
Reload images on completion

5.2 Progress Modal Updates

File: frontend/src/components/common/ProgressModal.tsx

Changes:

Add step definitions for generate_images_from_prompts
Update progress messages
Show image count in messages
Optional: Display generated images in modal (like WP plugin)

5.3 Table Actions Config

File: frontend/src/config/pages/table-actions.config.tsx

Add row action (optional):

'/writer/images': {
  rowActions: [
    {
      key: 'generate_images',
      label: 'Generate Images',
      icon: <BoltIcon className="w-5 h-5" />,
      variant: 'primary',
    },
  ],
}

6. Testing Strategy

6.1 Unit Tests

Test Function Methods:

validate(): Test with valid/invalid IDs, missing prompts, wrong status
prepare(): Test settings retrieval, prompt template loading
build_prompt(): Test prompt formatting
parse_response(): Test URL extraction
save_output(): Test Images record update

6.2 Integration Tests

Test Full Flow:

Create Images records with prompts
Call API endpoint
Verify Celery task created
Verify progress updates
Verify Images records updated with URLs
Verify status changed to 'generated'

6.3 Error Scenarios

Test:

Missing IntegrationSettings
Invalid provider/model
API errors (rate limits, invalid API key)
Partial failures (some images succeed, some fail)
Missing prompts
Invalid image IDs

7. Implementation Checklist

Backend

Create GenerateImagesFromPromptsFunction class
Implement validate() method
Implement prepare() method
Implement build_prompt() method
Implement parse_response() method
Implement save_output() method
Register function in registry.py
Add to __init__.py exports
Add model config in settings.py
Update AIEngine progress messages
Add API endpoint in ImagesViewSet
Test with OpenAI provider
Test with Runware provider
Test error handling

Frontend

Add generateImages() API function
Add "Generate Images" button to Images page
Add click handler
Integrate progress modal
Update progress modal step labels
Update success messages
Test UI flow
Test error handling

Documentation

Update AI_MASTER_ARCHITECTURE.md
Add function to AI_FUNCTIONS_AUDIT_REPORT.md
Document API endpoint
Document settings requirements

8. Key Considerations

8.1 Rate Limiting

Issue: Image generation APIs have rate limits
Solution: Process images sequentially (one at a time)
Implementation: AIEngine loops through images, waits for each to complete

8.2 Cost Tracking

Issue: Need to track costs per image
Solution: AICore already tracks costs, store in AITaskLog
Implementation: Cost is returned from generate_image(), log in step_tracker

8.3 Progress Updates

Issue: Need granular progress (per image)
Solution: Update progress after each image: (completed / total) * 100
Implementation: Track in save_output(), update via progress_tracker.update()

8.4 Error Recovery

Issue: If one image fails, generation should continue with the others
Solution: Catch errors per image, mark the image as failed, continue
Implementation: try/except around each image in save_output()

8.5 Image Display

Issue: Should the progress modal show the generated images?
Solution: Optional enhancement, can be added later
Implementation: Store image URLs in step logs, display in modal

9. Alternative Approaches Considered

9.1 Process All in save_output()

Pros:

Simpler implementation
Direct control over loop

Cons:

Doesn't use AIEngine phases properly
Harder to track progress per image
Less consistent with framework

Decision: Use AIEngine phases with loop detection

9.2 Separate Function Per Image

Pros:

Each image is independent task
Better error isolation

Cons:

Too many Celery tasks
Harder to track overall progress
More complex frontend

Decision: Single function processes all images sequentially

10. Success Criteria

✅ Function follows BaseAIFunction pattern
✅ Uses AIEngine orchestrator
✅ Integrates with progress modal
✅ Uses prompt templates from Thinker/Prompts
✅ Uses settings from IntegrationSettings
✅ Handles errors gracefully
✅ Tracks progress per image
✅ Updates Images records correctly
✅ Works with both OpenAI and Runware
✅ Frontend button triggers generation
✅ Progress modal shows correct steps
✅ Success message shows image count

11. Next Steps

Start with Backend Function
- Create GenerateImagesFromPromptsFunction
- Implement all methods
- Test with single image
Add API Endpoint
- Add to ImagesViewSet
- Test endpoint
Frontend Integration
- Add button
- Add handler
- Test flow
Progress Modal
- Update step labels
- Test progress updates
Error Handling
- Test error scenarios
- Verify graceful failures
Documentation
- Update architecture docs
- Add API docs

End of Plan
