Image Generation Implementation Plan
Complete Plan for Generating Images from Prompts
Date: 2025-01-XX
Scope: Implement image generation AI function following existing AI framework patterns
Table of Contents
- System Understanding
- Architecture Overview
- Implementation Plan
- Technical Details
- Frontend Integration
- Testing Strategy
1. System Understanding
1.1 Current AI Framework Architecture
The system uses a unified AI framework with the following components:
Core Flow:
Frontend API Call
↓
views.py (@action endpoint)
↓
run_ai_task (ai/tasks.py) - Unified Celery task entrypoint
↓
AIEngine (ai/engine.py) - Orchestrator (6 phases: INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
↓
BaseAIFunction implementation
↓
AICore (ai/ai_core.py) - Centralized AI request handler
↓
AI Provider (OpenAI/Runware)
Existing AI Functions:
- AutoClusterFunction (auto_cluster.py) - Groups keywords into clusters
- GenerateIdeasFunction (generate_ideas.py) - Generates content ideas from clusters
- GenerateContentFunction (generate_content.py) - Generates article content from ideas
- GenerateImagePromptsFunction (generate_image_prompts.py) - Extracts image prompts from content
Key Components:
- BaseAIFunction - Abstract base class with methods: get_name(), validate(), prepare(), build_prompt(), parse_response(), save_output()
- AIEngine - Manages lifecycle, progress tracking, cost tracking, error handling
- PromptRegistry - Centralized prompt management with hierarchy (task → DB → default)
- AICore - Handles API calls to OpenAI/Runware for both text and image generation
- IntegrationSettings - Stores account-specific configurations (models, API keys, image settings)
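The prompt hierarchy (task → DB → default) listed for PromptRegistry can be read as a first-non-empty lookup. The following is a minimal sketch under that assumption; `resolve_prompt` and its parameters are hypothetical names for illustration and are not the real PromptRegistry API.

```python
from typing import Optional


def resolve_prompt(task_prompt: Optional[str],
                   db_prompt: Optional[str],
                   default_prompt: str) -> str:
    """Return the first non-empty prompt in task -> DB -> default order."""
    for candidate in (task_prompt, db_prompt):
        # Treat None and whitespace-only strings as "not set".
        if candidate and candidate.strip():
            return candidate
    return default_prompt
```

A task-level override wins when present, an account-level DB value comes next, and the built-in default is used only when both are empty.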
1.2 Image Generation System (WordPress Plugin Reference)
Key Learnings from WP Plugin:
- Queue-Based Processing:
  - Images are processed sequentially in a queue
  - Each image has its own progress bar (0-50% in 7s, 50-75% in 5s, 75-95% incrementally)
  - Progress modal shows all images being processed with individual status
- Image Types:
  - Featured image (1 per content)
  - In-article images (configurable: 1-5 per content)
  - Desktop images (if enabled)
  - Mobile images (if enabled)
- Settings from IntegrationSettings:
  - provider: 'openai' or 'runware'
  - model: Model name (e.g., 'dall-e-3', 'runware:97@1')
  - image_type: 'realistic', 'artistic', 'cartoon'
  - max_in_article_images: 1-5
  - image_format: 'webp', 'jpg', 'png'
  - desktop_enabled: boolean
  - mobile_enabled: boolean
- Prompt Templates:
  - image_prompt_template: Template for formatting prompts (uses {post_title}, {image_prompt}, {image_type})
  - negative_prompt: Negative prompt for Runware (OpenAI doesn't support negative prompts)
- Progress Tracking:
  - Real-time progress updates via Celery
  - Individual image status tracking
  - Success/failure per image
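The per-image progress bar timing noted above (0-50% in 7s, 50-75% in 5s, 75-95% incrementally) can be modelled as a piecewise function of elapsed time. This is only a sketch: `simulated_progress` is a hypothetical helper, and the tail rate of 1% per second is an assumption, since the source says only "incrementally".

```python
def simulated_progress(elapsed_s: float, tail_rate: float = 1.0) -> float:
    """Estimated progress (percent) for one image after elapsed_s seconds."""
    if elapsed_s <= 0:
        return 0.0
    if elapsed_s <= 7:
        # 0-50% over the first 7 seconds
        return 50.0 * elapsed_s / 7.0
    if elapsed_s <= 12:
        # 50-75% over the next 5 seconds
        return 50.0 + 25.0 * (elapsed_s - 7.0) / 5.0
    # 75-95% in small increments (assumed tail_rate percent per second);
    # the bar is capped at 95 until the real result arrives.
    return min(95.0, 75.0 + tail_rate * (elapsed_s - 12.0))
```

The cap at 95% leaves the final jump to 100% for the moment the image URL is actually returned.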
1.3 Current Image Generation Function
Existing: GenerateImagesFunction (generate_images.py)
- Status: Partially implemented, uses old pattern
- Issues:
  - Still references Tasks instead of Content
  - Doesn't follow the new unified framework pattern
  - Uses legacy generate_images_core() wrapper
  - Doesn't properly queue multiple images
What We Need:
- New function: GenerateImagesFromPromptsFunction
- Should work with Images model (which now has content relationship)
- Should process images in queue (one at a time)
- Should use progress modal similar to other AI functions
- Should use prompt templates and negative prompts from Thinker/Prompts
2. Architecture Overview
2.1 New Function: GenerateImagesFromPromptsFunction
Purpose: Generate actual images from existing image prompts stored in Images model
Input:
- ids: List of Image IDs (or Content IDs) to generate images for
- Images must have prompt field populated (from GenerateImagePromptsFunction)
Output:
- Updates Images records with:
  - image_url: Generated image URL
  - status: 'generated' (or 'failed' on error)
Flow:
- INIT (0-10%): Validate image IDs, check prompts exist
- PREP (10-25%): Load images, get settings, prepare queue
- AI_CALL (25-70%): Generate images sequentially (one per AI_CALL phase)
- PARSE (70-85%): Parse image URLs from responses
- SAVE (85-98%): Update Images records with URLs
- DONE (98-100%): Complete
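The plan gives each phase a slice of the overall bar and, elsewhere, a per-image ratio of completed to total. How the two combine is not specified; one possible reading is to scale the per-image ratio into the slice of the current phase. The sketch below assumes that reading. `PHASE_RANGES` copies the percentages from the flow above, while `overall_progress` is a hypothetical helper.

```python
PHASE_RANGES = {
    'INIT': (0, 10),
    'PREP': (10, 25),
    'AI_CALL': (25, 70),
    'PARSE': (70, 85),
    'SAVE': (85, 98),
    'DONE': (98, 100),
}


def overall_progress(phase: str, completed: int = 0, total: int = 1) -> int:
    """Map per-image completion within a phase onto the overall percentage."""
    start, end = PHASE_RANGES[phase]
    if total <= 0:
        return start
    # Clamp so an out-of-range counter can never leave the phase's slice.
    fraction = min(max(completed / total, 0.0), 1.0)
    return round(start + (end - start) * fraction)
```

For example, with three images queued, finishing the first one during AI_CALL moves the bar a third of the way through the 25-70% slice.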
2.2 Key Differences from Other Functions
Unlike text generation functions:
- Multiple AI calls: One AI call per image (not one call for all)
- Sequential processing: Images must be generated one at a time (rate limits)
- Progress per image: Need to track progress for each individual image
- Different API: Uses AICore.generate_image() instead of AICore.run_ai_request()
Similarities:
- Uses same BaseAIFunction pattern
- Uses same AIEngine orchestrator
- Uses same progress tracking system
- Uses same error handling
3. Implementation Plan
Phase 1: Backend AI Function
3.1 Create GenerateImagesFromPromptsFunction
File: backend/igny8_core/ai/functions/generate_images_from_prompts.py
Class Structure:
class GenerateImagesFromPromptsFunction(BaseAIFunction):
    def get_name(self) -> str:
        return 'generate_images_from_prompts'

    def get_metadata(self) -> Dict:
        return {
            'display_name': 'Generate Images from Prompts',
            'description': 'Generate actual images from image prompts',
            'phases': {
                'INIT': 'Validating image prompts...',
                'PREP': 'Preparing image generation queue...',
                'AI_CALL': 'Generating images with AI...',
                'PARSE': 'Processing image URLs...',
                'SAVE': 'Saving image URLs...',
                'DONE': 'Images generated!'
            }
        }

    def validate(self, payload: dict, account=None) -> Dict:
        """Validate image IDs and check prompts exist"""
        # Check for 'ids' array
        # Check images exist and have prompts
        # Check images have status='pending'
        # Check account matches

    def prepare(self, payload: dict, account=None) -> Dict:
        """Load images and settings"""
        # Load Images records by IDs
        # Get IntegrationSettings for image_generation
        # Extract: provider, model, image_type, image_format, etc.
        # Get prompt templates from PromptRegistry
        # Return: {
        #     'images': [Image objects],
        #     'settings': {...},
        #     'image_prompt_template': str,
        #     'negative_prompt': str
        # }

    def build_prompt(self, data: Dict, account=None) -> Dict:
        """Format prompt using template"""
        # For each image in queue:
        # - Get content title (from image.content)
        # - Format prompt using image_prompt_template
        # - Return formatted prompt + image_type
        # Note: This is called once per image (AIEngine handles iteration)

    def parse_response(self, response: Dict, step_tracker=None) -> Dict:
        """Parse image URL from response"""
        # Response from AICore.generate_image() has:
        # - 'url': Image URL
        # - 'revised_prompt': (optional)
        # - 'cost': (optional)
        # Return: {'url': str, 'revised_prompt': str, 'cost': float}

    def save_output(self, parsed: Dict, original_data: Dict, account=None, ...) -> Dict:
        """Update Images record with URL"""
        # Get image from original_data
        # Update Images record:
        # - image_url = parsed['url']
        # - status = 'generated'
        # - updated_at = now()
        # Return: {'count': 1, 'images_generated': 1}
Key Implementation Details:
- Multiple AI Calls Handling:
  - AIEngine will call build_prompt() → AI_CALL → parse_response() → SAVE for each image
  - Need to track which image is being processed
  - Use step_tracker to log progress per image
- Prompt Formatting:
  # Get template from PromptRegistry
  template = PromptRegistry.get_image_prompt_template(account)
  # Format with content title and prompt
  formatted_prompt = template.format(
      post_title=image.content.title or image.content.meta_title,
      image_prompt=image.prompt,
      image_type=settings['image_type']
  )
- Image Generation:
  # Use AICore.generate_image()
  result = ai_core.generate_image(
      prompt=formatted_prompt,
      provider=settings['provider'],
      model=settings['model'],
      size='1024x1024',  # Default or from settings
      # Negative prompts are only sent to Runware
      negative_prompt=negative_prompt if settings['provider'] == 'runware' else None,
      function_name='generate_images_from_prompts'
  )
- Progress Tracking:
  - Track total images: len(images)
  - Track completed: Increment after each SAVE
  - Update progress: (completed / total) * 100
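The checks listed in the validate() comments (ids present, images exist, prompts populated, status 'pending', account matches) could look roughly like the sketch below. It works on plain dicts that stand in for Images rows so it runs without Django. `validate_images`, the dict keys, and the `{'valid', 'error'}` return shape are all assumptions made for illustration. This version rejects the whole request on the first problem; skipping images without prompts, as described under error handling, is an equally valid policy.

```python
from typing import Dict, List


def validate_images(payload: Dict, images: List[Dict], account_id: int) -> Dict:
    """Return {'valid': bool, 'error': str} for a generation request.

    `images` stands in for the Images rows fetched for payload['ids'].
    """
    ids = payload.get('ids')
    if not isinstance(ids, list) or not ids:
        return {'valid': False, 'error': "Payload must contain a non-empty 'ids' list"}

    found = {img['id']: img for img in images}
    missing = [i for i in ids if i not in found]
    if missing:
        return {'valid': False, 'error': f'Images not found: {missing}'}

    for image_id in ids:
        img = found[image_id]
        if img['account_id'] != account_id:
            return {'valid': False, 'error': f'Image {image_id} belongs to another account'}
        if not (img.get('prompt') or '').strip():
            return {'valid': False, 'error': f'Image {image_id} has no prompt'}
        if img.get('status') != 'pending':
            return {'valid': False, 'error': f"Image {image_id} is not 'pending'"}
    return {'valid': True, 'error': ''}
```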
3.2 Update AIEngine for Multiple AI Calls
File: backend/igny8_core/ai/engine.py
Changes Needed:
- Detect if function needs multiple AI calls (check function name or metadata)
- For generate_images_from_prompts:
  - Loop through images in PREP data
  - For each image:
    - Call build_prompt() with single image
    - Call AI_CALL phase (generate image)
    - Call parse_response()
    - Call SAVE phase
    - Update progress: (current_image / total_images) * 100
  - After all images: Call DONE phase
Alternative Approach (Simpler):
- Process all images in save_output() method
- Make AI calls directly in save_output() (not through AIEngine phases)
- Update progress manually via progress_tracker.update()
- This is simpler but less consistent with framework
Recommended Approach:
- Use AIEngine's phase system
- Add metadata flag: requires_multiple_ai_calls: True
- AIEngine detects this and loops through items
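The recommended approach can be sketched as a loop that the engine enters only when the function's metadata carries the flag. This is an illustrative stand-in, not the real AIEngine: `FakeImageFunction`, `run_items`, the `ai_call` callable, and the simplified method signatures are all hypothetical, and only the `requires_multiple_ai_calls` key comes from the plan.

```python
from typing import Any, Callable, Dict, List


class FakeImageFunction:
    """Stand-in for a BaseAIFunction subclass; not the real class."""

    def get_metadata(self) -> Dict[str, Any]:
        return {'requires_multiple_ai_calls': True}

    def build_prompt(self, item: Dict[str, Any]) -> str:
        return f"prompt for image {item['id']}"

    def parse_response(self, response: Dict[str, Any]) -> Dict[str, Any]:
        return {'url': response['url']}

    def save_output(self, parsed: Dict[str, Any], item: Dict[str, Any]) -> None:
        item['image_url'] = parsed['url']
        item['status'] = 'generated'


def run_items(fn: Any, items: List[Dict[str, Any]],
              ai_call: Callable[[str], Dict[str, Any]],
              on_progress: Callable[[int], None]) -> int:
    """Run build_prompt -> AI call -> parse -> save once per item."""
    if not fn.get_metadata().get('requires_multiple_ai_calls'):
        raise ValueError('function does not request per-item AI calls')
    total = len(items)
    for index, item in enumerate(items, start=1):
        prompt = fn.build_prompt(item)
        response = ai_call(prompt)  # one AI call per image, strictly in order
        parsed = fn.parse_response(response)
        fn.save_output(parsed, item)
        on_progress(int(index / total * 100))
    return total
```

Keying the loop on metadata rather than on the function name keeps the engine free of function-specific branches, which is the main argument for this variant over checking for 'generate_images_from_prompts' directly.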
3.3 Register Function
File: backend/igny8_core/ai/registry.py
def _load_generate_images_from_prompts():
    from igny8_core.ai.functions.generate_images_from_prompts import GenerateImagesFromPromptsFunction
    return GenerateImagesFromPromptsFunction

register_lazy_function('generate_images_from_prompts', _load_generate_images_from_prompts)
File: backend/igny8_core/ai/functions/__init__.py
from .generate_images_from_prompts import GenerateImagesFromPromptsFunction
__all__ = [
    ...
    'GenerateImagesFromPromptsFunction',
]
3.4 Add Model Configuration
File: backend/igny8_core/ai/settings.py
MODEL_CONFIG = {
    ...
    'generate_images_from_prompts': {
        'model': 'dall-e-3',      # Default, overridden by IntegrationSettings
        'max_tokens': None,       # Not used for images
        'temperature': None,      # Not used for images
        'response_format': None,  # Not used for images
    },
}

FUNCTION_TO_PROMPT_TYPE = {
    ...
    'generate_images_from_prompts': None,  # Uses image_prompt_template, not text prompt
}
3.5 Update Progress Messages
File: backend/igny8_core/ai/engine.py
def _get_prep_message(self, function_name: str, count: int, data: Any) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        total_images = len(data.get('images', []))
        return f"Preparing to generate {total_images} image{'s' if total_images != 1 else ''}"

# `total` is a proposed extra parameter: the message needs the queue size,
# which the existing (function_name, count) signature does not provide.
def _get_ai_call_message(self, function_name: str, count: int, total: int = 0) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"Generating image {count} of {total} with AI"

def _get_parse_message_with_count(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"{count} image{'s' if count != 1 else ''} generated"

def _get_save_message(self, function_name: str, count: int) -> str:
    ...
    elif function_name == 'generate_images_from_prompts':
        return f"Saving {count} image{'s' if count != 1 else ''}"
Phase 2: API Endpoint
3.6 Add API Endpoint
File: backend/igny8_core/modules/writer/views.py
Add to ImagesViewSet:
@action(detail=False, methods=['post'], url_path='generate_images', url_name='generate_images')
def generate_images(self, request):
    """Generate images from prompts for image records"""
    from igny8_core.ai.tasks import run_ai_task

    account = getattr(request, 'account', None)
    ids = request.data.get('ids', [])
    if not ids:
        return Response({
            'error': 'No IDs provided',
            'type': 'ValidationError'
        }, status=status.HTTP_400_BAD_REQUEST)

    account_id = account.id if account else None

    # Queue Celery task
    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            return Response({
                'success': True,
                'task_id': str(task.id),
                'message': 'Image generation started'
            }, status=status.HTTP_200_OK)
        else:
            # Fallback to synchronous execution
            result = run_ai_task(
                function_name='generate_images_from_prompts',
                payload={'ids': ids},
                account_id=account_id
            )
            if result.get('success'):
                return Response({
                    'success': True,
                    'images_generated': result.get('count', 0),
                    'message': 'Images generated successfully'
                }, status=status.HTTP_200_OK)
            else:
                return Response({
                    'error': result.get('error', 'Image generation failed'),
                    'type': 'TaskExecutionError'
                }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
    except Exception as e:
        return Response({
            'error': str(e),
            'type': 'ExecutionError'
        }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
Phase 3: Frontend Integration
3.7 Add API Function
File: frontend/src/services/api.ts
export async function generateImages(imageIds: number[]): Promise<any> {
  return fetchAPI('/v1/writer/images/generate_images/', {
    method: 'POST',
    body: JSON.stringify({ ids: imageIds }),
  });
}
3.8 Add Generate Images Button
File: frontend/src/config/pages/images.config.tsx
Add to row actions or status column:
- Add "Generate Images" button in status column
- Only show if status is 'pending' and prompt exists
- Button should trigger generation for all images for that content
File: frontend/src/pages/Writer/Images.tsx
Add handler:
const handleGenerateImages = useCallback(async (contentId: number) => {
  try {
    // Get all pending images for this content
    const contentImages = images.find(g => g.content_id === contentId);
    if (!contentImages) return;

    // Collect all pending image IDs that have prompts
    const imageIds: number[] = [];
    const featured = contentImages.featured_image;
    // Apply the same pending + prompt check to the featured image as to in-article images
    if (featured?.id && featured.status === 'pending' && featured.prompt) {
      imageIds.push(featured.id);
    }
    contentImages.in_article_images.forEach(img => {
      if (img.id && img.status === 'pending' && img.prompt) {
        imageIds.push(img.id);
      }
    });

    if (imageIds.length === 0) {
      toast.info('No pending images with prompts found');
      return;
    }

    const result = await generateImages(imageIds);
    if (result.success) {
      if (result.task_id) {
        // Open progress modal
        progressModal.openModal(
          result.task_id,
          'Generate Images',
          'ai-generate-images-from-prompts-01-desktop'
        );
      } else {
        toast.success(`Images generated: ${result.images_generated || 0} image${(result.images_generated || 0) === 1 ? '' : 's'} created`);
        loadImages();
      }
    } else {
      toast.error(result.error || 'Failed to generate images');
    }
  } catch (error: any) {
    toast.error(`Failed to generate images: ${error.message}`);
  }
}, [toast, progressModal, loadImages, images]);
3.9 Update Progress Modal
File: frontend/src/components/common/ProgressModal.tsx
Add support for image generation:
- Update step labels for generate_images_from_prompts
- Show progress per image
- Display generated images in modal (optional, like WP plugin)
Step Labels:
if (funcName.includes('generate_images_from_prompts')) {
  return [
    { phase: 'INIT', label: 'Validating image prompts' },
    { phase: 'PREP', label: 'Preparing image generation queue' },
    { phase: 'AI_CALL', label: 'Generating images with AI' },
    { phase: 'PARSE', label: 'Processing image URLs' },
    { phase: 'SAVE', label: 'Saving image URLs' },
  ];
}
Success Message:
if (funcName.includes('generate_images_from_prompts')) {
  const imageCount = extractCount(/(\d+)\s+image/i, stepLogs || []);
  if (imageCount) {
    return `${imageCount} image${imageCount !== '1' ? 's' : ''} generated successfully`;
  }
  return 'Images generated successfully';
}
4. Technical Details
4.1 Image Generation API
AICore.generate_image() already exists and handles:
- OpenAI DALL-E (dall-e-2, dall-e-3)
- Runware API
- Negative prompts (Runware only)
- Cost tracking
- Error handling
Usage:
result = ai_core.generate_image(
    prompt=formatted_prompt,
    provider='openai',                # or 'runware'
    model='dall-e-3',                 # or 'runware:97@1'
    size='1024x1024',
    negative_prompt=negative_prompt,  # Only for Runware
    function_name='generate_images_from_prompts'
)
Response:
{
    'url': 'https://...',     # Image URL
    'revised_prompt': '...',  # OpenAI may revise prompt
    'cost': 0.04,             # Cost in USD
    'error': None             # Error message if failed
}
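Given the response shape above, parse_response() mainly has to separate the failure case from the success case and normalise the optional fields. A minimal sketch, assuming the four keys shown are the only ones that matter; `parse_image_response` is a hypothetical free function rather than the real method, and raising ValueError on failure is one possible convention.

```python
from typing import Any, Dict


def parse_image_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract url, revised_prompt and cost; raise if the call failed."""
    error = response.get('error')
    if error:
        raise ValueError(f'Image generation failed: {error}')
    url = response.get('url')
    if not url:
        raise ValueError('Image generation returned no URL')
    return {
        'url': url,
        # Both fields are optional in the response, so fall back to neutral values.
        'revised_prompt': response.get('revised_prompt') or '',
        'cost': float(response.get('cost') or 0.0),
    }
```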
4.2 Settings Retrieval
From IntegrationSettings:
integration = IntegrationSettings.objects.get(
    account=account,
    integration_type='image_generation',
    is_active=True
)
config = integration.config
provider = config.get('provider') or config.get('service', 'openai')
if provider == 'runware':
    model = config.get('model') or config.get('runwareModel', 'runware:97@1')
else:
    model = config.get('model', 'dall-e-3')
image_type = config.get('image_type', 'realistic')
image_format = config.get('image_format', 'webp')
4.3 Prompt Templates
From PromptRegistry:
image_prompt_template = PromptRegistry.get_image_prompt_template(account)
negative_prompt = PromptRegistry.get_negative_prompt(account)
Formatting:
formatted = image_prompt_template.format(
    post_title=content.title or content.meta_title,
    image_prompt=image.prompt,
    image_type=image_type  # 'realistic', 'artistic', 'cartoon'
)
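To make the formatting step concrete, here is a small self-contained sketch. The template text is invented for the example (the real template comes from PromptRegistry), and `format_image_prompt` is a hypothetical helper that only mirrors the title fallback shown above.

```python
def format_image_prompt(template: str, title: str, meta_title: str,
                        image_prompt: str, image_type: str) -> str:
    """Fill {post_title}, {image_prompt} and {image_type} in the template."""
    # Same fallback as `content.title or content.meta_title`, plus an empty
    # string so a missing title never renders as the text "None".
    post_title = title or meta_title or ''
    return template.format(
        post_title=post_title,
        image_prompt=image_prompt,
        image_type=image_type,
    )


# Hypothetical template, for illustration only.
EXAMPLE_TEMPLATE = "Create a {image_type} image for '{post_title}': {image_prompt}"
```

One caveat with str.format(): a user-edited template that contains a placeholder other than these three raises KeyError, so templates saved through the UI may need validation before use.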
4.4 Error Handling
Per-Image Errors:
- If one image fails, continue with others
- Mark failed image: status='failed'
- Log error in Images record or separate error field
- Return success with partial count: {'success': True, 'images_generated': 3, 'images_failed': 1}
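The per-image policy above amounts to wrapping each image in its own try/except and counting outcomes. A minimal sketch using plain dicts in place of Images records; `generate_all`, the `generate` callable, and the `error` key on the record are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List


def generate_all(images: List[Dict[str, Any]],
                 generate: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Generate images one at a time; a failure never stops the queue."""
    generated = 0
    failed = 0
    for image in images:
        try:
            result = generate(image['prompt'])
            if result.get('error'):
                raise RuntimeError(result['error'])
            image['image_url'] = result['url']
            image['status'] = 'generated'
            generated += 1
        except Exception as exc:
            # Record the failure on the image and move on to the next one.
            image['status'] = 'failed'
            image['error'] = str(exc)
            failed += 1
    return {'success': True, 'images_generated': generated, 'images_failed': failed}
```

Note that this reports success even when every image fails; whether an all-failed batch should instead be surfaced as an error is a decision the plan leaves open.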
Validation Errors:
- No prompts: Skip image, log warning
- No settings: Return error, don't start generation
- Invalid provider/model: Return error
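The settings checks can be done up front, before any image is queued. The sketch below validates a config dict against the allowed values listed in section 1.2. `check_image_settings` is a hypothetical helper, and the defaults it applies when a key is missing (including 1 for max_in_article_images) are assumptions.

```python
from typing import Any, Dict, List

ALLOWED_PROVIDERS = {'openai', 'runware'}
ALLOWED_IMAGE_TYPES = {'realistic', 'artistic', 'cartoon'}
ALLOWED_FORMATS = {'webp', 'jpg', 'png'}


def check_image_settings(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means usable."""
    errors: List[str] = []
    provider = config.get('provider') or config.get('service', 'openai')
    if provider not in ALLOWED_PROVIDERS:
        errors.append(f'Unsupported provider: {provider!r}')
    image_type = config.get('image_type', 'realistic')
    if image_type not in ALLOWED_IMAGE_TYPES:
        errors.append(f'Unsupported image_type: {image_type!r}')
    image_format = config.get('image_format', 'webp')
    if image_format not in ALLOWED_FORMATS:
        errors.append(f'Unsupported image_format: {image_format!r}')
    max_images = config.get('max_in_article_images', 1)
    if not isinstance(max_images, int) or not 1 <= max_images <= 5:
        errors.append('max_in_article_images must be an integer from 1 to 5')
    return errors
```

Returning all problems at once, rather than stopping at the first, lets the API respond with a single error message that covers everything the user needs to fix.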
5. Frontend Integration
5.1 Images Page Updates
File: frontend/src/pages/Writer/Images.tsx
Changes:
- Add "Generate Images" button in status column (or row actions)
- Button only enabled if:
- Status is 'pending'
- Prompt exists
- Content has at least one pending image
- On click: Collect all pending image IDs for that content
- Call API: generateImages(imageIds)
- Open progress modal if async
- Reload images on completion
5.2 Progress Modal Updates
File: frontend/src/components/common/ProgressModal.tsx
Changes:
- Add step definitions for generate_images_from_prompts
- Update progress messages
- Show image count in messages
- Optional: Display generated images in modal (like WP plugin)
5.3 Table Actions Config
File: frontend/src/config/pages/table-actions.config.tsx
Add row action (optional):
'/writer/images': {
  rowActions: [
    {
      key: 'generate_images',
      label: 'Generate Images',
      icon: <BoltIcon className="w-5 h-5" />,
      variant: 'primary',
    },
  ],
}
6. Testing Strategy
6.1 Unit Tests
Test Function Methods:
- validate(): Test with valid/invalid IDs, missing prompts, wrong status
- prepare(): Test settings retrieval, prompt template loading
- build_prompt(): Test prompt formatting
- parse_response(): Test URL extraction
- save_output(): Test Images record update
6.2 Integration Tests
Test Full Flow:
- Create Images records with prompts
- Call API endpoint
- Verify Celery task created
- Verify progress updates
- Verify Images records updated with URLs
- Verify status changed to 'generated'
6.3 Error Scenarios
Test:
- Missing IntegrationSettings
- Invalid provider/model
- API errors (rate limits, invalid API key)
- Partial failures (some images succeed, some fail)
- Missing prompts
- Invalid image IDs
7. Implementation Checklist
Backend
- Create GenerateImagesFromPromptsFunction class
- Implement validate() method
- Implement prepare() method
- Implement build_prompt() method
- Implement parse_response() method
- Implement save_output() method
- Register function in registry.py
- Add to __init__.py exports
- Add model config in settings.py
- Update AIEngine progress messages
- Add API endpoint in ImagesViewSet
- Test with OpenAI provider
- Test with Runware provider
- Test error handling
Frontend
- Add generateImages() API function
- Add "Generate Images" button to Images page
- Add click handler
- Integrate progress modal
- Update progress modal step labels
- Update success messages
- Test UI flow
- Test error handling
Documentation
- Update AI_MASTER_ARCHITECTURE.md
- Add function to AI_FUNCTIONS_AUDIT_REPORT.md
- Document API endpoint
- Document settings requirements
8. Key Considerations
8.1 Rate Limiting
Issue: Image generation APIs have rate limits
Solution: Process images sequentially (one at a time)
Implementation: AIEngine loops through images, waits for each to complete
8.2 Cost Tracking
Issue: Need to track costs per image
Solution: AICore already tracks costs, store in AITaskLog
Implementation: Cost is returned from generate_image(), log in step_tracker
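Since each generate_image() result carries its own cost, the per-task total is a simple sum over the results. A minimal sketch; `summarize_costs` and the summary keys are hypothetical, and how the total is written to AITaskLog is not shown because that model's fields are not described here.

```python
from typing import Any, Dict, List


def summarize_costs(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Sum the optional 'cost' field (USD) across per-image results."""
    # A missing or None cost counts as 0.0 so failed images do not break the sum.
    total = sum(float(r.get('cost') or 0.0) for r in results)
    return {'images': len(results), 'total_cost': round(total, 4)}
```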
8.3 Progress Updates
Issue: Need granular progress (per image)
Solution: Update progress after each image: (completed / total) * 100
Implementation: Track in save_output(), update via progress_tracker.update()
8.4 Error Recovery
Issue: If one image fails, should continue with others
Solution: Catch errors per image, mark as failed, continue
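Implementation: try/except around each image's processing (in save_output(), or in the AIEngine per-image loop if the recommended approach from 3.2 is used)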
8.5 Image Display
Issue: Should show generated images in progress modal?
Solution: Optional enhancement, can add later
Implementation: Store image URLs in step logs, display in modal
9. Alternative Approaches Considered
9.1 Process All in save_output()
Pros:
- Simpler implementation
- Direct control over loop
Cons:
- Doesn't use AIEngine phases properly
- Harder to track progress per image
- Less consistent with framework
Decision: Use AIEngine phases with loop detection
9.2 Separate Function Per Image
Pros:
- Each image is independent task
- Better error isolation
Cons:
- Too many Celery tasks
- Harder to track overall progress
- More complex frontend
Decision: Single function processes all images sequentially
10. Success Criteria
✅ Function follows BaseAIFunction pattern
✅ Uses AIEngine orchestrator
✅ Integrates with progress modal
✅ Uses prompt templates from Thinker/Prompts
✅ Uses settings from IntegrationSettings
✅ Handles errors gracefully
✅ Tracks progress per image
✅ Updates Images records correctly
✅ Works with both OpenAI and Runware
✅ Frontend button triggers generation
✅ Progress modal shows correct steps
✅ Success message shows image count
11. Next Steps
- Start with Backend Function
  - Create GenerateImagesFromPromptsFunction
  - Implement all methods
  - Test with single image
- Add API Endpoint
  - Add to ImagesViewSet
  - Test endpoint
- Frontend Integration
  - Add button
  - Add handler
  - Test flow
- Progress Modal
  - Update step labels
  - Test progress updates
- Error Handling
  - Test error scenarios
  - Verify graceful failures
- Documentation
  - Update architecture docs
  - Add API docs
End of Plan