# Item 3: Prompt Improvement and Model Optimization
**Priority:** High
**Target:** Production Launch
**Last Updated:** December 11, 2025
---
## Overview
Redesign and optimize all AI prompts for clustering, idea generation, content generation, and image prompt extraction to achieve:
- Extreme accuracy and consistent outputs
- Faster processing with optimized token usage
- Correct word count adherence (500, 1000, 1500 words)
- Improved clustering quality and idea relevance
- Better image prompt clarity and relevance
---
## Current Prompt System Architecture
### Prompt Registry
**Location:** `backend/igny8_core/ai/prompts.py`
**Class:** `PromptRegistry`
**Hierarchy** (resolution order):
1. Task-level `prompt_override` (if exists on specific task)
2. Database prompt from `AIPrompt` model (account-specific)
3. Default fallback from `PromptRegistry.DEFAULT_PROMPTS`
**Storage:**
- Default prompts: Hardcoded in `prompts.py`
- Account overrides: `system_aiprompt` database table
- Task overrides: `prompt_override` field on task object
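The resolution order above can be sketched as a simple fallback chain. This is a minimal illustration, not the actual `PromptRegistry` code: `resolve_prompt`, its parameters, and the stand-in `DEFAULT_PROMPTS` dict are hypothetical, and treating a blank override as absent is an assumption.

```python
from typing import Optional

# Hypothetical stand-in for PromptRegistry.DEFAULT_PROMPTS (the real defaults live in prompts.py).
DEFAULT_PROMPTS = {"clustering": "default clustering prompt"}


def resolve_prompt(
    prompt_key: str,
    task_override: Optional[str] = None,
    account_prompt: Optional[str] = None,
) -> str:
    """Return the first non-blank prompt in the documented resolution order."""
    # 1. Task-level override wins when present (blank strings count as absent: an assumption).
    if task_override and task_override.strip():
        return task_override
    # 2. Account-specific prompt (an AIPrompt row in the real system).
    if account_prompt and account_prompt.strip():
        return account_prompt
    # 3. Hardcoded default fallback.
    return DEFAULT_PROMPTS[prompt_key]
```

Keeping the chain in one function makes it easy to log which level supplied the prompt when debugging unexpected outputs.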
---
## Current Prompts Analysis
### 1. Clustering Prompt
**Function:** `auto_cluster`
**File:** `backend/igny8_core/ai/functions/auto_cluster.py`
**Prompt Key:** `'clustering'`
#### Current Prompt Structure
**Approach:** Semantic strategist + intent-driven clustering
**Key Instructions:**
- Return single JSON with "clusters" array
- Each cluster: name, description, keywords[]
- Multi-dimensional grouping (intent, use-case, function, persona, context)
- Model real search behavior and user journeys
- Avoid superficial groupings and duplicates
- 3-10 keywords per cluster
**Strengths:**
✅ Clear JSON output format
✅ Detailed grouping logic with dimensions
✅ Emphasis on semantic strength over keyword matching
✅ User journey modeling (Problem → Solution, General → Specific)
**Issues:**
❌ Very long prompt (~400+ tokens) - may confuse model
❌ No examples provided - model must guess formatting
❌ Doesn't explicitly say how to handle outlier keywords
❌ No guidance on cluster count (number of clusters varies between runs)
❌ Description length not constrained
**Real-World Performance Issues:**
- Sometimes creates too many small clusters (1-2 keywords each)
- Inconsistent cluster naming convention
- Descriptions sometimes generic ("Keywords related to...")
---
### 2. Idea Generation Prompt
**Function:** `generate_ideas`
**File:** `backend/igny8_core/ai/functions/generate_ideas.py`
**Prompt Key:** `'ideas'`
#### Current Prompt Structure
**Approach:** SEO-optimized content ideas + outlines
**Key Instructions:**
- Input: Clusters + Keywords
- Output: JSON "ideas" array
- 1 cluster_hub + 2-4 supporting ideas per cluster
- Fields: title, description, content_type, content_structure, cluster_id, estimated_word_count, covered_keywords
- Outline format: intro (hook + 2 paragraphs), 5-8 H2 sections with 2-3 H3s each
- Content mixing: paragraphs, lists, tables, blockquotes
- No bullets/lists at start
- Professional tone, no generic phrasing
**Strengths:**
✅ Detailed outline structure
✅ Content mixing guidance (lists, tables, blockquotes)
✅ Clear JSON format
✅ Tone guidelines
**Issues:**
❌ Very complex prompt (600+ tokens)
❌ Outline format too prescriptive (might limit creativity)
❌ No examples provided
❌ Estimated word count often inaccurate (too high or too low)
❌ "hook" guidance unclear (what makes a good hook?)
❌ Content structure validation not enforced
**Real-World Performance Issues:**
- Generated ideas sometimes too similar within cluster
- Outlines don't always respect structure types (e.g., "review" vs "guide")
- covered_keywords field sometimes empty or incorrect
- cluster_hub vs supporting ideas distinction unclear
---
### 3. Content Generation Prompt
**Function:** `generate_content`
**File:** `backend/igny8_core/ai/functions/generate_content.py`
**Prompt Key:** `'content_generation'`
#### Current Prompt Structure
**Approach:** Editorial content strategist
**Key Instructions:**
- Output: JSON {title, content (HTML)}
- Introduction: 1 italic hook (30-40 words) + 2 paragraphs (50-60 words each), no headings
- H2 sections: 5-8 total, 250-300 words each
- Section format: 2 narrative paragraphs → list/table → optional closing paragraph → 2-3 subsections
- Vary list/table types
- Never start section with list/table
- Tone: professional, no passive voice, no generic intros
- Keyword usage: natural in title, intro, headings
**Strengths:**
✅ Detailed structure guidance
✅ Strong tone/style rules
✅ HTML output format
✅ Keyword integration guidance
**Issues:**
❌ **Word count not mentioned in prompt** - critical flaw
❌ No guidance on 500 vs 1000 vs 1500 word versions
❌ Hook word count (30-40) + paragraph counts (50-60 × 2) don't scale proportionally
❌ Section word count (250-300) doesn't adapt to total target
❌ No example output
❌ Content structure (article vs guide vs review) not clearly differentiated
❌ Table column guidance missing (what columns? how many?)
**Real-World Performance Issues:**
- **Output length wildly inconsistent** (generates 800 words when asked for 1500)
- Introductions sometimes have headings despite instructions
- Lists appear at start of sections
- Table structure unclear (random columns)
- Doesn't adapt content density to word count
---
### 4. Image Prompt Extraction
**Function:** `generate_image_prompts`
**File:** `backend/igny8_core/ai/functions/generate_image_prompts.py`
**Prompt Key:** `'image_prompt_extraction'`
#### Current Prompt Structure
**Approach:** Extract visual descriptions from article
**Key Instructions:**
- Input: article title + content
- Output: JSON {featured_prompt, in_article_prompts[]}
- Extract featured image (main topic)
- Extract up to {max_images} in-article images
- Each prompt detailed for image generation (visual elements, style, mood, composition)
**Strengths:**
✅ Clear structure
✅ Separates featured vs in-article
✅ Emphasizes detail in descriptions
**Issues:**
❌ No guidance on what makes a good image prompt
❌ No style/mood specifications
❌ Doesn't specify where in article to place images
❌ No examples
❌ "Detailed enough" is subjective
**Real-World Performance Issues:**
- Prompts sometimes too generic ("Image of a person using a laptop")
- No context from article content (extracts irrelevant visuals)
- Featured image prompt sometimes identical to in-article prompt
- No guidance on image diversity (all similar)
---
### 5. Image Generation Template
**Prompt Key:** `'image_prompt_template'`
#### Current Template
**Approach:** Template-based prompt assembly
**Format:**
```
Create a high-quality {image_type} image... "{post_title}"... {image_prompt}...
Focus on realistic, well-composed scene... lifestyle/editorial web content...
Avoid text, watermarks, logos... **not blurry.**
```
**Issues:**
❌ {image_type} not always populated
❌ "high-quality" and "not blurry" redundant/unclear
❌ No style guidance (photographic, illustration, 3D, etc.)
❌ No aspect ratio specification
---
## Required Improvements
### A. Clustering Prompt Redesign
#### Goals
- Reduce prompt length by 30-40%
- Add 2-3 concrete examples
- Enforce consistent cluster count (5-15 clusters ideal)
- Standardize cluster naming (title case, descriptive)
- Limit description to 20-30 words
#### Proposed Structure
**Section 1: Role & Task** (50 tokens)
- Clear, concise role definition
- Task: group keywords into intent-driven clusters
**Section 2: Output Format with Example** (100 tokens)
- JSON structure
- Show 1 complete example cluster
- Specify exact field requirements
**Section 3: Clustering Rules** (150 tokens)
- List 5-7 key rules (bullet format)
- Keyword-first approach
- Intent dimensions (brief)
- Quality thresholds (3-10 keywords per cluster)
- No duplicates
**Section 4: Quality Checklist** (50 tokens)
- Checklist of 4-5 validation points
- Model self-validates before output
**Total:** ~350 tokens (vs current ~420, about a 17% reduction, which falls short of the 30-40% goal above)
#### Example Output Format to Include
```json
{
"clusters": [
{
"name": "Organic Bedding Benefits",
"description": "Health, eco-friendly, and comfort aspects of organic cotton bedding materials",
"keywords": ["organic sheets", "eco-friendly bedding", "chemical-free cotton", "hypoallergenic sheets", "sustainable bedding"]
}
]
}
```
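A model response in this format could be checked against the clustering rules before it is saved. The sketch below is illustrative only: `validate_clusters` is a hypothetical helper, the 3-10 keyword range and 30-word description cap come from this document, and case-insensitive duplicate detection is an assumption.

```python
import json
from typing import Any


def validate_clusters(
    raw_json: str,
    min_keywords: int = 3,
    max_keywords: int = 10,
    max_description_words: int = 30,
) -> list[str]:
    """Return human-readable problems; an empty list means the output passed."""
    problems: list[str] = []
    data: dict[str, Any] = json.loads(raw_json)
    seen: dict[str, str] = {}  # normalized keyword -> name of the cluster that owns it
    for cluster in data.get("clusters", []):
        name = cluster.get("name", "")
        keywords = cluster.get("keywords", [])
        # Rule: 3-10 keywords per cluster.
        if not min_keywords <= len(keywords) <= max_keywords:
            problems.append(
                f"{name}: {len(keywords)} keywords (expected {min_keywords}-{max_keywords})"
            )
        # Rule: description capped at 30 words.
        if len(cluster.get("description", "").split()) > max_description_words:
            problems.append(f"{name}: description longer than {max_description_words} words")
        # Rule: each keyword belongs to exactly one cluster.
        for keyword in keywords:
            key = keyword.strip().lower()
            if key in seen:
                problems.append(f"'{keyword}' appears in both {seen[key]} and {name}")
            else:
                seen[key] = name
    return problems
```

A non-empty result could trigger a retry or be logged against the prompt version that produced it.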
---
### B. Idea Generation Prompt Redesign
#### Goals
- Simplify outline structure (less prescriptive)
- Add examples of cluster_hub vs supporting ideas
- Better covered_keywords extraction
- Adaptive word count estimation
- Content structure differentiation
#### Proposed Structure
**Section 1: Role & Objective** (40 tokens)
- SEO content strategist
- Task: generate content ideas from clusters
**Section 2: Output Format with Examples** (150 tokens)
- Show 1 cluster_hub example
- Show 1 supporting idea example
- Highlight key differences
**Section 3: Idea Generation Rules** (100 tokens)
- 1 cluster_hub (comprehensive, authoritative)
- 2-4 supporting ideas (specific angles)
- Word count: 1500-2200 for hubs, 1000-1500 for supporting
- covered_keywords: extract from cluster keywords
**Section 4: Outline Guidance** (100 tokens)
- Simplified: Intro + 5-8 sections + Conclusion
- Section types by content_structure:
  - article: narrative + data
  - guide: step-by-step + tips
  - review: pros/cons + comparison
  - listicle: numbered + categories
  - comparison: side-by-side + verdict
**Total:** ~390 tokens (vs current ~610)
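The idea rules above (one hub, 2-4 supporting ideas, `covered_keywords` drawn from the cluster) lend themselves to an automated check. This is a rough sketch under stated assumptions: `validate_ideas` is a hypothetical helper, and it assumes a hub is marked by `content_structure == "cluster_hub"`, which this document does not confirm.

```python
from collections import defaultdict


def validate_ideas(ideas: list[dict], cluster_keywords: dict[int, set[str]]) -> list[str]:
    """Check hub/supporting counts and covered_keywords for each cluster."""
    problems: list[str] = []
    by_cluster: dict[int, list[dict]] = defaultdict(list)
    for idea in ideas:
        by_cluster[idea["cluster_id"]].append(idea)

    for cluster_id, group in by_cluster.items():
        # Assumption: the hub is identified by its content_structure value.
        hubs = [i for i in group if i.get("content_structure") == "cluster_hub"]
        supporting = len(group) - len(hubs)
        if len(hubs) != 1:
            problems.append(f"cluster {cluster_id}: expected 1 cluster_hub, found {len(hubs)}")
        if not 2 <= supporting <= 4:
            problems.append(
                f"cluster {cluster_id}: expected 2-4 supporting ideas, found {supporting}"
            )
        allowed = cluster_keywords.get(cluster_id, set())
        for idea in group:
            covered = set(idea.get("covered_keywords", []))
            if not covered:
                problems.append(f"'{idea['title']}': covered_keywords is empty")
            elif not covered <= allowed:
                extra = sorted(covered - allowed)
                problems.append(f"'{idea['title']}': keywords not in cluster: {extra}")
    return problems
```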
---
### C. Content Generation Prompt Redesign
**Most Critical Improvement:** Word Count Adherence
#### Goals
- **Primary:** Hit the target word count within ±5%
- Scale structure proportionally to word count
- Differentiate content structures clearly
- Improve HTML quality and consistency
- Better keyword integration
#### Proposed Adaptive Word Count System
**Word Count Targets:**
- 500 words: Short-form (intro 60 words + 5 sections × 80 words + conclusion 60 words ≈ 520 words)
- 1000 words: Standard (intro 120 words + 6 sections × 140 words + conclusion 60 words ≈ 1020 words)
- 1500 words: Long-form (intro 180 words + 7 sections × 180 words + conclusion 60 words = 1500 words)
**Prompt Variable Replacement:**
Before sending to AI, calculate:
- `{TARGET_WORD_COUNT}` - from task.word_count
- `{INTRO_WORDS}` - 60 / 120 / 180 based on target
- `{SECTION_COUNT}` - 5 / 6 / 7 based on target
- `{SECTION_WORDS}` - 80 / 140 / 180 based on target
- `{HOOK_WORDS}` - 25 / 35 / 45 based on target
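It is worth checking that each tier's structure actually sums to its target. The sketch below does that arithmetic under two assumptions: the hook is counted inside the intro budget, and the fixed 60-word conclusion from the proposed prompt applies to every tier. `WordBudget` and `TIERS` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

CONCLUSION_WORDS = 60  # fixed value from the proposed prompt; assumed to apply to every tier


@dataclass(frozen=True)
class WordBudget:
    intro_words: int    # includes the hook (assumption)
    section_count: int
    section_words: int
    hook_words: int

    @property
    def planned_total(self) -> int:
        """Words the structure asks for: intro + all H2 sections + conclusion."""
        return self.intro_words + self.section_count * self.section_words + CONCLUSION_WORDS


# Tier values copied from the variable list above.
TIERS = {
    500: WordBudget(intro_words=60, section_count=5, section_words=80, hook_words=25),
    1000: WordBudget(intro_words=120, section_count=6, section_words=140, hook_words=35),
    1500: WordBudget(intro_words=180, section_count=7, section_words=180, hook_words=45),
}

for target, budget in TIERS.items():
    deviation = (budget.planned_total - target) / target
    print(f"{target}: planned {budget.planned_total} words ({deviation:+.1%})")
```

Under these assumptions the planned totals are 520, 1020, and 1500 words, so every tier lands inside the ±5% tolerance, although the 500-word tier leaves only about 1% of headroom.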
#### Proposed Structure
**Section 1: Role & Objective** (30 tokens)
```
You are an editorial content writer. Generate a {TARGET_WORD_COUNT}-word article...
```
**Section 2: Word Count Requirements** (80 tokens)
```
CRITICAL: The content must be exactly {TARGET_WORD_COUNT} words (±5% tolerance).
Structure breakdown:
- Introduction: {INTRO_WORDS} words total
  - Hook (italic): {HOOK_WORDS} words
  - Paragraphs: 2 × ~{INTRO_WORDS/2} words each
- Main Sections: {SECTION_COUNT} H2 sections
  - Each section: {SECTION_WORDS} words
- Conclusion: 60 words
Word count validation: Count words in final output and adjust if needed.
```
**Section 3: Content Flow & HTML** (120 tokens)
- Detailed structure per section
- HTML tag usage (`<p>`, `<h2>`, `<h3>`, `<ul>`, `<ol>`, `<table>`)
- Formatting rules
**Section 4: Style & Quality** (80 tokens)
- Tone guidance
- Keyword usage
- Avoid generic phrases
- Examples of good vs bad openings
**Section 5: Content Structure Types** (90 tokens)
- article: {structure description}
- guide: {structure description}
- review: {structure description}
- comparison: {structure description}
- listicle: {structure description}
- cluster_hub: {structure description}
**Section 6: Output Format with Example** (100 tokens)
- JSON structure
- Show abbreviated example with proper HTML
**Total:** ~500 tokens (vs current ~550, but much more precise)
---
### D. Image Prompt Improvements
#### Goals
- Generate visually diverse prompts
- Better context from article content
- Specify image placement guidelines
- Improve prompt detail and clarity
#### Proposed Extraction Prompt Structure
**Section 1: Task & Context** (50 tokens)
```
Extract image prompts from this article for visual content placement.
Article: {title}
Content: {content}
Required: 1 featured + {max_images} in-article images
```
**Section 2: Image Types & Guidelines** (100 tokens)
```
Featured Image:
- Hero visual representing article's main theme
- Broad, engaging, high-quality
- Should work at large sizes (1200×630+)
In-Article Images (place strategically):
1. After introduction
2. Mid-article (before major H2 sections)
3. Supporting specific concepts or examples
4. Before conclusion
Each prompt must describe:
- Subject & composition
- Visual style (photographic, minimal, editorial)
- Mood & lighting
- Color palette suggestions
- Avoid: text, logos, faces (unless relevant)
```
**Section 3: Prompt Quality Rules** (80 tokens)
- Be specific and descriptive (not generic)
- Include scene details, angles, perspective
- Specify lighting, time of day if relevant
- Mention style references
- Ensure diversity across all images
- No duplicate concepts
**Section 4: Output Format** (50 tokens)
- JSON structure
- Show example with good vs bad prompts
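The "no duplicate concepts" rule could be approximated with a cheap lexical check on the extracted prompts. The sketch below uses Jaccard similarity over word sets, which is a swapped-in heuristic rather than anything the current code does; `find_near_duplicates` and the 0.7 threshold are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set for a prompt."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of words the two prompts have in common (0.0 to 1.0)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find_near_duplicates(
    featured_prompt: str,
    in_article_prompts: list[str],
    threshold: float = 0.7,  # assumed cut-off; tune on real outputs
) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for prompt pairs that look alike; index 0 is the featured prompt."""
    prompts = [featured_prompt, *in_article_prompts]
    pairs = []
    for i in range(len(prompts)):
        for j in range(i + 1, len(prompts)):
            score = jaccard(prompts[i], prompts[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs
```

A lexical measure only catches reworded duplicates; two prompts describing the same scene in different vocabulary would still need manual review.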
#### Proposed Template Prompt Improvement
Replace current template with:
```
A {style} photograph for "{post_title}". {image_prompt}.
Composition: {composition_hint}. Lighting: {lighting_hint}.
Mood: {mood}. Style: clean, modern, editorial web content.
No text, watermarks, or logos.
```
Where:
- {style} - photographic, minimalist, lifestyle, etc.
- {composition_hint} - center-framed, rule-of-thirds, wide-angle, etc.
- {lighting_hint} - natural daylight, soft indoor, dramatic, etc.
- {mood} - professional, warm, energetic, calm, etc.
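Because unpopulated variables were a problem with the old template, the new one could be rendered with fallbacks for every hint. A minimal sketch, assuming plain `str.format` substitution: `render_image_prompt` and the `DEFAULTS` values are hypothetical, picked from the example values listed above.

```python
TEMPLATE = (
    'A {style} photograph for "{post_title}". {image_prompt}.\n'
    "Composition: {composition_hint}. Lighting: {lighting_hint}.\n"
    "Mood: {mood}. Style: clean, modern, editorial web content.\n"
    "No text, watermarks, or logos."
)

# Hypothetical fallbacks so no placeholder is ever left unfilled.
DEFAULTS = {
    "style": "lifestyle",
    "composition_hint": "rule-of-thirds",
    "lighting_hint": "natural daylight",
    "mood": "professional",
}


def render_image_prompt(post_title: str, image_prompt: str, **hints: str) -> str:
    """Fill the template, falling back to DEFAULTS for missing or empty hints."""
    values = {**DEFAULTS, **{k: v for k, v in hints.items() if v}}
    values["post_title"] = post_title
    # Drop a trailing period so the template's own "." does not double up.
    values["image_prompt"] = image_prompt.strip().rstrip(".")
    return TEMPLATE.format(**values)
```

For example, `render_image_prompt("Organic Bedding Guide", "Folded cotton sheets on a wooden bench.", mood="calm")` keeps the default style, composition, and lighting while overriding only the mood.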
---
## Implementation Plan
### Phase 1: Clustering Prompt (Week 1)
**Tasks:**
1. ✅ Draft new clustering prompt with examples
2. ✅ Test with sample keyword sets (20, 50, 100 keywords)
3. ✅ Compare outputs: old vs new
4. ✅ Validate cluster quality (manual review)
5. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['clustering']`
6. ✅ Deploy and monitor
**Success Criteria:**
- Consistent cluster count (5-15)
- No single-keyword clusters
- Clear, descriptive names
- Concise descriptions (20-30 words)
- 95%+ of keywords clustered
---
### Phase 2: Idea Generation Prompt (Week 1-2)
**Tasks:**
1. ✅ Draft new ideas prompt with examples
2. ✅ Test with 5-10 clusters
3. ✅ Validate cluster_hub vs supporting idea distinction
4. ✅ Check covered_keywords accuracy
5. ✅ Verify content_structure alignment
6. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['ideas']`
7. ✅ Deploy and monitor
**Success Criteria:**
- Clear distinction between hub and supporting ideas
- Accurate covered_keywords extraction
- Appropriate word count estimates
- Outlines match content_structure type
- No duplicate ideas within cluster
---
### Phase 3: Content Generation Prompt (Week 2)
**Tasks:**
1. ✅ Draft new content prompt with word count logic
2. ✅ Implement dynamic variable replacement in `build_prompt()`
3. ✅ Test with 500, 1000, 1500 word targets
4. ✅ Validate actual word counts (automated counting)
5. ✅ Test all content_structure types
6. ✅ Verify HTML quality and consistency
7. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['content_generation']`
8. ✅ Deploy and monitor
**Code Change Required:**
**File:** `backend/igny8_core/ai/functions/generate_content.py`
**Method:** `build_prompt()`
**Add word count calculation:**
```python
from typing import Any

# Method on the content generation function class (shown without the enclosing class)
def build_prompt(self, data: Any, account=None) -> str:
    # Accept a single task or a list of tasks (use the first)
    task = data if not isinstance(data, list) else data[0]
    # Calculate adaptive word count parameters
    target_words = task.word_count or 1000
    if target_words <= 600:
        intro_words = 60
        section_count = 5
        section_words = 80
        hook_words = 25
    elif target_words <= 1200:
        intro_words = 120
        section_count = 6
        section_words = 140
        hook_words = 35
    else:
        intro_words = 180
        section_count = 7
        section_words = 180
        hook_words = 45
    # Get prompt and replace variables
    prompt = PromptRegistry.get_prompt(
        function_name='generate_content',
        account=account,
        task=task,
        context={
            'TARGET_WORD_COUNT': target_words,
            'INTRO_WORDS': intro_words,
            'SECTION_COUNT': section_count,
            'SECTION_WORDS': section_words,
            'HOOK_WORDS': hook_words,
            # ... existing context
        }
    )
    return prompt
```
**Success Criteria:**
- 95%+ of generated content within ±5% of target word count
- HTML structure consistent
- Content structure types clearly differentiated
- Keyword integration natural
- No sections starting with lists
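The "no sections starting with lists" criterion can be checked mechanically. The sketch below uses the standard library's `html.parser` and reports every H2 section whose first block element is a list or table; `SectionStartChecker`, `sections_starting_with_list`, and the choice of block tags are assumptions made for illustration.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "ul", "ol", "table", "blockquote", "h3"}  # assumed set of block elements
LIST_LIKE = {"ul", "ol", "table"}


class SectionStartChecker(HTMLParser):
    """Record each H2 section whose first block element is a list or table."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[str] = []
        self._heading_text: list[str] = []
        self._in_h2 = False
        self._awaiting_first_block = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._heading_text = []
            self._awaiting_first_block = False
        elif self._awaiting_first_block and tag in BLOCK_TAGS:
            # First block after the heading decides whether the section is compliant.
            if tag in LIST_LIKE:
                self.violations.append("".join(self._heading_text).strip())
            self._awaiting_first_block = False

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
            self._awaiting_first_block = True

    def handle_data(self, data):
        if self._in_h2:
            self._heading_text.append(data)


def sections_starting_with_list(html: str) -> list[str]:
    """Return the titles of H2 sections that open with a list or table."""
    checker = SectionStartChecker()
    checker.feed(html)
    checker.close()
    return checker.violations
```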
---
### Phase 4: Image Prompt Improvements (Week 2-3)
**Tasks:**
1. ✅ Draft new extraction prompt with placement guidelines
2. ✅ Draft new template prompt with style variables
3. ✅ Test with 10 sample articles
4. ✅ Validate image diversity and relevance
5. ✅ Update both prompts in registry
6. ✅ Update `GenerateImagePromptsFunction` to use new template
7. ✅ Deploy and monitor
**Success Criteria:**
- No duplicate image concepts in same article
- Prompts are specific and detailed
- Featured image distinct from in-article images
- Image placement logically distributed
- Generated images relevant to content
---
## Prompt Versioning & Testing
### Version Control
**Recommendation:** Store prompt versions in database for A/B testing
**Schema:**
```python
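from django.db import models

class AIPromptVersion(models.Model):
    # PROMPT_TYPE_CHOICES is assumed to be defined alongside the existing AIPrompt model
    prompt_type = models.CharField(max_length=50, choices=PROMPT_TYPE_CHOICES)  # max_length is an assumed value
    version = models.IntegerField()
    prompt_value = models.TextField()
    is_active = models.BooleanField(default=False)
    created_at = models.DateTimeField(auto_now_add=True)
    performance_metrics = models.JSONField(default=dict)  # Track success rates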
```
**Process:**
1. Test new prompt version alongside current
2. Compare outputs on same inputs
3. Measure quality metrics (manual + automated)
4. Gradually roll out if better
5. Keep old version as fallback
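One way to roll out gradually is to assign accounts to the candidate version deterministically, so an account always sees the same prompt during a test. This is only a sketch of that idea: `use_candidate_version`, hashing on account ID, and the percentage parameter are all assumptions, not part of the existing system.

```python
import hashlib


def use_candidate_version(account_id: int, prompt_type: str, rollout_percent: int) -> bool:
    """Deterministically decide whether an account gets the candidate prompt version."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Hash prompt type and account together so each prompt type gets an independent split.
    digest = hashlib.sha256(f"{prompt_type}:{account_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in 0..99
    return bucket < rollout_percent
```

Setting `rollout_percent` to 0 acts as an instant rollback to the old version, and raising it step by step widens exposure without reshuffling accounts that were already in the test group.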
---
### Automated Quality Metrics
**Implement automated checks:**
| Metric | Check | Threshold |
|--------|-------|-----------|
| Word Count Accuracy | `abs(actual - target) / target` | < 0.05 (±5%) |
| HTML Validity | Parse with BeautifulSoup | 100% valid |
| Keyword Presence | Count keyword mentions | ≥ 3 for primary |
| Structure Compliance | Check H2/H3 hierarchy | Valid structure |
| Cluster Count | Number of clusters | 5-15 |
| Cluster Size | Keywords per cluster | 3-10 |
| No Duplicates | Keyword appears once | 100% unique |
**Log results:**
- Track per prompt version
- Identify patterns in failures
- Use for prompt iteration
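The word count accuracy metric from the table above could be computed directly from the generated HTML. The sketch below uses only the standard library (not BeautifulSoup) and hypothetical helper names; counting whitespace-separated tokens as words and treating exactly 5% as passing are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring tags and attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(html: str) -> int:
    """Count whitespace-separated words in the visible text of an HTML fragment."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(" ".join(extractor.chunks).split())


def word_count_within_tolerance(html: str, target: int, tolerance: float = 0.05) -> bool:
    """Apply abs(actual - target) / target against the tolerance (boundary inclusive)."""
    actual = count_words(html)
    return abs(actual - target) / target <= tolerance
```

Joining text chunks with a space keeps words in adjacent tags (for example a heading followed by a paragraph) from being merged into one token.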
---
## Model Selection & Optimization
### Current Models
**Location:** `backend/igny8_core/ai/settings.py`
**Default Models per Function:**
- Clustering: GPT-4 (expensive but accurate)
- Ideas: GPT-4 (creative)
- Content: GPT-4 (quality)
- Image Prompts: GPT-3.5-turbo (simpler task)
- Images: DALL-E 3 / Runware
### Optimization Opportunities
**Cost vs Quality Tradeoffs:**
| Function | Current | Alternative | Cost Savings | Quality Impact |
|----------|---------|-------------|--------------|----------------|
| Clustering | GPT-4 | GPT-4-turbo | 50% | Minimal |
| Ideas | GPT-4 | GPT-4-turbo | 50% | Minimal |
| Content | GPT-4 | GPT-4-turbo | 50% | Test required |
| Image Prompts | GPT-3.5 | Keep | - | - |
**Recommendation:** Test GPT-4-turbo for all text generation tasks
- Faster response time
- 50% cost reduction
- Similar quality for structured outputs
---
## Success Metrics
- ✅ Word count accuracy: 95%+ within ±5%
- ✅ Clustering quality: No single-keyword clusters
- ✅ Idea generation: Clear hub vs supporting distinction
- ✅ HTML validity: 100%
- ✅ Keyword integration: Natural, not stuffed
- ✅ Image prompt diversity: No duplicates
- ✅ User satisfaction: Fewer manual edits needed
- ✅ Processing time: <10s for 1000-word article
- ✅ Credit cost: 30% reduction with model optimization
---
## Related Files Reference
### Backend
- `backend/igny8_core/ai/prompts.py` - Prompt registry and defaults
- `backend/igny8_core/ai/functions/auto_cluster.py` - Clustering function
- `backend/igny8_core/ai/functions/generate_ideas.py` - Ideas function
- `backend/igny8_core/ai/functions/generate_content.py` - Content function
- `backend/igny8_core/ai/functions/generate_image_prompts.py` - Image prompts
- `backend/igny8_core/ai/settings.py` - Model configuration
- `backend/igny8_core/modules/system/models.py` - AIPrompt model
### Testing
- Create test suite: `backend/igny8_core/ai/tests/test_prompts.py`
- Test fixtures with sample inputs
- Automated quality validation
- Performance benchmarks
---
## Notes
- All prompt changes should be tested on real data first
- Keep old prompts in version history for rollback
- Monitor user feedback on content quality
- Consider user-customizable prompt templates (advanced feature)
- Document prompt engineering best practices for team
- The SAG clustering prompt (mentioned in the original doc) will be handled separately as a specialized architecture