pre-launch-final mods-docs

2025-12-11 07:20:21 +00:00
parent 20fdd3b295
commit a736bc3d34
8 changed files with 5464 additions and 0 deletions
--- a/docs/PRE-LAUNCH/ITEM-3-PROMPT-OPTIMIZATION.md
+++ b/docs/PRE-LAUNCH/ITEM-3-PROMPT-OPTIMIZATION.md
@@ -0,0 +1,713 @@
+# Item 3: Prompt Improvement and Model Optimization
+
+**Priority:** High  
+**Target:** Production Launch  
+**Last Updated:** December 11, 2025
+
+---
+
+## Overview
+
+Redesign and optimize all AI prompts for clustering, idea generation, content generation, and image prompt extraction to achieve:
+- High accuracy and consistent outputs
+- Faster processing with optimized token usage
+- Correct word count adherence (500, 1000, 1500 words)
+- Improved clustering quality and idea relevance
+- Better image prompt clarity and relevance
+
+---
+
+## Current Prompt System Architecture
+
+### Prompt Registry
+
+**Location:** `backend/igny8_core/ai/prompts.py`
+
+**Class:** `PromptRegistry`
+
+**Hierarchy** (resolution order):
+1. Task-level `prompt_override` (if exists on specific task)
+2. Database prompt from `AIPrompt` model (account-specific)
+3. Default fallback from `PromptRegistry.DEFAULT_PROMPTS`
+
+**Storage:**
+- Default prompts: Hardcoded in `prompts.py`
+- Account overrides: `system_aiprompt` database table
+- Task overrides: `prompt_override` field on task object
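+
+The resolution order above can be sketched as a simple first-match lookup. This is a minimal illustration, not the actual `PromptRegistry` code: `resolve_prompt`, its parameters, and the local `DEFAULT_PROMPTS` dict are hypothetical stand-ins.
+
```python
from typing import Mapping, Optional

# Hypothetical stand-in for PromptRegistry.DEFAULT_PROMPTS
DEFAULT_PROMPTS = {"clustering": "default clustering prompt"}


def resolve_prompt(
    key: str,
    task_override: Optional[str] = None,
    account_prompts: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first prompt found: task override, then account, then default."""
    if task_override:  # 1. task-level prompt_override
        return task_override
    if account_prompts and key in account_prompts:  # 2. account-specific prompt
        return account_prompts[key]
    return DEFAULT_PROMPTS[key]  # 3. hardcoded fallback


print(resolve_prompt("clustering"))
print(resolve_prompt("clustering", account_prompts={"clustering": "account prompt"}))
print(resolve_prompt("clustering", "task prompt", {"clustering": "account prompt"}))
```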
+
+---
+
+## Current Prompts Analysis
+
+### 1. Clustering Prompt
+
+**Function:** `auto_cluster`  
+**File:** `backend/igny8_core/ai/functions/auto_cluster.py`  
+**Prompt Key:** `'clustering'`
+
+#### Current Prompt Structure
+
+**Approach:** Semantic strategist + intent-driven clustering
+
+**Key Instructions:**
+- Return single JSON with "clusters" array
+- Each cluster: name, description, keywords[]
+- Multi-dimensional grouping (intent, use-case, function, persona, context)
+- Model real search behavior and user journeys
+- Avoid superficial groupings and duplicates
+- 3-10 keywords per cluster
+
+**Strengths:**
+✅ Clear JSON output format  
+✅ Detailed grouping logic with dimensions  
+✅ Emphasis on semantic strength over keyword matching  
+✅ User journey modeling (Problem → Solution, General → Specific)  
+
+**Issues:**
+❌ Very long prompt (~420 tokens) - may confuse the model  
+❌ No examples provided - model must guess formatting  
+❌ Doesn't specify what to do with outliers explicitly  
+❌ No guidance on cluster count (outputs variable)  
+❌ Description length not constrained  
+
+**Real-World Performance Issues:**
+- Sometimes creates too many small clusters (1-2 keywords each)
+- Inconsistent cluster naming convention
+- Descriptions sometimes generic ("Keywords related to...")
+
+---
+
+### 2. Idea Generation Prompt
+
+**Function:** `generate_ideas`  
+**File:** `backend/igny8_core/ai/functions/generate_ideas.py`  
+**Prompt Key:** `'ideas'`
+
+#### Current Prompt Structure
+
+**Approach:** SEO-optimized content ideas + outlines
+
+**Key Instructions:**
+- Input: Clusters + Keywords
+- Output: JSON "ideas" array
+- 1 cluster_hub + 2-4 supporting ideas per cluster
+- Fields: title, description, content_type, content_structure, cluster_id, estimated_word_count, covered_keywords
+- Outline format: intro (hook + 2 paragraphs), 5-8 H2 sections with 2-3 H3s each
+- Content mixing: paragraphs, lists, tables, blockquotes
+- No bullets/lists at start
+- Professional tone, no generic phrasing
+
+**Strengths:**
+✅ Detailed outline structure  
+✅ Content mixing guidance (lists, tables, blockquotes)  
+✅ Clear JSON format  
+✅ Tone guidelines  
+
+**Issues:**
+❌ Very complex prompt (600+ tokens)  
+❌ Outline format too prescriptive (might limit creativity)  
+❌ No examples provided  
+❌ Estimated word count often inaccurate (too high or too low)  
+❌ "hook" guidance unclear (what makes a good hook?)  
+❌ Content structure validation not enforced  
+
+**Real-World Performance Issues:**
+- Generated ideas sometimes too similar within cluster
+- Outlines don't always respect structure types (e.g., "review" vs "guide")
+- covered_keywords field sometimes empty or incorrect
+- cluster_hub vs supporting ideas distinction unclear
+
+---
+
+### 3. Content Generation Prompt
+
+**Function:** `generate_content`  
+**File:** `backend/igny8_core/ai/functions/generate_content.py`  
+**Prompt Key:** `'content_generation'`
+
+#### Current Prompt Structure
+
+**Approach:** Editorial content strategist
+
+**Key Instructions:**
+- Output: JSON {title, content (HTML)}
+- Introduction: 1 italic hook (30-40 words) + 2 paragraphs (50-60 words each), no headings
+- H2 sections: 5-8 total, 250-300 words each
+- Section format: 2 narrative paragraphs → list/table → optional closing paragraph → 2-3 subsections
+- Vary list/table types
+- Never start section with list/table
+- Tone: professional, no passive voice, no generic intros
+- Keyword usage: natural in title, intro, headings
+
+**Strengths:**
+✅ Detailed structure guidance  
+✅ Strong tone/style rules  
+✅ HTML output format  
+✅ Keyword integration guidance  
+
+**Issues:**
+❌ **Word count not mentioned in prompt** - critical flaw  
+❌ No guidance on 500 vs 1000 vs 1500 word versions  
+❌ Hook word count (30-40) + paragraph word counts (50-60 × 2) don't scale proportionally  
+❌ Section word count (250-300) doesn't adapt to total target  
+❌ No example output  
+❌ Content structure (article vs guide vs review) not clearly differentiated  
+❌ Table column guidance missing (what columns? how many?)  
+
+**Real-World Performance Issues:**
+- **Output length wildly inconsistent** (generates 800 words when asked for 1500)
+- Introductions sometimes have headings despite instructions
+- Lists appear at start of sections
+- Table structure unclear (random columns)
+- Doesn't adapt content density to word count
+
+---
+
+### 4. Image Prompt Extraction
+
+**Function:** `generate_image_prompts`  
+**File:** `backend/igny8_core/ai/functions/generate_image_prompts.py`  
+**Prompt Key:** `'image_prompt_extraction'`
+
+#### Current Prompt Structure
+
+**Approach:** Extract visual descriptions from article
+
+**Key Instructions:**
+- Input: article title + content
+- Output: JSON {featured_prompt, in_article_prompts[]}
+- Extract featured image (main topic)
+- Extract up to {max_images} in-article images
+- Each prompt detailed for image generation (visual elements, style, mood, composition)
+
+**Strengths:**
+✅ Clear structure  
+✅ Separates featured vs in-article  
+✅ Emphasizes detail in descriptions  
+
+**Issues:**
+❌ No guidance on what makes a good image prompt  
+❌ No style/mood specifications  
+❌ Doesn't specify where in article to place images  
+❌ No examples  
+❌ "Detailed enough" is subjective  
+
+**Real-World Performance Issues:**
+- Prompts sometimes too generic ("Image of a person using a laptop")
+- No context from article content (extracts irrelevant visuals)
+- Featured image prompt sometimes identical to in-article prompt
+- No guidance on image diversity (all similar)
+
+---
+
+### 5. Image Generation Template
+
+**Prompt Key:** `'image_prompt_template'`
+
+#### Current Template
+
+**Approach:** Template-based prompt assembly
+
+**Format:**
+```
+Create a high-quality {image_type} image... "{post_title}"... {image_prompt}...
+Focus on realistic, well-composed scene... lifestyle/editorial web content...
+Avoid text, watermarks, logos... **not blurry.**
+```
+
+**Issues:**
+❌ {image_type} not always populated  
+❌ "high-quality" and "not blurry" redundant/unclear  
+❌ No style guidance (photographic, illustration, 3D, etc.)  
+❌ No aspect ratio specification  
+
+---
+
+## Required Improvements
+
+### A. Clustering Prompt Redesign
+
+#### Goals
+- Reduce prompt length by 30-40%
+- Add 2-3 concrete examples
+- Enforce consistent cluster count (5-15 clusters ideal)
+- Standardize cluster naming (title case, descriptive)
+- Limit description to 20-30 words
+
+#### Proposed Structure
+
+**Section 1: Role & Task** (50 tokens)
+- Clear, concise role definition
+- Task: group keywords into intent-driven clusters
+
+**Section 2: Output Format with Example** (100 tokens)
+- JSON structure
+- Show 1 complete example cluster
+- Specify exact field requirements
+
+**Section 3: Clustering Rules** (150 tokens)
+- List 5-7 key rules (bullet format)
+- Keyword-first approach
+- Intent dimensions (brief)
+- Quality thresholds (3-10 keywords per cluster)
+- No duplicates
+
+**Section 4: Quality Checklist** (50 tokens)
+- Checklist of 4-5 validation points
+- Model self-validates before output
+
+**Total:** ~350 tokens (vs current ~420, a reduction of about 17%; reaching the 30-40% goal would need further trimming)
+
+#### Example Output Format to Include
+
+```json
+{
+  "clusters": [
+    {
+      "name": "Organic Bedding Benefits",
+      "description": "Health, eco-friendly, and comfort aspects of organic cotton bedding materials",
+      "keywords": ["organic sheets", "eco-friendly bedding", "chemical-free cotton", "hypoallergenic sheets", "sustainable bedding"]
+    }
+  ]
+}
+```
+
+---
+
+### B. Idea Generation Prompt Redesign
+
+#### Goals
+- Simplify outline structure (less prescriptive)
+- Add examples of cluster_hub vs supporting ideas
+- Better covered_keywords extraction
+- Adaptive word count estimation
+- Content structure differentiation
+
+#### Proposed Structure
+
+**Section 1: Role & Objective** (40 tokens)
+- SEO content strategist
+- Task: generate content ideas from clusters
+
+**Section 2: Output Format with Examples** (150 tokens)
+- Show 1 cluster_hub example
+- Show 1 supporting idea example
+- Highlight key differences
+
+**Section 3: Idea Generation Rules** (100 tokens)
+- 1 cluster_hub (comprehensive, authoritative)
+- 2-4 supporting ideas (specific angles)
+- Word count: 1500-2200 for hubs, 1000-1500 for supporting
+- covered_keywords: extract from cluster keywords
+
+**Section 4: Outline Guidance** (100 tokens)
+- Simplified: Intro + 5-8 sections + Conclusion
+- Section types by content_structure:
+  - article: narrative + data
+  - guide: step-by-step + tips
+  - review: pros/cons + comparison
+  - listicle: numbered + categories
+  - comparison: side-by-side + verdict
+
+**Total:** ~390 tokens (vs current ~610)
+
+---
+
+### C. Content Generation Prompt Redesign
+
+**Most Critical Improvement:** Word Count Adherence
+
+#### Goals
+- **Primary:** Hit the target word count within ±5%
+- Scale structure proportionally to word count
+- Differentiate content structures clearly
+- Improve HTML quality and consistency
+- Better keyword integration
+
+#### Proposed Adaptive Word Count System
+
+**Word Count Targets:**
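+- 500 words: Short-form (5 sections × 80 words + 60-word intro + 60-word conclusion ≈ 520 words)
+- 1000 words: Standard (6 sections × 140 words + 120-word intro + 60-word conclusion ≈ 1020 words)
+- 1500 words: Long-form (7 sections × 180 words + 180-word intro + 60-word conclusion = 1500 words)
+
+Totals assume the fixed 60-word conclusion used in the proposed prompt text; all three land within the ±5% tolerance.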
+
+**Prompt Variable Replacement:**
+
+Before sending to AI, calculate:
+- `{TARGET_WORD_COUNT}` - from task.word_count
+- `{INTRO_WORDS}` - 60 / 120 / 180 based on target
+- `{SECTION_COUNT}` - 5 / 6 / 7 based on target
+- `{SECTION_WORDS}` - 80 / 140 / 180 based on target
+- `{HOOK_WORDS}` - 25 / 35 / 45 based on target
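+
+As a sanity check on these numbers, the planned totals can be added up per tier. The sketch below is illustrative only: `TIERS`, `planned_total`, and `within_tolerance` are hypothetical names, and the fixed 60-word conclusion is an assumption taken from the proposed prompt text.
+
```python
# Tier values copied from the variable list above.
TIERS = {
    500: {"intro": 60, "sections": 5, "section_words": 80, "hook": 25},
    1000: {"intro": 120, "sections": 6, "section_words": 140, "hook": 35},
    1500: {"intro": 180, "sections": 7, "section_words": 180, "hook": 45},
}
CONCLUSION_WORDS = 60  # assumption: fixed conclusion length for every tier


def planned_total(target: int) -> int:
    """Sum intro, all main sections, and the conclusion for one tier."""
    tier = TIERS[target]
    return tier["intro"] + tier["sections"] * tier["section_words"] + CONCLUSION_WORDS


def within_tolerance(actual: int, target: int, tolerance: float = 0.05) -> bool:
    """True when the relative deviation from the target is at most the tolerance."""
    return abs(actual - target) / target <= tolerance


for target in TIERS:
    total = planned_total(target)
    print(target, total, within_tolerance(total, target))
# 500 -> 520, 1000 -> 1020, 1500 -> 1500; all within ±5%
```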
+
+#### Proposed Structure
+
+**Section 1: Role & Objective** (30 tokens)
+```
+You are an editorial content writer. Generate a {TARGET_WORD_COUNT}-word article...
+```
+
+**Section 2: Word Count Requirements** (80 tokens)
+```
+CRITICAL: The content must be {TARGET_WORD_COUNT} words (±5% tolerance).
+
+Structure breakdown:
+- Introduction: {INTRO_WORDS} words total
+  - Hook (italic): {HOOK_WORDS} words
+  - Paragraphs: 2, splitting the words that remain after the hook
+- Main Sections: {SECTION_COUNT} H2 sections
+  - Each section: {SECTION_WORDS} words
+- Conclusion: 60 words
+
+Word count validation: Count words in final output and adjust if needed.
+```
+
+**Section 3: Content Flow & HTML** (120 tokens)
+- Detailed structure per section
+- HTML tag usage (<p>, <h2>, <h3>, <ul>, <ol>, <table>)
+- Formatting rules
+
+**Section 4: Style & Quality** (80 tokens)
+- Tone guidance
+- Keyword usage
+- Avoid generic phrases
+- Examples of good vs bad openings
+
+**Section 5: Content Structure Types** (90 tokens)
+- article: {structure description}
+- guide: {structure description}
+- review: {structure description}
+- comparison: {structure description}
+- listicle: {structure description}
+- cluster_hub: {structure description}
+
+**Section 6: Output Format with Example** (100 tokens)
+- JSON structure
+- Show abbreviated example with proper HTML
+
+**Total:** ~500 tokens (vs current ~550, but much more precise)
+
+---
+
+### D. Image Prompt Improvements
+
+#### Goals
+- Generate visually diverse prompts
+- Better context from article content
+- Specify image placement guidelines
+- Improve prompt detail and clarity
+
+#### Proposed Extraction Prompt Structure
+
+**Section 1: Task & Context** (50 tokens)
+```
+Extract image prompts from this article for visual content placement.
+
+Article: {title}
+Content: {content}
+Required: 1 featured + {max_images} in-article images
+```
+
+**Section 2: Image Types & Guidelines** (100 tokens)
+```
+Featured Image:
+- Hero visual representing article's main theme
+- Broad, engaging, high-quality
+- Should work at large sizes (1200×630+)
+
+In-Article Images (place strategically):
+1. After introduction
+2. Mid-article (before major H2 sections)
+3. Supporting specific concepts or examples
+4. Before conclusion
+
+Each prompt must describe:
+- Subject & composition
+- Visual style (photographic, minimal, editorial)
+- Mood & lighting
+- Color palette suggestions
+
+Avoid: text, logos, faces (unless relevant)
+```
+
+**Section 3: Prompt Quality Rules** (80 tokens)
+- Be specific and descriptive (not generic)
+- Include scene details, angles, perspective
+- Specify lighting, time of day if relevant
+- Mention style references
+- Ensure diversity across all images
+- No duplicate concepts
+
+**Section 4: Output Format** (50 tokens)
+- JSON structure
+- Show example with good vs bad prompts
+
+#### Proposed Template Prompt Improvement
+
+Replace current template with:
+
+```
+A {style} image for "{post_title}". {image_prompt}. 
+Composition: {composition_hint}. Lighting: {lighting_hint}. 
+Mood: {mood}. Style: clean, modern, editorial web content. 
+No text, watermarks, or logos.
+```
+
+Where:
+- {style} - photographic, minimalist, lifestyle, etc.
+- {composition_hint} - center-framed, rule-of-thirds, wide-angle, etc.
+- {lighting_hint} - natural daylight, soft indoor, dramatic, etc.
+- {mood} - professional, warm, energetic, calm, etc.
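+
+One way to avoid unpopulated placeholders (the problem noted for `{image_type}` in the current template) is to merge caller values over a set of defaults before formatting. This is a rough sketch: `DEFAULT_HINTS`, `fill_image_template`, and the shortened demo template are hypothetical, and the default values are placeholders.
+
```python
# Hypothetical fallback values; real defaults would be a product decision.
DEFAULT_HINTS = {
    "style": "photographic",
    "composition_hint": "rule-of-thirds",
    "lighting_hint": "natural daylight",
    "mood": "professional",
}


def fill_image_template(template: str, **values: str) -> str:
    """Fill a template, falling back to defaults for missing or empty hints."""
    merged = dict(DEFAULT_HINTS)
    # Ignore empty strings so a blank field never reaches the image model
    merged.update({k: v for k, v in values.items() if v})
    return template.format(**merged)


demo_template = (
    "{image_prompt}. Composition: {composition_hint}. "
    "Lighting: {lighting_hint}. Mood: {mood}."
)
result = fill_image_template(
    demo_template,
    image_prompt="Folded organic cotton sheets on a wooden shelf",
    mood="calm",
    lighting_hint="",  # empty value falls back to the default
)
print(result)
```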
+
+---
+
+## Implementation Plan
+
+### Phase 1: Clustering Prompt (Week 1)
+
+**Tasks:**
+1. ✅ Draft new clustering prompt with examples
+2. ✅ Test with sample keyword sets (20, 50, 100 keywords)
+3. ✅ Compare outputs: old vs new
+4. ✅ Validate cluster quality (manual review)
+5. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['clustering']`
+6. ✅ Deploy and monitor
+
+**Success Criteria:**
+- Consistent cluster count (5-15)
+- No single-keyword clusters
+- Clear, descriptive names
+- Concise descriptions (20-30 words)
+- 95%+ of keywords clustered
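+
+Most of these criteria can be checked mechanically on the model's JSON output. The following is a minimal sketch under stated assumptions: `validate_clusters` is a hypothetical helper, only the upper bound on description length is enforced, and naming quality still needs manual review.
+
```python
from collections import Counter


def validate_clusters(payload: dict, total_keywords: int) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    clusters = payload.get("clusters", [])
    if not 5 <= len(clusters) <= 15:
        problems.append(f"cluster count {len(clusters)} outside 5-15")
    seen = Counter()
    for cluster in clusters:
        name = cluster.get("name", "")
        keywords = cluster.get("keywords", [])
        if not 3 <= len(keywords) <= 10:
            problems.append(f"{name!r}: {len(keywords)} keywords (expected 3-10)")
        if len(cluster.get("description", "").split()) > 30:
            problems.append(f"{name!r}: description longer than 30 words")
        seen.update(k.lower() for k in keywords)
    duplicates = sorted(k for k, n in seen.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate keywords: {duplicates}")
    # Coverage: share of input keywords that landed in some cluster
    if total_keywords and len(seen) / total_keywords < 0.95:
        problems.append("fewer than 95% of keywords clustered")
    return problems


sample = {
    "clusters": [
        {
            "name": "Organic Bedding Benefits",
            "description": "Health and comfort aspects of organic bedding",
            "keywords": ["organic sheets", "eco-friendly bedding"],
        }
    ]
}
issues = validate_clusters(sample, total_keywords=2)
print(issues)
```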
+
+---
+
+### Phase 2: Idea Generation Prompt (Week 1-2)
+
+**Tasks:**
+1. ✅ Draft new ideas prompt with examples
+2. ✅ Test with 5-10 clusters
+3. ✅ Validate cluster_hub vs supporting idea distinction
+4. ✅ Check covered_keywords accuracy
+5. ✅ Verify content_structure alignment
+6. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['ideas']`
+7. ✅ Deploy and monitor
+
+**Success Criteria:**
+- Clear distinction between hub and supporting ideas
+- Accurate covered_keywords extraction
+- Appropriate word count estimates
+- Outlines match content_structure type
+- No duplicate ideas within cluster
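+
+A similar check could cover the idea output. The sketch below assumes the hub is marked with `content_structure == "cluster_hub"`, which this document does not confirm; `validate_ideas` and the sample data are hypothetical.
+
```python
def validate_ideas(ideas: list, cluster_keywords: dict) -> list:
    """Check hub/supporting counts, duplicate titles, and covered_keywords."""
    problems = []
    by_cluster = {}
    for idea in ideas:
        by_cluster.setdefault(idea["cluster_id"], []).append(idea)
    for cluster_id, group in by_cluster.items():
        # Assumption: hubs are identified by content_structure
        hubs = [i for i in group if i.get("content_structure") == "cluster_hub"]
        supporting = len(group) - len(hubs)
        if len(hubs) != 1:
            problems.append(f"cluster {cluster_id}: expected 1 hub, found {len(hubs)}")
        if not 2 <= supporting <= 4:
            problems.append(
                f"cluster {cluster_id}: {supporting} supporting ideas (expected 2-4)"
            )
        titles = [i["title"].strip().lower() for i in group]
        if len(set(titles)) != len(titles):
            problems.append(f"cluster {cluster_id}: duplicate titles")
        allowed = {k.lower() for k in cluster_keywords.get(cluster_id, [])}
        for idea in group:
            covered = {k.lower() for k in idea.get("covered_keywords", [])}
            if not covered:
                problems.append(f"{idea['title']!r}: covered_keywords is empty")
            elif not covered <= allowed:
                extra = sorted(covered - allowed)
                problems.append(f"{idea['title']!r}: keywords not in cluster: {extra}")
    return problems


ideas = [
    {"title": "Organic Bedding Guide", "cluster_id": 1,
     "content_structure": "cluster_hub", "covered_keywords": ["organic sheets"]},
    {"title": "Are Organic Sheets Worth It?", "cluster_id": 1,
     "content_structure": "review", "covered_keywords": []},
]
idea_issues = validate_ideas(ideas, {1: ["organic sheets", "eco-friendly bedding"]})
print(idea_issues)
```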
+
+---
+
+### Phase 3: Content Generation Prompt (Week 2)
+
+**Tasks:**
+1. ✅ Draft new content prompt with word count logic
+2. ✅ Implement dynamic variable replacement in `build_prompt()`
+3. ✅ Test with 500, 1000, 1500 word targets
+4. ✅ Validate actual word counts (automated counting)
+5. ✅ Test all content_structure types
+6. ✅ Verify HTML quality and consistency
+7. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['content_generation']`
+8. ✅ Deploy and monitor
+
+**Code Change Required:**
+
+**File:** `backend/igny8_core/ai/functions/generate_content.py`
+
+**Method:** `build_prompt()`
+
+**Add word count calculation:**
+
+```python
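+from typing import Any
+
+# Method on the content-generation function class (class body omitted here)
+def build_prompt(self, data: Any, account=None) -> str:
+    # Accept a single task or a list of tasks; use the first task in a list
+    task = data if not isinstance(data, list) else data[0]
+    
+    # Calculate adaptive word count parameters (fall back to 1000 if unset)
+    target_words = task.word_count or 1000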
+    
+    if target_words <= 600:
+        intro_words = 60
+        section_count = 5
+        section_words = 80
+        hook_words = 25
+    elif target_words <= 1200:
+        intro_words = 120
+        section_count = 6
+        section_words = 140
+        hook_words = 35
+    else:
+        intro_words = 180
+        section_count = 7
+        section_words = 180
+        hook_words = 45
+    
+    # Get prompt and replace variables
+    prompt = PromptRegistry.get_prompt(
+        function_name='generate_content',
+        account=account,
+        task=task,
+        context={
+            'TARGET_WORD_COUNT': target_words,
+            'INTRO_WORDS': intro_words,
+            'SECTION_COUNT': section_count,
+            'SECTION_WORDS': section_words,
+            'HOOK_WORDS': hook_words,
+            # ... existing context
+        }
+    )
+    
+    return prompt
+```
+
+**Success Criteria:**
+- 95%+ of generated content within ±5% of target word count
+- HTML structure consistent
+- Content structure types clearly differentiated
+- Keyword integration natural
+- No sections starting with lists
+
+---
+
+### Phase 4: Image Prompt Improvements (Week 2-3)
+
+**Tasks:**
+1. ✅ Draft new extraction prompt with placement guidelines
+2. ✅ Draft new template prompt with style variables
+3. ✅ Test with 10 sample articles
+4. ✅ Validate image diversity and relevance
+5. ✅ Update both prompts in registry
+6. ✅ Update `GenerateImagePromptsFunction` to use new template
+7. ✅ Deploy and monitor
+
+**Success Criteria:**
+- No duplicate image concepts in same article
+- Prompts are specific and detailed
+- Featured image distinct from in-article images
+- Image placement logically distributed
+- Generated images relevant to content
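+
+The "no duplicate concepts" criterion could be approximated with a word-overlap score between prompts. This is a rough heuristic sketch, not a semantic comparison: `jaccard`, `find_near_duplicates`, and the 0.6 threshold are assumptions chosen for illustration.
+
```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two prompts, from 0.0 (disjoint) to 1.0 (identical)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def find_near_duplicates(prompts: list, threshold: float = 0.6) -> list:
    """Return index pairs whose overlap meets the (assumed) threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(prompts), 2)
        if jaccard(a, b) >= threshold
    ]


prompts = [
    "person using a laptop at a desk",
    "person using a laptop at a cafe",
    "close-up of folded linen sheets in morning light",
]
print(find_near_duplicates(prompts))  # [(0, 1)]
```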
+
+---
+
+## Prompt Versioning & Testing
+
+### Version Control
+
+**Recommendation:** Store prompt versions in database for A/B testing
+
+**Schema:**
+
+```python
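+from django.db import models
+
+class AIPromptVersion(models.Model):
+    # PROMPT_TYPE_CHOICES is assumed to be defined alongside the existing AIPrompt model.
+    # CharField requires max_length; 50 is a placeholder value.
+    prompt_type = models.CharField(max_length=50, choices=PROMPT_TYPE_CHOICES)
+    version = models.IntegerField()
+    prompt_value = models.TextField()
+    is_active = models.BooleanField(default=False)
+    created_at = models.DateTimeField(auto_now_add=True)
+    performance_metrics = models.JSONField(default=dict)  # Track success rates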
+```
+
+**Process:**
+1. Test new prompt version alongside current
+2. Compare outputs on same inputs
+3. Measure quality metrics (manual + automated)
+4. Gradually roll out if better
+5. Keep old version as fallback
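+
+For the gradual rollout step, one common approach is deterministic bucketing, so the same account always sees the same prompt version. The sketch below is an assumption about how this could work, not existing code: `pick_version` and its labels are hypothetical.
+
```python
import hashlib


def pick_version(account_id: int, prompt_type: str, rollout_percent: int) -> str:
    """Deterministically assign an account to the 'candidate' or 'current' prompt."""
    digest = hashlib.sha256(f"{prompt_type}:{account_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0-99
    return "candidate" if bucket < rollout_percent else "current"


# Share of 1000 sample accounts that would receive the candidate at a 20% rollout
share = sum(pick_version(a, "clustering", 20) == "candidate" for a in range(1000)) / 1000
print(f"candidate share at 20% rollout: {share:.2f}")
```
+
+Setting `rollout_percent` to 0 routes every account back to the current prompt, which covers the fallback step.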
+
+---
+
+### Automated Quality Metrics
+
+**Implement automated checks:**
+
+| Metric | Check | Threshold |
+|--------|-------|-----------|
+| Word Count Accuracy | `abs(actual - target) / target` | < 0.05 (±5%) |
+| HTML Validity | Parse with BeautifulSoup, then check tag balance (the parser itself silently repairs broken markup) | 100% valid |
+| Keyword Presence | Count keyword mentions | ≥ 3 for primary |
+| Structure Compliance | Check H2/H3 hierarchy | Valid structure |
+| Cluster Count | Number of clusters | 5-15 |
+| Cluster Size | Keywords per cluster | 3-10 |
+| No Duplicates | Keyword appears once | 100% unique |
+
+**Log results:**
+- Track per prompt version
+- Identify patterns in failures
+- Use for prompt iteration
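+
+The structure compliance check from the table could start as a small tag walker. This is a minimal standard-library sketch with two illustrative rules (no H3 before the first H2, no section opening with a list or table); `StructureChecker` and `check_structure` are hypothetical names.
+
```python
from html.parser import HTMLParser


class StructureChecker(HTMLParser):
    """Flags an h3 before any h2 and sections that open with a list or table."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.seen_h2 = False
        self.at_section_start = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.seen_h2 = True
            self.at_section_start = True
        elif tag == "h3":
            if not self.seen_h2:
                self.problems.append("h3 appears before any h2")
        elif tag in ("ul", "ol", "table"):
            if self.at_section_start:
                self.problems.append(f"section starts with <{tag}>")
            self.at_section_start = False
        elif tag == "p":
            # A paragraph came first, so the section opening is fine
            self.at_section_start = False


def check_structure(html: str) -> list:
    """Return a list of structural problems; empty means the rules passed."""
    checker = StructureChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


bad = "<p>Intro</p><h3>Too early</h3><h2>Benefits</h2><ul><li>Soft</li></ul>"
good = "<p>Intro</p><h2>Benefits</h2><p>Text</p><ul><li>Soft</li></ul><h3>Care</h3>"
print(check_structure(bad))
print(check_structure(good))
```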
+
+---
+
+## Model Selection & Optimization
+
+### Current Models
+
+**Location:** `backend/igny8_core/ai/settings.py`
+
+**Default Models per Function:**
+- Clustering: GPT-4 (expensive but accurate)
+- Ideas: GPT-4 (creative)
+- Content: GPT-4 (quality)
+- Image Prompts: GPT-3.5-turbo (simpler task)
+- Images: DALL-E 3 / Runware
+
+### Optimization Opportunities
+
+**Cost vs Quality Tradeoffs:**
+
+| Function | Current | Alternative | Cost Savings | Quality Impact |
+|----------|---------|-------------|--------------|----------------|
+| Clustering | GPT-4 | GPT-4-turbo | 50% | Minimal |
+| Ideas | GPT-4 | GPT-4-turbo | 50% | Minimal |
+| Content | GPT-4 | GPT-4-turbo | 50% | Test required |
+| Image Prompts | GPT-3.5-turbo | Keep | - | - |
+
+**Recommendation:** Test GPT-4-turbo for clustering, idea generation, and content generation
+- Faster response time
+- 50% cost reduction
+- Similar quality for structured outputs
+
+---
+
+## Success Metrics
+
+- ✅ Word count accuracy: 95%+ within ±5%
+- ✅ Clustering quality: No single-keyword clusters
+- ✅ Idea generation: Clear hub vs supporting distinction
+- ✅ HTML validity: 100%
+- ✅ Keyword integration: Natural, not stuffed
+- ✅ Image prompt diversity: No duplicates
+- ✅ User satisfaction: Fewer manual edits needed
+- ✅ Processing time: <10s for 1000-word article
+- ✅ Credit cost: 30% reduction with model optimization
+
+---
+
+## Related Files Reference
+
+### Backend
+- `backend/igny8_core/ai/prompts.py` - Prompt registry and defaults
+- `backend/igny8_core/ai/functions/auto_cluster.py` - Clustering function
+- `backend/igny8_core/ai/functions/generate_ideas.py` - Ideas function
+- `backend/igny8_core/ai/functions/generate_content.py` - Content function
+- `backend/igny8_core/ai/functions/generate_image_prompts.py` - Image prompts
+- `backend/igny8_core/ai/settings.py` - Model configuration
+- `backend/igny8_core/modules/system/models.py` - AIPrompt model
+
+### Testing
+- Create test suite: `backend/igny8_core/ai/tests/test_prompts.py`
+- Test fixtures with sample inputs
+- Automated quality validation
+- Performance benchmarks
+
+---
+
+## Notes
+
+- All prompt changes should be tested on real data first
+- Keep old prompts in version history for rollback
+- Monitor user feedback on content quality
+- Consider user-customizable prompt templates (advanced feature)
+- Document prompt engineering best practices for team
+- The SAG clustering prompt (mentioned in the original doc) will be handled separately as a specialized architecture