pre-launch-final mods-docs

IGNY8 VPS (Salman)
2025-12-11 07:20:21 +00:00
parent 20fdd3b295
commit a736bc3d34
8 changed files with 5464 additions and 0 deletions

# Item 3: Prompt Improvement and Model Optimization
**Priority:** High
**Target:** Production Launch
**Last Updated:** December 11, 2025
---
## Overview
Redesign and optimize all AI prompts for clustering, idea generation, content generation, and image prompt extraction to achieve:
- Extreme accuracy and consistent outputs
- Faster processing with optimized token usage
- Correct word count adherence (500, 1000, 1500 words)
- Improved clustering quality and idea relevance
- Better image prompt clarity and relevance
---
## Current Prompt System Architecture
### Prompt Registry
**Location:** `backend/igny8_core/ai/prompts.py`
**Class:** `PromptRegistry`
**Hierarchy** (resolution order):
1. Task-level `prompt_override` (if exists on specific task)
2. Database prompt from `AIPrompt` model (account-specific)
3. Default fallback from `PromptRegistry.DEFAULT_PROMPTS`
**Storage:**
- Default prompts: Hardcoded in `prompts.py`
- Account overrides: `system_aiprompt` database table
- Task overrides: `prompt_override` field on task object
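
The resolution order above can be sketched as a small lookup function. This is a minimal illustration, not the actual `PromptRegistry` code: `resolve_prompt`, its parameters, and the in-memory `account_prompts` mapping (standing in for the `AIPrompt` database lookup) are hypothetical names introduced here.

```python
from typing import Mapping, Optional

# Illustrative default only; the real values live in PromptRegistry.DEFAULT_PROMPTS.
DEFAULT_PROMPTS = {"clustering": "default clustering prompt"}


def resolve_prompt(
    prompt_key: str,
    task_override: Optional[str] = None,
    account_prompts: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first prompt found, walking the three-level hierarchy."""
    # 1. A task-level override wins when it is set and non-empty.
    if task_override:
        return task_override
    # 2. Account-specific prompt (stands in for an AIPrompt database lookup).
    if account_prompts and account_prompts.get(prompt_key):
        return account_prompts[prompt_key]
    # 3. Hardcoded default as the final fallback.
    return DEFAULT_PROMPTS[prompt_key]
```

For example, `resolve_prompt("clustering", task_override="task text")` returns the task text even when an account-level prompt is also supplied.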
---
## Current Prompts Analysis
### 1. Clustering Prompt
**Function:** `auto_cluster`
**File:** `backend/igny8_core/ai/functions/auto_cluster.py`
**Prompt Key:** `'clustering'`
#### Current Prompt Structure
**Approach:** Semantic strategist + intent-driven clustering
**Key Instructions:**
- Return single JSON with "clusters" array
- Each cluster: name, description, keywords[]
- Multi-dimensional grouping (intent, use-case, function, persona, context)
- Model real search behavior and user journeys
- Avoid superficial groupings and duplicates
- 3-10 keywords per cluster
**Strengths:**
✅ Clear JSON output format
✅ Detailed grouping logic with dimensions
✅ Emphasis on semantic strength over keyword matching
✅ User journey modeling (Problem → Solution, General → Specific)
**Issues:**
❌ Long prompt (~420 tokens) - may confuse model
❌ No examples provided - model must guess formatting
❌ Doesn't explicitly specify how to handle outlier keywords
❌ No guidance on cluster count (number of clusters varies between runs)
❌ Description length not constrained
**Real-World Performance Issues:**
- Sometimes creates too many small clusters (1-2 keywords each)
- Inconsistent cluster naming convention
- Descriptions sometimes generic ("Keywords related to...")
---
### 2. Idea Generation Prompt
**Function:** `generate_ideas`
**File:** `backend/igny8_core/ai/functions/generate_ideas.py`
**Prompt Key:** `'ideas'`
#### Current Prompt Structure
**Approach:** SEO-optimized content ideas + outlines
**Key Instructions:**
- Input: Clusters + Keywords
- Output: JSON "ideas" array
- 1 cluster_hub + 2-4 supporting ideas per cluster
- Fields: title, description, content_type, content_structure, cluster_id, estimated_word_count, covered_keywords
- Outline format: intro (hook + 2 paragraphs), 5-8 H2 sections with 2-3 H3s each
- Content mixing: paragraphs, lists, tables, blockquotes
- No bullets/lists at start
- Professional tone, no generic phrasing
**Strengths:**
✅ Detailed outline structure
✅ Content mixing guidance (lists, tables, blockquotes)
✅ Clear JSON format
✅ Tone guidelines
**Issues:**
❌ Very complex prompt (600+ tokens)
❌ Outline format too prescriptive (might limit creativity)
❌ No examples provided
❌ Estimated word count often inaccurate (too high or too low)
❌ "hook" guidance unclear (what makes a good hook?)
❌ Content structure validation not enforced
**Real-World Performance Issues:**
- Generated ideas sometimes too similar within cluster
- Outlines don't always respect structure types (e.g., "review" vs "guide")
- covered_keywords field sometimes empty or incorrect
- cluster_hub vs supporting ideas distinction unclear
---
### 3. Content Generation Prompt
**Function:** `generate_content`
**File:** `backend/igny8_core/ai/functions/generate_content.py`
**Prompt Key:** `'content_generation'`
#### Current Prompt Structure
**Approach:** Editorial content strategist
**Key Instructions:**
- Output: JSON {title, content (HTML)}
- Introduction: 1 italic hook (30-40 words) + 2 paragraphs (50-60 words each), no headings
- H2 sections: 5-8 total, 250-300 words each
- Section format: 2 narrative paragraphs → list/table → optional closing paragraph → 2-3 subsections
- Vary list/table types
- Never start section with list/table
- Tone: professional, no passive voice, no generic intros
- Keyword usage: natural in title, intro, headings
**Strengths:**
✅ Detailed structure guidance
✅ Strong tone/style rules
✅ HTML output format
✅ Keyword integration guidance
**Issues:**
❌ **Word count not mentioned in prompt** - critical flaw
❌ No guidance on 500 vs 1000 vs 1500 word versions
❌ Hook length (30-40 words) and intro paragraph lengths (2 × 50-60 words) are fixed and don't scale with the target length
❌ Section word count (250-300) doesn't adapt to total target
❌ No example output
❌ Content structure (article vs guide vs review) not clearly differentiated
❌ Table column guidance missing (what columns? how many?)
**Real-World Performance Issues:**
- **Output length wildly inconsistent** (e.g., generates 800 words when asked for 1500)
- Introductions sometimes have headings despite instructions
- Lists appear at start of sections
- Table structure unclear (random columns)
- Doesn't adapt content density to word count
---
### 4. Image Prompt Extraction
**Function:** `generate_image_prompts`
**File:** `backend/igny8_core/ai/functions/generate_image_prompts.py`
**Prompt Key:** `'image_prompt_extraction'`
#### Current Prompt Structure
**Approach:** Extract visual descriptions from article
**Key Instructions:**
- Input: article title + content
- Output: JSON {featured_prompt, in_article_prompts[]}
- Extract featured image (main topic)
- Extract up to {max_images} in-article images
- Each prompt detailed for image generation (visual elements, style, mood, composition)
**Strengths:**
✅ Clear structure
✅ Separates featured vs in-article
✅ Emphasizes detail in descriptions
**Issues:**
❌ No guidance on what makes a good image prompt
❌ No style/mood specifications
❌ Doesn't specify where in article to place images
❌ No examples
❌ "Detailed enough" is subjective
**Real-World Performance Issues:**
- Prompts sometimes too generic ("Image of a person using a laptop")
- No context from article content (extracts irrelevant visuals)
- Featured image prompt sometimes identical to in-article prompt
- No guidance on image diversity (all similar)
---
### 5. Image Generation Template
**Prompt Key:** `'image_prompt_template'`
#### Current Template
**Approach:** Template-based prompt assembly
**Format:**
```
Create a high-quality {image_type} image... "{post_title}"... {image_prompt}...
Focus on realistic, well-composed scene... lifestyle/editorial web content...
Avoid text, watermarks, logos... **not blurry.**
```
**Issues:**
❌ {image_type} not always populated
❌ "high-quality" and "not blurry" redundant/unclear
❌ No style guidance (photographic, illustration, 3D, etc.)
❌ No aspect ratio specification
---
## Required Improvements
### A. Clustering Prompt Redesign
#### Goals
- Reduce prompt length by 30-40%
- Add 2-3 concrete examples
- Enforce consistent cluster count (5-15 clusters ideal)
- Standardize cluster naming (title case, descriptive)
- Limit description to 20-30 words
#### Proposed Structure
**Section 1: Role & Task** (50 tokens)
- Clear, concise role definition
- Task: group keywords into intent-driven clusters
**Section 2: Output Format with Example** (100 tokens)
- JSON structure
- Show 1 complete example cluster
- Specify exact field requirements
**Section 3: Clustering Rules** (150 tokens)
- List 5-7 key rules (bullet format)
- Keyword-first approach
- Intent dimensions (brief)
- Quality thresholds (3-10 keywords per cluster)
- No duplicates
**Section 4: Quality Checklist** (50 tokens)
- Checklist of 4-5 validation points
- Model self-validates before output
**Total:** ~350 tokens (vs current ~420, roughly a 17% reduction; further trimming is needed to reach the 30-40% goal)
#### Example Output Format to Include
```json
{
  "clusters": [
    {
      "name": "Organic Bedding Benefits",
      "description": "Health, eco-friendly, and comfort aspects of organic cotton bedding materials",
      "keywords": ["organic sheets", "eco-friendly bedding", "chemical-free cotton", "hypoallergenic sheets", "sustainable bedding"]
    }
  ]
}
```
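
The goals for this prompt (5-15 clusters, 3-10 keywords per cluster, no duplicate keywords, short descriptions) can be checked mechanically on the model's JSON output. The sketch below is one possible validator; `validate_clusters` is a hypothetical helper, and treating 30 words as a maximum description length (rather than enforcing a 20-word minimum) is an assumption.

```python
import json
from collections import Counter
from typing import List


def validate_clusters(raw_json: str, min_clusters: int = 5, max_clusters: int = 15) -> List[str]:
    """Return a list of problems; an empty list means the output passed."""
    problems = []
    clusters = json.loads(raw_json).get("clusters", [])
    if not min_clusters <= len(clusters) <= max_clusters:
        problems.append(f"cluster count {len(clusters)} outside {min_clusters}-{max_clusters}")
    seen = Counter()
    for cluster in clusters:
        name = cluster.get("name", "<unnamed>")
        keywords = cluster.get("keywords", [])
        if not 3 <= len(keywords) <= 10:
            problems.append(f"{name}: {len(keywords)} keywords (expected 3-10)")
        # Assumption: 30 words is treated as an upper bound only.
        if len(cluster.get("description", "").split()) > 30:
            problems.append(f"{name}: description longer than 30 words")
        # Normalize case and whitespace so near-identical keywords collide.
        seen.update(k.strip().lower() for k in keywords)
    for keyword, count in seen.items():
        if count > 1:
            problems.append(f"keyword '{keyword}' appears {count} times")
    return problems
```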
---
### B. Idea Generation Prompt Redesign
#### Goals
- Simplify outline structure (less prescriptive)
- Add examples of cluster_hub vs supporting ideas
- Better covered_keywords extraction
- Adaptive word count estimation
- Content structure differentiation
#### Proposed Structure
**Section 1: Role & Objective** (40 tokens)
- SEO content strategist
- Task: generate content ideas from clusters
**Section 2: Output Format with Examples** (150 tokens)
- Show 1 cluster_hub example
- Show 1 supporting idea example
- Highlight key differences
**Section 3: Idea Generation Rules** (100 tokens)
- 1 cluster_hub (comprehensive, authoritative)
- 2-4 supporting ideas (specific angles)
- Word count: 1500-2200 for hubs, 1000-1500 for supporting
- covered_keywords: extract from cluster keywords
**Section 4: Outline Guidance** (100 tokens)
- Simplified: Intro + 5-8 sections + Conclusion
- Section types by content_structure:
- article: narrative + data
- guide: step-by-step + tips
- review: pros/cons + comparison
- listicle: numbered + categories
- comparison: side-by-side + verdict
**Total:** ~390 tokens (vs current ~610)
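
The idea generation rules above lend themselves to a per-cluster check on the parsed `ideas` array. This is a rough sketch under stated assumptions: `validate_cluster_ideas` is a hypothetical helper, and it assumes a hub idea is marked by `content_structure == "cluster_hub"`, which the source does not confirm.

```python
from typing import Dict, List, Set

# Word-count ranges taken from the idea generation rules above.
HUB_RANGE = (1500, 2200)
SUPPORTING_RANGE = (1000, 1500)


def validate_cluster_ideas(ideas: List[Dict], cluster_keywords: Set[str]) -> List[str]:
    """Check one cluster's ideas; returns a list of problems (empty means OK)."""
    problems = []
    # Assumption: the hub is identified by its content_structure value.
    hub_count = sum(1 for i in ideas if i.get("content_structure") == "cluster_hub")
    supporting_count = len(ideas) - hub_count
    if hub_count != 1:
        problems.append(f"expected 1 cluster_hub, found {hub_count}")
    if not 2 <= supporting_count <= 4:
        problems.append(f"expected 2-4 supporting ideas, found {supporting_count}")
    for idea in ideas:
        title = idea.get("title", "<untitled>")
        is_hub = idea.get("content_structure") == "cluster_hub"
        low, high = HUB_RANGE if is_hub else SUPPORTING_RANGE
        words = idea.get("estimated_word_count", 0)
        if not low <= words <= high:
            problems.append(f"{title}: estimated_word_count {words} outside {low}-{high}")
        covered = set(idea.get("covered_keywords", []))
        if not covered:
            problems.append(f"{title}: covered_keywords is empty")
        elif not covered <= cluster_keywords:
            extra = sorted(covered - cluster_keywords)
            problems.append(f"{title}: covered_keywords not in cluster: {extra}")
    return problems
```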
---
### C. Content Generation Prompt Redesign
**Most Critical Improvement:** Word Count Adherence
#### Goals
- **Primary:** Hit the target word count within a ±5% tolerance
- Scale structure proportionally to word count
- Differentiate content structures clearly
- Improve HTML quality and consistency
- Better keyword integration
#### Proposed Adaptive Word Count System
**Word Count Targets:**
- 500 words: Short-form (60-word intro + 5 sections × 80 words + 60-word conclusion ≈ 520 words)
- 1000 words: Standard (120-word intro + 6 sections × 140 words + 60-word conclusion ≈ 1020 words)
- 1500 words: Long-form (180-word intro + 7 sections × 180 words + 60-word conclusion = 1500 words)
**Prompt Variable Replacement:**
Before sending to AI, calculate:
- `{TARGET_WORD_COUNT}` - from task.word_count
- `{INTRO_WORDS}` - 60 / 120 / 180 based on target
- `{SECTION_COUNT}` - 5 / 6 / 7 based on target
- `{SECTION_WORDS}` - 80 / 140 / 180 based on target
- `{HOOK_WORDS}` - 25 / 35 / 45 based on target
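
As a sanity check on these numbers, the planned total for each tier can be computed and compared against the target. The sketch below is illustrative: `word_budget` is a hypothetical helper, the tier boundaries (600 and 1200) follow the `build_prompt()` change proposed in Phase 3, and it assumes the hook is counted inside the introduction and the conclusion is a fixed 60 words.

```python
from typing import Dict

CONCLUSION_WORDS = 60  # fixed conclusion length, as in the proposed prompt


def word_budget(target_words: int) -> Dict[str, int]:
    """Pick the structure tier for a target and report the planned total."""
    if target_words <= 600:
        intro, sections, per_section, hook = 60, 5, 80, 25
    elif target_words <= 1200:
        intro, sections, per_section, hook = 120, 6, 140, 35
    else:
        intro, sections, per_section, hook = 180, 7, 180, 45
    # Assumption: the hook is part of the introduction budget, not added on top.
    planned = intro + sections * per_section + CONCLUSION_WORDS
    return {
        "intro_words": intro,
        "section_count": sections,
        "section_words": per_section,
        "hook_words": hook,
        "planned_total": planned,
    }


for target in (500, 1000, 1500):
    budget = word_budget(target)
    deviation = (budget["planned_total"] - target) / target
    print(target, budget["planned_total"], f"{deviation:+.1%}")
```

Under these assumptions the planned totals are 520, 1020, and 1500 words, so every tier stays inside the ±5% tolerance before the model writes anything.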
#### Proposed Structure
**Section 1: Role & Objective** (30 tokens)
```
You are an editorial content writer. Generate a {TARGET_WORD_COUNT}-word article...
```
**Section 2: Word Count Requirements** (80 tokens)
```
CRITICAL: The content must be {TARGET_WORD_COUNT} words, within a ±5% tolerance.
Structure breakdown:
- Introduction: {INTRO_WORDS} words total
- Hook (italic): {HOOK_WORDS} words
- Paragraphs: 2, splitting the remaining introduction words evenly
- Main Sections: {SECTION_COUNT} H2 sections
- Each section: {SECTION_WORDS} words
- Conclusion: 60 words
Word count validation: Count words in final output and adjust if needed.
```
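
Because the prompt also contains literal JSON braces in its output-format section, substituting the variables with `str.format` would likely fail on those braces. One way around this, sketched below, is to replace only `{UPPER_CASE}` placeholders with a regular expression. `render_prompt` is a hypothetical helper, and the source does not say how `PromptRegistry` actually performs the substitution.

```python
import re
from typing import Mapping

# Matches placeholders such as {TARGET_WORD_COUNT}; JSON braces do not match.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def render_prompt(template: str, values: Mapping[str, object]) -> str:
    """Replace {UPPER_CASE} placeholders and leave all other braces untouched."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly instead of sending a half-filled prompt to the model.
            raise KeyError(f"no value supplied for placeholder {{{name}}}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)
```

A missing value raises `KeyError`, which makes an unpopulated variable visible before the request is sent.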
**Section 3: Content Flow & HTML** (120 tokens)
- Detailed structure per section
- HTML tag usage (`<p>`, `<h2>`, `<h3>`, `<ul>`, `<ol>`, `<table>`)
- Formatting rules
**Section 4: Style & Quality** (80 tokens)
- Tone guidance
- Keyword usage
- Avoid generic phrases
- Examples of good vs bad openings
**Section 5: Content Structure Types** (90 tokens)
- article: {structure description}
- guide: {structure description}
- review: {structure description}
- comparison: {structure description}
- listicle: {structure description}
- cluster_hub: {structure description}
**Section 6: Output Format with Example** (100 tokens)
- JSON structure
- Show abbreviated example with proper HTML
**Total:** ~500 tokens (vs current ~550, but much more precise)
---
### D. Image Prompt Improvements
#### Goals
- Generate visually diverse prompts
- Better context from article content
- Specify image placement guidelines
- Improve prompt detail and clarity
#### Proposed Extraction Prompt Structure
**Section 1: Task & Context** (50 tokens)
```
Extract image prompts from this article for visual content placement.
Article: {title}
Content: {content}
Required: 1 featured + {max_images} in-article images
```
**Section 2: Image Types & Guidelines** (100 tokens)
```
Featured Image:
- Hero visual representing article's main theme
- Broad, engaging, high-quality
- Should work at large sizes (1200×630+)
In-Article Images (place strategically):
1. After introduction
2. Mid-article (before major H2 sections)
3. Supporting specific concepts or examples
4. Before conclusion
Each prompt must describe:
- Subject & composition
- Visual style (photographic, minimal, editorial)
- Mood & lighting
- Color palette suggestions
- Avoid: text, logos, faces (unless relevant)
```
**Section 3: Prompt Quality Rules** (80 tokens)
- Be specific and descriptive (not generic)
- Include scene details, angles, perspective
- Specify lighting, time of day if relevant
- Mention style references
- Ensure diversity across all images
- No duplicate concepts
**Section 4: Output Format** (50 tokens)
- JSON structure
- Show example with good vs bad prompts
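
The "no duplicate concepts" rule can be approximated after extraction by comparing prompts pairwise. The sketch below uses the standard library's `difflib.SequenceMatcher`; `find_similar_prompts` and the 0.8 threshold are assumptions for illustration, and string similarity is only a rough proxy for conceptual overlap.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def find_similar_prompts(result: Dict, threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Return index pairs of near-duplicate prompts (index 0 is the featured prompt)."""
    prompts = [result["featured_prompt"], *result["in_article_prompts"]]
    # Normalize case and whitespace before comparing.
    normalized = [" ".join(p.lower().split()) for p in prompts]
    flagged = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= threshold:
                flagged.append((i, j))
    return flagged
```

Any pair containing index 0 means the featured prompt is too close to an in-article prompt, which is one of the observed failure modes listed earlier.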
#### Proposed Template Prompt Improvement
Replace current template with:
```
A {style} photograph for "{post_title}". {image_prompt}.
Composition: {composition_hint}. Lighting: {lighting_hint}.
Mood: {mood}. Style: clean, modern, editorial web content.
No text, watermarks, or logos.
```
Where:
- {style} - photographic, minimalist, lifestyle, etc.
- {composition_hint} - center-framed, rule-of-thirds, wide-angle, etc.
- {lighting_hint} - natural daylight, soft indoor, dramatic, etc.
- {mood} - professional, warm, energetic, calm, etc.
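
A minimal sketch of assembling the proposed template follows. `build_image_prompt` and the `DEFAULTS` mapping are hypothetical; the fallback values are picked from the option lists above so that a missing hint never leaves an empty slot, which addresses the unpopulated `{image_type}` problem seen in the current template.

```python
IMAGE_TEMPLATE = (
    'A {style} photograph for "{post_title}". {image_prompt}. '
    "Composition: {composition_hint}. Lighting: {lighting_hint}. "
    "Mood: {mood}. Style: clean, modern, editorial web content. "
    "No text, watermarks, or logos."
)

# Hypothetical fallbacks, chosen from the option lists above.
DEFAULTS = {
    "style": "lifestyle",
    "composition_hint": "rule-of-thirds",
    "lighting_hint": "natural daylight",
    "mood": "professional",
}


def build_image_prompt(post_title: str, image_prompt: str, **hints: str) -> str:
    """Fill the template, using defaults for any hint that is missing or empty."""
    values = {**DEFAULTS, **{k: v for k, v in hints.items() if v}}
    return IMAGE_TEMPLATE.format(
        post_title=post_title,
        # Strip a trailing period so the template does not produce "..".
        image_prompt=image_prompt.rstrip(". "),
        **values,
    )
```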
---
## Implementation Plan
### Phase 1: Clustering Prompt (Week 1)
**Tasks:**
1. ✅ Draft new clustering prompt with examples
2. ✅ Test with sample keyword sets (20, 50, 100 keywords)
3. ✅ Compare outputs: old vs new
4. ✅ Validate cluster quality (manual review)
5. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['clustering']`
6. ✅ Deploy and monitor
**Success Criteria:**
- Consistent cluster count (5-15)
- No single-keyword clusters
- Clear, descriptive names
- Concise descriptions (20-30 words)
- 95%+ of keywords clustered
---
### Phase 2: Idea Generation Prompt (Week 1-2)
**Tasks:**
1. ✅ Draft new ideas prompt with examples
2. ✅ Test with 5-10 clusters
3. ✅ Validate cluster_hub vs supporting idea distinction
4. ✅ Check covered_keywords accuracy
5. ✅ Verify content_structure alignment
6. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['ideas']`
7. ✅ Deploy and monitor
**Success Criteria:**
- Clear distinction between hub and supporting ideas
- Accurate covered_keywords extraction
- Appropriate word count estimates
- Outlines match content_structure type
- No duplicate ideas within cluster
---
### Phase 3: Content Generation Prompt (Week 2)
**Tasks:**
1. ✅ Draft new content prompt with word count logic
2. ✅ Implement dynamic variable replacement in `build_prompt()`
3. ✅ Test with 500, 1000, 1500 word targets
4. ✅ Validate actual word counts (automated counting)
5. ✅ Test all content_structure types
6. ✅ Verify HTML quality and consistency
7. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['content_generation']`
8. ✅ Deploy and monitor
**Code Change Required:**
**File:** `backend/igny8_core/ai/functions/generate_content.py`
**Method:** `build_prompt()`
**Add word count calculation:**
```python
from typing import Any

def build_prompt(self, data: Any, account=None) -> str:
    # Accept either a single task or a list of tasks (use the first one)
    task = data if not isinstance(data, list) else data[0]

    # Calculate adaptive word count parameters
    target_words = task.word_count or 1000
    if target_words <= 600:
        intro_words = 60
        section_count = 5
        section_words = 80
        hook_words = 25
    elif target_words <= 1200:
        intro_words = 120
        section_count = 6
        section_words = 140
        hook_words = 35
    else:
        intro_words = 180
        section_count = 7
        section_words = 180
        hook_words = 45

    # Get prompt and replace variables
    prompt = PromptRegistry.get_prompt(
        function_name='generate_content',
        account=account,
        task=task,
        context={
            'TARGET_WORD_COUNT': target_words,
            'INTRO_WORDS': intro_words,
            'SECTION_COUNT': section_count,
            'SECTION_WORDS': section_words,
            'HOOK_WORDS': hook_words,
            # ... existing context
        }
    )
    return prompt
```
**Success Criteria:**
- 95%+ of generated content within ±5% of target word count
- HTML structure consistent
- Content structure types clearly differentiated
- Keyword integration natural
- No sections starting with lists
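
The "no sections starting with lists" criterion can be verified on the generated HTML with the standard library alone. The following is a sketch, assuming each section begins at an `<h2>` and that the first block element after it should not be `<ul>`, `<ol>`, or `<table>`; `SectionStartChecker` and `sections_starting_with_list` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class SectionStartChecker(HTMLParser):
    """Record each <h2> section whose first block element is a list or table."""

    BLOCK_TAGS = {"p", "ul", "ol", "table", "blockquote", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self.awaiting_first_block = False  # True from an <h2> until the next block tag
        self.in_h2 = False
        self.current_heading = ""
        self.violations: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.current_heading = ""
            self.awaiting_first_block = True
        elif self.awaiting_first_block and tag in self.BLOCK_TAGS:
            if tag in {"ul", "ol", "table"}:
                self.violations.append(self.current_heading.strip())
            self.awaiting_first_block = False

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.current_heading += data


def sections_starting_with_list(html: str) -> List[str]:
    """Return the headings of sections that open with a list or table."""
    checker = SectionStartChecker()
    checker.feed(html)
    checker.close()
    return checker.violations
```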
---
### Phase 4: Image Prompt Improvements (Week 2-3)
**Tasks:**
1. ✅ Draft new extraction prompt with placement guidelines
2. ✅ Draft new template prompt with style variables
3. ✅ Test with 10 sample articles
4. ✅ Validate image diversity and relevance
5. ✅ Update both prompts in registry
6. ✅ Update `GenerateImagePromptsFunction` to use new template
7. ✅ Deploy and monitor
**Success Criteria:**
- No duplicate image concepts in same article
- Prompts are specific and detailed
- Featured image distinct from in-article images
- Image placement logically distributed
- Generated images relevant to content
---
## Prompt Versioning & Testing
### Version Control
**Recommendation:** Store prompt versions in database for A/B testing
**Schema:**
```python
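from django.db import models


class AIPromptVersion(models.Model):
    # PROMPT_TYPE_CHOICES is assumed to be defined alongside the existing AIPrompt model
    prompt_type = models.CharField(max_length=50, choices=PROMPT_TYPE_CHOICES)  # max_length is an assumed value
    version = models.IntegerField()
    prompt_value = models.TextField()
    is_active = models.BooleanField(default=False)
    created_at = models.DateTimeField(auto_now_add=True)
    performance_metrics = models.JSONField(default=dict)  # Track success rates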
```
**Process:**
1. Test new prompt version alongside current
2. Compare outputs on same inputs
3. Measure quality metrics (manual + automated)
4. Gradually roll out if better
5. Keep old version as fallback
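
For the gradual rollout step, one common approach is to assign each account to a stable bucket so the same account always sees the same prompt version while the rollout percentage grows. The sketch below is only an illustration of that idea: `use_candidate_version` and its parameters are hypothetical and not part of the existing codebase.

```python
import hashlib


def use_candidate_version(account_id: int, prompt_type: str, rollout_percent: int) -> bool:
    """Deterministically decide whether an account gets the candidate prompt version."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Hash the prompt type together with the account so buckets differ per prompt.
    digest = hashlib.sha256(f"{prompt_type}:{account_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < rollout_percent
```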
---
### Automated Quality Metrics
**Implement automated checks:**
| Metric | Check | Threshold |
|--------|-------|-----------|
| Word Count Accuracy | `abs(actual - target) / target` | < 0.05 (±5%) |
| HTML Validity | Parse with BeautifulSoup | 100% valid |
| Keyword Presence | Count keyword mentions | ≥ 3 for primary |
| Structure Compliance | Check H2/H3 hierarchy | Valid structure |
| Cluster Count | Number of clusters | 5-15 |
| Cluster Size | Keywords per cluster | 3-10 |
| No Duplicates | Keyword appears once | 100% unique |
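
Two of the checks in the table, word count accuracy and keyword presence, can be sketched with the standard library. The helper names below are hypothetical, and the inclusive 5% bound is an assumption. Note that lenient parsers (including BeautifulSoup and `html.parser`) repair or tolerate malformed markup rather than reject it, so a successful parse alone may not prove validity.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html: str) -> str:
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Join text nodes and collapse runs of whitespace.
    return " ".join(" ".join(extractor.parts).split())


def word_count(html: str) -> int:
    # Counts "don't" and "step-by-step" as single words.
    return len(re.findall(r"\w+(?:['’-]\w+)*", visible_text(html)))


def word_count_ok(html: str, target: int, tolerance: float = 0.05) -> bool:
    # Assumption: the bound is inclusive, i.e. exactly 5% off still passes.
    return abs(word_count(html) - target) / target <= tolerance


def keyword_mentions(html: str, keyword: str) -> int:
    text = visible_text(html).lower()
    return len(re.findall(r"\b" + re.escape(keyword.lower()) + r"\b", text))
```

These functions could feed the per-version logging described next, for example by storing the pass rate in `performance_metrics`.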
**Log results:**
- Track per prompt version
- Identify patterns in failures
- Use for prompt iteration
---
## Model Selection & Optimization
### Current Models
**Location:** `backend/igny8_core/ai/settings.py`
**Default Models per Function:**
- Clustering: GPT-4 (expensive but accurate)
- Ideas: GPT-4 (creative)
- Content: GPT-4 (quality)
- Image Prompts: GPT-3.5-turbo (simpler task)
- Images: DALL-E 3 / Runware
### Optimization Opportunities
**Cost vs Quality Tradeoffs:**
| Function | Current | Alternative | Cost Savings | Quality Impact |
|----------|---------|-------------|--------------|----------------|
| Clustering | GPT-4 | GPT-4-turbo | 50% | Minimal |
| Ideas | GPT-4 | GPT-4-turbo | 50% | Minimal |
| Content | GPT-4 | GPT-4-turbo | 50% | Test required |
| Image Prompts | GPT-3.5 | Keep | - | - |
**Recommendation:** Test GPT-4-turbo for all text generation tasks
- Faster response time
- 50% cost reduction
- Similar quality for structured outputs
---
## Success Metrics
- ✅ Word count accuracy: 95%+ within ±5%
- ✅ Clustering quality: No single-keyword clusters
- ✅ Idea generation: Clear hub vs supporting distinction
- ✅ HTML validity: 100%
- ✅ Keyword integration: Natural, not stuffed
- ✅ Image prompt diversity: No duplicates
- ✅ User satisfaction: Fewer manual edits needed
- ✅ Processing time: <10s for 1000-word article
- ✅ Credit cost: 30% reduction with model optimization
---
## Related Files Reference
### Backend
- `backend/igny8_core/ai/prompts.py` - Prompt registry and defaults
- `backend/igny8_core/ai/functions/auto_cluster.py` - Clustering function
- `backend/igny8_core/ai/functions/generate_ideas.py` - Ideas function
- `backend/igny8_core/ai/functions/generate_content.py` - Content function
- `backend/igny8_core/ai/functions/generate_image_prompts.py` - Image prompts
- `backend/igny8_core/ai/settings.py` - Model configuration
- `backend/igny8_core/modules/system/models.py` - AIPrompt model
### Testing
- Create test suite: `backend/igny8_core/ai/tests/test_prompts.py`
- Test fixtures with sample inputs
- Automated quality validation
- Performance benchmarks
---
## Notes
- All prompt changes should be tested on real data first
- Keep old prompts in version history for rollback
- Monitor user feedback on content quality
- Consider user-customizable prompt templates (advanced feature)
- Document prompt engineering best practices for team
- SAG clustering prompt (mentioned in original doc) to be handled separately as a specialized architecture