# Item 3: Prompt Improvement and Model Optimization

**Priority:** High  
**Target:** Production Launch  
**Last Updated:** December 11, 2025

---

## Overview

Redesign and optimize all AI prompts for clustering, idea generation, content generation, and image prompt extraction to achieve:
- High accuracy and consistent outputs
- Faster processing through leaner token usage
- Adherence to target word counts (500, 1000, 1500 words)
- Improved clustering quality and idea relevance
- Better image prompt clarity and relevance

---

## Current Prompt System Architecture

### Prompt Registry

**Location:** `backend/igny8_core/ai/prompts.py`

**Class:** `PromptRegistry`

**Hierarchy** (resolution order):
1. Task-level `prompt_override` (if set on the task)
2. Database prompt from `AIPrompt` model (account-specific)
3. Default fallback from `PromptRegistry.DEFAULT_PROMPTS`

**Storage:**
- Default prompts: Hardcoded in `prompts.py`
- Account overrides: `system_aiprompt` database table
- Task overrides: `prompt_override` field on task object
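
The resolution order can be read as a simple fallback chain. The sketch below is a self-contained illustration, not the actual `PromptRegistry` code: `Task`, `resolve_prompt`, the `account_prompts` dict (standing in for `AIPrompt` rows) and the module-level `DEFAULT_PROMPTS` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a task object that may carry an override.
@dataclass
class Task:
    prompt_override: Optional[str] = None


# Stand-in for PromptRegistry.DEFAULT_PROMPTS.
DEFAULT_PROMPTS = {
    "clustering": "Default clustering prompt...",
}


def resolve_prompt(
    prompt_type: str,
    account_prompts: dict[str, str],
    task: Optional[Task] = None,
) -> str:
    """Return the first prompt found: task override, account prompt, then default."""
    # 1. A non-empty task-level override wins.
    if task is not None and task.prompt_override:
        return task.prompt_override
    # 2. Account-specific prompt (stands in for an AIPrompt row).
    account_value = account_prompts.get(prompt_type)
    if account_value:
        return account_value
    # 3. Hardcoded default; a KeyError signals an unknown prompt type.
    return DEFAULT_PROMPTS[prompt_type]
```

In this sketch an empty `prompt_override` falls through to the next level; whether the real registry treats empty strings the same way is not stated in this plan.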

---

## Current Prompts Analysis

### 1. Clustering Prompt

**Function:** `auto_cluster`  
**File:** `backend/igny8_core/ai/functions/auto_cluster.py`  
**Prompt Key:** `'clustering'`

#### Current Prompt Structure

**Approach:** Semantic strategist + intent-driven clustering

**Key Instructions:**
- Return single JSON with "clusters" array
- Each cluster: name, description, keywords[]
- Multi-dimensional grouping (intent, use-case, function, persona, context)
- Model real search behavior and user journeys
- Avoid superficial groupings and duplicates
- 3-10 keywords per cluster

**Strengths:**
✅ Clear JSON output format  
✅ Detailed grouping logic with dimensions  
✅ Emphasis on semantic strength over keyword matching  
✅ User journey modeling (Problem → Solution, General → Specific)  

**Issues:**
❌ Very long prompt (~400+ tokens) - may confuse model  
❌ No examples provided - model must guess formatting  
❌ Doesn't specify what to do with outliers explicitly  
❌ No guidance on cluster count (outputs variable)  
❌ Description length not constrained  

**Real-World Performance Issues:**
- Sometimes creates too many small clusters (1-2 keywords each)
- Inconsistent cluster naming convention
- Descriptions sometimes generic ("Keywords related to...")

---

### 2. Idea Generation Prompt

**Function:** `generate_ideas`  
**File:** `backend/igny8_core/ai/functions/generate_ideas.py`  
**Prompt Key:** `'ideas'`

#### Current Prompt Structure

**Approach:** SEO-optimized content ideas + outlines

**Key Instructions:**
- Input: Clusters + Keywords
- Output: JSON "ideas" array
- 1 cluster_hub + 2-4 supporting ideas per cluster
- Fields: title, description, content_type, content_structure, cluster_id, estimated_word_count, covered_keywords
- Outline format: intro (hook + 2 paragraphs), 5-8 H2 sections with 2-3 H3s each
- Content mixing: paragraphs, lists, tables, blockquotes
- No bullets/lists at the start of sections
- Professional tone, no generic phrasing

**Strengths:**
✅ Detailed outline structure  
✅ Content mixing guidance (lists, tables, blockquotes)  
✅ Clear JSON format  
✅ Tone guidelines  

**Issues:**
❌ Very complex prompt (600+ tokens)  
❌ Outline format too prescriptive (might limit creativity)  
❌ No examples provided  
❌ Estimated word count often inaccurate (too high or too low)  
❌ "hook" guidance unclear (what makes a good hook?)  
❌ Content structure validation not enforced  

**Real-World Performance Issues:**
- Generated ideas sometimes too similar within cluster
- Outlines don't always respect structure types (e.g., "review" vs "guide")
- covered_keywords field sometimes empty or incorrect
- cluster_hub vs supporting ideas distinction unclear

---

### 3. Content Generation Prompt

**Function:** `generate_content`  
**File:** `backend/igny8_core/ai/functions/generate_content.py`  
**Prompt Key:** `'content_generation'`

#### Current Prompt Structure

**Approach:** Editorial content strategist

**Key Instructions:**
- Output: JSON {title, content (HTML)}
- Introduction: 1 italic hook (30-40 words) + 2 paragraphs (50-60 words each), no headings
- H2 sections: 5-8 total, 250-300 words each
- Section format: 2 narrative paragraphs → list/table → optional closing paragraph → 2-3 subsections
- Vary list/table types
- Never start section with list/table
- Tone: professional, no passive voice, no generic intros
- Keyword usage: natural in title, intro, headings

**Strengths:**
✅ Detailed structure guidance  
✅ Strong tone/style rules  
✅ HTML output format  
✅ Keyword integration guidance  

**Issues:**
❌ **Word count not mentioned in prompt** - critical flaw  
❌ No guidance on 500 vs 1000 vs 1500 word versions  
❌ Hook length (30-40 words) and intro paragraph lengths (2 × 50-60 words) don't scale with the target length  
❌ Section word count (250-300) doesn't adapt to total target  
❌ No example output  
❌ Content structure (article vs guide vs review) not clearly differentiated  
❌ Table column guidance missing (what columns? how many?)  

**Real-World Performance Issues:**
- **Output length wildly inconsistent** (generates 800 words when asked for 1500)
- Introductions sometimes have headings despite instructions
- Lists appear at start of sections
- Table structure unclear (random columns)
- Doesn't adapt content density to word count

---

### 4. Image Prompt Extraction

**Function:** `generate_image_prompts`  
**File:** `backend/igny8_core/ai/functions/generate_image_prompts.py`  
**Prompt Key:** `'image_prompt_extraction'`

#### Current Prompt Structure

**Approach:** Extract visual descriptions from article

**Key Instructions:**
- Input: article title + content
- Output: JSON {featured_prompt, in_article_prompts[]}
- Extract featured image (main topic)
- Extract up to {max_images} in-article images
- Each prompt detailed for image generation (visual elements, style, mood, composition)

**Strengths:**
✅ Clear structure  
✅ Separates featured vs in-article  
✅ Emphasizes detail in descriptions  

**Issues:**
❌ No guidance on what makes a good image prompt  
❌ No style/mood specifications  
❌ Doesn't specify where in article to place images  
❌ No examples  
❌ "Detailed enough" is subjective  

**Real-World Performance Issues:**
- Prompts sometimes too generic ("Image of a person using a laptop")
- No context from article content (extracts irrelevant visuals)
- Featured image prompt sometimes identical to in-article prompt
- No guidance on image diversity (all similar)

---

### 5. Image Generation Template

**Prompt Key:** `'image_prompt_template'`

#### Current Template

**Approach:** Template-based prompt assembly

**Format:**
```
Create a high-quality {image_type} image... "{post_title}"... {image_prompt}...
Focus on realistic, well-composed scene... lifestyle/editorial web content...
Avoid text, watermarks, logos... **not blurry.**
```

**Issues:**
❌ {image_type} not always populated  
❌ "high-quality" and "not blurry" redundant/unclear  
❌ No style guidance (photographic, illustration, 3D, etc.)  
❌ No aspect ratio specification  

---

## Required Improvements

### A. Clustering Prompt Redesign

#### Goals
- Reduce prompt length by 30-40%
- Add 2-3 concrete examples
- Enforce consistent cluster count (5-15 clusters ideal)
- Standardize cluster naming (title case, descriptive)
- Limit description to 20-30 words

#### Proposed Structure

**Section 1: Role & Task** (50 tokens)
- Clear, concise role definition
- Task: group keywords into intent-driven clusters

**Section 2: Output Format with Example** (100 tokens)
- JSON structure
- Show 1 complete example cluster
- Specify exact field requirements

**Section 3: Clustering Rules** (150 tokens)
- List 5-7 key rules (bullet format)
- Keyword-first approach
- Intent dimensions (brief)
- Quality thresholds (3-10 keywords per cluster)
- No duplicates

**Section 4: Quality Checklist** (50 tokens)
- Checklist of 4-5 validation points
- Model self-validates before output

**Total:** ~350 tokens (vs current ~420)

#### Example Output Format to Include

```json
{
  "clusters": [
    {
      "name": "Organic Bedding Benefits",
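      "description": "Health, environmental, and comfort benefits of organic cotton bedding, including chemical-free materials, hypoallergenic properties, and sustainable production for eco-conscious shoppers",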
      "keywords": ["organic sheets", "eco-friendly bedding", "chemical-free cotton", "hypoallergenic sheets", "sustainable bedding"]
    }
  ]
}
```

---

### B. Idea Generation Prompt Redesign

#### Goals
- Simplify outline structure (less prescriptive)
- Add examples of cluster_hub vs supporting ideas
- Better covered_keywords extraction
- Adaptive word count estimation
- Content structure differentiation

#### Proposed Structure

**Section 1: Role & Objective** (40 tokens)
- SEO content strategist
- Task: generate content ideas from clusters

**Section 2: Output Format with Examples** (150 tokens)
- Show 1 cluster_hub example
- Show 1 supporting idea example
- Highlight key differences

**Section 3: Idea Generation Rules** (100 tokens)
- 1 cluster_hub (comprehensive, authoritative)
- 2-4 supporting ideas (specific angles)
- Word count: 1500-2200 for hubs, 1000-1500 for supporting
- covered_keywords: extract from cluster keywords

**Section 4: Outline Guidance** (100 tokens)
- Simplified: Intro + 5-8 sections + Conclusion
- Section types by content_structure:
  - article: narrative + data
  - guide: step-by-step + tips
  - review: pros/cons + comparison
  - listicle: numbered + categories
  - comparison: side-by-side + verdict

**Total:** ~390 tokens (vs current ~610)

---

### C. Content Generation Prompt Redesign

**Most Critical Improvement:** Word Count Adherence

#### Goals
- **Primary:** Hit the target word count within ±5%
- Scale structure proportionally to word count
- Differentiate content structures clearly
- Improve HTML quality and consistency
- Better keyword integration

#### Proposed Adaptive Word Count System

**Word Count Targets:**
- 500 words: Short-form (intro 60 + 5 sections × 80 + conclusion 60 = 520 words)
- 1000 words: Standard (intro 120 + 6 sections × 140 + conclusion 60 = 1020 words)
- 1500 words: Long-form (intro 180 + 7 sections × 180 + conclusion 60 = 1500 words)

**Prompt Variable Replacement:**

Before sending to AI, calculate:
- `{TARGET_WORD_COUNT}` - from task.word_count
- `{INTRO_WORDS}` - 60 / 120 / 180 based on target
- `{SECTION_COUNT}` - 5 / 6 / 7 based on target
- `{SECTION_WORDS}` - 80 / 140 / 180 based on target
- `{HOOK_WORDS}` - 25 / 35 / 45 based on target
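
The three tiers fit the 500/1000/1500 targets but not values in between: an 800-word task falls into the 1000-word tier, so the prompt would ask for 800 words while its breakdown (120 + 6 × 140 + a 60-word conclusion) plans about 1020, and the 1500-2200-word estimates for hub ideas all map to the 1500-word plan. One option is to derive the variables linearly from the target. The sketch assumes the fixed 60-word conclusion from the prompt's structure breakdown; `word_budget`, `WordBudget` and the coefficients are hypothetical, chosen so the 500/1000/1500 results land close to the tier values.

```python
from dataclasses import dataclass

CONCLUSION_WORDS = 60  # fixed conclusion length from the prompt's structure breakdown


@dataclass(frozen=True)
class WordBudget:
    target: int
    intro_words: int
    hook_words: int
    section_count: int
    section_words: int
    conclusion_words: int = CONCLUSION_WORDS

    @property
    def planned_total(self) -> int:
        """Sum of intro, all sections and conclusion (the hook is part of the intro)."""
        return self.intro_words + self.section_count * self.section_words + self.conclusion_words


def word_budget(target: int) -> WordBudget:
    """Derive the prompt's word-count variables linearly from the target length."""
    if target <= 0:
        raise ValueError("target must be a positive word count")
    # 4 + target/500 gives 5/6/7 sections at 500/1000/1500, clamped to 5-8 H2s.
    section_count = min(8, max(5, round(4 + target / 500)))
    # The intro takes 12% of the target (60/120/180 at the three tiers).
    intro_words = round(0.12 * target)
    # 15 + target/50 gives 25/35/45; never let the hook exceed half the intro.
    hook_words = min(round(15 + target / 50), intro_words // 2)
    # Whatever the intro and conclusion leave is split evenly across sections.
    body_words = max(0, target - intro_words - CONCLUSION_WORDS)
    section_words = round(body_words / section_count)
    return WordBudget(target, intro_words, hook_words, section_count, section_words)
```

For example, `word_budget(1000)` yields 6 sections of 137 words and `word_budget(2200)` yields 8 sections of 234 words; the section count is clamped to the 5-8 H2 range used elsewhere in this plan.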

#### Proposed Structure

**Section 1: Role & Objective** (30 tokens)
```
You are an editorial content writer. Generate a {TARGET_WORD_COUNT}-word article...
```

**Section 2: Word Count Requirements** (80 tokens)
```
CRITICAL: The content must be {TARGET_WORD_COUNT} words (±5% tolerance).

Structure breakdown:
- Introduction: {INTRO_WORDS} words total
  - Hook (italic): {HOOK_WORDS} words
  - Paragraphs: 2, sharing the remaining intro words evenly
- Main Sections: {SECTION_COUNT} H2 sections
  - Each section: {SECTION_WORDS} words
- Conclusion: 60 words

Word count validation: Count words in final output and adjust if needed.
```

**Section 3: Content Flow & HTML** (120 tokens)
- Detailed structure per section
- HTML tag usage (`<p>`, `<h2>`, `<h3>`, `<ul>`, `<ol>`, `<table>`)
- Formatting rules

**Section 4: Style & Quality** (80 tokens)
- Tone guidance
- Keyword usage
- Avoid generic phrases
- Examples of good vs bad openings

**Section 5: Content Structure Types** (90 tokens)
- article: {structure description}
- guide: {structure description}
- review: {structure description}
- comparison: {structure description}
- listicle: {structure description}
- cluster_hub: {structure description}

**Section 6: Output Format with Example** (100 tokens)
- JSON structure
- Show abbreviated example with proper HTML

**Total:** ~500 tokens (vs current ~550, but much more precise)

---

### D. Image Prompt Improvements

#### Goals
- Generate visually diverse prompts
- Better context from article content
- Specify image placement guidelines
- Improve prompt detail and clarity

#### Proposed Extraction Prompt Structure

**Section 1: Task & Context** (50 tokens)
```
Extract image prompts from this article for visual content placement.

Article: {title}
Content: {content}
Required: 1 featured + {max_images} in-article images
```
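
The prompts in this plan mix `{VARIABLE}` placeholders with literal braces (JSON output examples, and article content may itself contain braces). If placeholders are filled with Python's `str.format`, every literal brace must be doubled or the call raises an error. A safer sketch, assuming placeholders are always `{NAME}` with an identifier-style name, substitutes only the names it is given; `fill_placeholders` is a hypothetical helper, not the registry's current behavior.

```python
import re

PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def fill_placeholders(template: str, values: dict[str, object]) -> str:
    """Replace {NAME} placeholders found in `values`; leave all other braces alone.

    Literal JSON such as {"clusters": [...]} needs no escaping, and inserted
    values are never re-scanned, so braces inside article content are safe.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)
```

Unknown names are left untouched, so a separate check for leftover `{NAME}` tokens may be worth adding before the prompt is sent.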

**Section 2: Image Types & Guidelines** (100 tokens)
```
Featured Image:
- Hero visual representing article's main theme
- Broad, engaging, high-quality
- Should work at large sizes (1200×630+)

In-Article Images (place strategically):
1. After introduction
2. Mid-article (before major H2 sections)
3. Supporting specific concepts or examples
4. Before conclusion

Each prompt must describe:
- Subject & composition
- Visual style (photographic, minimal, editorial)
- Mood & lighting
- Color palette suggestions
- Avoid: text, logos, faces (unless relevant)
```
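
The placement list leaves the exact positions to the model. An alternative is to compute them in code and pass them into the prompt. The sketch below is a hypothetical helper (`placement_slots`) that spreads in-article images evenly across the H2 sections, with slot 0 meaning directly after the introduction; it does not model the "before conclusion" slot.

```python
def placement_slots(section_count: int, max_images: int) -> list[int]:
    """Return 0-based H2 indices; an image goes just before each listed H2.

    Index 0 means directly after the introduction (before the first H2).
    Images are spread evenly, at most one per section.
    """
    if section_count <= 0 or max_images <= 0:
        return []
    n = min(max_images, section_count)
    step = section_count / n  # step >= 1, so the floored positions never collide
    return [int(i * step) for i in range(n)]
```

For a 6-section article with `max_images=3`, this yields `[0, 2, 4]`: after the intro, before the third H2, and before the fifth H2.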

**Section 3: Prompt Quality Rules** (80 tokens)
- Be specific and descriptive (not generic)
- Include scene details, angles, perspective
- Specify lighting, time of day if relevant
- Mention style references
- Ensure diversity across all images
- No duplicate concepts

**Section 4: Output Format** (50 tokens)
- JSON structure
- Show example with good vs bad prompts

#### Proposed Template Prompt Improvement

Replace current template with:

```
A {style} photograph for "{post_title}". {image_prompt}. 
Composition: {composition_hint}. Lighting: {lighting_hint}. 
Mood: {mood}. Style: clean, modern, editorial web content. 
No text, watermarks, or logos.
```

Where:
- {style} - minimalist, lifestyle, documentary, etc. (not "photographic", since the template already says "photograph")
- {composition_hint} - center-framed, rule-of-thirds, wide-angle, etc.
- {lighting_hint} - natural daylight, soft indoor, dramatic, etc.
- {mood} - professional, warm, energetic, calm, etc.
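
Because `{image_type}` was not always populated in the current template, it may help to fill every slot of the new template from explicit defaults rather than leaving one empty. A minimal sketch using Python's `str.format`; `build_image_prompt` and the default values are assumptions, picked from the example values listed above.

```python
TEMPLATE = (
    'A {style} photograph for "{post_title}". {image_prompt}. '
    "Composition: {composition_hint}. Lighting: {lighting_hint}. "
    "Mood: {mood}. Style: clean, modern, editorial web content. "
    "No text, watermarks, or logos."
)

# Assumed fallbacks, drawn from the example values listed for each variable.
DEFAULT_HINTS = {
    "style": "lifestyle",
    "composition_hint": "rule-of-thirds",
    "lighting_hint": "natural daylight",
    "mood": "professional",
}


def build_image_prompt(post_title: str, image_prompt: str, **hints: str) -> str:
    """Fill the image template, falling back to defaults for missing or empty hints."""
    unknown = set(hints) - set(DEFAULT_HINTS)
    if unknown:
        raise KeyError(f"unknown hint(s): {sorted(unknown)}")
    values = {**DEFAULT_HINTS, **{k: v for k, v in hints.items() if v}}
    # Drop trailing periods/spaces so the template's own ". " is not doubled.
    return TEMPLATE.format(
        post_title=post_title, image_prompt=image_prompt.rstrip(". "), **values
    )
```

Rejecting unknown hint names catches typos such as `lighting=` instead of `lighting_hint=`, which would otherwise be silently ignored.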

---

## Implementation Plan

### Phase 1: Clustering Prompt (Week 1)

**Tasks:**
1. ✅ Draft new clustering prompt with examples
2. ✅ Test with sample keyword sets (20, 50, 100 keywords)
3. ✅ Compare outputs: old vs new
4. ✅ Validate cluster quality (manual review)
5. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['clustering']`
6. ✅ Deploy and monitor

**Success Criteria:**
- Consistent cluster count (5-15)
- No single-keyword clusters
- Clear, descriptive names
- Concise descriptions (20-30 words)
- 95%+ of keywords clustered
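
These criteria are mechanical enough to check in code before results are saved. A sketch, assuming the response has already been parsed into the JSON shape shown in section A; `validate_clusters` is a hypothetical name. Taken together, 5-15 clusters of 3-10 keywords each can cover at most 150 keywords (and need at least 15), so the cluster-count bound would have to scale for larger batches.

```python
from collections import Counter


def validate_clusters(result: dict, input_keywords: list[str]) -> list[str]:
    """Check a clustering response against the Phase 1 success criteria.

    Returns human-readable problems; an empty list means every check passed.
    """
    problems: list[str] = []
    clusters = result.get("clusters", [])
    if not 5 <= len(clusters) <= 15:
        problems.append(f"cluster count {len(clusters)} is outside 5-15")
    for cluster in clusters:
        name = cluster.get("name", "?")
        size = len(cluster.get("keywords", []))
        if not 3 <= size <= 10:
            problems.append(f"{name}: {size} keywords (want 3-10)")
        desc_words = len(cluster.get("description", "").split())
        if not 20 <= desc_words <= 30:
            problems.append(f"{name}: description has {desc_words} words (want 20-30)")
    # Case-insensitive count of how often each keyword was assigned.
    counts = Counter(k.lower() for c in clusters for k in c.get("keywords", []))
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"keywords in more than one cluster: {duplicates}")
    wanted = {k.lower() for k in input_keywords}
    coverage = len(wanted & set(counts)) / len(wanted) if wanted else 1.0
    if coverage < 0.95:
        problems.append(f"only {coverage:.0%} of input keywords clustered (want 95%+)")
    return problems
```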

---

### Phase 2: Idea Generation Prompt (Week 1-2)

**Tasks:**
1. ✅ Draft new ideas prompt with examples
2. ✅ Test with 5-10 clusters
3. ✅ Validate cluster_hub vs supporting idea distinction
4. ✅ Check covered_keywords accuracy
5. ✅ Verify content_structure alignment
6. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['ideas']`
7. ✅ Deploy and monitor

**Success Criteria:**
- Clear distinction between hub and supporting ideas
- Accurate covered_keywords extraction
- Appropriate word count estimates
- Outlines match content_structure type
- No duplicate ideas within cluster
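
A similar check can run on the ideas for each cluster. This sketch assumes a hub is marked with `content_structure == "cluster_hub"` (the plan lists `cluster_hub` among the structure types but does not say which field carries it) and that `covered_keywords` is a list of strings; `validate_cluster_ideas` is hypothetical. The title check only catches exact duplicates, not ideas that are merely too similar.

```python
def validate_cluster_ideas(ideas: list[dict], cluster_keywords: list[str]) -> list[str]:
    """Check the ideas generated for one cluster against the Phase 2 criteria.

    Assumes hubs are marked with content_structure == "cluster_hub" and that
    covered_keywords is a list of strings.
    """
    problems: list[str] = []
    is_hub = [idea.get("content_structure") == "cluster_hub" for idea in ideas]
    hub_count = sum(is_hub)
    if hub_count != 1:
        problems.append(f"expected 1 cluster_hub, got {hub_count}")
    if not 2 <= len(ideas) - hub_count <= 4:
        problems.append(f"expected 2-4 supporting ideas, got {len(ideas) - hub_count}")
    allowed = {k.lower() for k in cluster_keywords}
    seen_titles: set[str] = set()
    for idea, hub in zip(ideas, is_hub):
        title = idea.get("title", "")
        if title.lower() in seen_titles:
            problems.append(f"duplicate title: {title!r}")
        seen_titles.add(title.lower())
        covered = idea.get("covered_keywords") or []
        if not covered:
            problems.append(f"{title!r}: covered_keywords is empty")
        stray = [k for k in covered if k.lower() not in allowed]
        if stray:
            problems.append(f"{title!r}: keywords not in cluster: {stray}")
        # Word count ranges from the idea generation rules in section B.
        low, high = (1500, 2200) if hub else (1000, 1500)
        estimate = idea.get("estimated_word_count", 0)
        if not low <= estimate <= high:
            problems.append(f"{title!r}: estimated_word_count {estimate} outside {low}-{high}")
    return problems
```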

---

### Phase 3: Content Generation Prompt (Week 2)

**Tasks:**
1. ✅ Draft new content prompt with word count logic
2. ✅ Implement dynamic variable replacement in `build_prompt()`
3. ✅ Test with 500, 1000, 1500 word targets
4. ✅ Validate actual word counts (automated counting)
5. ✅ Test all content_structure types
6. ✅ Verify HTML quality and consistency
7. ✅ Update `PromptRegistry.DEFAULT_PROMPTS['content_generation']`
8. ✅ Deploy and monitor

**Code Change Required:**

**File:** `backend/igny8_core/ai/functions/generate_content.py`

**Method:** `build_prompt()`

**Add word count calculation:**

```python
from typing import Any

# Assumed import path, based on backend/igny8_core/ai/prompts.py
from igny8_core.ai.prompts import PromptRegistry


def build_prompt(self, data: Any, account=None) -> str:
    # Accept a single task or a list of tasks; only the first task is used.
    task = data if not isinstance(data, list) else data[0]

    # Calculate adaptive word count parameters (default: 1000 words)
    target_words = task.word_count or 1000

    # Tiers are tuned for the 500 / 1000 / 1500-word targets
    if target_words <= 600:
        intro_words = 60
        section_count = 5
        section_words = 80
        hook_words = 25
    elif target_words <= 1200:
        intro_words = 120
        section_count = 6
        section_words = 140
        hook_words = 35
    else:
        intro_words = 180
        section_count = 7
        section_words = 180
        hook_words = 45

    # Get prompt and replace variables
    prompt = PromptRegistry.get_prompt(
        function_name='generate_content',
        account=account,
        task=task,
        context={
            'TARGET_WORD_COUNT': target_words,
            'INTRO_WORDS': intro_words,
            'SECTION_COUNT': section_count,
            'SECTION_WORDS': section_words,
            'HOOK_WORDS': hook_words,
            # ... existing context
        }
    )

    return prompt
```

**Success Criteria:**
- 95%+ of generated content within ±5% of target word count
- HTML structure consistent
- Content structure types clearly differentiated
- Keyword integration natural
- No sections starting with lists

---

### Phase 4: Image Prompt Improvements (Week 2-3)

**Tasks:**
1. ✅ Draft new extraction prompt with placement guidelines
2. ✅ Draft new template prompt with style variables
3. ✅ Test with 10 sample articles
4. ✅ Validate image diversity and relevance
5. ✅ Update both prompts in registry
6. ✅ Update `GenerateImagePromptsFunction` to use new template
7. ✅ Deploy and monitor

**Success Criteria:**
- No duplicate image concepts in same article
- Prompts are specific and detailed
- Featured image distinct from in-article images
- Image placement logically distributed
- Generated images relevant to content
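
Duplicate or near-identical prompts (including a featured prompt that repeats an in-article one) can be caught before image generation with a cheap lexical check. This sketch flags pairs whose word sets overlap heavily (Jaccard similarity); `similar_prompt_pairs`, the 3-letter token filter and the 0.6 threshold are assumptions to tune on real outputs, and a lexical check will miss prompts that describe the same concept in different words.

```python
import re
from itertools import combinations


def _tokens(text: str) -> set[str]:
    # Lowercased words of 3+ letters; a crude way to drop most stopwords.
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def similar_prompt_pairs(
    prompts: list[str], threshold: float = 0.6
) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for prompt pairs whose word-set Jaccard score >= threshold."""
    token_sets = [_tokens(p) for p in prompts]
    flagged = []
    for i, j in combinations(range(len(prompts)), 2):
        a, b = token_sets[i], token_sets[j]
        union = a | b
        score = len(a & b) / len(union) if union else 1.0
        if score >= threshold:
            flagged.append((i, j, round(score, 2)))
    return flagged
```

Passing the featured prompt as index 0 followed by the in-article prompts checks both "featured distinct from in-article" and "no duplicate concepts" in one call.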

---

## Prompt Versioning & Testing

### Version Control

**Recommendation:** Store prompt versions in database for A/B testing

**Schema:**

```python
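from django.db import models

class AIPromptVersion(models.Model):
    # PROMPT_TYPE_CHOICES is assumed to be the choices list the AIPrompt model uses; max_length=50 is an assumed value.
    prompt_type = models.CharField(max_length=50, choices=PROMPT_TYPE_CHOICES)
    version = models.PositiveIntegerField()
    prompt_value = models.TextField()
    is_active = models.BooleanField(default=False)
    created_at = models.DateTimeField(auto_now_add=True)
    performance_metrics = models.JSONField(default=dict)  # Track success rates

    class Meta:
        constraints = [models.UniqueConstraint(fields=["prompt_type", "version"], name="unique_prompt_version")]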
```

**Process:**
1. Test new prompt version alongside current
2. Compare outputs on same inputs
3. Measure quality metrics (manual + automated)
4. Gradually roll out if better
5. Keep old version as fallback
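
Step 4 needs a rule for which requests see the candidate version during a gradual rollout. One common approach, sketched here as an assumption rather than existing code, is deterministic hash bucketing: hashing the prompt type and account keeps each account on one version, and raising the percentage only ever adds accounts. `use_candidate_version` is a hypothetical helper, and keying on account ID is also an assumption.

```python
import hashlib


def use_candidate_version(account_id: int, prompt_type: str, rollout_percent: int) -> bool:
    """Decide deterministically whether an account gets the candidate prompt version.

    The same account always lands in the same bucket for a given prompt_type,
    so raising rollout_percent only adds accounts to the candidate group.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{prompt_type}:{account_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent
```

Including `prompt_type` in the hash keeps buckets independent across prompt types, so the same accounts are not always the first to receive every new version.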

---

### Automated Quality Metrics

**Implement automated checks:**

| Metric | Check | Threshold |
|--------|-------|-----------|
| Word Count Accuracy | `abs(actual - target) / target` | ≤ 0.05 (±5%) |
| HTML Validity | Parse with BeautifulSoup, then check allowed tags and nesting (the parser repairs malformed HTML instead of rejecting it) | 100% valid |
| Keyword Presence | Count keyword mentions | ≥ 3 for primary |
| Structure Compliance | Check H2/H3 hierarchy | Valid structure |
| Cluster Count | Number of clusters | 5-15 |
| Cluster Size | Keywords per cluster | 3-10 |
| No Duplicates | Each keyword assigned to at most one cluster | 100% unique |

**Log results:**
- Track per prompt version
- Identify patterns in failures
- Use for prompt iteration
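
The word count and structure rows, plus the Phase 3 rule against sections that open with a list, can be checked with the standard library alone. The table names BeautifulSoup; this sketch swaps in Python's built-in `html.parser` to stay dependency-free. `check_article` and the check names are hypothetical, and counting heading text as words is a design choice.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h2", "h3", "ul", "ol", "table", "blockquote"}


class _ArticleParser(HTMLParser):
    """Collects visible text and the order of block-level tags."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def check_article(html: str, target_words: int) -> dict[str, bool]:
    """Run the word-count and structure checks on generated article HTML."""
    parser = _ArticleParser()
    parser.feed(html)
    parser.close()
    # Approximate count: headings count as words; joining with spaces may
    # split a word that an inline tag (e.g. <b>) cuts in the middle.
    words = len(" ".join(parser.text).split())
    blocks = parser.blocks
    first_h2 = blocks.index("h2") if "h2" in blocks else len(blocks)
    return {
        "word_count_ok": abs(words - target_words) / target_words <= 0.05,
        # The prompt forbids starting a section with a list or table.
        "no_section_starts_with_list": not any(
            head in ("h2", "h3") and nxt in ("ul", "ol", "table")
            for head, nxt in zip(blocks, blocks[1:])
        ),
        # An H3 before the first H2 would break the H2 > H3 hierarchy.
        "h3_nested_under_h2": "h3" not in blocks[:first_h2],
        # The introduction must open with a paragraph, not a heading.
        "intro_has_no_heading": bool(blocks) and blocks[0] == "p",
    }
```

Each key maps to one row or criterion, so results can be logged per prompt version as described above.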

---

## Model Selection & Optimization

### Current Models

**Location:** `backend/igny8_core/ai/settings.py`

**Default Models per Function:**
- Clustering: GPT-4 (expensive but accurate)
- Ideas: GPT-4 (creative)
- Content: GPT-4 (quality)
- Image Prompts: GPT-3.5-turbo (simpler task)
- Images: DALL-E 3 / Runware

### Optimization Opportunities

**Cost vs Quality Tradeoffs:**

| Function | Current | Alternative | Cost Savings | Quality Impact |
|----------|---------|-------------|--------------|----------------|
| Clustering | GPT-4 | GPT-4-turbo | 50% | Minimal |
| Ideas | GPT-4 | GPT-4-turbo | 50% | Minimal |
| Content | GPT-4 | GPT-4-turbo | 50% | Test required |
| Image Prompts | GPT-3.5 | Keep | - | - |

**Recommendation:** Test GPT-4-turbo for clustering, idea generation, and content generation
- Faster response time
- ~50% cost reduction
- Expected to match GPT-4 quality on structured outputs (to be confirmed with the automated quality metrics)

---

## Success Metrics

- ✅ Word count accuracy: 95%+ within ±5%
- ✅ Clustering quality: No single-keyword clusters
- ✅ Idea generation: Clear hub vs supporting distinction
- ✅ HTML validity: 100%
- ✅ Keyword integration: Natural, not stuffed
- ✅ Image prompt diversity: No duplicates
- ✅ User satisfaction: Fewer manual edits needed
- ✅ Processing time: <10s for 1000-word article
- ✅ Credit cost: 30% reduction with model optimization

---

## Related Files Reference

### Backend
- `backend/igny8_core/ai/prompts.py` - Prompt registry and defaults
- `backend/igny8_core/ai/functions/auto_cluster.py` - Clustering function
- `backend/igny8_core/ai/functions/generate_ideas.py` - Ideas function
- `backend/igny8_core/ai/functions/generate_content.py` - Content function
- `backend/igny8_core/ai/functions/generate_image_prompts.py` - Image prompts
- `backend/igny8_core/ai/settings.py` - Model configuration
- `backend/igny8_core/modules/system/models.py` - AIPrompt model

### Testing
- Create test suite: `backend/igny8_core/ai/tests/test_prompts.py`
- Test fixtures with sample inputs
- Automated quality validation
- Performance benchmarks

---

## Notes

- All prompt changes should be tested on real data first
- Keep old prompts in version history for rollback
- Monitor user feedback on content quality
- Consider user-customizable prompt templates (advanced feature)
- Document prompt engineering best practices for team
- The SAG clustering prompt (mentioned in the original doc) will be handled separately as a specialized architecture