igny8/docs/PRE-LAUNCH/ITEM-3-PROMPT-OPTIMIZATION.md
2025-12-11 07:20:21 +00:00

Item 3: Prompt Improvement and Model Optimization

Priority: High
Target: Production Launch
Last Updated: December 11, 2025


Overview

Redesign and optimize all AI prompts for clustering, idea generation, content generation, and image prompt extraction to achieve:

  • Accurate, consistent outputs
  • Faster processing with optimized token usage
  • Correct word count adherence (500, 1000, 1500 words)
  • Improved clustering quality and idea relevance
  • Better image prompt clarity and relevance

Current Prompt System Architecture

Prompt Registry

Location: backend/igny8_core/ai/prompts.py

Class: PromptRegistry

Hierarchy (resolution order):

  1. Task-level prompt_override (if exists on specific task)
  2. Database prompt from AIPrompt model (account-specific)
  3. Default fallback from PromptRegistry.DEFAULT_PROMPTS

Storage:

  • Default prompts: Hardcoded in prompts.py
  • Account overrides: system_aiprompt database table
  • Task overrides: prompt_override field on task object
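
The resolution order above can be sketched as a three-step lookup. This is a minimal illustration, not the actual PromptRegistry code: `resolve_prompt`, the `Task` dataclass, and the two dictionaries are hypothetical stand-ins for the registry defaults and the system_aiprompt table.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for PromptRegistry.DEFAULT_PROMPTS and the
# system_aiprompt table; the real structures may differ.
DEFAULT_PROMPTS = {"clustering": "default clustering prompt"}
ACCOUNT_PROMPTS = {(42, "clustering"): "account 42 clustering prompt"}


@dataclass
class Task:
    prompt_override: Optional[str] = None


def resolve_prompt(key: str, account_id: Optional[int] = None,
                   task: Optional[Task] = None) -> str:
    """Return the first prompt found: task override, account row, default."""
    # 1. Task-level override wins when it is set and non-empty
    if task is not None and task.prompt_override:
        return task.prompt_override
    # 2. Account-specific prompt stored in the database
    account_prompt = ACCOUNT_PROMPTS.get((account_id, key))
    if account_prompt:
        return account_prompt
    # 3. Hardcoded default
    return DEFAULT_PROMPTS[key]
```

An empty task override falls through to the next level in this sketch, so a blank field never masks the account or default prompt.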

Current Prompts Analysis

1. Clustering Prompt

Function: auto_cluster
File: backend/igny8_core/ai/functions/auto_cluster.py
Prompt Key: 'clustering'

Current Prompt Structure

Approach: Semantic strategist + intent-driven clustering

Key Instructions:

  • Return single JSON with "clusters" array
  • Each cluster: name, description, keywords[]
  • Multi-dimensional grouping (intent, use-case, function, persona, context)
  • Model real search behavior and user journeys
  • Avoid superficial groupings and duplicates
  • 3-10 keywords per cluster

Strengths:

  • Clear JSON output format
  • Detailed grouping logic with dimensions
  • Emphasis on semantic strength over keyword matching
  • User journey modeling (Problem → Solution, General → Specific)

Issues:

  • Very long prompt (400+ tokens), which may confuse the model
  • No examples provided, so the model must guess the formatting
  • No explicit instruction for handling outlier keywords
  • No guidance on cluster count (the number of clusters varies between runs)
  • Description length not constrained

Real-World Performance Issues:

  • Sometimes creates too many small clusters (1-2 keywords each)
  • Inconsistent cluster naming convention
  • Descriptions sometimes generic ("Keywords related to...")

2. Idea Generation Prompt

Function: generate_ideas
File: backend/igny8_core/ai/functions/generate_ideas.py
Prompt Key: 'ideas'

Current Prompt Structure

Approach: SEO-optimized content ideas + outlines

Key Instructions:

  • Input: Clusters + Keywords
  • Output: JSON "ideas" array
  • 1 cluster_hub + 2-4 supporting ideas per cluster
  • Fields: title, description, content_type, content_structure, cluster_id, estimated_word_count, covered_keywords
  • Outline format: intro (hook + 2 paragraphs), 5-8 H2 sections with 2-3 H3s each
  • Content mixing: paragraphs, lists, tables, blockquotes
  • No bullets/lists at start
  • Professional tone, no generic phrasing

Strengths:

  • Detailed outline structure
  • Content mixing guidance (lists, tables, blockquotes)
  • Clear JSON format
  • Tone guidelines

Issues:

  • Very complex prompt (600+ tokens)
  • Outline format too prescriptive (may limit creativity)
  • No examples provided
  • Estimated word count often inaccurate (too high or too low)
  • "Hook" guidance unclear (the prompt does not say what makes a good hook)
  • Content structure validation not enforced

Real-World Performance Issues:

  • Generated ideas sometimes too similar within cluster
  • Outlines don't always respect structure types (e.g., "review" vs "guide")
  • covered_keywords field sometimes empty or incorrect
  • cluster_hub vs supporting ideas distinction unclear

3. Content Generation Prompt

Function: generate_content
File: backend/igny8_core/ai/functions/generate_content.py
Prompt Key: 'content_generation'

Current Prompt Structure

Approach: Editorial content strategist

Key Instructions:

  • Output: JSON {title, content (HTML)}
  • Introduction: 1 italic hook (30-40 words) + 2 paragraphs (50-60 words each), no headings
  • H2 sections: 5-8 total, 250-300 words each
  • Section format: 2 narrative paragraphs → list/table → optional closing paragraph → 2-3 subsections
  • Vary list/table types
  • Never start section with list/table
  • Tone: professional, no passive voice, no generic intros
  • Keyword usage: natural in title, intro, headings

Strengths:

  • Detailed structure guidance
  • Strong tone/style rules
  • HTML output format
  • Keyword integration guidance

Issues:

  • Word count not mentioned in the prompt (critical flaw)
  • No guidance on 500 vs 1000 vs 1500 word versions
  • Hook word count (30-40) and intro paragraph word counts (50-60 × 2) don't scale with the target
  • Section word count (250-300) doesn't adapt to the total target
  • No example output
  • Content structure (article vs guide vs review) not clearly differentiated
  • Table column guidance missing (which columns, and how many)

Real-World Performance Issues:

  • Output length wildly inconsistent (generates 800 words when asked for 1500)
  • Introductions sometimes have headings despite instructions
  • Lists appear at start of sections
  • Table structure unclear (random columns)
  • Doesn't adapt content density to word count

4. Image Prompt Extraction

Function: generate_image_prompts
File: backend/igny8_core/ai/functions/generate_image_prompts.py
Prompt Key: 'image_prompt_extraction'

Current Prompt Structure

Approach: Extract visual descriptions from article

Key Instructions:

  • Input: article title + content
  • Output: JSON {featured_prompt, in_article_prompts[]}
  • Extract featured image (main topic)
  • Extract up to {max_images} in-article images
  • Each prompt detailed for image generation (visual elements, style, mood, composition)

Strengths:

  • Clear structure
  • Separates featured vs in-article
  • Emphasizes detail in descriptions

Issues:

  • No guidance on what makes a good image prompt
  • No style/mood specifications
  • Doesn't specify where in the article to place images
  • No examples
  • "Detailed enough" is subjective

Real-World Performance Issues:

  • Prompts sometimes too generic ("Image of a person using a laptop")
  • No context from article content (extracts irrelevant visuals)
  • Featured image prompt sometimes identical to in-article prompt
  • No guidance on image diversity (all similar)

5. Image Generation Template

Prompt Key: 'image_prompt_template'

Current Template

Approach: Template-based prompt assembly

Format:

Create a high-quality {image_type} image... "{post_title}"... {image_prompt}...
Focus on realistic, well-composed scene... lifestyle/editorial web content...
Avoid text, watermarks, logos... **not blurry.**

Issues:

  • {image_type} not always populated
  • "high-quality" and "not blurry" are redundant and vague
  • No style guidance (photographic, illustration, 3D, etc.)
  • No aspect ratio specification


Required Improvements

A. Clustering Prompt Redesign

Goals

  • Reduce prompt length by 30-40%
  • Add 2-3 concrete examples
  • Enforce consistent cluster count (5-15 clusters ideal)
  • Standardize cluster naming (title case, descriptive)
  • Limit description to 20-30 words

Proposed Structure

Section 1: Role & Task (50 tokens)

  • Clear, concise role definition
  • Task: group keywords into intent-driven clusters

Section 2: Output Format with Example (100 tokens)

  • JSON structure
  • Show 1 complete example cluster
  • Specify exact field requirements

Section 3: Clustering Rules (150 tokens)

  • List 5-7 key rules (bullet format)
  • Keyword-first approach
  • Intent dimensions (brief)
  • Quality thresholds (3-10 keywords per cluster)
  • No duplicates

Section 4: Quality Checklist (50 tokens)

  • Checklist of 4-5 validation points
  • Model self-validates before output

Total: ~350 tokens (vs current ~420, about 17% shorter; the section budgets need further trimming to reach the 30-40% goal)

Example Output Format to Include

{
  "clusters": [
    {
      "name": "Organic Bedding Benefits",
      "description": "Health, eco-friendly, and comfort aspects of organic cotton bedding materials",
      "keywords": ["organic sheets", "eco-friendly bedding", "chemical-free cotton", "hypoallergenic sheets", "sustainable bedding"]
    }
  ]
}
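
A response in this format can be checked mechanically before it is accepted. The sketch below is one possible validator, assuming the limits stated in this document (3-10 keywords per cluster, descriptions of at most 30 words, each keyword in one cluster only); `validate_clusters` is a hypothetical helper, not an existing function in the codebase.

```python
import json


def validate_clusters(raw: str, min_kw: int = 3, max_kw: int = 10,
                      max_desc_words: int = 30) -> list[str]:
    """Return a list of problems found in a clustering response."""
    problems = []
    clusters = json.loads(raw).get("clusters", [])
    seen = {}  # normalized keyword -> name of the first cluster that used it
    for cluster in clusters:
        name = cluster.get("name", "")
        keywords = cluster.get("keywords", [])
        if not min_kw <= len(keywords) <= max_kw:
            problems.append(f"{name}: {len(keywords)} keywords")
        if len(cluster.get("description", "").split()) > max_desc_words:
            problems.append(f"{name}: description too long")
        for kw in keywords:
            key = kw.strip().lower()
            if key in seen:
                problems.append(f"{name}: '{kw}' already in {seen[key]}")
            seen.setdefault(key, name)
    return problems
```

The cluster-count range (5-15) is left out here because it depends on how many keywords were submitted; it could be added as a separate check on `len(clusters)`.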

B. Idea Generation Prompt Redesign

Goals

  • Simplify outline structure (less prescriptive)
  • Add examples of cluster_hub vs supporting ideas
  • Better covered_keywords extraction
  • Adaptive word count estimation
  • Content structure differentiation

Proposed Structure

Section 1: Role & Objective (40 tokens)

  • SEO content strategist
  • Task: generate content ideas from clusters

Section 2: Output Format with Examples (150 tokens)

  • Show 1 cluster_hub example
  • Show 1 supporting idea example
  • Highlight key differences

Section 3: Idea Generation Rules (100 tokens)

  • 1 cluster_hub (comprehensive, authoritative)
  • 2-4 supporting ideas (specific angles)
  • Word count: 1500-2200 for hubs, 1000-1500 for supporting
  • covered_keywords: extract from cluster keywords

Section 4: Outline Guidance (100 tokens)

  • Simplified: Intro + 5-8 sections + Conclusion
  • Section types by content_structure:
    • article: narrative + data
    • guide: step-by-step + tips
    • review: pros/cons + comparison
    • listicle: numbered + categories
    • comparison: side-by-side + verdict

Total: ~390 tokens (vs current ~610)
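
The hub/supporting split and the covered_keywords rule lend themselves to an automated check. The following is a rough sketch under two assumptions that the document does not confirm: that a hub is marked by content_structure == "cluster_hub", and that covered_keywords is a list of strings. `check_ideas` is a hypothetical helper.

```python
from collections import defaultdict


def check_ideas(ideas: list[dict],
                cluster_keywords: dict[int, set[str]]) -> list[str]:
    """Check hub/supporting counts and covered_keywords for each cluster."""
    problems = []
    by_cluster = defaultdict(list)
    for idea in ideas:
        by_cluster[idea["cluster_id"]].append(idea)
    for cluster_id, group in by_cluster.items():
        # Assumption: the hub is identified by its content_structure value
        hubs = [i for i in group if i.get("content_structure") == "cluster_hub"]
        supporting = len(group) - len(hubs)
        if len(hubs) != 1:
            problems.append(f"cluster {cluster_id}: {len(hubs)} hubs")
        if not 2 <= supporting <= 4:
            problems.append(f"cluster {cluster_id}: {supporting} supporting ideas")
        allowed = cluster_keywords.get(cluster_id, set())
        for idea in group:
            covered = set(idea.get("covered_keywords") or [])
            # Must be non-empty and drawn only from the cluster's keywords
            if not covered or not covered <= allowed:
                problems.append(f"{idea['title']}: covered_keywords invalid")
    return problems
```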


C. Content Generation Prompt Redesign

Most Critical Improvement: Word Count Adherence

Goals

  • Primary: Hit the target word count within ±5%
  • Scale structure proportionally to word count
  • Differentiate content structures clearly
  • Improve HTML quality and consistency
  • Better keyword integration

Proposed Adaptive Word Count System

Word Count Targets:

  • 500 words: Short-form (60-word intro + 5 sections × 80 words + 60-word conclusion ≈ 520 words)
  • 1000 words: Standard (120-word intro + 6 sections × 140 words + 60-word conclusion ≈ 1,020 words)
  • 1500 words: Long-form (180-word intro + 7 sections × 180 words + 60-word conclusion = 1,500 words)

Prompt Variable Replacement:

Before sending to AI, calculate:

  • {TARGET_WORD_COUNT} - from task.word_count
  • {INTRO_WORDS} - 60 / 120 / 180 based on target
  • {SECTION_COUNT} - 5 / 6 / 7 based on target
  • {SECTION_WORDS} - 80 / 140 / 180 based on target
  • {HOOK_WORDS} - 25 / 35 / 45 based on target
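
As a sanity check on these budgets, the planned total for each tier can be compared with its target. The sketch below assumes the fixed 60-word conclusion from the Word Count Requirements breakdown and treats the hook as part of the intro budget; `TIERS` and the helper names are illustrative only.

```python
# (intro_words, section_count, section_words, hook_words) per target tier
TIERS = {
    500: (60, 5, 80, 25),
    1000: (120, 6, 140, 35),
    1500: (180, 7, 180, 45),
}
CONCLUSION_WORDS = 60  # assumption: conclusion length is fixed for all tiers


def planned_total(target: int) -> int:
    """Sum the intro, section, and conclusion budgets for a tier."""
    intro, count, words, _hook = TIERS[target]
    return intro + count * words + CONCLUSION_WORDS


def deviation(target: int) -> float:
    """Relative gap between the planned total and the target."""
    return abs(planned_total(target) - target) / target


for target in TIERS:
    print(target, planned_total(target), f"{deviation(target):.0%}")
```

Under these assumptions the planned totals are 520, 1,020, and 1,500 words, so every tier sits inside the ±5% tolerance before the model writes anything.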

Proposed Structure

Section 1: Role & Objective (30 tokens)

You are an editorial content writer. Generate a {TARGET_WORD_COUNT}-word article...

Section 2: Word Count Requirements (80 tokens)

CRITICAL: The content must be within ±5% of {TARGET_WORD_COUNT} words.

Structure breakdown:
- Introduction: {INTRO_WORDS} words total
  - Hook (italic): {HOOK_WORDS} words
  - Paragraphs: 2, splitting the remaining words ({INTRO_WORDS} minus {HOOK_WORDS}) evenly
- Main Sections: {SECTION_COUNT} H2 sections
  - Each section: {SECTION_WORDS} words
- Conclusion: 60 words

Word count validation: Count words in final output and adjust if needed.
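
Because these prompts also contain literal JSON braces, plain str.format() would fail on them. One way to fill the placeholders is a narrow regex substitution, sketched below; `render_prompt` is a hypothetical helper, and it only handles upper-case names such as {TARGET_WORD_COUNT}, which is an assumption about how the new variables will be written.

```python
import re

# Matches {UPPER_CASE_NAME} only, so JSON braces in the prompt are untouched
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def render_prompt(template: str, context: dict) -> str:
    """Replace {NAME} placeholders; fail loudly if any value is missing."""
    missing = set()

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in context:
            missing.add(name)
            return match.group(0)
        return str(context[name])

    rendered = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise KeyError(f"unfilled placeholders: {sorted(missing)}")
    return rendered
```

Raising on a missing value keeps a half-rendered prompt (with a literal {SECTION_WORDS} in it) from ever reaching the model.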

Section 3: Content Flow & HTML (120 tokens)

  • Detailed structure per section
  • HTML tag usage (<h2>, <h3>, <p>, <ul>, <ol>, <table>)
  • Formatting rules

        Section 4: Style & Quality (80 tokens)

        • Tone guidance
        • Keyword usage
        • Avoid generic phrases
        • Examples of good vs bad openings

        Section 5: Content Structure Types (90 tokens)

        • article: {structure description}
        • guide: {structure description}
        • review: {structure description}
        • comparison: {structure description}
        • listicle: {structure description}
        • cluster_hub: {structure description}

        Section 6: Output Format with Example (100 tokens)

        • JSON structure
        • Show abbreviated example with proper HTML

        Total: ~500 tokens (vs current ~550, but much more precise)


        D. Image Prompt Improvements

        Goals

        • Generate visually diverse prompts
        • Better context from article content
        • Specify image placement guidelines
        • Improve prompt detail and clarity

        Proposed Extraction Prompt Structure

        Section 1: Task & Context (50 tokens)

        Extract image prompts from this article for visual content placement.
        
        Article: {title}
        Content: {content}
        Required: 1 featured + {max_images} in-article images
        

        Section 2: Image Types & Guidelines (100 tokens)

        Featured Image:
        - Hero visual representing article's main theme
        - Broad, engaging, high-quality
        - Should work at large sizes (1200×630+)
        
        In-Article Images (place strategically):
        1. After introduction
        2. Mid-article (before major H2 sections)
        3. Supporting specific concepts or examples
        4. Before conclusion
        
        Each prompt must describe:
        - Subject & composition
        - Visual style (photographic, minimal, editorial)
        - Mood & lighting
        - Color palette suggestions
        - Avoid: text, logos, faces (unless relevant)
        

        Section 3: Prompt Quality Rules (80 tokens)

        • Be specific and descriptive (not generic)
        • Include scene details, angles, perspective
        • Specify lighting, time of day if relevant
        • Mention style references
        • Ensure diversity across all images
        • No duplicate concepts
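
The diversity and no-duplicate rules can be approximated with a simple word-overlap check on the extracted prompts. This is a rough sketch using Jaccard similarity on word sets; the 0.6 threshold and the `near_duplicates` helper are assumptions for illustration, and a production check might use embeddings instead.

```python
import re
from itertools import combinations


def token_set(prompt: str) -> set[str]:
    """Lower-case word set of a prompt."""
    return set(re.findall(r"[a-z0-9]+", prompt.lower()))


def near_duplicates(prompts: list[str], threshold: float = 0.6) -> list[tuple[int, int]]:
    """Return index pairs whose word sets overlap at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(prompts), 2):
        ta, tb = token_set(a), token_set(b)
        union = ta | tb
        if union and len(ta & tb) / len(union) >= threshold:
            pairs.append((i, j))
    return pairs
```

Running this over the featured prompt plus all in-article prompts would also catch the case where the featured prompt repeats an in-article one.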

        Section 4: Output Format (50 tokens)

        • JSON structure
        • Show example with good vs bad prompts

        Proposed Template Prompt Improvement

        Replace current template with:

        A {style} photograph for "{post_title}". {image_prompt}. 
        Composition: {composition_hint}. Lighting: {lighting_hint}. 
        Mood: {mood}. Style: clean, modern, editorial web content. 
        No text, watermarks, or logos.
        

        Where:

        • {style} - photographic, minimalist, lifestyle, etc.
        • {composition_hint} - center-framed, rule-of-thirds, wide-angle, etc.
        • {lighting_hint} - natural daylight, soft indoor, dramatic, etc.
        • {mood} - professional, warm, energetic, calm, etc.
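
A small helper can fill this template and fall back to defaults when a hint is missing, which avoids the unpopulated-variable problem seen with {image_type}. The default values and the `build_image_prompt` name below are assumptions chosen for illustration.

```python
TEMPLATE = (
    'A {style} photograph for "{post_title}". {image_prompt}. '
    "Composition: {composition_hint}. Lighting: {lighting_hint}. "
    "Mood: {mood}. Style: clean, modern, editorial web content. "
    "No text, watermarks, or logos."
)

# Assumed fallbacks so that no placeholder is ever left empty
DEFAULT_HINTS = {
    "style": "lifestyle",
    "composition_hint": "rule-of-thirds",
    "lighting_hint": "natural daylight",
    "mood": "professional",
}


def build_image_prompt(post_title: str, image_prompt: str, **hints: str) -> str:
    """Fill the template, using defaults for hints that are absent or empty."""
    unknown = set(hints) - set(DEFAULT_HINTS)
    if unknown:
        raise ValueError(f"unknown hints: {sorted(unknown)}")
    values = {**DEFAULT_HINTS, **{k: v for k, v in hints.items() if v}}
    return TEMPLATE.format(
        post_title=post_title,
        # Drop a trailing period so the template does not produce ".."
        image_prompt=image_prompt.strip().rstrip("."),
        **values,
    )
```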

        Implementation Plan

        Phase 1: Clustering Prompt (Week 1)

        Tasks:

        1. Draft new clustering prompt with examples
        2. Test with sample keyword sets (20, 50, 100 keywords)
        3. Compare outputs: old vs new
        4. Validate cluster quality (manual review)
        5. Update PromptRegistry.DEFAULT_PROMPTS['clustering']
        6. Deploy and monitor

        Success Criteria:

        • Consistent cluster count (5-15)
        • No single-keyword clusters
        • Clear, descriptive names
        • Concise descriptions (20-30 words)
        • 95%+ of keywords clustered

        Phase 2: Idea Generation Prompt (Week 1-2)

        Tasks:

        1. Draft new ideas prompt with examples
        2. Test with 5-10 clusters
        3. Validate cluster_hub vs supporting idea distinction
        4. Check covered_keywords accuracy
        5. Verify content_structure alignment
        6. Update PromptRegistry.DEFAULT_PROMPTS['ideas']
        7. Deploy and monitor

        Success Criteria:

        • Clear distinction between hub and supporting ideas
        • Accurate covered_keywords extraction
        • Appropriate word count estimates
        • Outlines match content_structure type
        • No duplicate ideas within cluster

        Phase 3: Content Generation Prompt (Week 2)

        Tasks:

        1. Draft new content prompt with word count logic
        2. Implement dynamic variable replacement in build_prompt()
        3. Test with 500, 1000, 1500 word targets
        4. Validate actual word counts (automated counting)
        5. Test all content_structure types
        6. Verify HTML quality and consistency
        7. Update PromptRegistry.DEFAULT_PROMPTS['content_generation']
        8. Deploy and monitor

        Code Change Required:

        File: backend/igny8_core/ai/functions/generate_content.py

        Method: build_prompt()

        Add word count calculation:

        from typing import Any

        def build_prompt(self, data: Any, account=None) -> str:
            """Build the content prompt with word count parameters scaled to the task."""
            # Accept either a single task or a list of tasks (use the first)
            task = data[0] if isinstance(data, list) else data

            # Fall back to the standard tier when the task has no word count
            target_words = task.word_count or 1000

            # Pick the tier: short-form (500), standard (1000), long-form (1500)
            if target_words <= 600:
                intro_words = 60
                section_count = 5
                section_words = 80
                hook_words = 25
            elif target_words <= 1200:
                intro_words = 120
                section_count = 6
                section_words = 140
                hook_words = 35
            else:
                intro_words = 180
                section_count = 7
                section_words = 180
                hook_words = 45

            # Get prompt and replace variables
            prompt = PromptRegistry.get_prompt(
                function_name='generate_content',
                account=account,
                task=task,
                context={
                    'TARGET_WORD_COUNT': target_words,
                    'INTRO_WORDS': intro_words,
                    'SECTION_COUNT': section_count,
                    'SECTION_WORDS': section_words,
                    'HOOK_WORDS': hook_words,
                    # ... existing context
                }
            )

            return prompt
        

        Success Criteria:

        • 95%+ of generated content within ±5% of target word count
        • HTML structure consistent
        • Content structure types clearly differentiated
        • Keyword integration natural
        • No sections starting with lists

        Phase 4: Image Prompt Improvements (Week 2-3)

        Tasks:

        1. Draft new extraction prompt with placement guidelines
        2. Draft new template prompt with style variables
        3. Test with 10 sample articles
        4. Validate image diversity and relevance
        5. Update both prompts in registry
        6. Update GenerateImagePromptsFunction to use new template
        7. Deploy and monitor

        Success Criteria:

        • No duplicate image concepts in same article
        • Prompts are specific and detailed
        • Featured image distinct from in-article images
        • Image placement logically distributed
        • Generated images relevant to content

        Prompt Versioning & Testing

        Version Control

        Recommendation: Store prompt versions in database for A/B testing

        Schema:

        from django.db import models

        class AIPromptVersion(models.Model):
            # max_length=50 is an assumed value; CharField requires one
            prompt_type = models.CharField(max_length=50, choices=PROMPT_TYPE_CHOICES)
            version = models.IntegerField()
            prompt_value = models.TextField()
            is_active = models.BooleanField(default=False)
            created_at = models.DateTimeField(auto_now_add=True)
            performance_metrics = models.JSONField(default=dict)  # Track success rates
        

        Process:

        1. Test new prompt version alongside current
        2. Compare outputs on same inputs
        3. Measure quality metrics (manual + automated)
        4. Gradually roll out if better
        5. Keep old version as fallback
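
For the gradual rollout step, one common approach is to route a fixed share of accounts to the candidate version using a stable hash, so each account always sees the same version between runs. The sketch below is illustrative only; `pick_version` and its parameters are hypothetical, and the document does not prescribe this mechanism.

```python
import hashlib


def pick_version(account_id: int, prompt_type: str, candidate_share: float) -> str:
    """Deterministically route a share of accounts to the candidate prompt."""
    digest = hashlib.sha256(f"{prompt_type}:{account_id}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "candidate" if bucket < candidate_share else "current"
```

Raising candidate_share from 0.1 towards 1.0 widens the rollout without reshuffling accounts that are already on the candidate, and setting it back to 0 is the fallback to the old version.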

        Automated Quality Metrics

        Implement automated checks:

        | Metric               | Check                         | Threshold        |
        |----------------------|-------------------------------|------------------|
        | Word Count Accuracy  | abs(actual - target) / target | < 0.05 (±5%)     |
        | HTML Validity        | Parse with BeautifulSoup      | 100% valid       |
        | Keyword Presence     | Count keyword mentions        | ≥ 3 for primary  |
        | Structure Compliance | Check H2/H3 hierarchy         | Valid structure  |
        | Cluster Count        | Number of clusters            | 5-15             |
        | Cluster Size         | Keywords per cluster          | 3-10             |
        | No Duplicates        | Keyword appears once          | 100% unique      |
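
The word count and structure checks can be prototyped without extra dependencies. The sketch below uses the standard library's html.parser in place of BeautifulSoup to keep it self-contained; `ArticleStats` and `check_article` are hypothetical names, and the word count here includes heading text, which may or may not match how the target is meant to be measured.

```python
from html.parser import HTMLParser


class ArticleStats(HTMLParser):
    """Collect word count, H2 count, and sections that open with a list or table."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.h2_count = 0
        self.bad_section_starts = 0
        self._after_h2 = False  # True until the first block element after </h2>

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.h2_count += 1
        elif self._after_h2 and tag in {"p", "ul", "ol", "table", "h3"}:
            if tag in {"ul", "ol", "table"}:
                self.bad_section_starts += 1
            self._after_h2 = False

    def handle_endtag(self, tag):
        if tag == "h2":
            self._after_h2 = True

    def handle_data(self, data):
        self.words += len(data.split())


def check_article(html: str, target: int, tolerance: float = 0.05) -> dict:
    """Apply the word count and section-start checks to generated HTML."""
    stats = ArticleStats()
    stats.feed(html)
    stats.close()
    return {
        "words": stats.words,
        "within_tolerance": abs(stats.words - target) / target < tolerance,
        "h2_sections": stats.h2_count,
        "sections_starting_with_list": stats.bad_section_starts,
    }
```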

        Log results:

        • Track per prompt version
        • Identify patterns in failures
        • Use for prompt iteration

        Model Selection & Optimization

        Current Models

        Location: backend/igny8_core/ai/settings.py

        Default Models per Function:

        • Clustering: GPT-4 (expensive but accurate)
        • Ideas: GPT-4 (creative)
        • Content: GPT-4 (quality)
        • Image Prompts: GPT-3.5-turbo (simpler task)
        • Images: DALL-E 3 / Runware

        Optimization Opportunities

        Cost vs Quality Tradeoffs:

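        | Function      | Current | Alternative | Cost Savings | Quality Impact |
        |---------------|---------|-------------|--------------|----------------|
        | Clustering    | GPT-4   | GPT-4-turbo | 50%          | Minimal        |
        | Ideas         | GPT-4   | GPT-4-turbo | 50%          | Minimal        |
        | Content       | GPT-4   | GPT-4-turbo | 50%          | Test required  |
        | Image Prompts | GPT-3.5 | Keep        | -            | -              |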

        Recommendation: Test GPT-4-turbo for all text generation tasks

        • Faster response time
        • 50% cost reduction
        • Similar quality for structured outputs

        Success Metrics

        • Word count accuracy: 95%+ within ±5%
        • Clustering quality: No single-keyword clusters
        • Idea generation: Clear hub vs supporting distinction
        • HTML validity: 100%
        • Keyword integration: Natural, not stuffed
        • Image prompt diversity: No duplicates
        • User satisfaction: Fewer manual edits needed
        • Processing time: <10s for 1000-word article
        • Credit cost: 30% reduction with model optimization

        Backend

        • backend/igny8_core/ai/prompts.py - Prompt registry and defaults
        • backend/igny8_core/ai/functions/auto_cluster.py - Clustering function
        • backend/igny8_core/ai/functions/generate_ideas.py - Ideas function
        • backend/igny8_core/ai/functions/generate_content.py - Content function
        • backend/igny8_core/ai/functions/generate_image_prompts.py - Image prompts
        • backend/igny8_core/ai/settings.py - Model configuration
        • backend/igny8_core/modules/system/models.py - AIPrompt model

        Testing

        • Create test suite: backend/igny8_core/ai/tests/test_prompts.py
        • Test fixtures with sample inputs
        • Automated quality validation
        • Performance benchmarks

        Notes

        • All prompt changes should be tested on real data first
        • Keep old prompts in version history for rollback
        • Monitor user feedback on content quality
        • Consider user-customizable prompt templates (advanced feature)
        • Document prompt engineering best practices for team
        • SAG clustering prompt (mentioned in original doc) to be handled separately as specialized architecture