# Item 3: Prompt Improvement and Model Optimization

**Priority:** High

**Target:** Production Launch

**Last Updated:** December 11, 2025

---

## Overview

Redesign and optimize all AI prompts for clustering, idea generation, content generation, and image prompt extraction to achieve:

- High accuracy and consistent outputs
- Faster processing through optimized token usage
- Correct word count adherence (500, 1000, and 1500 words)
- Improved clustering quality and idea relevance
- Clearer, more relevant image prompts

---

## Current Prompt System Architecture

### Prompt Registry

**Location:** `backend/igny8_core/ai/prompts.py`

**Class:** `PromptRegistry`

**Hierarchy** (resolution order):

1. Task-level `prompt_override` (if set on the specific task)
2. Database prompt from the `AIPrompt` model (account-specific)
3. Default fallback from `PromptRegistry.DEFAULT_PROMPTS`

**Storage:**

- Default prompts: hardcoded in `prompts.py`
- Account overrides: `system_aiprompt` database table
- Task overrides: `prompt_override` field on the task object

---

## Current Prompts Analysis

### 1. Clustering Prompt

**Function:** `auto_cluster`

**File:** `backend/igny8_core/ai/functions/auto_cluster.py`

**Prompt Key:** `'clustering'`

#### Current Prompt Structure

**Approach:** Semantic strategist + intent-driven clustering

**Key Instructions:**

- Return a single JSON object with a "clusters" array
- Each cluster: name, description, keywords[]
- Multi-dimensional grouping (intent, use case, function, persona, context)
- Model real search behavior and user journeys
- Avoid superficial groupings and duplicates
- 3-10 keywords per cluster

**Strengths:**

- ✅ Clear JSON output format
- ✅ Detailed grouping logic with dimensions
- ✅ Emphasis on semantic strength over keyword matching
- ✅ User journey modeling (Problem → Solution, General → Specific)

**Issues:**

- ❌ Very long prompt (~400+ tokens), which may confuse the model
- ❌ No examples provided, so the model must guess the formatting
- ❌ Doesn't explicitly specify how to handle outlier keywords
- ❌ No guidance on cluster count (the number of clusters varies between runs)
- ❌ Description length not constrained

**Real-World Performance Issues:**

- Sometimes creates too many small clusters (1-2 keywords each)
- Inconsistent cluster naming conventions
- Descriptions are sometimes generic ("Keywords related to...")

---

### 2. Idea Generation Prompt

**Function:** `generate_ideas`

**File:** `backend/igny8_core/ai/functions/generate_ideas.py`

**Prompt Key:** `'ideas'`

#### Current Prompt Structure

**Approach:** SEO-optimized content ideas + outlines

**Key Instructions:**

- Input: clusters + keywords
- Output: JSON "ideas" array
- 1 cluster_hub + 2-4 supporting ideas per cluster
- Fields: title, description, content_type, content_structure, cluster_id, estimated_word_count, covered_keywords
- Outline format: intro (hook + 2 paragraphs), 5-8 H2 sections with 2-3 H3s each
- Content mixing: paragraphs, lists, tables, blockquotes
- No bullets or lists at the start
- Professional tone, no generic phrasing

**Strengths:**

- ✅ Detailed outline structure
- ✅ Content mixing guidance (lists, tables, blockquotes)
- ✅ Clear JSON format
- ✅ Tone guidelines

**Issues:**

- ❌ Very complex prompt (600+ tokens)
- ❌ Outline format too prescriptive (may limit creativity)
- ❌ No examples provided
- ❌ Estimated word count often inaccurate (too high or too low)
- ❌ "Hook" guidance unclear (what makes a good hook?)
- ❌ Content structure validation not enforced

**Real-World Performance Issues:**

- Generated ideas within a cluster are sometimes too similar
- Outlines don't always respect structure types (e.g., "review" vs "guide")
- covered_keywords field sometimes empty or incorrect
- Distinction between cluster_hub and supporting ideas is unclear

---

### 3. Content Generation Prompt

**Function:** `generate_content`

**File:** `backend/igny8_core/ai/functions/generate_content.py`

**Prompt Key:** `'content_generation'`

#### Current Prompt Structure

**Approach:** Editorial content strategist

**Key Instructions:**

- Output: JSON {title, content (HTML)}
- Introduction: 1 italic hook (30-40 words) + 2 paragraphs (50-60 words each), no headings
- H2 sections: 5-8 total, 250-300 words each
- Section format: 2 narrative paragraphs → list/table → optional closing paragraph → 2-3 subsections
- Vary list/table types
- Never start a section with a list or table
- Tone: professional, no passive voice, no generic intros
- Keyword usage: natural placement in title, intro, and headings

**Strengths:**

- ✅ Detailed structure guidance
- ✅ Strong tone/style rules
- ✅ HTML output format
- ✅ Keyword integration guidance

**Issues:**

- ❌ **Word count not mentioned in prompt** (critical flaw)
- ❌ No guidance on 500 vs 1000 vs 1500 word versions
- ❌ Hook (30-40 words) and paragraph (50-60 × 2) word counts don't scale with the target length
- ❌ Section word count (250-300) doesn't adapt to the total target; 5-8 sections at that length alone come to 1,250-2,400 words, far more than a 500- or 1000-word article allows
- ❌ No example output
- ❌ Content structures (article vs guide vs review) not clearly differentiated
- ❌ Table column guidance missing (which columns? how many?)

**Real-World Performance Issues:**

- **Output length wildly inconsistent** (e.g., 800 words generated when 1500 were requested)
- Introductions sometimes include headings despite instructions
- Lists appear at the start of sections
- Table structure unclear (random columns)
- Content density doesn't adapt to the word count

---

### 4. Image Prompt Extraction

**Function:** `generate_image_prompts`

**File:** `backend/igny8_core/ai/functions/generate_image_prompts.py`

**Prompt Key:** `'image_prompt_extraction'`

#### Current Prompt Structure

**Approach:** Extract visual descriptions from the article

**Key Instructions:**

- Input: article title + content
- Output: JSON {featured_prompt, in_article_prompts[]}
- Extract a featured image prompt (main topic)
- Extract up to {max_images} in-article image prompts
- Each prompt detailed enough for image generation (visual elements, style, mood, composition)

**Strengths:**

- ✅ Clear structure
- ✅ Separates featured vs in-article images
- ✅ Emphasizes detail in descriptions

**Issues:**

- ❌ No guidance on what makes a good image prompt
- ❌ No style/mood specifications
- ❌ Doesn't specify where in the article images should be placed
- ❌ No examples
- ❌ "Detailed enough" is subjective

**Real-World Performance Issues:**

- Prompts sometimes too generic ("Image of a person using a laptop")
- Prompts lack context from the article content (irrelevant visuals are extracted)
- Featured image prompt sometimes identical to an in-article prompt
- No guidance on image diversity (prompts are often all similar)

---

### 5. Image Generation Template

**Prompt Key:** `'image_prompt_template'`

#### Current Template

**Approach:** Template-based prompt assembly

**Format:**

```
Create a high-quality {image_type} image... "{post_title}"...
{image_prompt}...
Focus on realistic, well-composed scene... lifestyle/editorial web content...
Avoid text, watermarks, logos... **not blurry.**
```

**Issues:**

- ❌ {image_type} not always populated
- ❌ "high-quality" and "not blurry" are redundant and vague
- ❌ No style guidance (photographic, illustration, 3D, etc.)
- ❌ No aspect ratio specification

---

## Required Improvements

### A. Clustering Prompt Redesign

#### Goals

- Reduce prompt length by 30-40%
- Add 2-3 concrete examples
- Enforce a consistent cluster count (5-15 clusters ideal)
- Standardize cluster naming (title case, descriptive)
- Limit descriptions to 20-30 words

#### Proposed Structure

**Section 1: Role & Task** (50 tokens)

- Clear, concise role definition
- Task: group keywords into intent-driven clusters

**Section 2: Output Format with Example** (100 tokens)

- JSON structure
- Show 1 complete example cluster
- Specify exact field requirements

**Section 3: Clustering Rules** (150 tokens)

- List 5-7 key rules (bullet format)
- Keyword-first approach
- Intent dimensions (brief)
- Quality thresholds (3-10 keywords per cluster)
- No duplicates

**Section 4: Quality Checklist** (50 tokens)

- Checklist of 4-5 validation points
- Model self-validates before output

**Total:** ~350 tokens (vs. current ~420)

#### Example Output Format to Include

```json
{
  "clusters": [
    {
      "name": "Organic Bedding Benefits",
      "description": "Health, eco-friendly, and comfort aspects of organic cotton bedding materials",
      "keywords": [
        "organic sheets",
        "eco-friendly bedding",
        "chemical-free cotton",
        "hypoallergenic sheets",
        "sustainable bedding"
      ]
    }
  ]
}
```

---

### B. Idea Generation Prompt Redesign

#### Goals

- Simplify outline structure (less prescriptive)
- Add examples of cluster_hub vs supporting ideas
- Better covered_keywords extraction
- Adaptive word count estimation
- Content structure differentiation

#### Proposed Structure

**Section 1: Role & Objective** (40 tokens)

- SEO content strategist
- Task: generate content ideas from clusters

**Section 2: Output Format with Examples** (150 tokens)

- Show 1 cluster_hub example
- Show 1 supporting idea example
- Highlight key differences

**Section 3: Idea Generation Rules** (100 tokens)

- 1 cluster_hub (comprehensive, authoritative)
- 2-4 supporting ideas (specific angles)
- Word count: 1500-2200 for hubs, 1000-1500 for supporting ideas
- covered_keywords: extract from cluster keywords

**Section 4: Outline Guidance** (100 tokens)

- Simplified: Intro + 5-8 sections + Conclusion
- Section types by content_structure:
  - article: narrative + data
  - guide: step-by-step + tips
  - review: pros/cons + comparison
  - listicle: numbered + categories
  - comparison: side-by-side + verdict

**Total:** ~390 tokens (vs. current ~610)

---

### C. Content Generation Prompt Redesign

**Most Critical Improvement:** Word Count Adherence

#### Goals

- **Primary:** Hit the target word count within a ±5% tolerance
- Scale structure proportionally to word count
- Differentiate content structures clearly
- Improve HTML quality and consistency
- Better keyword integration

#### Proposed Adaptive Word Count System

**Word Count Targets:**

- 500 words: Short-form (intro 60 words + 5 sections × 80 words + conclusion 60 words = 520 words)
- 1000 words: Standard (intro 120 words + 6 sections × 140 words + conclusion 60 words = 1020 words)
- 1500 words: Long-form (intro 180 words + 7 sections × 180 words + conclusion 60 words = 1500 words)

**Prompt Variable Replacement:**

Before sending the prompt to the AI, calculate:

- `{TARGET_WORD_COUNT}`: from `task.word_count`
- `{INTRO_WORDS}`: 60 / 120 / 180 based on target
- `{SECTION_COUNT}`: 5 / 6 / 7 based on target
- `{SECTION_WORDS}`: 80 / 140 / 180 based on target
- `{HOOK_WORDS}`: 25 / 35 / 45 based on target

#### Proposed Structure

**Section 1: Role & Objective** (30 tokens)

```
You are an editorial content writer. Generate a {TARGET_WORD_COUNT}-word article...
```

**Section 2: Word Count Requirements** (80 tokens)

```
CRITICAL: The content must be exactly {TARGET_WORD_COUNT} words (±5% tolerance).

Structure breakdown:
- Introduction: {INTRO_WORDS} words total
  - Hook (italic): {HOOK_WORDS} words
  - Paragraphs: 2 × ~{(INTRO_WORDS - HOOK_WORDS)/2} words each
- Main Sections: {SECTION_COUNT} H2 sections
  - Each section: {SECTION_WORDS} words
- Conclusion: 60 words

Word count validation: Count words in final output and adjust if needed.
```

**Section 3: Content Flow & HTML** (120 tokens)

- Detailed structure per section
- HTML tag usage (
,