NAVIGATION_REFACTOR COMPLETED

2026-01-17 03:49:50 +00:00
parent 47a00e8875
commit 501a269450
29 changed files with 3839 additions and 2103 deletions
--- a/docs/plans/implemented/PROMPT_ALIGNMENT_SUGGESTIONS.md
+++ b/docs/plans/implemented/PROMPT_ALIGNMENT_SUGGESTIONS.md
@@ -0,0 +1,425 @@
+# AI Prompt Alignment Suggestions
+
+**Date:** January 15, 2026
+
+## 🚨 CRITICAL FINDING: Data Loss Between Idea Generation & Content Generation
+
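+**The Problem:** The idea generation AI creates detailed outlines with 6-10 H2 sections, but this outline structure is **never stored in the database**. Only basic fields (title, description overview, keywords, estimated word count) are saved. Of those, only the title and description reach the content prompt, alongside the task's content type and structure. When content generation runs, it has NO ACCESS to: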
+- The planned section count (6? 8? 10?)
+- The section outline structure (h2_topic, coverage details)
+- The primary focus keywords
+- The covered keywords
+- The target word count
+
+**Result:** The content generator uses a fixed template (6 sections, 1000-1200 words) that conflicts with the variable plan produced by the ideas generator (6-10 sections, 1200-1800 words).
+
+**Solution:** Either add a JSONField to store the complete idea structure, OR update the content prompt to work with limited information and pass available keyword/word count data.
+
+---
+
+## Executive Summary
+
+After analyzing the current **Ideas Generation** and **Content Generation** prompts from the database, I've identified key areas where these prompts need better alignment to ensure consistency in content output.
+
+---
+
+## Current State Analysis
+
+### Ideas Generation Prompt
+- Generates 3-7 content ideas per cluster
+- Defines 6-10 H2 sections per idea
+- Targets 1-2 primary focus keywords + 2-3 covered keywords (3-5 total)
+- AI-determined word count based on sections/keywords
+- Emphasizes completely different keywords per idea
+- Outputs strategic outline only (no detailed H3/formatting)
+
+### Content Generation Prompt
+- Targets 1000-1200 words
+- Requires exactly 6 H2 sections
+- Has rigid section format requirements (2 paragraphs, 2 lists, 1 table)
+- Detailed HTML structure specifications
+- Strict word count per paragraph (50-80 words)
+- Includes specific formatting rules for lists and tables
+
+---
+
+## Key Inconsistencies Identified
+
+### 1. **Section Count Mismatch**
+- **Ideas Prompt:** 6-10 H2 sections (variable, AI-determined)
+- **Content Prompt:** Exactly 6 H2 sections (fixed)
+- **Issue:** Content generator cannot accommodate ideas with 7-10 sections
+
+### 2. **Word Count Flexibility**
+- **Ideas Prompt:** AI-determined based on topic complexity (typically 1200-1800 words)
+- **Content Prompt:** Fixed 1000-1200 words
+- **Issue:** Complex topics with 8-10 sections cannot fit in 1000-1200 words
+
+### 3. **Format Variety vs. Fixed Pattern**
+- **Ideas Prompt:** No formatting specifications (lets content generator decide)
+- **Content Prompt:** Rigid format (2 paragraphs, 2 lists, 1 table distributed)
+- **Issue:** Some topics need more lists/tables, others need more narrative
+
+### 4. **Keyword Coverage Alignment**
+- **Ideas Prompt:** 3-5 keywords total (1-2 primary + 2-3 covered)
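+- **Content Prompt:** Primary keyword (inferred from the title) + 3-4 secondary keywords from the keyword list
+- **Alignment:** Broadly compatible, but the instructions need to say clearly that the keywords come from the idea instead of being inferred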
+
+---
+
+## Suggested Changes to Content Generation Prompt
+
+### Change 1: Dynamic Section Count
+**Current:**
+```
+### 1. WORD COUNT: 1000-1200 words target
+- Write 6 H2 sections
+```
+
+**Suggested:**
+```
+### 1. WORD COUNT AND SECTIONS
+
+**Use the section count from the provided outline:**
+- The outline specifies the number of H2 sections to write
+- Typically 6-10 H2 sections based on topic complexity
+- Write ALL sections from the outline
+
+**Word count calculation:**
+- Base: 150-180 words per H2 section
+- Introduction: 100-150 words
+- Total = (Number of H2 sections × 170) + 125
+- Example: 6 sections = ~1,145 words | 8 sections = ~1,485 words | 10 sections = ~1,825 words
+```
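+
+The word-count rule above is simple enough to compute in code before the prompt is sent, which also makes it easy to test. A minimal sketch, assuming the formula's 170 words per section and 125-word intro; the function names are hypothetical, and the inverse helper is only one possible way to choose a section count when nothing but a target word count is known.
+
+```python
+WORDS_PER_SECTION = 170  # per-H2 figure used in the formula (within 150-180)
+INTRO_WORDS = 125  # midpoint of the 100-150 word introduction
+
+
+def estimate_word_count(num_sections: int) -> int:
+    """Total = (number of H2 sections x 170) + 125."""
+    return num_sections * WORDS_PER_SECTION + INTRO_WORDS
+
+
+def sections_for_word_count(target_words: int, low: int = 6, high: int = 10) -> int:
+    """Invert the formula, then clamp to the 6-10 section range."""
+    raw = round((target_words - INTRO_WORDS) / WORDS_PER_SECTION)
+    return max(low, min(high, raw))
+
+
+for n in (6, 8, 10):
+    print(n, estimate_word_count(n))  # 1145, 1485, 1825 as in the example line
+```
+
+Going the other way, a 1,500-word target maps back to 8 sections, and out-of-range targets are clamped to 6 or 10.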
+
+### Change 2: Flexible Format Distribution
+**Current:**
+```
+### 2. SECTION FORMAT VARIETY
+**For 6 H2 sections, distribute as:**
+- 2 sections: Paragraphs ONLY 
+- 2 section: Paragraphs + Lists
+- 1 section: Paragraphs + Tables
+```
+
+**Suggested:**
+```
+### 2. SECTION FORMAT VARIETY
+
+**Format distribution (scales with section count):**
+
+**For 6-7 sections:**
+- 3-4 sections: Paragraphs ONLY
+- 2 sections: Paragraphs + Lists
+- 1 section: Paragraphs + Tables
+
+**For 8-9 sections:**
+- 4-5 sections: Paragraphs ONLY
+- 2-3 sections: Paragraphs + Lists
+- 1-2 sections: Paragraphs + Tables
+
+**For 10+ sections:**
+- 5-6 sections: Paragraphs ONLY
+- 3 sections: Paragraphs + Lists
+- 2 sections: Paragraphs + Tables
+
+**Rules (apply to all counts):**
+- Randomize which sections get which format
+- Never use same pattern for consecutive sections
+- Lists: 4-5 items, 15-20 words each
+- Tables: 4-5 columns, 5-6 rows with real data
+- Use block quotes randomly in non-table sections
+```
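+
+The tiers and rules above can be sketched as a helper that picks per-format counts and then orders them randomly without repeating a format in consecutive sections. A minimal sketch under stated assumptions: the function names are hypothetical, it only covers the 6-10 sections the ideas prompt emits, and within each tier it fixes one choice (2 lists + 1 table for 6-7 sections, 2 + 2 for 8-9, 3 + 2 for 10). Writing it out exposes one constraint: the no-consecutive rule caps Paragraphs ONLY at half the sections rounded up, so 6 sections allow 3 and 8 sections allow 4, not the upper ends of those tiers.
+
+```python
+from __future__ import annotations
+
+import math
+import random
+
+FORMATS = ("paragraphs", "lists", "tables")
+
+
+def format_counts(num_sections: int) -> dict[str, int]:
+    """Pick per-format section counts inside the suggested tiers (6-10 sections)."""
+    if not 6 <= num_sections <= 10:
+        raise ValueError("expected 6-10 H2 sections from the ideas outline")
+    if num_sections <= 7:
+        lists, tables = 2, 1
+    elif num_sections <= 9:
+        lists, tables = 2, 2
+    else:
+        lists, tables = 3, 2
+    counts = {"paragraphs": num_sections - lists - tables,
+              "lists": lists, "tables": tables}
+    # "No consecutive repeats" is only satisfiable if no format exceeds ceil(n/2).
+    assert max(counts.values()) <= math.ceil(num_sections / 2)
+    return counts
+
+
+def assign_formats(num_sections: int, rng: random.Random | None = None) -> list[str]:
+    """Randomly order formats so that no two consecutive sections share one."""
+    if rng is None:
+        rng = random.Random()
+    remaining = format_counts(num_sections)
+    order: list[str] = []
+
+    def place(prev: str | None) -> bool:
+        if len(order) == num_sections:
+            return True
+        options = [f for f in FORMATS if remaining[f] > 0 and f != prev]
+        rng.shuffle(options)
+        for fmt in options:
+            remaining[fmt] -= 1
+            order.append(fmt)
+            if place(fmt):
+                return True
+            order.pop()  # dead end: undo and try the next format
+            remaining[fmt] += 1
+        return False
+
+    if not place(None):
+        raise RuntimeError("no valid arrangement for these counts")
+    return order
+
+
+print(assign_formats(8, random.Random(7)))
+```
+
+Backtracking keeps the result random while guaranteeing the no-consecutive rule whenever the counts allow it; with at most 10 sections the search stays tiny.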
+
+### Change 3: Input Structure Alignment - CRITICAL FINDING
+
+**What's Currently Output in [IGNY8_IDEA]:**
+
+Based on code analysis (`backend/igny8_core/ai/functions/generate_content.py`), here's what's actually being passed:
+
+```python
+# From generate_content.py build_prompt():
+idea_data = f"Title: {task.title or 'Untitled'}\n"
+if task.description:
+    idea_data += f"Description: {task.description}\n"
+idea_data += f"Content Type: {task.content_type or 'post'}\n"
+idea_data += f"Content Structure: {task.content_structure or 'article'}\n"
+```
+
+**Current Output Format (Plain Text):**
+```
+Title: How to Build an Email List from Scratch
+Description: This guide covers the fundamentals of list building...
+Content Type: post
+Content Structure: guide
+```
+
+**What's Available But NOT Being Passed:**
+
+The ContentIdeas model has these fields:
+- ✅ `primary_focus_keywords` (CharField - "email list building")
+- ✅ `target_keywords` (CharField - "subscriber acquisition, lead magnets")
+- ✅ `estimated_word_count` (IntegerField - 1500)
+- ✅ `content_type` (CharField - "post")
+- ✅ `content_structure` (CharField - "guide")
+
+But the outline structure (intro_focus, main_sections array) is **NOT stored anywhere**:
+- ❌ No outline JSON stored in ContentIdeas model
+- ❌ No outline JSON stored in Tasks model
+- ❌ The AI generates the outline but it's only in the API response, never persisted
+
+**The Root Problem:**
+
+1. **Ideas Generator outputs** full JSON with outline:
+```json
+{
+  "title": "...",
+  "description": {
+    "overview": "...",
+    "outline": {
+      "intro_focus": "...",
+      "main_sections": [
+        {"h2_topic": "...", "coverage": "..."},
+        {"h2_topic": "...", "coverage": "..."},
+        ...6-10 sections...
+      ]
+    }
+  },
+  "primary_focus_keywords": "...",
+  "covered_keywords": "...",
+  "estimated_word_count": ...
+}
+```
+
+2. **Only these get saved** to ContentIdeas:
+   - `idea_title` = title
+   - `description` = description.overview (NOT the outline!)
+   - `primary_focus_keywords` = primary_focus_keywords
+   - `target_keywords` = covered_keywords
+   - `estimated_word_count` = estimated_word_count
+
+3. **Content Generator receives** (from Tasks):
+   - Only the title, description text, content type, and content structure
+   - No section outline
+   - No keyword info
+   - No word count target
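+
+The save step in item 2 can be illustrated in plain Python to make the loss concrete. This is a minimal sketch, assuming the AI response has already been parsed into a dict; `content_idea_fields` is a hypothetical name standing in for the real `save_output` logic, and the sample values reuse the examples above, with `...` wherever no example exists.
+
+```python
+def content_idea_fields(idea: dict) -> dict:
+    """Hypothetical stand-in for the save mapping: keep only the saved fields."""
+    description = idea.get("description")
+    overview = description.get("overview") if isinstance(description, dict) else description
+    return {
+        "idea_title": idea.get("title"),
+        "description": overview,  # the outline stored next to it is discarded here
+        "primary_focus_keywords": idea.get("primary_focus_keywords"),
+        "target_keywords": idea.get("covered_keywords"),
+        "estimated_word_count": idea.get("estimated_word_count"),
+    }
+
+
+idea = {
+    "title": "How to Build an Email List from Scratch",
+    "description": {
+        "overview": "This guide covers the fundamentals of list building...",
+        "outline": {
+            "intro_focus": "...",
+            "main_sections": [{"h2_topic": "...", "coverage": "..."}] * 8,
+        },
+    },
+    "primary_focus_keywords": "email list building",
+    "covered_keywords": "subscriber acquisition, lead magnets",
+    "estimated_word_count": 1500,
+}
+
+saved = content_idea_fields(idea)
+# Nothing in `saved` records that 8 sections were planned.
+```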
+
+**Why This Causes Misalignment:**
+- Content generator has NO IDEA how many sections were planned (6? 8? 10?)
+- Content generator doesn't know which keywords to target
+- Content generator doesn't know the word count goal
+- Content generator can't follow the planned outline structure
+
+---
+
+**Recommended Solution Path:**
+
+**OPTION A: Store Full Idea JSON** (Best for Long-term)
+
+1. Add JSONField to ContentIdeas model:
+```python
+from django.db import models
+
+class ContentIdeas(models.Model):
+    # ... existing fields ...
+    idea_json = models.JSONField(
+        default=dict,  # callable, so each row gets its own empty dict
+        blank=True,
+        help_text="Complete idea structure from AI generation (outline, keywords, sections)"
+    )
+```
+
+2. Update generate_ideas.py to save full JSON:
+```python
+# In save_output method:
+content_idea = ContentIdeas.objects.create(
+    # ... existing fields ...
+    idea_json=idea_data,  # Store the complete JSON structure
+)
+```
+
+3. Update generate_content.py to use full structure:
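+```python
+import json  # at module top, needed for json.dumps
+
+# In build_prompt method:
+if task.idea and task.idea.idea_json:
+    # Pass the full stored idea structure (outline, keywords, word count)
+    idea_data = json.dumps(task.idea.idea_json, indent=2)
+else:
+    # Fallback: keep the current plain-text format unchanged
+    idea_data = f"Title: {task.title or 'Untitled'}\n"
+    if task.description:
+        idea_data += f"Description: {task.description}\n"
+    idea_data += f"Content Type: {task.content_type or 'post'}\n"
+    idea_data += f"Content Structure: {task.content_structure or 'article'}\n"
+```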
+
+4. Update Content Generation prompt INPUT section:
+```
+## INPUT
+
+**CONTENT IDEA:**
+[IGNY8_IDEA]
+
+Expected JSON structure:
+{
+  "title": "Article title",
+  "description": {
+    "overview": "2-3 sentence description",
+    "outline": {
+      "intro_focus": "What the introduction should establish",
+      "main_sections": [
+        {"h2_topic": "Section heading", "coverage": "What to cover"},
+        ... array of 6-10 sections ...
+      ]
+    }
+  },
+  "primary_focus_keywords": "1-2 main keywords",
+  "covered_keywords": "2-3 supporting keywords",
+  "estimated_word_count": 1500,
+  "content_type": "post",
+  "content_structure": "guide_tutorial"
+}
+
+**KEYWORD CLUSTER:**
+[IGNY8_CLUSTER]
+
+**KEYWORDS:**
+[IGNY8_KEYWORDS]
+
+**INSTRUCTIONS:**
+- Use the exact number of H2 sections from main_sections array
+- Each H2 section should follow the h2_topic and coverage from the outline
+- Target the word count from estimated_word_count (±100 words)
+- Focus on primary_focus_keywords and covered_keywords for SEO
+```
+
+**OPTION B: Quick Fix - Pass Available Fields** (Can implement immediately without DB changes)
+
+Update generate_content.py:
+```python
+# In build_prompt method:
+idea_data = f"Title: {task.title or 'Untitled'}\n"
+if task.description:
+    idea_data += f"Description: {task.description}\n"
+idea_data += f"Content Type: {task.content_type or 'post'}\n"
+idea_data += f"Content Structure: {task.content_structure or 'article'}\n"
+
+# ADD: Pull from related idea if available
+if task.idea:
+    if task.idea.primary_focus_keywords:
+        idea_data += f"Primary Focus Keywords: {task.idea.primary_focus_keywords}\n"
+    if task.idea.target_keywords:
+        idea_data += f"Covered Keywords: {task.idea.target_keywords}\n"
+    if task.idea.estimated_word_count:
+        idea_data += f"Target Word Count: {task.idea.estimated_word_count}\n"
+```
+
+Then update Content Generation prompt:
+```
+## INPUT
+
+**CONTENT IDEA:**
+[IGNY8_IDEA]
+
+Format:
+- Title: Article title
+- Description: Content overview
+- Content Type: post|page|product
+- Content Structure: article|guide|comparison|review|listicle
+- Primary Focus Keywords: 1-2 main keywords (if available)
+- Covered Keywords: 2-3 supporting keywords (if available)
+- Target Word Count: Estimated words (if available)
+
+**NOTE:** Generate 6-8 H2 sections based on content_structure type. Scale word count to match Target Word Count if provided (±100 words acceptable).
+```
+
+### Change 4: Keyword Usage Clarity
+**Current:**
+```
+## KEYWORD USAGE
+
+**Primary keyword** (identify from title):
+- Use in title, intro, meta title/description
+- Include in 2-3 H2 headings naturally
+- Mention 2-3 times in content (0.5-1% density)
+
+**Secondary keywords** (3-4 from keyword list):
+- Distribute across H2 sections
+- Use in H2/H3 headings where natural
+- 2-3 mentions each (0.3-0.6% density)
+- Include variations and related terms
+```
+
+**Suggested:**
+```
+## KEYWORD USAGE
+
+**Primary focus keywords** (1-2 from IGNY8_IDEA.primary_focus_keywords):
+- Already in the provided title (use it as-is)
+- Include in 2-3 H2 headings naturally (outline already targets this)
+- Mention 2-3 times in content (0.5-1% density)
+
+**Covered keywords** (2-3 from IGNY8_IDEA.covered_keywords):
+- Distribute across H2 sections
+- Use in H2/H3 headings where natural (outline may already include them)
+- 2-3 mentions each (0.3-0.6% density)
+- Include variations and related terms
+
+**Total keyword target:** 3-5 keywords (1-2 primary + 2-3 covered)
+```
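+
+The density targets depend on how density is counted, which the prompt leaves open. A minimal sketch that reports both common readings, assuming simple lowercase word tokenization; `keyword_stats` is a hypothetical helper, not part of the codebase. With three mentions of a three-word phrase in a 1,500-word article, the word-share reading gives 0.6% (inside the 0.5-1% primary band), while the per-occurrence reading gives 0.2%.
+
+```python
+import re
+
+TOKEN = re.compile(r"[a-z0-9']+")
+
+
+def keyword_stats(text: str, keyword: str) -> tuple[int, float, float]:
+    """Return (mentions, per-occurrence density %, word-share density %)."""
+    words = TOKEN.findall(text.lower())
+    kw = TOKEN.findall(keyword.lower())
+    if not words or not kw:
+        return 0, 0.0, 0.0
+    n = len(kw)
+    # Count exact phrase matches over the token sequence.
+    mentions = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw)
+    per_occurrence = 100 * mentions / len(words)
+    word_share = 100 * mentions * n / len(words)
+    return mentions, per_occurrence, word_share
+
+
+body = ("email list building " + "filler " * 497) * 3  # 1,500 words, 3 mentions
+print(keyword_stats(body, "email list building"))  # (3, 0.2, 0.6)
+```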
+
+### Change 5: Verification Checklist Update
+**Current:**
+```
+## VERIFICATION BEFORE OUTPUT
+
+- [ ] 1000-1200 words ONLY (excluding HTML tags) - STOP if exceeding
+- [ ] 6 H2 sections
+- [ ] Maximum 2 sections with lists
+- [ ] Maximum 2 sections with tables
+```
+
+**Suggested:**
+```
+## VERIFICATION BEFORE OUTPUT
+
+- [ ] Word count matches outline's estimated_word_count (±100 words acceptable)
+- [ ] Number of H2 sections matches outline's main_sections count
+- [ ] Format distribution scales appropriately with section count
+- [ ] All sections from outline are covered
+- [ ] Primary focus keywords (1-2) used correctly
+- [ ] Covered keywords (2-3) distributed naturally
+- [ ] All paragraphs 50-80 words
+- [ ] All lists 4-5 items, 15-20 words each
+- [ ] All tables 4-5 columns, 5-6 rows, real data
+- [ ] No placeholder content anywhere
+- [ ] Meta title <60 chars, description <160 chars
+- [ ] Valid JSON with escaped quotes
+```
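+
+The first two checklist items can be verified mechanically after generation. A minimal sketch using only the standard library's `html.parser`, assuming the article body arrives as an HTML string and that the outline's section count and `estimated_word_count` are at hand; `check_against_outline` is a hypothetical name, and the word count is approximate (a word broken by an inline tag counts as two).
+
+```python
+from html.parser import HTMLParser
+
+
+class ArticleStats(HTMLParser):
+    """Counts <h2> tags and whitespace-separated words of visible text."""
+
+    def __init__(self) -> None:
+        super().__init__()
+        self.h2_count = 0
+        self.words = 0
+
+    def handle_starttag(self, tag, attrs):
+        if tag == "h2":
+            self.h2_count += 1
+
+    def handle_data(self, data):
+        self.words += len(data.split())
+
+
+def check_against_outline(html: str, outline_sections: int, target_words: int,
+                          tolerance: int = 100) -> list[str]:
+    """Return failed checks; an empty list means both checks pass."""
+    stats = ArticleStats()
+    stats.feed(html)
+    stats.close()
+    problems = []
+    if stats.h2_count != outline_sections:
+        problems.append(f"expected {outline_sections} H2 sections, found {stats.h2_count}")
+    if abs(stats.words - target_words) > tolerance:
+        problems.append(f"{stats.words} words is outside {target_words} +/- {tolerance}")
+    return problems
+
+
+sample = "<p>Intro text here.</p><h2>One</h2><p>Body.</p>"
+print(check_against_outline(sample, outline_sections=6, target_words=1485))
+# reports both the section-count and the word-count mismatch
+```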
+
+---
+
+## Summary of Benefits
+
+### With These Changes:
+1. ✅ **Flexibility:** Content generator can handle 6-10 sections from ideas
+2. ✅ **Consistency:** Section count matches between idea and content generation
+3. ✅ **Scalability:** Word count scales naturally with complexity
+4. ✅ **Quality:** Format variety adapts to content needs
+5. ✅ **Alignment:** Clear keyword strategy (1-2 primary + 2-3 covered = 3-5 total)
+6. ✅ **Maintainability:** One source of truth for section structure (the outline)
+
+### Key Principle:
+- **The Ideas Generator is the strategic planner** (decides sections, word count, keywords)
+- **The Content Generator is the tactical executor** (follows the plan, adds formatting/depth)
+
+---
+
+## Implementation Notes
+
+- These changes maintain all quality requirements (word count per paragraph, list/table specs, etc.)
+- The rigid structure is replaced with scalable rules that maintain quality at any section count
+- The content generator becomes more flexible while maintaining consistency
+- Both prompts now work together as a cohesive system
+
+---
+
+## Next Steps
+
+1. Update the `content_generation` prompt in the database with suggested changes
+2. Test with various section counts (6, 8, 10 sections) to verify scalability
+3. Monitor output quality to ensure formatting rules scale properly
+4. Consider creating a validation layer that checks idea/content alignment before generation
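+
+Step 4 could begin as a pre-generation check on the stored idea. A minimal sketch, assuming the Option A `idea_json` shape and comma-separated keyword strings (as in the `target_keywords` example above); `validate_idea` is a hypothetical name, and the ranges are the ones the ideas prompt targets (6-10 sections, 1-2 primary + 2-3 covered keywords).
+
+```python
+def _keyword_list(idea: dict, field: str) -> list[str]:
+    """Split a comma-separated keyword string into trimmed, non-empty items."""
+    return [k.strip() for k in (idea.get(field) or "").split(",") if k.strip()]
+
+
+def validate_idea(idea: dict) -> list[str]:
+    """Return problems that would stop the content prompt following the plan."""
+    problems = []
+    if not idea.get("title"):
+        problems.append("missing title")
+    description = idea.get("description")
+    outline = description.get("outline", {}) if isinstance(description, dict) else {}
+    sections = outline.get("main_sections", [])
+    if not 6 <= len(sections) <= 10:
+        problems.append(f"expected 6-10 main_sections, got {len(sections)}")
+    for i, section in enumerate(sections, start=1):
+        if not section.get("h2_topic") or not section.get("coverage"):
+            problems.append(f"section {i} is missing h2_topic or coverage")
+    primary = _keyword_list(idea, "primary_focus_keywords")
+    covered = _keyword_list(idea, "covered_keywords")
+    if not 1 <= len(primary) <= 2:
+        problems.append(f"expected 1-2 primary focus keywords, got {len(primary)}")
+    if not 2 <= len(covered) <= 3:
+        problems.append(f"expected 2-3 covered keywords, got {len(covered)}")
+    return problems
+
+
+print(validate_idea({"title": "Draft", "description": "plain text only"}))
+# three problems: no outline, no primary keywords, no covered keywords
+```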