IGNY8 Phase 2: Content Optimizer (02F)
Cluster-Aligned Content Optimization Engine
Document Version: 1.0
Date: 2026-03-23
Phase: IGNY8 Phase 2 — Feature Expansion
Status: Build Ready
Source of Truth: Codebase at /data/app/igny8/
Audience: Claude Code, Backend Developers, Architects
1. CURRENT STATE
Optimization App Today
The optimization Django app exists in INSTALLED_APPS but is inactive (behind feature flag). The following exist:
- OptimizationTask model: exists with minimal fields (basic task tracking only)
- optimize_content AI function: registered in igny8_core/ai/registry.py as one of the 7 registered functions, but only does basic content rewriting without cluster awareness, keyword coverage analysis, or scoring
- optimization app label: app exists at igny8_core/modules/optimization/
What Does Not Exist
- No cluster-alignment during optimization
- No keyword coverage analysis against cluster keyword sets
- No heading restructure logic
- No intent-based content rewrite
- No schema gap detection
- No before/after scoring system (0-100)
- No batch optimization
- No integration with SAG data (01A) or taxonomy terms (02B)
Foundation Available
- Clusters model (app_label=planner, db_table=igny8_clusters) with cluster keywords
- Keywords model (app_label=planner, db_table=igny8_keywords) linked to clusters
- Content.schema_markup JSONField: used by 02G for JSON-LD
- Content.content_type and Content.content_structure: routing context
- Content.structured_data JSONField (added by 02A)
- ContentTaxonomy cluster mapping (added by 02B) with mapping_confidence
- GSCMetricsCache (added by 02C): position data identifies pages needing optimization
- SchemaValidationService (added by 02G): schema gap detection reuse
- BaseAIFunction with validate(), prepare(), build_prompt(), parse_response(), save_output()
2. WHAT TO BUILD
Overview
Extend the existing OptimizationTask model and optimize_content AI function into a full cluster-aligned optimization engine. The system analyzes content against its cluster's keyword set, scores quality on a 0-100 scale, and produces optimized content with tracked before/after metrics.
2.1 Cluster Matching (Auto-Assign Optimization Context)
When content has no cluster assignment, the optimizer auto-detects the best-fit cluster:
Scoring Algorithm:
- Keyword overlap (40%): count of cluster keywords found in content title + headings + body
- Semantic similarity (40%): AI-scored relevance between content topic and cluster theme
- Title match (20%): similarity between content title and cluster name/keywords
Thresholds:
- Confidence ≥ 0.6 → auto-assign cluster
- Confidence < 0.6 → flag for manual review, suggest top 3 candidates
This reuses the same scoring pattern as ClusterMappingService from 02B.
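The weighting and threshold above can be sketched as a small pure function. This is only an illustration: ClusterCandidate and match_cluster are hypothetical names, and each of the three signals is assumed to be already normalized to the 0..1 range before it is combined.

```python
from dataclasses import dataclass

# Weights and threshold taken from the scoring algorithm described above.
WEIGHTS = {"keyword_overlap": 0.40, "semantic_similarity": 0.40, "title_match": 0.20}
AUTO_ASSIGN_THRESHOLD = 0.6


@dataclass
class ClusterCandidate:
    """Hypothetical container; every signal is assumed normalized to 0..1."""
    cluster_id: int
    keyword_overlap: float
    semantic_similarity: float
    title_match: float

    @property
    def confidence(self) -> float:
        # Weighted sum of the three signals.
        return round(
            WEIGHTS["keyword_overlap"] * self.keyword_overlap
            + WEIGHTS["semantic_similarity"] * self.semantic_similarity
            + WEIGHTS["title_match"] * self.title_match,
            4,
        )


def match_cluster(candidates):
    """Return the auto-assign decision plus the top 3 ranked suggestions."""
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    best = ranked[0] if ranked else None
    auto = best is not None and best.confidence >= AUTO_ASSIGN_THRESHOLD
    return {
        "assigned_cluster_id": best.cluster_id if auto else None,
        "needs_review": not auto,
        "suggestions": [(c.cluster_id, c.confidence) for c in ranked[:3]],
    }
```

Returning the ranked suggestions in both branches keeps the manual-review path and the cluster-suggestions endpoint on the same data.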
2.2 Keyword Coverage Analysis
For content with an assigned cluster:
- Load all Keywords records belonging to that cluster
- Scan content_html for each keyword: exact match, partial match (stemmed), semantic presence
- Report per keyword: {keyword, target_density, current_density, status: present|missing|low_density}
- Coverage targets:
  - Hub content (cluster_hub): 70%+ of cluster keywords covered
  - Supporting articles: 40%+ of cluster keywords covered
  - Product/service pages: 30%+ (focused on commercial keywords)
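A minimal sketch of how the per-keyword report could be rolled up against the coverage targets. The names summarize_coverage and COVERAGE_TARGETS are hypothetical, as are the keys "supporting" and "product_service". It also assumes a keyword with status low_density still counts as covered, which the spec does not state.

```python
# Minimum share of cluster keywords that should be covered, per the targets above.
# Keys other than "cluster_hub" are assumed labels, not names from the codebase.
COVERAGE_TARGETS = {"cluster_hub": 0.70, "supporting": 0.40, "product_service": 0.30}


def summarize_coverage(keyword_report, content_kind):
    """Summarize a list of {keyword, status, ...} rows against the target."""
    total = len(keyword_report)
    # Assumption: only "missing" keywords count as not covered.
    covered = sum(1 for row in keyword_report if row["status"] != "missing")
    coverage = covered / total if total else 0.0
    return {
        "total_keywords": total,
        "covered": covered,
        "missing": total - covered,
        "coverage_pct": round(coverage * 100, 1),
        "meets_target": coverage >= COVERAGE_TARGETS[content_kind],
    }
```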
2.3 Heading Restructure
Analyze H1/H2/H3 hierarchy for SEO best practices:
| Check | Rule | Fix |
|---|---|---|
| Single H1 | Content must have exactly one H1 | Merge or demote extra H1s |
| H2 keyword coverage | H2s should contain target keywords from cluster | AI rewrites H2s with keyword incorporation |
| Logical hierarchy | No skipped levels (H1 → H3 without H2) | Insert missing levels |
| H2 count | Minimum 3 H2s for content >1000 words | AI suggests additional H2 sections |
| Missing keyword themes | Cluster keywords not represented in any heading | AI suggests new H2/H3 sections for missing themes |
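The structural checks in the table (single H1, no skipped levels, minimum H2 count) do not need AI and can be sketched with the standard library HTML parser. HeadingCollector, check_headings and the issue labels are hypothetical names; the keyword-related checks are left out because they depend on cluster data.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only; tags such as <hr> or <head> are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def check_headings(content_html, word_count):
    """Return a list of issue labels for the structural heading rules."""
    parser = HeadingCollector()
    parser.feed(content_html)
    levels = parser.levels
    issues = []
    if levels.count(1) != 1:
        issues.append("h1_count")
    # A skipped level is a jump of more than one level deeper, e.g. H1 -> H3.
    if any(b - a > 1 for a, b in zip(levels, levels[1:])):
        issues.append("skipped_level")
    if word_count > 1000 and levels.count(2) < 3:
        issues.append("too_few_h2")
    return issues
```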
2.4 Content Rewrite (Intent-Aligned)
Intent Classification:
- Informational: expand explanations, add examples, increase depth, add definitions
- Commercial: add comparison tables, pros/cons, feature highlights, trust signals
- Transactional: strengthen CTAs, add urgency, streamline conversion path, social proof
Content Adjustments:
- Expand thin content (<500 words) to minimum viable length for the content structure
- Compress bloated content (detect and remove redundancy)
- Add missing sections identified by keyword coverage analysis
- Maintain existing tone and style while improving SEO alignment
2.5 Schema Gap Detection
Leverages SchemaValidationService from 02G:
- Check existing Content.schema_markup against expected schemas for the content type
- Expected schema by type: Article (post), Product (product), Service (service_page), FAQPage (if FAQ detected), BreadcrumbList (all), HowTo (if steps detected)
- Identify missing required fields per schema type
- Generate corrected/complete schema JSON-LD
- Schema-only optimization mode available (no content rewrite, just schema fix)
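The expected-schema rules above can be sketched as a lookup plus a set difference. This is not the SchemaValidationService API from 02G: the function names are hypothetical, and the shape of Content.schema_markup is an assumption (a single JSON-LD node, a list of nodes, or a dict with an "@graph" list).

```python
# Mapping taken from the expected-schema list above.
SCHEMA_BY_CONTENT_TYPE = {"post": "Article", "product": "Product", "service_page": "Service"}


def expected_schema_types(content_type, has_faq=False, has_steps=False):
    """List the schema types a content record is expected to carry."""
    expected = []
    if content_type in SCHEMA_BY_CONTENT_TYPE:
        expected.append(SCHEMA_BY_CONTENT_TYPE[content_type])
    expected.append("BreadcrumbList")  # expected on all content
    if has_faq:
        expected.append("FAQPage")
    if has_steps:
        expected.append("HowTo")
    return expected


def present_schema_types(schema_markup):
    """Collect @type values; the accepted input shapes are assumptions."""
    if isinstance(schema_markup, dict) and "@graph" in schema_markup:
        nodes = schema_markup["@graph"]
    elif isinstance(schema_markup, list):
        nodes = schema_markup
    elif schema_markup:
        nodes = [schema_markup]
    else:
        nodes = []
    found = set()
    for node in nodes:
        node_type = node.get("@type", [])
        # JSON-LD allows @type to be a string or a list of strings.
        found.update([node_type] if isinstance(node_type, str) else node_type)
    return found


def schema_gaps(content_type, schema_markup, has_faq=False, has_steps=False):
    """Return the expected schema types that are absent from the markup."""
    present = present_schema_types(schema_markup)
    expected = expected_schema_types(content_type, has_faq, has_steps)
    return [t for t in expected if t not in present]
```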
2.6 Before/After Scoring
Content Quality Score (0-100):
| Factor | Weight | Score Criteria |
|---|---|---|
| Keyword Coverage | 30% | % of cluster keywords present vs target |
| Heading Structure | 20% | Single H1, keyword H2s, logical hierarchy, no skipped levels |
| Content Depth | 20% | Word count vs structure minimum, section completeness, detail level |
| Readability | 15% | Sentence length, paragraph length, Flesch-Kincaid approximation |
| Schema Completeness | 15% | Required schema fields present, validation passes |
Every optimization records score_before and score_after. Dashboard aggregates show average improvement across all optimizations.
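As a worked example of the weighting in the table, the composite score is a weighted sum of five per-factor scores, each on a 0-100 scale. The function name composite_score is hypothetical; the weights are the ones listed above.

```python
# Factor weights from the Content Quality Score table; they sum to 1.0.
WEIGHTS = {
    "keyword_coverage": 0.30,
    "heading_structure": 0.20,
    "content_depth": 0.20,
    "readability": 0.15,
    "schema_completeness": 0.15,
}


def composite_score(breakdown):
    """Combine per-factor scores (each 0-100) into one 0-100 score."""
    missing = set(WEIGHTS) - set(breakdown)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    total = sum(WEIGHTS[factor] * breakdown[factor] for factor in WEIGHTS)
    return round(total, 1)
```

For factor scores of 50, 80, 60, 70 and 40 (in table order) the contributions are 15 + 16 + 12 + 10.5 + 6, giving 59.5. The improvement tracked per task is then score_after minus score_before.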
2.7 Batch Optimization
- Select content by: cluster ID, score threshold (e.g., all content scoring < 50), content type, date range
- Queue as Celery tasks with priority ordering (lowest scores first)
- Concurrency: max 3 concurrent optimization tasks per account
- Progress tracking via OptimizationTask status field
- Cancel capability: change status to rejected to stop processing
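The selection and ordering rules above can be sketched without Celery. Both helpers are hypothetical; how the running count per account is obtained (for example by counting OptimizationTask rows in an active status) is left open here.

```python
MAX_CONCURRENT_PER_ACCOUNT = 3  # concurrency limit stated above


def select_batch(candidates, score_threshold=None):
    """Order (content_id, score) pairs lowest score first.

    When a threshold is given, only content scoring below it is kept.
    """
    eligible = [
        pair for pair in candidates
        if score_threshold is None or pair[1] < score_threshold
    ]
    return [content_id for content_id, _ in sorted(eligible, key=lambda p: p[1])]


def next_to_start(queued_ids, running_count):
    """Return the queued IDs that may start without exceeding the limit."""
    free_slots = max(0, MAX_CONCURRENT_PER_ACCOUNT - running_count)
    return queued_ids[:free_slots]
```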
3. DATA MODELS & APIS
3.1 Modified Model — OptimizationTask (optimization app)
Extend the existing OptimizationTask model with 18 new fields:
# Add to existing OptimizationTask model:
# NOTE: content is non-nullable; adding it to the existing table requires a
# one-off default (or null=True) when the migration is generated.
content = models.ForeignKey(
    'writer.Content',
    on_delete=models.CASCADE,
    related_name='optimization_tasks'
)
primary_cluster = models.ForeignKey(
    'planner.Clusters',
    on_delete=models.SET_NULL,
    null=True,
    blank=True,
    related_name='optimization_tasks'
)
secondary_clusters = models.JSONField(
    default=list,
    blank=True,
    help_text='List of Clusters IDs for secondary relevance'
)
keyword_targets = models.JSONField(
    default=list,
    blank=True,
    help_text='[{keyword, target_density, current_density, status}]'
)
optimization_type = models.CharField(
    max_length=20,
    choices=[
        ('full_rewrite', 'Full Rewrite'),
        ('heading_only', 'Heading Only'),
        ('schema_only', 'Schema Only'),
        ('keyword_coverage', 'Keyword Coverage'),
        ('batch', 'Batch'),
    ],
    default='full_rewrite'
)
intent_classification = models.CharField(
    max_length=15,
    choices=[
        ('informational', 'Informational'),
        ('commercial', 'Commercial'),
        ('transactional', 'Transactional'),
    ],
    blank=True,
    default=''
)
score_before = models.FloatField(null=True, blank=True)
score_after = models.FloatField(null=True, blank=True)
content_before = models.TextField(
    blank=True,
    default='',
    help_text='Snapshot of original content_html'
)
content_after = models.TextField(
    blank=True,
    default='',
    help_text='Optimized HTML (empty until optimization completes)'
)
metadata_before = models.JSONField(
    default=dict,
    blank=True,
    help_text='{meta_title, meta_description, headings[]}'
)
metadata_after = models.JSONField(
    default=dict,
    blank=True
)
schema_before = models.JSONField(default=dict, blank=True)
schema_after = models.JSONField(default=dict, blank=True)
structure_changes = models.JSONField(
    default=list,
    blank=True,
    help_text='[{change_type, description, before, after}]'
)
confidence_score = models.FloatField(
    null=True,
    blank=True,
    help_text='AI confidence in the quality of changes (0-1)'
)
applied = models.BooleanField(default=False)
applied_at = models.DateTimeField(null=True, blank=True)
Update STATUS choices on OptimizationTask:
STATUS_CHOICES = [
    ('pending', 'Pending'),
    ('analyzing', 'Analyzing'),
    ('optimizing', 'Optimizing'),
    ('review', 'Ready for Review'),
    ('applied', 'Applied'),
    ('rejected', 'Rejected'),
]
PK: BigAutoField (integer) — existing model
Table: existing igny8_optimization_tasks table (no rename needed)
3.2 Migration
Single migration in the optimization app (or igny8_core migrations):
igny8_core/migrations/XXXX_extend_optimization_task.py
Operations:
- AddField('OptimizationTask', 'content', ...): FK to Content
- AddField('OptimizationTask', 'primary_cluster', ...): FK to Clusters
- AddField('OptimizationTask', 'secondary_clusters', ...): JSONField
- AddField('OptimizationTask', 'keyword_targets', ...): JSONField
- AddField('OptimizationTask', 'optimization_type', ...): CharField
- AddField('OptimizationTask', 'intent_classification', ...): CharField
- AddField('OptimizationTask', 'score_before', ...): FloatField
- AddField('OptimizationTask', 'score_after', ...): FloatField
- AddField('OptimizationTask', 'content_before', ...): TextField
- AddField('OptimizationTask', 'content_after', ...): TextField
- AddField('OptimizationTask', 'metadata_before', ...): JSONField
- AddField('OptimizationTask', 'metadata_after', ...): JSONField
- AddField('OptimizationTask', 'schema_before', ...): JSONField
- AddField('OptimizationTask', 'schema_after', ...): JSONField
- AddField('OptimizationTask', 'structure_changes', ...): JSONField
- AddField('OptimizationTask', 'confidence_score', ...): FloatField
- AddField('OptimizationTask', 'applied', ...): BooleanField
- AddField('OptimizationTask', 'applied_at', ...): DateTimeField
3.3 API Endpoints
All endpoints under /api/v1/optimizer/:
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/optimizer/analyze/ | Analyze single content piece. Body: {content_id}. Returns scores + keyword coverage + heading analysis + recommendations. Does NOT rewrite. |
| POST | /api/v1/optimizer/optimize/ | Run full optimization. Body: {content_id, optimization_type}. Creates OptimizationTask, runs analysis + rewrite, returns preview. |
| POST | /api/v1/optimizer/preview/ | Preview changes without creating task. Body: {content_id}. Returns diff-style output. |
| POST | /api/v1/optimizer/apply/{id}/ | Apply optimized version. Copies content_after → Content.content_html, updates metadata, sets applied=True. |
| POST | /api/v1/optimizer/reject/{id}/ | Reject optimization. Sets status=rejected, keeps original content. |
| POST | /api/v1/optimizer/batch/ | Queue batch optimization. Body: {site_id, cluster_id?, score_threshold?, content_type?, content_ids?}. Returns batch task ID. |
| GET | /api/v1/optimizer/tasks/?site_id=X | List OptimizationTask records with filters (status, optimization_type, cluster_id, date range). |
| GET | /api/v1/optimizer/tasks/{id}/ | Single optimization detail with full before/after data. |
| GET | /api/v1/optimizer/tasks/{id}/diff/ | HTML diff view: visual comparison of content_before vs content_after. |
| GET | /api/v1/optimizer/cluster-suggestions/?content_id=X | Suggest best-fit cluster for unassigned content. Returns top 3 candidates with confidence scores. |
| POST | /api/v1/optimizer/assign-cluster/ | Assign cluster to content. Body: {content_id, cluster_id}. Updates Content record. |
| GET | /api/v1/optimizer/dashboard/?site_id=X | Optimization stats: avg score improvement, count by status, top improved, lowest scoring content. |
Permissions: All endpoints use SiteSectorModelViewSet permission patterns.
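One possible basis for the diff endpoint is Python's standard difflib module. The helper below is a sketch under that assumption; html_diff_lines is a hypothetical name, and it compares the stored HTML line by line rather than by rendered text.

```python
import difflib


def html_diff_lines(content_before, content_after):
    """Return unified-diff lines between the two stored HTML snapshots."""
    before = content_before.splitlines()
    after = content_after.splitlines()
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile="content_before",
            tofile="content_after",
            lineterm="",  # inputs carry no newlines, so emit none either
        )
    )
```

difflib.HtmlDiff in the same module can render a side-by-side HTML table if a visual comparison is preferred over a textual diff.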
3.4 AI Function — Enhanced optimize_content
Extend the existing registered optimize_content AI function:
Registry key: optimize_content (already registered — enhance, not replace)
Location: igny8_core/ai/functions/optimize_content.py (existing file)
class OptimizeContentFunction(BaseAIFunction):
    """
    Enhanced cluster-aligned content optimization.
    Extends existing optimize_content with keyword coverage,
    heading restructure, intent classification, and scoring.
    """
    function_name = 'optimize_content'

    def validate(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Verify content exists, has content_html
        # Verify optimization_type is valid
        pass

    def prepare(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Load Content record
        # Determine cluster (from Content or auto-match)
        # Load cluster Keywords
        # Analyze current keyword coverage
        # Parse heading structure
        # Classify intent
        # Calculate score_before
        # Snapshot content_before, metadata_before, schema_before
        pass

    def build_prompt(self):
        # Build type-specific optimization prompt:
        # - Include current content_html
        # - Include cluster keywords with coverage status
        # - Include heading analysis results
        # - Include intent classification
        # - Include optimization_type instructions:
        #   full_rewrite: all optimizations
        #   heading_only: heading restructure only
        #   schema_only: schema fix only (no content change)
        #   keyword_coverage: add missing keyword sections only
        pass

    def parse_response(self, response):
        # Parse optimized HTML
        # Parse updated metadata (meta_title, meta_description)
        # Parse structure_changes list
        # Parse confidence_score
        pass

    def save_output(self, parsed):
        # Create OptimizationTask with all before/after data
        # Calculate score_after
        # Set status='review'
        pass
3.5 Content Scoring Service
Location: igny8_core/business/content_scoring.py
class ContentScoringService:
    """
    Calculates Content Quality Score (0-100) using 5 weighted factors.
    Used by optimizer for before/after and by dashboard for overview.
    """
    WEIGHTS = {
        'keyword_coverage': 0.30,
        'heading_structure': 0.20,
        'content_depth': 0.20,
        'readability': 0.15,
        'schema_completeness': 0.15,
    }

    def score(self, content_id, cluster_id=None):
        """
        Calculate composite score for a content record.
        Returns: {total: float, breakdown: {factor: score}}
        """
        pass

    def _score_keyword_coverage(self, content, cluster):
        """0-100: % of cluster keywords found in content."""
        pass

    def _score_heading_structure(self, content_html):
        """0-100: single H1, keyword H2s, no skipped levels, H2 count."""
        pass

    def _score_content_depth(self, content_html, content_structure):
        """0-100: word count vs minimum for structure type, section completeness."""
        pass

    def _score_readability(self, content_html):
        """0-100: avg sentence length, paragraph length, Flesch-Kincaid approx."""
        pass

    def _score_schema_completeness(self, content):
        """0-100: required schema fields present, from SchemaValidationService (02G)."""
        pass
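A rough sketch of what the readability factor could compute. It swaps in the Flesch Reading Ease formula (206.835 - 1.015 x words per sentence - 84.6 x syllables per word) because it already lands near a 0-100 scale; that choice, the vowel-group syllable heuristic, and the function names are all assumptions, and the input is assumed to be plain text with HTML already stripped.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per vowel group, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_score(text):
    """Flesch Reading Ease clamped to 0-100; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    ease = (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
    return round(min(100.0, max(0.0, ease)), 1)
```

Paragraph length, which the docstring also mentions, is not covered by this formula and would need a separate signal.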
3.6 Keyword Coverage Analyzer
Location: igny8_core/business/keyword_coverage.py
class KeywordCoverageAnalyzer:
    """
    Analyzes content against cluster keyword set.
    Returns per-keyword presence and overall coverage percentage.
    """

    def analyze(self, content_id, cluster_id):
        """
        Returns {
            total_keywords: int,
            covered: int,
            missing: int,
            coverage_pct: float,
            keywords: [{keyword, target_density, current_density, status}]
        }
        """
        pass

    def _extract_text(self, content_html):
        """Strip HTML, return plain text for analysis."""
        pass

    def _check_keyword(self, keyword, text):
        """Check for exact, partial (stemmed), and semantic presence."""
        pass
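The two helper methods could start from something like the sketch below, which covers only the exact-match case using the standard library. The function names are hypothetical, stemmed and semantic matching are not attempted, and density is assumed to mean keyword occurrences per 100 words, which the spec does not define.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; tags and attributes are dropped."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(content_html):
    """Strip tags and collapse whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(content_html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def keyword_status(keyword, text, target_density):
    """Exact-match check only; density = occurrences per 100 words (assumed)."""
    lowered = text.lower()
    words = re.findall(r"\w+", lowered)
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    occurrences = len(re.findall(pattern, lowered))
    density = 100.0 * occurrences / len(words) if words else 0.0
    if occurrences == 0:
        status = "missing"
    elif density < target_density:
        status = "low_density"
    else:
        status = "present"
    return {
        "keyword": keyword,
        "target_density": target_density,
        "current_density": round(density, 2),
        "status": status,
    }
```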
3.7 Celery Tasks
Location: igny8_core/tasks/optimization_tasks.py
from celery import shared_task


@shared_task(name='run_optimization')
def run_optimization(optimization_task_id):
    """Process a single OptimizationTask. Called by API endpoints."""
    pass


@shared_task(name='run_batch_optimization')
def run_batch_optimization(site_id, cluster_id=None, score_threshold=None,
                           content_type=None, content_ids=None, batch_size=10):
    """
    Process batch of content for optimization.
    Selects content matching filters, creates OptimizationTask per item,
    and dispatches them lowest score first, with at most 3 running
    concurrently per account.
    """
    pass


@shared_task(name='identify_optimization_candidates')
def identify_optimization_candidates(site_id, threshold=50):
    """
    Weekly scan: find content with quality score below threshold.
    Creates report, does NOT auto-optimize.
    """
    pass
Beat Schedule Addition:
| Task | Schedule | Notes |
|---|---|---|
| identify_optimization_candidates | Weekly (Monday 4:00 AM) | Scans all sites, identifies low-scoring content |
4. IMPLEMENTATION STEPS
Step 1: Migration
- Add 18 new fields to OptimizationTask model
- Update STATUS_CHOICES on OptimizationTask
- Run migration
Step 2: Services
- Implement ContentScoringService in igny8_core/business/content_scoring.py
- Implement KeywordCoverageAnalyzer in igny8_core/business/keyword_coverage.py
Step 3: AI Function Enhancement
- Extend OptimizeContentFunction in igny8_core/ai/functions/optimize_content.py
- Add cluster-alignment, keyword coverage, heading analysis, intent classification, scoring
- Maintain backward compatibility: existing optimize_content calls still work
Step 4: API Endpoints
- Add optimizer endpoints to igny8_core/urls/optimizer.py (create the file if it doesn't exist)
- Create views: AnalyzeView, OptimizeView, PreviewView, ApplyView, RejectView, BatchView
- Create ClusterSuggestionsView, AssignClusterView, DashboardView, DiffView
- Register URL patterns under /api/v1/optimizer/
Step 5: Celery Tasks
- Implement run_optimization, run_batch_optimization, identify_optimization_candidates
- Add identify_optimization_candidates to Celery beat schedule
Step 6: Serializers & Admin
- Update DRF serializer for extended OptimizationTask (include all 18 new fields)
- Create nested serializers for before/after views
- Update Django admin registration
Step 7: Credit Cost Configuration
Add to CreditCostConfig (billing app):
| operation_type | default_cost | description |
|---|---|---|
| optimization_analysis | 2 | Analyze single content (scoring + keyword coverage) |
| optimization_full_rewrite | 5-8 | Full rewrite optimization (varies by content length) |
| optimization_schema_only | 1 | Schema gap fix only |
| optimization_batch | 15-25 | Batch optimization for 10 items |
Credit deduction follows existing CreditUsageLog pattern.
5. ACCEPTANCE CRITERIA
Cluster Matching
- Content without cluster assignment gets auto-matched with confidence scoring
- Confidence ≥ 0.6 auto-assigns; < 0.6 flags for manual review with top 3 suggestions
- Cluster suggestions endpoint returns ranked candidates
Keyword Coverage
- All cluster keywords analyzed for presence in content
- Coverage report includes exact match, partial match, and missing keywords
- Hub content targets 70%+, supporting articles 40%+, product/service 30%+
Heading Restructure
- H1/H2/H3 hierarchy validated (single H1, no skipped levels)
- Missing keyword themes identified and new headings suggested
- AI rewrites headings incorporating target keywords while maintaining meaning
Content Rewrite
- Intent classified correctly (informational/commercial/transactional)
- Rewrite adjusts content structure based on intent
- Thin content expanded, bloated content compressed
- Missing keyword sections added
Scoring
- Score 0-100 calculated with 5 weighted factors
- score_before recorded before any changes
- score_after recorded after optimization
- Dashboard shows average improvement and distribution
Before/After
- Full snapshot of original content preserved in content_before
- Optimized version stored in content_after without auto-applying
- Diff view provides visual HTML comparison
- Apply action copies content_after → Content.content_html
- Reject action preserves original, marks task rejected
Batch
- Batch optimization selects content by cluster, score threshold, type, or explicit IDs
- Max 3 concurrent optimizations per account enforced
- Progress trackable via OptimizationTask status
- Weekly candidate identification runs without auto-optimizing
Integration
- Schema gap detection leverages SchemaValidationService from 02G
- Credit costs deducted per CreditCostConfig entries
- All API endpoints respect account/site permission boundaries
6. CLAUDE CODE INSTRUCTIONS
File Locations
igny8_core/
├── ai/
│   └── functions/
│       └── optimize_content.py          # Enhance existing function
├── business/
│   ├── content_scoring.py               # ContentScoringService
│   └── keyword_coverage.py              # KeywordCoverageAnalyzer
├── tasks/
│   └── optimization_tasks.py            # Celery tasks
├── urls/
│   └── optimizer.py                     # Optimizer endpoints
└── migrations/
    └── XXXX_extend_optimization_task.py
Conventions
- PKs: BigAutoField (integer) — do NOT use UUIDs
- Table prefix: igny8_ (existing table igny8_optimization_tasks)
- Celery app name: igny8_core
- URL pattern: /api/v1/optimizer/...
- Permissions: Use SiteSectorModelViewSet permission pattern
- AI functions: Extend existing BaseAIFunction subclass; do NOT create a new registration key, enhance the existing optimize_content
- Frontend: .tsx files with Zustand stores for state management
Cross-References
| Doc | Relationship |
|---|---|
| 02B | Taxonomy terms get cluster context for optimization; ClusterMappingService scoring pattern reused |
| 02G | SchemaValidationService used for schema gap detection; schema_only optimization triggers 02G schema generation |
| 02C | GSC position data identifies pages needing optimization (high impressions, low clicks) |
| 02D | Optimizer identifies internal link opportunities and feeds them to linker |
| 01E | Blueprint-aware pipeline sets initial content quality; optimizer improves post-generation |
| 01A | SAGBlueprint/SAGCluster data provides cluster context for optimization |
| 01G | SAG health monitoring can incorporate content quality scores as a health factor |
Key Decisions
- Extend, don't replace: The existing OptimizationTask model and optimize_content AI function are enhanced, not replaced with new models
- Preview-first workflow: Optimizations always produce a preview (status=review) before applying to Content
- Content snapshot: Full HTML snapshot stored in content_before for rollback capability
- Score reuse: ContentScoringService is a standalone service usable by other modules (02G schema audit, 01G health monitoring)
- Schema delegation: Schema gap detection reuses 02G's SchemaValidationService rather than duplicating logic