igny8/v2/V2-Execution-Docs/02F-optimizer.md
IGNY8 VPS (Salman) 0570052fec 1
2026-03-23 17:20:51 +00:00


IGNY8 Phase 2: Content Optimizer (02F)

Cluster-Aligned Content Optimization Engine

Document Version: 1.0
Date: 2026-03-23
Phase: IGNY8 Phase 2 — Feature Expansion
Status: Build Ready
Source of Truth: Codebase at /data/app/igny8/
Audience: Claude Code, Backend Developers, Architects


1. CURRENT STATE

Optimization App Today

The optimization Django app exists in INSTALLED_APPS but is inactive (behind a feature flag). The following exist:

  • OptimizationTask model — exists with minimal fields (basic task tracking only)
  • optimize_content AI function — registered in igny8_core/ai/registry.py as one of the 7 registered functions, but only does basic content rewriting without cluster awareness, keyword coverage analysis, or scoring
  • optimization app label — app exists at igny8_core/modules/optimization/

What Does Not Exist

  • No cluster-alignment during optimization
  • No keyword coverage analysis against cluster keyword sets
  • No heading restructure logic
  • No intent-based content rewrite
  • No schema gap detection
  • No before/after scoring system (0-100)
  • No batch optimization
  • No integration with SAG data (01A) or taxonomy terms (02B)

Foundation Available

  • Clusters model (app_label=planner, db_table=igny8_clusters) with cluster keywords
  • Keywords model (app_label=planner, db_table=igny8_keywords) linked to clusters
  • Content.schema_markup JSONField — used by 02G for JSON-LD
  • Content.content_type and Content.content_structure — routing context
  • Content.structured_data JSONField (added by 02A)
  • ContentTaxonomy cluster mapping (added by 02B) with mapping_confidence
  • GSCMetricsCache (added by 02C) — position data identifies pages needing optimization
  • SchemaValidationService (added by 02G) — schema gap detection reuse
  • BaseAIFunction with validate(), prepare(), build_prompt(), parse_response(), save_output()

2. WHAT TO BUILD

Overview

Extend the existing OptimizationTask model and optimize_content AI function into a full cluster-aligned optimization engine. The system analyzes content against its cluster's keyword set, scores quality on a 0-100 scale, and produces optimized content with tracked before/after metrics.

2.1 Cluster Matching (Auto-Assign Optimization Context)

When content has no cluster assignment, the optimizer auto-detects the best-fit cluster:

Scoring Algorithm:

  • Keyword overlap (40%): count of cluster keywords found in content title + headings + body
  • Semantic similarity (40%): AI-scored relevance between content topic and cluster theme
  • Title match (20%): similarity between content title and cluster name/keywords

Thresholds:

  • Confidence ≥ 0.6 → auto-assign cluster
  • Confidence < 0.6 → flag for manual review, suggest top 3 candidates

This reuses the same scoring pattern as ClusterMappingService from 02B.
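The 2.1 scoring and threshold logic can be sketched minimally as follows. The sketch assumes each signal has already been normalized to 0-1, and that the AI semantic score is computed elsewhere. `ClusterCandidate` and `match_cluster` are hypothetical names; they are not the ClusterMappingService API from 02B.

```python
from dataclasses import dataclass

# Weights and auto-assign cut-off from section 2.1.
WEIGHTS = {"keyword_overlap": 0.4, "semantic": 0.4, "title": 0.2}
AUTO_ASSIGN_THRESHOLD = 0.6


@dataclass
class ClusterCandidate:
    cluster_id: int
    keyword_overlap: float  # 0-1: share of cluster keywords in title + headings + body
    semantic: float         # 0-1: AI-scored topic relevance (assumed computed upstream)
    title: float            # 0-1: title vs. cluster name/keywords similarity

    @property
    def confidence(self) -> float:
        return (WEIGHTS["keyword_overlap"] * self.keyword_overlap
                + WEIGHTS["semantic"] * self.semantic
                + WEIGHTS["title"] * self.title)


def match_cluster(candidates: list[ClusterCandidate]) -> dict:
    """Auto-assign the best cluster if confident enough, else return top 3 for review."""
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    if ranked and ranked[0].confidence >= AUTO_ASSIGN_THRESHOLD:
        return {"action": "auto_assign", "cluster_id": ranked[0].cluster_id,
                "confidence": round(ranked[0].confidence, 3)}
    return {"action": "manual_review",
            "suggestions": [(c.cluster_id, round(c.confidence, 3)) for c in ranked[:3]]}
```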

2.2 Keyword Coverage Analysis

For content with an assigned cluster:

  1. Load all Keywords records belonging to that cluster
  2. Scan content_html for each keyword: exact match, partial match (stemmed), semantic presence
  3. Report per keyword: {keyword, target_density, current_density, status: present|missing|low_density}
  4. Coverage targets:
    • Hub content (cluster_hub): 70%+ of cluster keywords covered
    • Supporting articles: 40%+ of cluster keywords covered
    • Product/service pages: 30%+ (focused on commercial keywords)
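The per-keyword status and the coverage targets above could be combined roughly as follows. This is a sketch with several assumptions:

- Density is measured as occurrences per 100 words, which is one common definition.
- The role keys other than cluster_hub are hypothetical labels.
- Counting low_density keywords as covered is an assumption; the spec does not settle it.

```python
# Minimum coverage per content role, from section 2.2.
COVERAGE_TARGETS = {
    "cluster_hub": 0.70,         # hub content
    "supporting_article": 0.40,  # hypothetical key
    "product_service": 0.30,     # hypothetical key
}


def keyword_density(occurrences: int, total_words: int) -> float:
    """Occurrences per 100 words (one common definition of keyword density)."""
    return 100.0 * occurrences / total_words if total_words else 0.0


def keyword_status(current_density: float, target_density: float) -> str:
    """Map densities onto the present|missing|low_density statuses."""
    if current_density <= 0:
        return "missing"
    return "present" if current_density >= target_density else "low_density"


def summarize_coverage(keywords: list[dict], content_role: str) -> dict:
    """Aggregate per-keyword rows ({keyword, ..., status}) into a coverage summary."""
    total = len(keywords)
    # Assumption: low_density keywords count as covered (present but thin).
    covered = sum(1 for k in keywords if k["status"] in ("present", "low_density"))
    pct = covered / total if total else 0.0
    return {
        "total_keywords": total,
        "covered": covered,
        "missing": total - covered,
        "coverage_pct": round(100 * pct, 1),
        "meets_target": pct >= COVERAGE_TARGETS[content_role],
    }
```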

2.3 Heading Restructure

Analyze H1/H2/H3 hierarchy for SEO best practices:

| Check | Rule | Fix |
|---|---|---|
| Single H1 | Content must have exactly one H1 | Merge or demote extra H1s |
| H2 keyword coverage | H2s should contain target keywords from cluster | AI rewrites H2s with keyword incorporation |
| Logical hierarchy | No skipped levels (H1 → H3 without H2) | Insert missing levels |
| H2 count | Minimum 3 H2s for content >1000 words | AI suggests additional H2 sections |
| Missing keyword themes | Cluster keywords not represented in any heading | AI suggests new H2/H3 sections for missing themes |
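The purely structural checks in the table can be detected with the standard-library HTMLParser before any AI call. These are the single-H1 rule, skipped levels, and the H2 count. The keyword-related checks need the cluster keyword set and are left out. This is a sketch with hypothetical helper names, not the ContentScoringService implementation:

```python
from html.parser import HTMLParser


class _HeadingCollector(HTMLParser):
    """Collects (level, text) for every <h1>-<h6> in document order."""

    def __init__(self):
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names; match h1..h6 only.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level, self._buf = int(tag[1]), []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buf).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)


def heading_issues(content_html: str, word_count: int) -> list[str]:
    """Return human-readable structural problems (empty list = structure OK)."""
    parser = _HeadingCollector()
    parser.feed(content_html)
    levels = [lvl for lvl, _ in parser.headings]
    issues = []
    if levels.count(1) != 1:
        issues.append(f"expected exactly one H1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # e.g. H1 followed directly by H3
            issues.append(f"skipped level: H{prev} -> H{cur}")
    if word_count > 1000 and levels.count(2) < 3:
        issues.append(f"only {levels.count(2)} H2s for {word_count} words (min 3)")
    return issues
```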

2.4 Content Rewrite (Intent-Aligned)

Intent Classification:

  • Informational: expand explanations, add examples, increase depth, add definitions
  • Commercial: add comparison tables, pros/cons, feature highlights, trust signals
  • Transactional: strengthen CTAs, add urgency, streamline conversion path, social proof

Content Adjustments:

  • Expand thin content (<500 words) to minimum viable length for the content structure
  • Compress bloated content (detect and remove redundancy)
  • Add missing sections identified by keyword coverage analysis
  • Maintain existing tone and style while improving SEO alignment

2.5 Schema Gap Detection

Leverages SchemaValidationService from 02G:

  1. Check existing Content.schema_markup against expected schemas for the content type
  2. Expected schema by type: Article (post), Product (product), Service (service_page), FAQPage (if FAQ detected), BreadcrumbList (all), HowTo (if steps detected)
  3. Identify missing required fields per schema type
  4. Generate corrected/complete schema JSON-LD
  5. Schema-only optimization mode available (no content rewrite, just schema fix)
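Step 2 amounts to a set difference between the expected and the present @type values. The sketch below rests on these assumptions:

- Content.schema_markup holds a JSON-LD object, a list of objects, or an @graph.
- FAQ and step detection happen upstream.
- Field-level checks (step 3) belong to SchemaValidationService, whose API is not shown here.

The helper names are hypothetical.

```python
# Base schema per Content.content_type, from section 2.5.
BASE_SCHEMA_BY_TYPE = {"post": "Article", "product": "Product", "service_page": "Service"}


def expected_schema_types(content_type: str, has_faq: bool, has_steps: bool) -> set[str]:
    expected = {"BreadcrumbList"}  # expected for all content
    if content_type in BASE_SCHEMA_BY_TYPE:
        expected.add(BASE_SCHEMA_BY_TYPE[content_type])
    if has_faq:
        expected.add("FAQPage")
    if has_steps:
        expected.add("HowTo")
    return expected


def present_schema_types(schema_markup) -> set[str]:
    """Collect @type values from a JSON-LD object, a list of objects, or an @graph."""
    nodes = schema_markup if isinstance(schema_markup, list) else [schema_markup]
    found = set()
    for node in nodes:
        if not isinstance(node, dict):
            continue
        for item in node.get("@graph", [node]):
            if not isinstance(item, dict):
                continue
            t = item.get("@type")
            if isinstance(t, list):
                found.update(t)
            elif t:
                found.add(t)
    return found


def missing_schema_types(schema_markup, content_type, has_faq=False, has_steps=False) -> set[str]:
    return expected_schema_types(content_type, has_faq, has_steps) - present_schema_types(schema_markup)
```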

2.6 Before/After Scoring

Content Quality Score (0-100):

| Factor | Weight | Score Criteria |
|---|---|---|
| Keyword Coverage | 30% | % of cluster keywords present vs target |
| Heading Structure | 20% | Single H1, keyword H2s, logical hierarchy, no skipped levels |
| Content Depth | 20% | Word count vs structure minimum, section completeness, detail level |
| Readability | 15% | Sentence length, paragraph length, Flesch-Kincaid approximation |
| Schema Completeness | 15% | Required schema fields present, validation passes |

Every optimization records score_before and score_after. Dashboard aggregates show average improvement across all optimizations.
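The composite is a weighted sum of five factor scores on a 0-100 scale, so a perfect score on every factor yields 100. This minimal sketch assumes the factor scores are computed elsewhere. composite_score is a hypothetical helper that mirrors the return shape documented for ContentScoringService.score():

```python
WEIGHTS = {
    "keyword_coverage": 0.30,
    "heading_structure": 0.20,
    "content_depth": 0.20,
    "readability": 0.15,
    "schema_completeness": 0.15,
}


def composite_score(breakdown: dict[str, float]) -> dict:
    """Weighted sum of per-factor scores; each factor is clamped to 0-100 first."""
    missing = WEIGHTS.keys() - breakdown.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    clamped = {f: min(100.0, max(0.0, float(breakdown[f]))) for f in WEIGHTS}
    total = sum(WEIGHTS[f] * clamped[f] for f in WEIGHTS)
    return {"total": round(total, 1), "breakdown": clamped}


def improvement(score_before: float, score_after: float) -> float:
    """Per-task delta that the dashboard can average across optimizations."""
    return round(score_after - score_before, 1)
```

For example, factor scores of 50, 80, 60, 70, and 40 give 0.30 × 50 + 0.20 × 80 + 0.20 × 60 + 0.15 × 70 + 0.15 × 40 = 15 + 16 + 12 + 10.5 + 6 = 59.5.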

2.7 Batch Optimization

  • Select content by: cluster ID, score threshold (e.g., all content scoring < 50), content type, date range
  • Queue as Celery tasks with priority ordering (lowest scores first)
  • Concurrency: max 3 concurrent optimization tasks per account
  • Progress tracking via OptimizationTask status field
  • Cancel capability: change status to rejected to stop processing
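Priority ordering and the per-account cap reduce to a sort plus a slot count. In this sketch, running_count is assumed to be the number of the account's tasks currently in analyzing or optimizing status. This document does not specify how the cap is enforced across Celery workers (for example, locks or dedicated queues). The names are hypothetical.

```python
from dataclasses import dataclass

MAX_CONCURRENT_PER_ACCOUNT = 3


@dataclass
class Candidate:
    content_id: int
    score: float  # current Content Quality Score (0-100)


def priority_order(candidates: list[Candidate]) -> list[int]:
    """Lowest-scoring content first; content_id breaks ties deterministically."""
    return [c.content_id for c in sorted(candidates, key=lambda c: (c.score, c.content_id))]


def free_slots(running_count: int) -> int:
    """How many more tasks this account may start right now."""
    return max(0, MAX_CONCURRENT_PER_ACCOUNT - running_count)


def next_to_dispatch(queue: list[int], running_count: int) -> list[int]:
    """Take as many queued content IDs as the account has free slots."""
    return queue[:free_slots(running_count)]
```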

3. DATA MODELS & APIS

3.1 Modified Model — OptimizationTask (optimization app)

Extend the existing OptimizationTask model with 18 new fields:

# Add to existing OptimizationTask model:

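# Non-nullable FK: if igny8_optimization_tasks already has rows, makemigrations
# prompts for a one-off default; alternatively add it with null=True, backfill,
# and make it required in a follow-up migration.
content = models.ForeignKey(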
    'writer.Content',
    on_delete=models.CASCADE,
    related_name='optimization_tasks'
)
primary_cluster = models.ForeignKey(
    'planner.Clusters',
    on_delete=models.SET_NULL,
    null=True,
    blank=True,
    related_name='optimization_tasks'
)
secondary_clusters = models.JSONField(
    default=list,
    blank=True,
    help_text='List of Clusters IDs for secondary relevance'
)
keyword_targets = models.JSONField(
    default=list,
    blank=True,
    help_text='[{keyword, target_density, current_density, status}]'
)
optimization_type = models.CharField(
    max_length=20,
    choices=[
        ('full_rewrite', 'Full Rewrite'),
        ('heading_only', 'Heading Only'),
        ('schema_only', 'Schema Only'),
        ('keyword_coverage', 'Keyword Coverage'),
        ('batch', 'Batch'),
    ],
    default='full_rewrite'
)
intent_classification = models.CharField(
    max_length=15,
    choices=[
        ('informational', 'Informational'),
        ('commercial', 'Commercial'),
        ('transactional', 'Transactional'),
    ],
    blank=True,
    default=''
)
score_before = models.FloatField(null=True, blank=True)
score_after = models.FloatField(null=True, blank=True)
content_before = models.TextField(
    blank=True,
    default='',
    help_text='Snapshot of original content_html'
)
content_after = models.TextField(
    blank=True,
    default='',
    help_text='Optimized HTML (empty until optimization completes)'
)
metadata_before = models.JSONField(
    default=dict,
    blank=True,
    help_text='{meta_title, meta_description, headings[]}'
)
metadata_after = models.JSONField(
    default=dict,
    blank=True
)
schema_before = models.JSONField(default=dict, blank=True)
schema_after = models.JSONField(default=dict, blank=True)
structure_changes = models.JSONField(
    default=list,
    blank=True,
    help_text='[{change_type, description, before, after}]'
)
confidence_score = models.FloatField(
    null=True,
    blank=True,
    help_text='AI confidence in the quality of changes (0-1)'
)
applied = models.BooleanField(default=False)
applied_at = models.DateTimeField(null=True, blank=True)

Update STATUS choices on OptimizationTask:

STATUS_CHOICES = [
    ('pending', 'Pending'),
    ('analyzing', 'Analyzing'),
    ('optimizing', 'Optimizing'),
    ('review', 'Ready for Review'),
    ('applied', 'Applied'),
    ('rejected', 'Rejected'),
]
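The status values imply a simple forward flow. The transition map below is inferred from sections 2.7 and 3.3 rather than stated by them, and the helper name is hypothetical. It shows how views and tasks could reject illegal changes, such as re-applying a rejected optimization.

```python
# Hypothetical transition map: pipeline moves pending -> analyzing -> optimizing
# -> review; a reviewer applies or rejects; batch cancellation sets "rejected" early.
ALLOWED_TRANSITIONS = {
    "pending": {"analyzing", "rejected"},
    "analyzing": {"optimizing", "rejected"},
    "optimizing": {"review", "rejected"},
    "review": {"applied", "rejected"},
    "applied": set(),   # terminal
    "rejected": set(),  # terminal
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the change is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status change {current!r} -> {new!r}")
    return new
```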

PK: BigAutoField (integer) — existing model
Table: existing igny8_optimization_tasks table (no rename needed)

3.2 Migration

Single migration in the optimization app (or igny8_core migrations):

igny8_core/migrations/XXXX_extend_optimization_task.py

Operations:

  1. AddField('OptimizationTask', 'content', ...) — FK to Content
  2. AddField('OptimizationTask', 'primary_cluster', ...) — FK to Clusters
  3. AddField('OptimizationTask', 'secondary_clusters', ...) — JSONField
  4. AddField('OptimizationTask', 'keyword_targets', ...) — JSONField
  5. AddField('OptimizationTask', 'optimization_type', ...) — CharField
  6. AddField('OptimizationTask', 'intent_classification', ...) — CharField
  7. AddField('OptimizationTask', 'score_before', ...) — FloatField
  8. AddField('OptimizationTask', 'score_after', ...) — FloatField
  9. AddField('OptimizationTask', 'content_before', ...) — TextField
  10. AddField('OptimizationTask', 'content_after', ...) — TextField
  11. AddField('OptimizationTask', 'metadata_before', ...) — JSONField
  12. AddField('OptimizationTask', 'metadata_after', ...) — JSONField
  13. AddField('OptimizationTask', 'schema_before', ...) — JSONField
  14. AddField('OptimizationTask', 'schema_after', ...) — JSONField
  15. AddField('OptimizationTask', 'structure_changes', ...) — JSONField
  16. AddField('OptimizationTask', 'confidence_score', ...) — FloatField
  17. AddField('OptimizationTask', 'applied', ...) — BooleanField
  18. AddField('OptimizationTask', 'applied_at', ...) — DateTimeField
  19. AlterField('OptimizationTask', 'status', ...) for the updated STATUS_CHOICES

3.3 API Endpoints

All endpoints under /api/v1/optimizer/:

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/optimizer/analyze/ | Analyze single content piece. Body: {content_id}. Returns scores + keyword coverage + heading analysis + recommendations. Does NOT rewrite. |
| POST | /api/v1/optimizer/optimize/ | Run full optimization. Body: {content_id, optimization_type}. Creates OptimizationTask, runs analysis + rewrite, returns preview. |
| POST | /api/v1/optimizer/preview/ | Preview changes without creating task. Body: {content_id}. Returns diff-style output. |
| POST | /api/v1/optimizer/apply/{id}/ | Apply optimized version. Copies content_after → Content.content_html, updates metadata, sets applied=True. |
| POST | /api/v1/optimizer/reject/{id}/ | Reject optimization. Sets status=rejected, keeps original content. |
| POST | /api/v1/optimizer/batch/ | Queue batch optimization. Body: {site_id, cluster_id?, score_threshold?, content_type?, content_ids?}. Returns batch task ID. |
| GET | /api/v1/optimizer/tasks/?site_id=X | List OptimizationTask records with filters (status, optimization_type, cluster_id, date range). |
| GET | /api/v1/optimizer/tasks/{id}/ | Single optimization detail with full before/after data. |
| GET | /api/v1/optimizer/tasks/{id}/diff/ | HTML diff view — visual comparison of content_before vs content_after. |
| GET | /api/v1/optimizer/cluster-suggestions/?content_id=X | Suggest best-fit cluster for unassigned content. Returns top 3 candidates with confidence scores. |
| POST | /api/v1/optimizer/assign-cluster/ | Assign cluster to content. Body: {content_id, cluster_id}. Updates Content record. |
| GET | /api/v1/optimizer/dashboard/?site_id=X | Optimization stats: avg score improvement, count by status, top improved, lowest scoring content. |

Permissions: All endpoints use SiteSectorModelViewSet permission patterns.
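For the /api/v1/optimizer/tasks/{id}/diff/ endpoint, Python's standard difflib.HtmlDiff is one option: it renders a side-by-side HTML table with the source markup escaped. The sketch splits the snapshots after each ">" so that single-line HTML still diffs tag by tag. That splitting rule and html_diff itself are assumptions, not the existing implementation.

```python
import difflib


def html_diff(content_before: str, content_after: str) -> str:
    """Side-by-side HTML table comparing two HTML snapshots."""

    def split(html: str) -> list[str]:
        # Put roughly one tag per line; stored HTML may be on a single line.
        return html.replace(">", ">\n").splitlines()

    return difflib.HtmlDiff(wrapcolumn=80).make_table(
        split(content_before), split(content_after),
        fromdesc="content_before", todesc="content_after",
        context=True, numlines=2,  # show only changed regions plus 2 lines of context
    )
```

make_table returns only the table element. The page still needs difflib's legend and CSS, or the project's own styles for the diff_add, diff_chg, and diff_sub classes.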

3.4 AI Function — Enhanced optimize_content

Extend the existing registered optimize_content AI function:

Registry key: optimize_content (already registered — enhance, not replace)
Location: igny8_core/ai/functions/optimize_content.py (existing file)

class OptimizeContentFunction(BaseAIFunction):
    """
    Enhanced cluster-aligned content optimization.
    Extends existing optimize_content with keyword coverage,
    heading restructure, intent classification, and scoring.
    """
    function_name = 'optimize_content'

    def validate(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Verify content exists, has content_html
        # Verify optimization_type is valid
        pass

    def prepare(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Load Content record
        # Determine cluster (from Content or auto-match)
        # Load cluster Keywords
        # Analyze current keyword coverage
        # Parse heading structure
        # Classify intent
        # Calculate score_before
        # Snapshot content_before, metadata_before, schema_before
        pass

    def build_prompt(self):
        # Build type-specific optimization prompt:
        # - Include current content_html
        # - Include cluster keywords with coverage status
        # - Include heading analysis results
        # - Include intent classification
        # - Include optimization_type instructions:
        #   full_rewrite: all optimizations
        #   heading_only: heading restructure only
        #   schema_only: schema fix only (no content change)
        #   keyword_coverage: add missing keyword sections only
        pass

    def parse_response(self, response):
        # Parse optimized HTML
        # Parse updated metadata (meta_title, meta_description)
        # Parse structure_changes list
        # Parse confidence_score
        pass

    def save_output(self, parsed):
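        # Calculate score_after on the optimized HTML
        # Create the OptimizationTask (or update the one already created by
        # the optimize endpoint / run_optimization) with all before/after data
        # Set status='review'; never write to Content here (preview-first)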
        pass
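parse_response() depends on the response contract that build_prompt() requests, and this document does not fix that contract. The sketch below assumes the prompt asks for a JSON object using the key names shown, all of which are hypothetical. Under that assumption, the parsing and sanity checks could look like this:

```python
import json


def parse_optimizer_response(raw: str) -> dict:
    """Parse model output into OptimizationTask fields (assumed JSON contract)."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    html = data.get("content_html", "")  # may be empty for schema_only runs
    if not isinstance(html, str):
        raise ValueError("content_html must be a string")
    changes = data.get("structure_changes", [])
    if not isinstance(changes, list):
        raise ValueError("structure_changes must be a list")
    confidence = float(data.get("confidence_score", 0.0))
    return {
        "content_after": html,
        "metadata_after": {
            "meta_title": data.get("meta_title", ""),
            "meta_description": data.get("meta_description", ""),
        },
        "structure_changes": changes,
        # Clamp into the 0-1 range documented on OptimizationTask.confidence_score.
        "confidence_score": min(1.0, max(0.0, confidence)),
    }
```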

3.5 Content Scoring Service

Location: igny8_core/business/content_scoring.py

class ContentScoringService:
    """
    Calculates Content Quality Score (0-100) using 5 weighted factors.
    Used by optimizer for before/after and by dashboard for overview.
    """

    WEIGHTS = {
        'keyword_coverage': 0.30,
        'heading_structure': 0.20,
        'content_depth': 0.20,
        'readability': 0.15,
        'schema_completeness': 0.15,
    }

    def score(self, content_id, cluster_id=None):
        """
        Calculate composite score for a content record.
        Returns: {total: float, breakdown: {factor: score}}
        """
        pass

    def _score_keyword_coverage(self, content, cluster):
        """0-100: % of cluster keywords found in content."""
        pass

    def _score_heading_structure(self, content_html):
        """0-100: single H1, keyword H2s, no skipped levels, H2 count."""
        pass

    def _score_content_depth(self, content_html, content_structure):
        """0-100: word count vs minimum for structure type, section completeness."""
        pass

    def _score_readability(self, content_html):
        """0-100: avg sentence length, paragraph length, Flesch-Kincaid approx."""
        pass

    def _score_schema_completeness(self, content):
        """0-100: required schema fields present, from SchemaValidationService (02G)."""
        pass
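For the readability factor, the classic Flesch Reading Ease formula already lands near a 0-100 scale: 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words). Note that this is Reading Ease, not the Flesch-Kincaid grade-level formula. The sketch makes three simplifications:

- It assumes plain-text input.
- It uses a rough vowel-group syllable counter.
- It ignores the paragraph-length part of the factor.

The helper names are hypothetical.

```python
import re

_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
_SENTENCE_END = re.compile(r"[.!?]+")


def count_syllables(word: str) -> int:
    """Rough English heuristic: count vowel groups, drop a silent final 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def flesch_reading_ease(text: str) -> float:
    words = _WORD.findall(text)
    if not words:
        return 0.0
    sentences = max(1, len([s for s in _SENTENCE_END.split(text) if s.strip()]))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def readability_score(text: str) -> float:
    """Map Flesch Reading Ease onto the 0-100 factor scale by clamping."""
    return round(min(100.0, max(0.0, flesch_reading_ease(text))), 1)
```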

3.6 Keyword Coverage Analyzer

Location: igny8_core/business/keyword_coverage.py

class KeywordCoverageAnalyzer:
    """
    Analyzes content against cluster keyword set.
    Returns per-keyword presence and overall coverage percentage.
    """

    def analyze(self, content_id, cluster_id):
        """
        Returns {
            total_keywords: int,
            covered: int,
            missing: int,
            coverage_pct: float,
            keywords: [{keyword, target_density, current_density, status}]
        }
        """
        pass

    def _extract_text(self, content_html):
        """Strip HTML, return plain text for analysis."""
        pass

    def _check_keyword(self, keyword, text):
        """Check for exact, partial (stemmed), and semantic presence."""
        pass
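The following standard-library sketch covers _extract_text and the exact and partial parts of _check_keyword. The suffix-stripping stemmer is a crude stand-in for a real stemmer, and semantic presence is left to the AI step. The function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(content_html: str) -> str:
    parser = _TextExtractor()
    parser.feed(content_html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def _stem(token: str) -> str:
    # Crude suffix stripping, standing in for a real stemmer (e.g. Porter).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def check_keyword(keyword: str, text: str) -> str:
    """Return 'exact', 'partial', or 'missing' (semantic presence needs the AI step)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kw_tokens = re.findall(r"[a-z0-9]+", keyword.lower())
    if not kw_tokens:
        return "missing"
    n = len(kw_tokens)
    windows = [tokens[i:i + n] for i in range(len(tokens) - n + 1)]
    if kw_tokens in windows:
        return "exact"
    kw_stems = [_stem(t) for t in kw_tokens]
    if any([_stem(t) for t in w] == kw_stems for w in windows):
        return "partial"
    return "missing"
```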

3.7 Celery Tasks

Location: igny8_core/tasks/optimization_tasks.py

@shared_task(name='run_optimization')
def run_optimization(optimization_task_id):
    """Process a single OptimizationTask. Called by API endpoints."""
    pass

@shared_task(name='run_batch_optimization')
def run_batch_optimization(site_id, cluster_id=None, score_threshold=None,
                           content_type=None, content_ids=None, batch_size=10):
    """
    Process batch of content for optimization.
    Selects content matching filters, creates an OptimizationTask per item,
    and dispatches them lowest score first, with at most 3 running
    concurrently per account.
    """
    pass

@shared_task(name='identify_optimization_candidates')
def identify_optimization_candidates(site_id=None, threshold=50):
    """
    Weekly scan: find content with quality score below threshold.
    When site_id is None (the weekly beat run), scans all sites.
    Creates report, does NOT auto-optimize.
    """
    pass

Beat Schedule Addition:

| Task | Schedule | Notes |
|---|---|---|
| identify_optimization_candidates | Weekly (Monday 4:00 AM) | Scans all sites, identifies low-scoring content |
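The beat entry could be registered roughly as below, using Celery's crontab schedule. The entry key is hypothetical. The task signature in 3.7 takes a site_id, but the weekly run scans all sites, so the sketch assumes the task treats a missing site_id as "all sites". Crontab times follow the Celery app's configured timezone.

```python
from celery.schedules import crontab

# Fragment to merge into the Celery app's beat_schedule (or the equivalent
# Django settings key used by the igny8_core Celery app).
beat_schedule = {
    "identify-optimization-candidates-weekly": {  # hypothetical entry key
        "task": "identify_optimization_candidates",  # name= from @shared_task
        "schedule": crontab(minute=0, hour=4, day_of_week="mon"),  # Monday 4:00 AM
        # No site_id: assumes the task scans all sites when site_id is omitted.
        "kwargs": {"threshold": 50},
    },
}
```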

4. IMPLEMENTATION STEPS

Step 1: Migration

  1. Add 18 new fields to OptimizationTask model
  2. Update STATUS_CHOICES on OptimizationTask
  3. Run migration

Step 2: Services

  1. Implement ContentScoringService in igny8_core/business/content_scoring.py
  2. Implement KeywordCoverageAnalyzer in igny8_core/business/keyword_coverage.py

Step 3: AI Function Enhancement

  1. Extend OptimizeContentFunction in igny8_core/ai/functions/optimize_content.py
  2. Add cluster-alignment, keyword coverage, heading analysis, intent classification, scoring
  3. Maintain backward compatibility — existing optimize_content calls still work

Step 4: API Endpoints

  1. Add optimizer endpoints to igny8_core/urls/optimizer.py (create the file if it does not exist)
  2. Create views: AnalyzeView, OptimizeView, PreviewView, ApplyView, RejectView, BatchView
  3. Create ClusterSuggestionsView, AssignClusterView, DashboardView, DiffView
  4. Register URL patterns under /api/v1/optimizer/

Step 5: Celery Tasks

  1. Implement run_optimization, run_batch_optimization, identify_optimization_candidates
  2. Add identify_optimization_candidates to Celery beat schedule

Step 6: Serializers & Admin

  1. Update DRF serializer for extended OptimizationTask (include all 18 new fields)
  2. Create nested serializers for before/after views
  3. Update Django admin registration

Step 7: Credit Cost Configuration

Add to CreditCostConfig (billing app):

| operation_type | default_cost | description |
|---|---|---|
| optimization_analysis | 2 | Analyze single content (scoring + keyword coverage) |
| optimization_full_rewrite | 5-8 | Full rewrite optimization (varies by content length) |
| optimization_schema_only | 1 | Schema gap fix only |
| optimization_batch | 15-25 | Batch optimization for 10 items |

Credit deduction follows existing CreditUsageLog pattern.


5. ACCEPTANCE CRITERIA

Cluster Matching

  • Content without cluster assignment gets auto-matched with confidence scoring
  • Confidence ≥ 0.6 auto-assigns; < 0.6 flags for manual review with top 3 suggestions
  • Cluster suggestions endpoint returns ranked candidates

Keyword Coverage

  • All cluster keywords analyzed for presence in content
  • Coverage report includes exact match, partial match, and missing keywords
  • Hub content targets 70%+, supporting articles 40%+, product/service 30%+

Heading Restructure

  • H1/H2/H3 hierarchy validated (single H1, no skipped levels)
  • Missing keyword themes identified and new headings suggested
  • AI rewrites headings incorporating target keywords while maintaining meaning

Content Rewrite

  • Intent classified correctly (informational/commercial/transactional)
  • Rewrite adjusts content structure based on intent
  • Thin content expanded, bloated content compressed
  • Missing keyword sections added

Scoring

  • Score 0-100 calculated with 5 weighted factors
  • score_before recorded before any changes
  • score_after recorded after optimization
  • Dashboard shows average improvement and distribution

Before/After

  • Full snapshot of original content preserved in content_before
  • Optimized version stored in content_after without auto-applying
  • Diff view provides visual HTML comparison
  • Apply action copies content_after → Content.content_html
  • Reject action preserves original, marks task rejected

Batch

  • Batch optimization selects content by cluster, score threshold, type, or explicit IDs
  • Max 3 concurrent optimizations per account enforced
  • Progress trackable via OptimizationTask status
  • Weekly candidate identification runs without auto-optimizing

Integration

  • Schema gap detection leverages SchemaValidationService from 02G
  • Credit costs deducted per CreditCostConfig entries
  • All API endpoints respect account/site permission boundaries

6. CLAUDE CODE INSTRUCTIONS

File Locations

igny8_core/
├── ai/
│   └── functions/
│       └── optimize_content.py       # Enhance existing function
├── business/
│   ├── content_scoring.py            # ContentScoringService
│   └── keyword_coverage.py           # KeywordCoverageAnalyzer
├── tasks/
│   └── optimization_tasks.py         # Celery tasks
├── urls/
│   └── optimizer.py                  # Optimizer endpoints
└── migrations/
    └── XXXX_extend_optimization_task.py

Conventions

  • PKs: BigAutoField (integer) — do NOT use UUIDs
  • Table prefix: igny8_ (existing table igny8_optimization_tasks)
  • Celery app name: igny8_core
  • URL pattern: /api/v1/optimizer/...
  • Permissions: Use SiteSectorModelViewSet permission pattern
  • AI functions: Extend existing BaseAIFunction subclass — do NOT create a new registration key, enhance the existing optimize_content
  • Frontend: .tsx files with Zustand stores for state management

Cross-References

| Doc | Relationship |
|---|---|
| 02B | Taxonomy terms get cluster context for optimization; ClusterMappingService scoring pattern reused |
| 02G | SchemaValidationService used for schema gap detection; schema_only optimization triggers 02G schema generation |
| 02C | GSC position data identifies pages needing optimization (high impressions, low clicks) |
| 02D | Optimizer identifies internal link opportunities and feeds them to linker |
| 01E | Blueprint-aware pipeline sets initial content quality; optimizer improves post-generation |
| 01A | SAGBlueprint/SAGCluster data provides cluster context for optimization |
| 01G | SAG health monitoring can incorporate content quality scores as a health factor |

Key Decisions

  1. Extend, don't replace — The existing OptimizationTask model and optimize_content AI function are enhanced, not replaced with new models
  2. Preview-first workflow — Optimizations always produce a preview (status=review) before applying to Content
  3. Content snapshot — Full HTML snapshot stored in content_before for rollback capability
  4. Score reuse: ContentScoringService is a standalone service usable by other modules (02G schema audit, 01G health monitoring)
  5. Schema delegation — Schema gap detection reuses 02G's SchemaValidationService rather than duplicating logic