IGNY8 VPS (Salman) 0570052fec 1

2026-03-23 17:20:51 +00:00

24 KiB

IGNY8 Phase 2: Content Optimizer (02F)

Cluster-Aligned Content Optimization Engine

Document Version: 1.0
Date: 2026-03-23
Phase: IGNY8 Phase 2 (Feature Expansion)
Status: Build Ready
Source of Truth: Codebase at /data/app/igny8/
Audience: Claude Code, Backend Developers, Architects

1. CURRENT STATE

Optimization App Today

The optimization Django app is listed in INSTALLED_APPS but is inactive (behind a feature flag). The following already exist:

OptimizationTask model — exists with minimal fields (basic task tracking only)
optimize_content AI function — registered in igny8_core/ai/registry.py as one of the 7 registered functions, but only does basic content rewriting without cluster awareness, keyword coverage analysis, or scoring
optimization app label — app exists at igny8_core/modules/optimization/

What Does Not Exist

No cluster-alignment during optimization
No keyword coverage analysis against cluster keyword sets
No heading restructure logic
No intent-based content rewrite
No schema gap detection
No before/after scoring system (0-100)
No batch optimization
No integration with SAG data (01A) or taxonomy terms (02B)

Foundation Available

Clusters model (app_label=planner, db_table=igny8_clusters) with cluster keywords
Keywords model (app_label=planner, db_table=igny8_keywords) linked to clusters
Content.schema_markup JSONField — used by 02G for JSON-LD
Content.content_type and Content.content_structure — routing context
Content.structured_data JSONField (added by 02A)
ContentTaxonomy cluster mapping (added by 02B) with mapping_confidence
GSCMetricsCache (added by 02C) — position data identifies pages needing optimization
SchemaValidationService (added by 02G) — schema gap detection reuse
BaseAIFunction with validate(), prepare(), build_prompt(), parse_response(), save_output()

2. WHAT TO BUILD

Overview

Extend the existing OptimizationTask model and optimize_content AI function into a full cluster-aligned optimization engine. The system analyzes content against its cluster's keyword set, scores quality on a 0-100 scale, and produces optimized content with tracked before/after metrics.

2.1 Cluster Matching (Auto-Assign Optimization Context)

When content has no cluster assignment, the optimizer auto-detects the best-fit cluster:

Scoring Algorithm:

Keyword overlap (40%): count of cluster keywords found in content title + headings + body
Semantic similarity (40%): AI-scored relevance between content topic and cluster theme
Title match (20%): similarity between content title and cluster name/keywords

Thresholds:

Confidence ≥ 0.6 → auto-assign cluster
Confidence < 0.6 → flag for manual review, suggest top 3 candidates

This reuses the same scoring pattern as ClusterMappingService from 02B.
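
The weighting and thresholds above can be sketched as follows. This is a minimal illustration, not the ClusterMappingService implementation: the names `cluster_confidence` and `pick_cluster` are hypothetical, and each component score is assumed to be normalized to the 0-1 range before weighting.

```python
from typing import Dict, List, Tuple

# Weights and threshold taken from section 2.1.
WEIGHTS = {"keyword_overlap": 0.40, "semantic_similarity": 0.40, "title_match": 0.20}
AUTO_ASSIGN_THRESHOLD = 0.6


def cluster_confidence(components: Dict[str, float]) -> float:
    """Combine component scores (each assumed to be in 0-1) into one confidence."""
    return sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)


def pick_cluster(candidates: Dict[int, Dict[str, float]]) -> Tuple[str, List[Tuple[int, float]]]:
    """Return ('auto_assign', [best]) or ('manual_review', top 3 candidates)."""
    ranked = sorted(
        ((cid, cluster_confidence(parts)) for cid, parts in candidates.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if ranked and ranked[0][1] >= AUTO_ASSIGN_THRESHOLD:
        return "auto_assign", ranked[:1]
    return "manual_review", ranked[:3]
```

Keeping the decision separate from the scoring means the same ranked list could back both the auto-assign path and the cluster-suggestions endpoint.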

2.2 Keyword Coverage Analysis

For content with an assigned cluster:

Load all Keywords records belonging to that cluster
Scan content_html for each keyword: exact match, partial match (stemmed), semantic presence
Report per keyword: {keyword, target_density, current_density, status: present|missing|low_density}
Coverage targets:
- Hub content (cluster_hub): 70%+ of cluster keywords covered
- Supporting articles: 40%+ of cluster keywords covered
- Product/service pages: 30%+ (focused on commercial keywords)
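
A minimal sketch of the per-keyword report and the coverage targets, under stated assumptions: it handles exact matches only (stemmed and semantic matching are left out), defines density as occurrences per 100 words, and uses a placeholder `target_density` of 0.5. The role names and both function names are hypothetical.

```python
import re
from typing import Dict, List

# Coverage targets from section 2.2, keyed by hypothetical role names.
COVERAGE_TARGETS = {"hub": 0.70, "supporting": 0.40, "product_service": 0.30}


def keyword_report(text: str, keywords: List[str], target_density: float = 0.5) -> List[Dict]:
    """Exact-match only. Density is occurrences per 100 words (an assumed definition)."""
    lowered = text.lower()
    total_words = max(len(re.findall(r"\w+", lowered)), 1)
    report = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        density = 100.0 * hits / total_words
        if hits == 0:
            status = "missing"
        elif density < target_density:
            status = "low_density"
        else:
            status = "present"
        report.append({
            "keyword": kw,
            "target_density": target_density,
            "current_density": round(density, 2),
            "status": status,
        })
    return report


def meets_target(report: List[Dict], role: str) -> bool:
    """A keyword counts as covered when its status is not 'missing'."""
    if not report:
        return False
    covered = sum(1 for row in report if row["status"] != "missing")
    return covered / len(report) >= COVERAGE_TARGETS[role]
```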

2.3 Heading Restructure

Analyze H1/H2/H3 hierarchy for SEO best practices:

Check	Rule	Fix
Single H1	Content must have exactly one H1	Merge or demote extra H1s
H2 keyword coverage	H2s should contain target keywords from cluster	AI rewrites H2s with keyword incorporation
Logical hierarchy	No skipped levels (H1 → H3 without H2)	Insert missing levels
H2 count	Minimum 3 H2s for content >1000 words	AI suggests additional H2 sections
Missing keyword themes	Cluster keywords not represented in any heading	AI suggests new H2/H3 sections for missing themes
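
The structural checks in this table (single H1, no skipped levels, minimum H2 count) need no AI call and can be sketched with the standard library. The class and function names below are hypothetical, and the keyword-related checks are deliberately omitted.

```python
from html.parser import HTMLParser
from typing import List


class HeadingCollector(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so 'h2' and 'H2' both arrive as 'h2'.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(content_html: str, word_count: int) -> List[str]:
    """Return issue codes for the structural rules in section 2.3."""
    collector = HeadingCollector()
    collector.feed(content_html)
    levels = collector.levels
    issues = []
    if levels.count(1) != 1:
        issues.append("h1_count")
    # A skipped level is any step deeper into the hierarchy of more than one level.
    if any(later - earlier > 1 for earlier, later in zip(levels, levels[1:])):
        issues.append("skipped_level")
    if word_count > 1000 and levels.count(2) < 3:
        issues.append("too_few_h2")
    return issues
```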

2.4 Content Rewrite (Intent-Aligned)

Intent Classification:

Informational: expand explanations, add examples, increase depth, add definitions
Commercial: add comparison tables, pros/cons, feature highlights, trust signals
Transactional: strengthen CTAs, add urgency, streamline conversion path, social proof

Content Adjustments:

Expand thin content (<500 words) to minimum viable length for the content structure
Compress bloated content (detect and remove redundancy)
Add missing sections identified by keyword coverage analysis
Maintain existing tone and style while improving SEO alignment

2.5 Schema Gap Detection

Leverages SchemaValidationService from 02G:

Check existing Content.schema_markup against expected schemas for the content type
Expected schema by type: Article (post), Product (product), Service (service_page), FAQPage (if FAQ detected), BreadcrumbList (all), HowTo (if steps detected)
Identify missing required fields per schema type
Generate corrected/complete schema JSON-LD
Schema-only optimization mode available (no content rewrite, just schema fix)
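
The expected-schema mapping can be sketched as a lookup plus a set difference. This is not the SchemaValidationService from 02G, whose interface is not shown here. The function names are hypothetical, and the stored shape of Content.schema_markup (a JSON-LD dict, a list of dicts, or a dict with `@graph`) is an assumption.

```python
from typing import List, Set

# Mapping from section 2.5: content_type -> expected schema type.
EXPECTED_BY_TYPE = {"post": "Article", "product": "Product", "service_page": "Service"}


def expected_schema_types(content_type: str, has_faq: bool, has_steps: bool) -> Set[str]:
    expected = {"BreadcrumbList"}  # expected on all content
    if content_type in EXPECTED_BY_TYPE:
        expected.add(EXPECTED_BY_TYPE[content_type])
    if has_faq:
        expected.add("FAQPage")
    if has_steps:
        expected.add("HowTo")
    return expected


def _present_types(schema_markup) -> Set[str]:
    """Collect every @type declared in the (assumed) JSON-LD structure."""
    if isinstance(schema_markup, dict):
        nodes = schema_markup.get("@graph", [schema_markup])
    else:
        nodes = schema_markup or []
    found: Set[str] = set()
    for node in nodes:
        if not isinstance(node, dict):
            continue
        declared = node.get("@type", [])
        found.update([declared] if isinstance(declared, str) else declared)
    return found


def missing_schema_types(schema_markup, content_type: str,
                         has_faq: bool = False, has_steps: bool = False) -> List[str]:
    """Return expected schema types that are absent, sorted for stable output."""
    expected = expected_schema_types(content_type, has_faq, has_steps)
    return sorted(expected - _present_types(schema_markup))
```

This covers only missing schema types. Checking required fields within each type is the part the spec delegates to 02G.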

2.6 Before/After Scoring

Content Quality Score (0-100):

Factor	Weight	Score Criteria
Keyword Coverage	30%	% of cluster keywords present vs target
Heading Structure	20%	Single H1, keyword H2s, logical hierarchy, no skipped levels
Content Depth	20%	Word count vs structure minimum, section completeness, detail level
Readability	15%	Sentence length, paragraph length, Flesch-Kincaid approximation
Schema Completeness	15%	Required schema fields present, validation passes

Every optimization records score_before and score_after. Dashboard aggregates show average improvement across all optimizations.
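
As a worked illustration of the weighting, the composite score is a weighted sum of the five factor scores. The sketch assumes each factor is already scored 0-100. The function name `composite_score`, the rounding to one decimal place, and the clamping are assumptions.

```python
from typing import Dict

# Weights from the table in section 2.6; they sum to 1.0.
WEIGHTS = {
    "keyword_coverage": 0.30,
    "heading_structure": 0.20,
    "content_depth": 0.20,
    "readability": 0.15,
    "schema_completeness": 0.15,
}


def composite_score(breakdown: Dict[str, float]) -> float:
    """Weighted sum of per-factor scores (each 0-100), clamped to 0-100."""
    total = sum(WEIGHTS[factor] * breakdown[factor] for factor in WEIGHTS)
    return round(min(max(total, 0.0), 100.0), 1)
```

For example, factor scores of 80, 100, 60, 70, and 0 (in table order) give 24 + 20 + 12 + 10.5 + 0 = 66.5.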

2.7 Batch Optimization

Select content by: cluster ID, score threshold (e.g., all content scoring < 50), content type, date range
Queue as Celery tasks with priority ordering (lowest scores first)
Concurrency: max 3 concurrent optimization tasks per account
Progress tracking via OptimizationTask status field
Cancel capability: setting a task's status to rejected stops further processing of that task
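
The selection and ordering rules can be sketched independently of Celery. The helper names below are hypothetical. How the real tasks count running jobs per account is not specified, so `running` is simply passed in.

```python
from typing import Dict, List, Optional

MAX_CONCURRENT_PER_ACCOUNT = 3  # from section 2.7


def order_batch(scores: Dict[int, float], score_threshold: Optional[float] = None) -> List[int]:
    """Return content IDs to optimize, lowest score first."""
    selected = [
        (score, content_id)
        for content_id, score in scores.items()
        if score_threshold is None or score < score_threshold
    ]
    return [content_id for _score, content_id in sorted(selected)]


def next_to_dispatch(queue: List[int], running: int) -> List[int]:
    """Take only as many queued IDs as the account has free slots."""
    free_slots = max(MAX_CONCURRENT_PER_ACCOUNT - running, 0)
    return queue[:free_slots]
```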

3. DATA MODELS & APIS

3.1 Modified Model — OptimizationTask (optimization app)

Extend the existing OptimizationTask model with 18 new fields:

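# Add to existing OptimizationTask model:

# NOTE: this FK is non-nullable. If igny8_optimization_tasks already holds rows,
# the migration needs a one-off default, or the field must be added with null=True.
content = models.ForeignKey(
    'writer.Content',
    on_delete=models.CASCADE,
    related_name='optimization_tasks'
)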
primary_cluster = models.ForeignKey(
    'planner.Clusters',
    on_delete=models.SET_NULL,
    null=True,
    blank=True,
    related_name='optimization_tasks'
)
secondary_clusters = models.JSONField(
    default=list,
    blank=True,
    help_text='List of Clusters IDs for secondary relevance'
)
keyword_targets = models.JSONField(
    default=list,
    blank=True,
    help_text='[{keyword, target_density, current_density, status}]'
)
optimization_type = models.CharField(
    max_length=20,
    choices=[
        ('full_rewrite', 'Full Rewrite'),
        ('heading_only', 'Heading Only'),
        ('schema_only', 'Schema Only'),
        ('keyword_coverage', 'Keyword Coverage'),
        ('batch', 'Batch'),
    ],
    default='full_rewrite'
)
intent_classification = models.CharField(
    max_length=15,
    choices=[
        ('informational', 'Informational'),
        ('commercial', 'Commercial'),
        ('transactional', 'Transactional'),
    ],
    blank=True,
    default=''
)
score_before = models.FloatField(null=True, blank=True)
score_after = models.FloatField(null=True, blank=True)
content_before = models.TextField(
    blank=True,
    default='',
    help_text='Snapshot of original content_html'
)
content_after = models.TextField(
    blank=True,
    default='',
    help_text='Optimized HTML (empty string until optimization completes)'
)
metadata_before = models.JSONField(
    default=dict,
    blank=True,
    help_text='{meta_title, meta_description, headings[]}'
)
metadata_after = models.JSONField(
    default=dict,
    blank=True
)
schema_before = models.JSONField(default=dict, blank=True)
schema_after = models.JSONField(default=dict, blank=True)
structure_changes = models.JSONField(
    default=list,
    blank=True,
    help_text='[{change_type, description, before, after}]'
)
confidence_score = models.FloatField(
    null=True,
    blank=True,
    help_text='AI confidence in the quality of changes (0-1)'
)
applied = models.BooleanField(default=False)
applied_at = models.DateTimeField(null=True, blank=True)

Update STATUS choices on OptimizationTask:

STATUS_CHOICES = [
    ('pending', 'Pending'),
    ('analyzing', 'Analyzing'),
    ('optimizing', 'Optimizing'),
    ('review', 'Ready for Review'),
    ('applied', 'Applied'),
    ('rejected', 'Rejected'),
]

PK: BigAutoField (integer), existing model
Table: existing igny8_optimization_tasks table (no rename needed)

3.2 Migration

Single migration in the optimization app (or igny8_core migrations):

igny8_core/migrations/XXXX_extend_optimization_task.py

Operations:

AddField('OptimizationTask', 'content', ...) — FK to Content
AddField('OptimizationTask', 'primary_cluster', ...) — FK to Clusters
AddField('OptimizationTask', 'secondary_clusters', ...) — JSONField
AddField('OptimizationTask', 'keyword_targets', ...) — JSONField
AddField('OptimizationTask', 'optimization_type', ...) — CharField
AddField('OptimizationTask', 'intent_classification', ...) — CharField
AddField('OptimizationTask', 'score_before', ...) — FloatField
AddField('OptimizationTask', 'score_after', ...) — FloatField
AddField('OptimizationTask', 'content_before', ...) — TextField
AddField('OptimizationTask', 'content_after', ...) — TextField
AddField('OptimizationTask', 'metadata_before', ...) — JSONField
AddField('OptimizationTask', 'metadata_after', ...) — JSONField
AddField('OptimizationTask', 'schema_before', ...) — JSONField
AddField('OptimizationTask', 'schema_after', ...) — JSONField
AddField('OptimizationTask', 'structure_changes', ...) — JSONField
AddField('OptimizationTask', 'confidence_score', ...) — FloatField
AddField('OptimizationTask', 'applied', ...) — BooleanField
AddField('OptimizationTask', 'applied_at', ...) — DateTimeField

3.3 API Endpoints

All endpoints under /api/v1/optimizer/:

Method	Path	Description
POST	`/api/v1/optimizer/analyze/`	Analyze single content piece. Body: `{content_id}`. Returns scores + keyword coverage + heading analysis + recommendations. Does NOT rewrite.
POST	`/api/v1/optimizer/optimize/`	Run full optimization. Body: `{content_id, optimization_type}`. Creates OptimizationTask, runs analysis + rewrite, returns preview.
POST	`/api/v1/optimizer/preview/`	Preview changes without creating task. Body: `{content_id}`. Returns diff-style output.
POST	`/api/v1/optimizer/apply/{id}/`	Apply optimized version. Copies `content_after` → `Content.content_html`, updates metadata, sets `applied=True`.
POST	`/api/v1/optimizer/reject/{id}/`	Reject optimization. Sets status=`rejected`, keeps original content.
POST	`/api/v1/optimizer/batch/`	Queue batch optimization. Body: `{site_id, cluster_id?, score_threshold?, content_type?, content_ids?}`. Returns batch task ID.
GET	`/api/v1/optimizer/tasks/?site_id=X`	List OptimizationTask records with filters (status, optimization_type, cluster_id, date range).
GET	`/api/v1/optimizer/tasks/{id}/`	Single optimization detail with full before/after data.
GET	`/api/v1/optimizer/tasks/{id}/diff/`	HTML diff view — visual comparison of content_before vs content_after.
GET	`/api/v1/optimizer/cluster-suggestions/?content_id=X`	Suggest best-fit cluster for unassigned content. Returns top 3 candidates with confidence scores.
POST	`/api/v1/optimizer/assign-cluster/`	Assign cluster to content. Body: `{content_id, cluster_id}`. Updates Content record.
GET	`/api/v1/optimizer/dashboard/?site_id=X`	Optimization stats: avg score improvement, count by status, top improved, lowest scoring content.

Permissions: All endpoints use SiteSectorModelViewSet permission patterns.
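
The apply action described in the table can be sketched with plain stand-in objects. `ContentStub`, `TaskStub`, and `apply_task` are hypothetical names, not the real models or views. The guard that only a task in review can be applied is an assumption, and the metadata update is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ContentStub:
    """Stand-in for writer.Content (illustration only)."""
    content_html: str


@dataclass
class TaskStub:
    """Stand-in for OptimizationTask (illustration only)."""
    content: ContentStub
    content_after: str
    status: str = "review"
    applied: bool = False
    applied_at: Optional[datetime] = None


def apply_task(task: TaskStub) -> None:
    """Copy the optimized HTML onto the content record and mark the task applied."""
    # Assumed guard: refuse tasks that are not awaiting review or have no output.
    if task.status != "review" or not task.content_after:
        raise ValueError("only a completed task in review can be applied")
    task.content.content_html = task.content_after
    task.status = "applied"
    task.applied = True
    task.applied_at = datetime.now(timezone.utc)
```

In the real view the two writes would presumably share one database transaction so a failure cannot leave content updated with the task still in review.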

3.4 AI Function — Enhanced optimize_content

Extend the existing registered optimize_content AI function:

Registry key: optimize_content (already registered; enhance, not replace)
Location: igny8_core/ai/functions/optimize_content.py (existing file)

class OptimizeContentFunction(BaseAIFunction):
    """
    Enhanced cluster-aligned content optimization.
    Extends existing optimize_content with keyword coverage,
    heading restructure, intent classification, and scoring.
    """
    function_name = 'optimize_content'

    def validate(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Verify content exists, has content_html
        # Verify optimization_type is valid
        pass

    def prepare(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Load Content record
        # Determine cluster (from Content or auto-match)
        # Load cluster Keywords
        # Analyze current keyword coverage
        # Parse heading structure
        # Classify intent
        # Calculate score_before
        # Snapshot content_before, metadata_before, schema_before
        pass

    def build_prompt(self):
        # Build type-specific optimization prompt:
        # - Include current content_html
        # - Include cluster keywords with coverage status
        # - Include heading analysis results
        # - Include intent classification
        # - Include optimization_type instructions:
        #   full_rewrite: all optimizations
        #   heading_only: heading restructure only
        #   schema_only: schema fix only (no content change)
        #   keyword_coverage: add missing keyword sections only
        pass

    def parse_response(self, response):
        # Parse optimized HTML
        # Parse updated metadata (meta_title, meta_description)
        # Parse structure_changes list
        # Parse confidence_score
        pass

    def save_output(self, parsed):
        # Create OptimizationTask with all before/after data
        # Calculate score_after
        # Set status='review'
        pass

3.5 Content Scoring Service

Location: igny8_core/business/content_scoring.py

class ContentScoringService:
    """
    Calculates Content Quality Score (0-100) using 5 weighted factors.
    Used by optimizer for before/after and by dashboard for overview.
    """

    WEIGHTS = {
        'keyword_coverage': 0.30,
        'heading_structure': 0.20,
        'content_depth': 0.20,
        'readability': 0.15,
        'schema_completeness': 0.15,
    }

    def score(self, content_id, cluster_id=None):
        """
        Calculate composite score for a content record.
        Returns: {total: float, breakdown: {factor: score}}
        """
        pass

    def _score_keyword_coverage(self, content, cluster):
        """0-100: % of cluster keywords found in content."""
        pass

    def _score_heading_structure(self, content_html):
        """0-100: single H1, keyword H2s, no skipped levels, H2 count."""
        pass

    def _score_content_depth(self, content_html, content_structure):
        """0-100: word count vs minimum for structure type, section completeness."""
        pass

    def _score_readability(self, content_html):
        """0-100: avg sentence length, paragraph length, Flesch-Kincaid approx."""
        pass

    def _score_schema_completeness(self, content):
        """0-100: required schema fields present, from SchemaValidationService (02G)."""
        pass
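
One possible shape for the readability factor, as a sketch under stated assumptions. It uses the Flesch Reading Ease formula, which already lands roughly on a 0-100 scale, rather than the Flesch-Kincaid grade level named in the docstring. It expects plain text with HTML already stripped, and the syllable counter is a crude vowel-group heuristic. Both function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease, clamped to 0-100. Higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    score = (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
    return min(max(score, 0.0), 100.0)
```

Sentence-length and paragraph-length checks would still need to be blended in to match the criteria listed for this factor.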

3.6 Keyword Coverage Analyzer

Location: igny8_core/business/keyword_coverage.py

class KeywordCoverageAnalyzer:
    """
    Analyzes content against cluster keyword set.
    Returns per-keyword presence and overall coverage percentage.
    """

    def analyze(self, content_id, cluster_id):
        """
        Returns {
            total_keywords: int,
            covered: int,
            missing: int,
            coverage_pct: float,
            keywords: [{keyword, target_density, current_density, status}]
        }
        """
        pass

    def _extract_text(self, content_html):
        """Strip HTML, return plain text for analysis."""
        pass

    def _check_keyword(self, keyword, text):
        """Check for exact, partial (stemmed), and semantic presence."""
        pass
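
A minimal sketch of what `_extract_text` could look like using only the standard library. The helper class is hypothetical, and skipping `script` and `style` contents is an assumption about the desired behavior.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script and style elements."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces so text from adjacent elements does not fuse,
        # then collapse all runs of whitespace to single spaces.
        return " ".join(" ".join(self._chunks).split())


def extract_text(content_html: str) -> str:
    """Strip HTML and return whitespace-normalized plain text."""
    parser = _TextExtractor()
    parser.feed(content_html)
    parser.close()
    return parser.text()
```

Joining with spaces can leave a space before punctuation that follows an inline tag, which is harmless for keyword matching but worth knowing if the text is reused for display.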

3.7 Celery Tasks

Location: igny8_core/tasks/optimization_tasks.py

from celery import shared_task


@shared_task(name='run_optimization')
def run_optimization(optimization_task_id):
    """Process a single OptimizationTask. Called by API endpoints."""
    pass

@shared_task(name='run_batch_optimization')
def run_batch_optimization(site_id, cluster_id=None, score_threshold=None,
                           content_type=None, content_ids=None, batch_size=10):
    """
    Process batch of content for optimization.
    Selects content matching filters, creates OptimizationTask per item,
    and dispatches them lowest score first, with at most 3 running concurrently per account.
    """
    pass

@shared_task(name='identify_optimization_candidates')
def identify_optimization_candidates(site_id, threshold=50):
    """
    Weekly scan: find content with quality score below threshold.
    Creates report, does NOT auto-optimize.
    """
    pass

Beat Schedule Addition:

Task	Schedule	Notes
`identify_optimization_candidates`	Weekly (Monday 4:00 AM)	Scans all sites, identifies low-scoring content

4. IMPLEMENTATION STEPS

Step 1: Migration

Add 18 new fields to OptimizationTask model
Update STATUS_CHOICES on OptimizationTask
Run migration

Step 2: Services

Implement ContentScoringService in igny8_core/business/content_scoring.py
Implement KeywordCoverageAnalyzer in igny8_core/business/keyword_coverage.py

Step 3: AI Function Enhancement

Extend OptimizeContentFunction in igny8_core/ai/functions/optimize_content.py
Add cluster-alignment, keyword coverage, heading analysis, intent classification, scoring
Maintain backward compatibility — existing optimize_content calls still work

Step 4: API Endpoints

Add optimizer endpoints to igny8_core/urls/optimizer.py (create the file if it does not exist)
Create views: AnalyzeView, OptimizeView, PreviewView, ApplyView, RejectView, BatchView
Create ClusterSuggestionsView, AssignClusterView, DashboardView, DiffView
Register URL patterns under /api/v1/optimizer/

Step 5: Celery Tasks

Implement run_optimization, run_batch_optimization, identify_optimization_candidates
Add identify_optimization_candidates to Celery beat schedule

Step 6: Serializers & Admin

Update DRF serializer for extended OptimizationTask (include all 18 new fields)
Create nested serializers for before/after views
Update Django admin registration

Step 7: Credit Cost Configuration

Add to CreditCostConfig (billing app):

operation_type	default_cost	description
`optimization_analysis`	2	Analyze single content (scoring + keyword coverage)
`optimization_full_rewrite`	5-8	Full rewrite optimization (varies by content length)
`optimization_schema_only`	1	Schema gap fix only
`optimization_batch`	15-25	Batch optimization for 10 items

Credit deduction follows existing CreditUsageLog pattern.

5. ACCEPTANCE CRITERIA

Cluster Matching

Content without cluster assignment gets auto-matched with confidence scoring
Confidence ≥ 0.6 auto-assigns; < 0.6 flags for manual review with top 3 suggestions
Cluster suggestions endpoint returns ranked candidates

Keyword Coverage

All cluster keywords analyzed for presence in content
Coverage report includes exact match, partial match, and missing keywords
Hub content targets 70%+, supporting articles 40%+, product/service 30%+

Heading Restructure

H1/H2/H3 hierarchy validated (single H1, no skipped levels)
Missing keyword themes identified and new headings suggested
AI rewrites headings incorporating target keywords while maintaining meaning

Content Rewrite

Intent classified correctly (informational/commercial/transactional)
Rewrite adjusts content structure based on intent
Thin content expanded, bloated content compressed
Missing keyword sections added

Scoring

Score 0-100 calculated with 5 weighted factors
score_before recorded before any changes
score_after recorded after optimization
Dashboard shows average improvement and distribution

Before/After

Full snapshot of original content preserved in content_before
Optimized version stored in content_after without auto-applying
Diff view provides visual HTML comparison
Apply action copies content_after → Content.content_html
Reject action preserves original, marks task rejected

Batch

Batch optimization selects content by cluster, score threshold, type, or explicit IDs
Max 3 concurrent optimizations per account enforced
Progress trackable via OptimizationTask status
Weekly candidate identification runs without auto-optimizing

Integration

Schema gap detection leverages SchemaValidationService from 02G
Credit costs deducted per CreditCostConfig entries
All API endpoints respect account/site permission boundaries

6. CLAUDE CODE INSTRUCTIONS

File Locations

igny8_core/
├── ai/
│   └── functions/
│       └── optimize_content.py       # Enhance existing function
├── business/
│   ├── content_scoring.py            # ContentScoringService
│   └── keyword_coverage.py           # KeywordCoverageAnalyzer
├── tasks/
│   └── optimization_tasks.py         # Celery tasks
├── urls/
│   └── optimizer.py                  # Optimizer endpoints
└── migrations/
    └── XXXX_extend_optimization_task.py

Conventions

PKs: BigAutoField (integer) — do NOT use UUIDs
Table prefix: igny8_ (existing table igny8_optimization_tasks)
Celery app name: igny8_core
URL pattern: /api/v1/optimizer/...
Permissions: Use SiteSectorModelViewSet permission pattern
AI functions: Extend existing BaseAIFunction subclass — do NOT create a new registration key, enhance the existing optimize_content
Frontend: .tsx files with Zustand stores for state management

Cross-References

Doc	Relationship
02B	Taxonomy terms get cluster context for optimization; ClusterMappingService scoring pattern reused
02G	SchemaValidationService used for schema gap detection; schema_only optimization triggers 02G schema generation
02C	GSC position data identifies pages needing optimization (high impressions, low clicks)
02D	Optimizer identifies internal link opportunities and feeds them to linker
01E	Blueprint-aware pipeline sets initial content quality; optimizer improves post-generation
01A	SAGBlueprint/SAGCluster data provides cluster context for optimization
01G	SAG health monitoring can incorporate content quality scores as a health factor

Key Decisions

Extend, don't replace — The existing OptimizationTask model and optimize_content AI function are enhanced, not replaced with new models
Preview-first workflow — Optimizations always produce a preview (status=review) before applying to Content
Content snapshot — Full HTML snapshot stored in content_before for rollback capability
Score reuse — ContentScoringService is a standalone service usable by other modules (02G schema audit, 01G health monitoring)
Schema delegation — Schema gap detection reuses 02G's SchemaValidationService rather than duplicating logic
