# IGNY8 Phase 2: Content Optimizer (02F)

## Cluster-Aligned Content Optimization Engine

**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** IGNY8 Phase 2: Feature Expansion
**Status:** Build Ready
**Source of Truth:** Codebase at `/data/app/igny8/`
**Audience:** Claude Code, Backend Developers, Architects

---

## 1. CURRENT STATE

### Optimization App Today

The `optimization` Django app is listed in `INSTALLED_APPS` but is **inactive** (behind a feature flag). The following pieces already exist:

- **`OptimizationTask` model**: minimal fields (basic task tracking only)
- **`optimize_content` AI function**: one of the 7 functions registered in `igny8_core/ai/registry.py`; it performs only basic content rewriting, with no cluster awareness, keyword coverage analysis, or scoring
- **`optimization` app**: located at `igny8_core/modules/optimization/`

### What Does Not Exist

- No cluster alignment during optimization
- No keyword coverage analysis against cluster keyword sets
- No heading restructure logic
- No intent-based content rewrite
- No schema gap detection
- No before/after scoring system (0-100)
- No batch optimization
- No integration with SAG data (01A) or taxonomy terms (02B)

### Foundation Available

- `Clusters` model (app_label=`planner`, db_table=`igny8_clusters`) with cluster keywords
- `Keywords` model (app_label=`planner`, db_table=`igny8_keywords`) linked to clusters
- `Content.schema_markup` JSONField: used by 02G for JSON-LD
- `Content.content_type` and `Content.content_structure`: routing context
- `Content.structured_data` JSONField (added by 02A)
- `ContentTaxonomy` cluster mapping (added by 02B) with `mapping_confidence`
- `GSCMetricsCache` (added by 02C): position data identifies pages needing optimization
- `SchemaValidationService` (added by 02G): reused for schema gap detection
- `BaseAIFunction` with `validate()`, `prepare()`, `build_prompt()`, `parse_response()`, `save_output()`

---

## 2. WHAT TO BUILD

### Overview

Extend the existing `OptimizationTask` model and `optimize_content` AI function into a full cluster-aligned optimization engine. The system analyzes content against its cluster's keyword set, scores quality on a 0-100 scale, and produces optimized content with tracked before/after metrics.

### 2.1 Cluster Matching (Auto-Assign Optimization Context)

When content has no cluster assignment, the optimizer auto-detects the best-fit cluster.

**Scoring Algorithm:**

- Keyword overlap (40%): share of the cluster's keywords found in the content title, headings, and body
- Semantic similarity (40%): AI-scored relevance between the content topic and the cluster theme
- Title match (20%): similarity between the content title and the cluster name/keywords

**Thresholds:**

- Confidence ≥ 0.6 → auto-assign the cluster
- Confidence < 0.6 → flag for manual review and suggest the top 3 candidates

This reuses the same scoring pattern as `ClusterMappingService` from 02B.

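The weighting above can be sketched as a small ranking function. This is a minimal illustration, not the `ClusterMappingService` implementation: it assumes each component has already been normalised to 0-1 (the semantic score is whatever the AI call returns, passed in here as a plain number), and `ClusterCandidate`, `match_confidence` and `rank_clusters` are hypothetical names.

```python
from dataclasses import dataclass

# Weights and threshold from the scoring algorithm above.
KEYWORD_WEIGHT = 0.4
SEMANTIC_WEIGHT = 0.4
TITLE_WEIGHT = 0.2
AUTO_ASSIGN_THRESHOLD = 0.6


@dataclass
class ClusterCandidate:
    cluster_id: int
    keyword_overlap: float      # share of cluster keywords found (0-1)
    semantic_similarity: float  # AI-scored relevance (0-1), computed elsewhere
    title_match: float          # title vs cluster name/keywords similarity (0-1)


def match_confidence(c: ClusterCandidate) -> float:
    """Weighted confidence in [0, 1] for one candidate cluster."""
    return (KEYWORD_WEIGHT * c.keyword_overlap
            + SEMANTIC_WEIGHT * c.semantic_similarity
            + TITLE_WEIGHT * c.title_match)


def rank_clusters(candidates: list[ClusterCandidate]) -> dict:
    """Auto-assign the best cluster, or return the top 3 for manual review."""
    scored = sorted(((match_confidence(c), c.cluster_id) for c in candidates),
                    reverse=True)
    if scored and scored[0][0] >= AUTO_ASSIGN_THRESHOLD:
        best_conf, best_id = scored[0]
        return {"action": "auto_assign", "cluster_id": best_id,
                "confidence": best_conf}
    return {"action": "manual_review",
            "suggestions": [{"cluster_id": cid, "confidence": conf}
                            for conf, cid in scored[:3]]}
```

Keeping the AI call outside this function means the threshold logic can be unit-tested without a model in the loop.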
### 2.2 Keyword Coverage Analysis

For content with an assigned cluster:

1. Load all `Keywords` records belonging to that cluster.
2. Scan `content_html` for each keyword: exact match, partial match (stemmed), and semantic presence.
3. Report per keyword: `{keyword, target_density, current_density, status: present|missing|low_density}`.
4. Coverage targets:
   - Hub content (`cluster_hub`): 70%+ of cluster keywords covered
   - Supporting articles: 40%+ of cluster keywords covered
   - Product/service pages: 30%+ (focused on commercial keywords)

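Given the per-keyword statuses from step 3, the coverage percentage and target check reduce to a count. A minimal sketch under two assumptions: `low_density` keywords count as covered (the spec does not say), and only `cluster_hub` is a key named in this spec, so `supporting` and `product_service` are hypothetical placeholders for whatever `content_structure` values identify those page types. The returned keys follow the `KeywordCoverageAnalyzer.analyze()` shape in section 3.6.

```python
# Minimum share of cluster keywords to cover, per content category.
COVERAGE_TARGETS = {
    "cluster_hub": 0.70,      # hub content
    "supporting": 0.40,       # supporting articles (hypothetical key)
    "product_service": 0.30,  # product/service pages (hypothetical key)
}


def coverage_report(statuses: dict[str, str], category: str) -> dict:
    """Summarise keyword statuses ('present' | 'low_density' | 'missing').

    Assumption: a low-density keyword still counts as covered.
    """
    total = len(statuses)
    covered = sum(1 for status in statuses.values() if status != "missing")
    pct = covered / total if total else 0.0
    return {
        "total_keywords": total,
        "covered": covered,
        "missing": total - covered,
        "coverage_pct": round(pct * 100, 1),
        "meets_target": pct >= COVERAGE_TARGETS[category],
    }
```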
### 2.3 Heading Restructure

Analyze the H1/H2/H3 hierarchy against SEO best practices:

| Check | Rule | Fix |
|-------|------|-----|
| Single H1 | Content must have exactly one H1 | Merge or demote extra H1s |
| H2 keyword coverage | H2s should contain target keywords from the cluster | AI rewrites H2s to incorporate keywords |
| Logical hierarchy | No skipped levels (e.g. H1 → H3 without an H2) | Insert missing levels |
| H2 count | At least 3 H2s for content over 1,000 words | AI suggests additional H2 sections |
| Missing keyword themes | Cluster keywords not represented in any heading | AI suggests new H2/H3 sections for missing themes |

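The structural checks in the table (single H1, no skipped levels, H2 count) are mechanical and can run before any AI call. A minimal sketch using only the standard library's `html.parser`; `HeadingCollector` and `heading_issues` are hypothetical names, the keyword-related checks are left to the AI step, and heading text is collected so it could feed those checks.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) for every h1-h6 element in document order."""

    def __init__(self):
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; match exactly h1..h6.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buf).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)  # includes text inside <em>, <a>, etc.


def heading_issues(content_html: str, word_count: int) -> list[str]:
    parser = HeadingCollector()
    parser.feed(content_html)
    levels = [level for level, _ in parser.headings]
    issues = []
    h1_count = levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")
    # A skipped level is any step down of more than one (H1 -> H3).
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: H{prev} -> H{cur}")
    if word_count > 1000 and levels.count(2) < 3:
        issues.append("fewer than 3 H2s for content over 1000 words")
    return issues
```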
### 2.4 Content Rewrite (Intent-Aligned)

**Intent Classification:**

- **Informational**: expand explanations, add examples, increase depth, add definitions
- **Commercial**: add comparison tables, pros/cons, feature highlights, trust signals
- **Transactional**: strengthen CTAs, add urgency, streamline the conversion path, add social proof

**Content Adjustments:**

- Expand thin content (<500 words) to the minimum viable length for its content structure
- Compress bloated content by detecting and removing redundancy
- Add missing sections identified by the keyword coverage analysis
- Preserve the existing tone and style while improving SEO alignment

### 2.5 Schema Gap Detection

Leverages `SchemaValidationService` from 02G:

1. Check the existing `Content.schema_markup` against the schemas expected for the content type.
2. Expected schema by type: Article (post), Product (product), Service (service_page), FAQPage (if an FAQ is detected), BreadcrumbList (all), HowTo (if steps are detected).
3. Identify missing required fields per schema type.
4. Generate corrected, complete JSON-LD.
5. A schema-only optimization mode is available (no content rewrite, schema fix only).

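Step 2's mapping can be expressed as data, which turns the type-level gap check into a set difference. A minimal sketch, assuming `schema_markup` holds JSON-LD as a dict (optionally with `@graph`) or a list of dicts, and that the FAQ/steps flags come from content analysis. Field-level validation (step 3) stays with `SchemaValidationService` and is not reproduced here; all names are hypothetical.

```python
# Expected schema types per content_type, from step 2 above.
BASE_SCHEMAS = {
    "post": {"Article"},
    "product": {"Product"},
    "service_page": {"Service"},
}


def expected_schemas(content_type: str, has_faq: bool = False,
                     has_steps: bool = False) -> set[str]:
    expected = set(BASE_SCHEMAS.get(content_type, set()))
    expected.add("BreadcrumbList")  # expected on all content
    if has_faq:
        expected.add("FAQPage")
    if has_steps:
        expected.add("HowTo")
    return expected


def schema_types_present(schema_markup) -> set[str]:
    """Collect @type values from a JSON-LD dict, a list of nodes, or an @graph."""
    nodes = schema_markup if isinstance(schema_markup, list) else [schema_markup]
    found: set[str] = set()
    for node in nodes:
        if not isinstance(node, dict):
            continue  # tolerates None or malformed entries
        if "@graph" in node:
            found |= schema_types_present(node["@graph"])
        types = node.get("@type", [])
        found.update([types] if isinstance(types, str) else types)
    return found


def schema_gaps(schema_markup, content_type: str, has_faq: bool = False,
                has_steps: bool = False) -> list[str]:
    """Schema types expected for this content but absent from its markup."""
    expected = expected_schemas(content_type, has_faq, has_steps)
    return sorted(expected - schema_types_present(schema_markup))
```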
### 2.6 Before/After Scoring

**Content Quality Score (0-100):**

| Factor | Weight | Score Criteria |
|--------|--------|---------------|
| Keyword Coverage | 30% | % of cluster keywords present vs target |
| Heading Structure | 20% | Single H1, keyword H2s, logical hierarchy, no skipped levels |
| Content Depth | 20% | Word count vs structure minimum, section completeness, detail level |
| Readability | 15% | Sentence length, paragraph length, Flesch-Kincaid approximation |
| Schema Completeness | 15% | Required schema fields present, validation passes |

Each factor is scored on a 0-100 scale, and the total is the weighted sum. Every optimization records `score_before` and `score_after`. Dashboard aggregates show the average improvement across all optimizations.

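Because the weights sum to 1.0, a weighted sum of 0-100 factor scores stays on the 0-100 scale. A minimal sketch with the weights from the table (matching `ContentScoringService.WEIGHTS` in section 3.5); `composite_score` is a hypothetical helper, and clamping each factor to 0-100 is an added safeguard, not part of the spec.

```python
WEIGHTS = {
    "keyword_coverage": 0.30,
    "heading_structure": 0.20,
    "content_depth": 0.20,
    "readability": 0.15,
    "schema_completeness": 0.15,
}


def composite_score(breakdown: dict[str, float]) -> float:
    """Weighted sum of factor scores (each 0-100); the result is also 0-100."""
    missing = WEIGHTS.keys() - breakdown.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    total = sum(weight * min(max(breakdown[factor], 0.0), 100.0)
                for factor, weight in WEIGHTS.items())
    return round(total, 1)


before = composite_score({
    "keyword_coverage": 60, "heading_structure": 80, "content_depth": 50,
    "readability": 70, "schema_completeness": 40,
})
# 0.30*60 + 0.20*80 + 0.20*50 + 0.15*70 + 0.15*40 = 18 + 16 + 10 + 10.5 + 6
# → 60.5
```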
### 2.7 Batch Optimization

- Select content by cluster ID, score threshold (e.g., all content scoring < 50), content type, date range, or explicit content IDs
- Queue as Celery tasks with priority ordering (lowest scores first)
- Concurrency: at most 3 concurrent optimization tasks per account
- Progress tracking via the `OptimizationTask` status field
- Cancel capability: set status to `rejected` to stop processing

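The ordering and concurrency rules can be kept separate from Celery itself. A minimal sketch, assuming candidates arrive as `(content_id, score)` pairs and that the caller knows how many optimizations are already running for the account; `plan_batch` is hypothetical, and how the cap is enforced across workers (locks, counters, queue routing) is not specified in this document.

```python
MAX_CONCURRENT_PER_ACCOUNT = 3


def plan_batch(candidates, score_threshold=None, running_for_account=0):
    """Split batch candidates into (dispatch_now, waiting), lowest score first.

    candidates: iterable of (content_id, score) pairs.
    score_threshold: if given, only content scoring below it is selected.
    running_for_account: optimizations already in flight for this account.
    """
    selected = [
        (content_id, score)
        for content_id, score in candidates
        if score_threshold is None or score < score_threshold
    ]
    # Lowest score first; content_id breaks ties so the order is deterministic.
    selected.sort(key=lambda item: (item[1], item[0]))
    ordered = [content_id for content_id, _ in selected]
    free_slots = max(0, MAX_CONCURRENT_PER_ACCOUNT - running_for_account)
    return ordered[:free_slots], ordered[free_slots:]
```

Items in the waiting list would be dispatched as running tasks finish, preserving the lowest-score-first order.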
---

## 3. DATA MODELS & APIS

### 3.1 Modified Model: OptimizationTask (optimization app)

Extend the existing `OptimizationTask` model with 18 new fields:

```python
from django.db import models

# Add to the existing OptimizationTask model (class body):

# NOTE: if igny8_optimization_tasks already contains rows, this non-nullable FK
# needs null=True, a default, or a data migration before it can be added.
content = models.ForeignKey(
    'writer.Content',
    on_delete=models.CASCADE,
    related_name='optimization_tasks',
)
primary_cluster = models.ForeignKey(
    'planner.Clusters',
    on_delete=models.SET_NULL,
    null=True,
    blank=True,
    related_name='optimization_tasks',
)
secondary_clusters = models.JSONField(
    default=list,
    blank=True,
    help_text='List of Clusters IDs for secondary relevance',
)
keyword_targets = models.JSONField(
    default=list,
    blank=True,
    help_text='[{keyword, target_density, current_density, status}]',
)
optimization_type = models.CharField(
    max_length=20,
    choices=[
        ('full_rewrite', 'Full Rewrite'),
        ('heading_only', 'Heading Only'),
        ('schema_only', 'Schema Only'),
        ('keyword_coverage', 'Keyword Coverage'),
        ('batch', 'Batch'),
    ],
    default='full_rewrite',
)
intent_classification = models.CharField(
    max_length=15,
    choices=[
        ('informational', 'Informational'),
        ('commercial', 'Commercial'),
        ('transactional', 'Transactional'),
    ],
    blank=True,
    default='',
)
score_before = models.FloatField(null=True, blank=True)
score_after = models.FloatField(null=True, blank=True)
content_before = models.TextField(
    blank=True,
    default='',
    help_text='Snapshot of original content_html',
)
content_after = models.TextField(
    blank=True,
    default='',
    help_text='Optimized HTML (empty until optimization completes)',
)
metadata_before = models.JSONField(
    default=dict,
    blank=True,
    help_text='{meta_title, meta_description, headings[]}',
)
metadata_after = models.JSONField(default=dict, blank=True)
schema_before = models.JSONField(default=dict, blank=True)
schema_after = models.JSONField(default=dict, blank=True)
structure_changes = models.JSONField(
    default=list,
    blank=True,
    help_text='[{change_type, description, before, after}]',
)
confidence_score = models.FloatField(
    null=True,
    blank=True,
    help_text='AI confidence in the quality of changes (0-1)',
)
applied = models.BooleanField(default=False)
applied_at = models.DateTimeField(null=True, blank=True)
```

**Update STATUS choices on OptimizationTask:**

```python
STATUS_CHOICES = [
    ('pending', 'Pending'),
    ('analyzing', 'Analyzing'),
    ('optimizing', 'Optimizing'),
    ('review', 'Ready for Review'),
    ('applied', 'Applied'),
    ('rejected', 'Rejected'),
]
```

**PK:** BigAutoField (integer), existing model
**Table:** existing `igny8_optimization_tasks` table (no rename needed)

### 3.2 Migration

A single migration, in the optimization app (or in the igny8_core migrations):

```
igny8_core/migrations/XXXX_extend_optimization_task.py
```

**Operations:**

1. `AddField('OptimizationTask', 'content', ...)`: FK to Content
2. `AddField('OptimizationTask', 'primary_cluster', ...)`: FK to Clusters
3. `AddField('OptimizationTask', 'secondary_clusters', ...)`: JSONField
4. `AddField('OptimizationTask', 'keyword_targets', ...)`: JSONField
5. `AddField('OptimizationTask', 'optimization_type', ...)`: CharField
6. `AddField('OptimizationTask', 'intent_classification', ...)`: CharField
7. `AddField('OptimizationTask', 'score_before', ...)`: FloatField
8. `AddField('OptimizationTask', 'score_after', ...)`: FloatField
9. `AddField('OptimizationTask', 'content_before', ...)`: TextField
10. `AddField('OptimizationTask', 'content_after', ...)`: TextField
11. `AddField('OptimizationTask', 'metadata_before', ...)`: JSONField
12. `AddField('OptimizationTask', 'metadata_after', ...)`: JSONField
13. `AddField('OptimizationTask', 'schema_before', ...)`: JSONField
14. `AddField('OptimizationTask', 'schema_after', ...)`: JSONField
15. `AddField('OptimizationTask', 'structure_changes', ...)`: JSONField
16. `AddField('OptimizationTask', 'confidence_score', ...)`: FloatField
17. `AddField('OptimizationTask', 'applied', ...)`: BooleanField
18. `AddField('OptimizationTask', 'applied_at', ...)`: DateTimeField
19. `AlterField('OptimizationTask', 'status', ...)`: updated STATUS_CHOICES (Django emits this when field choices change)

### 3.3 API Endpoints

All endpoints live under `/api/v1/optimizer/`:

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/optimizer/analyze/` | Analyze a single content piece. Body: `{content_id}`. Returns scores, keyword coverage, heading analysis, and recommendations. Does NOT rewrite. |
| POST | `/api/v1/optimizer/optimize/` | Run a full optimization. Body: `{content_id, optimization_type}`. Creates an OptimizationTask, runs analysis + rewrite, and returns a preview. |
| POST | `/api/v1/optimizer/preview/` | Preview changes without creating a task. Body: `{content_id}`. Returns diff-style output. |
| POST | `/api/v1/optimizer/apply/{id}/` | Apply the optimized version. Copies `content_after` → `Content.content_html`, updates metadata, and sets `applied=True`, `applied_at`, and status=`applied`. |
| POST | `/api/v1/optimizer/reject/{id}/` | Reject the optimization. Sets status=`rejected` and keeps the original content. |
| POST | `/api/v1/optimizer/batch/` | Queue a batch optimization. Body: `{site_id, cluster_id?, score_threshold?, content_type?, content_ids?}`. Returns a batch task ID. |
| GET | `/api/v1/optimizer/tasks/?site_id=X` | List OptimizationTask records with filters (status, optimization_type, cluster_id, date range). |
| GET | `/api/v1/optimizer/tasks/{id}/` | Single optimization detail with full before/after data. |
| GET | `/api/v1/optimizer/tasks/{id}/diff/` | HTML diff view: visual comparison of `content_before` vs `content_after`. |
| GET | `/api/v1/optimizer/cluster-suggestions/?content_id=X` | Suggest the best-fit cluster for unassigned content. Returns the top 3 candidates with confidence scores. |
| POST | `/api/v1/optimizer/assign-cluster/` | Assign a cluster to content. Body: `{content_id, cluster_id}`. Updates the Content record. |
| GET | `/api/v1/optimizer/dashboard/?site_id=X` | Optimization stats: average score improvement, count by status, top improved, lowest-scoring content. |

**Permissions:** All endpoints follow the `SiteSectorModelViewSet` permission pattern.

### 3.4 AI Function: Enhanced optimize_content

Extend the existing registered `optimize_content` AI function:

**Registry key:** `optimize_content` (already registered; enhance, do not replace)
**Location:** `igny8_core/ai/functions/optimize_content.py` (existing file)

```python
class OptimizeContentFunction(BaseAIFunction):
    """
    Enhanced cluster-aligned content optimization.
    Extends the existing optimize_content with keyword coverage,
    heading restructure, intent classification, and scoring.
    """
    function_name = 'optimize_content'

    def validate(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Verify content exists and has content_html
        # Verify optimization_type is valid
        pass

    def prepare(self, content_id, optimization_type='full_rewrite', **kwargs):
        # Load Content record
        # Determine cluster (from Content or auto-match)
        # Load cluster Keywords
        # Analyze current keyword coverage
        # Parse heading structure
        # Classify intent
        # Calculate score_before
        # Snapshot content_before, metadata_before, schema_before
        pass

    def build_prompt(self):
        # Build type-specific optimization prompt:
        #   - Include current content_html
        #   - Include cluster keywords with coverage status
        #   - Include heading analysis results
        #   - Include intent classification
        #   - Include optimization_type instructions:
        #       full_rewrite: all optimizations
        #       heading_only: heading restructure only
        #       schema_only: schema fix only (no content change)
        #       keyword_coverage: add missing keyword sections only
        pass

    def parse_response(self, response):
        # Parse optimized HTML
        # Parse updated metadata (meta_title, meta_description)
        # Parse structure_changes list
        # Parse confidence_score
        pass

    def save_output(self, parsed):
        # Create OptimizationTask with all before/after data
        # Calculate score_after
        # Set status='review'
        pass
```

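One way `build_prompt()` could assemble its type- and intent-specific instructions is from lookup tables, so each `optimization_type` maps to a fixed scope of change. The sketch is illustrative only: the wording paraphrases sections 2.3-2.4 and the comments above, `optimization_instructions` is a hypothetical helper, and `batch` is left out on the assumption that each batch item runs with one of the four per-item types.

```python
# Scope of change allowed per optimization_type (paraphrasing build_prompt's notes).
TYPE_INSTRUCTIONS = {
    "full_rewrite": "Apply all optimizations: keywords, headings, intent-aligned body, metadata.",
    "heading_only": "Restructure headings only; leave body paragraphs unchanged.",
    "schema_only": "Fix the JSON-LD schema only; return the content HTML unchanged.",
    "keyword_coverage": "Add sections for missing cluster keywords only; keep existing text.",
}

# Intent-specific guidance from section 2.4.
INTENT_INSTRUCTIONS = {
    "informational": "Expand explanations, add examples and definitions, increase depth.",
    "commercial": "Add comparison tables, pros/cons, feature highlights, trust signals.",
    "transactional": "Strengthen CTAs, add urgency, streamline conversion, add social proof.",
}

THIN_CONTENT_WORDS = 500  # section 2.4: under 500 words counts as thin


def optimization_instructions(optimization_type: str, intent: str,
                              word_count: int, missing_keywords: list[str]) -> str:
    if optimization_type not in TYPE_INSTRUCTIONS:
        raise ValueError(f"unsupported optimization_type: {optimization_type}")
    lines = [TYPE_INSTRUCTIONS[optimization_type]]
    # Intent and length adjustments only apply when the body may be rewritten.
    if optimization_type == "full_rewrite":
        if intent in INTENT_INSTRUCTIONS:
            lines.append(INTENT_INSTRUCTIONS[intent])
        if word_count < THIN_CONTENT_WORDS:
            lines.append(f"Content is thin ({word_count} words); expand it.")
    if optimization_type in ("full_rewrite", "keyword_coverage") and missing_keywords:
        lines.append("Cover these missing keywords: " + ", ".join(missing_keywords))
    return "\n".join(lines)
```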
### 3.5 Content Scoring Service

**Location:** `igny8_core/business/content_scoring.py`

```python
class ContentScoringService:
    """
    Calculates the Content Quality Score (0-100) from 5 weighted factors.
    Used by the optimizer for before/after scores and by the dashboard for overviews.
    """

    WEIGHTS = {
        'keyword_coverage': 0.30,
        'heading_structure': 0.20,
        'content_depth': 0.20,
        'readability': 0.15,
        'schema_completeness': 0.15,
    }

    def score(self, content_id, cluster_id=None):
        """
        Calculate the composite score for a content record.
        Returns: {total: float, breakdown: {factor: score}}
        """
        pass

    def _score_keyword_coverage(self, content, cluster):
        """0-100: % of cluster keywords found in content."""
        pass

    def _score_heading_structure(self, content_html):
        """0-100: single H1, keyword H2s, no skipped levels, H2 count."""
        pass

    def _score_content_depth(self, content_html, content_structure):
        """0-100: word count vs minimum for structure type, section completeness."""
        pass

    def _score_readability(self, content_html):
        """0-100: avg sentence length, paragraph length, Flesch-Kincaid approx."""
        pass

    def _score_schema_completeness(self, content):
        """0-100: required schema fields present, from SchemaValidationService (02G)."""
        pass
```

### 3.6 Keyword Coverage Analyzer

**Location:** `igny8_core/business/keyword_coverage.py`

```python
class KeywordCoverageAnalyzer:
    """
    Analyzes content against a cluster's keyword set.
    Returns per-keyword presence and the overall coverage percentage.
    """

    def analyze(self, content_id, cluster_id):
        """
        Returns {
            total_keywords: int,
            covered: int,
            missing: int,
            coverage_pct: float,
            keywords: [{keyword, target_density, current_density, status}]
        }
        """
        pass

    def _extract_text(self, content_html):
        """Strip HTML and return plain text for analysis."""
        pass

    def _check_keyword(self, keyword, text):
        """Check for exact, partial (stemmed), and semantic presence."""
        pass
```

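The text extraction and the exact/partial checks can be sketched with the standard library alone. Assumptions: density is measured as keyword tokens per 100 words of body text; "partial (stemmed)" uses a deliberately crude suffix stripper rather than a real stemmer such as Porter's; and semantic presence is omitted because it depends on the AI layer. The functions mirror `_extract_text` and `_check_keyword` but are standalone, hypothetical versions.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Accumulates text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(content_html: str) -> str:
    parser = _TextExtractor()
    parser.feed(content_html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())  # collapse whitespace


_SUFFIXES = ("ing", "ed", "s")


def _stem(token: str) -> str:
    # Crude suffix stripping for illustration, NOT a real stemmer.
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def check_keyword(keyword: str, text: str) -> dict:
    words, kw = _tokens(text), _tokens(keyword)
    if not words or not kw:
        return {"keyword": keyword, "match": "missing", "current_density": 0.0}
    n = len(kw)
    windows = [words[i:i + n] for i in range(len(words) - n + 1)]
    exact = sum(1 for w in windows if w == kw)
    stemmed_kw = [_stem(t) for t in kw]
    partial = sum(1 for w in windows if [_stem(t) for t in w] == stemmed_kw)
    hits = exact or partial
    match = "exact" if exact else "partial" if partial else "missing"
    density = round(100 * hits * n / len(words), 2)  # keyword tokens per 100 words
    return {"keyword": keyword, "match": match, "current_density": density}
```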
### 3.7 Celery Tasks

**Location:** `igny8_core/tasks/optimization_tasks.py`

```python
from celery import shared_task


@shared_task(name='run_optimization')
def run_optimization(optimization_task_id):
    """Process a single OptimizationTask. Called by API endpoints."""
    pass


@shared_task(name='run_batch_optimization')
def run_batch_optimization(site_id, cluster_id=None, score_threshold=None,
                           content_type=None, content_ids=None, batch_size=10):
    """
    Process a batch of content for optimization.
    Selects content matching the filters, creates one OptimizationTask per item,
    and processes items lowest score first, with at most 3 running concurrently
    per account.
    """
    pass


@shared_task(name='identify_optimization_candidates')
def identify_optimization_candidates(site_id=None, threshold=50):
    """
    Weekly scan: find content whose quality score is below the threshold.
    With site_id=None (the beat-scheduled run), scans all sites.
    Creates a report; does NOT auto-optimize.
    """
    pass
```

**Beat Schedule Addition:**

| Task | Schedule | Notes |
|------|----------|-------|
| `identify_optimization_candidates` | Weekly (Monday 4:00 AM) | Scans all sites, identifies low-scoring content |

---

## 4. IMPLEMENTATION STEPS

### Step 1: Migration

1. Add 18 new fields to the `OptimizationTask` model
2. Update STATUS_CHOICES on OptimizationTask
3. Run the migration

### Step 2: Services

1. Implement `ContentScoringService` in `igny8_core/business/content_scoring.py`
2. Implement `KeywordCoverageAnalyzer` in `igny8_core/business/keyword_coverage.py`

### Step 3: AI Function Enhancement

1. Extend `OptimizeContentFunction` in `igny8_core/ai/functions/optimize_content.py`
2. Add cluster alignment, keyword coverage, heading analysis, intent classification, and scoring
3. Maintain backward compatibility: existing `optimize_content` calls must keep working

### Step 4: API Endpoints

1. Add optimizer endpoints to `igny8_core/urls/optimizer.py` (create the file if it does not exist)
2. Create views: `AnalyzeView`, `OptimizeView`, `PreviewView`, `ApplyView`, `RejectView`, `BatchView`
3. Create `ClusterSuggestionsView`, `AssignClusterView`, `DashboardView`, `DiffView`
4. Register URL patterns under `/api/v1/optimizer/`

### Step 5: Celery Tasks

1. Implement `run_optimization`, `run_batch_optimization`, `identify_optimization_candidates`
2. Add `identify_optimization_candidates` to the Celery beat schedule

### Step 6: Serializers & Admin

1. Update the DRF serializer for the extended OptimizationTask (include all 18 new fields)
2. Create nested serializers for before/after views
3. Update the Django admin registration

### Step 7: Credit Cost Configuration

Add to `CreditCostConfig` (billing app):

| operation_type | default_cost | description |
|---------------|-------------|-------------|
| `optimization_analysis` | 2 | Analyze a single content item (scoring + keyword coverage) |
| `optimization_full_rewrite` | 5-8 | Full rewrite optimization (varies by content length) |
| `optimization_schema_only` | 1 | Schema gap fix only |
| `optimization_batch` | 15-25 | Batch optimization of 10 items |

Credit deduction follows the existing `CreditUsageLog` pattern.

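The table gives `optimization_full_rewrite` a 5-8 credit range that varies with content length but does not define the length bands. Purely to illustrate how a length-based cost could be derived, the sketch below assumes one extra credit per additional 1,000 words beyond the first 1,000, capped at 8; the band sizes and `full_rewrite_cost` are hypothetical and should follow whatever `CreditCostConfig` actually stores.

```python
import math

FULL_REWRITE_MIN_CREDITS = 5
FULL_REWRITE_MAX_CREDITS = 8
BASE_WORDS = 1000              # hypothetical: words covered by the minimum cost
WORDS_PER_EXTRA_CREDIT = 1000  # hypothetical band size


def full_rewrite_cost(word_count: int) -> int:
    """Credits for a full rewrite, scaled by length and clamped to the 5-8 range."""
    extra_words = max(0, word_count - BASE_WORDS)
    extra_credits = math.ceil(extra_words / WORDS_PER_EXTRA_CREDIT)
    return min(FULL_REWRITE_MIN_CREDITS + extra_credits, FULL_REWRITE_MAX_CREDITS)
```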
---

## 5. ACCEPTANCE CRITERIA

### Cluster Matching

- [ ] Content without a cluster assignment gets auto-matched with confidence scoring
- [ ] Confidence ≥ 0.6 auto-assigns; < 0.6 flags for manual review with the top 3 suggestions
- [ ] Cluster suggestions endpoint returns ranked candidates

### Keyword Coverage

- [ ] All cluster keywords analyzed for presence in content
- [ ] Coverage report includes exact-match, partial-match, and missing keywords
- [ ] Hub content targets 70%+, supporting articles 40%+, product/service pages 30%+

### Heading Restructure

- [ ] H1/H2/H3 hierarchy validated (single H1, no skipped levels)
- [ ] Missing keyword themes identified and new headings suggested
- [ ] AI rewrites headings to incorporate target keywords while preserving meaning

### Content Rewrite

- [ ] Intent classified correctly (informational/commercial/transactional)
- [ ] Rewrite adjusts content structure based on intent
- [ ] Thin content expanded, bloated content compressed
- [ ] Missing keyword sections added

### Scoring

- [ ] 0-100 score calculated from 5 weighted factors
- [ ] score_before recorded before any changes
- [ ] score_after recorded after optimization
- [ ] Dashboard shows average improvement and distribution

### Before/After

- [ ] Full snapshot of the original content preserved in content_before
- [ ] Optimized version stored in content_after without auto-applying
- [ ] Diff view provides a visual HTML comparison
- [ ] Apply action copies content_after → Content.content_html
- [ ] Reject action preserves the original and marks the task rejected

### Batch

- [ ] Batch optimization selects content by cluster, score threshold, type, or explicit IDs
- [ ] Max 3 concurrent optimizations per account enforced
- [ ] Progress trackable via OptimizationTask status
- [ ] Weekly candidate identification runs without auto-optimizing

### Integration

- [ ] Schema gap detection leverages SchemaValidationService from 02G
- [ ] Credit costs deducted per CreditCostConfig entries
- [ ] All API endpoints respect account/site permission boundaries

---

## 6. CLAUDE CODE INSTRUCTIONS

### File Locations

```
igny8_core/
├── ai/
│   └── functions/
│       └── optimize_content.py      # Enhance existing function
├── business/
│   ├── content_scoring.py           # ContentScoringService
│   └── keyword_coverage.py          # KeywordCoverageAnalyzer
├── tasks/
│   └── optimization_tasks.py        # Celery tasks
├── urls/
│   └── optimizer.py                 # Optimizer endpoints
└── migrations/
    └── XXXX_extend_optimization_task.py
```

### Conventions

- **PKs:** BigAutoField (integer); do NOT use UUIDs
- **Table prefix:** `igny8_` (existing table `igny8_optimization_tasks`)
- **Celery app name:** `igny8_core`
- **URL pattern:** `/api/v1/optimizer/...`
- **Permissions:** Use the `SiteSectorModelViewSet` permission pattern
- **AI functions:** Extend the existing `BaseAIFunction` subclass; do NOT create a new registration key, enhance the existing `optimize_content`
- **Frontend:** `.tsx` files with Zustand stores for state management

### Cross-References

| Doc | Relationship |
|-----|-------------|
| **02B** | Taxonomy terms get cluster context for optimization; ClusterMappingService scoring pattern reused |
| **02G** | SchemaValidationService used for schema gap detection; schema_only optimization triggers 02G schema generation |
| **02C** | GSC position data identifies pages needing optimization (high impressions, low clicks) |
| **02D** | Optimizer identifies internal link opportunities and feeds them to the linker |
| **01E** | Blueprint-aware pipeline sets initial content quality; optimizer improves it post-generation |
| **01A** | SAGBlueprint/SAGCluster data provides cluster context for optimization |
| **01G** | SAG health monitoring can incorporate content quality scores as a health factor |

### Key Decisions

1. **Extend, don't replace:** the existing `OptimizationTask` model and `optimize_content` AI function are enhanced rather than replaced with new models
2. **Preview-first workflow:** optimizations always produce a preview (status=`review`) before being applied to Content
3. **Content snapshot:** a full HTML snapshot is stored in `content_before` for rollback
4. **Score reuse:** `ContentScoringService` is a standalone service usable by other modules (02G schema audit, 01G health monitoring)
5. **Schema delegation:** schema gap detection reuses 02G's `SchemaValidationService` rather than duplicating logic