# IGNY8 Phase 2: Rich Schema & SERP Enhancement (02G)
## JSON-LD Schema Generation & On-Page SERP Element Injection
**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** IGNY8 Phase 2 — Feature Expansion
**Status:** Build Ready
**Source of Truth:** Codebase at `/data/app/igny8/`
**Audience:** Claude Code, Backend Developers, Architects
---
## 1. CURRENT STATE
### Schema Markup Today
The `Content` model (app_label=`writer`, db_table=`igny8_content`) has a `schema_markup` JSONField that stores raw JSON-LD. The AI function `generate_content` occasionally includes basic Article schema, but the output is inconsistent and unvalidated.
### What Works Now
- `Content.schema_markup` — JSONField exists, sometimes populated during generation
- `generate_content` AI function — may produce rudimentary Article schema as part of content output
- `ContentTypeTemplate` model (added by 02A) defines section layouts and presets per content type
- 02A added `Content.structured_data` JSONField for type-specific data (product specs, service steps, etc.)
### What Does Not Exist
- No systematic schema generation by content type
- No on-page SERP element injection (TL;DR, TOC, Key Takeaways, etc.)
- No schema validation against Google Rich Results requirements
- No retroactive enhancement of already-published content
- No SchemaTemplate model, no SERPEnhancement model, no validation records
- No SERP element tracking per content
### Phase 1 & 2A Foundation Available
- `SAGCluster.cluster_type` choices: `product_category`, `condition_problem`, `feature`, `brand`, `informational`, `comparison`
- 01E blueprint-aware pipeline provides `blueprint_context` with `cluster_type`, `content_structure`, `content_type`
- 02A content type routing provides type-specific generation with section layouts
- `Content.content_type` choices: `post`, `page`, `product`, `taxonomy`
- `Content.content_structure` choices: 14 structure types including `cluster_hub`, `product_page`, `service_page`, `comparison`, `review`
---
## 2. WHAT TO BUILD
### Overview
Build a schema generation and SERP enhancement system that:
1. Generates correct JSON-LD structured data for 10 schema types, mapped to content type/structure
2. Injects 8 on-page SERP elements into `content_html` to improve rich snippet eligibility
3. Validates schema against Google Rich Results requirements
4. Retroactively enhances existing published content with missing schema and SERP elements
### 2.1 JSON-LD Schema Types (10 Types)
Each schema type maps to specific `content_type` + `content_structure` combinations:
| # | Schema Type | Applies To | Key Fields |
|---|------------|-----------|------------|
| 1 | **Article / BlogPosting** | `post` (all structures) | headline, datePublished, dateModified, author (Person/Organization), publisher, image, description, mainEntityOfPage, wordCount, articleSection |
| 2 | **Product** | `product` / `product_page` | name, description, image, brand, offers (price, priceCurrency, availability, url), aggregateRating, review, sku, gtin |
| 3 | **Service** | `page` / `service_page` | name, description, provider (Organization), serviceType, areaServed, hasOfferCatalog, offers |
| 4 | **LocalBusiness** | Sites with physical location (site-level config) | name, address, telephone, openingHours, geo, image, priceRange, sameAs, hasMap |
| 5 | **Organization** | Site-wide (homepage schema) | name, url, logo, sameAs[], contactPoint, foundingDate, founders |
| 6 | **BreadcrumbList** | All pages | itemListElement [{position, name, item(URL)}] — auto-generated from SAG hierarchy or WP breadcrumb trail |
| 7 | **FAQPage** | Content with FAQ sections (auto-detected from H2/H3 question patterns) | mainEntity [{@type: Question, name, acceptedAnswer: {text}}] |
| 8 | **HowTo** | Step-by-step content (detected from ordered lists with process indicators) | name, step [{@type: HowToStep, name, text, image, url}], totalTime, estimatedCost |
| 9 | **VideoObject** | Content with video embeds (02I integration) | name, description, thumbnailUrl, uploadDate, duration, contentUrl, embedUrl |
| 10 | **WebSite + SearchAction** | Site-wide (homepage) | name, url, potentialAction (SearchAction with query-input) |
**Auto-Detection Rules:**
- FAQPage: detected when content has H2/H3 headings matching question patterns (starts with "What", "How", "Why", "When", "Is", "Can", "Does", "Should") or explicit `<div class="faq-section">` blocks
- HowTo: detected when content has ordered lists (`<ol>`) combined with process language ("Step 1", "First", "Next", etc.)
- VideoObject: detected when `<iframe>` or `<video>` tags present, or when 02I VideoProject is linked to content
- BreadcrumbList: always generated — uses SAG hierarchy (Site → Sector → Cluster → Content) or WordPress breadcrumb trail from SiteIntegration sync
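The detection rules above can be sketched as a small HTML heuristic. This is only an illustration using the standard library: the function name `detect_schema_types`, the exact list of process words, and the regex-based parsing are assumptions, and the 02I VideoProject link check is left out because it needs database access.

```python
import re

# Question words taken from the FAQPage rule; compared case-insensitively.
QUESTION_WORDS = {"what", "how", "why", "when", "is", "can", "does", "should"}

HEADING_RE = re.compile(r"<h[23][^>]*>(.*?)</h[23]>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
OL_RE = re.compile(r"<ol[\s>]", re.IGNORECASE)
VIDEO_RE = re.compile(r"<(iframe|video)\b", re.IGNORECASE)
# Assumed process indicators; the spec only lists "Step 1", "First", "Next", etc.
PROCESS_RE = re.compile(r"\b(step\s+\d+|first|next|then|finally)\b", re.IGNORECASE)


def detect_schema_types(content_html):
    """Return the set of auto-detected schema type keys for one HTML body."""
    detected = {"breadcrumb"}  # BreadcrumbList is always generated

    # FAQPage: an H2/H3 whose first word is a question word, or an explicit FAQ block.
    headings = [TAG_RE.sub("", h).strip() for h in HEADING_RE.findall(content_html)]
    first_words = [h.split()[0].lower() for h in headings if h.split()]
    if any(w in QUESTION_WORDS for w in first_words) or 'class="faq-section"' in content_html:
        detected.add("faq")

    # HowTo: an ordered list combined with process language in the visible text.
    visible_text = TAG_RE.sub(" ", content_html)
    if OL_RE.search(content_html) and PROCESS_RE.search(visible_text):
        detected.add("howto")

    # VideoObject: any iframe or video tag.
    if VIDEO_RE.search(content_html):
        detected.add("video")

    return detected
```

Matching on the first whole word rather than a string prefix avoids false positives such as a heading that begins with "Issues" being treated as an "Is" question.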
**Schema Stacking:** A single content piece can carry multiple schemas. An article with an FAQ section and a video embed gets Article + FAQPage + VideoObject + BreadcrumbList, emitted together as one JSON array inside a single `<script type="application/ld+json">` tag.
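As a rough illustration of stacking, the sketch below serializes several JSON-LD objects into one script tag. The helper name `render_stacked_schema` is hypothetical, and where the tag is actually rendered (IGNY8 or the WordPress side) is not specified here.

```python
import json


def render_stacked_schema(schemas):
    """Serialize a list of JSON-LD objects into a single script tag."""
    payload = json.dumps(schemas, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


stacked = [
    {"@context": "https://schema.org", "@type": "Article", "headline": "Example"},
    {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []},
]
tag = render_stacked_schema(stacked)
```

The same list is what would be stored in `Content.schema_markup`; only the wrapping tag is presentation.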
### 2.2 On-Page SERP Elements (8 Types)
SERP elements are HTML blocks injected into `content_html` to improve featured snippet and rich result eligibility:
| # | Element | Description | Insertion Point | Detection / Source |
|---|---------|-------------|----------------|-------------------|
| 1 | **TL;DR Box** | 2-3 sentence summary in styled box | Top of article, after first paragraph | AI-generated from content |
| 2 | **Table of Contents** | Auto-generated from H2/H3 headings with anchor links | After intro paragraph, before first H2 | Parsed from content headings |
| 3 | **Key Takeaways** | Bullet list of main points in styled box | After TL;DR or after intro | AI-generated from content |
| 4 | **Definition Boxes** | Highlighted term definitions | Inline, after first use of defined term | AI detects key terms + generates definitions |
| 5 | **Comparison Tables** | Structured HTML tables for comparison content | Within body, at relevant H2 section | AI-generated for `comparison`, `review` structures |
| 6 | **People Also Ask** | Related questions with expandable answers | Before conclusion or after last H2 | AI-generated from content + cluster keywords |
| 7 | **Statistics Callouts** | Visual callout boxes for key numbers/stats | Inline, wrapping existing stats in text | Regex detection of numbers/percentages in text |
| 8 | **Pro/Con Boxes** | Structured pros and cons for review/comparison content | Within body, at relevant section | AI-generated for `review`, `comparison`, `product_page` structures |
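Of the eight elements, the Table of Contents is the one that needs no AI call, so it is a convenient one to sketch. The code below is an assumption-laden illustration: it only handles attribute-free `<h2>`/`<h3>` tags, and the names `build_toc`, `slugify`, and the `igny8-serp-toc` / `toc-h2` classes are hypothetical.

```python
import re
from html import escape, unescape

# Only matches headings without attributes; a real parser would be more tolerant.
HEADING_RE = re.compile(r"<(h[23])>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def slugify(text):
    """Lowercase the text and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "section"


def build_toc(content_html):
    """Return (html_with_heading_ids, toc_html) for the given body."""
    entries = []
    seen = {}

    def add_id(match):
        tag, inner = match.group(1).lower(), match.group(2)
        text = unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        slug = slugify(text)
        seen[slug] = seen.get(slug, 0) + 1
        if seen[slug] > 1:  # keep anchors unique when headings repeat
            slug = f"{slug}-{seen[slug]}"
        entries.append((tag, slug, text))
        return f'<{tag} id="{slug}">{inner}</{tag}>'

    html_with_ids = HEADING_RE.sub(add_id, content_html)
    items = "".join(
        f'<li class="toc-{tag}"><a href="#{slug}">{escape(text)}</a></li>'
        for tag, slug, text in entries
    )
    toc = f'<div data-serp-element="toc" class="igny8-serp-toc"><ol>{items}</ol></div>'
    return html_with_ids, toc
```

Returning the TOC separately from the modified body keeps generation and injection as distinct steps.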
**SERP Element Applicability by Content Structure:**
| Structure | TL;DR | TOC | Key Takeaways | Definitions | Comparison | PAA | Stats | Pro/Con |
|-----------|-------|-----|---------------|-------------|------------|-----|-------|---------|
| `article` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| `guide` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| `comparison` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| `review` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| `listicle` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| `landing_page` | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| `service_page` | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| `product_page` | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| `cluster_hub` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
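One way to enforce the matrix in code is a lookup keyed by `content_structure`. The sketch below encodes only four of the rows above for brevity; the names `APPLICABILITY`, `ELEMENT_ORDER`, and `applicable_elements` are hypothetical, and the element keys follow the `enhancement_type` values used elsewhere in this document.

```python
# Element keys in a fixed order so results are deterministic.
ELEMENT_ORDER = (
    "tldr", "toc", "key_takeaways", "definition",
    "comparison_table", "paa", "stats_callout", "pro_con",
)

# Partial encoding of the applicability matrix (four rows only).
APPLICABILITY = {
    "article": {"tldr", "toc", "key_takeaways", "definition", "paa", "stats_callout"},
    "comparison": {"tldr", "toc", "key_takeaways", "comparison_table",
                   "paa", "stats_callout", "pro_con"},
    "landing_page": {"key_takeaways", "paa", "stats_callout"},
    "product_page": {"key_takeaways", "paa", "stats_callout", "pro_con"},
}


def applicable_elements(content_structure, requested=None):
    """Return allowed element types, optionally narrowed to a requested list."""
    allowed = APPLICABILITY.get(content_structure, set())
    if requested is not None:
        allowed = allowed & set(requested)
    return [e for e in ELEMENT_ORDER if e in allowed]
```

Unknown structures return an empty list here, which is a conservative assumption: nothing is injected unless the matrix explicitly allows it.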
### 2.3 Retroactive Enhancement Engine
For existing published content that was generated before this module:
1. **Scan Phase:** Query all Content records where `schema_markup` is empty/incomplete OR `serp_elements` is null/empty
2. **Priority Ordering:** Highest-traffic pages first (using GSC data from 02C `GSCMetricsCache` if available, otherwise by `created_at` DESC)
3. **Generate Phase:** For each content, determine applicable schema types + SERP elements based on `content_type`, `content_structure`, and HTML analysis
4. **Preview Mode:** Store generated schema and SERP HTML in model records without modifying Content — user reviews before applying
5. **Apply Phase:** On approval, update `Content.schema_markup` and inject SERP element HTML into `Content.content_html`
6. **Batch Processing:** Process 10 content items per Celery task by default; the batch size is configurable
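The ordering and batching steps can be sketched without touching the ORM. Everything here is illustrative: `Candidate`, its `clicks` field (standing in for whatever metric `GSCMetricsCache` exposes), `prioritize`, and `batches` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Candidate:
    content_id: int
    created_at: datetime
    clicks: Optional[int] = None  # assumed traffic metric; None when no GSC data


def prioritize(candidates):
    """Highest traffic first when any GSC data exists, else newest first."""
    if any(c.clicks is not None for c in candidates):
        return sorted(candidates, key=lambda c: (c.clicks or 0, c.created_at), reverse=True)
    return sorted(candidates, key=lambda c: c.created_at, reverse=True)


def batches(items, batch_size=10):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each yielded batch would map to one Celery task invocation, so a site with 25 pending items produces three tasks at the default size.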
---
## 3. DATA MODELS & APIS
### 3.1 New Models
#### SchemaTemplate (writer app)
```python
class SchemaTemplate(AccountBaseModel):
    """
    Reusable JSON-LD schema templates with placeholder fields.
    Account-level: account admins can customize templates.
    """
    schema_type = models.CharField(
        max_length=30,
        choices=[
            ('article', 'Article / BlogPosting'),
            ('product', 'Product'),
            ('service', 'Service'),
            ('localbusiness', 'LocalBusiness'),
            ('organization', 'Organization'),
            ('breadcrumb', 'BreadcrumbList'),
            ('faq', 'FAQPage'),
            ('howto', 'HowTo'),
            ('video', 'VideoObject'),
            ('website', 'WebSite + SearchAction'),
        ]
    )
    content_type_match = models.CharField(
        max_length=20,
        choices=CONTENT_TYPE_CHOICES,
        help_text='Which content_type this template applies to'
    )
    content_structure_match = models.CharField(
        max_length=30,
        choices=CONTENT_STRUCTURE_CHOICES,
        blank=True,
        null=True,
        help_text='Further filter by content_structure (null = any)'
    )
    template_json = models.JSONField(
        help_text='JSON-LD template with {{placeholder}} fields'
    )
    required_fields = models.JSONField(
        default=list,
        help_text='List of required field paths for validation'
    )
    is_default = models.BooleanField(default=False)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_schema_templates'
        unique_together = [
            ('account', 'schema_type', 'content_type_match', 'content_structure_match')
        ]
```
**PK:** BigAutoField (integer) — inherits from AccountBaseModel
**Relationships:** account FK (from AccountBaseModel)
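To make the `{{placeholder}}` convention in `template_json` concrete, here is a minimal rendering sketch. The function name `render_template` and the rule that a value consisting solely of one placeholder keeps its native type are assumptions, not confirmed behavior.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(node, values):
    """Recursively substitute {{placeholder}} fields in a JSON-LD template."""
    if isinstance(node, dict):
        return {key: render_template(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [render_template(item, values) for item in node]
    if isinstance(node, str):
        whole = PLACEHOLDER_RE.fullmatch(node)
        if whole:
            # A string that is only a placeholder keeps the value's native type
            # (number, list, dict); missing values become None.
            return values.get(whole.group(1))
        # Placeholders embedded in longer text are interpolated as strings.
        return PLACEHOLDER_RE.sub(lambda m: str(values.get(m.group(1), "")), node)
    return node


template = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "{{title}}",
    "wordCount": "{{word_count}}",
    "author": {"@type": "Person", "name": "{{author_name}}"},
    "description": "About {{title}}",
}
rendered = render_template(
    template, {"title": "Hello", "word_count": 1200, "author_name": "Ada"}
)
```

Fields that render to `None` are natural candidates for the `required_fields` check during validation.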
#### SERPEnhancement (writer app)
```python
class SERPEnhancement(SiteSectorBaseModel):
    """
    Tracks individual SERP enhancement elements generated for content.
    One record per enhancement type per content.
    """
    ENHANCEMENT_TYPE_CHOICES = [
        ('tldr', 'TL;DR Box'),
        ('toc', 'Table of Contents'),
        ('key_takeaways', 'Key Takeaways'),
        ('definition', 'Definition Box'),
        ('comparison_table', 'Comparison Table'),
        ('paa', 'People Also Ask'),
        ('stats_callout', 'Statistics Callout'),
        ('pro_con', 'Pro/Con Box'),
    ]
    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='serp_enhancement_records'
    )
    enhancement_type = models.CharField(max_length=20, choices=ENHANCEMENT_TYPE_CHOICES)
    html_snippet = models.TextField(
        help_text='Generated HTML block to inject into content_html'
    )
    insertion_point = models.CharField(
        max_length=30,
        help_text='Where in content: top, after_intro, before_h2_N, bottom'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('generated', 'Generated'),
            ('inserted', 'Inserted'),
            ('removed', 'Removed'),
        ],
        default='generated'
    )
    generated_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_serp_enhancements'
        unique_together = [('content', 'enhancement_type')]
```
**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel
**Relationships:** content FK → Content, site FK + sector FK + account FK (from SiteSectorBaseModel)
#### SchemaValidationResult (writer app)
```python
class SchemaValidationResult(SiteSectorBaseModel):
    """
    Stores schema validation results per content per schema type.
    """
    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='schema_validations'
    )
    schema_type = models.CharField(max_length=30)
    is_valid = models.BooleanField(default=False)
    errors = models.JSONField(default=list, help_text='List of validation error strings')
    warnings = models.JSONField(default=list, help_text='List of validation warning strings')
    validated_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_schema_validation_results'
```
**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel
### 3.2 Modified Models
#### Content (writer app) — add field
```python
# Add to Content model:
serp_elements = models.JSONField(
    default=dict,
    blank=True,
    help_text='Tracks which SERP enhancements are active: {type: True/False}'
)
```
**Existing field used:** `Content.schema_markup` (JSONField) — now systematically populated by this module instead of ad-hoc AI output.
### 3.3 Migration
Single migration in writer app:
```
igny8_core/migrations/XXXX_add_schema_serp_models.py
```
**Operations:**
1. `CreateModel('SchemaTemplate', ...)` — with unique_together constraint
2. `CreateModel('SERPEnhancement', ...)` — with unique_together constraint
3. `CreateModel('SchemaValidationResult', ...)`
4. `AddField('Content', 'serp_elements', JSONField(default=dict, blank=True))`
### 3.4 API Endpoints
All endpoints under `/api/v1/writer/` — extends the existing writer app URL namespace.
#### Schema Generation
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/writer/schema/generate/` | Generate schema for single content. Body: `{content_id}`. Returns JSON-LD + updates `Content.schema_markup`. |
| POST | `/api/v1/writer/schema/validate/` | Validate existing schema against Google requirements. Body: `{content_id}`. Returns SchemaValidationResult. |
| POST | `/api/v1/writer/schema/batch-generate/` | Batch generate schema. Body: `{content_ids: [int], site_id}`. Queues Celery task. Returns task ID. |
| GET | `/api/v1/writer/schema/templates/` | List SchemaTemplate records. Query params: `account_id`, `schema_type`, `content_type_match`. |
| GET | `/api/v1/writer/schema/audit/?site_id=X` | Schema coverage audit — returns counts of content with/without schema per type. |
| POST | `/api/v1/writer/schema/retroactive/` | Trigger retroactive schema scan. Body: `{site_id, batch_size}`. Queues Celery task. |
#### SERP Enhancement
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/writer/serp/enhance/` | Generate SERP elements for single content. Body: `{content_id, element_types: []}`. Returns SERPEnhancement records. |
| POST | `/api/v1/writer/serp/batch-enhance/` | Batch enhancement. Body: `{content_ids: [int], site_id}`. Queues Celery task. |
| GET | `/api/v1/writer/serp/preview/{content_id}/` | Preview enhancements — returns modified HTML without applying. |
| POST | `/api/v1/writer/serp/apply/{content_id}/` | Apply enhancements — injects HTML into `Content.content_html` and updates `Content.serp_elements`. |
| POST | `/api/v1/writer/serp/remove/{content_id}/` | Remove specific SERP elements. Body: `{element_types: []}`. |
**Permissions:** All endpoints use `AccountModelViewSet` or `SiteSectorModelViewSet` permission patterns from existing codebase.
### 3.5 AI Functions
#### GenerateSchemaFunction (extends BaseAIFunction)
**Registry key:** `generate_schema`
**Location:** `igny8_core/ai/functions/generate_schema.py`
```python
class GenerateSchemaFunction(BaseAIFunction):
    """
    Generates JSON-LD structured data for content.
    Determines applicable schema types from content_type, content_structure,
    and HTML analysis. Produces schema-stacked output.
    """
    function_name = 'generate_schema'

    def validate(self, content_id, **kwargs):
        # Verify content exists and has content_html
        pass

    def prepare(self, content_id, **kwargs):
        # Load Content, determine applicable schema types
        # Load matching SchemaTemplate records
        # Extract structured_data from Content (from 02A)
        pass

    def build_prompt(self):
        # Include: content title, meta_description, content_html excerpt,
        # content_type, content_structure, structured_data,
        # schema template as example, required_fields list
        pass

    def parse_response(self, response):
        # Parse JSON-LD array from AI response
        # Validate against required_fields
        pass

    def save_output(self, parsed):
        # Save to Content.schema_markup
        # Create SchemaValidationResult records
        pass
```
**Input:** `content_id` (int)
**Output:** JSON-LD array saved to `Content.schema_markup`
#### GenerateSERPElementsFunction (extends BaseAIFunction)
**Registry key:** `generate_serp_elements`
**Location:** `igny8_core/ai/functions/generate_serp_elements.py`
```python
class GenerateSERPElementsFunction(BaseAIFunction):
    """
    Generates on-page SERP enhancement HTML for content.
    Uses content structure and applicability matrix to determine which elements
    to generate. Returns HTML snippets for each element.
    """
    function_name = 'generate_serp_elements'

    def validate(self, content_id, element_types=None, **kwargs):
        # Verify content exists
        # If element_types not specified, determine from applicability matrix
        pass

    def prepare(self, content_id, element_types=None, **kwargs):
        # Load Content, parse content_html for headings/stats/terms
        # Load cluster keywords for PAA generation
        pass

    def build_prompt(self):
        # Per element type, build specific sub-prompts:
        # - TL;DR: "Summarize in 2-3 sentences..."
        # - Key Takeaways: "Extract 3-5 main points..."
        # - PAA: "Generate 4-6 related questions..."
        # - Definitions: "Identify key terms and define..."
        # etc.
        pass

    def parse_response(self, response):
        # Parse per-element HTML snippets from AI response
        pass

    def save_output(self, parsed):
        # Create/update SERPEnhancement records per element
        pass
```
**Input:** `content_id` (int), optional `element_types` (list of strings)
**Output:** SERPEnhancement records created, not yet injected into content_html
### 3.6 Schema Validation Service
**Location:** `igny8_core/business/schema_validation.py`
```python
class SchemaValidationService:
    """
    Validates JSON-LD schema against Google Rich Results requirements.
    Not just schema.org compliance: also checks Google-specific required fields.
    """
    GOOGLE_REQUIRED_FIELDS = {
        'article': ['headline', 'datePublished', 'author', 'image', 'publisher'],
        'product': ['name', 'image', 'offers'],
        'service': ['name', 'description', 'provider'],
        'localbusiness': ['name', 'address'],
        'organization': ['name', 'url', 'logo'],
        'breadcrumb': ['itemListElement'],
        'faq': ['mainEntity'],
        'howto': ['name', 'step'],
        'video': ['name', 'description', 'thumbnailUrl', 'uploadDate'],
        'website': ['name', 'url', 'potentialAction'],
    }

    def validate(self, content_id):
        """
        Validate all schema_markup entries for a content record.
        Returns list of SchemaValidationResult records.
        """
        pass

    def _validate_single(self, schema_json, schema_type):
        """
        Validate a single schema entry against required fields.
        Returns (is_valid, errors[], warnings[]).
        """
        pass

    def auto_fix(self, content_id):
        """
        Attempt to fix common schema issues:
        - Missing dateModified → copy from updated_at
        - Missing image → use first image from Images model
        - Missing publisher → use site/account Organization schema
        """
        pass
```
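A possible shape for the required-field check behind `_validate_single` is sketched below as a standalone function. Support for dotted paths such as `offers.price` and the separate `recommended_fields` argument that feeds warnings are assumptions; the stub above only fixes the `(is_valid, errors, warnings)` return shape.

```python
def validate_single(schema_json, required_fields, recommended_fields=()):
    """Return (is_valid, errors, warnings) for one JSON-LD object."""

    def lookup(path):
        # Walk a dotted path through nested dicts; None when any key is missing.
        node = schema_json
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node

    def is_blank(value):
        return value is None or value == "" or value == [] or value == {}

    errors = [
        f"Missing required field: {path}"
        for path in required_fields
        if is_blank(lookup(path))
    ]
    warnings = [
        f"Missing recommended field: {path}"
        for path in recommended_fields
        if is_blank(lookup(path))
    ]
    return (not errors, errors, warnings)


article = {
    "@type": "Article",
    "headline": "Example",
    "datePublished": "2026-03-23",
    "author": {"@type": "Person", "name": "Ada"},
}
is_valid, errors, warnings = validate_single(
    article,
    ["headline", "datePublished", "author", "image", "publisher"],
    recommended_fields=["dateModified"],
)
```

The resulting tuple maps directly onto the `is_valid`, `errors`, and `warnings` fields of `SchemaValidationResult`.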
### 3.7 SERP Element Injection Service
**Location:** `igny8_core/business/serp_injection.py`
```python
class SERPInjectionService:
    """
    Injects SERP enhancement HTML snippets into content_html.
    Handles insertion point resolution and collision avoidance.
    """
    INSERTION_ORDER = [
        'tldr',              # After first paragraph
        'toc',               # After intro, before first H2
        'key_takeaways',     # After TL;DR or after intro
        'definition',        # Inline, after first use of term
        'comparison_table',  # Within body at relevant H2
        'stats_callout',     # Inline, wrapping existing stats
        'pro_con',           # Within body at relevant section
        'paa',               # Before conclusion or after last H2
    ]

    def inject(self, content_id):
        """
        Inject all 'generated' SERPEnhancement records into content_html.
        Updates Content.content_html and Content.serp_elements tracking field.
        Marks SERPEnhancement records as 'inserted'.
        """
        pass

    def remove(self, content_id, element_types):
        """
        Remove specified SERP elements from content_html.
        Each element is wrapped in <div data-serp-element="{type}"> for removal.
        """
        pass

    def preview(self, content_id):
        """
        Return modified content_html with enhancements WITHOUT saving.
        """
        pass
```
**SERP Element HTML Wrapping Convention:**
All injected elements are wrapped with a data attribute for identification:
```html
<div data-serp-element="tldr" class="igny8-serp-tldr">
  <!-- Generated TL;DR content -->
</div>
```
This allows reliable removal/replacement without corrupting surrounding content.
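To show why the wrapper makes removal safe, the sketch below strips a wrapped element by counting `<div>` depth from the wrapper's opening tag, so nested divs inside the element do not end the match early. It is a standard-library illustration with a hypothetical function name; a production version might prefer a real HTML parser.

```python
import re

# Matches any opening or closing div tag, used to track nesting depth.
DIV_TOKEN_RE = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)


def remove_serp_element(content_html, element_type):
    """Remove every wrapper div whose data-serp-element equals element_type."""
    open_re = re.compile(
        r'<div\b[^>]*\bdata-serp-element="%s"[^>]*>' % re.escape(element_type),
        re.IGNORECASE,
    )
    while True:
        start = open_re.search(content_html)
        if start is None:
            return content_html
        depth, end = 0, None
        # The first token seen is the wrapper itself, which sets depth to 1.
        for token in DIV_TOKEN_RE.finditer(content_html, start.start()):
            depth += -1 if token.group().startswith("</") else 1
            if depth == 0:
                end = token.end()
                break
        if end is None:
            # Unbalanced markup: stop rather than risk deleting real content.
            return content_html
        content_html = content_html[:start.start()] + content_html[end:]
```

Leaving the HTML untouched when the wrapper is unbalanced is a deliberate choice in this sketch, since over-deleting published content is worse than leaving a stale block.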
---
## 4. IMPLEMENTATION STEPS
### Step 1: Migration & Models
1. Create `SchemaTemplate` model in writer app
2. Create `SERPEnhancement` model in writer app
3. Create `SchemaValidationResult` model in writer app
4. Add `serp_elements` JSONField to Content model
5. Run migration
### Step 2: Schema Templates Seed Data
Create default SchemaTemplate records for each of the 10 schema types:
| schema_type | content_type_match | content_structure_match | is_default |
|------------|-------------------|------------------------|------------|
| `article` | `post` | `null` (any) | True |
| `product` | `product` | `null` | True |
| `product` | `post` | `product_page` | True |
| `service` | `page` | `service_page` | True |
| `localbusiness` | `page` | `null` | True |
| `organization` | `page` | `business_page` | True |
| `breadcrumb` | `post` | `null` | True |
| `breadcrumb` | `page` | `null` | True |
| `breadcrumb` | `product` | `null` | True |
| `faq` | `post` | `null` | True |
| `howto` | `post` | `null` | True |
| `video` | `post` | `null` | True |
| `website` | `page` | `null` | True |
Seed via data migration or management command `seed_schema_templates`.
### Step 3: AI Functions
1. Implement `GenerateSchemaFunction` in `igny8_core/ai/functions/generate_schema.py`
2. Implement `GenerateSERPElementsFunction` in `igny8_core/ai/functions/generate_serp_elements.py`
3. Register both in `igny8_core/ai/registry.py`
### Step 4: Services
1. Implement `SchemaValidationService` in `igny8_core/business/schema_validation.py`
2. Implement `SERPInjectionService` in `igny8_core/business/serp_injection.py`
### Step 5: Pipeline Integration
Integrate schema generation into the content pipeline after Stage 4 (content generation):
```python
# In content generation pipeline (01E blueprint-aware-pipeline):
# After GenerateContentFunction completes:
def post_content_generation(content_id):
    # Auto-generate schema based on content type
    generate_schema_fn = registry.get('generate_schema')
    generate_schema_fn.execute(content_id=content_id)

    # Auto-generate applicable SERP elements
    generate_serp_fn = registry.get('generate_serp_elements')
    generate_serp_fn.execute(content_id=content_id)

    # Inject SERP elements into content_html
    injection_service = SERPInjectionService()
    injection_service.inject(content_id)
```
### Step 6: API Endpoints
1. Add schema endpoints to `igny8_core/urls/writer.py`
2. Create `SchemaGenerateView`, `SchemaValidateView`, `SchemaBatchGenerateView`
3. Create `SERPEnhanceView`, `SERPBatchEnhanceView`, `SERPPreviewView`, `SERPApplyView`
4. Create `SchemaAuditView`, `SchemaRetroactiveView`
### Step 7: Celery Tasks
Register in `igny8_core/tasks/` and add beat schedule entries:
```python
# igny8_core/tasks/schema_tasks.py
from celery import shared_task


@shared_task(name='generate_schema_for_content')
def generate_schema_for_content(content_id):
    """After content generation, auto-generate schema."""
    pass


@shared_task(name='retroactive_schema_scan')
def retroactive_schema_scan(site_id, batch_size=10):
    """Scan existing content and generate missing schemas in batches."""
    pass


@shared_task(name='validate_schemas_batch')
def validate_schemas_batch(site_id):
    """Periodic validation of all schemas for a site."""
    pass
```
**Beat Schedule Additions:**
| Task | Schedule | Notes |
|------|----------|-------|
| `validate_schemas_batch` | Weekly (Sunday 3:00 AM) | Validates all schemas, creates SchemaValidationResult records |
### Step 8: Serializers & Admin
1. Create DRF serializers for SchemaTemplate, SERPEnhancement, SchemaValidationResult
2. Register models in Django admin for inspection
### Step 9: Credit Cost Configuration
Add to `CreditCostConfig` (billing app):
| operation_type | default_cost | description |
|---------------|-------------|-------------|
| `schema_generation` | 1 | Generate JSON-LD schema for one content |
| `serp_element_generation` | 0.5 | Generate one SERP element |
| `schema_validation` | 0.1 | Validate schema for one content |
| `schema_batch` | 8-12 | Batch generate for 10 items (varies by content) |
Credit deduction follows existing `CreditUsageLog` pattern: log entry created per operation with `operation_type`, `credits_used`, `content` FK.
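For a feel of how these defaults add up in the pipeline flow, the arithmetic can be sketched with `Decimal` to avoid float rounding on the fractional costs. The function name and the assumption that one pipeline run charges one schema generation, one charge per SERP element, and optionally one validation are illustrative only.

```python
from decimal import Decimal

# Default costs copied from the table above (schema_batch is omitted
# because its cost is given as a range).
DEFAULT_COSTS = {
    "schema_generation": Decimal("1"),
    "serp_element_generation": Decimal("0.5"),
    "schema_validation": Decimal("0.1"),
}


def pipeline_credit_cost(serp_element_count, include_validation=True):
    """Estimate credits for one content item passing through the pipeline."""
    total = DEFAULT_COSTS["schema_generation"]
    total += DEFAULT_COSTS["serp_element_generation"] * serp_element_count
    if include_validation:
        total += DEFAULT_COSTS["schema_validation"]
    return total
```

Under these assumptions, an `article` with six applicable SERP elements would cost 1 + 6 × 0.5 + 0.1 = 4.1 credits.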
---
## 5. ACCEPTANCE CRITERIA
### Schema Generation
- [ ] Article/BlogPosting schema generated for all `content_type='post'` content
- [ ] Product schema generated for `content_type='product'` and `content_structure='product_page'` content
- [ ] Service schema generated for `content_structure='service_page'` content
- [ ] BreadcrumbList schema generated for all content using SAG hierarchy
- [ ] FAQPage schema auto-detected and generated when content has question-pattern headings
- [ ] HowTo schema auto-detected and generated when content has step-by-step lists
- [ ] Schema stacking works — content with FAQ + Article gets both schemas in array
- [ ] All schemas pass SchemaValidationService checks for Google required fields
### SERP Enhancement
- [ ] TL;DR box generated and injected for applicable content structures
- [ ] Table of Contents auto-generated from H2/H3 headings with working anchor links
- [ ] Key Takeaways bullet list generated for applicable content
- [ ] People Also Ask section generated with 4-6 questions + answers
- [ ] Comparison Tables generated for comparison/review content
- [ ] Pro/Con boxes generated for review/product_page content
- [ ] All SERP elements wrapped in `<div data-serp-element="{type}">` for reliable removal
- [ ] SERP elements can be removed without corrupting content
- [ ] Applicability matrix enforced — no TL;DR on landing_page, etc.
### Retroactive Enhancement
- [ ] Retroactive scan identifies content missing schema by type
- [ ] Priority ordering by traffic (GSC data) or creation date
- [ ] Preview mode shows changes without modifying Content
- [ ] Batch processing handles 10 items per task run
- [ ] Applied enhancements update Content.schema_markup and Content.serp_elements
### Validation
- [ ] SchemaValidationResult records created for each validation run
- [ ] Validation checks Google-specific required fields (not just schema.org)
- [ ] Auto-fix resolves common issues (missing dateModified, image, publisher)
- [ ] Weekly batch validation catches schema drift
### Integration
- [ ] Schema generation triggers automatically after content generation in pipeline
- [ ] SERP elements generated and injected as part of pipeline flow
- [ ] Credit costs deducted per CreditCostConfig entries
- [ ] All API endpoints respect account/site permission boundaries
---
## 6. CLAUDE CODE INSTRUCTIONS
### File Locations
```
igny8_core/
├── ai/
│   └── functions/
│       ├── generate_schema.py              # GenerateSchemaFunction
│       └── generate_serp_elements.py       # GenerateSERPElementsFunction
├── business/
│   ├── schema_validation.py                # SchemaValidationService
│   └── serp_injection.py                   # SERPInjectionService
├── tasks/
│   └── schema_tasks.py                     # Celery tasks
├── urls/
│   └── writer.py                           # Add schema + serp endpoints
└── migrations/
    └── XXXX_add_schema_serp_models.py      # Models + Content.serp_elements
```
### Conventions
- **PKs:** BigAutoField (integer) — do NOT use UUIDs
- **Table prefix:** `igny8_` on all new tables
- **Celery app name:** `igny8_core`
- **URL pattern:** `/api/v1/writer/schema/...` and `/api/v1/writer/serp/...`
- **Permissions:** Use `AccountModelViewSet` / `SiteSectorModelViewSet` patterns
- **AI functions:** Extend `BaseAIFunction` with `validate()`, `prepare()`, `build_prompt()`, `parse_response()`, `save_output()`
- **Registry:** Register new AI functions in `igny8_core/ai/registry.py`
- **Frontend:** `.tsx` files with Zustand stores for state management
### Cross-References
| Doc | Relationship |
|-----|-------------|
| **02A** | Content type determines which schema type to generate; ContentTypeTemplate section layouts inform schema field population |
| **02F** | Optimizer detects schema gaps and triggers schema generation/fix |
| **02I** | VideoObject schema generated for content with linked VideoProject |
| **03A** | WP plugin standalone mode has its own schema module — different from this IGNY8-native implementation |
| **03B** | Connected mode pushes schema to WordPress via bulk endpoint |
| **01E** | Pipeline integration — schema generation hooks after Stage 4 content generation |
| **01G** | SAG health monitoring can incorporate schema completeness as a health factor |
### Key Decisions
1. **Writer app, not separate app** — SchemaTemplate, SERPEnhancement, SchemaValidationResult all live in the `writer` app since they are tightly coupled to Content
2. **Schema stacking** — multiple schemas per content stored as JSON array in `Content.schema_markup`
3. **SERP element wrapping** — all injected HTML uses `data-serp-element` attribute for non-destructive add/remove
4. **Preview before apply** — retroactive enhancements always go through preview state
5. **Content.serp_elements tracking field** — JSONField dict `{type: True/False}` for fast lookups without querying SERPEnhancement table