igny8/v2/V2-Execution-Docs/02G-rich-schema-serp.md

# IGNY8 Phase 2: Rich Schema & SERP Enhancement (02G)
## JSON-LD Schema Generation & On-Page SERP Element Injection

**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** IGNY8 Phase 2 — Feature Expansion
**Status:** Build Ready
**Source of Truth:** Codebase at `/data/app/igny8/`
**Audience:** Claude Code, Backend Developers, Architects

---

## 1. CURRENT STATE

### Schema Markup Today
The `Content` model (app_label=`writer`, db_table=`igny8_content`) has a `schema_markup` JSONField that stores raw JSON-LD. The AI function `generate_content` occasionally includes basic Article schema, but the output is inconsistent and unvalidated.

### What Works Now
- `Content.schema_markup` — JSONField exists, sometimes populated during generation
- `generate_content` AI function — may produce rudimentary Article schema as part of content output
- `ContentTypeTemplate` model (added by 02A) defines section layouts and presets per content type
- 02A added `Content.structured_data` JSONField for type-specific data (product specs, service steps, etc.)

### What Does Not Exist
- No systematic schema generation by content type
- No on-page SERP element injection (TL;DR, TOC, Key Takeaways, etc.)
- No schema validation against Google Rich Results requirements
- No retroactive enhancement of already-published content
- No SchemaTemplate model, no SERPEnhancement model, no validation records
- No SERP element tracking per content

### Phase 1 & 2A Foundation Available
- `SAGCluster.cluster_type` choices: `product_category`, `condition_problem`, `feature`, `brand`, `informational`, `comparison`
- 01E blueprint-aware pipeline provides `blueprint_context` with `cluster_type`, `content_structure`, `content_type`
- 02A content type routing provides type-specific generation with section layouts
- `Content.content_type` choices: `post`, `page`, `product`, `taxonomy`
- `Content.content_structure` choices: 14 structure types including `cluster_hub`, `product_page`, `service_page`, `comparison`, `review`

---

## 2. WHAT TO BUILD

### Overview
Build a schema generation and SERP enhancement system that:
1. Generates correct JSON-LD structured data for 10 schema types, mapped to content type/structure
2. Injects 8 on-page SERP elements into `content_html` to improve rich snippet eligibility
3. Validates schema against Google Rich Results requirements
4. Retroactively enhances existing published content with missing schema and SERP elements

### 2.1 JSON-LD Schema Types (10 Types)

Each schema type maps to specific `content_type` + `content_structure` combinations:

| # | Schema Type | Applies To | Key Fields |
|---|------------|-----------|------------|
| 1 | **Article / BlogPosting** | `post` (all structures) | headline, datePublished, dateModified, author (Person/Organization), publisher, image, description, mainEntityOfPage, wordCount, articleSection |
| 2 | **Product** | `product` / `product_page` | name, description, image, brand, offers (price, priceCurrency, availability, url), aggregateRating, review, sku, gtin |
| 3 | **Service** | `page` / `service_page` | name, description, provider (Organization), serviceType, areaServed, hasOfferCatalog, offers |
| 4 | **LocalBusiness** | Sites with physical location (site-level config) | name, address, telephone, openingHours, geo, image, priceRange, sameAs, hasMap |
| 5 | **Organization** | Site-wide (homepage schema) | name, url, logo, sameAs[], contactPoint, foundingDate, founders |
| 6 | **BreadcrumbList** | All pages | itemListElement [{position, name, item(URL)}] — auto-generated from SAG hierarchy or WP breadcrumb trail |
| 7 | **FAQPage** | Content with FAQ sections (auto-detected from H2/H3 question patterns) | mainEntity [{@type: Question, name, acceptedAnswer: {text}}] |
| 8 | **HowTo** | Step-by-step content (detected from ordered lists with process indicators) | name, step [{@type: HowToStep, name, text, image, url}], totalTime, estimatedCost |
| 9 | **VideoObject** | Content with video embeds (02I integration) | name, description, thumbnailUrl, uploadDate, duration, contentUrl, embedUrl |
| 10 | **WebSite + SearchAction** | Site-wide (homepage) | name, url, potentialAction (SearchAction with query-input) |

**Auto-Detection Rules:**
- FAQPage: detected when content has H2/H3 headings matching question patterns (starts with "What", "How", "Why", "When", "Is", "Can", "Does", "Should") or explicit `<div class="faq-section">` blocks
- HowTo: detected when content has ordered lists (`<ol>`) combined with process language ("Step 1", "First", "Next", etc.)
- VideoObject: detected when `<iframe>` or `<video>` tags present, or when 02I VideoProject is linked to content
- BreadcrumbList: always generated — uses SAG hierarchy (Site → Sector → Cluster → Content) or WordPress breadcrumb trail from SiteIntegration sync
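
The detection rules above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: `SignalParser`, `detect_schema_types`, and the `has_video_project` flag are hypothetical names, and the process-language pattern covers only the three phrases the HowTo rule names explicitly.

```python
import re
from html.parser import HTMLParser

# Question openers copied from the FAQPage rule above.
QUESTION_STARTERS = {"what", "how", "why", "when", "is", "can", "does", "should"}
# Process phrases from the HowTo rule; its trailing "etc." is not expanded here.
PROCESS_RE = re.compile(r"\b(?:step\s+\d+|first|next)\b", re.IGNORECASE)
FIRST_WORD_RE = re.compile(r"[A-Za-z]+")


class SignalParser(HTMLParser):
    """Collects H2/H3 heading text, text nodes, and tags relevant to detection."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self.text_parts = []
        self.has_ordered_list = False
        self.has_video_tag = False
        self.has_faq_block = False
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._in_heading = True
            self.headings.append("")
        elif tag == "ol":
            self.has_ordered_list = True
        elif tag in ("iframe", "video"):
            self.has_video_tag = True
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "faq-section" in classes:
                self.has_faq_block = True

    def handle_endtag(self, tag):
        if tag in ("h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._in_heading:
            self.headings[-1] += data


def is_question_heading(heading):
    """True when the heading's first word is one of the question openers."""
    match = FIRST_WORD_RE.match(heading.strip())
    return bool(match) and match.group(0).lower() in QUESTION_STARTERS


def detect_schema_types(content_html, has_video_project=False):
    """Return the auto-detected schema type keys for a piece of HTML."""
    parser = SignalParser()
    parser.feed(content_html)
    parser.close()
    detected = ["breadcrumb"]  # BreadcrumbList is always generated
    if parser.has_faq_block or any(is_question_heading(h) for h in parser.headings):
        detected.append("faq")
    if parser.has_ordered_list and PROCESS_RE.search(" ".join(parser.text_parts)):
        detected.append("howto")
    if parser.has_video_tag or has_video_project:
        detected.append("video")
    return detected
```

Words such as "first" and "next" are common in ordinary prose, so a production detector would likely need a stricter HowTo signal than this sketch uses.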

**Schema Stacking:** A single content piece can carry multiple schemas. An article with an FAQ section and a video gets Article + FAQPage + VideoObject + BreadcrumbList, emitted as one JSON array inside a single `<script type="application/ld+json">` tag.
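
As a rough illustration of stacking, the sketch below builds a BreadcrumbList from an ordered trail and serializes several schema objects into one script tag. The helper names `build_breadcrumb` and `render_schema_script` and the example URLs are assumptions made for this example, not part of the spec.

```python
import json


def build_breadcrumb(trail):
    """trail: list of (name, url) tuples ordered from the site root to the content."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


def render_schema_script(schemas):
    """Serialize a list of JSON-LD objects into a single script tag."""
    payload = json.dumps(schemas, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


article = {"@context": "https://schema.org", "@type": "BlogPosting", "headline": "Example"}
crumbs = build_breadcrumb([
    ("Site", "https://example.com/"),
    ("Sector", "https://example.com/sector/"),
])
script_tag = render_schema_script([article, crumbs])
```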

### 2.2 On-Page SERP Elements (8 Types)

SERP elements are HTML blocks injected into `content_html` to improve featured snippet and rich result eligibility:

| # | Element | Description | Insertion Point | Detection / Source |
|---|---------|-------------|----------------|-------------------|
| 1 | **TL;DR Box** | 2-3 sentence summary in styled box | Top of article, after first paragraph | AI-generated from content |
| 2 | **Table of Contents** | Auto-generated from H2/H3 headings with anchor links | After intro paragraph, before first H2 | Parsed from content headings |
| 3 | **Key Takeaways** | Bullet list of main points in styled box | After TL;DR or after intro | AI-generated from content |
| 4 | **Definition Boxes** | Highlighted term definitions | Inline, after first use of defined term | AI detects key terms + generates definitions |
| 5 | **Comparison Tables** | Structured HTML tables for comparison content | Within body, at relevant H2 section | AI-generated for `comparison`, `review` structures |
| 6 | **People Also Ask** | Related questions with expandable answers | Before conclusion or after last H2 | AI-generated from content + cluster keywords |
| 7 | **Statistics Callouts** | Visual callout boxes for key numbers/stats | Inline, wrapping existing stats in text | Regex detection of numbers/percentages in text |
| 8 | **Pro/Con Boxes** | Structured pros and cons for review/comparison content | Within body, at relevant section | AI-generated for `review`, `comparison`, `product_page` structures |
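
Of the eight elements, the Table of Contents is the one derived purely from parsing. A minimal sketch of that step follows, assuming a regex pass over H2/H3 tags is acceptable for the generated HTML. `build_toc` and `slugify` are hypothetical helpers, the `igny8-serp-toc` class name is inferred from the wrapping convention rather than stated in the spec, and the list is flat rather than nested by heading level.

```python
import re
from html import escape, unescape

HEADING_RE = re.compile(r"<(h[23])([^>]*)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
ID_RE = re.compile(r'\bid="([^"]+)"')


def slugify(text):
    """Lowercase the text and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "section"


def build_toc(content_html):
    """Return (html_with_heading_ids, toc_html) for the H2/H3 headings found."""
    entries = []
    used = set()

    def add_id(match):
        tag, attrs, inner = match.group(1).lower(), match.group(2), match.group(3)
        title = unescape(TAG_RE.sub("", inner)).strip()
        existing = ID_RE.search(attrs)
        if existing:
            anchor = existing.group(1)  # keep an id the heading already has
        else:
            base = slugify(title)
            anchor, suffix = base, 2
            while anchor in used:  # de-duplicate repeated heading text
                anchor = f"{base}-{suffix}"
                suffix += 1
            attrs = f'{attrs} id="{anchor}"'
        used.add(anchor)
        entries.append((tag, anchor, title))
        return f"<{tag}{attrs}>{inner}</{tag}>"

    html_out = HEADING_RE.sub(add_id, content_html)
    items = "".join(
        f'<li class="toc-{tag}"><a href="#{anchor}">{escape(title)}</a></li>'
        for tag, anchor, title in entries
    )
    toc = f'<div data-serp-element="toc" class="igny8-serp-toc"><ul>{items}</ul></div>'
    return html_out, toc
```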

**SERP Element Applicability by Content Structure:**

| Structure | TL;DR | TOC | Key Takeaways | Definitions | Comparison | PAA | Stats | Pro/Con |
|-----------|-------|-----|---------------|-------------|------------|-----|-------|---------|
| `article` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| `guide` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| `comparison` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| `review` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| `listicle` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| `landing_page` | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| `service_page` | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| `product_page` | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| `cluster_hub` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
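
One way to enforce the matrix in code is a lookup keyed by `content_structure`. The sketch below encodes only four rows of the table above (`article`, `comparison`, `landing_page`, `product_page`) to keep it short; `APPLICABLE_ELEMENTS` and `resolve_element_types` are hypothetical names, and the element keys follow `ENHANCEMENT_TYPE_CHOICES`.

```python
# Element keys in the column order of the applicability table.
ELEMENT_ORDER = (
    "tldr", "toc", "key_takeaways", "definition",
    "comparison_table", "paa", "stats_callout", "pro_con",
)

# Partial encoding of the matrix: only four structures are shown here.
APPLICABLE_ELEMENTS = {
    "article": {"tldr", "toc", "key_takeaways", "definition", "paa", "stats_callout"},
    "comparison": {
        "tldr", "toc", "key_takeaways", "comparison_table",
        "paa", "stats_callout", "pro_con",
    },
    "landing_page": {"key_takeaways", "paa", "stats_callout"},
    "product_page": {"key_takeaways", "paa", "stats_callout", "pro_con"},
}


def resolve_element_types(content_structure, requested=None):
    """Return the element types allowed for a structure, in table column order.

    When `requested` is given, anything the matrix does not allow is dropped.
    Structures missing from the lookup get no elements.
    """
    allowed = APPLICABLE_ELEMENTS.get(content_structure, set())
    selected = allowed if requested is None else allowed & set(requested)
    return [element for element in ELEMENT_ORDER if element in selected]
```

For example, `resolve_element_types("landing_page", ["tldr", "paa"])` keeps only `paa`, because the matrix excludes TL;DR on landing pages.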

### 2.3 Retroactive Enhancement Engine

For existing published content that was generated before this module:

1. **Scan Phase:** Query all Content records where `schema_markup` is empty/incomplete OR `serp_elements` is null/empty
2. **Priority Ordering:** Highest-traffic pages first (using GSC data from 02C `GSCMetricsCache` if available, otherwise by `created_at` DESC)
3. **Generate Phase:** For each content, determine applicable schema types + SERP elements based on `content_type`, `content_structure`, and HTML analysis
4. **Preview Mode:** Store generated schema and SERP HTML in model records without modifying Content — user reviews before applying
5. **Apply Phase:** On approval, update `Content.schema_markup` and inject SERP element HTML into `Content.content_html`
6. **Batch Processing:** Process 10 content items per Celery task, with configurable batch size
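
The ordering and batching steps above can be sketched without any Django or Celery dependency. This example reads "GSC data if available" per item, which is one possible interpretation: items with traffic data sort first by clicks, and the rest fall back to `created_at` descending. `Candidate`, its `clicks` field, `prioritize`, and `batches` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List, Optional


@dataclass
class Candidate:
    content_id: int
    created_at: datetime
    clicks: Optional[int] = None  # hypothetical traffic metric; None without GSC data


def prioritize(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates: highest traffic first, then newest among the rest."""
    with_traffic = sorted(
        (c for c in candidates if c.clicks is not None),
        key=lambda c: c.clicks,
        reverse=True,
    )
    without_traffic = sorted(
        (c for c in candidates if c.clicks is None),
        key=lambda c: c.created_at,
        reverse=True,
    )
    return with_traffic + without_traffic


def batches(items: list, batch_size: int = 10) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each yielded batch would map to one Celery task invocation under the default of 10 items per task.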

---

## 3. DATA MODELS & APIS

### 3.1 New Models

#### SchemaTemplate (writer app)

```python
class SchemaTemplate(AccountBaseModel):
    """
    Reusable JSON-LD schema templates with placeholder fields.
    Account-level: account admins can customize templates.
    """
    schema_type = models.CharField(
        max_length=30,
        choices=[
            ('article', 'Article / BlogPosting'),
            ('product', 'Product'),
            ('service', 'Service'),
            ('localbusiness', 'LocalBusiness'),
            ('organization', 'Organization'),
            ('breadcrumb', 'BreadcrumbList'),
            ('faq', 'FAQPage'),
            ('howto', 'HowTo'),
            ('video', 'VideoObject'),
            ('website', 'WebSite + SearchAction'),
        ]
    )
    content_type_match = models.CharField(
        max_length=20,
        choices=CONTENT_TYPE_CHOICES,
        help_text='Which content_type this template applies to'
    )
    content_structure_match = models.CharField(
        max_length=30,
        choices=CONTENT_STRUCTURE_CHOICES,
        blank=True,
        null=True,
        help_text='Further filter by content_structure (null = any)'
    )
    template_json = models.JSONField(
        help_text='JSON-LD template with {{placeholder}} fields'
    )
    required_fields = models.JSONField(
        default=list,
        help_text='List of required field paths for validation'
    )
    is_default = models.BooleanField(default=False)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_schema_templates'
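        # Caveat: PostgreSQL and most other databases treat NULLs as distinct in
        # unique constraints, so rows with content_structure_match=NULL are not
        # deduplicated by this constraint alone.
        unique_together = [
            ('account', 'schema_type', 'content_type_match', 'content_structure_match')
        ]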
```

**PK:** BigAutoField (integer) — inherits from AccountBaseModel
**Relationships:** account FK (from AccountBaseModel)
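
To illustrate how `template_json` placeholders might be resolved, the sketch below walks a nested template and substitutes `{{placeholder}}` tokens from a values dict. The function name `render_template`, the placeholder names, and the choice to drop keys with no value are assumptions for this example; the spec does not define the rendering rules.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(node, values):
    """Recursively fill {{placeholder}} tokens in a JSON-LD template.

    A string that is exactly one placeholder is replaced by the raw value,
    so numbers and nested objects keep their type. Keys and list items whose
    value resolves to None are dropped.
    """
    if isinstance(node, dict):
        rendered = {key: render_template(value, values) for key, value in node.items()}
        return {key: value for key, value in rendered.items() if value is not None}
    if isinstance(node, list):
        rendered = [render_template(item, values) for item in node]
        return [item for item in rendered if item is not None]
    if isinstance(node, str):
        whole = PLACEHOLDER_RE.fullmatch(node)
        if whole:
            return values.get(whole.group(1))
        # Placeholders embedded in longer text are interpolated as strings.
        return PLACEHOLDER_RE.sub(lambda m: str(values.get(m.group(1), "")), node)
    return node


template = {
    "@type": "Article",
    "headline": "{{title}}",
    "wordCount": "{{word_count}}",
    "author": {"@type": "Person", "name": "{{author_name}}"},
    "description": "{{summary}}",
}
rendered = render_template(
    template, {"title": "Hello", "word_count": 1200, "author_name": "Ann"}
)
```

Dropping unresolved keys keeps empty strings out of the output, which lets a later validation pass report them as missing fields.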

#### SERPEnhancement (writer app)

```python
class SERPEnhancement(SiteSectorBaseModel):
    """
    Tracks individual SERP enhancement elements generated for content.
    One record per enhancement type per content.
    """
    ENHANCEMENT_TYPE_CHOICES = [
        ('tldr', 'TL;DR Box'),
        ('toc', 'Table of Contents'),
        ('key_takeaways', 'Key Takeaways'),
        ('definition', 'Definition Box'),
        ('comparison_table', 'Comparison Table'),
        ('paa', 'People Also Ask'),
        ('stats_callout', 'Statistics Callout'),
        ('pro_con', 'Pro/Con Box'),
    ]

    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='serp_enhancement_records'
    )
    enhancement_type = models.CharField(max_length=20, choices=ENHANCEMENT_TYPE_CHOICES)
    html_snippet = models.TextField(
        help_text='Generated HTML block to inject into content_html'
    )
    insertion_point = models.CharField(
        max_length=30,
        help_text='Where in content: top, after_intro, before_h2_N, bottom'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('generated', 'Generated'),
            ('inserted', 'Inserted'),
            ('removed', 'Removed'),
        ],
        default='generated'
    )
    generated_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_serp_enhancements'
        unique_together = [('content', 'enhancement_type')]
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel
**Relationships:** content FK → Content, site FK + sector FK + account FK (from SiteSectorBaseModel)

#### SchemaValidationResult (writer app)

```python
class SchemaValidationResult(SiteSectorBaseModel):
    """
    Stores schema validation results per content per schema type.
    """
    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='schema_validations'
    )
    schema_type = models.CharField(max_length=30)
    is_valid = models.BooleanField(default=False)
    errors = models.JSONField(default=list, help_text='List of validation error strings')
    warnings = models.JSONField(default=list, help_text='List of validation warning strings')
    validated_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_schema_validation_results'
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

### 3.2 Modified Models

#### Content (writer app) — add field

```python
# Add to Content model:
serp_elements = models.JSONField(
    default=dict,
    blank=True,
    help_text='Tracks which SERP enhancements are active: {type: True/False}'
)
```

**Existing field used:** `Content.schema_markup` (JSONField) — now systematically populated by this module instead of ad-hoc AI output.

### 3.3 Migration

Single migration in writer app:

```
igny8_core/migrations/XXXX_add_schema_serp_models.py
```

**Operations:**
1. `CreateModel('SchemaTemplate', ...)` — with unique_together constraint
2. `CreateModel('SERPEnhancement', ...)` — with unique_together constraint
3. `CreateModel('SchemaValidationResult', ...)`
4. `AddField('Content', 'serp_elements', JSONField(default=dict, blank=True))`

### 3.4 API Endpoints

All endpoints under `/api/v1/writer/` — extends the existing writer app URL namespace.

#### Schema Generation
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/writer/schema/generate/` | Generate schema for single content. Body: `{content_id}`. Returns JSON-LD + updates `Content.schema_markup`. |
| POST | `/api/v1/writer/schema/validate/` | Validate existing schema against Google requirements. Body: `{content_id}`. Returns SchemaValidationResult. |
| POST | `/api/v1/writer/schema/batch-generate/` | Batch generate schema. Body: `{content_ids: [int], site_id}`. Queues Celery task. Returns task ID. |
| GET | `/api/v1/writer/schema/templates/` | List SchemaTemplate records. Query params: `account_id`, `schema_type`, `content_type_match`. |
| GET | `/api/v1/writer/schema/audit/?site_id=X` | Schema coverage audit — returns counts of content with/without schema per type. |
| POST | `/api/v1/writer/schema/retroactive/` | Trigger retroactive schema scan. Body: `{site_id, batch_size}`. Queues Celery task. |

#### SERP Enhancement
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/writer/serp/enhance/` | Generate SERP elements for single content. Body: `{content_id, element_types: []}`. Returns SERPEnhancement records. |
| POST | `/api/v1/writer/serp/batch-enhance/` | Batch enhancement. Body: `{content_ids: [int], site_id}`. Queues Celery task. |
| GET | `/api/v1/writer/serp/preview/{content_id}/` | Preview enhancements — returns modified HTML without applying. |
| POST | `/api/v1/writer/serp/apply/{content_id}/` | Apply enhancements — injects HTML into `Content.content_html` and updates `Content.serp_elements`. |
| POST | `/api/v1/writer/serp/remove/{content_id}/` | Remove specific SERP elements. Body: `{element_types: []}`. |

**Permissions:** All endpoints use `AccountModelViewSet` or `SiteSectorModelViewSet` permission patterns from existing codebase.

### 3.5 AI Functions

#### GenerateSchemaFunction (extends BaseAIFunction)

**Registry key:** `generate_schema`
**Location:** `igny8_core/ai/functions/generate_schema.py`

```python
class GenerateSchemaFunction(BaseAIFunction):
    """
    Generates JSON-LD structured data for content.
    Determines applicable schema types from content_type, content_structure,
    and HTML analysis. Produces schema-stacked output.
    """
    function_name = 'generate_schema'

    def validate(self, content_id, **kwargs):
        # Verify content exists and has content_html
        pass

    def prepare(self, content_id, **kwargs):
        # Load Content, determine applicable schema types
        # Load matching SchemaTemplate records
        # Extract structured_data from Content (from 02A)
        pass

    def build_prompt(self):
        # Include: content title, meta_description, content_html excerpt,
        # content_type, content_structure, structured_data,
        # schema template as example, required_fields list
        pass

    def parse_response(self, response):
        # Parse JSON-LD array from AI response
        # Validate against required_fields
        pass

    def save_output(self, parsed):
        # Save to Content.schema_markup
        # Create SchemaValidationResult records
        pass
```

**Input:** `content_id` (int)
**Output:** JSON-LD array saved to `Content.schema_markup`

#### GenerateSERPElementsFunction (extends BaseAIFunction)

**Registry key:** `generate_serp_elements`
**Location:** `igny8_core/ai/functions/generate_serp_elements.py`

```python
class GenerateSERPElementsFunction(BaseAIFunction):
    """
    Generates on-page SERP enhancement HTML for content.
    Uses content structure and applicability matrix to determine which elements
    to generate. Returns HTML snippets for each element.
    """
    function_name = 'generate_serp_elements'

    def validate(self, content_id, element_types=None, **kwargs):
        # Verify content exists
        # If element_types not specified, determine from applicability matrix
        pass

    def prepare(self, content_id, element_types=None, **kwargs):
        # Load Content, parse content_html for headings/stats/terms
        # Load cluster keywords for PAA generation
        pass

    def build_prompt(self):
        # Per element type, build specific sub-prompts:
        # - TL;DR: "Summarize in 2-3 sentences..."
        # - Key Takeaways: "Extract 3-5 main points..."
        # - PAA: "Generate 4-6 related questions..."
        # - Definitions: "Identify key terms and define..."
        # etc.
        pass

    def parse_response(self, response):
        # Parse per-element HTML snippets from AI response
        pass

    def save_output(self, parsed):
        # Create/update SERPEnhancement records per element
        pass
```

**Input:** `content_id` (int), optional `element_types` (list of strings)
**Output:** SERPEnhancement records created, not yet injected into content_html

### 3.6 Schema Validation Service

**Location:** `igny8_core/business/schema_validation.py`

```python
class SchemaValidationService:
    """
    Validates JSON-LD schema against Google Rich Results requirements.
    Not just schema.org compliance — checks Google-specific required fields.
    """

    GOOGLE_REQUIRED_FIELDS = {
        'article': ['headline', 'datePublished', 'author', 'image', 'publisher'],
        'product': ['name', 'image', 'offers'],
        'service': ['name', 'description', 'provider'],
        'localbusiness': ['name', 'address'],
        'organization': ['name', 'url', 'logo'],
        'breadcrumb': ['itemListElement'],
        'faq': ['mainEntity'],
        'howto': ['name', 'step'],
        'video': ['name', 'description', 'thumbnailUrl', 'uploadDate'],
        'website': ['name', 'url', 'potentialAction'],
    }

    def validate(self, content_id):
        """
        Validate all schema_markup entries for a content record.
        Returns list of SchemaValidationResult records.
        """
        pass

    def _validate_single(self, schema_json, schema_type):
        """
        Validate a single schema entry against required fields.
        Returns (is_valid, errors[], warnings[]).
        """
        pass

    def auto_fix(self, content_id):
        """
        Attempt to fix common schema issues:
        - Missing dateModified → copy from updated_at
        - Missing image → use first image from Images model
        - Missing publisher → use site/account Organization schema
        """
        pass
```
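
A standalone sketch of the required-field check behind `_validate_single` is shown below. It copies three entries from `GOOGLE_REQUIRED_FIELDS` above. Treating a missing `dateModified` as a warning rather than an error is an assumption of this example, chosen because `auto_fix` lists it as a fixable issue.

```python
# Subset of GOOGLE_REQUIRED_FIELDS from the service above.
GOOGLE_REQUIRED_FIELDS = {
    "article": ["headline", "datePublished", "author", "image", "publisher"],
    "faq": ["mainEntity"],
    "video": ["name", "description", "thumbnailUrl", "uploadDate"],
}


def _is_blank(value):
    """Treat None and empty strings, lists, and dicts as missing."""
    return value is None or value == "" or value == [] or value == {}


def validate_single(schema_json, schema_type):
    """Return (is_valid, errors, warnings) for one schema entry."""
    required = GOOGLE_REQUIRED_FIELDS.get(schema_type)
    if required is None:
        return False, [f"Unknown schema type: {schema_type}"], []
    errors = [
        f"Missing required field: {field}"
        for field in required
        if _is_blank(schema_json.get(field))
    ]
    warnings = []
    # Assumption: dateModified is recommended, not required, for articles.
    if schema_type == "article" and _is_blank(schema_json.get("dateModified")):
        warnings.append("Missing recommended field: dateModified")
    return not errors, errors, warnings
```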

### 3.7 SERP Element Injection Service

**Location:** `igny8_core/business/serp_injection.py`

```python
class SERPInjectionService:
    """
    Injects SERP enhancement HTML snippets into content_html.
    Handles insertion point resolution and collision avoidance.
    """

    INSERTION_ORDER = [
        'tldr',           # After first paragraph
        'toc',            # After intro, before first H2
        'key_takeaways',  # After TL;DR or after intro
        'definition',     # Inline, after first use of term
        'comparison_table',  # Within body at relevant H2
        'stats_callout',  # Inline, wrapping existing stats
        'pro_con',        # Within body at relevant section
        'paa',            # Before conclusion or after last H2
    ]

    def inject(self, content_id):
        """
        Inject all 'generated' SERPEnhancement records into content_html.
        Updates Content.content_html and Content.serp_elements tracking field.
        Marks SERPEnhancement records as 'inserted'.
        """
        pass

    def remove(self, content_id, element_types):
        """
        Remove specified SERP elements from content_html.
        Each element is wrapped in <div data-serp-element="{type}"> for removal.
        """
        pass

    def preview(self, content_id):
        """
        Return modified content_html with enhancements WITHOUT saving.
        """
        pass
```

**SERP Element HTML Wrapping Convention:**
All injected elements are wrapped with a data attribute for identification:
```html
<div data-serp-element="tldr" class="igny8-serp-tldr">
  <!-- Generated TL;DR content -->
</div>
```
This allows reliable removal/replacement without corrupting surrounding content.
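
The wrapping convention makes removal a matter of finding the wrapper's opening tag and its matching close. The sketch below does this by counting `<div>` depth, so nested divs inside the element are handled. `wrap_element` and `remove_element` are hypothetical helpers, and the `igny8-serp-{type}` class pattern is extrapolated from the single `tldr` example above.

```python
import re

DIV_TOKEN_RE = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)


def wrap_element(element_type, inner_html):
    """Wrap generated HTML in the identifying container div."""
    return (
        f'<div data-serp-element="{element_type}" '
        f'class="igny8-serp-{element_type}">{inner_html}</div>'
    )


def remove_element(content_html, element_type):
    """Remove every wrapper div of the given type, including nested divs."""
    opener = re.compile(
        r'<div\b[^>]*\bdata-serp-element="%s"[^>]*>' % re.escape(element_type),
        re.IGNORECASE,
    )
    while True:
        start = opener.search(content_html)
        if start is None:
            return content_html
        depth, end = 1, None
        # Walk subsequent div tokens until the wrapper's own close is reached.
        for token in DIV_TOKEN_RE.finditer(content_html, start.end()):
            depth += -1 if token.group(0).startswith("</") else 1
            if depth == 0:
                end = token.end()
                break
        if end is None:
            return content_html  # unbalanced markup: leave the content untouched
        content_html = content_html[:start.start()] + content_html[end:]
```

A full HTML parser would be more robust against malformed markup; the depth counter is used here only to keep the example short.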

---

## 4. IMPLEMENTATION STEPS

### Step 1: Migration & Models
1. Create `SchemaTemplate` model in writer app
2. Create `SERPEnhancement` model in writer app
3. Create `SchemaValidationResult` model in writer app
4. Add `serp_elements` JSONField to Content model
5. Run migration

### Step 2: Schema Templates Seed Data
Create default SchemaTemplate records for each of the 10 schema types:

| schema_type | content_type_match | content_structure_match | is_default |
|------------|-------------------|------------------------|------------|
| `article` | `post` | `null` (any) | True |
| `product` | `product` | `null` | True |
| `product` | `post` | `product_page` | True |
| `service` | `page` | `service_page` | True |
| `localbusiness` | `page` | `null` | True |
| `organization` | `page` | `business_page` | True |
| `breadcrumb` | `post` | `null` | True |
| `breadcrumb` | `page` | `null` | True |
| `breadcrumb` | `product` | `null` | True |
| `faq` | `post` | `null` | True |
| `howto` | `post` | `null` | True |
| `video` | `post` | `null` | True |
| `website` | `page` | `null` | True |

Seed via data migration or management command `seed_schema_templates`.

### Step 3: AI Functions
1. Implement `GenerateSchemaFunction` in `igny8_core/ai/functions/generate_schema.py`
2. Implement `GenerateSERPElementsFunction` in `igny8_core/ai/functions/generate_serp_elements.py`
3. Register both in `igny8_core/ai/registry.py`

### Step 4: Services
1. Implement `SchemaValidationService` in `igny8_core/business/schema_validation.py`
2. Implement `SERPInjectionService` in `igny8_core/business/serp_injection.py`

### Step 5: Pipeline Integration
Integrate schema generation into the content pipeline after Stage 4 (content generation):

```python
# In content generation pipeline (01E blueprint-aware-pipeline):
# After GenerateContentFunction completes:
def post_content_generation(content_id):
    # Auto-generate schema based on content type
    generate_schema_fn = registry.get('generate_schema')
    generate_schema_fn.execute(content_id=content_id)

    # Auto-generate applicable SERP elements
    generate_serp_fn = registry.get('generate_serp_elements')
    generate_serp_fn.execute(content_id=content_id)

    # Inject SERP elements into content_html
    injection_service = SERPInjectionService()
    injection_service.inject(content_id)
```

### Step 6: API Endpoints
1. Add schema endpoints to `igny8_core/urls/writer.py`
2. Create `SchemaGenerateView`, `SchemaValidateView`, `SchemaBatchGenerateView`, `SchemaTemplateListView`
3. Create `SERPEnhanceView`, `SERPBatchEnhanceView`, `SERPPreviewView`, `SERPApplyView`, `SERPRemoveView`
4. Create `SchemaAuditView`, `SchemaRetroactiveView`

### Step 7: Celery Tasks
Register in `igny8_core/tasks/` and add beat schedule entries:

```python
# igny8_core/tasks/schema_tasks.py
from celery import shared_task

@shared_task(name='generate_schema_for_content')
def generate_schema_for_content(content_id):
    """After content generation, auto-generate schema."""
    pass

@shared_task(name='retroactive_schema_scan')
def retroactive_schema_scan(site_id, batch_size=10):
    """Scan existing content and generate missing schemas in batches."""
    pass

@shared_task(name='validate_schemas_batch')
def validate_schemas_batch(site_id):
    """Periodic validation of all schemas for a site."""
    pass
```

**Beat Schedule Additions:**

| Task | Schedule | Notes |
|------|----------|-------|
| `validate_schemas_batch` | Weekly (Sunday 3:00 AM) | Validates all schemas, creates SchemaValidationResult records |

### Step 8: Serializers & Admin
1. Create DRF serializers for SchemaTemplate, SERPEnhancement, SchemaValidationResult
2. Register models in Django admin for inspection

### Step 9: Credit Cost Configuration
Add to `CreditCostConfig` (billing app):

| operation_type | default_cost | description |
|---------------|-------------|-------------|
| `schema_generation` | 1 | Generate JSON-LD schema for one content |
| `serp_element_generation` | 0.5 | Generate one SERP element |
| `schema_validation` | 0.1 | Validate schema for one content |
| `schema_batch` | 8-12 | Batch generate for 10 items (varies by content) |

Credit deduction follows existing `CreditUsageLog` pattern: log entry created per operation with `operation_type`, `credits_used`, `content` FK.
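
As a worked example of the per-operation costs in the table, the sketch below estimates the credits for fully processing one content item. It assumes the three single-item operations are charged independently and summed, which the spec does not state explicitly. `DEFAULT_COSTS` and `estimate_credits` are hypothetical names, and `Decimal` is used so fractional costs add up exactly.

```python
from decimal import Decimal

# Default costs copied from the table above (single-item operations only).
DEFAULT_COSTS = {
    "schema_generation": Decimal("1"),
    "serp_element_generation": Decimal("0.5"),
    "schema_validation": Decimal("0.1"),
}


def estimate_credits(serp_element_count, validate=True):
    """Estimate credits for one content item: schema + N SERP elements (+ validation)."""
    total = DEFAULT_COSTS["schema_generation"]
    total += DEFAULT_COSTS["serp_element_generation"] * serp_element_count
    if validate:
        total += DEFAULT_COSTS["schema_validation"]
    return total


# An article with six applicable SERP elements: 1 + (6 x 0.5) + 0.1 = 4.1 credits.
article_cost = estimate_credits(6)
```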

---

## 5. ACCEPTANCE CRITERIA

### Schema Generation
- [ ] Article/BlogPosting schema generated for all `content_type='post'` content
- [ ] Product schema generated for content with `content_type='product'` or `content_structure='product_page'`
- [ ] Service schema generated for `content_structure='service_page'` content
- [ ] BreadcrumbList schema generated for all content using SAG hierarchy
- [ ] FAQPage schema auto-detected and generated when content has question-pattern headings
- [ ] HowTo schema auto-detected and generated when content has step-by-step lists
- [ ] Schema stacking works — content with FAQ + Article gets both schemas in array
- [ ] All schemas pass SchemaValidationService checks for Google required fields

### SERP Enhancement
- [ ] TL;DR box generated and injected for applicable content structures
- [ ] Table of Contents auto-generated from H2/H3 headings with working anchor links
- [ ] Key Takeaways bullet list generated for applicable content
- [ ] People Also Ask section generated with 4-6 questions + answers
- [ ] Comparison Tables generated for comparison/review content
- [ ] Pro/Con boxes generated for review/comparison/product_page content
- [ ] All SERP elements wrapped in `<div data-serp-element="{type}">` for reliable removal
- [ ] SERP elements can be removed without corrupting content
- [ ] Applicability matrix enforced — no TL;DR on landing_page, etc.

### Retroactive Enhancement
- [ ] Retroactive scan identifies content missing schema by type
- [ ] Priority ordering by traffic (GSC data) or creation date
- [ ] Preview mode shows changes without modifying Content
- [ ] Batch processing handles 10 items per task run by default, with configurable batch size
- [ ] Applied enhancements update Content.schema_markup and Content.serp_elements

### Validation
- [ ] SchemaValidationResult records created for each validation run
- [ ] Validation checks Google-specific required fields (not just schema.org)
- [ ] Auto-fix resolves common issues (missing dateModified, image, publisher)
- [ ] Weekly batch validation catches schema drift

### Integration
- [ ] Schema generation triggers automatically after content generation in pipeline
- [ ] SERP elements generated and injected as part of pipeline flow
- [ ] Credit costs deducted per CreditCostConfig entries
- [ ] All API endpoints respect account/site permission boundaries

---

## 6. CLAUDE CODE INSTRUCTIONS

### File Locations
```
igny8_core/
├── ai/
│   └── functions/
│       ├── generate_schema.py          # GenerateSchemaFunction
│       └── generate_serp_elements.py   # GenerateSERPElementsFunction
├── business/
│   ├── schema_validation.py            # SchemaValidationService
│   └── serp_injection.py              # SERPInjectionService
├── tasks/
│   └── schema_tasks.py                # Celery tasks
├── urls/
│   └── writer.py                      # Add schema + serp endpoints
└── migrations/
    └── XXXX_add_schema_serp_models.py # Models + Content.serp_elements
```

### Conventions
- **PKs:** BigAutoField (integer) — do NOT use UUIDs
- **Table prefix:** `igny8_` on all new tables
- **Celery app name:** `igny8_core`
- **URL pattern:** `/api/v1/writer/schema/...` and `/api/v1/writer/serp/...`
- **Permissions:** Use `AccountModelViewSet` / `SiteSectorModelViewSet` patterns
- **AI functions:** Extend `BaseAIFunction` with `validate()`, `prepare()`, `build_prompt()`, `parse_response()`, `save_output()`
- **Registry:** Register new AI functions in `igny8_core/ai/registry.py`
- **Frontend:** `.tsx` files with Zustand stores for state management

### Cross-References
| Doc | Relationship |
|-----|-------------|
| **02A** | Content type determines which schema type to generate; ContentTypeTemplate section layouts inform schema field population |
| **02F** | Optimizer detects schema gaps and triggers schema generation/fix |
| **02I** | VideoObject schema generated for content with linked VideoProject |
| **03A** | WP plugin standalone mode has its own schema module — different from this IGNY8-native implementation |
| **03B** | Connected mode pushes schema to WordPress via bulk endpoint |
| **01E** | Pipeline integration — schema generation hooks after Stage 4 content generation |
| **01G** | SAG health monitoring can incorporate schema completeness as a health factor |

### Key Decisions
1. **Writer app, not separate app** — SchemaTemplate, SERPEnhancement, SchemaValidationResult all live in the `writer` app since they are tightly coupled to Content
2. **Schema stacking** — multiple schemas per content stored as JSON array in `Content.schema_markup`
3. **SERP element wrapping** — all injected HTML uses `data-serp-element` attribute for non-destructive add/remove
4. **Preview before apply** — retroactive enhancements always go through preview state
5. **Content.serp_elements tracking field** — JSONField dict `{type: True/False}` for fast lookups without querying SERPEnhancement table