# IGNY8 Phase 2: Rich Schema & SERP Enhancement (02G)

## JSON-LD Schema Generation & On-Page SERP Element Injection

**Document Version:** 1.0

**Date:** 2026-03-23

**Phase:** IGNY8 Phase 2 (Feature Expansion)

**Status:** Build Ready

**Source of Truth:** Codebase at `/data/app/igny8/`

**Audience:** Claude Code, Backend Developers, Architects

---

## 1. CURRENT STATE

### Schema Markup Today

The `Content` model (app_label=`writer`, db_table=`igny8_content`) has a `schema_markup` JSONField that stores raw JSON-LD. The AI function `generate_content` occasionally includes basic Article schema, but the output is inconsistent and unvalidated.

### What Works Now

- `Content.schema_markup`: JSONField exists and is sometimes populated during generation
- `generate_content` AI function: may produce rudimentary Article schema as part of its content output
- `ContentTypeTemplate` model (added by 02A): defines section layouts and presets per content type
- `Content.structured_data` JSONField (added by 02A): stores type-specific data (product specs, service steps, etc.)

### What Does Not Exist

- No systematic schema generation by content type
- No on-page SERP element injection (TL;DR, TOC, Key Takeaways, etc.)
- No schema validation against Google Rich Results requirements
- No retroactive enhancement of already-published content
- No SchemaTemplate model, no SERPEnhancement model, and no validation records
- No per-content tracking of SERP elements

### Phase 1 & 2A Foundation Available

- `SAGCluster.cluster_type` choices: `product_category`, `condition_problem`, `feature`, `brand`, `informational`, `comparison`
- 01E blueprint-aware pipeline provides `blueprint_context` with `cluster_type`, `content_structure`, `content_type`
- 02A content type routing provides type-specific generation with section layouts
- `Content.content_type` choices: `post`, `page`, `product`, `taxonomy`
- `Content.content_structure` choices: 14 structure types, including `cluster_hub`, `product_page`, `service_page`, `comparison`, `review`

---

## 2. WHAT TO BUILD

### Overview

Build a schema generation and SERP enhancement system that:

1. Generates correct JSON-LD structured data for 10 schema types, mapped to content type/structure
2. Injects 8 on-page SERP elements into `content_html` to improve rich snippet eligibility
3. Validates schema against Google Rich Results requirements
4. Retroactively enhances existing published content with missing schema and SERP elements

### 2.1 JSON-LD Schema Types (10 Types)

Each schema type maps to specific `content_type` + `content_structure` combinations:

| # | Schema Type | Applies To | Key Fields |
|---|------------|-----------|------------|
| 1 | **Article / BlogPosting** | `post` (all structures) | headline, datePublished, dateModified, author (Person/Organization), publisher, image, description, mainEntityOfPage, wordCount, articleSection |
| 2 | **Product** | `product` / `product_page` | name, description, image, brand, offers (price, priceCurrency, availability, url), aggregateRating, review, sku, gtin |
| 3 | **Service** | `page` / `service_page` | name, description, provider (Organization), serviceType, areaServed, hasOfferCatalog, offers |
| 4 | **LocalBusiness** | Sites with a physical location (site-level config) | name, address, telephone, openingHours, geo, image, priceRange, sameAs, hasMap |
| 5 | **Organization** | Site-wide (homepage schema) | name, url, logo, sameAs[], contactPoint, foundingDate, founders |
| 6 | **BreadcrumbList** | All pages | itemListElement [{@type: ListItem, position, name, item (URL)}], auto-generated from the SAG hierarchy or the WP breadcrumb trail |
| 7 | **FAQPage** | Content with FAQ sections (auto-detected from H2/H3 question patterns) | mainEntity [{@type: Question, name, acceptedAnswer: {@type: Answer, text}}] |
| 8 | **HowTo** | Step-by-step content (detected from ordered lists with process indicators) | name, step [{@type: HowToStep, name, text, image, url}], totalTime, estimatedCost |
| 9 | **VideoObject** | Content with video embeds (02I integration) | name, description, thumbnailUrl, uploadDate, duration, contentUrl, embedUrl |
| 10 | **WebSite + SearchAction** | Site-wide (homepage) | name, url, potentialAction (SearchAction with query-input) |

**Auto-Detection Rules:**

- FAQPage: detected when content has H2/H3 headings matching question patterns (starting with "What", "How", "Why", "When", "Is", "Can", "Does", "Should") or explicit `<div class="faq-section">` blocks
- HowTo: detected when content has ordered lists (`<ol>`) combined with process language ("Step 1", "First", "Next", etc.)
- VideoObject: detected when `<iframe>` or `<video>` tags are present, or when a 02I VideoProject is linked to the content
- BreadcrumbList: always generated, using the SAG hierarchy (Site → Sector → Cluster → Content) or the WordPress breadcrumb trail from SiteIntegration sync

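The FAQPage, HowTo and VideoObject rules above can be sketched with the standard library alone. This is a minimal, regex-based illustration that assumes reasonably clean `content_html`; the helper names (`build_faq_page`, `detect_schema_types`), the answer heuristic (all text up to the next H2/H3), and the extra process words "Then" and "Finally" are assumptions. A production version would more likely use a real HTML parser and would also consult the 02I VideoProject link, which HTML analysis cannot see.

```python
import re
from html import unescape
from typing import Dict, Optional, Set

HEADING_RE = re.compile(r"<h([23])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
# Question words from the FAQPage rule (case-sensitive, as listed in the rule).
QUESTION_RE = re.compile(r"(?:What|How|Why|When|Is|Can|Does|Should)\b")
FAQ_DIV_RE = re.compile(r'<div\b[^>]*class="[^"]*\bfaq-section\b', re.IGNORECASE)
OL_RE = re.compile(r"<ol\b", re.IGNORECASE)
# "Step N", "First", "Next" come from the HowTo rule; "Then"/"Finally" are assumed extras.
PROCESS_RE = re.compile(r"\b(?:Step\s+\d+|First|Next|Then|Finally)\b")
VIDEO_RE = re.compile(r"<(?:iframe|video)\b", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def _text(fragment: str) -> str:
    """Strip tags and collapse whitespace."""
    return " ".join(unescape(TAG_RE.sub(" ", fragment)).split())


def build_faq_page(html: str) -> Optional[Dict]:
    """Pair each question-style H2/H3 with the text up to the next H2/H3."""
    headings = list(HEADING_RE.finditer(html))
    entities = []
    for i, m in enumerate(headings):
        question = _text(m.group(2))
        if not QUESTION_RE.match(question):
            continue
        end = headings[i + 1].start() if i + 1 < len(headings) else len(html)
        answer = _text(html[m.end():end])
        if answer:
            entities.append({
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            })
    if not entities:
        return None
    return {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": entities}


def detect_schema_types(html: str) -> Set[str]:
    """Return SchemaTemplate.schema_type keys implied by the HTML alone."""
    types = {"breadcrumb"}  # BreadcrumbList is always generated
    if build_faq_page(html) or FAQ_DIV_RE.search(html):
        types.add("faq")
    if OL_RE.search(html) and PROCESS_RE.search(_text(html)):
        types.add("howto")
    if VIDEO_RE.search(html):
        types.add("video")
    return types
```

Types driven by `content_type`/`content_structure` (Article, Product, Service) would be added from the model fields rather than from HTML, so they are deliberately absent here.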
**Schema Stacking:** A single content piece can have multiple schemas. An article with FAQ and video gets Article + FAQPage + VideoObject + BreadcrumbList, all in a single `<script type="application/ld+json">` array.

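A minimal sketch of schema stacking, assuming the breadcrumb trail arrives as ordered `(name, url)` pairs (for example Site → Sector → Cluster → Content). `breadcrumb_list`, `stack_schemas` and `to_script_tag` are hypothetical helpers, not existing IGNY8 code. Each item carries its own `@context` so the top-level array is valid JSON-LD on its own, and `</` is escaped so that a value containing `</script>` cannot terminate the script element early.

```python
import json
from typing import Dict, List, Sequence, Tuple


def breadcrumb_list(trail: Sequence[Tuple[str, str]]) -> Dict:
    """BreadcrumbList from (name, url) pairs ordered root first."""
    return {
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": pos, "name": name, "item": url}
            for pos, (name, url) in enumerate(trail, start=1)
        ],
    }


def stack_schemas(schemas: List[Dict]) -> List[Dict]:
    """Give each schema its own @context so the top-level array is valid JSON-LD."""
    return [{"@context": "https://schema.org", **schema} for schema in schemas]


def to_script_tag(schemas: List[Dict]) -> str:
    """Serialize all stacked schemas into one application/ld+json script element."""
    payload = json.dumps(stack_schemas(schemas), ensure_ascii=False)
    # "<\/" is still valid JSON but cannot close the <script> element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

Keeping the array form (rather than a single object with `@graph`) matches the "JSON array in `Content.schema_markup`" decision, so the stored value and the rendered payload have the same shape.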
### 2.2 On-Page SERP Elements (8 Types)

SERP elements are HTML blocks injected into `content_html` to improve featured snippet and rich result eligibility:

| # | Element | Description | Insertion Point | Detection / Source |
|---|---------|-------------|----------------|-------------------|
| 1 | **TL;DR Box** | 2-3 sentence summary in a styled box | Near the top, after the first paragraph | AI-generated from content |
| 2 | **Table of Contents** | Auto-generated from H2/H3 headings with anchor links | After the intro paragraph, before the first H2 | Parsed from content headings |
| 3 | **Key Takeaways** | Bullet list of main points in a styled box | After the TL;DR, or after the intro | AI-generated from content |
| 4 | **Definition Boxes** | Highlighted term definitions | Inline, after the first use of the defined term | AI detects key terms and generates definitions |
| 5 | **Comparison Tables** | Structured HTML tables for comparison content | Within the body, at the relevant H2 section | AI-generated for `comparison`, `review` structures |
| 6 | **People Also Ask** | Related questions with expandable answers | Before the conclusion, or after the last H2 | AI-generated from content + cluster keywords |
| 7 | **Statistics Callouts** | Visual callout boxes for key numbers/stats | Inline, wrapping existing stats in the text | Regex detection of numbers/percentages in text |
| 8 | **Pro/Con Boxes** | Structured pros and cons for review/comparison content | Within the body, at the relevant section | AI-generated for `review`, `comparison`, `product_page` structures |

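The Table of Contents is the one element built purely from headings already in `content_html`. The sketch below (hypothetical helpers `add_heading_ids` and `build_toc`, regex-based, assuming headings are not nested inside each other) adds a unique `id` to every H2/H3 that lacks one, reuses existing ids, and builds a flat list whose `toc-h2`/`toc-h3` classes are assumed styling hooks. The wrapper follows the `data-serp-element` convention from 3.7; the `igny8-serp-toc` class name is extrapolated from the documented `igny8-serp-tldr` example.

```python
import re
from html import escape, unescape
from typing import List, Tuple

HEADING_RE = re.compile(r"<h([23])([^>]*)>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
ID_RE = re.compile(r'(?:^|\s)id="([^"]+)"')
TAG_RE = re.compile(r"<[^>]+>")

TocEntry = Tuple[int, str, str]  # (heading level, anchor id, heading text)


def slugify(text: str) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "section"


def add_heading_ids(html: str) -> Tuple[str, List[TocEntry]]:
    """Give every H2/H3 a unique id and collect TOC entries in document order."""
    entries: List[TocEntry] = []
    used = set()

    def repl(m):
        level, attrs, inner = m.group(1), m.group(2), m.group(3)
        text = " ".join(unescape(TAG_RE.sub(" ", inner)).split())
        existing = ID_RE.search(attrs)
        if existing:
            anchor = existing.group(1)
        else:
            base = anchor = slugify(text)
            n = 2
            while anchor in used:  # de-duplicate repeated heading text
                anchor = f"{base}-{n}"
                n += 1
            attrs = f'{attrs} id="{anchor}"'
        used.add(anchor)
        entries.append((int(level), anchor, text))
        return f"<h{level}{attrs}>{inner}</h{level}>"

    return HEADING_RE.sub(repl, html), entries


def build_toc(entries: List[TocEntry]) -> str:
    items = "".join(
        f'<li class="toc-h{level}"><a href="#{anchor}">{escape(text)}</a></li>'
        for level, anchor, text in entries
    )
    return f'<div data-serp-element="toc" class="igny8-serp-toc"><ul>{items}</ul></div>'
```

Both values matter: the rewritten HTML must be saved too, otherwise the TOC links point at anchors that do not exist.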
**SERP Element Applicability by Content Structure:**

| Structure | TL;DR | TOC | Key Takeaways | Definitions | Comparison | PAA | Stats | Pro/Con |
|-----------|-------|-----|---------------|-------------|------------|-----|-------|---------|
| `article` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| `guide` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| `comparison` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| `review` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| `listicle` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| `landing_page` | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| `service_page` | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| `product_page` | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| `cluster_hub` | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |

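One way to enforce the matrix in code is a lookup keyed by `content_structure`, using the `SERPEnhancement.ENHANCEMENT_TYPE_CHOICES` keys. This sketch encodes the rows above and gives `review` comparison tables, as the element table in 2.2 specifies. `applicable_elements` is a hypothetical helper, and treating structures that are missing from the matrix (it lists 9 of the 14) as having no applicable elements is an assumption.

```python
from typing import Dict, FrozenSet, List, Optional

# Matrix column order; keys match SERPEnhancement.ENHANCEMENT_TYPE_CHOICES.
ELEMENTS = ("tldr", "toc", "key_takeaways", "definition",
            "comparison_table", "paa", "stats_callout", "pro_con")

_LONGFORM = frozenset({"tldr", "toc", "key_takeaways", "definition", "paa", "stats_callout"})
_EVALUATIVE = frozenset({"tldr", "toc", "key_takeaways", "comparison_table",
                         "paa", "stats_callout", "pro_con"})

APPLICABILITY: Dict[str, FrozenSet[str]] = {
    "article": _LONGFORM,
    "guide": _LONGFORM,
    "cluster_hub": _LONGFORM,
    "comparison": _EVALUATIVE,
    "review": _EVALUATIVE,
    "listicle": frozenset({"tldr", "toc", "key_takeaways", "paa", "stats_callout"}),
    "landing_page": frozenset({"key_takeaways", "paa", "stats_callout"}),
    "service_page": frozenset({"toc", "key_takeaways", "definition", "paa"}),
    "product_page": frozenset({"key_takeaways", "paa", "stats_callout", "pro_con"}),
}


def applicable_elements(structure: str, requested: Optional[List[str]] = None) -> List[str]:
    """Allowed element types for a structure, in matrix column order.

    If element_types were requested explicitly, the result is their intersection
    with the matrix, so a request can never bypass it.
    """
    allowed = APPLICABILITY.get(structure, frozenset())
    wanted = allowed if requested is None else allowed & set(requested)
    return [e for e in ELEMENTS if e in wanted]
```

`GenerateSERPElementsFunction.validate()` could call this when `element_types` is omitted, which is how the "no TL;DR on landing_page" acceptance criterion would be enforced.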
### 2.3 Retroactive Enhancement Engine

For existing published content that was generated before this module:

1. **Scan Phase:** Query all Content records where `schema_markup` is empty or incomplete OR `serp_elements` is empty
2. **Priority Ordering:** Highest-traffic pages first (using GSC data from 02C `GSCMetricsCache` if available, otherwise by `created_at` DESC)
3. **Generate Phase:** For each content item, determine applicable schema types and SERP elements from `content_type`, `content_structure`, and HTML analysis
4. **Preview Mode:** Store generated schema and SERP HTML in model records without modifying Content; the user reviews them before applying
5. **Apply Phase:** On approval, update `Content.schema_markup` and inject SERP element HTML into `Content.content_html`
6. **Batch Processing:** Process content in batches, one batch per Celery task (default 10 items, configurable via `batch_size`)

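The scan, ordering and batching steps reduce to a filter, a sort and a chunker. This sketch assumes content rows have already been loaded into plain objects and that traffic arrives as a `content_id → metric` mapping; the real `GSCMetricsCache` fields are not specified here, and `Candidate`, `prioritize` and `batches` are hypothetical names. Detecting *incomplete* schema is left to the validation service; the filter only checks for empty values. Two stable sorts give traffic-first ordering with newest-first as the tie-breaker.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterator, List, Optional, Sequence


@dataclass
class Candidate:
    """Plain stand-in for the Content fields the scan needs (hypothetical)."""
    content_id: int
    created_at: datetime
    schema_markup: Optional[list]
    serp_elements: Optional[dict]


def needs_enhancement(c: Candidate) -> bool:
    # Scan phase: missing schema or no tracked SERP elements.
    return not c.schema_markup or not c.serp_elements


def prioritize(candidates: Sequence[Candidate],
               traffic: Optional[Dict[int, float]] = None) -> List[Candidate]:
    pending = [c for c in candidates if needs_enhancement(c)]
    pending.sort(key=lambda c: c.created_at, reverse=True)  # fallback: newest first
    if traffic:
        # Stable sort: equal traffic keeps the newest-first order from above.
        pending.sort(key=lambda c: traffic.get(c.content_id, 0), reverse=True)
    return pending


def batches(items: Sequence, batch_size: int = 10) -> Iterator[list]:
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Each yielded batch would map to one Celery task invocation, which keeps a single failure from stalling the whole scan.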
---

## 3. DATA MODELS & APIS

### 3.1 New Models

#### SchemaTemplate (writer app)

```python
from django.db import models

# AccountBaseModel, CONTENT_TYPE_CHOICES and CONTENT_STRUCTURE_CHOICES come from
# the existing IGNY8 codebase (import paths not specified in this document).


class SchemaTemplate(AccountBaseModel):
    """
    Reusable JSON-LD schema templates with placeholder fields.
    Account-level: account admins can customize templates.
    """
    schema_type = models.CharField(
        max_length=30,
        choices=[
            ('article', 'Article / BlogPosting'),
            ('product', 'Product'),
            ('service', 'Service'),
            ('localbusiness', 'LocalBusiness'),
            ('organization', 'Organization'),
            ('breadcrumb', 'BreadcrumbList'),
            ('faq', 'FAQPage'),
            ('howto', 'HowTo'),
            ('video', 'VideoObject'),
            ('website', 'WebSite + SearchAction'),
        ]
    )
    content_type_match = models.CharField(
        max_length=20,
        choices=CONTENT_TYPE_CHOICES,
        help_text='Which content_type this template applies to'
    )
    content_structure_match = models.CharField(
        max_length=30,
        choices=CONTENT_STRUCTURE_CHOICES,
        blank=True,
        null=True,
        help_text='Further filter by content_structure (null = any)'
    )
    template_json = models.JSONField(
        help_text='JSON-LD template with {{placeholder}} fields'
    )
    required_fields = models.JSONField(
        default=list,
        help_text='List of required field paths for validation'
    )
    is_default = models.BooleanField(default=False)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_schema_templates'
        # NOTE: SQL unique constraints treat NULLs as distinct by default, so this
        # does not stop duplicate "any structure" rows (content_structure_match=NULL).
        # Enforce that case in application code or with a conditional UniqueConstraint.
        unique_together = [
            ('account', 'schema_type', 'content_type_match', 'content_structure_match')
        ]
```

**PK:** BigAutoField (integer), inherited from AccountBaseModel
**Relationships:** account FK (from AccountBaseModel)

#### SERPEnhancement (writer app)

```python
class SERPEnhancement(SiteSectorBaseModel):
    """
    Tracks individual SERP enhancement elements generated for content.
    One record per enhancement type per content.
    """
    ENHANCEMENT_TYPE_CHOICES = [
        ('tldr', 'TL;DR Box'),
        ('toc', 'Table of Contents'),
        ('key_takeaways', 'Key Takeaways'),
        ('definition', 'Definition Box'),
        ('comparison_table', 'Comparison Table'),
        ('paa', 'People Also Ask'),
        ('stats_callout', 'Statistics Callout'),
        ('pro_con', 'Pro/Con Box'),
    ]

    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='serp_enhancement_records'
    )
    enhancement_type = models.CharField(max_length=20, choices=ENHANCEMENT_TYPE_CHOICES)
    html_snippet = models.TextField(
        help_text='Generated HTML block to inject into content_html'
    )
    insertion_point = models.CharField(
        max_length=30,
        help_text='Where in content: top, after_intro, before_h2_N, bottom'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('generated', 'Generated'),
            ('inserted', 'Inserted'),
            ('removed', 'Removed'),
        ],
        default='generated'
    )
    generated_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_serp_enhancements'
        unique_together = [('content', 'enhancement_type')]
```

**PK:** BigAutoField (integer), inherited from SiteSectorBaseModel
**Relationships:** content FK → Content; site, sector and account FKs (from SiteSectorBaseModel)

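The `insertion_point` values (`top`, `after_intro`, `before_h2_N`, `bottom`) have to resolve to a position in `content_html`. A minimal sketch, assuming `after_intro` means "after the first closing `</p>`", that `N` is 1-based, and that an out-of-range `before_h2_N` falls back to the end of the content; `resolve_insertion_index` and `insert_snippet` are hypothetical names, and the `igny8-serp-{type}` class pattern is extrapolated from the `igny8-serp-tldr` example in 3.7.

```python
import re

P_CLOSE_RE = re.compile(r"</p\s*>", re.IGNORECASE)
H2_OPEN_RE = re.compile(r"<h2\b", re.IGNORECASE)
BEFORE_H2_RE = re.compile(r"before_h2_(\d+)")


def resolve_insertion_index(html: str, point: str) -> int:
    """Map an insertion_point string to a character offset in content_html."""
    if point == "top":
        return 0
    if point == "bottom":
        return len(html)
    if point == "after_intro":
        m = P_CLOSE_RE.search(html)
        return m.end() if m else 0  # no paragraph at all: fall back to the top
    m = BEFORE_H2_RE.fullmatch(point)
    if m:
        h2s = list(H2_OPEN_RE.finditer(html))
        n = int(m.group(1))
        return h2s[n - 1].start() if 1 <= n <= len(h2s) else len(html)
    raise ValueError(f"unknown insertion_point: {point!r}")


def insert_snippet(html: str, point: str, element_type: str, snippet: str) -> str:
    """Wrap the snippet per the data-serp-element convention and splice it in."""
    idx = resolve_insertion_index(html, point)
    block = (f'<div data-serp-element="{element_type}" '
             f'class="igny8-serp-{element_type}">{snippet}</div>')
    return html[:idx] + block + html[idx:]
```

Resolving every point against the current HTML right before splicing (rather than caching offsets) avoids stale positions when several elements are injected one after another.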
#### SchemaValidationResult (writer app)

```python
class SchemaValidationResult(SiteSectorBaseModel):
    """
    Stores schema validation results per content per schema type.
    """
    content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='schema_validations'
    )
    schema_type = models.CharField(max_length=30)
    is_valid = models.BooleanField(default=False)
    errors = models.JSONField(default=list, help_text='List of validation error strings')
    warnings = models.JSONField(default=list, help_text='List of validation warning strings')
    validated_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        app_label = 'writer'
        db_table = 'igny8_schema_validation_results'
```

**PK:** BigAutoField (integer), inherited from SiteSectorBaseModel

### 3.2 Modified Models

#### Content (writer app): Add Field

```python
# Add to Content model:
serp_elements = models.JSONField(
    default=dict,
    blank=True,
    help_text='Tracks which SERP enhancements are active: {type: True/False}'
)
```

**Existing field used:** `Content.schema_markup` (JSONField), now populated systematically by this module instead of by ad-hoc AI output.

### 3.3 Migration

Single migration in the writer app:

```
igny8_core/migrations/XXXX_add_schema_serp_models.py
```

**Operations:**

1. `CreateModel('SchemaTemplate', ...)` with a unique_together constraint
2. `CreateModel('SERPEnhancement', ...)` with a unique_together constraint
3. `CreateModel('SchemaValidationResult', ...)`
4. `AddField('Content', 'serp_elements', JSONField(default=dict, blank=True))`

### 3.4 API Endpoints

All endpoints live under `/api/v1/writer/`, extending the existing writer app URL namespace.

#### Schema Generation

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/writer/schema/generate/` | Generate schema for a single content item. Body: `{content_id}`. Returns JSON-LD and updates `Content.schema_markup`. |
| POST | `/api/v1/writer/schema/validate/` | Validate existing schema against Google requirements. Body: `{content_id}`. Returns SchemaValidationResult records. |
| POST | `/api/v1/writer/schema/batch-generate/` | Batch-generate schema. Body: `{content_ids: [int], site_id}`. Queues a Celery task and returns its task ID. |
| GET | `/api/v1/writer/schema/templates/` | List SchemaTemplate records. Query params: `account_id`, `schema_type`, `content_type_match`. |
| GET | `/api/v1/writer/schema/audit/?site_id=X` | Schema coverage audit: returns counts of content with/without schema per type. |
| POST | `/api/v1/writer/schema/retroactive/` | Trigger a retroactive schema scan. Body: `{site_id, batch_size}`. Queues a Celery task. |

#### SERP Enhancement

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/writer/serp/enhance/` | Generate SERP elements for a single content item. Body: `{content_id, element_types: []}`. Returns SERPEnhancement records. |
| POST | `/api/v1/writer/serp/batch-enhance/` | Batch enhancement. Body: `{content_ids: [int], site_id}`. Queues a Celery task. |
| GET | `/api/v1/writer/serp/preview/{content_id}/` | Preview enhancements: returns the modified HTML without applying it. |
| POST | `/api/v1/writer/serp/apply/{content_id}/` | Apply enhancements: injects HTML into `Content.content_html` and updates `Content.serp_elements`. |
| POST | `/api/v1/writer/serp/remove/{content_id}/` | Remove specific SERP elements. Body: `{element_types: []}`. |

**Permissions:** All endpoints follow the existing `AccountModelViewSet` or `SiteSectorModelViewSet` permission patterns.

### 3.5 AI Functions

#### GenerateSchemaFunction (extends BaseAIFunction)

**Registry key:** `generate_schema`
**Location:** `igny8_core/ai/functions/generate_schema.py`

```python
class GenerateSchemaFunction(BaseAIFunction):
    """
    Generates JSON-LD structured data for content.
    Determines applicable schema types from content_type, content_structure,
    and HTML analysis. Produces schema-stacked output.
    """
    function_name = 'generate_schema'

    def validate(self, content_id, **kwargs):
        # Verify content exists and has content_html
        pass

    def prepare(self, content_id, **kwargs):
        # Load Content, determine applicable schema types
        # Load matching SchemaTemplate records
        # Extract structured_data from Content (from 02A)
        pass

    def build_prompt(self):
        # Include: content title, meta_description, content_html excerpt,
        # content_type, content_structure, structured_data,
        # schema template as example, required_fields list
        pass

    def parse_response(self, response):
        # Parse JSON-LD array from AI response
        # Validate against required_fields
        pass

    def save_output(self, parsed):
        # Save to Content.schema_markup
        # Create SchemaValidationResult records
        pass
```

**Input:** `content_id` (int)
**Output:** JSON-LD array saved to `Content.schema_markup`

#### GenerateSERPElementsFunction (extends BaseAIFunction)

**Registry key:** `generate_serp_elements`
**Location:** `igny8_core/ai/functions/generate_serp_elements.py`

```python
class GenerateSERPElementsFunction(BaseAIFunction):
    """
    Generates on-page SERP enhancement HTML for content.
    Uses content structure and applicability matrix to determine which elements
    to generate. Returns HTML snippets for each element.
    """
    function_name = 'generate_serp_elements'

    def validate(self, content_id, element_types=None, **kwargs):
        # Verify content exists
        # If element_types not specified, determine from applicability matrix
        pass

    def prepare(self, content_id, element_types=None, **kwargs):
        # Load Content, parse content_html for headings/stats/terms
        # Load cluster keywords for PAA generation
        pass

    def build_prompt(self):
        # Per element type, build specific sub-prompts:
        # - TL;DR: "Summarize in 2-3 sentences..."
        # - Key Takeaways: "Extract 3-5 main points..."
        # - PAA: "Generate 4-6 related questions..."
        # - Definitions: "Identify key terms and define..."
        # etc.
        pass

    def parse_response(self, response):
        # Parse per-element HTML snippets from AI response
        pass

    def save_output(self, parsed):
        # Create/update SERPEnhancement records per element
        pass
```

**Input:** `content_id` (int), optional `element_types` (list of strings)
**Output:** SERPEnhancement records created but not yet injected into `content_html`

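`GenerateSchemaFunction.parse_response` has to cope with model output that wraps the JSON in a Markdown code fence, or that returns a single object instead of an array. A hedged sketch using only `json` and `re`; `parse_jsonld_response` is a hypothetical helper, and its `@type` check is a minimal sanity test, not a substitute for the `required_fields` validation.

```python
import json
import re
from typing import Dict, List

# Optional Markdown code fence (three backticks, optionally tagged "json") around the payload.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_jsonld_response(text: str) -> List[Dict]:
    """Extract the JSON-LD array from raw model output.

    Accepts bare JSON or JSON inside a code fence; a single object is
    normalised to a one-item list so downstream stacking always gets a list.
    """
    match = FENCE_RE.search(text)
    raw = (match.group(1) if match else text).strip()
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    items = data if isinstance(data, list) else [data]
    if not all(isinstance(item, dict) and "@type" in item for item in items):
        raise ValueError("every JSON-LD item must be an object with an @type")
    return items
```

Normalising to a list here keeps `save_output` simple: `Content.schema_markup` always receives the stacked-array shape.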
### 3.6 Schema Validation Service

**Location:** `igny8_core/business/schema_validation.py`

```python
class SchemaValidationService:
    """
    Validates JSON-LD schema against Google Rich Results requirements.
    Not just schema.org compliance: checks Google-specific required fields.
    """

    # IGNY8 baseline. Google's own docs list some of these (e.g. the Article and
    # Organization properties) as recommended rather than required, so re-check
    # each list against current Search Central guidance.
    GOOGLE_REQUIRED_FIELDS = {
        'article': ['headline', 'datePublished', 'author', 'image', 'publisher'],
        'product': ['name', 'image', 'offers'],
        'service': ['name', 'description', 'provider'],
        'localbusiness': ['name', 'address'],
        'organization': ['name', 'url', 'logo'],
        'breadcrumb': ['itemListElement'],
        'faq': ['mainEntity'],
        'howto': ['name', 'step'],
        'video': ['name', 'description', 'thumbnailUrl', 'uploadDate'],
        'website': ['name', 'url', 'potentialAction'],
    }

    def validate(self, content_id):
        """
        Validate all schema_markup entries for a content record.
        Returns list of SchemaValidationResult records.
        """
        pass

    def _validate_single(self, schema_json, schema_type):
        """
        Validate a single schema entry against required fields.
        Returns (is_valid, errors[], warnings[]).
        """
        pass

    def auto_fix(self, content_id):
        """
        Attempt to fix common schema issues:
        - Missing dateModified → copy from updated_at
        - Missing image → use first image from Images model
        - Missing publisher → use site/account Organization schema
        """
        pass
```

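`_validate_single` and `auto_fix` can be sketched as pure functions over one schema dict. `REQUIRED` copies three entries from `GOOGLE_REQUIRED_FIELDS`; the `RECOMMENDED` map (which yields warnings rather than errors), the dotted-path syntax for `required_fields`, and the "a list resolves to its first item" rule are assumptions, as are all helper names. `auto_fix_article` applies the three documented fixes only where a value is missing and returns a new dict.

```python
from typing import Any, Dict, List, Optional, Tuple

# Three entries copied from SchemaValidationService.GOOGLE_REQUIRED_FIELDS.
REQUIRED: Dict[str, List[str]] = {
    "article": ["headline", "datePublished", "author", "image", "publisher"],
    "product": ["name", "image", "offers"],
    "faq": ["mainEntity"],
}
# Hypothetical recommended fields: missing ones become warnings, not errors.
RECOMMENDED: Dict[str, List[str]] = {
    "article": ["dateModified", "description"],
    "product": ["offers.priceCurrency"],
}

_MISSING = object()


def get_path(obj: Any, path: str) -> Any:
    """Resolve a dotted path like 'offers.price'; a list resolves to its first item."""
    for key in path.split("."):
        if isinstance(obj, list):
            obj = obj[0] if obj else _MISSING
        if not isinstance(obj, dict) or key not in obj:
            return _MISSING
        obj = obj[key]
    return obj


def is_empty(value: Any) -> bool:
    return value is _MISSING or value in (None, "", [], {})


def validate_single(schema: Dict, schema_type: str) -> Tuple[bool, List[str], List[str]]:
    """Return (is_valid, errors, warnings) for one schema entry."""
    errors = [f"missing required field: {p}"
              for p in REQUIRED.get(schema_type, []) if is_empty(get_path(schema, p))]
    warnings = [f"missing recommended field: {p}"
                for p in RECOMMENDED.get(schema_type, []) if is_empty(get_path(schema, p))]
    return (not errors, errors, warnings)


def auto_fix_article(schema: Dict, updated_at: Optional[str],
                     first_image_url: Optional[str], org: Optional[Dict]) -> Dict:
    """Fill dateModified, image and publisher only when they are missing."""
    fixed = dict(schema)  # shallow copy; the input dict is not mutated
    for field, fallback in (("dateModified", updated_at),
                            ("image", first_image_url),
                            ("publisher", org)):
        if is_empty(fixed.get(field, _MISSING)) and fallback:
            fixed[field] = fallback
    return fixed
```

Returning errors and warnings as plain strings maps directly onto the `errors`/`warnings` JSONFields of `SchemaValidationResult`.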
### 3.7 SERP Element Injection Service

**Location:** `igny8_core/business/serp_injection.py`

```python
class SERPInjectionService:
    """
    Injects SERP enhancement HTML snippets into content_html.
    Handles insertion point resolution and collision avoidance.
    """

    INSERTION_ORDER = [
        'tldr',              # After first paragraph
        'toc',               # After intro, before first H2
        'key_takeaways',     # After TL;DR or after intro
        'definition',        # Inline, after first use of term
        'comparison_table',  # Within body at relevant H2
        'stats_callout',     # Inline, wrapping existing stats
        'pro_con',           # Within body at relevant section
        'paa',               # Before conclusion or after last H2
    ]

    def inject(self, content_id):
        """
        Inject all 'generated' SERPEnhancement records into content_html.
        Updates Content.content_html and Content.serp_elements tracking field.
        Marks SERPEnhancement records as 'inserted'.
        """
        pass

    def remove(self, content_id, element_types):
        """
        Remove specified SERP elements from content_html.
        Each element is wrapped in <div data-serp-element="{type}"> for removal.
        """
        pass

    def preview(self, content_id):
        """
        Return modified content_html with enhancements WITHOUT saving.
        """
        pass
```

**SERP Element HTML Wrapping Convention:**

All injected elements are wrapped with a data attribute for identification:

```html
<div data-serp-element="tldr" class="igny8-serp-tldr">
  <!-- Generated TL;DR content -->
</div>
```

This allows reliable removal or replacement without corrupting surrounding content.

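Because every element sits in a `<div data-serp-element="...">` wrapper, `remove()` only needs to find the wrapper and its matching `</div>`, counting nested `<div>` tags so inner markup does not end the block early. This regex-based sketch assumes well-formed HTML with no `<div` text inside comments or scripts; `remove_serp_elements` is a hypothetical name, and a wrapper with no matching close tag is deliberately left untouched rather than truncating the content.

```python
import re
from typing import Iterable

DIV_TAG_RE = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)
SERP_OPEN_RE = re.compile(r'<div\b[^>]*\bdata-serp-element="([^"]+)"[^>]*>', re.IGNORECASE)


def remove_serp_elements(html: str, element_types: Iterable[str]) -> str:
    """Drop wrapped SERP blocks of the given types, honouring nested <div>s."""
    targets = set(element_types)
    out, pos = [], 0
    while True:
        m = SERP_OPEN_RE.search(html, pos)
        if not m:
            break
        if m.group(1) not in targets:
            # Keep this wrapper's opening tag and continue scanning after it.
            out.append(html[pos:m.end()])
            pos = m.end()
            continue
        depth = 1
        for tag in DIV_TAG_RE.finditer(html, m.end()):
            depth += -1 if tag.group(1) else 1
            if depth == 0:
                out.append(html[pos:m.start()])  # text before the wrapper
                pos = tag.end()                   # skip the whole wrapped block
                break
        else:
            # No matching </div>: leave the malformed block untouched.
            out.append(html[pos:m.end()])
            pos = m.end()
    out.append(html[pos:])
    return "".join(out)
```

`preview()` could reuse the same function on an in-memory copy of `content_html`, since it never writes to the model.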
---

## 4. IMPLEMENTATION STEPS

### Step 1: Migration & Models

1. Create `SchemaTemplate` model in writer app
2. Create `SERPEnhancement` model in writer app
3. Create `SchemaValidationResult` model in writer app
4. Add `serp_elements` JSONField to Content model
5. Run migration

### Step 2: Schema Templates Seed Data

Create default SchemaTemplate records covering all 10 schema types:

| schema_type | content_type_match | content_structure_match | is_default |
|------------|-------------------|------------------------|------------|
| `article` | `post` | `null` (any) | True |
| `product` | `product` | `null` | True |
| `product` | `post` | `product_page` | True |
| `service` | `page` | `service_page` | True |
| `localbusiness` | `page` | `null` | True |
| `organization` | `page` | `business_page` | True |
| `breadcrumb` | `post` | `null` | True |
| `breadcrumb` | `page` | `null` | True |
| `breadcrumb` | `product` | `null` | True |
| `faq` | `post` | `null` | True |
| `howto` | `post` | `null` | True |
| `video` | `post` | `null` | True |
| `website` | `page` | `null` | True |

Seed via a data migration or a `seed_schema_templates` management command.

### Step 3: AI Functions

1. Implement `GenerateSchemaFunction` in `igny8_core/ai/functions/generate_schema.py`
2. Implement `GenerateSERPElementsFunction` in `igny8_core/ai/functions/generate_serp_elements.py`
3. Register both in `igny8_core/ai/registry.py`

### Step 4: Services

1. Implement `SchemaValidationService` in `igny8_core/business/schema_validation.py`
2. Implement `SERPInjectionService` in `igny8_core/business/serp_injection.py`

### Step 5: Pipeline Integration

Integrate schema generation into the content pipeline after Stage 4 (content generation):

```python
# In the content generation pipeline (01E blueprint-aware pipeline),
# called after GenerateContentFunction completes.
from igny8_core.ai import registry  # igny8_core/ai/registry.py; assumed to expose get()
from igny8_core.business.serp_injection import SERPInjectionService


def post_content_generation(content_id):
    # Auto-generate schema based on content type
    generate_schema_fn = registry.get('generate_schema')
    generate_schema_fn.execute(content_id=content_id)

    # Auto-generate applicable SERP elements
    generate_serp_fn = registry.get('generate_serp_elements')
    generate_serp_fn.execute(content_id=content_id)

    # Inject SERP elements into content_html
    injection_service = SERPInjectionService()
    injection_service.inject(content_id)
```

### Step 6: API Endpoints

1. Add schema and SERP endpoints to `igny8_core/urls/writer.py`
2. Create `SchemaGenerateView`, `SchemaValidateView`, `SchemaBatchGenerateView`
3. Create `SERPEnhanceView`, `SERPBatchEnhanceView`, `SERPPreviewView`, `SERPApplyView`, `SERPRemoveView`
4. Create `SchemaAuditView`, `SchemaRetroactiveView`, plus a list view for `/schema/templates/`

### Step 7: Celery Tasks

Register in `igny8_core/tasks/` and add beat schedule entries:

```python
# igny8_core/tasks/schema_tasks.py
from celery import shared_task


@shared_task(name='generate_schema_for_content')
def generate_schema_for_content(content_id):
    """After content generation, auto-generate schema."""
    pass


@shared_task(name='retroactive_schema_scan')
def retroactive_schema_scan(site_id, batch_size=10):
    """Scan existing content and generate missing schemas in batches."""
    pass


@shared_task(name='validate_schemas_batch')
def validate_schemas_batch(site_id):
    """Periodic validation of all schemas for a site."""
    pass
```

**Beat Schedule Additions:**

| Task | Schedule | Notes |
|------|----------|-------|
| `validate_schemas_batch` | Weekly (Sunday 3:00 AM) | Validates all schemas and creates SchemaValidationResult records |

### Step 8: Serializers & Admin

1. Create DRF serializers for SchemaTemplate, SERPEnhancement, SchemaValidationResult
2. Register the models in Django admin for inspection

### Step 9: Credit Cost Configuration

Add to `CreditCostConfig` (billing app):

| operation_type | default_cost | description |
|---------------|-------------|-------------|
| `schema_generation` | 1 | Generate JSON-LD schema for one content item |
| `serp_element_generation` | 0.5 | Generate one SERP element |
| `schema_validation` | 0.1 | Validate schema for one content item |
| `schema_batch` | 8-12 | Batch-generate for 10 items (varies by content) |

Credit deduction follows the existing `CreditUsageLog` pattern: one log entry per operation, with `operation_type`, `credits_used`, and a `content` FK.

---

## 5. ACCEPTANCE CRITERIA

### Schema Generation

- [ ] Article/BlogPosting schema generated for all `content_type='post'` content
- [ ] Product schema generated for `content_type='product'` and `content_structure='product_page'` content
- [ ] Service schema generated for `content_structure='service_page'` content
- [ ] BreadcrumbList schema generated for all content using the SAG hierarchy
- [ ] FAQPage schema auto-detected and generated when content has question-pattern headings
- [ ] HowTo schema auto-detected and generated when content has step-by-step lists
- [ ] Schema stacking works: content with FAQ + Article gets both schemas in the array
- [ ] All schemas pass SchemaValidationService checks for Google required fields

### SERP Enhancement

- [ ] TL;DR box generated and injected for applicable content structures
- [ ] Table of Contents auto-generated from H2/H3 headings with working anchor links
- [ ] Key Takeaways bullet list generated for applicable content
- [ ] People Also Ask section generated with 4-6 questions and answers
- [ ] Comparison Tables generated for comparison/review content
- [ ] Pro/Con boxes generated for review/comparison/product_page content
- [ ] All SERP elements wrapped in `<div data-serp-element="{type}">` for reliable removal
- [ ] SERP elements can be removed without corrupting content
- [ ] Applicability matrix enforced (e.g. no TL;DR on `landing_page`)

### Retroactive Enhancement

- [ ] Retroactive scan identifies content missing schema, by type
- [ ] Priority ordering by traffic (GSC data) or creation date
- [ ] Preview mode shows changes without modifying Content
- [ ] Batch processing handles 10 items per task run (default batch size)
- [ ] Applied enhancements update `Content.schema_markup` and `Content.serp_elements`

### Validation

- [ ] SchemaValidationResult records created for each validation run
- [ ] Validation checks Google-specific required fields (not just schema.org)
- [ ] Auto-fix resolves common issues (missing dateModified, image, publisher)
- [ ] Weekly batch validation catches schema drift

### Integration

- [ ] Schema generation triggers automatically after content generation in the pipeline
- [ ] SERP elements generated and injected as part of the pipeline flow
- [ ] Credit costs deducted per CreditCostConfig entries
- [ ] All API endpoints respect account/site permission boundaries

---

## 6. CLAUDE CODE INSTRUCTIONS

### File Locations

```
igny8_core/
├── ai/
│   └── functions/
│       ├── generate_schema.py          # GenerateSchemaFunction
│       └── generate_serp_elements.py   # GenerateSERPElementsFunction
├── business/
│   ├── schema_validation.py            # SchemaValidationService
│   └── serp_injection.py               # SERPInjectionService
├── tasks/
│   └── schema_tasks.py                 # Celery tasks
├── urls/
│   └── writer.py                       # Add schema + serp endpoints
└── migrations/
    └── XXXX_add_schema_serp_models.py  # Models + Content.serp_elements
```

### Conventions

- **PKs:** BigAutoField (integer); do NOT use UUIDs
- **Table prefix:** `igny8_` on all new tables
- **Celery app name:** `igny8_core`
- **URL pattern:** `/api/v1/writer/schema/...` and `/api/v1/writer/serp/...`
- **Permissions:** Use `AccountModelViewSet` / `SiteSectorModelViewSet` patterns
- **AI functions:** Extend `BaseAIFunction` with `validate()`, `prepare()`, `build_prompt()`, `parse_response()`, `save_output()`
- **Registry:** Register new AI functions in `igny8_core/ai/registry.py`
- **Frontend:** `.tsx` files with Zustand stores for state management

### Cross-References

| Doc | Relationship |
|-----|-------------|
| **02A** | Content type determines which schema type to generate; ContentTypeTemplate section layouts inform schema field population |
| **02F** | Optimizer detects schema gaps and triggers schema generation/fixes |
| **02I** | VideoObject schema generated for content with a linked VideoProject |
| **03A** | WP plugin standalone mode has its own schema module, separate from this IGNY8-native implementation |
| **03B** | Connected mode pushes schema to WordPress via the bulk endpoint |
| **01E** | Pipeline integration: schema generation hooks in after Stage 4 content generation |
| **01G** | SAG health monitoring can incorporate schema completeness as a health factor |

### Key Decisions

1. **Writer app, not a separate app:** SchemaTemplate, SERPEnhancement and SchemaValidationResult all live in the `writer` app because they are tightly coupled to Content
2. **Schema stacking:** multiple schemas per content item are stored as a JSON array in `Content.schema_markup`
3. **SERP element wrapping:** all injected HTML uses the `data-serp-element` attribute for non-destructive add/remove
4. **Preview before apply:** retroactive enhancements always go through a preview state
5. **`Content.serp_elements` tracking field:** a JSONField dict `{type: True/False}` for fast lookups without querying the SERPEnhancement table