diff --git a/v2/V2-Execution-Docs/02A-content-types-extension.md b/v2/V2-Execution-Docs/02A-content-types-extension.md new file mode 100644 index 00000000..04582d00 --- /dev/null +++ b/v2/V2-Execution-Docs/02A-content-types-extension.md @@ -0,0 +1,552 @@ +# IGNY8 Phase 2: Content Types Extension (02A) +## Type-Specific Content Generation Pipeline + +**Document Version:** 1.0 +**Date:** 2026-03-23 +**Phase:** IGNY8 Phase 2 — Feature Expansion +**Status:** Build Ready +**Source of Truth:** Codebase at `/data/app/igny8/` +**Audience:** Claude Code, Backend Developers, Architects + +--- + +## 1. CURRENT STATE + +### Content Model Today +The `Content` model (app_label=`writer`, db_table=`igny8_content`) already supports multiple content types via two fields: + +```python +CONTENT_TYPE_CHOICES = [ + ('post', 'Post'), + ('page', 'Page'), + ('product', 'Product'), + ('taxonomy', 'Taxonomy'), +] + +CONTENT_STRUCTURE_CHOICES = [ + ('article', 'Article'), + ('guide', 'Guide'), + ('comparison', 'Comparison'), + ('review', 'Review'), + ('listicle', 'Listicle'), + ('landing_page', 'Landing Page'), + ('business_page', 'Business Page'), + ('service_page', 'Service Page'), + ('general', 'General'), + ('cluster_hub', 'Cluster Hub'), + ('product_page', 'Product Page'), + ('category_archive', 'Category Archive'), + ('tag_archive', 'Tag Archive'), + ('attribute_archive', 'Attribute Archive'), +] +``` + +These same choices exist on `Tasks` (db_table=`igny8_tasks`) and `ContentIdeas` (db_table=`igny8_content_ideas`). 
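The same choice lists are duplicated on `Content`, `Tasks`, and `ContentIdeas`. A value added to one copy, such as a new structure, can therefore be silently missing from another. The following is a minimal sketch of a drift check. It assumes the lists are plain Django-style `(value, label)` tuples. The function names are hypothetical and are not existing IGNY8 code.

```python
from typing import Dict, List, Set, Tuple

# A Django-style choices list: [(stored_value, human_label), ...]
Choices = List[Tuple[str, str]]


def choice_keys(choices: Choices) -> Set[str]:
    """Return the stored values (first tuple element) of a choices list."""
    return {value for value, _label in choices}


def compare_choice_lists(
    reference: Choices, others: Dict[str, Choices]
) -> Dict[str, Dict[str, Set[str]]]:
    """Compare each named choices list against a reference list.

    Returns only the lists that differ, mapping model name to the keys
    missing from it and the keys it has beyond the reference.
    """
    ref = choice_keys(reference)
    report: Dict[str, Dict[str, Set[str]]] = {}
    for model_name, choices in others.items():
        keys = choice_keys(choices)
        missing, extra = ref - keys, keys - ref
        if missing or extra:
            report[model_name] = {"missing": missing, "extra": extra}
    return report


if __name__ == "__main__":
    # Hypothetical example data; in a Django shell the lists could instead be read via
    # Content._meta.get_field("content_structure").choices (likewise for Tasks, ContentIdeas).
    content = [("article", "Article"), ("brand_page", "Brand Page")]
    tasks = [("article", "Article")]
    print(compare_choice_lists(content, {"Tasks": tasks, "ContentIdeas": content}))
```

In this example, the `Tasks` list lacks one value, so the check reports `{'Tasks': {'missing': {'brand_page'}, 'extra': set()}}`. Lists that match the reference are omitted from the report.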
+ +### What Works Now +- The `generate_content` AI function in `igny8_core/ai/functions/generate_content.py` produces blog-style articles regardless of the `content_type` field +- Only `content_type='post'` with `content_structure='article'` is actively used by the automation pipeline +- Pipeline Stage 4 (Tasks → Content) does not route to type-specific prompts +- No type-specific section layouts, presets, or schema generation exist + +### Phase 1 Foundation Available +- `SAGCluster.cluster_type` choices: `product_category`, `condition_problem`, `feature`, `brand`, `informational`, `comparison` +- `SAGCluster.hub_page_type` (default `cluster_hub`) and `hub_page_structure` (guide_tutorial, product_comparison, category_overview, problem_solution, resource_library) +- 01E blueprint-aware pipeline provides `blueprint_context` to each stage with `cluster_type`, `content_structure`, and `content_type` fields +- 01E defines 12 content type → template key mappings (e.g., `sag_hub_guide`, `sag_blog_comparison`, `sag_product_page`) + +### Gap +The template keys from 01E (`sag_hub_guide`, `sag_blog_comparison`, etc.) route to LLM prompt templates — but those templates don't exist yet. The actual type-specific prompt logic, section layouts, field schemas, and generation presets are what this doc delivers. + +--- + +## 2. WHAT TO BUILD + +### Overview +Extend the content generation pipeline to produce structurally different output for 6 content type categories. 
Each type gets: +- **Section layout templates** — defining the structure of the generated content (sections, order, constraints) +- **Type-specific AI prompts** — prompt templates tailored to the content type's purpose +- **Generation presets** — default word counts, image counts, schema types, tone +- **Structured data fields** — type-specific data (product specs, service steps, comparison items) stored in a JSONField + +### Content Type Extensions + +**Type 1: Pages** (content_type=`page`) +| Structure | Purpose | Section Layout | +|-----------|---------|----------------| +| `landing_page` | Conversion-focused landing page | Hero → Features → Benefits → Social Proof → CTA | +| `business_page` | About/company page | Company Intro → History → Values → Team → CTA | +| `service_page` | Service offering page | Problem → Solution → Process → Outcomes → Pricing → FAQ → CTA | +| `general` | Generic page | Intro → Body Sections → CTA | +| `cluster_hub` | Cluster hub/pillar page | Overview → Subtopic Grid → Detailed Guides → Internal Links → FAQ | + +- **AI prompt tone:** Professional, conversion-focused, concise, benefit-driven +- **Schema:** WebPage (default), AboutPage, ContactPage as appropriate +- **Default word count:** 1,000–3,000 depending on structure + +**Type 2: Products** (content_type=`product`) +| Structure | Purpose | Section Layout | +|-----------|---------|----------------| +| `product_page` | Single product review/description | Overview → Features → Specifications → Pros/Cons → Verdict | +| `comparison` | A vs B product comparison | Introduction → Feature Matrix → Category Breakdown → Verdict | +| `listicle` | Top products roundup | Introduction → Product Cards (ranked) → Comparison Table → Verdict | + +- **Structured data fields:** `price_range` (JSON), `features` (JSON array), `specifications` (JSON), `pros_cons` (JSON {pros: [], cons: []}) +- **AI prompt tone:** Feature-benefit mapping, objective analysis, buyer-persona aware +- **Schema:** Product 
with offers, aggregateRating, review +- **Image presets:** Product hero image, feature highlight visuals, comparison table graphic +- **Default word count:** 1,500–4,000 + +**Type 3: Services** (content_type=`page`, content_structure=`service_page`) +| Structure | Purpose | Section Layout | +|-----------|---------|----------------| +| `service_page` | Core service offering page | Problem → Solution → Process Steps → Outcomes → Pricing Tiers → FAQ → CTA | +| `landing_page` | Service-specific landing (area variant) | Hero → Service Intro → Benefits → Testimonials → Area Info → CTA | + +- **Structured data fields:** `process_steps` (JSON array), `outcomes` (JSON array), `pricing_tiers` (JSON), `areas_served` (JSON array), `faqs` (JSON array of {question, answer}) +- **AI prompt tone:** Problem-solution, trust-building, process explanation, CTA-heavy +- **Schema:** Service, ProfessionalService, LocalBusiness+hasOfferCatalog +- **Geographic targeting:** Generate area-specific variations from a base service page +- **Default word count:** 1,500–3,500 + +**Type 4: Company Pages** (content_type=`page`, content_structure=`business_page`) +| Structure | Purpose | +|-----------|---------| +| `business_page` | About company, team, careers, press pages | + +- **AI prompt tone:** Brand voice emphasis, story-driven, credibility markers +- **Schema:** Organization, AboutPage +- **Default word count:** 800–2,000 + +**Type 5: Comparison Pages** (content_type=`post`, content_structure=`comparison`) +| Structure | Purpose | Section Layout | +|-----------|---------|----------------| +| `comparison` | A vs B analysis | Introduction → Feature Matrix → Category-by-Category → Winner → FAQ | +| `listicle` | Top N alternatives/roundup | Introduction → Comparison Table → Individual Reviews → Verdict | + +- **Structured data fields:** `comparison_items` (JSON array of {name, features, pros, cons, rating, verdict}) +- **AI prompt tone:** Objective analysis, data-driven, comparison tables, 
winner selection with reasoning +- **Schema:** Article with itemListElement (for multi-comparison) +- **Default word count:** 2,000–5,000 + +**Type 6: Brand Pages** (content_type=`page`, content_structure=`brand_page`) +| Structure | Purpose | +|-----------|---------| +| `brand_page` | Brand overview, review, or alternative recommendation page | + +- **AI prompt tone:** Brand-focused, factual, company background included +- **Schema:** Organization (for overview), Article (for review) +- **Default word count:** 1,000–3,000 + +### Blueprint-to-Type Mapping +When the pipeline executes with SAG context (01E), the `SAGCluster.cluster_type` informs which content types to generate: + +| SAGCluster.cluster_type | Primary content_type | Primary content_structure | +|-------------------------|---------------------|--------------------------| +| `informational` | post | article, guide | +| `comparison` | post | comparison, listicle | +| `product_category` | product | product_page, listicle | +| `feature` | page | landing_page, service_page | +| `brand` | page | brand_page | +| `condition_problem` | post | guide, article | + +Hub pages for any cluster type use `content_type=page`, `content_structure=cluster_hub`. + +--- + +## 3. DATA MODELS & APIs + +### New Choices (add to existing choice lists) + +```python +# Add to CONTENT_STRUCTURE_CHOICES on Content, Tasks, ContentIdeas +('brand_page', 'Brand Page'), +``` + +Note: Most structures already exist in the codebase. Only `brand_page` is new. + +### Modified Models + +**Content** (db_table=`igny8_content`) — add fields: +```python +sections = models.JSONField( + default=list, blank=True, + help_text="Ordered section data for structured content types" +) +# Structure: [{"type": "hero", "heading": "...", "body": "...", "cta": "..."}, ...] 
+ +structured_data = models.JSONField( + default=dict, blank=True, + help_text="Type-specific data: product specs, service steps, comparison items" +) +# Structure varies by content_type — see type definitions above +``` + +**Tasks** (db_table=`igny8_tasks`) — add fields: +```python +structure_template = models.JSONField( + default=dict, blank=True, + help_text="Section layout template for content generation" +) +# Structure: {"sections": [{"type": "hero", "required": true, "max_words": 200}, ...]} + +type_presets = models.JSONField( + default=dict, blank=True, + help_text="Type-specific generation parameters" +) +# Structure: {"tone": "professional", "schema_type": "Product", "image_count": 3, ...} +``` + +### New Model + +```python +class ContentTypeTemplate(AccountBaseModel): + """ + Defines section layout and AI prompt templates per content_type + content_structure combination. + System-provided defaults (is_system=True) plus per-account custom templates. + """ + content_type = models.CharField(max_length=50, choices=CONTENT_TYPE_CHOICES, db_index=True) + content_structure = models.CharField(max_length=50, choices=CONTENT_STRUCTURE_CHOICES, db_index=True) + template_name = models.CharField(max_length=200) + section_layout = models.JSONField( + default=list, + help_text="Ordered sections: [{type, label, required, max_words, guidance}]" + ) + ai_prompt_template = models.TextField( + help_text="Base AI prompt template for this type. Supports {variables}." 
+ ) + default_schema_type = models.CharField(max_length=100, blank=True, default='') + default_word_count_min = models.IntegerField(default=1000) + default_word_count_max = models.IntegerField(default=3000) + default_image_count = models.IntegerField(default=2) + tone = models.CharField(max_length=100, default='professional') + # System templates are stored with account=None, so the inherited account FK must allow NULL on this model + is_system = models.BooleanField(default=False, help_text="System-provided template (not editable by users)") + is_active = models.BooleanField(default=True) + + class Meta: + app_label = 'writer' + db_table = 'igny8_content_type_templates' + # Most databases (PostgreSQL included) treat NULLs as distinct in unique constraints, so this does not + # deduplicate system rows (account=NULL); the seed command must be idempotent (e.g. update_or_create) + unique_together = [['account', 'content_type', 'content_structure', 'template_name']] + ordering = ['content_type', 'content_structure'] +``` + +### Migration + +``` +igny8_core/migrations/XXXX_content_types_extension.py +``` + +Fields added: +1. `Content.sections` — JSONField, default=list +2. `Content.structured_data` — JSONField, default=dict +3. `Tasks.structure_template` — JSONField, default=dict +4. `Tasks.type_presets` — JSONField, default=dict +5. `ContentTypeTemplate` new table +6.
Add `('brand_page', 'Brand Page')` to CONTENT_STRUCTURE_CHOICES + +### API Endpoints + +**Content Type Templates:** +``` +GET /api/v1/writer/content-type-templates/ # List templates (filtered by content_type, content_structure) +POST /api/v1/writer/content-type-templates/ # Create custom template +GET /api/v1/writer/content-type-templates/{id}/ # Template detail +PUT /api/v1/writer/content-type-templates/{id}/ # Update custom template +DELETE /api/v1/writer/content-type-templates/{id}/ # Delete custom template (system templates cannot be deleted) +GET /api/v1/writer/content-type-templates/{id}/preview/ # Preview: generate sample section layout +``` + +**Modified Endpoints:** +``` +POST /api/v1/writer/tasks/ # Extend: accepts structure_template + type_presets +POST /api/v1/writer/generate/ # Extend: routes to type-specific AI prompt +GET /api/v1/writer/content/{id}/ # Response now includes sections + structured_data +``` + +**ViewSet:** +```python +# igny8_core/modules/writer/views/content_type_template_views.py +from django.db.models import Q +from rest_framework.decorators import action +from rest_framework.exceptions import PermissionDenied +from rest_framework.response import Response + +class ContentTypeTemplateViewSet(AccountModelViewSet): + serializer_class = ContentTypeTemplateSerializer # is_system should be read-only here so clients cannot create system templates + queryset = ContentTypeTemplate.objects.all() + filterset_fields = ['content_type', 'content_structure', 'is_system', 'is_active'] + + def get_queryset(self): + # Return system templates + account's custom templates + return ContentTypeTemplate.objects.filter( + Q(is_system=True) | Q(account=self.request.account) + ) + + def perform_update(self, serializer): + # System templates are visible to every account but editable by none (covers PUT and PATCH) + if serializer.instance.is_system: + raise PermissionDenied('System templates cannot be modified.') + super().perform_update(serializer) + + def perform_destroy(self, instance): + if instance.is_system: + raise PermissionDenied('System templates cannot be deleted.') + super().perform_destroy(instance) + + @action(detail=True, methods=['get']) + def preview(self, request, pk=None): + template = self.get_object() + # Return rendered section layout with placeholder content + return Response({'sections': template.section_layout}) +``` + +**URL Registration:** +```python +# igny8_core/modules/writer/urls.py — add to existing router +router.register('content-type-templates', ContentTypeTemplateViewSet, basename='content-type-template') +``` + +### AI Function Extension + +Extend `GenerateContentFunction` in
`igny8_core/ai/functions/generate_content.py`: + +```python +# Imports this extension relies on (merge with the module's existing imports) +import json +from typing import Any, Dict + +from django.db.models import Q + +class GenerateContentFunction(BaseAIFunction): + def prepare(self, payload: dict, account=None) -> Any: + tasks = super().prepare(payload, account) + for task in tasks: + # Always resolve the prompt template; Tasks.structure_template only overrides its section layout + task._template = ContentTypeTemplate.objects.filter( + Q(account=account) | Q(is_system=True), + content_type=task.content_type, + content_structure=task.content_structure, + is_active=True + ).order_by('is_system').first() # False sorts first, so account-specific wins over system; None if no match + return tasks + + def build_prompt(self, data: Any, account=None) -> str: + task = data # Single task (batch_size=1) + template = getattr(task, '_template', None) + + if template: + # A task-level layout takes precedence over the template's default layout + sections = (task.structure_template or {}).get('sections') or template.section_layout + # str.format() treats every {...} in the stored prompt as a placeholder, so literal braces must be written {{ }} + prompt = template.ai_prompt_template.format( + title=task.title, + keywords=task.keywords or '', + word_count=task.word_count, + content_type=task.content_type, + content_structure=task.content_structure, + sections=json.dumps(sections), + schema_type=template.default_schema_type, + tone=template.tone, + ) + else: + # Fallback to existing blog-style prompt + prompt = self._build_default_prompt(task) + + # Inject blueprint context if available (from 01E) + blueprint_context = getattr(task, 'blueprint_context', None) + if blueprint_context: + prompt += f"\n\nCluster Context: {json.dumps(blueprint_context)}" + + return prompt + + def parse_response(self, response: str, step_tracker=None) -> Any: + parsed = super().parse_response(response, step_tracker) + # Extract sections array and structured_data if present in AI response + if isinstance(parsed, dict): + parsed.setdefault('sections', []) + parsed.setdefault('structured_data', {}) + return parsed + + def save_output(self, parsed, original_data, account=None, **kwargs) -> Dict: + result = super().save_output(parsed, original_data, account, **kwargs) + # Persist sections and structured_data on Content + if
'content_id' in result: + Content.objects.filter(id=result['content_id']).update( + sections=parsed.get('sections', []), + structured_data=parsed.get('structured_data', {}), + ) + return result +``` + +### System Template Seed Data + +Create a management command to seed default templates: + +```python +# igny8_core/management/commands/seed_content_type_templates.py +``` + +Seed templates (is_system=True, account=None): + +| content_type | content_structure | template_name | default_schema_type | word_count_range | +|---|---|---|---|---| +| post | article | Blog Article | Article | 1000–2500 | +| post | guide | Comprehensive Guide | Article | 2000–4000 | +| post | comparison | Comparison Article | Article | 2000–5000 | +| post | review | Product Review | Review | 1500–3000 | +| post | listicle | Listicle | Article | 1500–3500 | +| page | landing_page | Landing Page | WebPage | 1000–2500 | +| page | business_page | Business Page | AboutPage | 800–2000 | +| page | service_page | Service Page | Service | 1500–3500 | +| page | general | General Page | WebPage | 500–2000 | +| page | cluster_hub | Cluster Hub Page | CollectionPage | 2000–5000 | +| page | brand_page | Brand Page | Organization | 1000–3000 | +| product | product_page | Product Page | Product | 1500–4000 | +| taxonomy | category_archive | Category Archive | CollectionPage | 500–1500 | +| taxonomy | tag_archive | Tag Archive | CollectionPage | 500–1500 | +| taxonomy | attribute_archive | Attribute Archive | CollectionPage | 500–1500 | + +### Credit Costs + +No change to existing credit costs. Content generation still costs 4–8 credits per item via `CreditCostConfig(operation_type='content_generation')`; type routing changes the prompt structure, not the pricing. + +--- + +## 4.
IMPLEMENTATION STEPS + +### Step 1: Add New Fields to Existing Models +```bash +# Add fields to Content, Tasks models +# Add 'brand_page' to CONTENT_STRUCTURE_CHOICES +``` + +Files to modify: +- `backend/igny8_core/business/content/models.py` — add `sections`, `structured_data` to Content; add `structure_template`, `type_presets` to Tasks; add `brand_page` to CONTENT_STRUCTURE_CHOICES +- `backend/igny8_core/business/planning/models.py` — add `brand_page` to ContentIdeas CONTENT_STRUCTURE_CHOICES + +### Step 2: Create ContentTypeTemplate Model +```bash +# Create new model in writer app +``` + +File to create: +- `backend/igny8_core/business/content/content_type_template.py` (or add to existing `models.py`) + +### Step 3: Create and Run Migration +```bash +cd /data/app/igny8/backend +python manage.py makemigrations --name content_types_extension +python manage.py migrate +``` + +### Step 4: Create Serializers +Files to create/modify: +- `backend/igny8_core/modules/writer/serializers/content_type_template_serializer.py` +- Modify existing content serializer to include `sections` and `structured_data` +- Modify existing task serializer to include `structure_template` and `type_presets` + +### Step 5: Create ViewSet and URLs +Files to create: +- `backend/igny8_core/modules/writer/views/content_type_template_views.py` +- Modify `backend/igny8_core/modules/writer/urls.py` — register new ViewSet + +### Step 6: Extend GenerateContentFunction +File to modify: +- `backend/igny8_core/ai/functions/generate_content.py` — add type routing logic + +### Step 7: Create System Template Seed Command +File to create: +- `backend/igny8_core/management/commands/seed_content_type_templates.py` + +```bash +python manage.py seed_content_type_templates +``` + +### Step 8: Create Type-Specific Prompt Templates +Files to create in `backend/igny8_core/ai/prompts/`: +- `page_prompts.py` — landing_page, business_page, service_page, general, cluster_hub +- `product_prompts.py` — product_page, 
product comparison, product roundup +- `comparison_prompts.py` — versus, multi-comparison, alternatives +- `brand_prompts.py` — brand overview, brand review + +### Step 9: Frontend Updates +Files to create/modify in `frontend/src/`: +- `pages/Writer/ContentTypeTemplates.tsx` — template management page +- `stores/contentTypeTemplateStore.ts` — Zustand store +- `api/contentTypeTemplates.ts` — API client +- Modify task creation form to show type-specific template selection +- Modify content viewer to render section-based content + +### Step 10: Tests +```bash +cd /data/app/igny8/backend +python manage.py test igny8_core.business.content.tests.test_content_type_templates +python manage.py test igny8_core.ai.tests.test_generate_content_types +``` + +--- + +## 5. ACCEPTANCE CRITERIA + +- [ ] `Content.sections` and `Content.structured_data` fields exist and migrate successfully +- [ ] `Tasks.structure_template` and `Tasks.type_presets` fields exist and migrate successfully +- [ ] `ContentTypeTemplate` model created with all fields, `igny8_content_type_templates` table exists +- [ ] `brand_page` added to CONTENT_STRUCTURE_CHOICES on Content, Tasks, ContentIdeas +- [ ] 15 system templates seeded via management command +- [ ] `GET /api/v1/writer/content-type-templates/` returns system + account templates +- [ ] `POST /api/v1/writer/content-type-templates/` creates custom template (is_system=False) +- [ ] System templates cannot be modified or deleted by account users +- [ ] `GenerateContentFunction` routes to type-specific prompt when template exists +- [ ] `GenerateContentFunction` falls back to existing blog-style prompt when no template found +- [ ] Content generated with type template populates `sections` and `structured_data` fields +- [ ] Blueprint context from 01E is injected into prompts when SAG data available +- [ ] Frontend template management page allows CRUD on custom templates +- [ ] Task creation form shows template selection filtered by content_type + 
content_structure +- [ ] All new API endpoints require authentication and enforce account isolation +- [ ] Existing `content_type='post'` + `content_structure='article'` generation works unchanged (backward compatible) + +--- + +## 6. CLAUDE CODE INSTRUCTIONS + +### Execution Order +1. Read `backend/igny8_core/business/content/models.py` — understand existing Content and Tasks models +2. Read `backend/igny8_core/ai/functions/generate_content.py` — understand current generation logic +3. Read `backend/igny8_core/ai/base.py` and `backend/igny8_core/ai/registry.py` — understand base pattern +4. Add new fields + ContentTypeTemplate model +5. Create migration, run `makemigrations` + `migrate` +6. Build serializers, ViewSet, URLs +7. Extend GenerateContentFunction with type routing +8. Create seed command and run it +9. Build prompt templates per type +10. Build frontend components + +### Key Constraints +- ALL primary keys are `BigAutoField` (integer). No UUIDs anywhere. +- Model class names are PLURAL where applicable: `Tasks`, `ContentIdeas`, `Clusters`, `Keywords`, `Images`. `Content` stays singular. 
+- Frontend files use `.tsx` extension, Zustand for state management, Vitest for testing +- Celery app name is `igny8_core` +- All new tables use `igny8_` prefix +- Follow existing ViewSet pattern: `AccountModelViewSet` for account-scoped resources +- Follow existing serializer pattern: `ModelSerializer` with explicit `fields` +- Follow existing URL pattern: register on `DefaultRouter` in `igny8_core/modules/writer/urls.py` + +### File Tree (New/Modified) +``` +backend/igny8_core/ +├── business/content/ +│ ├── models.py # MODIFY: add sections, structured_data, structure_template, type_presets, brand_page choice +│ └── content_type_template.py # NEW: ContentTypeTemplate model (or add to models.py) +├── business/planning/ +│ └── models.py # MODIFY: add brand_page to ContentIdeas CONTENT_STRUCTURE_CHOICES +├── ai/functions/ +│ └── generate_content.py # MODIFY: type routing + template-aware prompt building +├── ai/prompts/ +│ ├── page_prompts.py # NEW: landing page, business page, service page prompts +│ ├── product_prompts.py # NEW: product page, comparison, roundup prompts +│ ├── comparison_prompts.py # NEW: versus, multi-comparison prompts +│ └── brand_prompts.py # NEW: brand overview/review prompts +├── management/commands/ +│ └── seed_content_type_templates.py # NEW: seed system templates +├── modules/writer/ +│ ├── serializers/ +│ │ └── content_type_template_serializer.py # NEW +│ ├── views/ +│ │ └── content_type_template_views.py # NEW +│ └── urls.py # MODIFY: register new ViewSet +└── migrations/ + └── XXXX_content_types_extension.py # NEW: auto-generated + +frontend/src/ +├── pages/Writer/ +│ └── ContentTypeTemplates.tsx # NEW: template management +├── stores/ +│ └── contentTypeTemplateStore.ts # NEW: Zustand store +└── api/ + └── contentTypeTemplates.ts # NEW: API client +``` + +### Cross-References +- **01E** (blueprint-aware pipeline): blueprint_context injection, cluster_type → content_type mapping +- **01A** (SAG data foundation):
SAGCluster.cluster_type, hub_page_type, hub_page_structure +- **02B** (taxonomy term content): uses content_type=taxonomy with ContentTypeTemplate +- **02G** (rich schema): schema_type from ContentTypeTemplate.default_schema_type +- **03B** (WP plugin connected): content sync maps content_type to WordPress post types diff --git a/v2/V2-Execution-Docs/02B-taxonomy-term-content.md b/v2/V2-Execution-Docs/02B-taxonomy-term-content.md new file mode 100644 index 00000000..09bac70c --- /dev/null +++ b/v2/V2-Execution-Docs/02B-taxonomy-term-content.md @@ -0,0 +1,642 @@ +# IGNY8 Phase 2: Taxonomy Term Content (02B) +## Rich Content Generation for Taxonomy Terms + +**Document Version:** 1.0 +**Date:** 2026-03-23 +**Phase:** IGNY8 Phase 2 — Feature Expansion +**Status:** Build Ready +**Source of Truth:** Codebase at `/data/app/igny8/` +**Audience:** Claude Code, Backend Developers, Architects + +--- + +## 1. CURRENT STATE + +### Existing Taxonomy Infrastructure +The taxonomy system is partially built: + +**ContentTaxonomy** (writer app, db_table=`igny8_content_taxonomies`): +- Stores taxonomy term references synced from WordPress +- Fields: `name`, `slug`, `external_id` (WP term ID), `taxonomy_type` (category/tag/product_cat/product_tag/attribute) +- No content generation — terms are metadata only (name + slug + external reference) + +**ContentTaxonomyRelation** (writer app): +- Links `Content` to `ContentTaxonomy` (many-to-many through table) +- Allows assigning existing taxonomy terms to content pieces + +**Content Model** (writer app, db_table=`igny8_content`): +- `content_type='taxonomy'` exists in CONTENT_TYPE_CHOICES but is unused by the generation pipeline +- CONTENT_STRUCTURE_CHOICES includes `category_archive`, `tag_archive`, `attribute_archive` +- `taxonomy_terms` ManyToManyField through ContentTaxonomyRelation + +**Tasks Model** (writer app, db_table=`igny8_tasks`): +- `taxonomy_term` ForeignKey to ContentTaxonomy (nullable, db_column='taxonomy_id') +- Not used by 
automation pipeline — present as a field only + +**SiteIntegration** (integration app): +- WordPress connections exist via `SiteIntegration` model +- `SyncEvent` logs operations but taxonomy sync is stubbed/incomplete + +### What Doesn't Exist +- No content generation for taxonomy terms (categories, tags, attributes) +- No cluster mapping for taxonomy terms +- No WordPress → IGNY8 taxonomy sync (full fetch and reconcile) +- No IGNY8 → WordPress term content push +- No AI function for term content generation +- No admin interface for managing term-to-cluster mapping + +--- + +## 2. WHAT TO BUILD + +### Overview +Make taxonomy terms first-class SEO content pages by: +1. **Syncing terms from WordPress** — fetch all categories, tags, WooCommerce taxonomies +2. **Mapping terms to clusters** — automatic keyword-overlap + semantic matching +3. **Generating rich content** — AI-generated landing page content for each term +4. **Pushing content back** — sync generated content to WordPress term descriptions + meta + +### Taxonomy Sync (WordPress → IGNY8) + +Full fetch-and-reconcile sync from WordPress, leveraging the existing `SiteIntegration` (the reverse direction is covered under Term Content Sync): + +**Fetch targets:** +- WordPress categories (`taxonomy_type='category'`) +- WordPress tags (`taxonomy_type='tag'`) +- WooCommerce product categories (`taxonomy_type='product_cat'`) +- WooCommerce product tags (`taxonomy_type='product_tag'`) +- WooCommerce product attributes (`taxonomy_type='attribute'`, e.g., `pa_color`, `pa_size`) + +**Sync logic:** +1. Use existing `SiteIntegration.credentials_json` to authenticate WP REST API +2. Fetch all terms via `GET /wp-json/wp/v2/categories`, `/tags`, `/product_cat`, etc., paginating with `per_page=100` (the WP REST API maximum) until the page count in the `X-WP-TotalPages` response header is reached +3. Reconcile: create new `ContentTaxonomy` records, update changed ones, flag deleted +4. Store parent/child hierarchy for hierarchical taxonomies (categories, product categories) +5.
Log sync as `SyncEvent` with `event_type='metadata_sync'` + +### Cluster Mapping Service + +A shared service (`cluster_mapping_service.py`) that maps taxonomy terms to keyword clusters: + +**Algorithm:** +| Factor | Weight | Method | +|--------|--------|--------| +| Keyword overlap | 40% | Compare term name + slug against cluster keywords | +| Semantic similarity | 40% | Embedding-based cosine similarity (term name vs cluster description) | +| Title match | 20% | Exact/partial match of term name in cluster name | + +**Output per term:** +- `primary_cluster_id` — best-match cluster +- `secondary_cluster_ids` — additional related clusters (up to 3) +- `mapping_confidence` — weighted sum of the three factor scores above (each 0.0–1.0), giving a 0.0 to 1.0 score +- `mapping_status`: + - `auto_mapped` (confidence ≥ 0.6) — assigned automatically + - `suggested` (0.3 ≤ confidence < 0.6) — suggested for manual review + - `unmapped` (confidence < 0.3) — no good match found + +### Term Content Generation + +Each taxonomy term gets rich, SEO-optimized content: + +**Generated sections:** +1. **H1 Title** — optimized for the term + primary cluster keywords +2. **Rich description** — 500–1,500 words covering the topic +3. **FAQ section** — 5–8 questions and answers +4. **Related terms** — links to sibling/child terms +5. **Meta title** — 50–60 characters +6.
**Meta description** — 150–160 characters + +**AI function:** `GenerateTermContentFunction(BaseAIFunction)`: +- Input: term name, taxonomy_type, assigned cluster keywords, existing content titles under term, parent/sibling terms for context +- Output: structured JSON with sections (intro, overview, FAQ, related) +- Uses `ContentTypeTemplate` from 02A where `content_type='taxonomy'` + +### Term Content Sync (IGNY8 → WordPress) + +Push generated content to WordPress: +- Custom WP REST endpoint: `POST /wp-json/igny8/v1/terms/{id}/content` +- Stores in WordPress term meta: + - `_igny8_term_content` — HTML content + - `_igny8_term_faq` — JSON FAQ array + - `_igny8_term_meta_title` — SEO title + - `_igny8_term_meta_description` — SEO description +- Updates native WordPress term description with the generated content +- Schema: CollectionPage with itemListElement for listed content + +--- + +## 3. DATA MODELS & APIs + +### Modified Models + +**ContentTaxonomy** (db_table=`igny8_content_taxonomies`) — add fields: +```python +# Cluster mapping +cluster = models.ForeignKey( + 'planner.Clusters', on_delete=models.SET_NULL, + null=True, blank=True, related_name='taxonomy_terms', + help_text="Primary cluster this term maps to" +) +secondary_cluster_ids = models.JSONField( + default=list, blank=True, + help_text="Additional related cluster IDs" +) +mapping_confidence = models.FloatField( + default=0.0, + help_text="Cluster mapping confidence score 0.0-1.0" +) +mapping_status = models.CharField( + max_length=20, default='unmapped', + choices=[ + ('auto_mapped', 'Auto Mapped'), + ('manual_mapped', 'Manual Mapped'), + ('suggested', 'Suggested'), + ('unmapped', 'Unmapped'), + ], + db_index=True +) + +# Generated content +term_content = models.TextField( + blank=True, default='', + help_text="Generated rich HTML content for the term page" +) +term_faq = models.JSONField( + default=list, blank=True, + help_text="Generated FAQ: [{question, answer}]" +) +meta_title = 
models.CharField(max_length=255, blank=True, default='') +meta_description = models.TextField(blank=True, default='') +content_status = models.CharField( + max_length=20, default='none', + choices=[ + ('none', 'No Content'), + ('generating', 'Generating'), + ('generated', 'Generated'), + ('published', 'Published to WP'), + ], + db_index=True +) + +# Hierarchy +parent_term = models.ForeignKey( + 'self', on_delete=models.SET_NULL, + null=True, blank=True, related_name='child_terms' +) +term_count = models.IntegerField( + default=0, + help_text="Number of posts/products using this term" +) + +# Sync tracking +last_synced_from_wp = models.DateTimeField(null=True, blank=True) +last_pushed_to_wp = models.DateTimeField(null=True, blank=True) +``` + +### New AI Function + +```python +# igny8_core/ai/functions/generate_term_content.py + +class GenerateTermContentFunction(BaseAIFunction): + """Generate rich SEO content for taxonomy terms.""" + + def get_name(self) -> str: + return 'generate_term_content' + + def get_metadata(self) -> Dict: + return { + 'display_name': 'Generate Term Content', + 'description': 'Generate rich landing page content for taxonomy terms', + 'phases': { + 'INIT': 'Initializing...', + 'PREP': 'Loading term and cluster data...', + 'AI_CALL': 'Generating term content...', + 'PARSE': 'Parsing response...', + 'SAVE': 'Saving term content...', + 'DONE': 'Complete!' 
+ } + } + + def get_max_items(self) -> int: + return 10 # Process up to 10 terms per batch + + def validate(self, payload: dict, account=None) -> Dict: + term_ids = payload.get('ids', []) + if not term_ids: + return {'valid': False, 'error': 'No term IDs provided'} + return {'valid': True} + + def prepare(self, payload: dict, account=None) -> List: + term_ids = payload.get('ids', []) + terms = ContentTaxonomy.objects.filter( + id__in=term_ids, + account=account + ).select_related('cluster', 'parent_term') + return list(terms) + + def build_prompt(self, data: Any, account=None) -> str: + term = data # Single term + # Build context: cluster keywords, existing content, siblings + cluster_keywords = [] + if term.cluster: + cluster_keywords = list( + term.cluster.keywords.values_list('keyword', flat=True)[:20] + ) + sibling_terms = list( + ContentTaxonomy.objects.filter( + taxonomy_type=term.taxonomy_type, + site=term.site, + parent_term=term.parent_term + ).exclude(id=term.id).values_list('name', flat=True)[:10] + ) + # Use ContentTypeTemplate from 02A if available + # Fall back to default term prompt + return self._build_term_prompt(term, cluster_keywords, sibling_terms) + + def parse_response(self, response: str, step_tracker=None) -> Dict: + # Parse structured JSON: {content_html, faq, meta_title, meta_description} + pass + + def save_output(self, parsed, original_data, account=None, **kwargs) -> Dict: + term = original_data + term.term_content = parsed.get('content_html', '') + term.term_faq = parsed.get('faq', []) + term.meta_title = parsed.get('meta_title', '') + term.meta_description = parsed.get('meta_description', '') + term.content_status = 'generated' + term.save() + return {'count': 1, 'items_updated': [term.id]} +``` + +Register in `igny8_core/ai/registry.py`: +```python +register_lazy_function('generate_term_content', lambda: GenerateTermContentFunction) +``` + +### New Service + +```python +# igny8_core/business/content/cluster_mapping_service.py + 
+class ClusterMappingService: + """Maps taxonomy terms to keyword clusters using multi-factor scoring.""" + + KEYWORD_OVERLAP_WEIGHT = 0.4 + SEMANTIC_SIMILARITY_WEIGHT = 0.4 + TITLE_MATCH_WEIGHT = 0.2 + AUTO_MAP_THRESHOLD = 0.6 + SUGGEST_THRESHOLD = 0.3 + + def map_terms_to_clusters(self, site_id: int, account_id: int) -> Dict: + """ + Map all unmapped ContentTaxonomy terms to Clusters for a site. + Returns: {mapped: int, suggested: int, unmapped: int} + """ + pass + + def map_single_term(self, term: ContentTaxonomy) -> Dict: + """ + Map a single term. Returns: + {cluster_id, secondary_ids, confidence, status} + """ + pass + + def get_suggestions(self, term: ContentTaxonomy, top_n: int = 5) -> List[Dict]: + """ + Rank candidate clusters for a term (used by the cluster-suggestions endpoint). + Returns: [{cluster_id, cluster_name, confidence}], highest confidence first + """ + pass + + def _keyword_overlap_score(self, term_name: str, cluster_keywords: list) -> float: + pass + + def _semantic_similarity_score(self, term_name: str, cluster_description: str) -> float: + pass + + def _title_match_score(self, term_name: str, cluster_name: str) -> float: + pass +``` + +### New Celery Tasks + +```python +# igny8_core/tasks/taxonomy_tasks.py + +@shared_task(bind=True, max_retries=3, default_retry_delay=60) +def sync_taxonomy_from_wordpress(self, site_id: int, account_id: int): + """Fetch all taxonomy terms from WordPress and reconcile with ContentTaxonomy.""" + pass + +@shared_task(bind=True, max_retries=3, default_retry_delay=60) +def map_terms_to_clusters(self, site_id: int, account_id: int): + """Run cluster mapping on all unmapped terms for a site.""" + pass + +@shared_task(bind=True, max_retries=3, default_retry_delay=60) +def generate_term_content_task(self, term_ids: list, account_id: int): + """Generate content for a batch of taxonomy terms.""" + pass + +@shared_task(bind=True, max_retries=3, default_retry_delay=60) +def push_term_content_to_wordpress(self, term_id: int, account_id: int): + """Push generated term content to WordPress via REST API.""" + pass +``` + +### Migration + +``` +igny8_core/migrations/XXXX_taxonomy_term_content.py +``` + +Fields added to `ContentTaxonomy`: +1.
`cluster` — ForeignKey to Clusters (nullable) +2. `secondary_cluster_ids` — JSONField +3. `mapping_confidence` — FloatField +4. `mapping_status` — CharField +5. `term_content` — TextField +6. `term_faq` — JSONField +7. `meta_title` — CharField +8. `meta_description` — TextField +9. `content_status` — CharField +10. `parent_term` — ForeignKey to self (nullable) +11. `term_count` — IntegerField +12. `last_synced_from_wp` — DateTimeField (nullable) +13. `last_pushed_to_wp` — DateTimeField (nullable) + +### API Endpoints + +``` +# Taxonomy Term Management +GET /api/v1/writer/taxonomy/terms/ # List terms with mapping status (filterable) +GET /api/v1/writer/taxonomy/terms/{id}/ # Term detail +GET /api/v1/writer/taxonomy/terms/unmapped/ # Terms needing cluster assignment +GET /api/v1/writer/taxonomy/terms/stats/ # Summary: mapped/unmapped/generated/published counts + +# WordPress Sync +POST /api/v1/writer/taxonomy/terms/sync/ # Trigger WP → IGNY8 sync +GET /api/v1/writer/taxonomy/terms/sync/status/ # Last sync time + status + +# Cluster Mapping +POST /api/v1/writer/taxonomy/terms/{id}/map-cluster/ # Manual cluster assignment +POST /api/v1/writer/taxonomy/terms/auto-map/ # Run auto-mapping for all unmapped terms +GET /api/v1/writer/taxonomy/terms/{id}/cluster-suggestions/ # Get AI cluster suggestions for a term + +# Content Generation +POST /api/v1/writer/taxonomy/terms/create-tasks/ # Bulk create generation tasks for selected terms +POST /api/v1/writer/taxonomy/terms/{id}/generate/ # Generate content for single term +POST /api/v1/writer/taxonomy/terms/generate-bulk/ # Generate content for multiple terms + +# Publishing to WordPress +POST /api/v1/writer/taxonomy/terms/{id}/publish/ # Push single term content to WP +POST /api/v1/writer/taxonomy/terms/publish-bulk/ # Push multiple terms to WP +``` + +**ViewSet:** +```python +# igny8_core/modules/writer/views/taxonomy_term_views.py +class TaxonomyTermViewSet(SiteSectorModelViewSet): + serializer_class = TaxonomyTermSerializer 
+ queryset = ContentTaxonomy.objects.all() + filterset_fields = ['taxonomy_type', 'mapping_status', 'content_status', 'site'] + + @action(detail=False, methods=['get']) + def unmapped(self, request): + qs = self.get_queryset().filter(mapping_status='unmapped') + return self.paginate_and_respond(qs) + + @action(detail=False, methods=['get']) + def stats(self, request): + site_id = request.query_params.get('site_id') + if not site_id: + return Response({'error': 'site_id is required'}, status=400) + qs = self.get_queryset().filter(site_id=site_id) + return Response({ + 'total': qs.count(), + 'mapped': qs.filter(mapping_status__in=['auto_mapped', 'manual_mapped']).count(), + 'suggested': qs.filter(mapping_status='suggested').count(), + 'unmapped': qs.filter(mapping_status='unmapped').count(), + 'content_generated': qs.filter(content_status='generated').count(), + 'content_published': qs.filter(content_status='published').count(), + }) + + @action(detail=False, methods=['post']) + def sync(self, request): + site_id = request.data.get('site_id') + sync_taxonomy_from_wordpress.delay(site_id, request.account.id) + return Response({'message': 'Taxonomy sync started'}) + + @action(detail=True, methods=['post'], url_path='map-cluster') + def map_cluster(self, request, pk=None): + term = self.get_object() + # Only accept a cluster owned by the caller's account (django.shortcuts.get_object_or_404) + cluster = get_object_or_404(Clusters, id=request.data.get('cluster_id'), account=request.account) + term.cluster = cluster + term.mapping_status = 'manual_mapped' + term.mapping_confidence = 1.0 + term.save() + return Response(TaxonomyTermSerializer(term).data) + + @action(detail=False, methods=['post'], url_path='auto-map') + def auto_map(self, request): + site_id = request.data.get('site_id') + map_terms_to_clusters.delay(site_id, request.account.id) + return Response({'message': 'Auto-mapping started'}) + + @action(detail=True, methods=['get'], url_path='cluster-suggestions') + def cluster_suggestions(self, request, pk=None): + term = self.get_object() + service = ClusterMappingService() + suggestions = service.get_suggestions(term, top_n=5) + return Response({'suggestions': suggestions}) + 
@action(detail=True, methods=['post']) + def generate(self, request, pk=None): + term = self.get_object() + generate_term_content_task.delay([term.id], request.account.id) + return Response({'message': 'Content generation started'}) + + @action(detail=True, methods=['post']) + def publish(self, request, pk=None): + term = self.get_object() + push_term_content_to_wordpress.delay(term.id, request.account.id) + return Response({'message': 'Publishing to WordPress started'}) +``` + +**URL Registration:** +```python +# igny8_core/modules/writer/urls.py — add to existing router +router.register('taxonomy/terms', TaxonomyTermViewSet, basename='taxonomy-term') +``` + +### Credit Costs + +| Operation | Credits | Via | +|-----------|---------|-----| +| Taxonomy sync (WordPress → IGNY8) | 1 per batch | CreditCostConfig: `taxonomy_sync` | +| Term content generation | 4–6 per term | CreditCostConfig: `term_content_generation` | +| Term content optimization | 3–5 per term | CreditCostConfig: `term_content_optimization` | + +Add to `CreditCostConfig`: +```python +CreditCostConfig.objects.get_or_create( + operation_type='taxonomy_sync', + defaults={'display_name': 'Taxonomy Sync', 'base_credits': 1} +) +CreditCostConfig.objects.get_or_create( + operation_type='term_content_generation', + defaults={'display_name': 'Term Content Generation', 'base_credits': 5} +) +``` + +Add to `CreditUsageLog.OPERATION_TYPE_CHOICES`: +```python +('taxonomy_sync', 'Taxonomy Sync'), +('term_content_generation', 'Term Content Generation'), +``` + +--- + +## 4. 
IMPLEMENTATION STEPS + +### Step 1: Add Fields to ContentTaxonomy +File to modify: +- `backend/igny8_core/business/content/models.py` (or wherever ContentTaxonomy is defined) +- Add all 13 new fields listed in migration section + +### Step 2: Create and Run Migration +```bash +cd /data/app/igny8/backend +python manage.py makemigrations --name taxonomy_term_content +python manage.py migrate +``` + +### Step 3: Build ClusterMappingService +File to create: +- `backend/igny8_core/business/content/cluster_mapping_service.py` + +### Step 4: Create GenerateTermContentFunction +File to create: +- `backend/igny8_core/ai/functions/generate_term_content.py` + +Register in: +- `backend/igny8_core/ai/registry.py` + +### Step 5: Create Celery Tasks +File to create: +- `backend/igny8_core/tasks/taxonomy_tasks.py` + +Register in Celery beat schedule (optional — these are primarily on-demand): +- `sync_taxonomy_from_wordpress` — can be periodic (weekly) or on-demand + +### Step 6: Add Credit Cost Entries +Add `taxonomy_sync` and `term_content_generation` to: +- `CreditCostConfig` seed data +- `CreditUsageLog.OPERATION_TYPE_CHOICES` + +### Step 7: Build Serializers +File to create: +- `backend/igny8_core/modules/writer/serializers/taxonomy_term_serializer.py` + +### Step 8: Build ViewSet and URLs +File to create: +- `backend/igny8_core/modules/writer/views/taxonomy_term_views.py` + +Modify: +- `backend/igny8_core/modules/writer/urls.py` + +### Step 9: Frontend +Files to create/modify in `frontend/src/`: +- `pages/Writer/TaxonomyTerms.tsx` — term list with mapping status indicators +- `pages/Writer/TaxonomyTermDetail.tsx` — term detail with generated content preview +- `components/Writer/ClusterMappingPanel.tsx` — cluster assignment/suggestion UI +- `stores/taxonomyTermStore.ts` — Zustand store +- `api/taxonomyTerms.ts` — API client + +### Step 10: Tests +```bash +cd /data/app/igny8/backend +python manage.py test igny8_core.business.content.tests.test_cluster_mapping +python 
manage.py test igny8_core.ai.tests.test_generate_term_content +python manage.py test igny8_core.modules.writer.tests.test_taxonomy_term_views +``` + +--- + +## 5. ACCEPTANCE CRITERIA + +- [ ] All 13 new fields on ContentTaxonomy migrate successfully +- [ ] `GenerateTermContentFunction` registered in AI function registry +- [ ] WordPress → IGNY8 taxonomy sync fetches categories, tags, WooCommerce taxonomies +- [ ] Sync creates/updates ContentTaxonomy records with correct taxonomy_type +- [ ] Parent/child hierarchy preserved via parent_term FK +- [ ] SyncEvent logged with event_type='metadata_sync' after each sync operation +- [ ] ClusterMappingService maps terms with confidence scores +- [ ] Terms with confidence ≥ 0.6 auto-mapped, 0.3–0.6 suggested, < 0.3 unmapped +- [ ] Manual cluster assignment sets mapping_status='manual_mapped' with confidence=1.0 +- [ ] Term content generation produces: content_html, FAQ, meta_title, meta_description +- [ ] content_status transitions: none → generating → generated → published +- [ ] Publishing pushes content to WordPress via `POST /wp-json/igny8/v1/terms/{id}/content` +- [ ] All API endpoints require authentication and enforce account isolation +- [ ] Frontend term list shows mapping status badges (mapped/suggested/unmapped) +- [ ] Frontend supports manual cluster assignment from suggestion list +- [ ] Credit deduction works for taxonomy_sync and term_content_generation operations +- [ ] Backward compatible — existing ContentTaxonomy records unaffected (new fields nullable/defaulted) + +--- + +## 6. CLAUDE CODE INSTRUCTIONS + +### Execution Order +1. Read `backend/igny8_core/business/content/models.py` — find ContentTaxonomy and ContentTaxonomyRelation +2. Read `backend/igny8_core/business/planning/models.py` — understand Clusters model for FK reference +3. Read `backend/igny8_core/ai/functions/generate_content.py` — reference pattern for new AI function +4. 
Read `backend/igny8_core/ai/registry.py` — understand registration pattern +5. Add fields to ContentTaxonomy model +6. Create migration and run it +7. Build ClusterMappingService +8. Build GenerateTermContentFunction + register it +9. Build Celery tasks +10. Build serializers, ViewSet, URLs +11. Build frontend components + +### Key Constraints +- ALL primary keys are `BigAutoField` (integer). No UUIDs. +- Model class names PLURAL: `Clusters`, `Keywords`, `Tasks`, `ContentIdeas`, `Images`. `Content` stays singular. `ContentTaxonomy` stays singular. +- Frontend: `.tsx` files, Zustand stores, Vitest testing +- Celery app name: `igny8_core` +- All new db_tables use `igny8_` prefix +- Follow existing ViewSet pattern: `SiteSectorModelViewSet` for site-scoped resources +- AI functions follow `BaseAIFunction` pattern with lazy registry + +### File Tree (New/Modified) +``` +backend/igny8_core/ +├── business/content/ +│ ├── models.py # MODIFY: add fields to ContentTaxonomy +│ └── cluster_mapping_service.py # NEW: ClusterMappingService +├── ai/functions/ +│ └── generate_term_content.py # NEW: GenerateTermContentFunction +├── ai/ +│ └── registry.py # MODIFY: register generate_term_content +├── tasks/ +│ └── taxonomy_tasks.py # NEW: sync, map, generate, publish tasks +├── modules/writer/ +│ ├── serializers/ +│ │ └── taxonomy_term_serializer.py # NEW +│ ├── views/ +│ │ └── taxonomy_term_views.py # NEW +│ └── urls.py # MODIFY: register taxonomy/terms route +├── migrations/ +│ └── XXXX_taxonomy_term_content.py # NEW: auto-generated + +frontend/src/ +├── pages/Writer/ +│ ├── TaxonomyTerms.tsx # NEW: term list page +│ └── TaxonomyTermDetail.tsx # NEW: term detail + content preview +├── components/Writer/ +│ └── ClusterMappingPanel.tsx # NEW: cluster assignment UI +├── stores/ +│ └── taxonomyTermStore.ts # NEW: Zustand store +├── api/ +│ └── taxonomyTerms.ts # NEW: API client +``` + +### Cross-References +- **02A** (content types extension): ContentTypeTemplate for 
content_type='taxonomy' provides prompt template +- **01A** (SAG data foundation): SAGAttribute → taxonomy mapping context +- **01D** (setup wizard): wizard creates initial taxonomy plan used for cluster mapping +- **03B** (WP plugin connected): connected plugin receives term content via REST endpoint +- **03C** (companion theme): theme renders term landing pages using pushed content diff --git a/v2/V2-Execution-Docs/02C-gsc-integration.md b/v2/V2-Execution-Docs/02C-gsc-integration.md new file mode 100644 index 00000000..d4ba638b --- /dev/null +++ b/v2/V2-Execution-Docs/02C-gsc-integration.md @@ -0,0 +1,774 @@ +# IGNY8 Phase 2: GSC Integration (02C) +## Google Search Console — Indexing, Inspection & Analytics + +**Document Version:** 1.0 +**Date:** 2026-03-23 +**Phase:** IGNY8 Phase 2 — Feature Expansion +**Status:** Build Ready +**Source of Truth:** Codebase at `/data/app/igny8/` +**Audience:** Claude Code, Backend Developers, Architects + +--- + +## 1. CURRENT STATE + +### Existing Integration Infrastructure +- `SiteIntegration` model (db_table=`igny8_site_integrations`) stores WordPress connections with `platform='wordpress'` +- `SyncEvent` model (db_table=`igny8_sync_events`) logs publish/sync operations +- Integration app registered at `/api/v1/integration/` +- No Google API connections of any kind exist +- No OAuth 2.0 infrastructure for third-party APIs +- `IntegrationProvider` model supports `provider_type`: ai/payment/email/storage — no `search_engine` type yet + +### Content Publishing Flow +- When `Content.site_status` changes to `published`, a `PublishingRecord` is created +- Content gets `external_url` and `external_id` after WordPress publish +- No automatic indexing request after publish +- No tracking of whether published URLs are indexed by Google + +### What Doesn't Exist +- Google Search Console OAuth connection +- URL Inspection API integration +- Indexing queue with priority and quota management +- Search analytics data collection/dashboard +- 
Re-inspection scheduling +- Plugin-side index status display +- Any GSC-related models, endpoints, or tasks + +--- + +## 2. WHAT TO BUILD + +### Overview +Full Google Search Console integration with four capabilities: +1. **OAuth Connection** — connect GSC property via Google OAuth 2.0 +2. **URL Inspection** — inspect URLs via Google's URL Inspection API (2K/day quota), auto-inspect after publish +3. **Indexing Queue** — priority-based queue with quota management and re-inspection scheduling +4. **Search Analytics** — fetch and cache search performance data (clicks, impressions, CTR, position) + +### OAuth 2.0 Connection Flow + +``` +User clicks "Connect GSC" → + IGNY8 backend generates Google OAuth URL → + User authorizes in Google consent screen → + Google redirects to /api/v1/integration/gsc/callback/ → + Backend stores encrypted access_token + refresh_token → + Backend fetches user's GSC properties → + User selects property → GSCConnection created +``` + +**Google Cloud project requirements:** +- Search Console API enabled (it also serves the URL Inspection endpoint; there is no separate URL Inspection API to enable) +- Indexing API enabled separately, only if the `indexing` scope below is kept (Google limits that API to job posting and livestream `BroadcastEvent` pages) +- OAuth 2.0 client ID (Web application type), with `access_type=offline` on the consent URL so Google returns a refresh token +- Scopes: `https://www.googleapis.com/auth/webmasters.readonly`, `https://www.googleapis.com/auth/indexing` +- Redirect URI: `https://{domain}/api/v1/integration/gsc/callback/` + +**Token management:** +- Access tokens expire after 1 hour → background refresh via Celery task +- Refresh tokens stored encrypted in `GSCConnection.refresh_token` +- If refresh fails → set `GSCConnection.status='expired'`, notify user + +### URL Inspection API + +**Google API endpoint:** +``` +POST https://searchconsole.googleapis.com/v1/urlInspection/index:inspect +Body: {"inspectionUrl": "https://example.com/page", "siteUrl": "sc-domain:example.com"} +``` + +**Response fields tracked:** +| Field | Type | Stored In | +|-------|------|-----------| +| `verdict` | PASS/PARTIAL/FAIL/NEUTRAL | URLInspectionRecord.verdict | +| `coverageState` | e.g.,
"Submitted and indexed" | URLInspectionRecord.coverage_state | +| `indexingState` | e.g., "INDEXING_ALLOWED" | URLInspectionRecord.indexing_state | +| `robotsTxtState` | e.g., "ALLOWED" | URLInspectionRecord.last_inspection_result (JSON) | +| `lastCrawlTime` | ISO datetime | URLInspectionRecord.last_crawled | +| Full response | JSON | URLInspectionRecord.last_inspection_result | + +**Quota:** 2,000 inspections per day per GSC property (resets midnight Pacific Time) +**Rate limit:** self-imposed 1 request per 3 seconds (20 per minute, i.e. 600 per 30 minutes), well below Google's per-property limit of 600 requests per minute + +### Indexing Queue System + +Priority-based queue that respects daily quota: + +| Priority | Trigger | Description | +|----------|---------|-------------| +| 100 | Content published (auto) | Newly published content auto-queued | +| 90 | Re-inspection (auto) | Scheduled follow-up check | +| 70 | Manual inspect request | User requests specific URL inspection | +| 50 | Information query | Check status only, no submit | +| 30 | Scheduled bulk re-inspect | Periodic re-check of all URLs | + +**Queue processing:** +- Celery task runs every 5 minutes +- Checks `GSCDailyQuota` for remaining capacity +- Processes items in priority order (highest first) +- Respects 1 request/3 second rate limit +- Status flow: `queued → processing → completed/failed/quota_exceeded` + +### Re-Inspection Schedule + +After initial inspection, automatically schedule follow-up checks: + +| Check | Timing | Purpose | +|-------|--------|---------| +| Check 1 | 24 hours after submission | Quick verification | +| Check 2 | 3 days after | Give Google time to crawl | +| Check 3 | 6 days after | Most URLs indexed by now | +| Check 4 | 13 days after | Final automatic check | + +If still not indexed after Check 4 → mark `status='manual_review'`, stop auto-checking.
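The follow-up timing can be mirrored in a few lines of plain Python, which makes the schedule unit-testable without Django. This is a minimal sketch that assumes the `delays` mapping used by `IndexingQueueProcessor._schedule_next_inspection` (`{1: 1, 2: 3, 3: 6, 4: 13}`, each delay counted from the previous inspection) and assumes a `PASS` verdict means the URL is indexed; `next_inspection`, `status_after` and `timeline` are hypothetical helper names, not existing IGNY8 functions.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Days until the next check, keyed by how many inspections have run so far.
# Assumption: mirrors the `delays` mapping in _schedule_next_inspection,
# where each delay is counted from the previous inspection.
DELAYS_DAYS = {1: 1, 2: 3, 3: 6, 4: 13}


def next_inspection(last_inspected: datetime, inspection_count: int,
                    verdict: str) -> Optional[datetime]:
    """Due time of the next automatic check, or None when checking stops."""
    if verdict == "PASS" or inspection_count not in DELAYS_DAYS:
        return None
    return last_inspected + timedelta(days=DELAYS_DAYS[inspection_count])


def status_after(inspection_count: int, verdict: str) -> str:
    """Hypothetical status rule: PASS means indexed; give up after Check 4."""
    if verdict == "PASS":
        return "indexed"
    if inspection_count > len(DELAYS_DAYS):
        return "manual_review"  # still not indexed after Check 4
    return "not_indexed"


def timeline(submitted_at: datetime) -> List[datetime]:
    """Due times of Checks 1-4 for a URL that never gets indexed."""
    due: List[datetime] = []
    when = submitted_at
    for count in range(1, len(DELAYS_DAYS) + 1):
        nxt = next_inspection(when, count, "NEUTRAL")
        if nxt is None:  # cannot happen for counts 1-4; kept for type safety
            break
        due.append(nxt)
        when = nxt
    return due


if __name__ == "__main__":
    start = datetime(2026, 3, 23, tzinfo=timezone.utc)
    for n, when in enumerate(timeline(start), start=1):
        print(f"Check {n}: {when.date()}")
```

Read this way, Checks 2 to 4 fall 4, 10 and 23 days after submission. If the table's timings are instead meant as offsets from submission (1, 3, 6 and 13 days), the mapping would need to be `{1: 1, 2: 2, 3: 3, 4: 7}`.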
+ +### Search Analytics + +**Google API endpoint:** +``` +POST https://searchconsole.googleapis.com/webmasters/v3/sites/{siteUrl}/searchAnalytics/query +Body: { + "startDate": "2025-01-01", + "endDate": "2025-03-23", + "dimensions": ["page", "query", "date"], + "rowLimit": 25000 +} +``` + +`{siteUrl}` must be URL-encoded (e.g., `sc-domain%3Aexample.com`). + +**Metrics collected:** clicks, impressions, ctr, position +**Dimensions:** page, query (keyword), country, device, date +**Date range:** up to 16 months historical +**Caching:** Results cached in `GSCMetricsCache` with 24-hour TTL, refreshed daily via Celery + +### Auto-Indexing After Publish + +When `Content.site_status` changes to `'published'` and the content has an `external_url`: +1. Check if `GSCConnection` exists for the site with status='active' +2. Create or update `URLInspectionRecord` for the URL +3. Add to `IndexingQueue` with priority=100 +4. If SAG data available: hub pages get inspected before supporting articles (blueprint-aware priority) + +### Plugin-Side Status Sync + +IGNY8 pushes index statuses to the WordPress plugin: +- Endpoint: `POST /wp-json/igny8/v1/gsc/status-sync` +- Payload: `{urls: [{url, status, verdict, last_inspected}]}` +- Plugin displays status badges on WP post list table: + - ⏳ `pending_inspection` + - ✓ `indexed` + - ✗ `not_indexed` + - ➡ `indexing_requested` + - 🚫 `error_noindex` + +--- + +## 3.
DATA MODELS & APIs + +### New Models (integration app) + +```python +class GSCConnection(AccountBaseModel): + """Google Search Console OAuth connection per site.""" + site = models.ForeignKey( + 'igny8_core_auth.Site', on_delete=models.CASCADE, + related_name='gsc_connections' + ) + google_email = models.CharField(max_length=255) + access_token = models.TextField(help_text="Encrypted OAuth access token") + refresh_token = models.TextField(help_text="Encrypted OAuth refresh token") + token_expiry = models.DateTimeField() + gsc_property_url = models.CharField( + max_length=500, + help_text="GSC property URL, e.g., sc-domain:example.com" + ) + status = models.CharField( + max_length=20, default='active', + choices=[ + ('active', 'Active'), + ('expired', 'Token Expired'), + ('revoked', 'Access Revoked'), + ], + db_index=True + ) + + class Meta: + app_label = 'integration' + db_table = 'igny8_gsc_connections' + unique_together = [['site', 'gsc_property_url']] + + +class URLInspectionRecord(SiteSectorBaseModel): + """Tracks URL inspection history and indexing status.""" + url = models.URLField(max_length=2000, db_index=True) + content = models.ForeignKey( + 'writer.Content', on_delete=models.SET_NULL, + null=True, blank=True, related_name='inspection_records', + help_text="Linked IGNY8 content (null for external/non-IGNY8 URLs)" + ) + last_inspection_result = models.JSONField( + default=dict, help_text="Full Google API response" + ) + verdict = models.CharField( + max_length=20, blank=True, default='', + help_text="PASS/PARTIAL/FAIL/NEUTRAL" + ) + coverage_state = models.CharField(max_length=100, blank=True, default='') + indexing_state = models.CharField(max_length=100, blank=True, default='') + last_crawled = models.DateTimeField(null=True, blank=True) + last_inspected = models.DateTimeField(null=True, blank=True) + inspection_count = models.IntegerField(default=0) + next_inspection = models.DateTimeField( + null=True, blank=True, + help_text="Scheduled next 
re-inspection datetime" + ) + status = models.CharField( + max_length=30, default='pending_inspection', + choices=[ + ('pending_inspection', 'Pending Inspection'), + ('indexed', 'Indexed'), + ('not_indexed', 'Not Indexed'), + ('indexing_requested', 'Indexing Requested'), + ('error_noindex', 'Error / No Index'), + ('manual_review', 'Manual Review Needed'), + ], + db_index=True + ) + + class Meta: + app_label = 'integration' + db_table = 'igny8_url_inspection_records' + unique_together = [['site', 'url']] + ordering = ['-last_inspected'] + + +class IndexingQueue(SiteSectorBaseModel): + """Priority queue for URL inspection API requests.""" + url = models.URLField(max_length=2000) + url_inspection_record = models.ForeignKey( + URLInspectionRecord, on_delete=models.SET_NULL, + null=True, blank=True, related_name='queue_entries' + ) + priority = models.IntegerField( + default=50, db_index=True, + help_text="100=auto-publish, 90=re-inspect, 70=manual, 50=info, 30=bulk" + ) + status = models.CharField( + max_length=20, default='queued', + choices=[ + ('queued', 'Queued'), + ('processing', 'Processing'), + ('completed', 'Completed'), + ('failed', 'Failed'), + ('quota_exceeded', 'Quota Exceeded'), + ], + db_index=True + ) + date_added = models.DateTimeField(auto_now_add=True) + date_processed = models.DateTimeField(null=True, blank=True) + error_message = models.TextField(blank=True, default='') + + class Meta: + app_label = 'integration' + db_table = 'igny8_indexing_queue' + ordering = ['-priority', 'date_added'] + + +class GSCMetricsCache(SiteSectorBaseModel): + """Cached search analytics data from GSC API.""" + metric_type = models.CharField( + max_length=50, db_index=True, + choices=[ + ('search_analytics', 'Search Analytics'), + ('page_performance', 'Page Performance'), + ('keyword_performance', 'Keyword Performance'), + ] + ) + dimension_filters = models.JSONField( + default=dict, + help_text="Filters used for this query: {dimensions, filters}" + ) + data = 
models.JSONField( + default=list, + help_text="Cached query results" + ) + date_range_start = models.DateField() + date_range_end = models.DateField() + expires_at = models.DateTimeField( + help_text="Cache expiry — refresh after this time" + ) + + class Meta: + app_label = 'integration' + db_table = 'igny8_gsc_metrics_cache' + ordering = ['-date_range_end'] + + +class GSCDailyQuota(SiteSectorBaseModel): + """Tracks daily URL Inspection API usage per site/property.""" + date = models.DateField(db_index=True) + inspections_used = models.IntegerField(default=0) + quota_limit = models.IntegerField(default=2000) + + class Meta: + app_label = 'integration' + db_table = 'igny8_gsc_daily_quota' + unique_together = [['site', 'date']] +``` + +### Migration + +``` +igny8_core/migrations/XXXX_gsc_integration.py +``` + +New tables: +1. `igny8_gsc_connections` +2. `igny8_url_inspection_records` +3. `igny8_indexing_queue` +4. `igny8_gsc_metrics_cache` +5. `igny8_gsc_daily_quota` + +### API Endpoints + +``` +# OAuth Connection +POST /api/v1/integration/gsc/connect/ # Initiate OAuth (returns redirect URL) +GET /api/v1/integration/gsc/callback/ # OAuth callback (stores tokens) +DELETE /api/v1/integration/gsc/disconnect/ # Revoke + delete connection +GET /api/v1/integration/gsc/properties/ # List connected GSC properties +GET /api/v1/integration/gsc/status/ # Connection status + +# Quota +GET /api/v1/integration/gsc/quota/ # Today's quota usage (used/limit) + +# URL Inspection +POST /api/v1/integration/gsc/inspect/ # Queue single URL for inspection +POST /api/v1/integration/gsc/inspect/bulk/ # Queue multiple URLs +GET /api/v1/integration/gsc/inspections/ # List inspection records (filterable) +GET /api/v1/integration/gsc/inspections/{id}/ # Single inspection detail + +# Search Analytics +GET /api/v1/integration/gsc/analytics/ # Search analytics (cached) +GET /api/v1/integration/gsc/analytics/keywords/ # Keyword performance +GET /api/v1/integration/gsc/analytics/pages/ # Page 
performance +GET /api/v1/integration/gsc/analytics/export/ # CSV export + +# Queue Management (admin) +GET /api/v1/integration/gsc/queue/ # View queue status +POST /api/v1/integration/gsc/queue/clear/ # Clear failed/quota_exceeded items +``` + +### Services + +```python +# igny8_core/business/integration/gsc_service.py + +import time +from datetime import timedelta +from typing import Dict, List +from zoneinfo import ZoneInfo + +from django.utils import timezone +from django.utils.dateparse import parse_datetime + +class GSCService: + """Google Search Console API client wrapper.""" + + def get_oauth_url(self, site_id: int, redirect_uri: str) -> str: + """Generate Google OAuth consent URL.""" + pass + + def handle_oauth_callback(self, code: str, site_id: int, account_id: int) -> GSCConnection: + """Exchange auth code for tokens, create GSCConnection.""" + pass + + def refresh_access_token(self, connection: GSCConnection) -> bool: + """Refresh expired access token using refresh_token.""" + pass + + def inspect_url(self, connection: GSCConnection, url: str) -> Dict: + """Call URL Inspection API. Returns parsed response.""" + pass + + def fetch_search_analytics( + self, connection: GSCConnection, + start_date: str, end_date: str, + dimensions: list, row_limit: int = 25000 + ) -> List[Dict]: + """Fetch search analytics data from GSC API.""" + pass + + def list_properties(self, connection: GSCConnection) -> List[str]: + """List all GSC properties accessible by the connected account.""" + pass + + +class IndexingQueueProcessor: + """Processes the indexing queue respecting quota and rate limits.""" + + RATE_LIMIT_SECONDS = 3 # 1 request per 3 seconds + QUOTA_BUFFER = 50 # Reserve 50 inspections for manual use + MAX_ITEMS_PER_RUN = 90 # ~4.5 min at 3 s/request, so a run fits inside the 5-minute beat interval + + def __init__(self): + self.gsc_service = GSCService() + + def process_queue(self, site_id: int): + """Process queued items for a site, respecting daily quota.""" + connection = GSCConnection.objects.filter(site_id=site_id, status='active').first() + if connection is None: + return {'processed': 0, 'reason': 'no_active_connection'} + + # Google resets the inspection quota at midnight Pacific Time, so key the counter on that date + quota_date = timezone.now().astimezone(ZoneInfo('America/Los_Angeles')).date() + quota = GSCDailyQuota.objects.get_or_create( + site_id=site_id, + date=quota_date, + defaults={'quota_limit': 2000} + )[0] + + remaining = quota.quota_limit - quota.inspections_used - self.QUOTA_BUFFER + if remaining <= 0: + return {'processed': 0, 'reason': 'quota_exceeded'} + + items = IndexingQueue.objects.filter( + site_id=site_id, + status='queued' + ).order_by('-priority', 'date_added')[:min(remaining, self.MAX_ITEMS_PER_RUN)] + + processed = 0 + for item in items: + item.status = 'processing' + item.save() + try: + result = self.gsc_service.inspect_url(connection, item.url) + self._update_inspection_record(item, result) + item.status = 'completed' + item.date_processed = timezone.now() + quota.inspections_used += 1 + processed += 1 + except QuotaExceededException: # assumed to be raised by GSCService.inspect_url when Google reports the quota is used up + item.status = 'quota_exceeded' + break + except Exception as e: + item.status = 'failed' + item.error_message = str(e) + finally: + item.save() + quota.save() + # Throttle after successes and failures alike (skipped after the quota break above) + time.sleep(self.RATE_LIMIT_SECONDS) + + return {'processed': processed} + + def _update_inspection_record(self, queue_item, result): + """Create/update URLInspectionRecord from API result.""" + index_status = result.get('inspectionResult', {}).get('indexStatusResult', {}) + record, _ = URLInspectionRecord.objects.get_or_create( + site=queue_item.site, + url=queue_item.url, + defaults={'sector': queue_item.sector, 'account': queue_item.account} + ) + record.last_inspection_result = result + record.verdict = index_status.get('verdict', '') + record.coverage_state = index_status.get('coverageState', '') + record.indexing_state = index_status.get('indexingState', '') + last_crawl = index_status.get('lastCrawlTime') + record.last_crawled = parse_datetime(last_crawl) if last_crawl else None + record.last_inspected = timezone.now() + # Increment in Python: an F() expression in update_or_create defaults fails on insert + # and leaves an unevaluated expression on the instance, breaking the count checks below + record.inspection_count += 1 + record.status = 'indexed' if record.verdict == 'PASS' else 'not_indexed' + # Schedule re-inspection (saves the record) + self._schedule_next_inspection(record) + + def _schedule_next_inspection(self, record): + """Schedule follow-up inspection based on inspection count.""" + delays = {1: 1, 2: 3, 3: 6, 4: 13} # days after the previous inspection + if record.status == 'indexed': + record.next_inspection = None + elif record.inspection_count in delays: + record.next_inspection = timezone.now() + timedelta(days=delays[record.inspection_count]) + else: + # Still not indexed after Check 4: stop auto-checking + record.status = 'manual_review' + record.next_inspection = None + record.save() +``` + +### Celery Tasks + +```python +# igny8_core/tasks/gsc_tasks.py + +@shared_task(bind=True, max_retries=3,
default_retry_delay=60) +def process_indexing_queue(self, site_id: int = None): + """Process pending indexing queue items. Runs every 5 minutes.""" + # If site_id provided, process that site only + # Otherwise, process all sites with active GSCConnection + pass + +@shared_task(bind=True, max_retries=3, default_retry_delay=300) +def refresh_gsc_tokens(self): + """Refresh expiring GSC OAuth tokens. Runs hourly.""" + expiring = GSCConnection.objects.filter( + status='active', + token_expiry__lte=timezone.now() + timedelta(minutes=10) + ) + for conn in expiring: + GSCService().refresh_access_token(conn) + +@shared_task(bind=True, max_retries=3, default_retry_delay=300) +def fetch_search_analytics(self): + """Fetch and cache search analytics for all connected sites. Runs daily.""" + pass + +@shared_task(bind=True, max_retries=3, default_retry_delay=60) +def schedule_reinspections(self): + """Add due re-inspections to the queue. Runs daily.""" + due = URLInspectionRecord.objects.filter( + next_inspection__lte=timezone.now(), + status__in=['not_indexed', 'indexing_requested'] + ) + for record in due: + IndexingQueue.objects.get_or_create( + site=record.site, + url=record.url, + status='queued', + defaults={'priority': 90, 'url_inspection_record': record} + ) + +@shared_task(bind=True) +def auto_queue_published_content(self, content_id: int): + """Queue newly published content for GSC inspection. 
Triggered by publish signal.""" + from django.apps import apps + Content = apps.get_model('writer', 'Content') # writer app's Content model + content = Content.objects.get(id=content_id) + if not content.external_url: + return + connection = GSCConnection.objects.filter( + site=content.site, status='active' + ).first() + if not connection: + return + record, _ = URLInspectionRecord.objects.get_or_create( + site=content.site, url=content.external_url, + defaults={'content': content, 'sector': content.sector, 'account': content.account} + ) + # post_save fires on every save of published content; skip URLs already waiting in the queue + if IndexingQueue.objects.filter( + site=content.site, url=content.external_url, + status__in=['queued', 'processing'] + ).exists(): + return + IndexingQueue.objects.create( + site=content.site, sector=content.sector, account=content.account, + url=content.external_url, url_inspection_record=record, + priority=100, status='queued' + ) +``` + +**Beat schedule additions** (add to `igny8_core/celery.py`): +```python +# Task names are the defaults Celery assigns to the @shared_task functions in +# igny8_core/tasks/gsc_tasks.py (no explicit name= is set on those decorators) +'process-indexing-queue': { + 'task': 'igny8_core.tasks.gsc_tasks.process_indexing_queue', + 'schedule': crontab(minute='*/5'), # Every 5 minutes +}, +'refresh-gsc-tokens': { + 'task': 'igny8_core.tasks.gsc_tasks.refresh_gsc_tokens', + 'schedule': crontab(minute=30), # Every hour at :30 +}, +'fetch-search-analytics': { + 'task': 'igny8_core.tasks.gsc_tasks.fetch_search_analytics', + 'schedule': crontab(hour=4, minute=0), # Daily at 4 AM +}, +'schedule-reinspections': { + 'task': 'igny8_core.tasks.gsc_tasks.schedule_reinspections', + 'schedule': crontab(hour=5, minute=0), # Daily at 5 AM +}, +``` + +### Auto-Indexing Signal + +Connect to Content model's post-publish flow: + +```python +# igny8_core/business/integration/signals.py + +from django.db import transaction +from django.db.models.signals import post_save +from django.dispatch import receiver + +from igny8_core.tasks.gsc_tasks import auto_queue_published_content + +@receiver(post_save, sender='writer.Content') +def queue_for_gsc_inspection(sender, instance, **kwargs): + """When content is published, auto-queue for GSC inspection.""" + if instance.site_status == 'published' and instance.external_url: + # Enqueue only after the transaction commits so the worker sees the saved row + content_id = instance.id + transaction.on_commit(lambda: auto_queue_published_content.delay(content_id)) +``` + +### Credit Costs + +| Operation | Credits | Notes | +|-----------|---------|-------| +| GSC OAuth connection setup | 1 | One-time per connection | +| URL inspections (per 100) | 0.1 | Batch pricing | +| Indexing request (per URL) | 0.05 | Minimal cost
| +| Analytics caching (per site/month) | 0.5 | Monthly recurring | + +Add to `CreditCostConfig`: +```python +CreditCostConfig.objects.get_or_create( + operation_type='gsc_inspection', + defaults={'display_name': 'GSC URL Inspection', 'base_credits': 1} +) +CreditCostConfig.objects.get_or_create( + operation_type='gsc_analytics', + defaults={'display_name': 'GSC Analytics Sync', 'base_credits': 1} +) +``` + +--- + +## 4. IMPLEMENTATION STEPS + +### Step 1: Create GSC Models +File to create/modify: +- `backend/igny8_core/business/integration/gsc_models.py` (or add to existing `models.py`) +- 5 new models: GSCConnection, URLInspectionRecord, IndexingQueue, GSCMetricsCache, GSCDailyQuota + +### Step 2: Create and Run Migration +```bash +cd /data/app/igny8/backend +python manage.py makemigrations --name gsc_integration +python manage.py migrate +``` + +### Step 3: Build GSCService +File to create: +- `backend/igny8_core/business/integration/gsc_service.py` +- Requires `google-auth`, `google-auth-oauthlib`, `google-api-python-client` packages + +Add to `requirements.txt`: +``` +google-auth>=2.0.0 +google-auth-oauthlib>=1.0.0 +google-api-python-client>=2.0.0 +``` + +### Step 4: Build IndexingQueueProcessor +File to create: +- `backend/igny8_core/business/integration/indexing_queue_processor.py` + +### Step 5: Build Celery Tasks +File to create: +- `backend/igny8_core/tasks/gsc_tasks.py` + +Add beat schedule entries to: +- `backend/igny8_core/celery.py` + +### Step 6: Build Auto-Indexing Signal +File to create: +- `backend/igny8_core/business/integration/signals.py` + +Register in: +- `backend/igny8_core/business/integration/apps.py` — `ready()` method + +### Step 7: Build Serializers +File to create: +- `backend/igny8_core/modules/integration/serializers/gsc_serializers.py` + +### Step 8: Build ViewSets and URLs +Files to create: +- `backend/igny8_core/modules/integration/views/gsc_views.py` +- Modify `backend/igny8_core/modules/integration/urls.py` — register GSC 
endpoints + +### Step 9: Add OAuth Settings +Add to `backend/igny8_core/settings.py`: +```python +# Google OAuth 2.0 (GSC Integration) +GOOGLE_CLIENT_ID = env('GOOGLE_CLIENT_ID', default='') +GOOGLE_CLIENT_SECRET = env('GOOGLE_CLIENT_SECRET', default='') +GOOGLE_REDIRECT_URI = env('GOOGLE_REDIRECT_URI', default='') +``` + +### Step 10: Frontend +Files to create in `frontend/src/`: +- `pages/Integration/GSCDashboard.tsx` — main GSC dashboard +- `pages/Integration/GSCAnalytics.tsx` — search analytics with charts +- `pages/Integration/GSCInspections.tsx` — URL inspection list with status badges +- `pages/Integration/GSCConnect.tsx` — OAuth connection flow +- `components/Integration/QuotaIndicator.tsx` — daily quota usage bar +- `components/Integration/InspectionStatusBadge.tsx` — status badges +- `stores/gscStore.ts` — Zustand store +- `api/gsc.ts` — API client + +### Step 11: Tests +```bash +cd /data/app/igny8/backend +python manage.py test igny8_core.business.integration.tests.test_gsc_service +python manage.py test igny8_core.business.integration.tests.test_indexing_queue +python manage.py test igny8_core.modules.integration.tests.test_gsc_views +``` + +--- + +## 5. 
ACCEPTANCE CRITERIA + +- [ ] 5 new database tables created and migrated successfully +- [ ] Google OAuth 2.0 flow works: connect → consent → callback → tokens stored +- [ ] GSC properties listed after successful OAuth connection +- [ ] Token refresh works automatically before expiry via Celery task +- [ ] URL Inspection API calls succeed and results stored in URLInspectionRecord +- [ ] Daily quota tracked in GSCDailyQuota, respects 2,000/day limit +- [ ] Rate limit of 1 request/3 seconds enforced in queue processor +- [ ] Re-inspection schedule runs: 1 day, 3 days, 6 days, 13 days after initial check +- [ ] URLs not indexed after Check 4 marked as 'manual_review' +- [ ] Content publish triggers auto-queue at priority 100 +- [ ] Search analytics data fetched and cached with 24-hour TTL +- [ ] Analytics endpoints return cached data with date range filtering +- [ ] All endpoints require authentication and enforce account isolation +- [ ] Frontend GSC dashboard shows: connection status, quota usage, inspection list, analytics charts +- [ ] Inspection status badges display correctly on URL list +- [ ] `google-auth`, `google-auth-oauthlib`, `google-api-python-client` added to requirements.txt +- [ ] Disconnecting GSC revokes token and deletes GSCConnection + +--- + +## 6. CLAUDE CODE INSTRUCTIONS + +### Execution Order +1. Read `backend/igny8_core/business/integration/models.py` — understand existing SiteIntegration, SyncEvent +2. Read `backend/igny8_core/modules/integration/urls.py` — understand existing URL patterns +3. Read `backend/igny8_core/celery.py` — understand beat schedule registration +4. Add new packages to requirements.txt +5. Create GSC models (5 models) +6. Create migration, run it +7. Build GSCService (OAuth + API client) +8. Build IndexingQueueProcessor +9. Build Celery tasks (4 tasks) + register in beat schedule +10. Build auto-indexing signal +11. Build serializers, ViewSets, URLs +12.
Build frontend components + +### Key Constraints +- ALL primary keys are `BigAutoField` (integer). No UUIDs. +- Model class names: GSCConnection, URLInspectionRecord, IndexingQueue, GSCMetricsCache, GSCDailyQuota (descriptive names, not plural) +- Frontend: `.tsx` files, Zustand stores, Vitest testing +- Celery app name: `igny8_core` +- All db_tables use `igny8_` prefix +- Tokens MUST be encrypted at rest (use same encryption as SiteIntegration.credentials_json) +- OAuth client_id/secret must be in environment variables, never in code +- Follow existing integration app patterns for URL structure + +### File Tree (New/Modified) +``` +backend/igny8_core/ +├── business/integration/ +│ ├── models.py # MODIFY or NEW gsc_models.py: 5 new models +│ ├── gsc_service.py # NEW: GSCService (OAuth + API) +│ ├── indexing_queue_processor.py # NEW: IndexingQueueProcessor +│ ├── signals.py # NEW: auto-indexing signal +│ └── apps.py # MODIFY: register signals in ready() +├── tasks/ +│ └── gsc_tasks.py # NEW: 4 Celery tasks +├── celery.py # MODIFY: add 4 beat schedule entries +├── settings.py # MODIFY: add GOOGLE_* settings +├── modules/integration/ +│ ├── serializers/ +│ │ └── gsc_serializers.py # NEW +│ ├── views/ +│ │ └── gsc_views.py # NEW +│ └── urls.py # MODIFY: register GSC endpoints +├── migrations/ +│ └── XXXX_gsc_integration.py # NEW: auto-generated +├── requirements.txt # MODIFY: add google-auth packages + +frontend/src/ +├── pages/Integration/ +│ ├── GSCDashboard.tsx # NEW +│ ├── GSCAnalytics.tsx # NEW +│ ├── GSCInspections.tsx # NEW +│ └── GSCConnect.tsx # NEW +├── components/Integration/ +│ ├── QuotaIndicator.tsx # NEW +│ └── InspectionStatusBadge.tsx # NEW +├── stores/ +│ └── gscStore.ts # NEW: Zustand store +├── api/ +│ └── gsc.ts # NEW: API client +``` + +### Cross-References +- **01E** (blueprint-aware pipeline): triggers auto-indexing after publish, hub pages prioritized +- **02E** (backlinks): GSC impressions data feeds backlink KPI dashboard +- **02F** 
(optimizer): GSC position data identifies optimization candidates +- **03A** (WP plugin standalone): standalone plugin has GSC dashboard tab +- **03B** (WP plugin connected): connected mode syncs index statuses from IGNY8 to WP +- **04B** (reporting): GSC metrics (clicks, impressions, CTR) feed into service reports diff --git a/v2/V2-Execution-Docs/02D-linker-internal.md b/v2/V2-Execution-Docs/02D-linker-internal.md new file mode 100644 index 00000000..408e486a --- /dev/null +++ b/v2/V2-Execution-Docs/02D-linker-internal.md @@ -0,0 +1,735 @@ +# IGNY8 Phase 2: Internal Linker (02D) +## SAG-Based Internal Linking Engine + +**Document Version:** 1.0 +**Date:** 2026-03-23 +**Phase:** IGNY8 Phase 2 — Feature Expansion +**Status:** Build Ready +**Source of Truth:** Codebase at `/data/app/igny8/` +**Audience:** Claude Code, Backend Developers, Architects + +--- + +## 1. CURRENT STATE + +### Internal Linking Today +There is **no** internal linking system in IGNY8. Content is generated and published without any cross-linking strategy. Links within content are only those the AI incidentally includes during generation. 
+ +### What Exists +- `Content` model (app_label=`writer`, db_table=`igny8_content`) — stores `content_html` where links would be inserted +- `SAGCluster` and `SAGBlueprint` models (from 01A) — provide the cluster hierarchy for link topology +- The 7-stage automation pipeline (01E) generates and publishes content but has no linking stage between generation and publish +- `SiteIntegration` model (app_label=`integration`) tracks WordPress connections + +### What Does Not Exist +- No SAGLink model, no LinkMap model, no SAGLinkAudit model +- No link scoring algorithm +- No anchor text management +- No link density enforcement +- No link insertion into content_html +- No orphan page detection +- No link health monitoring +- No link audit system + +### Foundation Available +- `SAGBlueprint` (01A) — defines the SAG hierarchy (site → sectors → clusters → content) +- `SAGCluster` (01A) — cluster_type, hub_page_type, hub_page_structure +- `SAGAttribute` (01A) — attribute values shared across clusters (basis for cross-cluster linking) +- 01E pipeline — post-generation hook point available between Stage 4 (Content) and Stage 7 (Publish) +- `Content.content_type` and `Content.content_structure` — determines link density rules +- 02B `ContentTaxonomy` with cluster mapping — taxonomy-to-cluster relationships for taxonomy contextual links + +--- + +## 2. WHAT TO BUILD + +### Overview +Build a SAG-aware internal linking engine that automatically plans, scores, and inserts internal links into content. The system operates in two modes: new content mode (pipeline integration) and existing content remediation (audit + fix). 
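The plan, score, insert flow depends on the five-factor score and the 60/40 thresholds defined in section 2.2. The following is a minimal, framework-free sketch of that scoring. It assumes each factor is first normalized to [0, 1] before weighting. The `Candidate` type and all of the normalization choices are hypothetical and not specified by this doc: the cap of 3 shared attribute values, authority measured relative to the most-linked candidate, Jaccard keyword overlap, and a 180-day exponential decay standing in for "6 months".

```python
import math
from dataclasses import dataclass

# Weights and thresholds as listed in section 2.2 (weights sum to 1.0).
SCORE_WEIGHTS = {
    "shared_attributes": 0.40,
    "target_authority": 0.25,
    "keyword_overlap": 0.20,
    "content_recency": 0.10,
    "link_count_gap": 0.05,
}
AUTO_LINK_THRESHOLD = 60
REVIEW_THRESHOLD = 40

# Hypothetical normalization constants (assumptions, not from the spec).
SHARED_ATTRIBUTE_CAP = 3   # 3+ shared SAGAttribute values earns the full factor
RECENCY_DECAY_DAYS = 180   # exponential decay time constant (~6 months)


@dataclass
class Candidate:
    """Hypothetical flattened view of one source/target pair."""
    shared_attribute_values: int   # SAGAttribute values shared by both clusters
    target_inbound_links: int      # inbound internal links to target (LinkMap)
    source_keywords: frozenset     # source cluster keywords
    target_keywords: frozenset     # target content keywords
    target_age_days: float         # days since target was published


def score_candidate(c: Candidate, max_inbound: int) -> float:
    """Weighted sum of five factors, each scaled to [0, 1], returned as 0-100."""
    shared = min(c.shared_attribute_values / SHARED_ATTRIBUTE_CAP, 1.0)
    # Authority relative to the most-linked candidate in this batch.
    authority = min(c.target_inbound_links / max_inbound, 1.0) if max_inbound > 0 else 0.0
    union = c.source_keywords | c.target_keywords
    overlap = len(c.source_keywords & c.target_keywords) / len(union) if union else 0.0
    recency = math.exp(-c.target_age_days / RECENCY_DECAY_DAYS)
    gap = 1.0 - authority  # fewer inbound links -> larger boost
    factors = {
        "shared_attributes": shared,
        "target_authority": authority,
        "keyword_overlap": overlap,
        "content_recency": recency,
        "link_count_gap": gap,
    }
    return 100 * sum(SCORE_WEIGHTS[name] * value for name, value in factors.items())


def classify(score: float) -> str:
    """Map a score to auto-insert, manual review, or skip."""
    if score >= AUTO_LINK_THRESHOLD:
        return "auto"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "skip"
```

Consider a fresh target with 2 shared attribute values, half the inbound links of the best-linked candidate, and 2 of 4 keywords in common. It scores about 61.7 and is classified as `"auto"`. In this sketch, authority and link-count gap are both derived from the same inbound count, so they partially cancel and the maximum reachable score is 95. A real implementation may want independent signals for the two factors.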
+ +### 2.1 Seven Link Types + +| # | Link Type | Direction | Description | Limit | Placement | +|---|-----------|-----------|-------------|-------|-----------| +| 1 | **Vertical Upward** | Supporting → Hub | MANDATORY: every supporting article links to its cluster hub | 1 per article | First 2 paragraphs | +| 2 | **Vertical Downward** | Hub → Supporting | Hub lists ALL its supporting articles | No cap | "Related Articles" section + contextual body links | +| 3 | **Horizontal Sibling** | Supporting ↔ Supporting | Same-cluster articles linking to each other | Max 2 per article | Natural content overlap points | +| 4 | **Cross-Cluster** | Hub ↔ Hub | Hubs sharing a SAGAttribute value can cross-link | Max 2 per hub | Contextual body links | +| 5 | **Taxonomy Contextual** | Term Page → Hubs | Term pages link to ALL cluster hubs using that attribute | No cap | Auto-generated from 02B taxonomy-cluster mapping | +| 6 | **Breadcrumb** | Hierarchical | Home → Sector → [Attribute] → Hub → Current Page | 1 chain per page | Top of page (auto-generated from SAG hierarchy) | +| 7 | **Related Content** | Cross-cluster allowed | 2-3 links in "Related Reading" section at end of article | 2-3 per article | End of article section | + +**Link Density Rules (outbound per page type, by word count):** + +| Page Type | <1000 words | 1000-2000 words | 2000+ words | +|-----------|------------|-----------------|-------------| +| Hub (`cluster_hub`) | 5-10 | 10-15 | 15-20 | +| Blog (article/guide/etc.) 
| 2-5 | 3-8 | 4-12 | +| Product/Service | 2-3 | 3-5 | 3-5 | +| Term Page (taxonomy) | 3+ | 3+ | unlimited | + +### 2.2 Link Scoring Algorithm (5 Factors) + +Each candidate link target receives a score (0-100): + +| Factor | Weight | Description | +|--------|--------|-------------| +| Shared attribute values | 40% | Count of SAGAttribute values shared between source and target clusters | +| Target page authority | 25% | Inbound link count of target page (from LinkMap) | +| Keyword overlap | 20% | Common keywords between source cluster and target content | +| Content recency | 10% | Newer content gets a boost (exponential decay over 6 months) | +| Link count gap | 5% | Pages with fewest inbound links get a priority boost | + +**Threshold:** Score ≥ 60 qualifies for automatic linking. Scores 40-59 are suggested for manual review. + +### 2.3 Anchor Text Rules + +| Rule | Value | +|------|-------| +| Min length | 2 words | +| Max length | 8 words | +| Grammatically natural | Must read naturally in surrounding sentence | +| No exact-match overuse | Same exact anchor cannot be used >3 times to same target URL | +| Anchor distribution per target | Primary keyword 60%, page title 30%, natural phrase 10% | +| Diversification audit | Flag if any single anchor accounts for >40% of links to a target | + +**Anchor Types:** +- `primary_keyword` — cluster primary keyword +- `page_title` — target content's title (or shortened version) +- `natural` — AI-selected contextually appropriate phrase +- `branded` — brand/site name (for homepage links) + +### 2.4 Two Operating Modes + +#### A. New Content Mode (Pipeline Integration) +Runs after Stage 4 (content generated), before Stage 7 (publish): + +1. Content generated by pipeline → link planning triggers +2. Calculate link targets using scoring algorithm +3. Insert links into `content_html` at natural positions +4. Store link plan in SAGLink records +5. 
If content is a hub → auto-generate "Related Articles" section with links to all supporting articles in cluster +6. **Mandatory check:** if content is a supporting article, verify vertical_up link to hub exists; insert if missing + +#### B. Existing Content Remediation (Audit + Fix) +For already-published content without proper internal linking: + +1. **Crawl phase:** Scan all published content for a site, extract all `<a>` tags, build LinkMap +2. **Audit analysis:** + - Orphan pages: 0 inbound internal links + - Over-linked pages: outbound > density max for page type/word count + - Under-linked pages: outbound < density min + - Missing mandatory links: supporting articles without hub uplink + - Broken links: target URL returns 4xx/5xx +3. **Recommendation generation:** Priority-scored fix recommendations with AI-suggested anchor text +4. **Batch application:** Insert missing links across multiple content records + +### 2.5 Cluster-Level Link Health Score + +Per-cluster health score (0-100) for link coverage: + +| Factor | Points | +|--------|--------| +| Hub published and linked (has outbound + inbound links) | 25 | +| All supporting articles have mandatory uplink to hub | 25 | +| At least 1 cross-cluster link from hub | 15 | +| Term pages link to hub | 15 | +| No broken links in cluster | 10 | +| Link density within range for all pages | 10 | + +Site-wide link health = average of all cluster scores. Feeds into SAG health monitoring (01G). + +--- + +## 3. DATA MODELS & APIS + +### 3.1 New Models + +#### SAGLink (new `linker` app) + +```python +class SAGLink(SiteSectorBaseModel): + """ + Represents a planned or inserted internal link between two content pages. + Tracks link type, anchor text, score, and status through lifecycle.
+ """ + blueprint = models.ForeignKey( + 'planner.SAGBlueprint', + on_delete=models.SET_NULL, + null=True, + blank=True, + related_name='sag_links' + ) + source_content = models.ForeignKey( + 'writer.Content', + on_delete=models.CASCADE, + related_name='outbound_sag_links' + ) + target_content = models.ForeignKey( + 'writer.Content', + on_delete=models.CASCADE, + related_name='inbound_sag_links' + ) + link_type = models.CharField( + max_length=20, + choices=[ + ('vertical_up', 'Vertical Upward'), + ('vertical_down', 'Vertical Downward'), + ('horizontal', 'Horizontal Sibling'), + ('cross_cluster', 'Cross-Cluster'), + ('taxonomy', 'Taxonomy Contextual'), + ('breadcrumb', 'Breadcrumb'), + ('related', 'Related Content'), + ] + ) + anchor_text = models.CharField(max_length=200) + anchor_type = models.CharField( + max_length=20, + choices=[ + ('primary_keyword', 'Primary Keyword'), + ('page_title', 'Page Title'), + ('natural', 'Natural Phrase'), + ('branded', 'Branded'), + ] + ) + placement_zone = models.CharField( + max_length=20, + choices=[ + ('in_body', 'In Body'), + ('related_section', 'Related Section'), + ('breadcrumb', 'Breadcrumb'), + ('sidebar', 'Sidebar'), + ] + ) + placement_position = models.IntegerField( + null=True, + blank=True, + help_text='Paragraph number for in_body placement' + ) + score = models.FloatField( + default=0, + help_text='Link scoring algorithm result (0-100)' + ) + status = models.CharField( + max_length=15, + choices=[ + ('planned', 'Planned'), + ('inserted', 'Inserted'), + ('verified', 'Verified'), + ('broken', 'Broken'), + ('removed', 'Removed'), + ], + default='planned' + ) + is_mandatory = models.BooleanField( + default=False, + help_text='True for vertical_up links (supporting → hub)' + ) + inserted_at = models.DateTimeField(null=True, blank=True) + + class Meta: + app_label = 'linker' + db_table = 'igny8_sag_links' +``` + +**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel + +#### SAGLinkAudit (linker app) + 
+```python +class SAGLinkAudit(SiteSectorBaseModel): + """ + Stores results of a site-wide or cluster-level link audit. + """ + blueprint = models.ForeignKey( + 'planner.SAGBlueprint', + on_delete=models.SET_NULL, + null=True, + blank=True, + related_name='link_audits' + ) + audit_date = models.DateTimeField(auto_now_add=True) + total_links = models.IntegerField(default=0) + missing_mandatory = models.IntegerField(default=0) + orphan_pages = models.IntegerField(default=0) + broken_links = models.IntegerField(default=0) + over_linked_pages = models.IntegerField(default=0) + under_linked_pages = models.IntegerField(default=0) + cluster_scores = models.JSONField( + default=dict, + help_text='{cluster_id: {score, missing, issues[]}}' + ) + recommendations = models.JSONField( + default=list, + help_text='[{content_id, action, link_type, target_id, anchor_suggestion, priority}]' + ) + overall_health_score = models.FloatField( + default=0, + help_text='Average of cluster scores (0-100)' + ) + + class Meta: + app_label = 'linker' + db_table = 'igny8_sag_link_audits' +``` + +**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel + +#### LinkMap (linker app) + +```python +class LinkMap(SiteSectorBaseModel): + """ + Full link map of all internal (and external) links found in published content. + Built by crawling content_html of all published content records. 
+ """ + source_url = models.URLField() + source_content = models.ForeignKey( + 'writer.Content', + on_delete=models.SET_NULL, + null=True, + blank=True, + related_name='outbound_link_map' + ) + target_url = models.URLField() + target_content = models.ForeignKey( + 'writer.Content', + on_delete=models.SET_NULL, + null=True, + blank=True, + related_name='inbound_link_map' + ) + anchor_text = models.CharField(max_length=500) + is_internal = models.BooleanField(default=True) + is_follow = models.BooleanField(default=True) + position = models.CharField( + max_length=20, + choices=[ + ('in_content', 'In Content'), + ('navigation', 'Navigation'), + ('footer', 'Footer'), + ('sidebar', 'Sidebar'), + ], + default='in_content' + ) + last_verified = models.DateTimeField(null=True, blank=True) + status = models.CharField( + max_length=15, + choices=[ + ('active', 'Active'), + ('broken', 'Broken'), + ('removed', 'Removed'), + ], + default='active' + ) + + class Meta: + app_label = 'linker' + db_table = 'igny8_link_map' +``` + +**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel + +### 3.2 Modified Models + +#### Content (writer app) — add 4 fields + +```python +# Add to Content model: +link_plan = models.JSONField( + null=True, + blank=True, + help_text='Planned links before insertion: [{target_id, link_type, anchor, score}]' +) +links_inserted = models.BooleanField( + default=False, + help_text='Whether link plan has been applied to content_html' +) +inbound_link_count = models.IntegerField( + default=0, + help_text='Cached count of inbound internal links' +) +outbound_link_count = models.IntegerField( + default=0, + help_text='Cached count of outbound internal links' +) +``` + +### 3.3 New App Registration + +Create linker app: +- **App config:** `igny8_core/modules/linker/apps.py` with `app_label = 'linker'` +- **Add to INSTALLED_APPS** in `igny8_core/settings.py` + +### 3.4 Migration + +``` +igny8_core/migrations/XXXX_add_linker_models.py +``` + 
+**Operations:** +1. `CreateModel('SAGLink', ...)` — with indexes on source_content, target_content, link_type, status +2. `CreateModel('SAGLinkAudit', ...)` +3. `CreateModel('LinkMap', ...)` — with index on source_url, target_url +4. `AddField('Content', 'link_plan', JSONField(null=True, blank=True))` +5. `AddField('Content', 'links_inserted', BooleanField(default=False))` +6. `AddField('Content', 'inbound_link_count', IntegerField(default=0))` +7. `AddField('Content', 'outbound_link_count', IntegerField(default=0))` + +### 3.5 API Endpoints + +All endpoints under `/api/v1/linker/`: + +#### Link Management +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/v1/linker/links/?site_id=X` | List all SAGLink records with filters (link_type, status, cluster_id, source_content_id) | +| POST | `/api/v1/linker/links/plan/` | Generate link plan for a content piece. Body: `{content_id}`. Returns planned SAGLink records. | +| POST | `/api/v1/linker/links/insert/` | Insert planned links into content_html. Body: `{content_id}`. Modifies Content.content_html. | +| POST | `/api/v1/linker/links/batch-insert/` | Batch insert for multiple content. Body: `{content_ids: [int]}`. Queues Celery task. | +| GET | `/api/v1/linker/content/{id}/links/` | All inbound + outbound links for a specific content piece. | + +#### Link Audit +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/v1/linker/audit/?site_id=X` | Latest SAGLinkAudit results. | +| POST | `/api/v1/linker/audit/run/` | Trigger site-wide link audit. Body: `{site_id}`. Queues Celery task. Returns task ID. | +| GET | `/api/v1/linker/audit/recommendations/?site_id=X` | Get fix recommendations from latest audit. | +| POST | `/api/v1/linker/audit/apply/` | Apply recommended fixes in batch. Body: `{site_id, recommendation_ids: [int]}`. 
| + +#### Link Map & Health +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/v1/linker/link-map/?site_id=X` | Full LinkMap for site with pagination. | +| GET | `/api/v1/linker/orphans/?site_id=X` | List orphan pages (0 inbound internal links). | +| GET | `/api/v1/linker/health/?site_id=X` | Cluster-level link health scores. | + +**Permissions:** All endpoints use `SiteSectorModelViewSet` permission patterns. + +### 3.6 Link Planning Service + +**Location:** `igny8_core/business/link_planning.py` + +```python +class LinkPlanningService: + """ + Generates internal link plans for content based on SAG hierarchy + and scoring algorithm. + """ + + SCORE_WEIGHTS = { + 'shared_attributes': 0.40, + 'target_authority': 0.25, + 'keyword_overlap': 0.20, + 'content_recency': 0.10, + 'link_count_gap': 0.05, + } + + AUTO_LINK_THRESHOLD = 60 + REVIEW_THRESHOLD = 40 + + def plan(self, content_id): + """ + Generate link plan for a content piece. + 1. Identify content's cluster and role (hub vs supporting) + 2. Determine mandatory links (vertical_up for supporting) + 3. Score all candidate targets + 4. Select targets within density limits + 5. Generate anchor text per link + 6. Create SAGLink records with status='planned' + Returns list of planned SAGLink records. + """ + pass + + def _get_mandatory_links(self, content, cluster): + """Vertical upward: supporting → hub. 
Always added.""" + pass + + def _get_candidates(self, content, cluster, blueprint): + """Gather all potential link targets from cluster and related clusters.""" + pass + + def _score_candidate(self, source_content, target_content, source_cluster, + target_cluster, blueprint): + """Calculate 0-100 score using 5-factor algorithm.""" + pass + + def _select_within_density(self, content, scored_candidates): + """Filter candidates to stay within density limits for page type and word count.""" + pass + + def _generate_anchor_text(self, source_content, target_content, link_type): + """AI-generate contextually appropriate anchor text.""" + pass +``` + +### 3.7 Link Insertion Service + +**Location:** `igny8_core/business/link_insertion.py` + +```python +class LinkInsertionService: + """ + Inserts planned links into content_html. + Handles placement, anchor text insertion, and collision avoidance. + """ + + def insert(self, content_id): + """ + Insert all planned SAGLink records into Content.content_html. + 1. Load all SAGLinks where source_content=content_id, status='planned' + 2. Parse content_html + 3. For each link, find insertion point based on placement_zone + position + 4. Insert <a> tag with anchor text + 5. Update SAGLink status='inserted', set inserted_at + 6. Update Content.content_html, links_inserted=True, outbound_link_count + 7.
Update target Content.inbound_link_count + """ + pass + + def _find_insertion_point(self, html_tree, link): + """ + Find best insertion point in parsed HTML: + - in_body: find paragraph at placement_position, find natural spot for anchor + - related_section: append to "Related Articles" section (create if missing) + - breadcrumb: insert breadcrumb trail at top + """ + pass + + def _insert_link(self, html_tree, position, anchor_text, target_url): + """Insert <a> tag at position without breaking existing HTML.""" + pass +``` + +### 3.8 Link Audit Service + +**Location:** `igny8_core/business/link_audit.py` + +```python +class LinkAuditService: + """ + Runs site-wide link audits: builds link map, identifies issues, + generates recommendations. + """ + + def run_audit(self, site_id): + """ + Full audit: + 1. Crawl all published Content for site + 2. Extract all <a> tags, build/update LinkMap records + 3. Identify orphan pages, over/under-linked, missing mandatory, broken + 4. Calculate per-cluster health scores + 5. Generate prioritized recommendations + 6. Create SAGLinkAudit record + Returns SAGLinkAudit instance.
+ """ + pass + + def _build_link_map(self, site_id): + """Extract links from all published content_html, create LinkMap records.""" + pass + + def _find_orphans(self, site_id): + """Content with 0 inbound internal links.""" + pass + + def _check_density(self, site_id): + """Compare outbound counts against density rules per page type.""" + pass + + def _check_mandatory(self, site_id): + """Verify all supporting articles have vertical_up link to their hub.""" + pass + + def _calculate_cluster_health(self, site_id, cluster): + """Calculate 0-100 health score per cluster.""" + pass + + def _generate_recommendations(self, issues): + """Priority-scored recommendations with AI-suggested anchor text.""" + pass +``` + +### 3.9 Celery Tasks + +**Location:** `igny8_core/tasks/linker_tasks.py` + +```python +@shared_task(name='generate_link_plan') +def generate_link_plan(content_id): + """Runs after content generation, before publish. Creates SAGLink records.""" + pass + +@shared_task(name='run_link_audit') +def run_link_audit(site_id): + """Scheduled weekly or triggered manually. Full site-wide audit.""" + pass + +@shared_task(name='verify_links') +def verify_links(site_id): + """Check for broken links via HTTP status checks on LinkMap URLs.""" + pass + +@shared_task(name='rebuild_link_map') +def rebuild_link_map(site_id): + """Full crawl of published content to rebuild LinkMap from scratch.""" + pass +``` + +**Beat Schedule Additions:** + +| Task | Schedule | Notes | +|------|----------|-------| +| `run_link_audit` | Weekly (Sunday 1:00 AM) | Site-wide audit for all active sites | +| `verify_links` | Weekly (Wednesday 2:00 AM) | HTTP check all active LinkMap entries | + +--- + +## 4. IMPLEMENTATION STEPS + +### Step 1: Create Linker App +1. Create `igny8_core/modules/linker/` directory with `__init__.py` and `apps.py` +2. Add `linker` to `INSTALLED_APPS` in settings.py +3. Create models: SAGLink, SAGLinkAudit, LinkMap + +### Step 2: Migration +1. 
Create migration for 3 new models +2. Add 4 new fields to Content model (link_plan, links_inserted, inbound_link_count, outbound_link_count) +3. Run migration + +### Step 3: Services +1. Implement `LinkPlanningService` in `igny8_core/business/link_planning.py` +2. Implement `LinkInsertionService` in `igny8_core/business/link_insertion.py` +3. Implement `LinkAuditService` in `igny8_core/business/link_audit.py` + +### Step 4: Pipeline Integration +Insert link planning + insertion between Stage 4 and Stage 7: + +```python +from igny8_core.business.link_insertion import LinkInsertionService +from igny8_core.business.link_planning import LinkPlanningService + +# After content generation completes in pipeline: +def post_content_generation(content_id): + # 02G: Generate schema + SERP elements + # ... + # 02D: Plan and insert internal links + link_service = LinkPlanningService() + link_service.plan(content_id) + insertion_service = LinkInsertionService() + insertion_service.insert(content_id) +``` + +### Step 5: API Endpoints +1. Create `igny8_core/urls/linker.py` with link, audit, and health endpoints +2. Create views extending `SiteSectorModelViewSet` +3. Register URL patterns under `/api/v1/linker/` + +### Step 6: Celery Tasks +1. Implement all 4 tasks in `igny8_core/tasks/linker_tasks.py` +2. Add `run_link_audit` and `verify_links` to Celery beat schedule + +### Step 7: Serializers & Admin +1. Create DRF serializers for SAGLink, SAGLinkAudit, LinkMap +2. Register models in Django admin + +### Step 8: Credit Cost Configuration +Add to `CreditCostConfig` (billing app): + +| operation_type | default_cost | description | +|---------------|-------------|-------------| +| `link_audit` | 1 | Site-wide link audit | +| `link_generation` | 0.5 | Generate 1-5 links with AI anchor text | +| `link_audit_full` | 3-5 | Full site audit with recommendations | + +--- + +## 5.
ACCEPTANCE CRITERIA + +### Link Types +- [ ] Vertical upward link (supporting → hub) automatically inserted for all supporting articles +- [ ] Vertical downward links (hub → supporting) generated with "Related Articles" section +- [ ] Horizontal sibling links (max 2) between same-cluster supporting articles +- [ ] Cross-cluster links (max 2) between hubs sharing SAGAttribute values +- [ ] Taxonomy contextual links from term pages to all relevant cluster hubs +- [ ] Breadcrumb chain generated from SAG hierarchy for all content +- [ ] Related content section (2-3 links) generated at end of article + +### Link Scoring +- [ ] 5-factor scoring algorithm produces 0-100 scores +- [ ] Links with score ≥ 60 auto-inserted +- [ ] Links with score 40-59 suggested for manual review +- [ ] Score algorithm uses: shared attributes (40%), authority (25%), keyword overlap (20%), recency (10%), gap boost (5%) + +### Anchor Text +- [ ] Anchor text 2-8 words, grammatically natural +- [ ] Same exact anchor not used >3 times to same target +- [ ] Distribution per target: 60% primary keyword, 30% page title, 10% natural +- [ ] Diversification audit flags if any anchor >40% of links to a target + +### Link Density +- [ ] Hub pages: 5-20 outbound links based on word count +- [ ] Blog pages: 2-12 outbound links based on word count +- [ ] Product/Service pages: 2-5 outbound links +- [ ] Term pages: 3+ outbound, unlimited for taxonomy contextual + +### Audit & Remediation +- [ ] Link audit identifies orphan pages, over/under-linked, missing mandatory, broken links +- [ ] Cluster-level health score (0-100) calculated per cluster +- [ ] Recommendations generated with priority scores and AI-suggested anchors +- [ ] Batch application of recommendations modifies content_html correctly + +### Pipeline Integration +- [ ] Link plan generated automatically after content generation in pipeline +- [ ] Links inserted before publish stage +- [ ] Mandatory vertical_up link verified before allowing publish 
+- [ ] Content.inbound_link_count and outbound_link_count updated on insert + +--- + +## 6. CLAUDE CODE INSTRUCTIONS + +### File Locations +``` +igny8_core/ +├── modules/ +│ └── linker/ +│ ├── __init__.py +│ ├── apps.py # app_label = 'linker' +│ └── models.py # SAGLink, SAGLinkAudit, LinkMap +├── business/ +│ ├── link_planning.py # LinkPlanningService +│ ├── link_insertion.py # LinkInsertionService +│ └── link_audit.py # LinkAuditService +├── tasks/ +│ └── linker_tasks.py # Celery tasks +├── urls/ +│ └── linker.py # Linker endpoints +└── migrations/ + └── XXXX_add_linker_models.py +``` + +### Conventions +- **PKs:** BigAutoField (integer) — do NOT use UUIDs +- **Table prefix:** `igny8_` on all new tables +- **App label:** `linker` (new app) +- **Celery app name:** `igny8_core` +- **URL pattern:** `/api/v1/linker/...` +- **Permissions:** Use `SiteSectorModelViewSet` permission pattern +- **Model inheritance:** SAGLink, SAGLinkAudit, and LinkMap all extend `SiteSectorBaseModel` +- **Frontend:** `.tsx` files with Zustand stores for state management + +### Cross-References +| Doc | Relationship | +|-----|-------------| +| **01A** | SAGBlueprint/SAGCluster/SAGAttribute provide hierarchy and cross-cluster relationships | +| **01E** | Pipeline integration — link planning hooks after Stage 4, before Stage 7 | +| **01G** | SAG health monitoring incorporates cluster link health scores | +| **02B** | ContentTaxonomy cluster mapping enables taxonomy contextual links | +| **02E** | External backlinks complement internal links; internal links distribute backlink authority from hubs to supporting pages | +| **02F** | Optimizer identifies internal link opportunities and feeds them to the linker | +| **03A** | WP plugin standalone mode has its own internal linking module — separate from this | +| **03C** | Theme renders breadcrumbs and related content sections generated by linker | + +### Key Decisions +1.
**New `linker` app** — Separate app because linking is a distinct domain with its own models, not tightly coupled to writer or planner +2. **SAGLink stores planned AND inserted** — Single model tracks the full lifecycle from planning through insertion to verification +3. **LinkMap is separate from SAGLink** — LinkMap stores the actual crawled link state (including non-SAG links); SAGLink stores the planned/managed links +4. **Cached counts on Content** — `inbound_link_count` and `outbound_link_count` are denormalized for fast queries; updated on insert/removal +5. **HTML parsing for insertion** — Use Python HTML parser (BeautifulSoup or lxml) for safe link insertion without corrupting content_html diff --git a/v2/V2-Execution-Docs/02E-linker-external-backlinks.md b/v2/V2-Execution-Docs/02E-linker-external-backlinks.md new file mode 100644 index 00000000..cacfb63f --- /dev/null +++ b/v2/V2-Execution-Docs/02E-linker-external-backlinks.md @@ -0,0 +1,734 @@ +# IGNY8 Phase 2: External Linker & Backlinks (02E) +## SAG-Based External Backlink Campaign Engine + +**Document Version:** 1.0 +**Date:** 2026-03-23 +**Phase:** IGNY8 Phase 2 — Feature Expansion +**Status:** Build Ready +**Source of Truth:** Codebase at `/data/app/igny8/` +**Audience:** Claude Code, Backend Developers, Architects + +--- + +## 1. CURRENT STATE + +### External Linking Today +There is **no** backlink management in IGNY8. No external API integrations exist for link building platforms. No campaign generation, tracking, or KPI monitoring. Backlink building is entirely manual and external to the platform. 
+ +### What Exists +- `SAGBlueprint` and `SAGCluster` models (01A) — provide the hierarchy and cluster assignments for target page identification +- `Keywords` model (planner app) — provides search volume data for tier assignment +- `Content` model with `content_type` and `content_structure` — classifies pages for tiering +- `GSCMetricsCache` (02C) — provides organic traffic and impression data for KPI tracking +- `SAGLink` and `LinkMap` models (02D) — internal link data complements external strategy +- The `linker` app (02D) — provides app namespace for related models + +### What Does Not Exist +- No SAGCampaign model, SAGBacklink model, or CampaignKPISnapshot model +- No page tier assignment system +- No country-specific strategy profiles +- No marketplace API integrations (FatGrid, PRNews.io, etc.) +- No campaign generation algorithm +- No anchor text planning +- No quality scoring for backlink opportunities +- No tipping point detection +- No dead link monitoring for placed backlinks + +--- + +## 2. WHAT TO BUILD + +### Overview +Build a SAG-based backlink campaign engine that generates country-specific link-building campaigns targeting hub pages. The system leverages the SAG hierarchy to focus backlinks on high-value pages (T1-T3) and lets internal linking (02D) distribute authority to supporting content. + +### 2.1 Hub-Only Strategy (Core Principle) + +Backlinks target ONLY T1-T3 pages (homepage + cluster hubs + key service/product pages). SAG internal linking (02D) distributes authority from hubs downstream to 70+ supporting pages per cluster. + +**Budget Allocation:** +- 70-85% to T1-T3 (homepage, top hubs, products/services) +- 15-30% to T4-T5 (authority magnets: guides, tools, supporting articles) +- 0% to term/taxonomy pages (get authority via internal links) + +Typically 20-30 target pages per site. 
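The hub-only budget rule above can be checked mechanically before a campaign is saved. The following is a minimal sketch, not part of the spec: `split_budget` is a hypothetical helper, and the example allocation reuses the PK `tier_allocation` values listed for `CampaignGenerationService.COUNTRY_PROFILES` in section 3.4. It rejects any allocation that assigns budget outside T1-T5 (for example to term/taxonomy pages) or puts the T1-T3 share outside 70-85%, then splits a total budget by tier using `Decimal` so money values are not subject to float rounding.

```python
from decimal import Decimal

# Tiers that may receive backlink budget under the hub-only strategy (2.1).
HUB_TIERS = ("T1", "T2", "T3")     # homepage, hubs, key products/services
MAGNET_TIERS = ("T4", "T5")        # supporting articles, authority magnets
HUB_SHARE_RANGE = (Decimal("0.70"), Decimal("0.85"))


def split_budget(total: Decimal, allocation: dict[str, Decimal]) -> dict[str, Decimal]:
    """Hypothetical helper: validate a tier allocation, then split `total` by tier."""
    unknown = set(allocation) - set(HUB_TIERS) - set(MAGNET_TIERS)
    if unknown:
        # Term/taxonomy pages (or any other page group) must get 0% of the budget.
        raise ValueError(f"budget assigned to non-target tiers: {sorted(unknown)}")
    if sum(allocation.values()) != 1:
        raise ValueError("tier allocation must sum to 100%")
    hub_share = sum(allocation.get(t, Decimal(0)) for t in HUB_TIERS)
    low, high = HUB_SHARE_RANGE
    # With a 100% total, a T1-T3 share of 70-85% implies T4-T5 receives 15-30%.
    if not low <= hub_share <= high:
        raise ValueError(f"T1-T3 share {hub_share} is outside 70-85%")
    return {
        tier: (total * share).quantize(Decimal("0.01"))
        for tier, share in allocation.items()
    }


# PK tier_allocation as listed in CampaignGenerationService.COUNTRY_PROFILES (3.4).
pk_allocation = {
    "T1": Decimal("0.25"), "T2": Decimal("0.35"), "T3": Decimal("0.25"),
    "T4": Decimal("0.10"), "T5": Decimal("0.05"),
}
pk_budget = split_budget(Decimal("5000"), pk_allocation)
```

Validating the T1-T3 share alone is enough here because the total is forced to 100%, so the T4-T5 bound follows automatically.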
+ +### 2.2 Page Tier Assignment + +| Tier | Pages | Links/Page Target | Description | +|------|-------|-------------------|-------------| +| **T1** | Homepage (1 page) | 10-30 | Brand authority anchor | +| **T2** | Top 40% hubs by search volume | 5-15 | Primary money pages | +| **T3** | Remaining hubs + products/services | 3-10 | Supporting money pages | +| **T4** | Supporting blog articles | 1-4 | Content authority | +| **T5** | Authority magnets (guides, tools) | 2-6 | Link bait pages | + +**Tier Assignment Algorithm:** +1. Load SAGBlueprint → identify all published hub pages +2. Sort hubs by total cluster keyword search volume (from Keywords model) +3. Top 40% of hubs = T2, remaining hubs = T3 +4. Products/services pages = T3 +5. Supporting blog articles = T4 +6. Content with `content_structure` in (`guide`, `comparison`, `listicle`) and high word count = T5 + +### 2.3 Country-Specific Strategy Profiles + +Four pre-built profiles with different timelines, budgets, and quality thresholds: + +| Parameter | Pakistan (PK) | Canada (CA) | UK | USA | +|-----------|--------------|-------------|-----|------| +| Timeline | 8 months | 12 months | 14 months | 18 months | +| Budget range | $2-5K | $3-7K | $3-9K | $5-13K | +| Target DR | 25-30 | 35-40 | 35-40 | 40-45 | +| Quality threshold | ≥5/11 | ≥6/11 | ≥6/11 | ≥7/11 | +| Exact match anchor | 5-10% | 3-7% | 3-7% | 2-5% | +| Velocity phases | ramp→peak→cruise→maintenance | Same 4 phases | Same | Same | + +**Anchor Text Mix by Country:** + +| Anchor Type | PK | CA | UK | USA | +|-------------|-----|-----|-----|------| +| Branded | 30-35% | 35-40% | 35-40% | 35-45% | +| Naked URL | 15-20% | 15-20% | 15-20% | 15-20% | +| Generic | 15-20% | 15-20% | 15-20% | 15-20% | +| Partial Match | 15-20% | 12-18% | 12-18% | 10-15% | +| Exact Match | 5-10% | 3-7% | 3-7% | 2-5% | +| LSI/Topical | 5-10% | 5-10% | 5-10% | 5-8% | +| Brand+Keyword | — | 3-5% | 3-5% | 3-5% | + +### 2.4 Campaign Generation Algorithm + +1. 
Load SAGBlueprint → identify all published hub pages +2. Assign tiers based on search volume data (from Keywords model) +3. Select country profile → calculate `referring_domains_needed` +4. `links_per_tier = referring_domains_needed × tier_allocation_%` +5. `budget_estimate = links × cost_per_link × link_mix_%` +6. Distribute across monthly velocity curve (ramp → peak → cruise → maintenance) +7. Assign pages to months by priority (keyword difficulty, search volume, commercial value) +8. Pre-generate 3 anchor text variants per page per anchor type +9. Set quality requirements per country threshold + +### 2.5 Marketplace Integrations + +#### FatGrid API +- **Base URL:** `https://api.fatgrid.com/api/public` +- **Auth:** API key in request header +- **Endpoints:** + - Domain Lookup — DR, DA, traffic, niche for a domain + - Marketplace Browse — filterable by DR, traffic, price, niche + - Bulk Domain Lookup — up to 1,000 domains per request +- **15+ Aggregated Marketplaces:** Collaborator.pro, PRNews.io, Adsy.com, WhitePress.com, Bazoom.com, MeUp.com, etc. +- **Usage:** IGNY8 proxies FatGrid API calls to find and filter link placement opportunities + +#### PR Distribution (3 Tiers) +| Tier | Provider | Price Range | Reach | +|------|----------|-------------|-------| +| PR Basic | EIN Presswire | $99-499/release | AP News, Bloomberg, 115+ US TV | +| PR Premium | PRNews.io | $500-5K/placement | Yahoo Finance, Forbes-tier publications | +| PR Enterprise | Linking News (white-label) | $500-2K/distribution | ABC, NBC, FOX, Yahoo, Bloomberg | + +### 2.6 Quality Scoring (Per Backlink Opportunity) + +**Auto-Checkable Factors (7 points):** +1. Organic traffic >500/month +2. Domain Rating / Domain Authority > country threshold +3. Indexed in Google +4. Not on known PBN/spam farm blocklist +5. Traffic trend stable or growing +6. Niche relevance to content topic +7. Dofollow link confirmed + +**Manual Review Factors (4 points):** +8. Outbound links <100 on linking page +9. 
Niche relevance (editorial check) +10. Editorial quality of surrounding content +11. Dofollow confirmed (manual verification) + +**Total: 0-11 points. Country thresholds:** PK ≥5, CA ≥6, UK ≥6, USA ≥7 + +### 2.7 Authority Tipping Point Detection + +Trigger when 3 or more of these 5 indicators are true at the same time: +- Domain Rating has reached the country's target DR +- Pages with GSC impressions >100 but 0 SAGBacklinks start getting organic clicks +- Unlinked pages rank on page 2-3 (positions 11-30) +- New content ranks passively without dedicated backlinks +- Keywords in top 10 exceed threshold: PK 10+, UK/CA 15+, USA 20+ + +**When triggered:** recommend reducing link-building velocity, shifting budget to content creation, and entering maintenance mode. + +### 2.8 Dead Link Monitoring + +- Periodic HTTP checks on all placed backlinks (status=`live`) +- Status tracking: `live` → `dead` (404/403/removed) → `replaced` +- Impact scoring: estimate authority loss based on source DR and link type +- Auto-generate replacement recommendations +- Reserve 10-15% of the monthly budget for replacements + +--- + +## 3. DATA MODELS & APIS + +### 3.1 New Models + +All models live in the `linker` app (same app as 02D). + +#### SAGCampaign (linker app) + +```python +class SAGCampaign(SiteSectorBaseModel): + """ + Backlink campaign generated from SAG data + country profile.
+ """ + blueprint = models.ForeignKey( + 'planner.SAGBlueprint', + on_delete=models.SET_NULL, + null=True, + blank=True, + related_name='backlink_campaigns' + ) + country_code = models.CharField( + max_length=3, + help_text='PK, CA, UK, or USA' + ) + status = models.CharField( + max_length=15, + choices=[ + ('draft', 'Draft'), + ('active', 'Active'), + ('paused', 'Paused'), + ('completed', 'Completed'), + ], + default='draft' + ) + tier_assignments = models.JSONField( + default=dict, + help_text='{content_id: tier_level (T1/T2/T3/T4/T5)}' + ) + total_links_target = models.IntegerField(default=0) + budget_estimate_min = models.DecimalField( + max_digits=10, decimal_places=2, default=0 + ) + budget_estimate_max = models.DecimalField( + max_digits=10, decimal_places=2, default=0 + ) + timeline_months = models.IntegerField(default=12) + monthly_plan = models.JSONField( + default=list, + help_text='[{month, links_target, pages[], budget}]' + ) + anchor_text_plan = models.JSONField( + default=dict, + help_text='{content_id: [{text, type, allocated}]}' + ) + country_profile = models.JSONField( + default=dict, + help_text='Full profile snapshot at campaign creation' + ) + kpi_data = models.JSONField( + default=dict, + help_text='Monthly KPI snapshots summary' + ) + started_at = models.DateTimeField(null=True, blank=True) + + class Meta: + app_label = 'linker' + db_table = 'igny8_sag_campaigns' +``` + +**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel + +#### SAGBacklink (linker app) + +```python +class SAGBacklink(SiteSectorBaseModel): + """ + Individual backlink record within a campaign. + Tracks from planning through placement to ongoing monitoring. 
+ """ + campaign = models.ForeignKey( + 'linker.SAGCampaign', + on_delete=models.CASCADE, + related_name='backlinks' + ) + blueprint = models.ForeignKey( + 'planner.SAGBlueprint', + on_delete=models.SET_NULL, + null=True, + blank=True + ) + target_content = models.ForeignKey( + 'writer.Content', + on_delete=models.CASCADE, + related_name='backlinks' + ) + target_url = models.URLField() + target_tier = models.CharField( + max_length=3, + choices=[ + ('T1', 'Tier 1 — Homepage'), + ('T2', 'Tier 2 — Top Hubs'), + ('T3', 'Tier 3 — Other Hubs/Products'), + ('T4', 'Tier 4 — Supporting Articles'), + ('T5', 'Tier 5 — Authority Magnets'), + ] + ) + source_url = models.URLField( + blank=True, + default='', + help_text='May not be known at planning stage' + ) + source_domain = models.CharField(max_length=255, blank=True, default='') + source_dr = models.IntegerField(null=True, blank=True) + source_traffic = models.IntegerField(null=True, blank=True) + anchor_text = models.CharField(max_length=200) + anchor_type = models.CharField( + max_length=20, + choices=[ + ('branded', 'Branded'), + ('naked_url', 'Naked URL'), + ('generic', 'Generic'), + ('partial_match', 'Partial Match'), + ('exact_match', 'Exact Match'), + ('lsi', 'LSI/Topical'), + ('brand_keyword', 'Brand + Keyword'), + ] + ) + link_type = models.CharField( + max_length=20, + choices=[ + ('guest_post', 'Guest Post'), + ('niche_edit', 'Niche Edit'), + ('pr_distribution', 'PR Distribution'), + ('directory', 'Directory'), + ('resource_page', 'Resource Page'), + ('broken_link', 'Broken Link Building'), + ('haro', 'HARO/Journalist Query'), + ] + ) + marketplace = models.CharField( + max_length=20, + blank=True, + default='', + help_text='fatgrid, prnews, ein, linking_news, manual' + ) + cost = models.DecimalField( + max_digits=10, decimal_places=2, null=True, blank=True + ) + quality_score = models.FloatField( + null=True, + blank=True, + help_text='0-11 quality score' + ) + country_relevant = 
models.BooleanField(default=True) + date_ordered = models.DateField(null=True, blank=True) + date_live = models.DateField(null=True, blank=True) + date_last_checked = models.DateField(null=True, blank=True) + status = models.CharField( + max_length=15, + choices=[ + ('planned', 'Planned'), + ('ordered', 'Ordered'), + ('live', 'Live'), + ('dead', 'Dead'), + ('replaced', 'Replaced'), + ('rejected', 'Rejected'), + ], + default='planned' + ) + notes = models.TextField(blank=True, default='') + + class Meta: + app_label = 'linker' + db_table = 'igny8_sag_backlinks' +``` + +**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel + +#### CampaignKPISnapshot (linker app) + +```python +class CampaignKPISnapshot(SiteSectorBaseModel): + """ + Monthly KPI snapshot for a backlink campaign. + Tracks domain metrics, link counts, keyword rankings, and tipping point indicators. + """ + campaign = models.ForeignKey( + 'linker.SAGCampaign', + on_delete=models.CASCADE, + related_name='kpi_snapshots' + ) + snapshot_date = models.DateField() + dr = models.FloatField(null=True, blank=True, help_text='Domain Rating') + da = models.FloatField(null=True, blank=True, help_text='Domain Authority') + referring_domains = models.IntegerField(default=0) + new_links_this_month = models.IntegerField(default=0) + links_by_tier = models.JSONField(default=dict, help_text='{T1: count, T2: count, ...}') + cost_this_month = models.DecimalField(max_digits=10, decimal_places=2, default=0) + cost_per_link_avg = models.DecimalField(max_digits=10, decimal_places=2, default=0) + keywords_top_10 = models.IntegerField(default=0) + keywords_top_20 = models.IntegerField(default=0) + keywords_top_50 = models.IntegerField(default=0) + organic_traffic = models.IntegerField( + null=True, blank=True, + help_text='From GSC via 02C GSCMetricsCache' + ) + impressions = models.IntegerField( + null=True, blank=True, + help_text='From GSC via 02C GSCMetricsCache' + ) + pages_ranking_without_backlinks = 
models.IntegerField(default=0) + tipping_point_indicators = models.JSONField( + default=dict, + help_text='{indicator_name: True/False}' + ) + tipping_point_triggered = models.BooleanField(default=False) + + class Meta: + app_label = 'linker' + db_table = 'igny8_campaign_kpi_snapshots' +``` + +**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel + +### 3.2 Migration + +``` +igny8_core/migrations/XXXX_add_backlink_models.py +``` + +**Operations:** +1. `CreateModel('SAGCampaign', ...)` — with index on country_code, status +2. `CreateModel('SAGBacklink', ...)` — with indexes on campaign, status, target_content +3. `CreateModel('CampaignKPISnapshot', ...)` — with index on campaign, snapshot_date + +### 3.3 API Endpoints + +All endpoints under `/api/v1/linker/` (extending the linker URL namespace from 02D): + +#### Campaign Management +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/v1/linker/campaigns/?site_id=X` | List backlink campaigns | +| POST | `/api/v1/linker/campaigns/generate/` | AI-generate campaign from blueprint + country. Body: `{site_id, blueprint_id, country_code}`. | +| GET | `/api/v1/linker/campaigns/{id}/` | Campaign detail with monthly plan | +| PUT | `/api/v1/linker/campaigns/{id}/` | Update campaign (adjust plan, budget) | +| POST | `/api/v1/linker/campaigns/{id}/activate/` | Start campaign (set status=active, started_at) | +| POST | `/api/v1/linker/campaigns/{id}/pause/` | Pause active campaign | + +#### KPI Tracking +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/v1/linker/campaigns/{id}/kpi/` | KPI snapshot timeline for campaign | +| POST | `/api/v1/linker/campaigns/{id}/kpi/snapshot/` | Record monthly KPI. Body: `{dr, da, referring_domains, ...}`. 
| +| GET | `/api/v1/linker/tipping-point/?campaign_id=X` | Tipping point analysis — current indicator state | + +#### Backlink Records +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/v1/linker/backlinks/?campaign_id=X` | List backlinks with filters (status, tier, anchor_type) | +| POST | `/api/v1/linker/backlinks/` | Add backlink record. Body: `{campaign_id, target_content_id, ...}`. | +| PUT | `/api/v1/linker/backlinks/{id}/` | Update backlink status/details | +| POST | `/api/v1/linker/backlinks/check/` | Trigger dead link check for campaign. Body: `{campaign_id}`. | + +#### Marketplace Proxy +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/v1/linker/marketplace/search/` | FatGrid marketplace search. Query params: `dr_min, traffic_min, price_max, niche`. | +| GET | `/api/v1/linker/marketplace/domain/{domain}/` | FatGrid domain lookup — DR, DA, traffic, niche. | + +**Permissions:** All endpoints use `SiteSectorModelViewSet` permission patterns. + +### 3.4 Campaign Generation Service + +**Location:** `igny8_core/business/campaign_generation.py` + +```python +class CampaignGenerationService: + """ + Generates a backlink campaign from SAG data + country profile. + """ + + COUNTRY_PROFILES = { + 'PK': { + 'timeline_months': 8, + 'budget_min': 2000, 'budget_max': 5000, + 'target_dr': 25, 'quality_threshold': 5, + 'exact_match_max': 0.10, + 'tier_allocation': {'T1': 0.25, 'T2': 0.35, 'T3': 0.25, 'T4': 0.10, 'T5': 0.05}, + # ... full profile + }, + 'CA': { ... }, + 'UK': { ... }, + 'USA': { ... }, + } + + def generate(self, site_id, blueprint_id, country_code): + """ + 1. Load blueprint → published hub pages + 2. Assign tiers by search volume + 3. Select country profile + 4. Calculate total links needed + budget + 5. Build monthly plan with velocity curve + 6. Pre-generate anchor text variants + 7. Create SAGCampaign record + Returns SAGCampaign instance. 
+ """ + pass + + def _assign_tiers(self, blueprint, published_content): + """Sort hubs by search volume, assign T1-T5.""" + pass + + def _build_monthly_plan(self, tier_assignments, country_profile): + """Distribute links across months using velocity curve.""" + pass + + def _generate_anchor_plans(self, tier_assignments, country_profile): + """Pre-generate 3 anchor text variants per page per type.""" + pass +``` + +### 3.5 Quality Scoring Service + +**Location:** `igny8_core/business/backlink_quality.py` + +```python +class BacklinkQualityService: + """ + Scores backlink opportunities on 0-11 scale. + """ + + def score(self, domain_data, country_code): + """ + Auto-check 7 factors: + 1. Organic traffic >500/mo + 2. DR/DA > country threshold + 3. Indexed in Google + 4. Not on blocklist + 5. Traffic trend stable/growing + 6. Niche relevance + 7. Dofollow confirmed + Returns (score, breakdown). + """ + pass + + def meets_threshold(self, score, country_code): + """Check if score meets country minimum.""" + pass +``` + +### 3.6 Tipping Point Detector + +**Location:** `igny8_core/business/tipping_point.py` + +```python +class TippingPointDetector: + """ + Monitors campaign KPI snapshots for authority tipping point indicators. + """ + + def evaluate(self, campaign_id): + """ + Check 5 indicators against latest KPI data + GSC data. + If 3+ triggered, set tipping_point_triggered=True. + Returns {indicators: {name: bool}, triggered: bool, recommendation: str} + """ + pass +``` + +### 3.7 FatGrid API Client + +**Location:** `igny8_core/integration/fatgrid_client.py` + +```python +class FatGridClient: + """ + Client for FatGrid marketplace API. + API key stored in SiteIntegration or account-level settings. 
+ """ + + BASE_URL = 'https://api.fatgrid.com/api/public' + + def __init__(self, api_key): + self.api_key = api_key + + def search_marketplace(self, dr_min=None, traffic_min=None, + price_max=None, niche=None, limit=50): + """Browse marketplace with filters.""" + pass + + def lookup_domain(self, domain): + """Get DR, DA, traffic, niche for a domain.""" + pass + + def bulk_lookup(self, domains): + """Lookup up to 1,000 domains.""" + pass +``` + +### 3.8 Celery Tasks + +**Location:** `igny8_core/tasks/backlink_tasks.py` + +```python +@shared_task(name='check_backlink_status') +def check_backlink_status(campaign_id): + """Weekly HTTP check on all 'live' backlinks. Updates status to 'dead' if 4xx/5xx.""" + pass + +@shared_task(name='record_kpi_snapshot') +def record_kpi_snapshot(campaign_id): + """Monthly KPI recording: pull GSC data, calculate keyword rankings.""" + pass + +@shared_task(name='evaluate_tipping_point') +def evaluate_tipping_point(campaign_id): + """Monthly, after KPI snapshot. Check tipping point indicators.""" + pass + +@shared_task(name='generate_replacement_recommendations') +def generate_replacement_recommendations(campaign_id): + """Triggered when dead links detected. Generate replacement suggestions.""" + pass +``` + +**Beat Schedule Additions:** + +| Task | Schedule | Notes | +|------|----------|-------| +| `check_backlink_status` | Weekly (Thursday 3:00 AM) | HTTP check all live backlinks | +| `record_kpi_snapshot` | Monthly (1st of month, 5:00 AM) | Record KPI for all active campaigns | +| `evaluate_tipping_point` | Monthly (1st of month, 6:00 AM) | After KPI snapshot | + +--- + +## 4. IMPLEMENTATION STEPS + +### Step 1: Models +1. Create SAGCampaign, SAGBacklink, CampaignKPISnapshot in linker app (extends 02D) +2. Run migration + +### Step 2: Country Profiles +1. Define 4 country profiles (PK, CA, UK, USA) as configuration data +2. Store as constants in `CampaignGenerationService` or as a seed data migration + +### Step 3: Services +1. 
Implement `CampaignGenerationService` in `igny8_core/business/campaign_generation.py` +2. Implement `BacklinkQualityService` in `igny8_core/business/backlink_quality.py` +3. Implement `TippingPointDetector` in `igny8_core/business/tipping_point.py` +4. Implement `FatGridClient` in `igny8_core/integration/fatgrid_client.py` + +### Step 4: API Endpoints +1. Add campaign, backlink, KPI, and marketplace endpoints to `igny8_core/urls/linker.py` +2. Create views: `CampaignViewSet`, `BacklinkViewSet`, `KPISnapshotView`, `TippingPointView` +3. Create `MarketplaceSearchView`, `MarketplaceDomainLookupView` (FatGrid proxy) + +### Step 5: Celery Tasks +1. Implement 4 tasks in `igny8_core/tasks/backlink_tasks.py` +2. Add to Celery beat schedule (weekly backlink check, monthly KPI + tipping point) + +### Step 6: Serializers & Admin +1. Create DRF serializers for SAGCampaign, SAGBacklink, CampaignKPISnapshot +2. Register models in Django admin + +### Step 7: Credit Cost Configuration +Add to `CreditCostConfig` (billing app): + +| operation_type | default_cost | description | +|---------------|-------------|-------------| +| `campaign_generation` | 2 | AI-generate campaign from blueprint + country | +| `backlink_audit` | 1 | Dead link check for campaign | +| `kpi_snapshot` | 0.5 | Monthly KPI recording | +| `dead_link_detection` | 1 | Dead link detection + replacement recommendations | + +**Note:** FatGrid API calls consume FatGrid credits (separate billing — not IGNY8 credits). Store FatGrid API key in account-level settings or SiteIntegration. + +--- + +## 5. 
ACCEPTANCE CRITERIA + +### Campaign Generation +- [ ] Campaign generated from SAGBlueprint + country code +- [ ] Page tiers assigned based on search volume from Keywords model +- [ ] Budget estimates calculated from country profile + tier allocation +- [ ] Monthly plan distributed across velocity curve (ramp → peak → cruise → maintenance) +- [ ] Anchor text plan pre-generated with 3 variants per page per anchor type +- [ ] Country profile snapshot stored in campaign record + +### Country Profiles +- [ ] All 4 profiles (PK, CA, UK, USA) defined with correct parameters +- [ ] Anchor text mix enforced per country (branded %, exact match %, etc.) +- [ ] Quality thresholds enforced per country (PK ≥5, CA ≥6, UK ≥6, USA ≥7) +- [ ] Timeline and budget ranges match specification + +### Quality Scoring +- [ ] 7 auto-checkable factors scored per backlink opportunity +- [ ] Country-specific threshold enforcement +- [ ] Quality score stored on SAGBacklink record + +### Tipping Point +- [ ] 5 indicators monitored from KPI data + GSC data +- [ ] Triggered when 3+ indicators simultaneous +- [ ] Recommendation generated when triggered (reduce velocity, shift to content) +- [ ] tipping_point_triggered flag set on KPI snapshot + +### Dead Link Monitoring +- [ ] Weekly HTTP checks on all live backlinks +- [ ] Status transitions: live → dead, with date tracking +- [ ] Replacement recommendations generated for dead links +- [ ] 10-15% budget reserve recommended in campaign plan + +### Marketplace Integration +- [ ] FatGrid marketplace search proxied through IGNY8 API +- [ ] FatGrid domain lookup returns DR, DA, traffic, niche +- [ ] API key stored securely (not exposed to frontend) + +--- + +## 6. 
CLAUDE CODE INSTRUCTIONS + +### File Locations +``` +igny8_core/ +├── modules/ +│ └── linker/ +│ └── models.py # Add SAGCampaign, SAGBacklink, CampaignKPISnapshot +├── business/ +│ ├── campaign_generation.py # CampaignGenerationService +│ ├── backlink_quality.py # BacklinkQualityService +│ └── tipping_point.py # TippingPointDetector +├── integration/ +│ └── fatgrid_client.py # FatGridClient +├── tasks/ +│ └── backlink_tasks.py # Celery tasks +├── urls/ +│ └── linker.py # Extend with campaign/backlink endpoints +└── migrations/ + └── XXXX_add_backlink_models.py +``` + +### Conventions +- **PKs:** BigAutoField (integer) — do NOT use UUIDs +- **Table prefix:** `igny8_` on all new tables +- **App label:** `linker` (same app as 02D) +- **Celery app name:** `igny8_core` +- **URL pattern:** `/api/v1/linker/campaigns/...`, `/api/v1/linker/backlinks/...`, `/api/v1/linker/marketplace/...` +- **Permissions:** Use `SiteSectorModelViewSet` permission pattern +- **Model inheritance:** All new models extend `SiteSectorBaseModel` +- **Frontend:** `.tsx` files with Zustand stores + +### Cross-References +| Doc | Relationship | +|-----|-------------| +| **02D** | Internal linking distributes authority from hub pages (backlink targets) to supporting content | +| **02C** | GSC data feeds KPIs (organic traffic, impressions, keyword positions) | +| **01A** | SAGBlueprint provides cluster hierarchy for tier assignment | +| **04A** | Managed services package includes backlink campaign management for clients | +| **01G** | SAG health monitoring can incorporate backlink campaign progress | + +### Key Decisions +1. **Same `linker` app as 02D** — Internal and external linking share the same app since they're conceptually related; campaigns reference the same Content and Blueprint models +2. **Country profiles as code constants** — Stored in CampaignGenerationService class, not database, to prevent accidental modification; versioned with code +3. 
**FatGrid proxy** — Never expose FatGrid API key to frontend; all marketplace calls routed through IGNY8 backend +4. **KPI snapshots are manual + automatic** — Monthly auto-recording via Celery, but manual recording also supported for ad-hoc updates +5. **Separate billing for marketplace** — FatGrid credits are external; IGNY8 credits cover campaign generation, audits, and AI-powered anchor text generation diff --git a/v2/V2-Execution-Docs/02F-optimizer.md b/v2/V2-Execution-Docs/02F-optimizer.md new file mode 100644 index 00000000..d1b96356 --- /dev/null +++ b/v2/V2-Execution-Docs/02F-optimizer.md @@ -0,0 +1,602 @@ +# IGNY8 Phase 2: Content Optimizer (02F) +## Cluster-Aligned Content Optimization Engine + +**Document Version:** 1.0 +**Date:** 2026-03-23 +**Phase:** IGNY8 Phase 2 — Feature Expansion +**Status:** Build Ready +**Source of Truth:** Codebase at `/data/app/igny8/` +**Audience:** Claude Code, Backend Developers, Architects + +--- + +## 1. CURRENT STATE + +### Optimization App Today +The `optimization` Django app exists in `INSTALLED_APPS` but is **inactive** (behind feature flag). 
The following exist: + +- **`OptimizationTask` model** — exists with minimal fields (basic task tracking only) +- **`optimize_content` AI function** — registered in `igny8_core/ai/registry.py` as one of the 7 registered functions, but only does basic content rewriting without cluster awareness, keyword coverage analysis, or scoring +- **`optimization` app label** — app exists at `igny8_core/modules/optimization/` + +### What Does Not Exist +- No cluster-alignment during optimization +- No keyword coverage analysis against cluster keyword sets +- No heading restructure logic +- No intent-based content rewrite +- No schema gap detection +- No before/after scoring system (0-100) +- No batch optimization +- No integration with SAG data (01A) or taxonomy terms (02B) + +### Foundation Available +- `Clusters` model (app_label=`planner`, db_table=`igny8_clusters`) with cluster keywords +- `Keywords` model (app_label=`planner`, db_table=`igny8_keywords`) linked to clusters +- `Content.schema_markup` JSONField — used by 02G for JSON-LD +- `Content.content_type` and `Content.content_structure` — routing context +- `Content.structured_data` JSONField (added by 02A) +- `ContentTaxonomy` cluster mapping (added by 02B) with `mapping_confidence` +- `GSCMetricsCache` (added by 02C) — position data identifies pages needing optimization +- `SchemaValidationService` (added by 02G) — schema gap detection reuse +- `BaseAIFunction` with `validate()`, `prepare()`, `build_prompt()`, `parse_response()`, `save_output()` + +--- + +## 2. WHAT TO BUILD + +### Overview +Extend the existing `OptimizationTask` model and `optimize_content` AI function into a full cluster-aligned optimization engine. The system analyzes content against its cluster's keyword set, scores quality on a 0-100 scale, and produces optimized content with tracked before/after metrics. 
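The core step of analyzing content against its cluster's keyword set can be sketched as follows. This version assumes exact phrase matching only; the stemmed and semantic matching planned for the analyzer is not attempted. The helper names `extract_text` and `keyword_coverage` are hypothetical. The return shape loosely mirrors the `KeywordCoverageAnalyzer.analyze()` docstring, with a raw `occurrences` count standing in for density.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes from an HTML fragment, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(content_html: str) -> str:
    """Hypothetical helper: strip HTML tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(content_html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def keyword_coverage(content_html: str, keywords: list[str]) -> dict:
    """Exact-phrase coverage only; stemming and semantic matching are omitted."""
    text = extract_text(content_html).lower()
    results = []
    for kw in keywords:
        # Word boundaries keep "seo" from matching inside "seoul".
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        hits = len(re.findall(pattern, text))
        results.append({
            "keyword": kw,
            "occurrences": hits,
            "status": "present" if hits else "missing",
        })
    covered = sum(1 for r in results if r["status"] == "present")
    total = len(keywords)
    return {
        "total_keywords": total,
        "covered": covered,
        "missing": total - covered,
        "coverage_pct": round(100.0 * covered / total, 1) if total else 0.0,
        "keywords": results,
    }
```

For example, `<h2>Running shoes</h2><p>Best running shoes for flat feet.</p>` checked against `["running shoes", "flat feet", "trail shoes"]` covers 2 of 3 keywords (66.7%). Text nodes are joined with spaces, so markup inside a word (such as `<b>run</b>ning`) splits it. A production version would need to handle inline tags more carefully.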
+ +### 2.1 Cluster Matching (Auto-Assign Optimization Context) + +When content has no cluster assignment, the optimizer auto-detects the best-fit cluster: + +**Scoring Algorithm:** +- Keyword overlap (40%): count of cluster keywords found in content title + headings + body +- Semantic similarity (40%): AI-scored relevance between content topic and cluster theme +- Title match (20%): similarity between content title and cluster name/keywords + +**Thresholds:** +- Confidence ≥ 0.6 → auto-assign cluster +- Confidence < 0.6 → flag for manual review, suggest top 3 candidates + +This reuses the same scoring pattern as `ClusterMappingService` from 02B. + +### 2.2 Keyword Coverage Analysis + +For content with an assigned cluster: + +1. Load all `Keywords` records belonging to that cluster +2. Scan `content_html` for each keyword: exact match, partial match (stemmed), semantic presence +3. Report per keyword: `{keyword, target_density, current_density, status: present|missing|low_density}` +4. Coverage targets: + - Hub content (`cluster_hub`): 70%+ of cluster keywords covered + - Supporting articles: 40%+ of cluster keywords covered + - Product/service pages: 30%+ (focused on commercial keywords) + +### 2.3 Heading Restructure + +Analyze H1/H2/H3 hierarchy for SEO best practices: + +| Check | Rule | Fix | +|-------|------|-----| +| Single H1 | Content must have exactly one H1 | Merge or demote extra H1s | +| H2 keyword coverage | H2s should contain target keywords from cluster | AI rewrites H2s with keyword incorporation | +| Logical hierarchy | No skipped levels (H1 → H3 without H2) | Insert missing levels | +| H2 count | Minimum 3 H2s for content >1000 words | AI suggests additional H2 sections | +| Missing keyword themes | Cluster keywords not represented in any heading | AI suggests new H2/H3 sections for missing themes | + +### 2.4 Content Rewrite (Intent-Aligned) + +**Intent Classification:** +- **Informational**: expand explanations, add examples, increase depth, add 
definitions +- **Commercial**: add comparison tables, pros/cons, feature highlights, trust signals +- **Transactional**: strengthen CTAs, add urgency, streamline conversion path, social proof + +**Content Adjustments:** +- Expand thin content (<500 words) to minimum viable length for the content structure +- Compress bloated content (detect and remove redundancy) +- Add missing sections identified by keyword coverage analysis +- Maintain existing tone and style while improving SEO alignment + +### 2.5 Schema Gap Detection + +Leverages `SchemaValidationService` from 02G: + +1. Check existing `Content.schema_markup` against expected schemas for the content type +2. Expected schema by type: Article (post), Product (product), Service (service_page), FAQPage (if FAQ detected), BreadcrumbList (all), HowTo (if steps detected) +3. Identify missing required fields per schema type +4. Generate corrected/complete schema JSON-LD +5. Schema-only optimization mode available (no content rewrite, just schema fix) + +### 2.6 Before/After Scoring + +**Content Quality Score (0-100):** + +| Factor | Weight | Score Criteria | +|--------|--------|---------------| +| Keyword Coverage | 30% | % of cluster keywords present vs target | +| Heading Structure | 20% | Single H1, keyword H2s, logical hierarchy, no skipped levels | +| Content Depth | 20% | Word count vs structure minimum, section completeness, detail level | +| Readability | 15% | Sentence length, paragraph length, Flesch-Kincaid approximation | +| Schema Completeness | 15% | Required schema fields present, validation passes | + +Every optimization records `score_before` and `score_after`. Dashboard aggregates show average improvement across all optimizations. 
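The cluster-matching confidence (section 2.1) and the Content Quality Score above are both weighted sums of per-factor scores. The sketch below reuses the weights stated in the doc and assumes each factor has already been computed elsewhere: 0-100 for quality factors and 0-1 for matching factors. The function names are illustrative and are not the planned `ContentScoringService` API.

```python
from typing import Mapping

# Weights taken from sections 2.1 and 2.6; factor values come from elsewhere.
QUALITY_WEIGHTS = {
    "keyword_coverage": 0.30,
    "heading_structure": 0.20,
    "content_depth": 0.20,
    "readability": 0.15,
    "schema_completeness": 0.15,
}
CLUSTER_MATCH_WEIGHTS = {
    "keyword_overlap": 0.40,
    "semantic_similarity": 0.40,
    "title_match": 0.20,
}
AUTO_ASSIGN_THRESHOLD = 0.6


def weighted_score(factors: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of factor scores; every weighted factor must be supplied."""
    missing = set(weights) - set(factors)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    return sum(weights[name] * factors[name] for name in weights)


def quality_score(factors: Mapping[str, float]) -> dict:
    """Composite 0-100 score built from per-factor 0-100 scores."""
    total = weighted_score(factors, QUALITY_WEIGHTS)
    return {"total": round(total, 1), "breakdown": dict(factors)}


def cluster_match_decision(factors: Mapping[str, float]) -> tuple[float, bool]:
    """Confidence in [0, 1] from 0-1 factors; True means auto-assign the cluster."""
    confidence = weighted_score(factors, CLUSTER_MATCH_WEIGHTS)
    return confidence, confidence >= AUTO_ASSIGN_THRESHOLD
```

As an example, factor scores of 50, 80, 60, 70 and 40 produce a composite of 59.5. The per-task `score_after - score_before` delta is the value the dashboard would average.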
+ +### 2.7 Batch Optimization + +- Select content by: cluster ID, score threshold (e.g., all content scoring < 50), content type, date range +- Queue as Celery tasks with priority ordering (lowest scores first) +- Concurrency: max 3 concurrent optimization tasks per account +- Progress tracking via OptimizationTask status field +- Cancel capability: change status to `rejected` to stop processing + +--- + +## 3. DATA MODELS & APIS + +### 3.1 Modified Model — OptimizationTask (optimization app) + +Extend the existing `OptimizationTask` model with 18 new fields: + +```python +# Add to existing OptimizationTask model: + +content = models.ForeignKey( + 'writer.Content', + on_delete=models.CASCADE, + related_name='optimization_tasks' +) +primary_cluster = models.ForeignKey( + 'planner.Clusters', + on_delete=models.SET_NULL, + null=True, + blank=True, + related_name='optimization_tasks' +) +secondary_clusters = models.JSONField( + default=list, + blank=True, + help_text='List of Clusters IDs for secondary relevance' +) +keyword_targets = models.JSONField( + default=list, + blank=True, + help_text='[{keyword, target_density, current_density, status}]' +) +optimization_type = models.CharField( + max_length=20, + choices=[ + ('full_rewrite', 'Full Rewrite'), + ('heading_only', 'Heading Only'), + ('schema_only', 'Schema Only'), + ('keyword_coverage', 'Keyword Coverage'), + ('batch', 'Batch'), + ], + default='full_rewrite' +) +intent_classification = models.CharField( + max_length=15, + choices=[ + ('informational', 'Informational'), + ('commercial', 'Commercial'), + ('transactional', 'Transactional'), + ], + blank=True, + default='' +) +score_before = models.FloatField(null=True, blank=True) +score_after = models.FloatField(null=True, blank=True) +content_before = models.TextField( + blank=True, + default='', + help_text='Snapshot of original content_html' +) +content_after = models.TextField( + blank=True, + default='', + help_text='Optimized HTML (empty until optimization completes)' +) +metadata_before = models.JSONField( + default=dict, + blank=True, + help_text='{meta_title, meta_description, headings[]}' +) +metadata_after = models.JSONField( + default=dict, + blank=True +) +schema_before = models.JSONField(default=dict, blank=True) +schema_after = models.JSONField(default=dict, blank=True) +structure_changes = models.JSONField( + default=list, + blank=True, + help_text='[{change_type, description, before, after}]' +) +confidence_score = models.FloatField( + null=True, + blank=True, + help_text='AI confidence in the quality of changes (0-1)' +) +applied = models.BooleanField(default=False) +applied_at = models.DateTimeField(null=True, blank=True) +``` + +**Update STATUS choices on OptimizationTask:** +```python +STATUS_CHOICES = [ + ('pending', 'Pending'), + ('analyzing', 'Analyzing'), + ('optimizing', 'Optimizing'), + ('review', 'Ready for Review'), + ('applied', 'Applied'), + ('rejected', 'Rejected'), +] +``` + +**PK:** BigAutoField (integer) — existing model +**Table:** existing `igny8_optimization_tasks` table (no rename needed) + +### 3.2 Migration + +Single migration in the optimization app (or igny8_core migrations): + +``` +igny8_core/migrations/XXXX_extend_optimization_task.py +``` + +**Operations:** +1. `AddField('OptimizationTask', 'content', ...)` — FK to Content +2. `AddField('OptimizationTask', 'primary_cluster', ...)` — FK to Clusters +3. `AddField('OptimizationTask', 'secondary_clusters', ...)` — JSONField +4. `AddField('OptimizationTask', 'keyword_targets', ...)` — JSONField +5. `AddField('OptimizationTask', 'optimization_type', ...)` — CharField +6. `AddField('OptimizationTask', 'intent_classification', ...)` — CharField +7. `AddField('OptimizationTask', 'score_before', ...)` — FloatField +8. `AddField('OptimizationTask', 'score_after', ...)` — FloatField +9. `AddField('OptimizationTask', 'content_before', ...)` — TextField +10. `AddField('OptimizationTask', 'content_after', ...)` — TextField +11.
`AddField('OptimizationTask', 'metadata_before', ...)` — JSONField +12. `AddField('OptimizationTask', 'metadata_after', ...)` — JSONField +13. `AddField('OptimizationTask', 'schema_before', ...)` — JSONField +14. `AddField('OptimizationTask', 'schema_after', ...)` — JSONField +15. `AddField('OptimizationTask', 'structure_changes', ...)` — JSONField +16. `AddField('OptimizationTask', 'confidence_score', ...)` — FloatField +17. `AddField('OptimizationTask', 'applied', ...)` — BooleanField +18. `AddField('OptimizationTask', 'applied_at', ...)` — DateTimeField + +### 3.3 API Endpoints + +All endpoints under `/api/v1/optimizer/`: + +| Method | Path | Description | +|--------|------|-------------| +| POST | `/api/v1/optimizer/analyze/` | Analyze single content piece. Body: `{content_id}`. Returns scores + keyword coverage + heading analysis + recommendations. Does NOT rewrite. | +| POST | `/api/v1/optimizer/optimize/` | Run full optimization. Body: `{content_id, optimization_type}`. Creates OptimizationTask, runs analysis + rewrite, returns preview. | +| POST | `/api/v1/optimizer/preview/` | Preview changes without creating task. Body: `{content_id}`. Returns diff-style output. | +| POST | `/api/v1/optimizer/apply/{id}/` | Apply optimized version. Copies `content_after` → `Content.content_html`, updates metadata, sets `applied=True`. | +| POST | `/api/v1/optimizer/reject/{id}/` | Reject optimization. Sets status=`rejected`, keeps original content. | +| POST | `/api/v1/optimizer/batch/` | Queue batch optimization. Body: `{site_id, cluster_id?, score_threshold?, content_type?, content_ids?}`. Returns batch task ID. | +| GET | `/api/v1/optimizer/tasks/?site_id=X` | List OptimizationTask records with filters (status, optimization_type, cluster_id, date range). | +| GET | `/api/v1/optimizer/tasks/{id}/` | Single optimization detail with full before/after data. 
| +| GET | `/api/v1/optimizer/tasks/{id}/diff/` | HTML diff view — visual comparison of content_before vs content_after. | +| GET | `/api/v1/optimizer/cluster-suggestions/?content_id=X` | Suggest best-fit cluster for unassigned content. Returns top 3 candidates with confidence scores. | +| POST | `/api/v1/optimizer/assign-cluster/` | Assign cluster to content. Body: `{content_id, cluster_id}`. Updates Content record. | +| GET | `/api/v1/optimizer/dashboard/?site_id=X` | Optimization stats: avg score improvement, count by status, top improved, lowest scoring content. | + +**Permissions:** All endpoints use `SiteSectorModelViewSet` permission patterns. + +### 3.4 AI Function — Enhanced optimize_content + +Extend the existing registered `optimize_content` AI function: + +**Registry key:** `optimize_content` (already registered — enhance, not replace) +**Location:** `igny8_core/ai/functions/optimize_content.py` (existing file) + +```python +class OptimizeContentFunction(BaseAIFunction): + """ + Enhanced cluster-aligned content optimization. + Extends existing optimize_content with keyword coverage, + heading restructure, intent classification, and scoring. 
+ """ + function_name = 'optimize_content' + + def validate(self, content_id, optimization_type='full_rewrite', **kwargs): + # Verify content exists, has content_html + # Verify optimization_type is valid + pass + + def prepare(self, content_id, optimization_type='full_rewrite', **kwargs): + # Load Content record + # Determine cluster (from Content or auto-match) + # Load cluster Keywords + # Analyze current keyword coverage + # Parse heading structure + # Classify intent + # Calculate score_before + # Snapshot content_before, metadata_before, schema_before + pass + + def build_prompt(self): + # Build type-specific optimization prompt: + # - Include current content_html + # - Include cluster keywords with coverage status + # - Include heading analysis results + # - Include intent classification + # - Include optimization_type instructions: + # full_rewrite: all optimizations + # heading_only: heading restructure only + # schema_only: schema fix only (no content change) + # keyword_coverage: add missing keyword sections only + pass + + def parse_response(self, response): + # Parse optimized HTML + # Parse updated metadata (meta_title, meta_description) + # Parse structure_changes list + # Parse confidence_score + pass + + def save_output(self, parsed): + # Create OptimizationTask with all before/after data + # Calculate score_after + # Set status='review' + pass +``` + +### 3.5 Content Scoring Service + +**Location:** `igny8_core/business/content_scoring.py` + +```python +class ContentScoringService: + """ + Calculates Content Quality Score (0-100) using 5 weighted factors. + Used by optimizer for before/after and by dashboard for overview. + """ + + WEIGHTS = { + 'keyword_coverage': 0.30, + 'heading_structure': 0.20, + 'content_depth': 0.20, + 'readability': 0.15, + 'schema_completeness': 0.15, + } + + def score(self, content_id, cluster_id=None): + """ + Calculate composite score for a content record. 
+ Returns: {total: float, breakdown: {factor: score}} + """ + pass + + def _score_keyword_coverage(self, content, cluster): + """0-100: % of cluster keywords found in content.""" + pass + + def _score_heading_structure(self, content_html): + """0-100: single H1, keyword H2s, no skipped levels, H2 count.""" + pass + + def _score_content_depth(self, content_html, content_structure): + """0-100: word count vs minimum for structure type, section completeness.""" + pass + + def _score_readability(self, content_html): + """0-100: avg sentence length, paragraph length, Flesch-Kincaid approx.""" + pass + + def _score_schema_completeness(self, content): + """0-100: required schema fields present, from SchemaValidationService (02G).""" + pass +``` + +### 3.6 Keyword Coverage Analyzer + +**Location:** `igny8_core/business/keyword_coverage.py` + +```python +class KeywordCoverageAnalyzer: + """ + Analyzes content against cluster keyword set. + Returns per-keyword presence and overall coverage percentage. + """ + + def analyze(self, content_id, cluster_id): + """ + Returns { + total_keywords: int, + covered: int, + missing: int, + coverage_pct: float, + keywords: [{keyword, target_density, current_density, status}] + } + """ + pass + + def _extract_text(self, content_html): + """Strip HTML, return plain text for analysis.""" + pass + + def _check_keyword(self, keyword, text): + """Check for exact, partial (stemmed), and semantic presence.""" + pass +``` + +### 3.7 Celery Tasks + +**Location:** `igny8_core/tasks/optimization_tasks.py` + +```python +@shared_task(name='run_optimization') +def run_optimization(optimization_task_id): + """Process a single OptimizationTask. Called by API endpoints.""" + pass + +@shared_task(name='run_batch_optimization') +def run_batch_optimization(site_id, cluster_id=None, score_threshold=None, + content_type=None, content_ids=None, batch_size=10): + """ + Process batch of content for optimization. 
Selects content matching filters, creates OptimizationTask per item, + and processes them with at most 3 running concurrently per account. + """ + pass + +@shared_task(name='identify_optimization_candidates') +def identify_optimization_candidates(site_id, threshold=50): + """ + Weekly scan: find content with quality score below threshold. + Creates report, does NOT auto-optimize. + """ + pass +``` + +**Beat Schedule Addition:** + +| Task | Schedule | Notes | +|------|----------|-------| +| `identify_optimization_candidates` | Weekly (Monday 4:00 AM) | Scans all sites, identifies low-scoring content | + +--- + +## 4. IMPLEMENTATION STEPS + +### Step 1: Migration +1. Add 18 new fields to `OptimizationTask` model +2. Update STATUS_CHOICES on OptimizationTask +3. Run migration + +### Step 2: Services +1. Implement `ContentScoringService` in `igny8_core/business/content_scoring.py` +2. Implement `KeywordCoverageAnalyzer` in `igny8_core/business/keyword_coverage.py` + +### Step 3: AI Function Enhancement +1. Extend `OptimizeContentFunction` in `igny8_core/ai/functions/optimize_content.py` +2. Add cluster-alignment, keyword coverage, heading analysis, intent classification, scoring +3. Maintain backward compatibility — existing `optimize_content` calls still work + +### Step 4: API Endpoints +1. Add optimizer endpoints to `igny8_core/urls/optimizer.py` (create the file if it doesn't exist) +2. Create views: `AnalyzeView`, `OptimizeView`, `PreviewView`, `ApplyView`, `RejectView`, `BatchView` +3. Create `ClusterSuggestionsView`, `AssignClusterView`, `DashboardView`, `DiffView` +4. Register URL patterns under `/api/v1/optimizer/` + +### Step 5: Celery Tasks +1. Implement `run_optimization`, `run_batch_optimization`, `identify_optimization_candidates` +2. Add `identify_optimization_candidates` to Celery beat schedule + +### Step 6: Serializers & Admin +1. Update DRF serializer for extended OptimizationTask (include all 18 new fields) +2. Create nested serializers for before/after views +3.
Update Django admin registration + +### Step 7: Credit Cost Configuration +Add to `CreditCostConfig` (billing app): + +| operation_type | default_cost | description | +|---------------|-------------|-------------| +| `optimization_analysis` | 2 | Analyze single content (scoring + keyword coverage) | +| `optimization_full_rewrite` | 5-8 | Full rewrite optimization (varies by content length) | +| `optimization_schema_only` | 1 | Schema gap fix only | +| `optimization_batch` | 15-25 | Batch optimization for 10 items | + +Credit deduction follows existing `CreditUsageLog` pattern. + +--- + +## 5. ACCEPTANCE CRITERIA + +### Cluster Matching +- [ ] Content without cluster assignment gets auto-matched with confidence scoring +- [ ] Confidence ≥ 0.6 auto-assigns; < 0.6 flags for manual review with top 3 suggestions +- [ ] Cluster suggestions endpoint returns ranked candidates + +### Keyword Coverage +- [ ] All cluster keywords analyzed for presence in content +- [ ] Coverage report includes exact match, partial match, and missing keywords +- [ ] Hub content targets 70%+, supporting articles 40%+, product/service 30%+ + +### Heading Restructure +- [ ] H1/H2/H3 hierarchy validated (single H1, no skipped levels) +- [ ] Missing keyword themes identified and new headings suggested +- [ ] AI rewrites headings incorporating target keywords while maintaining meaning + +### Content Rewrite +- [ ] Intent classified correctly (informational/commercial/transactional) +- [ ] Rewrite adjusts content structure based on intent +- [ ] Thin content expanded, bloated content compressed +- [ ] Missing keyword sections added + +### Scoring +- [ ] Score 0-100 calculated with 5 weighted factors +- [ ] score_before recorded before any changes +- [ ] score_after recorded after optimization +- [ ] Dashboard shows average improvement and distribution + +### Before/After +- [ ] Full snapshot of original content preserved in content_before +- [ ] Optimized version stored in content_after without 
auto-applying +- [ ] Diff view provides visual HTML comparison +- [ ] Apply action copies content_after → Content.content_html +- [ ] Reject action preserves original, marks task rejected + +### Batch +- [ ] Batch optimization selects content by cluster, score threshold, type, or explicit IDs +- [ ] Max 3 concurrent optimizations per account enforced +- [ ] Progress trackable via OptimizationTask status +- [ ] Weekly candidate identification runs without auto-optimizing + +### Integration +- [ ] Schema gap detection leverages SchemaValidationService from 02G +- [ ] Credit costs deducted per CreditCostConfig entries +- [ ] All API endpoints respect account/site permission boundaries + +--- + +## 6. CLAUDE CODE INSTRUCTIONS + +### File Locations +``` +igny8_core/ +├── ai/ +│ └── functions/ +│ └── optimize_content.py # Enhance existing function +├── business/ +│ ├── content_scoring.py # ContentScoringService +│ └── keyword_coverage.py # KeywordCoverageAnalyzer +├── tasks/ +│ └── optimization_tasks.py # Celery tasks +├── urls/ +│ └── optimizer.py # Optimizer endpoints +└── migrations/ + └── XXXX_extend_optimization_task.py +``` + +### Conventions +- **PKs:** BigAutoField (integer) — do NOT use UUIDs +- **Table prefix:** `igny8_` (existing table `igny8_optimization_tasks`) +- **Celery app name:** `igny8_core` +- **URL pattern:** `/api/v1/optimizer/...` +- **Permissions:** Use `SiteSectorModelViewSet` permission pattern +- **AI functions:** Extend existing `BaseAIFunction` subclass — do NOT create a new registration key, enhance the existing `optimize_content` +- **Frontend:** `.tsx` files with Zustand stores for state management + +### Cross-References +| Doc | Relationship | +|-----|-------------| +| **02B** | Taxonomy terms get cluster context for optimization; ClusterMappingService scoring pattern reused | +| **02G** | SchemaValidationService used for schema gap detection; schema_only optimization triggers 02G schema generation | +| **02C** | GSC position data 
identifies pages needing optimization (high impressions, low clicks) | +| **02D** | Optimizer identifies internal link opportunities and feeds them to linker | +| **01E** | Blueprint-aware pipeline sets initial content quality; optimizer improves post-generation | +| **01A** | SAGBlueprint/SAGCluster data provides cluster context for optimization | +| **01G** | SAG health monitoring can incorporate content quality scores as a health factor | + +### Key Decisions +1. **Extend, don't replace** — The existing `OptimizationTask` model and `optimize_content` AI function are enhanced, not replaced with new models +2. **Preview-first workflow** — Optimizations always produce a preview (status=`review`) before applying to Content +3. **Content snapshot** — Full HTML snapshot stored in `content_before` for rollback capability +4. **Score reuse** — `ContentScoringService` is a standalone service usable by other modules (02G schema audit, 01G health monitoring) +5. **Schema delegation** — Schema gap detection reuses 02G's `SchemaValidationService` rather than duplicating logic diff --git a/v2/V2-Execution-Docs/02G-rich-schema-serp.md b/v2/V2-Execution-Docs/02G-rich-schema-serp.md new file mode 100644 index 00000000..fd97e267 --- /dev/null +++ b/v2/V2-Execution-Docs/02G-rich-schema-serp.md @@ -0,0 +1,702 @@ +# IGNY8 Phase 2: Rich Schema & SERP Enhancement (02G) +## JSON-LD Schema Generation & On-Page SERP Element Injection + +**Document Version:** 1.0 +**Date:** 2026-03-23 +**Phase:** IGNY8 Phase 2 — Feature Expansion +**Status:** Build Ready +**Source of Truth:** Codebase at `/data/app/igny8/` +**Audience:** Claude Code, Backend Developers, Architects + +--- + +## 1. CURRENT STATE + +### Schema Markup Today +The `Content` model (app_label=`writer`, db_table=`igny8_content`) has a `schema_markup` JSONField that stores raw JSON-LD. The AI function `generate_content` occasionally includes basic Article schema, but the output is inconsistent and unvalidated. 
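To make the Article output consistent, the schema can be built deterministically from stored fields instead of relying on the LLM. The sketch below assumes the caller supplies the title, description, canonical URL, dates and word count. The function names are hypothetical, and only a subset of schema.org `BlogPosting` properties is shown. The returned dict is the kind of value `Content.schema_markup` could hold.

```python
import json
from datetime import date


def build_blogposting(title: str, description: str, url: str,
                      published: date, modified: date, word_count: int) -> dict:
    """Hypothetical builder: deterministic BlogPosting JSON-LD (subset of schema.org)."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": title,
        "description": description,
        "mainEntityOfPage": {"@type": "WebPage", "@id": url},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "wordCount": word_count,
    }


def to_script_tag(schema: dict) -> str:
    """Serialize for HTML embedding; '<' is escaped so '</script>' cannot close the tag."""
    payload = json.dumps(schema, ensure_ascii=False, sort_keys=True)
    return ('<script type="application/ld+json">'
            + payload.replace("<", "\\u003c")
            + "</script>")
```

Setting `sort_keys=True` keeps the serialized output stable across runs. Escaping `<` as `\u003c` is still valid JSON, and it prevents a `</script>` sequence inside a string value from terminating the tag early.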
+ +### What Works Now +- `Content.schema_markup` — JSONField exists, sometimes populated during generation +- `generate_content` AI function — may produce rudimentary Article schema as part of content output +- `ContentTypeTemplate` model (added by 02A) defines section layouts and presets per content type +- 02A added `Content.structured_data` JSONField for type-specific data (product specs, service steps, etc.) + +### What Does Not Exist +- No systematic schema generation by content type +- No on-page SERP element injection (TL;DR, TOC, Key Takeaways, etc.) +- No schema validation against Google Rich Results requirements +- No retroactive enhancement of already-published content +- No SchemaTemplate model, no SERPEnhancement model, no validation records +- No SERP element tracking per content + +### Phase 1 & 2A Foundation Available +- `SAGCluster.cluster_type` choices: `product_category`, `condition_problem`, `feature`, `brand`, `informational`, `comparison` +- 01E blueprint-aware pipeline provides `blueprint_context` with `cluster_type`, `content_structure`, `content_type` +- 02A content type routing provides type-specific generation with section layouts +- `Content.content_type` choices: `post`, `page`, `product`, `taxonomy` +- `Content.content_structure` choices: 14 structure types including `cluster_hub`, `product_page`, `service_page`, `comparison`, `review` + +--- + +## 2. WHAT TO BUILD + +### Overview +Build a schema generation and SERP enhancement system that: +1. Generates correct JSON-LD structured data for 10 schema types, mapped to content type/structure +2. Injects 8 on-page SERP elements into `content_html` to improve rich snippet eligibility +3. Validates schema against Google Rich Results requirements +4. 
Retroactively enhances existing published content with missing schema and SERP elements + +### 2.1 JSON-LD Schema Types (10 Types) + +Each schema type maps to specific `content_type` + `content_structure` combinations: + +| # | Schema Type | Applies To | Key Fields | +|---|------------|-----------|------------| +| 1 | **Article / BlogPosting** | `post` (all structures) | headline, datePublished, dateModified, author (Person/Organization), publisher, image, description, mainEntityOfPage, wordCount, articleSection | +| 2 | **Product** | `product` / `product_page` | name, description, image, brand, offers (price, priceCurrency, availability, url), aggregateRating, review, sku, gtin | +| 3 | **Service** | `page` / `service_page` | name, description, provider (Organization), serviceType, areaServed, hasOfferCatalog, offers | +| 4 | **LocalBusiness** | Sites with physical location (site-level config) | name, address, telephone, openingHours, geo, image, priceRange, sameAs, hasMap | +| 5 | **Organization** | Site-wide (homepage schema) | name, url, logo, sameAs[], contactPoint, foundingDate, founders | +| 6 | **BreadcrumbList** | All pages | itemListElement [{position, name, item(URL)}] — auto-generated from SAG hierarchy or WP breadcrumb trail | +| 7 | **FAQPage** | Content with FAQ sections (auto-detected from H2/H3 question patterns) | mainEntity [{@type: Question, name, acceptedAnswer: {text}}] | +| 8 | **HowTo** | Step-by-step content (detected from ordered lists with process indicators) | name, step [{@type: HowToStep, name, text, image, url}], totalTime, estimatedCost | +| 9 | **VideoObject** | Content with video embeds (02I integration) | name, description, thumbnailUrl, uploadDate, duration, contentUrl, embedUrl | +| 10 | **WebSite + SearchAction** | Site-wide (homepage) | name, url, potentialAction (SearchAction with query-input) | + +**Auto-Detection Rules:** +- FAQPage: detected when content has H2/H3 headings matching question patterns (starts with "What", 
"How", "Why", "When", "Is", "Can", "Does", "Should") or explicit `` blocks +- HowTo: detected when content has ordered lists (`<ol>`) combined with process language ("Step 1", "First", "Next", etc.) +- VideoObject: detected when `<iframe>` or `<video>` tags are present, or when 02I VideoProject is linked to content +- BreadcrumbList: always generated — uses SAG hierarchy (Site → Sector → Cluster → Content) or WordPress breadcrumb trail from SiteIntegration sync + +**Schema Stacking:** A single content piece can have multiple schemas. An article with FAQ and video gets Article + FAQPage + VideoObject + BreadcrumbList — all in a single `