# IGNY8 Phase 2: Internal Linker (02D)
## SAG-Based Internal Linking Engine

**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** IGNY8 Phase 2 — Feature Expansion
**Status:** Build Ready
**Source of Truth:** Codebase at `/data/app/igny8/`
**Audience:** Claude Code, Backend Developers, Architects

---

## 1. CURRENT STATE

### Internal Linking Today
There is **no** internal linking system in IGNY8. Content is generated and published without any cross-linking strategy. Links within content are only those the AI incidentally includes during generation.

### What Exists
- `Content` model (app_label=`writer`, db_table=`igny8_content`) — stores `content_html` where links would be inserted
- `SAGCluster` and `SAGBlueprint` models (from 01A) — provide the cluster hierarchy for link topology
- The 7-stage automation pipeline (01E) generates and publishes content but has no linking stage between generation and publish
- `SiteIntegration` model (app_label=`integration`) tracks WordPress connections

### What Does Not Exist
- No SAGLink model, no LinkMap model, no SAGLinkAudit model
- No link scoring algorithm
- No anchor text management
- No link density enforcement
- No link insertion into content_html
- No orphan page detection
- No link health monitoring
- No link audit system

### Foundation Available
- `SAGBlueprint` (01A) — defines the SAG hierarchy (site → sectors → clusters → content)
- `SAGCluster` (01A) — cluster_type, hub_page_type, hub_page_structure
- `SAGAttribute` (01A) — attribute values shared across clusters (basis for cross-cluster linking)
- 01E pipeline — post-generation hook point available between Stage 4 (Content) and Stage 7 (Publish)
- `Content.content_type` and `Content.content_structure` — determines link density rules
- 02B `ContentTaxonomy` with cluster mapping — taxonomy-to-cluster relationships for taxonomy contextual links

---

## 2. WHAT TO BUILD

### Overview
Build a SAG-aware internal linking engine that automatically plans, scores, and inserts internal links into content. The system operates in two modes: new content mode (pipeline integration) and existing content remediation (audit + fix).

### 2.1 Seven Link Types

| # | Link Type | Direction | Description | Limit | Placement |
|---|-----------|-----------|-------------|-------|-----------|
| 1 | **Vertical Upward** | Supporting → Hub | MANDATORY: every supporting article links to its cluster hub | 1 per article | First 2 paragraphs |
| 2 | **Vertical Downward** | Hub → Supporting | Hub lists ALL its supporting articles | No cap | "Related Articles" section + contextual body links |
| 3 | **Horizontal Sibling** | Supporting ↔ Supporting | Same-cluster articles linking to each other | Max 2 per article | Natural content overlap points |
| 4 | **Cross-Cluster** | Hub ↔ Hub | Hubs sharing a SAGAttribute value can cross-link | Max 2 per hub | Contextual body links |
| 5 | **Taxonomy Contextual** | Term Page → Hubs | Term pages link to ALL cluster hubs using that attribute | No cap | Auto-generated from 02B taxonomy-cluster mapping |
| 6 | **Breadcrumb** | Hierarchical | Home → Sector → [Attribute] → Hub → Current Page | 1 chain per page | Top of page (auto-generated from SAG hierarchy) |
| 7 | **Related Content** | Cross-cluster allowed | 2-3 links in "Related Reading" section at end of article | 2-3 per article | End of article section |

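As an illustration of link type 6, the breadcrumb chain can be derived mechanically from the SAG hierarchy. The following is a minimal sketch under stated assumptions: `Crumb`, `build_breadcrumb`, and the example URLs are hypothetical names introduced here, and skipping the final crumb when the current page is the hub itself is an assumed behavior, not something the spec defines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Crumb:
    """One breadcrumb entry (hypothetical structure, for illustration only)."""
    label: str
    url: str


def build_breadcrumb(
    sector: Crumb,
    hub: Crumb,
    current: Crumb,
    attribute: Optional[Crumb] = None,
    home: Crumb = Crumb("Home", "/"),
) -> List[Crumb]:
    """Return Home -> Sector -> [Attribute] -> Hub -> Current Page.

    The attribute level is optional, matching the bracketed segment in the
    link type table. Assumption: when the current page is the hub, the hub
    is not repeated as a separate final crumb.
    """
    chain = [home, sector]
    if attribute is not None:
        chain.append(attribute)
    chain.append(hub)
    if current.url != hub.url:
        chain.append(current)
    return chain
```

Keeping the chain as plain data leaves rendering to the theme layer (03C), which is consistent with the cross-reference table but is still only one possible split of responsibilities.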
**Link Density Rules (outbound per page type, by word count):**

| Page Type | <1000 words | 1000-2000 words | 2000+ words |
|-----------|------------|-----------------|-------------|
| Hub (`cluster_hub`) | 5-10 | 10-15 | 15-20 |
| Blog (article/guide/etc.) | 2-5 | 3-8 | 4-12 |
| Product/Service | 2-3 | 3-5 | 3-5 |
| Term Page (taxonomy) | 3+ | 3+ | unlimited |

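The density table can be expressed as a small lookup. This is a sketch, not the codebase's implementation: the page type keys other than `cluster_hub`, the function names, and the handling of the band edges (exactly 2000 words is treated as "2000+", and term pages keep a minimum of 3 in every band) are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# (min, max) outbound links for the bands <1000, 1000-2000, 2000+ words.
# None means "no upper cap". Keys other than 'cluster_hub' are hypothetical.
DENSITY_RULES: Dict[str, List[Tuple[int, Optional[int]]]] = {
    "cluster_hub": [(5, 10), (10, 15), (15, 20)],
    "blog": [(2, 5), (3, 8), (4, 12)],
    "product_service": [(2, 3), (3, 5), (3, 5)],
    # Assumption: "unlimited" keeps the minimum of 3 and removes the cap.
    "term": [(3, None), (3, None), (3, None)],
}


def density_range(page_type: str, word_count: int) -> Tuple[int, Optional[int]]:
    """Return the (min, max) outbound link range for a page."""
    if word_count < 1000:
        band = 0
    elif word_count < 2000:  # assumption: exactly 2000 falls into "2000+"
        band = 2 - 1
    else:
        band = 2
    return DENSITY_RULES[page_type][band]


def density_status(page_type: str, word_count: int, outbound_links: int) -> str:
    """Classify a page as 'under_linked', 'over_linked', or 'ok'."""
    low, high = density_range(page_type, word_count)
    if outbound_links < low:
        return "under_linked"
    if high is not None and outbound_links > high:
        return "over_linked"
    return "ok"
```

For example, `density_status("blog", 800, 6)` reports `over_linked`, because a blog page under 1000 words is capped at 5 outbound links.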
### 2.2 Link Scoring Algorithm (5 Factors)

Each candidate link target receives a score (0-100):

| Factor | Weight | Description |
|--------|--------|-------------|
| Shared attribute values | 40% | Count of SAGAttribute values shared between source and target clusters |
| Target page authority | 25% | Inbound link count of target page (from LinkMap) |
| Keyword overlap | 20% | Common keywords between source cluster and target content |
| Content recency | 10% | Newer content gets a boost (exponential decay over 6 months) |
| Link count gap | 5% | Pages with fewest inbound links get a priority boost |

**Threshold:** Score ≥ 60 qualifies for automatic linking. Scores 40-59 are suggested for manual review.

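One way to read the table is as a weighted sum of factors that have each been normalized to the range 0 to 1. The sketch below assumes exactly that; how each raw value (shared attribute count, inbound links, keyword overlap) is normalized is not specified in this document, so the function takes pre-normalized inputs. Interpreting "exponential decay over 6 months" as a half-life of about 182.5 days is also an assumption.

```python
from typing import Dict

SCORE_WEIGHTS: Dict[str, float] = {
    "shared_attributes": 0.40,
    "target_authority": 0.25,
    "keyword_overlap": 0.20,
    "content_recency": 0.10,
    "link_count_gap": 0.05,
}
AUTO_LINK_THRESHOLD = 60
REVIEW_THRESHOLD = 40


def recency_factor(age_days: float, half_life_days: float = 182.5) -> float:
    """Exponential decay: 1.0 for new content, 0.5 after one half-life (assumed)."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def score_candidate(factors: Dict[str, float]) -> float:
    """Combine normalized factors (0..1 each) into a 0-100 score."""
    total = 0.0
    for name, weight in SCORE_WEIGHTS.items():
        value = min(max(factors.get(name, 0.0), 0.0), 1.0)  # clamp to 0..1
        total += weight * value
    return round(100 * total, 2)


def classify(score: float) -> str:
    """Map a score onto the thresholds from the spec."""
    if score >= AUTO_LINK_THRESHOLD:
        return "auto_link"
    if score >= REVIEW_THRESHOLD:
        return "manual_review"
    return "skip"
```

A candidate with full attribute overlap and half the maximum authority, and nothing else, scores 52.5 and lands in the manual review band.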
### 2.3 Anchor Text Rules

| Rule | Value |
|------|-------|
| Min length | 2 words |
| Max length | 8 words |
| Grammatically natural | Must read naturally in surrounding sentence |
| No exact-match overuse | Same exact anchor cannot be used >3 times to same target URL |
| Anchor distribution per target | Primary keyword 60%, page title 30%, natural phrase 10% |
| Diversification audit | Flag if any single anchor accounts for >40% of links to a target |

**Anchor Types:**
- `primary_keyword` — cluster primary keyword
- `page_title` — target content's title (or shortened version)
- `natural` — AI-selected contextually appropriate phrase
- `branded` — brand/site name (for homepage links)

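The mechanical anchor rules (length, exact-match reuse, diversification) can be checked without AI involvement. The following is a sketch under assumptions: the function names are hypothetical, anchors are compared case-insensitively after trimming, and "grammatically natural" is out of scope because it needs a language model or human review.

```python
from collections import Counter
from typing import Dict, Iterable, List

MIN_WORDS, MAX_WORDS = 2, 8
MAX_EXACT_REUSE = 3           # same anchor at most 3 times per target URL
DIVERSIFICATION_LIMIT = 0.40  # flag anchors above 40% of links to a target


def is_valid_length(anchor: str) -> bool:
    """True if the anchor has between 2 and 8 words."""
    return MIN_WORDS <= len(anchor.split()) <= MAX_WORDS


def can_reuse(anchor: str, existing_anchors: Iterable[str]) -> bool:
    """True if adding `anchor` keeps its exact-match uses at 3 or fewer."""
    normalized = anchor.strip().lower()
    uses = sum(1 for a in existing_anchors if a.strip().lower() == normalized)
    return uses < MAX_EXACT_REUSE


def overused_anchors(existing_anchors: List[str]) -> Dict[str, float]:
    """Return each anchor whose share of links to one target exceeds 40%."""
    if not existing_anchors:
        return {}
    counts = Counter(a.strip().lower() for a in existing_anchors)
    total = len(existing_anchors)
    return {
        anchor: count / total
        for anchor, count in counts.items()
        if count / total > DIVERSIFICATION_LIMIT
    }
```

`existing_anchors` here stands for the anchor texts of all links that already point at one target URL, for instance the `anchor_text` values of that target's inbound link records.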
### 2.4 Two Operating Modes

#### A. New Content Mode (Pipeline Integration)
Runs after Stage 4 (content generated), before Stage 7 (publish):

1. Content generated by pipeline → link planning triggers
2. Calculate link targets using scoring algorithm
3. Insert links into `content_html` at natural positions
4. Store link plan in SAGLink records
5. If content is a hub → auto-generate "Related Articles" section with links to all supporting articles in cluster
6. **Mandatory check:** if content is a supporting article, verify vertical_up link to hub exists; insert if missing

#### B. Existing Content Remediation (Audit + Fix)
For already-published content without proper internal linking:

1. **Crawl phase:** Scan all published content for a site, extract all `<a>` tags, build LinkMap
2. **Audit analysis:**
   - Orphan pages: 0 inbound internal links
   - Over-linked pages: outbound > density max for page type/word count
   - Under-linked pages: outbound < density min
   - Missing mandatory links: supporting articles without hub uplink
   - Broken links: target URL returns 4xx/5xx
3. **Recommendation generation:** Priority-scored fix recommendations with AI-suggested anchor text
4. **Batch application:** Insert missing links across multiple content records

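The crawl phase and the orphan check from mode B can be sketched with the standard library alone. This is an illustration under assumptions: it works on an in-memory mapping of page URL to `content_html` rather than on `Content` or `LinkMap` records, the class and function names are hypothetical, "internal" is taken to mean "same host as the source page", and self-links are assumed not to count as inbound links.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import urljoin, urlparse


class AnchorExtractor(HTMLParser):
    """Collects (href, anchor_text, is_follow) for every <a href> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str, bool]] = []
        self._href: Optional[str] = None
        self._follow = True
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr = dict(attrs)
            self._href = attr.get("href")
            rel = (attr.get("rel") or "").lower().split()
            self._follow = "nofollow" not in rel
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append((self._href, text, self._follow))
            self._href = None


def extract_internal_links(page_url: str, html: str) -> List[Tuple[str, str, bool]]:
    """Return absolute same-host links found in `html`."""
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    result = []
    for href, text, follow in parser.links:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host:
            result.append((absolute, text, follow))
    return result


def find_orphans(pages: Dict[str, str]) -> Set[str]:
    """`pages` maps URL -> content_html; returns URLs with 0 inbound links."""
    linked: Set[str] = set()
    for url, html in pages.items():
        for target, _text, _follow in extract_internal_links(url, html):
            if target != url:  # assumption: self-links are not inbound links
                linked.add(target)
    return set(pages) - linked
```

Broken-link detection is deliberately left out, since it needs HTTP requests against the live site.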
### 2.5 Cluster-Level Link Health Score

Per-cluster health score (0-100) for link coverage:

| Factor | Points |
|--------|--------|
| Hub published and linked (has outbound + inbound links) | 25 |
| All supporting articles have mandatory uplink to hub | 25 |
| At least 1 cross-cluster link from hub | 15 |
| Term pages link to hub | 15 |
| No broken links in cluster | 10 |
| Link density within range for all pages | 10 |

Site-wide link health = average of all cluster scores. Feeds into SAG health monitoring (01G).

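The point table translates directly into a scoring function. The sketch below assumes each factor is all-or-nothing (no partial credit), which the table implies but does not state outright; `ClusterLinkState` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ClusterLinkState:
    """Boolean audit results for one cluster (hypothetical structure)."""
    hub_published_and_linked: bool
    all_supporting_have_uplink: bool
    has_cross_cluster_link: bool
    term_pages_link_to_hub: bool
    no_broken_links: bool
    density_in_range: bool


# Points per factor, taken from the table above (they sum to 100).
HEALTH_POINTS: Dict[str, int] = {
    "hub_published_and_linked": 25,
    "all_supporting_have_uplink": 25,
    "has_cross_cluster_link": 15,
    "term_pages_link_to_hub": 15,
    "no_broken_links": 10,
    "density_in_range": 10,
}


def cluster_health(state: ClusterLinkState) -> int:
    """Sum the points of every factor the cluster satisfies (0-100)."""
    return sum(
        points for name, points in HEALTH_POINTS.items() if getattr(state, name)
    )


def site_health(states: Iterable[ClusterLinkState]) -> float:
    """Site-wide health: the average of all cluster scores, 0.0 if none."""
    scores = [cluster_health(state) for state in states]
    return sum(scores) / len(scores) if scores else 0.0
```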

---

## 3. DATA MODELS & APIS

### 3.1 New Models

#### SAGLink (new `linker` app)

```python
class SAGLink(SiteSectorBaseModel):
    """
    Represents a planned or inserted internal link between two content pages.
    Tracks link type, anchor text, score, and status through lifecycle.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='sag_links'
    )
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='outbound_sag_links'
    )
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='inbound_sag_links'
    )
    link_type = models.CharField(
        max_length=20,
        choices=[
            ('vertical_up', 'Vertical Upward'),
            ('vertical_down', 'Vertical Downward'),
            ('horizontal', 'Horizontal Sibling'),
            ('cross_cluster', 'Cross-Cluster'),
            ('taxonomy', 'Taxonomy Contextual'),
            ('breadcrumb', 'Breadcrumb'),
            ('related', 'Related Content'),
        ]
    )
    anchor_text = models.CharField(max_length=200)
    anchor_type = models.CharField(
        max_length=20,
        choices=[
            ('primary_keyword', 'Primary Keyword'),
            ('page_title', 'Page Title'),
            ('natural', 'Natural Phrase'),
            ('branded', 'Branded'),
        ]
    )
    placement_zone = models.CharField(
        max_length=20,
        choices=[
            ('in_body', 'In Body'),
            ('related_section', 'Related Section'),
            ('breadcrumb', 'Breadcrumb'),
            ('sidebar', 'Sidebar'),
        ]
    )
    placement_position = models.IntegerField(
        null=True,
        blank=True,
        help_text='Paragraph number for in_body placement'
    )
    score = models.FloatField(
        default=0,
        help_text='Link scoring algorithm result (0-100)'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('planned', 'Planned'),
            ('inserted', 'Inserted'),
            ('verified', 'Verified'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='planned'
    )
    is_mandatory = models.BooleanField(
        default=False,
        help_text='True for vertical_up links (supporting → hub)'
    )
    inserted_at = models.DateTimeField(null=True, blank=True)

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_links'
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

#### SAGLinkAudit (linker app)

```python
class SAGLinkAudit(SiteSectorBaseModel):
    """
    Stores results of a site-wide or cluster-level link audit.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='link_audits'
    )
    audit_date = models.DateTimeField(auto_now_add=True)
    total_links = models.IntegerField(default=0)
    missing_mandatory = models.IntegerField(default=0)
    orphan_pages = models.IntegerField(default=0)
    broken_links = models.IntegerField(default=0)
    over_linked_pages = models.IntegerField(default=0)
    under_linked_pages = models.IntegerField(default=0)
    cluster_scores = models.JSONField(
        default=dict,
        help_text='{cluster_id: {score, missing, issues[]}}'
    )
    recommendations = models.JSONField(
        default=list,
        help_text='[{content_id, action, link_type, target_id, anchor_suggestion, priority}]'
    )
    overall_health_score = models.FloatField(
        default=0,
        help_text='Average of cluster scores (0-100)'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_link_audits'
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

#### LinkMap (linker app)

```python
class LinkMap(SiteSectorBaseModel):
    """
    Full link map of all internal (and external) links found in published content.
    Built by crawling content_html of all published content records.
    """
    source_url = models.URLField()
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='outbound_link_map'
    )
    target_url = models.URLField()
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='inbound_link_map'
    )
    anchor_text = models.CharField(max_length=500)
    is_internal = models.BooleanField(default=True)
    is_follow = models.BooleanField(default=True)
    position = models.CharField(
        max_length=20,
        choices=[
            ('in_content', 'In Content'),
            ('navigation', 'Navigation'),
            ('footer', 'Footer'),
            ('sidebar', 'Sidebar'),
        ],
        default='in_content'
    )
    last_verified = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('active', 'Active'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='active'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_link_map'
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

### 3.2 Modified Models

#### Content (writer app) — add 4 fields

```python
# Add to Content model:
link_plan = models.JSONField(
    null=True,
    blank=True,
    help_text='Planned links before insertion: [{target_id, link_type, anchor, score}]'
)
links_inserted = models.BooleanField(
    default=False,
    help_text='Whether link plan has been applied to content_html'
)
inbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of inbound internal links'
)
outbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of outbound internal links'
)
```

### 3.3 New App Registration

Create the `linker` app:
- **App config:** `igny8_core/modules/linker/apps.py` with `app_label = 'linker'`
- **Add to INSTALLED_APPS** in `igny8_core/settings.py`

### 3.4 Migration

```
igny8_core/migrations/XXXX_add_linker_models.py
```

**Operations:**
1. `CreateModel('SAGLink', ...)` — with indexes on `source_content`, `target_content`, `link_type`, `status`
2. `CreateModel('SAGLinkAudit', ...)`
3. `CreateModel('LinkMap', ...)` — with indexes on `source_url` and `target_url`
4. `AddField('Content', 'link_plan', JSONField(null=True, blank=True))`
5. `AddField('Content', 'links_inserted', BooleanField(default=False))`
6. `AddField('Content', 'inbound_link_count', IntegerField(default=0))`
7. `AddField('Content', 'outbound_link_count', IntegerField(default=0))`

### 3.5 API Endpoints

All endpoints under `/api/v1/linker/`:

#### Link Management
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/links/?site_id=X` | List all SAGLink records with filters (link_type, status, cluster_id, source_content_id) |
| POST | `/api/v1/linker/links/plan/` | Generate link plan for a content piece. Body: `{content_id}`. Returns planned SAGLink records. |
| POST | `/api/v1/linker/links/insert/` | Insert planned links into content_html. Body: `{content_id}`. Modifies Content.content_html. |
| POST | `/api/v1/linker/links/batch-insert/` | Batch insert for multiple content. Body: `{content_ids: [int]}`. Queues Celery task. |
| GET | `/api/v1/linker/content/{id}/links/` | All inbound + outbound links for a specific content piece. |

#### Link Audit
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/audit/?site_id=X` | Latest SAGLinkAudit results. |
| POST | `/api/v1/linker/audit/run/` | Trigger site-wide link audit. Body: `{site_id}`. Queues Celery task. Returns task ID. |
| GET | `/api/v1/linker/audit/recommendations/?site_id=X` | Get fix recommendations from latest audit. |
| POST | `/api/v1/linker/audit/apply/` | Apply recommended fixes in batch. Body: `{site_id, recommendation_ids: [int]}`. |

#### Link Map & Health
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/link-map/?site_id=X` | Full LinkMap for site with pagination. |
| GET | `/api/v1/linker/orphans/?site_id=X` | List orphan pages (0 inbound internal links). |
| GET | `/api/v1/linker/health/?site_id=X` | Cluster-level link health scores. |

**Permissions:** All endpoints use `SiteSectorModelViewSet` permission patterns.

### 3.6 Link Planning Service

**Location:** `igny8_core/business/link_planning.py`

```python
class LinkPlanningService:
    """
    Generates internal link plans for content based on SAG hierarchy
    and scoring algorithm.
    """

    SCORE_WEIGHTS = {
        'shared_attributes': 0.40,
        'target_authority': 0.25,
        'keyword_overlap': 0.20,
        'content_recency': 0.10,
        'link_count_gap': 0.05,
    }

    AUTO_LINK_THRESHOLD = 60
    REVIEW_THRESHOLD = 40

    def plan(self, content_id):
        """
        Generate link plan for a content piece.
        1. Identify content's cluster and role (hub vs supporting)
        2. Determine mandatory links (vertical_up for supporting)
        3. Score all candidate targets
        4. Select targets within density limits
        5. Generate anchor text per link
        6. Create SAGLink records with status='planned'
        Returns list of planned SAGLink records.
        """
        pass

    def _get_mandatory_links(self, content, cluster):
        """Vertical upward: supporting → hub. Always added."""
        pass

    def _get_candidates(self, content, cluster, blueprint):
        """Gather all potential link targets from cluster and related clusters."""
        pass

    def _score_candidate(self, source_content, target_content, source_cluster,
                         target_cluster, blueprint):
        """Calculate 0-100 score using 5-factor algorithm."""
        pass

    def _select_within_density(self, content, scored_candidates):
        """Filter candidates to stay within density limits for page type and word count."""
        pass

    def _generate_anchor_text(self, source_content, target_content, link_type):
        """AI-generate contextually appropriate anchor text."""
        pass
```

### 3.7 Link Insertion Service

**Location:** `igny8_core/business/link_insertion.py`

```python
class LinkInsertionService:
    """
    Inserts planned links into content_html.
    Handles placement, anchor text insertion, and collision avoidance.
    """

    def insert(self, content_id):
        """
        Insert all planned SAGLink records into Content.content_html.
        1. Load all SAGLinks where source_content=content_id, status='planned'
        2. Parse content_html
        3. For each link, find insertion point based on placement_zone + position
        4. Insert <a> tag with anchor text
        5. Update SAGLink status='inserted', set inserted_at
        6. Update Content.content_html, links_inserted=True, outbound_link_count
        7. Update target Content.inbound_link_count
        """
        pass

    def _find_insertion_point(self, html_tree, link):
        """
        Find best insertion point in parsed HTML:
        - in_body: find paragraph at placement_position, find natural spot for anchor
        - related_section: append to "Related Articles" section (create if missing)
        - breadcrumb: insert breadcrumb trail at top
        """
        pass

    def _insert_link(self, html_tree, position, anchor_text, target_url):
        """Insert <a href> tag at position without breaking existing HTML."""
        pass
```

### 3.8 Link Audit Service

**Location:** `igny8_core/business/link_audit.py`

```python
class LinkAuditService:
    """
    Runs site-wide link audits: builds link map, identifies issues,
    generates recommendations.
    """

    def run_audit(self, site_id):
        """
        Full audit:
        1. Crawl all published Content for site
        2. Extract all <a> tags, build/update LinkMap records
        3. Identify orphan pages, over/under-linked, missing mandatory, broken
        4. Calculate per-cluster health scores
        5. Generate prioritized recommendations
        6. Create SAGLinkAudit record
        Returns SAGLinkAudit instance.
        """
        pass

    def _build_link_map(self, site_id):
        """Extract links from all published content_html, create LinkMap records."""
        pass

    def _find_orphans(self, site_id):
        """Content with 0 inbound internal links."""
        pass

    def _check_density(self, site_id):
        """Compare outbound counts against density rules per page type."""
        pass

    def _check_mandatory(self, site_id):
        """Verify all supporting articles have vertical_up link to their hub."""
        pass

    def _calculate_cluster_health(self, site_id, cluster):
        """Calculate 0-100 health score per cluster."""
        pass

    def _generate_recommendations(self, issues):
        """Priority-scored recommendations with AI-suggested anchor text."""
        pass
```

### 3.9 Celery Tasks

**Location:** `igny8_core/tasks/linker_tasks.py`

```python
from celery import shared_task


@shared_task(name='generate_link_plan')
def generate_link_plan(content_id):
    """Runs after content generation, before publish. Creates SAGLink records."""
    pass


@shared_task(name='run_link_audit')
def run_link_audit(site_id):
    """Scheduled weekly or triggered manually. Full site-wide audit."""
    pass


@shared_task(name='verify_links')
def verify_links(site_id):
    """Check for broken links via HTTP status checks on LinkMap URLs."""
    pass


@shared_task(name='rebuild_link_map')
def rebuild_link_map(site_id):
    """Full crawl of published content to rebuild LinkMap from scratch."""
    pass
```

**Beat Schedule Additions:**

| Task | Schedule | Notes |
|------|----------|-------|
| `run_link_audit` | Weekly (Sunday 1:00 AM) | Site-wide audit for all active sites |
| `verify_links` | Weekly (Wednesday 2:00 AM) | HTTP check all active LinkMap entries |

---

## 4. IMPLEMENTATION STEPS

### Step 1: Create Linker App
1. Create `igny8_core/modules/linker/` directory with `__init__.py` and `apps.py`
2. Add `linker` to `INSTALLED_APPS` in settings.py
3. Create models: SAGLink, SAGLinkAudit, LinkMap

### Step 2: Migration
1. Create migration for 3 new models
2. Add 4 new fields to Content model (link_plan, links_inserted, inbound_link_count, outbound_link_count)
3. Run migration

### Step 3: Services
1. Implement `LinkPlanningService` in `igny8_core/business/link_planning.py`
2. Implement `LinkInsertionService` in `igny8_core/business/link_insertion.py`
3. Implement `LinkAuditService` in `igny8_core/business/link_audit.py`

### Step 4: Pipeline Integration
Insert link planning + insertion between Stage 4 and Stage 7:

```python
# Import paths follow the service locations listed in Step 3.
from igny8_core.business.link_insertion import LinkInsertionService
from igny8_core.business.link_planning import LinkPlanningService


# After content generation completes in pipeline:
def post_content_generation(content_id):
    # 02G: Generate schema + SERP elements
    # ...
    # 02D: Plan and insert internal links
    link_service = LinkPlanningService()
    link_service.plan(content_id)
    insertion_service = LinkInsertionService()
    insertion_service.insert(content_id)
```

### Step 5: API Endpoints
1. Create `igny8_core/urls/linker.py` with link, audit, and health endpoints
2. Create views extending `SiteSectorModelViewSet`
3. Register URL patterns under `/api/v1/linker/`

### Step 6: Celery Tasks
1. Implement all 4 tasks in `igny8_core/tasks/linker_tasks.py`
2. Add `run_link_audit` and `verify_links` to Celery beat schedule

### Step 7: Serializers & Admin
1. Create DRF serializers for SAGLink, SAGLinkAudit, LinkMap
2. Register models in Django admin

### Step 8: Credit Cost Configuration
Add to `CreditCostConfig` (billing app):

| operation_type | default_cost | description |
|---------------|-------------|-------------|
| `link_audit` | 1 | Site-wide link audit |
| `link_generation` | 0.5 | Generate 1-5 links with AI anchor text |
| `link_audit_full` | 3-5 | Full site audit with recommendations |

---

## 5. ACCEPTANCE CRITERIA

### Link Types
- [ ] Vertical upward link (supporting → hub) automatically inserted for all supporting articles
- [ ] Vertical downward links (hub → supporting) generated with "Related Articles" section
- [ ] Horizontal sibling links (max 2) between same-cluster supporting articles
- [ ] Cross-cluster links (max 2) between hubs sharing SAGAttribute values
- [ ] Taxonomy contextual links from term pages to all relevant cluster hubs
- [ ] Breadcrumb chain generated from SAG hierarchy for all content
- [ ] Related content section (2-3 links) generated at end of article

### Link Scoring
- [ ] 5-factor scoring algorithm produces 0-100 scores
- [ ] Links with score ≥ 60 auto-inserted
- [ ] Links with score 40-59 suggested for manual review
- [ ] Score algorithm uses: shared attributes (40%), authority (25%), keyword overlap (20%), recency (10%), gap boost (5%)

### Anchor Text
- [ ] Anchor text 2-8 words, grammatically natural
- [ ] Same exact anchor not used >3 times to same target
- [ ] Distribution per target: 60% primary keyword, 30% page title, 10% natural
- [ ] Diversification audit flags if any anchor >40% of links to a target

### Link Density
- [ ] Hub pages: 5-20 outbound links based on word count
- [ ] Blog pages: 2-12 outbound links based on word count
- [ ] Product/Service pages: 2-5 outbound links
- [ ] Term pages: 3+ outbound, unlimited for taxonomy contextual

### Audit & Remediation
- [ ] Link audit identifies orphan pages, over/under-linked, missing mandatory, broken links
- [ ] Cluster-level health score (0-100) calculated per cluster
- [ ] Recommendations generated with priority scores and AI-suggested anchors
- [ ] Batch application of recommendations modifies content_html correctly

### Pipeline Integration
- [ ] Link plan generated automatically after content generation in pipeline
- [ ] Links inserted before publish stage
- [ ] Mandatory vertical_up link verified before allowing publish
- [ ] Content.inbound_link_count and outbound_link_count updated on insert

---

## 6. CLAUDE CODE INSTRUCTIONS

### File Locations
```
igny8_core/
├── modules/
│   └── linker/
│       ├── __init__.py
│       ├── apps.py                # app_label = 'linker'
│       └── models.py              # SAGLink, SAGLinkAudit, LinkMap
├── business/
│   ├── link_planning.py           # LinkPlanningService
│   ├── link_insertion.py          # LinkInsertionService
│   └── link_audit.py              # LinkAuditService
├── tasks/
│   └── linker_tasks.py            # Celery tasks
├── urls/
│   └── linker.py                  # Linker endpoints
└── migrations/
    └── XXXX_add_linker_models.py
```

### Conventions
- **PKs:** BigAutoField (integer) — do NOT use UUIDs
- **Table prefix:** `igny8_` on all new tables
- **App label:** `linker` (new app)
- **Celery app name:** `igny8_core`
- **URL pattern:** `/api/v1/linker/...`
- **Permissions:** Use `SiteSectorModelViewSet` permission pattern
- **Model inheritance:** SAGLink, SAGLinkAudit, and LinkMap all extend `SiteSectorBaseModel`
- **Frontend:** `.tsx` files with Zustand stores for state management

### Cross-References
| Doc | Relationship |
|-----|-------------|
| **01A** | SAGBlueprint/SAGCluster/SAGAttribute provide hierarchy and cross-cluster relationships |
| **01E** | Pipeline integration — link planning hooks after Stage 4, before Stage 7 |
| **01G** | SAG health monitoring incorporates cluster link health scores |
| **02B** | ContentTaxonomy cluster mapping enables taxonomy contextual links |
| **02E** | External backlinks complement internal links; authority distributed by internal links |
| **02F** | Optimizer identifies internal link opportunities and feeds to linker |
| **03A** | WP plugin standalone mode has its own internal linking module — separate from this |
| **03C** | Theme renders breadcrumbs and related content sections generated by linker |

### Key Decisions
1. **New `linker` app** — Separate app because linking is a distinct domain with its own models, not tightly coupled to writer or planner
2. **SAGLink stores planned AND inserted** — Single model tracks the full lifecycle from planning through insertion to verification
3. **LinkMap is separate from SAGLink** — LinkMap stores the actual crawled link state (including non-SAG links); SAGLink stores the planned/managed links
4. **Cached counts on Content** — `inbound_link_count` and `outbound_link_count` are denormalized for fast queries; updated on insert/removal
5. **HTML parsing for insertion** — Use Python HTML parser (BeautifulSoup or lxml) for safe link insertion without corrupting content_html