igny8/v2/V2-Execution-Docs/02D-linker-internal.md
IGNY8 VPS (Salman) 0570052fec 1
2026-03-23 17:20:51 +00:00


# IGNY8 Phase 2: Internal Linker (02D)
## SAG-Based Internal Linking Engine
**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** IGNY8 Phase 2 — Feature Expansion
**Status:** Build Ready
**Source of Truth:** Codebase at `/data/app/igny8/`
**Audience:** Claude Code, Backend Developers, Architects

---
## 1. CURRENT STATE
### Internal Linking Today
There is **no** internal linking system in IGNY8. Content is generated and published without any cross-linking strategy. Links within content are only those the AI incidentally includes during generation.
### What Exists
- `Content` model (app_label=`writer`, db_table=`igny8_content`) — stores `content_html` where links would be inserted
- `SAGCluster` and `SAGBlueprint` models (from 01A) — provide the cluster hierarchy for link topology
- The 7-stage automation pipeline (01E) generates and publishes content but has no linking stage between generation and publish
- `SiteIntegration` model (app_label=`integration`) tracks WordPress connections
### What Does Not Exist
- No SAGLink model, no LinkMap model, no SAGLinkAudit model
- No link scoring algorithm
- No anchor text management
- No link density enforcement
- No link insertion into content_html
- No orphan page detection
- No link health monitoring
- No link audit system
### Foundation Available
- `SAGBlueprint` (01A) — defines the SAG hierarchy (site → sectors → clusters → content)
- `SAGCluster` (01A) — cluster_type, hub_page_type, hub_page_structure
- `SAGAttribute` (01A) — attribute values shared across clusters (basis for cross-cluster linking)
- 01E pipeline — post-generation hook point available between Stage 4 (Content) and Stage 7 (Publish)
- `Content.content_type` and `Content.content_structure` — determines link density rules
- 02B `ContentTaxonomy` with cluster mapping — taxonomy-to-cluster relationships for taxonomy contextual links
---
## 2. WHAT TO BUILD
### Overview
Build a SAG-aware internal linking engine that automatically plans, scores, and inserts internal links into content. The system operates in two modes: new content mode (pipeline integration) and existing content remediation (audit + fix).
### 2.1 Seven Link Types
| # | Link Type | Direction | Description | Limit | Placement |
|---|-----------|-----------|-------------|-------|-----------|
| 1 | **Vertical Upward** | Supporting → Hub | MANDATORY: every supporting article links to its cluster hub | 1 per article | First 2 paragraphs |
| 2 | **Vertical Downward** | Hub → Supporting | Hub lists ALL its supporting articles | No cap | "Related Articles" section + contextual body links |
| 3 | **Horizontal Sibling** | Supporting ↔ Supporting | Same-cluster articles linking to each other | Max 2 per article | Natural content overlap points |
| 4 | **Cross-Cluster** | Hub ↔ Hub | Hubs sharing a SAGAttribute value can cross-link | Max 2 per hub | Contextual body links |
| 5 | **Taxonomy Contextual** | Term Page → Hubs | Term pages link to ALL cluster hubs using that attribute | No cap | Auto-generated from 02B taxonomy-cluster mapping |
| 6 | **Breadcrumb** | Hierarchical | Home → Sector → [Attribute] → Hub → Current Page | 1 chain per page | Top of page (auto-generated from SAG hierarchy) |
| 7 | **Related Content** | Cross-cluster allowed | 2-3 links in "Related Reading" section at end of article | 2-3 per article | End of article section |

**Link Density Rules (outbound per page type, by word count):**
| Page Type | <1000 words | 1000-2000 words | 2000+ words |
|-----------|------------|-----------------|-------------|
| Hub (`cluster_hub`) | 5-10 | 10-15 | 15-20 |
| Blog (article/guide/etc.) | 2-5 | 3-8 | 4-12 |
| Product/Service | 2-3 | 3-5 | 3-5 |
| Term Page (taxonomy) | 3+ | 3+ | unlimited |
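The density table can be expressed as a small lookup. The following is a minimal sketch, not the project's implementation: the page-type keys (`hub`, `blog`, `product_service`, `term_page`) and function names are hypothetical, a page of exactly 2000 words is assumed to fall in the 2000+ band, and the term-page rows are assumed to mean "at least 3, no upper bound".

```python
from typing import Optional, Tuple

# (min, max) outbound links per word-count band; None means no upper bound.
# Band order: <1000 words, 1000-2000 words, 2000+ words.
# Keys are hypothetical labels for the page types in the table above.
DENSITY_RULES = {
    "hub": [(5, 10), (10, 15), (15, 20)],
    "blog": [(2, 5), (3, 8), (4, 12)],
    "product_service": [(2, 3), (3, 5), (3, 5)],
    "term_page": [(3, None), (3, None), (3, None)],  # assumption: min 3, no cap
}


def density_range(page_type: str, word_count: int) -> Tuple[int, Optional[int]]:
    """Return the (min, max) outbound link range for a page."""
    if word_count < 1000:
        band = 0
    elif word_count < 2000:
        band = 1
    else:  # assumption: exactly 2000 words counts as the 2000+ band
        band = 2
    return DENSITY_RULES[page_type][band]


def classify_density(page_type: str, word_count: int, outbound_count: int) -> str:
    """Label a page as under_linked, over_linked, or ok."""
    low, high = density_range(page_type, word_count)
    if outbound_count < low:
        return "under_linked"
    if high is not None and outbound_count > high:
        return "over_linked"
    return "ok"
```

The same lookup could back both the density filter in link planning and the over/under-linked checks in the audit.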
### 2.2 Link Scoring Algorithm (5 Factors)
Each candidate link target receives a score (0-100):
| Factor | Weight | Description |
|--------|--------|-------------|
| Shared attribute values | 40% | Count of SAGAttribute values shared between source and target clusters |
| Target page authority | 25% | Inbound link count of target page (from LinkMap) |
| Keyword overlap | 20% | Common keywords between source cluster and target content |
| Content recency | 10% | Newer content gets a boost (exponential decay over 6 months) |
| Link count gap | 5% | Pages with fewest inbound links get a priority boost |

**Threshold:** Score ≥ 60 qualifies for automatic linking. Scores 40-59 are suggested for manual review.
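A minimal sketch of how the five weighted factors and the two thresholds might combine. It assumes each factor has already been normalized to the range 0.0-1.0 (the document does not specify the normalization), and it interprets "exponential decay over 6 months" as a 180-day half-life, which is also an assumption. Function names are hypothetical.

```python
SCORE_WEIGHTS = {
    "shared_attributes": 0.40,
    "target_authority": 0.25,
    "keyword_overlap": 0.20,
    "content_recency": 0.10,
    "link_count_gap": 0.05,
}
AUTO_LINK_THRESHOLD = 60
REVIEW_THRESHOLD = 40


def recency_factor(age_days: float, half_life_days: float = 180.0) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 at the half-life.

    The 180-day half-life is an assumed reading of "decay over 6 months".
    """
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def link_score(factors: dict) -> float:
    """Combine normalized factors (each 0.0-1.0) into a 0-100 score."""
    total = 0.0
    for name, weight in SCORE_WEIGHTS.items():
        value = min(max(factors.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        total += weight * value
    return round(total * 100, 2)


def classify(score: float) -> str:
    """Map a score onto the auto-link / manual-review / skip bands."""
    if score >= AUTO_LINK_THRESHOLD:
        return "auto_link"
    if score >= REVIEW_THRESHOLD:
        return "manual_review"
    return "skip"
```

For example, a target with full attribute overlap, half-strength authority and keyword overlap, content about six months old, and no gap boost scores 0.40 + 0.125 + 0.10 + 0.05 = 0.675, or 67.5, which clears the automatic-linking threshold.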
### 2.3 Anchor Text Rules
| Rule | Value |
|------|-------|
| Min length | 2 words |
| Max length | 8 words |
| Grammatically natural | Must read naturally in surrounding sentence |
| No exact-match overuse | Same exact anchor cannot be used >3 times to same target URL |
| Anchor distribution per target | Primary keyword 60%, page title 30%, natural phrase 10% |
| Diversification audit | Flag if any single anchor accounts for >40% of links to a target |

**Anchor Types:**
- `primary_keyword` — cluster primary keyword
- `page_title` — target content's title (or shortened version)
- `natural` — AI-selected contextually appropriate phrase
- `branded` — brand/site name (for homepage links)
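The length, overuse, and diversification rules can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and it assumes "same exact anchor" is compared case-insensitively after trimming whitespace, which the rules above do not state.

```python
from collections import Counter
from typing import Dict, List

MIN_WORDS = 2
MAX_WORDS = 8
MAX_EXACT_USES = 3         # same anchor may not be used more than 3 times
MAX_ANCHOR_SHARE = 0.40    # flag anchors above 40% of links to a target


def is_valid_anchor_length(anchor: str) -> bool:
    """True when the anchor has between 2 and 8 words."""
    return MIN_WORDS <= len(anchor.split()) <= MAX_WORDS


def audit_anchors(anchors_to_target: List[str]) -> Dict[str, List[str]]:
    """Audit all anchors pointing at one target URL."""
    # Assumption: exact-match comparison ignores case and outer whitespace.
    normalized = [a.strip().lower() for a in anchors_to_target]
    counts = Counter(normalized)
    total = len(normalized)
    return {
        "bad_length": sorted({a for a in normalized if not is_valid_anchor_length(a)}),
        "overused": sorted(a for a, n in counts.items() if n > MAX_EXACT_USES),
        "dominant": sorted(
            a for a, n in counts.items() if total and n / total > MAX_ANCHOR_SHARE
        ),
    }
```

An anchor reported under `overused` or `dominant` would be a candidate for replacement with a different anchor type for the same target.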
### 2.4 Two Operating Modes
#### A. New Content Mode (Pipeline Integration)
Runs after Stage 4 (content generated), before Stage 7 (publish):
1. Content generated by pipeline → link planning triggers
2. Calculate link targets using scoring algorithm
3. Insert links into `content_html` at natural positions
4. Store link plan in SAGLink records
5. If content is a hub → auto-generate "Related Articles" section with links to all supporting articles in cluster
6. **Mandatory check:** if content is a supporting article, verify vertical_up link to hub exists; insert if missing
#### B. Existing Content Remediation (Audit + Fix)
For already-published content without proper internal linking:
1. **Crawl phase:** Scan all published content for a site, extract all `<a>` tags, build LinkMap
2. **Audit analysis:**
   - Orphan pages: 0 inbound internal links
   - Over-linked pages: outbound > density max for page type/word count
   - Under-linked pages: outbound < density min
   - Missing mandatory links: supporting articles without hub uplink
   - Broken links: target URL returns 4xx/5xx
3. **Recommendation generation:** Priority-scored fix recommendations with AI-suggested anchor text
4. **Batch application:** Insert missing links across multiple content records
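The crawl and orphan-detection steps can be sketched as follows. This is a simplified illustration that works on an in-memory mapping of page URL to `content_html` rather than on `Content` and `LinkMap` records. It uses the standard-library `html.parser` only to stay dependency-free, and it assumes self-links do not count as inbound links. All names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects (href, anchor_text) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def find_orphans(pages: Dict[str, str], site_host: str) -> List[str]:
    """Return URLs of pages with zero inbound internal links."""
    inbound = {url: 0 for url in pages}
    for source_url, html in pages.items():
        for href, _anchor in extract_links(html):
            target = urljoin(source_url, href).split("#")[0]
            if urlparse(target).netloc != site_host:
                continue  # external link
            # Assumption: a page linking to itself does not count as inbound.
            if target in inbound and target != source_url:
                inbound[target] += 1
    return sorted(url for url, count in inbound.items() if count == 0)
```

The inbound counts computed here correspond to what the cached `inbound_link_count` field would hold after a crawl.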
### 2.5 Cluster-Level Link Health Score
Per-cluster health score (0-100) for link coverage:
| Factor | Points |
|--------|--------|
| Hub published and linked (has outbound + inbound links) | 25 |
| All supporting articles have mandatory uplink to hub | 25 |
| At least 1 cross-cluster link from hub | 15 |
| Term pages link to hub | 15 |
| No broken links in cluster | 10 |
| Link density within range for all pages | 10 |

Site-wide link health = average of all cluster scores. Feeds into SAG health monitoring (01G).

---
## 3. DATA MODELS & APIS
### 3.1 New Models
#### SAGLink (new `linker` app)
```python
class SAGLink(SiteSectorBaseModel):
    """
    Represents a planned or inserted internal link between two content pages.
    Tracks link type, anchor text, score, and status through lifecycle.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='sag_links'
    )
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='outbound_sag_links'
    )
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='inbound_sag_links'
    )
    link_type = models.CharField(
        max_length=20,
        choices=[
            ('vertical_up', 'Vertical Upward'),
            ('vertical_down', 'Vertical Downward'),
            ('horizontal', 'Horizontal Sibling'),
            ('cross_cluster', 'Cross-Cluster'),
            ('taxonomy', 'Taxonomy Contextual'),
            ('breadcrumb', 'Breadcrumb'),
            ('related', 'Related Content'),
        ]
    )
    anchor_text = models.CharField(max_length=200)
    anchor_type = models.CharField(
        max_length=20,
        choices=[
            ('primary_keyword', 'Primary Keyword'),
            ('page_title', 'Page Title'),
            ('natural', 'Natural Phrase'),
            ('branded', 'Branded'),
        ]
    )
    placement_zone = models.CharField(
        max_length=20,
        choices=[
            ('in_body', 'In Body'),
            ('related_section', 'Related Section'),
            ('breadcrumb', 'Breadcrumb'),
            ('sidebar', 'Sidebar'),
        ]
    )
    placement_position = models.IntegerField(
        null=True,
        blank=True,
        help_text='Paragraph number for in_body placement'
    )
    score = models.FloatField(
        default=0,
        help_text='Link scoring algorithm result (0-100)'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('planned', 'Planned'),
            ('inserted', 'Inserted'),
            ('verified', 'Verified'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='planned'
    )
    is_mandatory = models.BooleanField(
        default=False,
        help_text='True for vertical_up links (supporting → hub)'
    )
    inserted_at = models.DateTimeField(null=True, blank=True)

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_links'
```
**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel
#### SAGLinkAudit (linker app)
```python
class SAGLinkAudit(SiteSectorBaseModel):
    """
    Stores results of a site-wide or cluster-level link audit.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='link_audits'
    )
    audit_date = models.DateTimeField(auto_now_add=True)
    total_links = models.IntegerField(default=0)
    missing_mandatory = models.IntegerField(default=0)
    orphan_pages = models.IntegerField(default=0)
    broken_links = models.IntegerField(default=0)
    over_linked_pages = models.IntegerField(default=0)
    under_linked_pages = models.IntegerField(default=0)
    cluster_scores = models.JSONField(
        default=dict,
        help_text='{cluster_id: {score, missing, issues[]}}'
    )
    recommendations = models.JSONField(
        default=list,
        help_text='[{content_id, action, link_type, target_id, anchor_suggestion, priority}]'
    )
    overall_health_score = models.FloatField(
        default=0,
        help_text='Average of cluster scores (0-100)'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_link_audits'
```
**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel
#### LinkMap (linker app)
```python
class LinkMap(SiteSectorBaseModel):
    """
    Full link map of all internal (and external) links found in published content.
    Built by crawling content_html of all published content records.
    """
    source_url = models.URLField()
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='outbound_link_map'
    )
    target_url = models.URLField()
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='inbound_link_map'
    )
    anchor_text = models.CharField(max_length=500)
    is_internal = models.BooleanField(default=True)
    is_follow = models.BooleanField(default=True)
    position = models.CharField(
        max_length=20,
        choices=[
            ('in_content', 'In Content'),
            ('navigation', 'Navigation'),
            ('footer', 'Footer'),
            ('sidebar', 'Sidebar'),
        ],
        default='in_content'
    )
    last_verified = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('active', 'Active'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='active'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_link_map'
```
**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel
### 3.2 Modified Models
#### Content (writer app) — add 4 fields
```python
# Add to Content model:
link_plan = models.JSONField(
    null=True,
    blank=True,
    help_text='Planned links before insertion: [{target_id, link_type, anchor, score}]'
)
links_inserted = models.BooleanField(
    default=False,
    help_text='Whether link plan has been applied to content_html'
)
inbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of inbound internal links'
)
outbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of outbound internal links'
)
```
### 3.3 New App Registration
Create linker app:
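- **App config:** `igny8_core/modules/linker/apps.py` with `label = 'linker'` on the `AppConfig` (in Django, `label` is the `AppConfig` attribute; `app_label = 'linker'` is the matching option in each model's `Meta`)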
- **Add to INSTALLED_APPS** in `igny8_core/settings.py`
### 3.4 Migration
```
igny8_core/migrations/XXXX_add_linker_models.py
```
**Operations:**
1. `CreateModel('SAGLink', ...)` — with indexes on source_content, target_content, link_type, status
2. `CreateModel('SAGLinkAudit', ...)`
3. `CreateModel('LinkMap', ...)` — with index on source_url, target_url
4. `AddField('Content', 'link_plan', JSONField(null=True, blank=True))`
5. `AddField('Content', 'links_inserted', BooleanField(default=False))`
6. `AddField('Content', 'inbound_link_count', IntegerField(default=0))`
7. `AddField('Content', 'outbound_link_count', IntegerField(default=0))`
### 3.5 API Endpoints
All endpoints under `/api/v1/linker/`:
#### Link Management
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/links/?site_id=X` | List all SAGLink records with filters (link_type, status, cluster_id, source_content_id) |
| POST | `/api/v1/linker/links/plan/` | Generate link plan for a content piece. Body: `{content_id}`. Returns planned SAGLink records. |
| POST | `/api/v1/linker/links/insert/` | Insert planned links into content_html. Body: `{content_id}`. Modifies Content.content_html. |
| POST | `/api/v1/linker/links/batch-insert/` | Batch insert for multiple content. Body: `{content_ids: [int]}`. Queues Celery task. |
| GET | `/api/v1/linker/content/{id}/links/` | All inbound + outbound links for a specific content piece. |
#### Link Audit
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/audit/?site_id=X` | Latest SAGLinkAudit results. |
| POST | `/api/v1/linker/audit/run/` | Trigger site-wide link audit. Body: `{site_id}`. Queues Celery task. Returns task ID. |
| GET | `/api/v1/linker/audit/recommendations/?site_id=X` | Get fix recommendations from latest audit. |
| POST | `/api/v1/linker/audit/apply/` | Apply recommended fixes in batch. Body: `{site_id, recommendation_ids: [int]}`. |
#### Link Map & Health
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/link-map/?site_id=X` | Full LinkMap for site with pagination. |
| GET | `/api/v1/linker/orphans/?site_id=X` | List orphan pages (0 inbound internal links). |
| GET | `/api/v1/linker/health/?site_id=X` | Cluster-level link health scores. |

**Permissions:** All endpoints use `SiteSectorModelViewSet` permission patterns.
### 3.6 Link Planning Service
**Location:** `igny8_core/business/link_planning.py`
```python
class LinkPlanningService:
    """
    Generates internal link plans for content based on SAG hierarchy
    and scoring algorithm.
    """

    SCORE_WEIGHTS = {
        'shared_attributes': 0.40,
        'target_authority': 0.25,
        'keyword_overlap': 0.20,
        'content_recency': 0.10,
        'link_count_gap': 0.05,
    }
    AUTO_LINK_THRESHOLD = 60
    REVIEW_THRESHOLD = 40

    def plan(self, content_id):
        """
        Generate link plan for a content piece.
        1. Identify content's cluster and role (hub vs supporting)
        2. Determine mandatory links (vertical_up for supporting)
        3. Score all candidate targets
        4. Select targets within density limits
        5. Generate anchor text per link
        6. Create SAGLink records with status='planned'
        Returns list of planned SAGLink records.
        """
        pass

    def _get_mandatory_links(self, content, cluster):
        """Vertical upward: supporting → hub. Always added."""
        pass

    def _get_candidates(self, content, cluster, blueprint):
        """Gather all potential link targets from cluster and related clusters."""
        pass

    def _score_candidate(self, source_content, target_content, source_cluster,
                         target_cluster, blueprint):
        """Calculate 0-100 score using 5-factor algorithm."""
        pass

    def _select_within_density(self, content, scored_candidates):
        """Filter candidates to stay within density limits for page type and word count."""
        pass

    def _generate_anchor_text(self, source_content, target_content, link_type):
        """AI-generate contextually appropriate anchor text."""
        pass
```
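One possible shape for the selection step in `_select_within_density` is sketched below. It is a standalone illustration using a hypothetical `Candidate` dataclass rather than `SAGLink` records. It assumes mandatory links are always kept even if they push the page over its cap, and that the per-type limits from section 2.1 (2 horizontal, 2 cross-cluster, 3 related) apply on top of the page-level maximum.

```python
from dataclasses import dataclass
from typing import Dict, List

AUTO_LINK_THRESHOLD = 60
# Per-type caps taken from the link type table in section 2.1.
PER_TYPE_CAPS = {"horizontal": 2, "cross_cluster": 2, "related": 3}


@dataclass
class Candidate:
    """Hypothetical stand-in for a scored link target."""
    target_id: int
    link_type: str
    score: float
    is_mandatory: bool = False


def select_within_density(candidates: List[Candidate], max_links: int) -> List[Candidate]:
    """Keep mandatory links, then fill by descending score up to max_links."""
    # Assumption: mandatory links are never dropped for density reasons.
    selected = [c for c in candidates if c.is_mandatory]
    used: Dict[str, int] = {}
    for c in selected:
        used[c.link_type] = used.get(c.link_type, 0) + 1

    optional = sorted(
        (c for c in candidates if not c.is_mandatory and c.score >= AUTO_LINK_THRESHOLD),
        key=lambda c: c.score,
        reverse=True,
    )
    for c in optional:
        if len(selected) >= max_links:
            break
        cap = PER_TYPE_CAPS.get(c.link_type)
        if cap is not None and used.get(c.link_type, 0) >= cap:
            continue  # this link type is already at its limit
        selected.append(c)
        used[c.link_type] = used.get(c.link_type, 0) + 1
    return selected
```

Candidates scoring 40-59 are left out here; in the described design they would be surfaced for manual review instead of being inserted.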
### 3.7 Link Insertion Service
**Location:** `igny8_core/business/link_insertion.py`
```python
class LinkInsertionService:
    """
    Inserts planned links into content_html.
    Handles placement, anchor text insertion, and collision avoidance.
    """

    def insert(self, content_id):
        """
        Insert all planned SAGLink records into Content.content_html.
        1. Load all SAGLinks where source_content=content_id, status='planned'
        2. Parse content_html
        3. For each link, find insertion point based on placement_zone + position
        4. Insert <a> tag with anchor text
        5. Update SAGLink status='inserted', set inserted_at
        6. Update Content.content_html, links_inserted=True, outbound_link_count
        7. Update target Content.inbound_link_count
        """
        pass

    def _find_insertion_point(self, html_tree, link):
        """
        Find best insertion point in parsed HTML:
        - in_body: find paragraph at placement_position, find natural spot for anchor
        - related_section: append to "Related Articles" section (create if missing)
        - breadcrumb: insert breadcrumb trail at top
        """
        pass

    def _insert_link(self, html_tree, position, anchor_text, target_url):
        """Insert <a href> tag at position without breaking existing HTML."""
        pass
```
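To illustrate the `in_body` case, the sketch below wraps the first occurrence of an anchor phrase in a given paragraph with an `<a>` tag, skipping text that already sits inside a link. It uses the standard-library `html.parser` to stay dependency-free; the document's stated choice for the real service is BeautifulSoup or lxml. It assumes `placement_position` is a 1-based paragraph number and that the anchor phrase already appears verbatim in that paragraph. The class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class LinkInserter(HTMLParser):
    """Re-emits the HTML unchanged except for one inserted <a> tag."""

    def __init__(self, anchor_text, target_url, paragraph_number):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.anchor_text = anchor_text
        self.target_url = target_url
        self.paragraph_number = paragraph_number  # assumption: 1-based
        self.out = []
        self.paragraphs_seen = 0
        self.in_target_paragraph = False
        self.open_links = 0
        self.inserted = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraphs_seen += 1
            self.in_target_paragraph = self.paragraphs_seen == self.paragraph_number
        elif tag == "a":
            self.open_links += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_target_paragraph = False
        elif tag == "a" and self.open_links:
            self.open_links -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        can_insert = self.in_target_paragraph and not self.open_links and not self.inserted
        if can_insert and self.anchor_text in data:
            link = f'<a href="{escape(self.target_url, quote=True)}">{self.anchor_text}</a>'
            data = data.replace(self.anchor_text, link, 1)
            self.inserted = True
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def insert_link(content_html, anchor_text, target_url, paragraph_number):
    """Return (new_html, inserted_flag); the HTML is unchanged if no spot is found."""
    parser = LinkInserter(anchor_text, target_url, paragraph_number)
    parser.feed(content_html)
    parser.close()
    return "".join(parser.out), parser.inserted
```

When `inserted_flag` is false, the calling service could leave the corresponding `SAGLink` in `planned` status rather than marking it `inserted`.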
### 3.8 Link Audit Service
**Location:** `igny8_core/business/link_audit.py`
```python
class LinkAuditService:
    """
    Runs site-wide link audits: builds link map, identifies issues,
    generates recommendations.
    """

    def run_audit(self, site_id):
        """
        Full audit:
        1. Crawl all published Content for site
        2. Extract all <a> tags, build/update LinkMap records
        3. Identify orphan pages, over/under-linked, missing mandatory, broken
        4. Calculate per-cluster health scores
        5. Generate prioritized recommendations
        6. Create SAGLinkAudit record
        Returns SAGLinkAudit instance.
        """
        pass

    def _build_link_map(self, site_id):
        """Extract links from all published content_html, create LinkMap records."""
        pass

    def _find_orphans(self, site_id):
        """Content with 0 inbound internal links."""
        pass

    def _check_density(self, site_id):
        """Compare outbound counts against density rules per page type."""
        pass

    def _check_mandatory(self, site_id):
        """Verify all supporting articles have vertical_up link to their hub."""
        pass

    def _calculate_cluster_health(self, site_id, cluster):
        """Calculate 0-100 health score per cluster."""
        pass

    def _generate_recommendations(self, issues):
        """Priority-scored recommendations with AI-suggested anchor text."""
        pass
```
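The scoring behind `_calculate_cluster_health` follows the points table in section 2.5. A minimal sketch, assuming each factor is all-or-nothing (the document does not describe partial credit) and using hypothetical check names:

```python
from typing import Dict, List

# Points per factor, from the cluster health table in section 2.5.
# The key names are hypothetical labels for those factors.
HEALTH_POINTS = {
    "hub_published_and_linked": 25,
    "all_supporting_have_uplink": 25,
    "hub_has_cross_cluster_link": 15,
    "term_pages_link_to_hub": 15,
    "no_broken_links": 10,
    "density_within_range": 10,
}


def cluster_health(checks: Dict[str, bool]) -> int:
    """Sum the points of every factor that passes (0-100)."""
    # Assumption: a factor earns full points or none.
    return sum(points for name, points in HEALTH_POINTS.items() if checks.get(name))


def site_health(cluster_scores: List[float]) -> float:
    """Site-wide health is the average of all cluster scores."""
    if not cluster_scores:
        return 0.0
    return sum(cluster_scores) / len(cluster_scores)
```

The per-cluster results map naturally onto `SAGLinkAudit.cluster_scores`, with the average stored in `overall_health_score`.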
### 3.9 Celery Tasks
**Location:** `igny8_core/tasks/linker_tasks.py`
```python
from celery import shared_task


@shared_task(name='generate_link_plan')
def generate_link_plan(content_id):
    """Runs after content generation, before publish. Creates SAGLink records."""
    pass


@shared_task(name='run_link_audit')
def run_link_audit(site_id):
    """Scheduled weekly or triggered manually. Full site-wide audit."""
    pass


@shared_task(name='verify_links')
def verify_links(site_id):
    """Check for broken links via HTTP status checks on LinkMap URLs."""
    pass


@shared_task(name='rebuild_link_map')
def rebuild_link_map(site_id):
    """Full crawl of published content to rebuild LinkMap from scratch."""
    pass
```
**Beat Schedule Additions:**
| Task | Schedule | Notes |
|------|----------|-------|
| `run_link_audit` | Weekly (Sunday 1:00 AM) | Site-wide audit for all active sites |
| `verify_links` | Weekly (Wednesday 2:00 AM) | HTTP check all active LinkMap entries |
---
## 4. IMPLEMENTATION STEPS
### Step 1: Create Linker App
1. Create `igny8_core/modules/linker/` directory with `__init__.py` and `apps.py`
2. Add `linker` to `INSTALLED_APPS` in settings.py
3. Create models: SAGLink, SAGLinkAudit, LinkMap
### Step 2: Migration
1. Create migration for 3 new models
2. Add 4 new fields to Content model (link_plan, links_inserted, inbound_link_count, outbound_link_count)
3. Run migration
### Step 3: Services
1. Implement `LinkPlanningService` in `igny8_core/business/link_planning.py`
2. Implement `LinkInsertionService` in `igny8_core/business/link_insertion.py`
3. Implement `LinkAuditService` in `igny8_core/business/link_audit.py`
### Step 4: Pipeline Integration
Insert link planning + insertion between Stage 4 and Stage 7:
```python
from igny8_core.business.link_insertion import LinkInsertionService
from igny8_core.business.link_planning import LinkPlanningService


# After content generation completes in pipeline:
def post_content_generation(content_id):
    # 02G: Generate schema + SERP elements
    # ...

    # 02D: Plan and insert internal links
    link_service = LinkPlanningService()
    link_service.plan(content_id)

    insertion_service = LinkInsertionService()
    insertion_service.insert(content_id)
```
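The acceptance criteria require the mandatory `vertical_up` link to be verified before publish. One way to sketch that check is to look for the hub URL among the links in the first two paragraphs of `content_html`. The helper below is hypothetical and not part of the pipeline; it uses the standard-library parser for brevity and compares URLs while ignoring a trailing slash, which is an assumption.

```python
from html.parser import HTMLParser


class EarlyLinkCollector(HTMLParser):
    """Collects hrefs of links found in the first N paragraphs."""

    def __init__(self, max_paragraphs=2):
        super().__init__()
        self.max_paragraphs = max_paragraphs
        self.paragraphs_seen = 0
        self.in_early_paragraph = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraphs_seen += 1
            self.in_early_paragraph = self.paragraphs_seen <= self.max_paragraphs
        elif tag == "a" and self.in_early_paragraph:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_early_paragraph = False


def has_hub_uplink(content_html, hub_url):
    """True if a link to hub_url appears in the first two paragraphs."""
    collector = EarlyLinkCollector(max_paragraphs=2)
    collector.feed(content_html)
    collector.close()
    # Assumption: URLs match regardless of a trailing slash.
    wanted = hub_url.rstrip("/")
    return any(href.rstrip("/") == wanted for href in collector.hrefs)
```

A supporting article failing this check would be a candidate for re-running insertion, or for blocking the publish stage until the uplink exists.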
### Step 5: API Endpoints
1. Create `igny8_core/urls/linker.py` with link, audit, and health endpoints
2. Create views extending `SiteSectorModelViewSet`
3. Register URL patterns under `/api/v1/linker/`
### Step 6: Celery Tasks
1. Implement all 4 tasks in `igny8_core/tasks/linker_tasks.py`
2. Add `run_link_audit` and `verify_links` to Celery beat schedule
### Step 7: Serializers & Admin
1. Create DRF serializers for SAGLink, SAGLinkAudit, LinkMap
2. Register models in Django admin
### Step 8: Credit Cost Configuration
Add to `CreditCostConfig` (billing app):
| operation_type | default_cost | description |
|---------------|-------------|-------------|
| `link_audit` | 1 | Site-wide link audit |
| `link_generation` | 0.5 | Generate 1-5 links with AI anchor text |
| `link_audit_full` | 3-5 | Full site audit with recommendations |
---
## 5. ACCEPTANCE CRITERIA
### Link Types
- [ ] Vertical upward link (supporting → hub) automatically inserted for all supporting articles
- [ ] Vertical downward links (hub → supporting) generated with "Related Articles" section
- [ ] Horizontal sibling links (max 2) between same-cluster supporting articles
- [ ] Cross-cluster links (max 2) between hubs sharing SAGAttribute values
- [ ] Taxonomy contextual links from term pages to all relevant cluster hubs
- [ ] Breadcrumb chain generated from SAG hierarchy for all content
- [ ] Related content section (2-3 links) generated at end of article
### Link Scoring
- [ ] 5-factor scoring algorithm produces 0-100 scores
- [ ] Links with score ≥ 60 auto-inserted
- [ ] Links with score 40-59 suggested for manual review
- [ ] Score algorithm uses: shared attributes (40%), authority (25%), keyword overlap (20%), recency (10%), gap boost (5%)
### Anchor Text
- [ ] Anchor text 2-8 words, grammatically natural
- [ ] Same exact anchor not used >3 times to same target
- [ ] Distribution per target: 60% primary keyword, 30% page title, 10% natural
- [ ] Diversification audit flags if any anchor >40% of links to a target
### Link Density
- [ ] Hub pages: 5-20 outbound links based on word count
- [ ] Blog pages: 2-12 outbound links based on word count
- [ ] Product/Service pages: 2-5 outbound links
- [ ] Term pages: 3+ outbound, unlimited for taxonomy contextual
### Audit & Remediation
- [ ] Link audit identifies orphan pages, over/under-linked, missing mandatory, broken links
- [ ] Cluster-level health score (0-100) calculated per cluster
- [ ] Recommendations generated with priority scores and AI-suggested anchors
- [ ] Batch application of recommendations modifies content_html correctly
### Pipeline Integration
- [ ] Link plan generated automatically after content generation in pipeline
- [ ] Links inserted before publish stage
- [ ] Mandatory vertical_up link verified before allowing publish
- [ ] Content.inbound_link_count and outbound_link_count updated on insert
---
## 6. CLAUDE CODE INSTRUCTIONS
### File Locations
```
igny8_core/
├── modules/
│   └── linker/
│       ├── __init__.py
│       ├── apps.py              # app_label = 'linker'
│       └── models.py            # SAGLink, SAGLinkAudit, LinkMap
├── business/
│   ├── link_planning.py         # LinkPlanningService
│   ├── link_insertion.py        # LinkInsertionService
│   └── link_audit.py            # LinkAuditService
├── tasks/
│   └── linker_tasks.py          # Celery tasks
├── urls/
│   └── linker.py                # Linker endpoints
└── migrations/
    └── XXXX_add_linker_models.py
```
### Conventions
- **PKs:** BigAutoField (integer) — do NOT use UUIDs
- **Table prefix:** `igny8_` on all new tables
- **App label:** `linker` (new app)
- **Celery app name:** `igny8_core`
- **URL pattern:** `/api/v1/linker/...`
- **Permissions:** Use `SiteSectorModelViewSet` permission pattern
- **Model inheritance:** SAGLink, SAGLinkAudit, and LinkMap all extend `SiteSectorBaseModel`
- **Frontend:** `.tsx` files with Zustand stores for state management
### Cross-References
| Doc | Relationship |
|-----|-------------|
| **01A** | SAGBlueprint/SAGCluster/SAGAttribute provide hierarchy and cross-cluster relationships |
| **01E** | Pipeline integration — link planning hooks after Stage 4, before Stage 7 |
| **01G** | SAG health monitoring incorporates cluster link health scores |
| **02B** | ContentTaxonomy cluster mapping enables taxonomy contextual links |
| **02E** | External backlinks complement internal links; authority distributed by internal links |
| **02F** | Optimizer identifies internal link opportunities and feeds to linker |
| **03A** | WP plugin standalone mode has its own internal linking module — separate from this |
| **03C** | Theme renders breadcrumbs and related content sections generated by linker |
### Key Decisions
1. **New `linker` app** — Separate app because linking is a distinct domain with its own models, not tightly coupled to writer or planner
2. **SAGLink stores planned AND inserted** — Single model tracks the full lifecycle from planning through insertion to verification
3. **LinkMap is separate from SAGLink** — LinkMap stores the actual crawled link state (including non-SAG links); SAGLink stores the planned/managed links
4. **Cached counts on Content:** `inbound_link_count` and `outbound_link_count` are denormalized for fast queries; updated on insert/removal
5. **HTML parsing for insertion** — Use Python HTML parser (BeautifulSoup or lxml) for safe link insertion without corrupting content_html