# IGNY8 Phase 2: Internal Linker (02D)

## SAG-Based Internal Linking Engine

**Document Version:** 1.0

**Date:** 2026-03-23

**Phase:** IGNY8 Phase 2 — Feature Expansion

**Status:** Build Ready

**Source of Truth:** Codebase at `/data/app/igny8/`

**Audience:** Claude Code, Backend Developers, Architects

---

## 1. CURRENT STATE

### Internal Linking Today

There is **no** internal linking system in IGNY8. Content is generated and published without any cross-linking strategy. Links within content are only those the AI incidentally includes during generation.

### What Exists

- `Content` model (app_label=`writer`, db_table=`igny8_content`) — stores `content_html` where links would be inserted
- `SAGCluster` and `SAGBlueprint` models (from 01A) — provide the cluster hierarchy for link topology
- The 7-stage automation pipeline (01E) generates and publishes content but has no linking stage between generation and publish
- `SiteIntegration` model (app_label=`integration`) tracks WordPress connections

### What Does Not Exist

- No SAGLink model, no LinkMap model, no SAGLinkAudit model
- No link scoring algorithm
- No anchor text management
- No link density enforcement
- No link insertion into content_html
- No orphan page detection
- No link health monitoring
- No link audit system

### Foundation Available

- `SAGBlueprint` (01A) — defines the SAG hierarchy (site → sectors → clusters → content)
- `SAGCluster` (01A) — cluster_type, hub_page_type, hub_page_structure
- `SAGAttribute` (01A) — attribute values shared across clusters (basis for cross-cluster linking)
- 01E pipeline — post-generation hook point available between Stage 4 (Content) and Stage 7 (Publish)
- `Content.content_type` and `Content.content_structure` — determines link density rules
- 02B `ContentTaxonomy` with cluster mapping — taxonomy-to-cluster relationships for taxonomy contextual links

---

## 2. WHAT TO BUILD

### Overview

Build a SAG-aware internal linking engine that automatically plans, scores, and inserts internal links into content. The system operates in two modes: new content mode (pipeline integration) and existing content remediation (audit + fix).

### 2.1 Seven Link Types

| # | Link Type | Direction | Description | Limit | Placement |
|---|-----------|-----------|-------------|-------|-----------|
| 1 | **Vertical Upward** | Supporting → Hub | MANDATORY: every supporting article links to its cluster hub | 1 per article | First 2 paragraphs |
| 2 | **Vertical Downward** | Hub → Supporting | Hub lists ALL its supporting articles | No cap | "Related Articles" section + contextual body links |
| 3 | **Horizontal Sibling** | Supporting ↔ Supporting | Same-cluster articles linking to each other | Max 2 per article | Natural content overlap points |
| 4 | **Cross-Cluster** | Hub ↔ Hub | Hubs sharing a SAGAttribute value can cross-link | Max 2 per hub | Contextual body links |
| 5 | **Taxonomy Contextual** | Term Page → Hubs | Term pages link to ALL cluster hubs using that attribute | No cap | Auto-generated from 02B taxonomy-cluster mapping |
| 6 | **Breadcrumb** | Hierarchical | Home → Sector → [Attribute] → Hub → Current Page | 1 chain per page | Top of page (auto-generated from SAG hierarchy) |
| 7 | **Related Content** | Cross-cluster allowed | 2-3 links in "Related Reading" section at end of article | 2-3 per article | End of article section |

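Breadcrumb chains (type 6) are derived entirely from the SAG hierarchy, so they need no scoring. The sketch below assumes the caller has already resolved each level to a label and URL. `Crumb`, `build_breadcrumb`, and `render_breadcrumb` are hypothetical names. The `<nav>`/`<ol>` markup is one common accessible pattern, not necessarily what the 03C theme renders.

```python
from html import escape
from typing import NamedTuple, Optional


class Crumb(NamedTuple):
    label: str
    url: str


def build_breadcrumb(home: Crumb, sector: Crumb, hub: Crumb, current: Crumb,
                     attribute: Optional[Crumb] = None) -> list[Crumb]:
    """Home → Sector → [Attribute] → Hub → Current Page."""
    chain = [home, sector]
    if attribute is not None:  # the attribute level is optional
        chain.append(attribute)
    chain.append(hub)
    if current.url != hub.url:  # on the hub itself, the hub is the last crumb
        chain.append(current)
    return chain


def render_breadcrumb(chain: list[Crumb]) -> str:
    """Link every crumb except the last, which is the current page."""
    items = [f'<li><a href="{escape(c.url)}">{escape(c.label)}</a></li>'
             for c in chain[:-1]]
    items.append(f'<li aria-current="page">{escape(chain[-1].label)}</li>')
    return '<nav aria-label="Breadcrumb"><ol>' + ''.join(items) + '</ol></nav>'
```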
**Link Density Rules (outbound per page type, by word count):**

| Page Type | <1000 words | 1000-2000 words | 2000+ words |
|-----------|------------|-----------------|-------------|
| Hub (`cluster_hub`) | 5-10 | 10-15 | 15-20 |
| Blog (article/guide/etc.) | 2-5 | 3-8 | 4-12 |
| Product/Service | 2-3 | 3-5 | 3-5 |
| Term Page (taxonomy) | 3+ | 3+ | unlimited |

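Here is a minimal sketch of the density lookup. It rests on three assumptions:

- Page types are keyed as `hub`, `blog`, `product_service`, and `term`. These keys are hypothetical; the real mapping from `Content.content_type`/`content_structure` is not specified here.
- A page of exactly 2000 words falls in the top bucket.
- "3+" and "unlimited" both mean a minimum of 3 with no maximum.

Word count is taken from the visible text of `content_html` using the standard-library parser.

```python
from html.parser import HTMLParser
from typing import Optional

# (min, max) outbound links for the buckets <1000, 1000-1999, >=2000 words.
# A max of None means "no upper limit".
DENSITY_RULES = {
    'hub': [(5, 10), (10, 15), (15, 20)],
    'blog': [(2, 5), (3, 8), (4, 12)],
    'product_service': [(2, 3), (3, 5), (3, 5)],
    'term': [(3, None), (3, None), (3, None)],
}


class _TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def word_count(content_html: str) -> int:
    parser = _TextCollector()
    parser.feed(content_html)
    parser.close()
    # Joining with spaces keeps block elements apart; a word split by an
    # inline tag (e.g. "gar<b>den</b>") would be counted twice.
    return len(' '.join(parser.chunks).split())


def density_range(page_type: str, words: int) -> tuple[int, Optional[int]]:
    bucket = 0 if words < 1000 else 1 if words < 2000 else 2
    return DENSITY_RULES[page_type][bucket]


def density_status(page_type: str, words: int, outbound: int) -> str:
    low, high = density_range(page_type, words)
    if outbound < low:
        return 'under_linked'
    if high is not None and outbound > high:
        return 'over_linked'
    return 'ok'
```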
### 2.2 Link Scoring Algorithm (5 Factors)

Each candidate link target receives a score (0-100):

| Factor | Weight | Description |
|--------|--------|-------------|
| Shared attribute values | 40% | Count of SAGAttribute values shared between source and target clusters |
| Target page authority | 25% | Inbound link count of target page (from LinkMap) |
| Keyword overlap | 20% | Common keywords between source cluster and target content |
| Content recency | 10% | Newer content gets a boost (exponential decay over 6 months) |
| Link count gap | 5% | Pages with fewest inbound links get a priority boost |

**Threshold:** Score ≥ 60 qualifies for automatic linking. Scores 40-59 are suggested for manual review.

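The table fixes the weights but not how each raw signal is normalized to a 0..1 factor. The following minimal sketch uses these assumptions:

- **Shared attributes:** 3 or more shared values earn full credit.
- **Authority:** inbound links divided by the largest inbound count among the candidates.
- **Keyword overlap:** Jaccard similarity of the two keyword sets.
- **Recency:** decays as exp(-age / 182 days), reading "over 6 months" as a time constant.
- **Link-count gap:** 1 minus normalized authority.

`CandidateSignals`, `score_candidate`, and `classify` are hypothetical names. Only the weights and thresholds come from this spec.

```python
import math
from dataclasses import dataclass

SCORE_WEIGHTS = {
    'shared_attributes': 0.40,
    'target_authority': 0.25,
    'keyword_overlap': 0.20,
    'content_recency': 0.10,
    'link_count_gap': 0.05,
}
AUTO_LINK_THRESHOLD = 60
REVIEW_THRESHOLD = 40

SHARED_ATTRIBUTE_CAP = 3          # assumption: 3+ shared values = full credit
RECENCY_TIME_CONSTANT_DAYS = 182  # assumption: ~6 months


@dataclass
class CandidateSignals:
    shared_attribute_values: int
    target_inbound_links: int
    source_keywords: set
    target_keywords: set
    target_age_days: float


def score_candidate(s: CandidateSignals, max_inbound: int) -> float:
    """Return a 0-100 score; each factor is normalized to 0..1 before weighting."""
    shared = min(s.shared_attribute_values, SHARED_ATTRIBUTE_CAP) / SHARED_ATTRIBUTE_CAP
    authority = s.target_inbound_links / max_inbound if max_inbound else 0.0
    union = s.source_keywords | s.target_keywords
    overlap = len(s.source_keywords & s.target_keywords) / len(union) if union else 0.0
    recency = math.exp(-s.target_age_days / RECENCY_TIME_CONSTANT_DAYS)
    gap = 1.0 - authority  # fewest inbound links -> largest boost
    factors = {
        'shared_attributes': shared,
        'target_authority': authority,
        'keyword_overlap': overlap,
        'content_recency': recency,
        'link_count_gap': gap,
    }
    return round(100 * sum(SCORE_WEIGHTS[k] * v for k, v in factors.items()), 2)


def classify(score: float) -> str:
    if score >= AUTO_LINK_THRESHOLD:
        return 'auto'
    if score >= REVIEW_THRESHOLD:
        return 'review'
    return 'skip'  # assumption: scores below 40 are not used
```

Because the gap factor is defined here as the complement of authority, the two factors partly offset each other. If that is not intended, the gap could be measured against a different baseline, such as the cluster median inbound count.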
### 2.3 Anchor Text Rules

| Rule | Value |
|------|-------|
| Min length | 2 words |
| Max length | 8 words |
| Grammatically natural | Must read naturally in surrounding sentence |
| No exact-match overuse | The same exact anchor cannot be used more than 3 times for the same target URL |
| Anchor type distribution per target | Primary keyword 60%, page title 30%, natural phrase 10% |
| Diversification audit | Flag if any single exact anchor text accounts for >40% of links to a target |

**Anchor Types:**

- `primary_keyword` — cluster primary keyword
- `page_title` — target content's title (or shortened version)
- `natural` — AI-selected contextually appropriate phrase
- `branded` — brand/site name (for homepage links)

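This minimal sketch of the anchor checks makes two assumptions:

- "Exact" matching ignores case and extra whitespace.
- The 60/30/10 mix is steered by choosing, for each new link, the anchor type furthest below its target share.

`branded` anchors (homepage links) are left out of the mix here. All function names are hypothetical.

```python
from collections import Counter

MIN_ANCHOR_WORDS, MAX_ANCHOR_WORDS = 2, 8
MAX_EXACT_REUSE = 3
DIVERSIFICATION_LIMIT = 0.40
TARGET_TYPE_MIX = {'primary_keyword': 0.60, 'page_title': 0.30, 'natural': 0.10}


def _norm(anchor: str) -> str:
    # Assumption: "exact" match ignores case and repeated whitespace.
    return ' '.join(anchor.lower().split())


def anchor_length_ok(anchor: str) -> bool:
    return MIN_ANCHOR_WORDS <= len(anchor.split()) <= MAX_ANCHOR_WORDS


def can_reuse(anchor: str, existing_anchors: list[str]) -> bool:
    """existing_anchors: anchor texts already pointing at the same target URL."""
    used = sum(1 for a in existing_anchors if _norm(a) == _norm(anchor))
    return used < MAX_EXACT_REUSE


def diversification_flags(existing_anchors: list[str]) -> list[str]:
    """Anchor texts that make up more than 40% of the links to one target."""
    counts = Counter(_norm(a) for a in existing_anchors)
    total = len(existing_anchors)
    return sorted(a for a, n in counts.items() if n / total > DIVERSIFICATION_LIMIT)


def next_anchor_type(existing_types: list[str]) -> str:
    """Pick the anchor type furthest below its target share after adding one link."""
    total = len(existing_types) + 1  # include the link being planned
    counts = Counter(existing_types)
    return max(TARGET_TYPE_MIX, key=lambda t: TARGET_TYPE_MIX[t] * total - counts[t])
```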
### 2.4 Two Operating Modes

#### A. New Content Mode (Pipeline Integration)

Runs after Stage 4 (content generated), before Stage 7 (publish):

1. Content generated by pipeline → link planning triggers
2. Calculate link targets using scoring algorithm
3. Store link plan in SAGLink records (status `planned`)
4. Insert planned links into `content_html` at natural positions
5. If content is a hub → auto-generate "Related Articles" section with links to all supporting articles in cluster
6. **Mandatory check:** if content is a supporting article, verify vertical_up link to hub exists; insert if missing

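Step 6 can be enforced as a pure function over the planned links before anything is written to `content_html`. This sketch assumes a plain `PlannedLink` dataclass that mirrors a subset of the SAGLink fields (hypothetical, not the Django model). It also assumes 1-based paragraph numbers for `placement_position`; the model's help text says "paragraph number" without fixing the base.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlannedLink:
    # Hypothetical in-memory mirror of a subset of SAGLink fields.
    target_id: int
    link_type: str
    anchor_text: str
    placement_zone: str = 'in_body'
    placement_position: Optional[int] = None
    is_mandatory: bool = False


def ensure_vertical_up(role: str, hub_id: int, hub_anchor: str,
                       plan: List[PlannedLink]) -> List[PlannedLink]:
    """Guarantee exactly one uplink to the hub for supporting articles."""
    if role != 'supporting':
        return plan
    uplink = next((link for link in plan
                   if link.link_type == 'vertical_up' and link.target_id == hub_id), None)
    if uplink is None:
        uplink = PlannedLink(target_id=hub_id, link_type='vertical_up', anchor_text=hub_anchor)
    uplink.is_mandatory = True
    uplink.placement_zone = 'in_body'
    # Spec: the uplink belongs in the first 2 paragraphs.
    if uplink.placement_position is None or uplink.placement_position > 2:
        uplink.placement_position = 1
    # Drop other vertical_up entries (duplicates or uplinks to a non-hub target).
    others = [link for link in plan if link.link_type != 'vertical_up']
    return [uplink] + others
```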
#### B. Existing Content Remediation (Audit + Fix)

For already-published content without proper internal linking:

1. **Crawl phase:** Scan all published content for a site, extract all `<a>` tags, build LinkMap
2. **Audit analysis:**
   - Orphan pages: 0 inbound internal links
   - Over-linked pages: outbound > density max for page type/word count
   - Under-linked pages: outbound < density min
   - Missing mandatory links: supporting articles without hub uplink
   - Broken links: target URL returns 4xx/5xx
3. **Recommendation generation:** Priority-scored fix recommendations with AI-suggested anchor text
4. **Batch application:** Insert missing links across multiple content records

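Most of the audit checks in step 2 reduce to counting edges in the LinkMap. This sketch assumes:

- Internal links arrive as `(source_id, target_id)` pairs from active internal LinkMap rows.
- Every `<a>` tag counts toward density; repeated links to one target are not collapsed.
- The density lookup is passed in as a callable returning `(min, max)` per the density table in section 2.1.

Broken links come from HTTP verification rather than from this graph, so they are not computed here. `Page` and `audit_pages` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Page:
    content_id: int
    page_type: str
    word_count: int
    role: str                     # 'hub' or 'supporting' (assumed labels)
    hub_id: Optional[int] = None  # cluster hub for supporting articles


def audit_pages(pages: Iterable[Page],
                internal_links: Iterable[Tuple[int, int]],
                density_range: Callable[[str, int], Tuple[int, Optional[int]]]) -> dict:
    """Return content IDs per issue, keyed like the SAGLinkAudit counters."""
    links = list(internal_links)
    pairs = set(links)
    inbound = Counter(target for _, target in links)
    outbound = Counter(source for source, _ in links)
    issues = {'orphan_pages': [], 'over_linked_pages': [],
              'under_linked_pages': [], 'missing_mandatory': []}
    for page in pages:
        pid = page.content_id
        if inbound[pid] == 0:
            issues['orphan_pages'].append(pid)
        low, high = density_range(page.page_type, page.word_count)
        if outbound[pid] < low:
            issues['under_linked_pages'].append(pid)
        elif high is not None and outbound[pid] > high:
            issues['over_linked_pages'].append(pid)
        if page.role == 'supporting' and (pid, page.hub_id) not in pairs:
            issues['missing_mandatory'].append(pid)
    return issues
```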
### 2.5 Cluster-Level Link Health Score

Per-cluster health score (0-100) for link coverage:

| Factor | Points |
|--------|--------|
| Hub published and linked (has outbound + inbound links) | 25 |
| All supporting articles have mandatory uplink to hub | 25 |
| At least 1 cross-cluster link from hub | 15 |
| Term pages link to hub | 15 |
| No broken links in cluster | 10 |
| Link density within range for all pages | 10 |

Site-wide link health = average of all cluster scores. Feeds into SAG health monitoring (01G).

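This sketch reads the scoring table as all-or-nothing checks per factor; the spec does not mention partial credit, so that is an assumption. It also assumes:

- "Term pages link to hub" means every term page for the cluster's attributes links to the hub, and at least one such page exists.
- A cluster with no supporting articles trivially satisfies the uplink check.

`ClusterLinkFacts` and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterLinkFacts:
    hub_published: bool
    hub_inbound: int
    hub_outbound: int
    supporting_total: int
    supporting_with_uplink: int
    hub_cross_cluster_links: int
    term_pages_total: int
    term_pages_linking_hub: int
    broken_links: int
    pages_outside_density: int


def cluster_health(f: ClusterLinkFacts) -> int:
    """0-100 link health for one cluster; each factor is all-or-nothing (assumption)."""
    score = 0
    if f.hub_published and f.hub_inbound > 0 and f.hub_outbound > 0:
        score += 25
    if f.supporting_with_uplink >= f.supporting_total:  # true when there are none
        score += 25
    if f.hub_cross_cluster_links >= 1:
        score += 15
    if f.term_pages_total > 0 and f.term_pages_linking_hub >= f.term_pages_total:
        score += 15
    if f.broken_links == 0:
        score += 10
    if f.pages_outside_density == 0:
        score += 10
    return score


def site_health(cluster_scores: List[int]) -> float:
    """Average of cluster scores; 0.0 when the site has no clusters."""
    return sum(cluster_scores) / len(cluster_scores) if cluster_scores else 0.0
```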
---

## 3. DATA MODELS & APIS

### 3.1 New Models

#### SAGLink (new `linker` app)

```python
from django.db import models

# SiteSectorBaseModel: existing IGNY8 base model, imported from its current module.


class SAGLink(SiteSectorBaseModel):
    """
    Represents a planned or inserted internal link between two content pages.
    Tracks link type, anchor text, score, and status through lifecycle.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='sag_links'
    )
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='outbound_sag_links'
    )
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='inbound_sag_links'
    )
    link_type = models.CharField(
        max_length=20,
        choices=[
            ('vertical_up', 'Vertical Upward'),
            ('vertical_down', 'Vertical Downward'),
            ('horizontal', 'Horizontal Sibling'),
            ('cross_cluster', 'Cross-Cluster'),
            ('taxonomy', 'Taxonomy Contextual'),
            ('breadcrumb', 'Breadcrumb'),
            ('related', 'Related Content'),
        ]
    )
    anchor_text = models.CharField(max_length=200)
    anchor_type = models.CharField(
        max_length=20,
        choices=[
            ('primary_keyword', 'Primary Keyword'),
            ('page_title', 'Page Title'),
            ('natural', 'Natural Phrase'),
            ('branded', 'Branded'),
        ]
    )
    placement_zone = models.CharField(
        max_length=20,
        choices=[
            ('in_body', 'In Body'),
            ('related_section', 'Related Section'),
            ('breadcrumb', 'Breadcrumb'),
            ('sidebar', 'Sidebar'),
        ]
    )
    placement_position = models.IntegerField(
        null=True,
        blank=True,
        help_text='Paragraph number for in_body placement'
    )
    score = models.FloatField(
        default=0,
        help_text='Link scoring algorithm result (0-100)'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('planned', 'Planned'),
            ('inserted', 'Inserted'),
            ('verified', 'Verified'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='planned'
    )
    is_mandatory = models.BooleanField(
        default=False,
        help_text='True for vertical_up links (supporting → hub)'
    )
    inserted_at = models.DateTimeField(null=True, blank=True)

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_links'
        # source_content and target_content are indexed automatically as ForeignKeys.
        indexes = [
            models.Index(fields=['link_type']),
            models.Index(fields=['status']),
        ]
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

#### SAGLinkAudit (linker app)

```python
class SAGLinkAudit(SiteSectorBaseModel):
    """
    Stores results of a site-wide or cluster-level link audit.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='link_audits'
    )
    audit_date = models.DateTimeField(auto_now_add=True)
    total_links = models.IntegerField(default=0)
    missing_mandatory = models.IntegerField(default=0)
    orphan_pages = models.IntegerField(default=0)
    broken_links = models.IntegerField(default=0)
    over_linked_pages = models.IntegerField(default=0)
    under_linked_pages = models.IntegerField(default=0)
    cluster_scores = models.JSONField(
        default=dict,
        help_text='{cluster_id: {score, missing, issues[]}}'
    )
    recommendations = models.JSONField(
        default=list,
        help_text='[{content_id, action, link_type, target_id, anchor_suggestion, priority}]'
    )
    overall_health_score = models.FloatField(
        default=0,
        help_text='Average of cluster scores (0-100)'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_link_audits'
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

#### LinkMap (linker app)

```python
class LinkMap(SiteSectorBaseModel):
    """
    Full link map of all internal (and external) links found in published content.
    Built by crawling content_html of all published content records.
    """
    source_url = models.URLField()
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='outbound_link_map'
    )
    target_url = models.URLField()
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='inbound_link_map'
    )
    anchor_text = models.CharField(max_length=500)
    is_internal = models.BooleanField(default=True)
    is_follow = models.BooleanField(default=True)
    position = models.CharField(
        max_length=20,
        choices=[
            ('in_content', 'In Content'),
            ('navigation', 'Navigation'),
            ('footer', 'Footer'),
            ('sidebar', 'Sidebar'),
        ],
        default='in_content'
    )
    last_verified = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('active', 'Active'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='active'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_link_map'
        # Separate indexes for lookups by source URL or by target URL.
        indexes = [
            models.Index(fields=['source_url']),
            models.Index(fields=['target_url']),
        ]
```

**PK:** BigAutoField (integer) — inherits from SiteSectorBaseModel

### 3.2 Modified Models

#### Content (writer app) — add 4 fields

```python
# Add to Content model:
link_plan = models.JSONField(
    null=True,
    blank=True,
    help_text='Planned links before insertion: [{target_id, link_type, anchor, score}]'
)
links_inserted = models.BooleanField(
    default=False,
    help_text='Whether link plan has been applied to content_html'
)
inbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of inbound internal links'
)
outbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of outbound internal links'
)
```

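### 3.3 New App Registration

Create linker app:

- **App config:** `igny8_core/modules/linker/apps.py` defining an `AppConfig` subclass with `name = 'igny8_core.modules.linker'` and `label = 'linker'` (Django's `AppConfig` uses `label`; `app_label` is the model `Meta` option)
- **Add to INSTALLED_APPS** in `igny8_core/settings.py`
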
### 3.4 Migration

```
igny8_core/migrations/XXXX_add_linker_models.py
```

**Operations:**

1. `CreateModel('SAGLink', ...)` — with indexes on source_content, target_content, link_type, status
2. `CreateModel('SAGLinkAudit', ...)`
3. `CreateModel('LinkMap', ...)` — with index on source_url, target_url
4. `AddField('Content', 'link_plan', JSONField(null=True, blank=True))`
5. `AddField('Content', 'links_inserted', BooleanField(default=False))`
6. `AddField('Content', 'inbound_link_count', IntegerField(default=0))`
7. `AddField('Content', 'outbound_link_count', IntegerField(default=0))`

### 3.5 API Endpoints

All endpoints under `/api/v1/linker/`:

#### Link Management

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/links/?site_id=X` | List all SAGLink records with filters (link_type, status, cluster_id, source_content_id) |
| POST | `/api/v1/linker/links/plan/` | Generate link plan for a content piece. Body: `{content_id}`. Returns planned SAGLink records. |
| POST | `/api/v1/linker/links/insert/` | Insert planned links into content_html. Body: `{content_id}`. Modifies Content.content_html. |
| POST | `/api/v1/linker/links/batch-insert/` | Batch insert for multiple content. Body: `{content_ids: [int]}`. Queues Celery task. |
| GET | `/api/v1/linker/content/{id}/links/` | All inbound + outbound links for a specific content piece. |

#### Link Audit

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/audit/?site_id=X` | Latest SAGLinkAudit results. |
| POST | `/api/v1/linker/audit/run/` | Trigger site-wide link audit. Body: `{site_id}`. Queues Celery task. Returns task ID. |
| GET | `/api/v1/linker/audit/recommendations/?site_id=X` | Get fix recommendations from latest audit. |
| POST | `/api/v1/linker/audit/apply/` | Apply recommended fixes in batch. Body: `{site_id, recommendation_ids: [int]}`. |

#### Link Map & Health

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/linker/link-map/?site_id=X` | Full LinkMap for site with pagination. |
| GET | `/api/v1/linker/orphans/?site_id=X` | List orphan pages (0 inbound internal links). |
| GET | `/api/v1/linker/health/?site_id=X` | Cluster-level link health scores. |

**Permissions:** All endpoints use `SiteSectorModelViewSet` permission patterns.

### 3.6 Link Planning Service

**Location:** `igny8_core/business/link_planning.py`

```python
class LinkPlanningService:
    """
    Generates internal link plans for content based on SAG hierarchy
    and scoring algorithm.
    """

    SCORE_WEIGHTS = {
        'shared_attributes': 0.40,
        'target_authority': 0.25,
        'keyword_overlap': 0.20,
        'content_recency': 0.10,
        'link_count_gap': 0.05,
    }

    AUTO_LINK_THRESHOLD = 60
    REVIEW_THRESHOLD = 40

    def plan(self, content_id):
        """
        Generate link plan for a content piece.
        1. Identify content's cluster and role (hub vs supporting)
        2. Determine mandatory links (vertical_up for supporting)
        3. Score all candidate targets
        4. Select targets within density limits
        5. Generate anchor text per link
        6. Create SAGLink records with status='planned'
        Returns list of planned SAGLink records.
        """
        pass

    def _get_mandatory_links(self, content, cluster):
        """Vertical upward: supporting → hub. Always added."""
        pass

    def _get_candidates(self, content, cluster, blueprint):
        """Gather all potential link targets from cluster and related clusters."""
        pass

    def _score_candidate(self, source_content, target_content, source_cluster,
                         target_cluster, blueprint):
        """Calculate 0-100 score using 5-factor algorithm."""
        pass

    def _select_within_density(self, content, scored_candidates):
        """Filter candidates to stay within density limits for page type and word count."""
        pass

    def _generate_anchor_text(self, source_content, target_content, link_type):
        """AI-generate contextually appropriate anchor text."""
        pass
```

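`_select_within_density` is only described by its docstring. In one possible sketch, mandatory links come first, followed by auto-eligible candidates (score ≥ 60) in descending score order. Selection honors the per-type limits from section 2.1 and the page's density maximum. The sketch also assumes:

- At most one planned link per target page.
- Breadcrumb links and the hub's "Related Articles" list are produced separately, outside the density budget.

`Candidate`, `TYPE_CAPS`, and `select_within_density` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

AUTO_LINK_THRESHOLD = 60
# Per-type limits from section 2.1; link types not listed are uncapped.
TYPE_CAPS = {'vertical_up': 1, 'horizontal': 2, 'cross_cluster': 2, 'related': 3}


@dataclass
class Candidate:
    target_id: int
    link_type: str
    score: float
    is_mandatory: bool = False


def select_within_density(candidates: List[Candidate],
                          max_outbound: Optional[int]) -> List[Candidate]:
    chosen: List[Candidate] = []
    per_type: Dict[str, int] = {}
    seen_targets = set()
    # Mandatory links sort first, then candidates by descending score.
    for cand in sorted(candidates, key=lambda c: (not c.is_mandatory, -c.score)):
        if max_outbound is not None and len(chosen) >= max_outbound:
            break
        if cand.target_id in seen_targets:
            continue  # one link per target page (assumption)
        if not cand.is_mandatory and cand.score < AUTO_LINK_THRESHOLD:
            continue  # below 60: not auto-inserted (40-59 go to manual review)
        cap = TYPE_CAPS.get(cand.link_type)
        if cap is not None and per_type.get(cand.link_type, 0) >= cap:
            continue
        chosen.append(cand)
        per_type[cand.link_type] = per_type.get(cand.link_type, 0) + 1
        seen_targets.add(cand.target_id)
    return chosen
```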
### 3.7 Link Insertion Service

**Location:** `igny8_core/business/link_insertion.py`

```python
class LinkInsertionService:
    """
    Inserts planned links into content_html.
    Handles placement, anchor text insertion, and collision avoidance.
    """

    def insert(self, content_id):
        """
        Insert all planned SAGLink records into Content.content_html.
        1. Load all SAGLinks where source_content=content_id, status='planned'
        2. Parse content_html
        3. For each link, find insertion point based on placement_zone + position
        4. Insert <a> tag with anchor text
        5. Update SAGLink status='inserted', set inserted_at
        6. Update Content.content_html, links_inserted=True, outbound_link_count
        7. Update target Content.inbound_link_count
        """
        pass

    def _find_insertion_point(self, html_tree, link):
        """
        Find best insertion point in parsed HTML:
        - in_body: find paragraph at placement_position, find natural spot for anchor
        - related_section: append to "Related Articles" section (create if missing)
        - breadcrumb: insert breadcrumb trail at top
        """
        pass

    def _insert_link(self, html_tree, position, anchor_text, target_url):
        """Insert <a href> tag at position without breaking existing HTML."""
        pass
```

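Key Decision 5 calls for an HTML parser (BeautifulSoup or lxml). To keep the example dependency-free, this sketch swaps in the standard library's `html.parser` instead. It shows the core of `in_body` placement:

- The HTML is re-emitted unchanged.
- The first occurrence of the anchor text inside the Nth paragraph is wrapped in an `<a>` tag.
- Text already inside an `<a>` is skipped.

It assumes 1-based paragraph numbers and anchor text that does not span inline tags or entities. `insert_link` is a hypothetical name.

```python
from html import escape
from html.parser import HTMLParser


class _ParagraphLinker(HTMLParser):
    """Re-emit HTML, wrapping the first match of anchor_text in paragraph N (1-based)."""

    def __init__(self, paragraph_number, anchor_text, target_url):
        # convert_charrefs=False keeps entities such as &amp; exactly as written.
        super().__init__(convert_charrefs=False)
        self.paragraph_number = paragraph_number
        self.anchor_text = anchor_text
        self.target_url = target_url
        self.out = []
        self.paragraphs_seen = 0
        self.in_target = False
        self.link_depth = 0
        self.inserted = False

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.paragraphs_seen += 1
            self.in_target = self.paragraphs_seen == self.paragraph_number
        elif tag == 'a':
            self.link_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'p':
            self.in_target = False
        elif tag == 'a' and self.link_depth:
            self.link_depth -= 1
        self.out.append(f'</{tag}>')

    def handle_data(self, data):
        if (self.in_target and not self.link_depth and not self.inserted
                and self.anchor_text in data):
            before, _, after = data.partition(self.anchor_text)
            link = f'<a href="{escape(self.target_url)}">{self.anchor_text}</a>'
            data = before + link + after
            self.inserted = True
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.out.append(f'<!{decl}>')


def insert_link(content_html, paragraph_number, anchor_text, target_url):
    """Return (new_html, inserted); inserted is False when no safe match exists."""
    parser = _ParagraphLinker(paragraph_number, anchor_text, target_url)
    parser.feed(content_html)
    parser.close()
    return ''.join(parser.out), parser.inserted
```

Returning `inserted` lets the caller react when no natural match exists. It can leave the SAGLink in `planned` status or try another paragraph, rather than forcing a link into the text.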
### 3.8 Link Audit Service

**Location:** `igny8_core/business/link_audit.py`

```python
class LinkAuditService:
    """
    Runs site-wide link audits: builds link map, identifies issues,
    generates recommendations.
    """

    def run_audit(self, site_id):
        """
        Full audit:
        1. Crawl all published Content for site
        2. Extract all <a> tags, build/update LinkMap records
        3. Identify orphan pages, over/under-linked, missing mandatory, broken
        4. Calculate per-cluster health scores
        5. Generate prioritized recommendations
        6. Create SAGLinkAudit record
        Returns SAGLinkAudit instance.
        """
        pass

    def _build_link_map(self, site_id):
        """Extract links from all published content_html, create LinkMap records."""
        pass

    def _find_orphans(self, site_id):
        """Content with 0 inbound internal links."""
        pass

    def _check_density(self, site_id):
        """Compare outbound counts against density rules per page type."""
        pass

    def _check_mandatory(self, site_id):
        """Verify all supporting articles have vertical_up link to their hub."""
        pass

    def _calculate_cluster_health(self, site_id, cluster):
        """Calculate 0-100 health score per cluster."""
        pass

    def _generate_recommendations(self, issues):
        """Priority-scored recommendations with AI-suggested anchor text."""
        pass
```

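`_build_link_map` needs every `<a href>` in `content_html`, resolved to an absolute URL and classified for the LinkMap `is_internal` and `is_follow` fields. This standard-library sketch assumes:

- A link is internal when its host exactly matches the page's host, so `www.` variants would need normalizing.
- Same-page fragment links and non-HTTP schemes such as `mailto:` are not recorded.

`extract_links` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkExtractor(HTMLParser):
    """Collect href, rel, and visible text for each <a> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: anchor text arrives unescaped
        self.links = []
        self._open = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            attrs = dict(attrs)
            self._open = {'href': attrs.get('href'), 'rel': attrs.get('rel') or '', 'text': []}

    def handle_data(self, data):
        if self._open is not None:
            self._open['text'].append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._open is not None:
            self.links.append(self._open)
            self._open = None


def extract_links(content_html, page_url):
    """Return LinkMap-style dicts for each http(s) link in content_html."""
    parser = _LinkExtractor()
    parser.feed(content_html)
    parser.close()
    site_host = urlparse(page_url).netloc
    rows = []
    for link in parser.links:
        if not link['href']:
            continue  # <a> without href is not a link
        target, _fragment = urldefrag(urljoin(page_url, link['href']))
        parsed = urlparse(target)
        if parsed.scheme not in ('http', 'https') or target == page_url:
            continue  # skip mailto:/tel:/javascript: and same-page #fragment links
        rows.append({
            'source_url': page_url,
            'target_url': target,
            'anchor_text': ' '.join(''.join(link['text']).split()),
            'is_internal': parsed.netloc == site_host,
            'is_follow': 'nofollow' not in link['rel'].lower().split(),
            'position': 'in_content',
        })
    return rows
```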
### 3.9 Celery Tasks

**Location:** `igny8_core/tasks/linker_tasks.py`

```python
from celery import shared_task


@shared_task(name='generate_link_plan')
def generate_link_plan(content_id):
    """Runs after content generation, before publish. Creates SAGLink records."""
    pass


@shared_task(name='run_link_audit')
def run_link_audit(site_id):
    """Scheduled weekly or triggered manually. Full site-wide audit."""
    pass


@shared_task(name='verify_links')
def verify_links(site_id):
    """Check for broken links via HTTP status checks on LinkMap URLs."""
    pass


@shared_task(name='rebuild_link_map')
def rebuild_link_map(site_id):
    """Full crawl of published content to rebuild LinkMap from scratch."""
    pass
```

**Beat Schedule Additions:**

| Task | Schedule | Notes |
|------|----------|-------|
| `run_link_audit` | Weekly (Sunday 1:00 AM) | Site-wide audit for all active sites (the task takes one `site_id`, so the scheduled job must fan out one call per site) |
| `verify_links` | Weekly (Wednesday 2:00 AM) | HTTP check all active LinkMap entries |

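The `verify_links` task marks LinkMap rows as broken when the target returns 4xx/5xx. Here is a sketch of the per-URL check using `urllib.request`. It sends a HEAD request and retries with GET when a server rejects HEAD (405/501). It assumes network failures leave the row's status unchanged rather than marking it broken. The `opener` parameter exists only so the check can be tested without network access. `check_link` is a hypothetical name.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def _status(url: str, method: str, timeout: float, opener: Callable) -> int:
    request = urllib.request.Request(url, method=method)
    try:
        with opener(request, timeout=timeout) as response:
            return response.status  # urlopen follows redirects itself
    except urllib.error.HTTPError as exc:
        return exc.code  # urlopen raises HTTPError for 4xx/5xx responses


def check_link(url: str, timeout: float = 10.0,
               opener: Callable = urllib.request.urlopen) -> Optional[str]:
    """Return 'active', 'broken' (4xx/5xx), or None when the check itself failed."""
    try:
        status = _status(url, 'HEAD', timeout, opener)
        if status in (405, 501):  # some servers reject HEAD; retry with GET
            status = _status(url, 'GET', timeout, opener)
    except (urllib.error.URLError, TimeoutError):
        return None  # DNS/connection failure: leave LinkMap.status unchanged (assumption)
    return 'broken' if status >= 400 else 'active'
```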
---

## 4. IMPLEMENTATION STEPS

### Step 1: Create Linker App

1. Create `igny8_core/modules/linker/` directory with `__init__.py` and `apps.py`
2. Add the linker app (`igny8_core.modules.linker`) to `INSTALLED_APPS` in settings.py
3. Create models: SAGLink, SAGLinkAudit, LinkMap

### Step 2: Migration

1. Create migration for 3 new models
2. Add 4 new fields to Content model (link_plan, links_inserted, inbound_link_count, outbound_link_count)
3. Run migration

### Step 3: Services

1. Implement `LinkPlanningService` in `igny8_core/business/link_planning.py`
2. Implement `LinkInsertionService` in `igny8_core/business/link_insertion.py`
3. Implement `LinkAuditService` in `igny8_core/business/link_audit.py`

### Step 4: Pipeline Integration

Insert link planning + insertion between Stage 4 and Stage 7:

```python
from igny8_core.business.link_insertion import LinkInsertionService
from igny8_core.business.link_planning import LinkPlanningService


# After content generation completes in pipeline:
def post_content_generation(content_id):
    # 02G: Generate schema + SERP elements
    # ...

    # 02D: Plan and insert internal links
    link_service = LinkPlanningService()
    link_service.plan(content_id)
    insertion_service = LinkInsertionService()
    insertion_service.insert(content_id)
```

### Step 5: API Endpoints

1. Create `igny8_core/urls/linker.py` with link, audit, and health endpoints
2. Create views extending `SiteSectorModelViewSet`
3. Register URL patterns under `/api/v1/linker/`

### Step 6: Celery Tasks

1. Implement all 4 tasks in `igny8_core/tasks/linker_tasks.py`
2. Add `run_link_audit` and `verify_links` to Celery beat schedule

### Step 7: Serializers & Admin

1. Create DRF serializers for SAGLink, SAGLinkAudit, LinkMap
2. Register models in Django admin

### Step 8: Credit Cost Configuration

Add to `CreditCostConfig` (billing app):

| operation_type | default_cost | description |
|---------------|-------------|-------------|
| `link_audit` | 1 | Site-wide link audit |
| `link_generation` | 0.5 | Generate 1-5 links with AI anchor text |
| `link_audit_full` | 3-5 | Full site audit with recommendations |

---

## 5. ACCEPTANCE CRITERIA

### Link Types

- [ ] Vertical upward link (supporting → hub) automatically inserted for all supporting articles
- [ ] Vertical downward links (hub → supporting) generated with "Related Articles" section
- [ ] Horizontal sibling links (max 2) between same-cluster supporting articles
- [ ] Cross-cluster links (max 2) between hubs sharing SAGAttribute values
- [ ] Taxonomy contextual links from term pages to all relevant cluster hubs
- [ ] Breadcrumb chain generated from SAG hierarchy for all content
- [ ] Related content section (2-3 links) generated at end of article

### Link Scoring

- [ ] 5-factor scoring algorithm produces 0-100 scores
- [ ] Links with score ≥ 60 auto-inserted
- [ ] Links with score 40-59 suggested for manual review
- [ ] Score algorithm uses: shared attributes (40%), authority (25%), keyword overlap (20%), recency (10%), gap boost (5%)

### Anchor Text

- [ ] Anchor text 2-8 words, grammatically natural
- [ ] Same exact anchor not used >3 times to same target
- [ ] Distribution per target: 60% primary keyword, 30% page title, 10% natural
- [ ] Diversification audit flags if any anchor >40% of links to a target

### Link Density

- [ ] Hub pages: 5-20 outbound links based on word count
- [ ] Blog pages: 2-12 outbound links based on word count
- [ ] Product/Service pages: 2-5 outbound links
- [ ] Term pages: 3+ outbound, unlimited for taxonomy contextual

### Audit & Remediation

- [ ] Link audit identifies orphan pages, over/under-linked, missing mandatory, broken links
- [ ] Cluster-level health score (0-100) calculated per cluster
- [ ] Recommendations generated with priority scores and AI-suggested anchors
- [ ] Batch application of recommendations modifies content_html correctly

### Pipeline Integration

- [ ] Link plan generated automatically after content generation in pipeline
- [ ] Links inserted before publish stage
- [ ] Mandatory vertical_up link verified before allowing publish
- [ ] Content.inbound_link_count and outbound_link_count updated on insert

---

## 6. CLAUDE CODE INSTRUCTIONS

### File Locations

```
igny8_core/
├── modules/
│   └── linker/
│       ├── __init__.py
│       ├── apps.py          # AppConfig with label = 'linker'
│       └── models.py        # SAGLink, SAGLinkAudit, LinkMap
├── business/
│   ├── link_planning.py     # LinkPlanningService
│   ├── link_insertion.py    # LinkInsertionService
│   └── link_audit.py        # LinkAuditService
├── tasks/
│   └── linker_tasks.py      # Celery tasks
├── urls/
│   └── linker.py            # Linker endpoints
└── migrations/
    └── XXXX_add_linker_models.py
```

### Conventions

- **PKs:** BigAutoField (integer) — do NOT use UUIDs
- **Table prefix:** `igny8_` on all new tables
- **App label:** `linker` (new app)
- **Celery app name:** `igny8_core`
- **URL pattern:** `/api/v1/linker/...`
- **Permissions:** Use `SiteSectorModelViewSet` permission pattern
- **Model inheritance:** SAGLink, SAGLinkAudit, and LinkMap all extend `SiteSectorBaseModel`
- **Frontend:** `.tsx` files with Zustand stores for state management

### Cross-References

| Doc | Relationship |
|-----|-------------|
| **01A** | SAGBlueprint/SAGCluster/SAGAttribute provide hierarchy and cross-cluster relationships |
| **01E** | Pipeline integration — link planning hooks after Stage 4, before Stage 7 |
| **01G** | SAG health monitoring incorporates cluster link health scores |
| **02B** | ContentTaxonomy cluster mapping enables taxonomy contextual links |
| **02E** | External backlinks complement internal links; authority distributed by internal links |
| **02F** | Optimizer identifies internal link opportunities and feeds to linker |
| **03A** | WP plugin standalone mode has its own internal linking module — separate from this |
| **03C** | Theme renders breadcrumbs and related content sections generated by linker |

### Key Decisions

1. **New `linker` app** — Separate app because linking is a distinct domain with its own models, not tightly coupled to writer or planner
2. **SAGLink stores planned AND inserted** — Single model tracks the full lifecycle from planning through insertion to verification
3. **LinkMap is separate from SAGLink** — LinkMap stores the actual crawled link state (including non-SAG links); SAGLink stores the planned/managed links
4. **Cached counts on Content** — `inbound_link_count` and `outbound_link_count` are denormalized for fast queries; updated on insert/removal
5. **HTML parsing for insertion** — Use Python HTML parser (BeautifulSoup or lxml) for safe link insertion without corrupting content_html