IGNY8 Phase 2: Internal Linker (02D)

SAG-Based Internal Linking Engine

Document Version: 1.0
Date: 2026-03-23
Phase: IGNY8 Phase 2 - Feature Expansion
Status: Build Ready
Source of Truth: Codebase at /data/app/igny8/
Audience: Claude Code, Backend Developers, Architects


1. CURRENT STATE

Internal Linking Today

There is no internal linking system in IGNY8. Content is generated and published without any cross-linking strategy. Links within content are only those the AI incidentally includes during generation.

What Exists

  • Content model (app_label=writer, db_table=igny8_content) — stores content_html where links would be inserted
  • SAGCluster and SAGBlueprint models (from 01A) — provide the cluster hierarchy for link topology
  • The 7-stage automation pipeline (01E) generates and publishes content but has no linking stage between generation and publish
  • SiteIntegration model (app_label=integration) tracks WordPress connections

What Does Not Exist

  • No SAGLink model, no LinkMap model, no SAGLinkAudit model
  • No link scoring algorithm
  • No anchor text management
  • No link density enforcement
  • No link insertion into content_html
  • No orphan page detection
  • No link health monitoring
  • No link audit system

Foundation Available

  • SAGBlueprint (01A) — defines the SAG hierarchy (site → sectors → clusters → content)
  • SAGCluster (01A) — cluster_type, hub_page_type, hub_page_structure
  • SAGAttribute (01A) — attribute values shared across clusters (basis for cross-cluster linking)
  • 01E pipeline — post-generation hook point available between Stage 4 (Content) and Stage 7 (Publish)
  • Content.content_type and Content.content_structure — determines link density rules
  • 02B ContentTaxonomy with cluster mapping — taxonomy-to-cluster relationships for taxonomy contextual links

2. WHAT TO BUILD

Overview

Build a SAG-aware internal linking engine that automatically plans, scores, and inserts internal links into content. The system operates in two modes: new content mode (pipeline integration) and existing content remediation (audit + fix).

| # | Link Type | Direction | Description | Limit | Placement |
|---|-----------|-----------|-------------|-------|-----------|
| 1 | Vertical Upward | Supporting → Hub | MANDATORY: every supporting article links to its cluster hub | 1 per article | First 2 paragraphs |
| 2 | Vertical Downward | Hub → Supporting | Hub lists ALL its supporting articles | No cap | "Related Articles" section + contextual body links |
| 3 | Horizontal Sibling | Supporting ↔ Supporting | Same-cluster articles linking to each other | Max 2 per article | Natural content overlap points |
| 4 | Cross-Cluster | Hub ↔ Hub | Hubs sharing a SAGAttribute value can cross-link | Max 2 per hub | Contextual body links |
| 5 | Taxonomy Contextual | Term Page → Hubs | Term pages link to ALL cluster hubs using that attribute | No cap | Auto-generated from 02B taxonomy-cluster mapping |
| 6 | Breadcrumb | Hierarchical | Home → Sector → [Attribute] → Hub → Current Page | 1 chain per page | Top of page (auto-generated from SAG hierarchy) |
| 7 | Related Content | Cross-cluster allowed | 2-3 links in "Related Reading" section at end of article | 2-3 per article | End of article section |
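
The per-type limits in the table can be enforced with a small filter. The sketch below is illustrative only: the candidate dict keys (`target_id`, `link_type`, `score`) and the function name are assumptions, and the 2-3 range for Related Content is modelled as a cap of 3.

```python
from collections import defaultdict

# Caps taken from the Limit column; None means "no cap".
# Assumption: the 2-3 range for related content is treated as a cap of 3.
LINK_TYPE_CAPS = {
    "vertical_up": 1,
    "vertical_down": None,
    "horizontal": 2,
    "cross_cluster": 2,
    "taxonomy": None,
    "breadcrumb": 1,
    "related": 3,
}


def apply_type_caps(candidates):
    """Keep the highest-scoring candidates of each link type, up to its cap.

    `candidates` is a list of dicts with hypothetical keys
    `target_id`, `link_type`, and `score`.
    """
    kept = []
    used = defaultdict(int)
    # Visit candidates from best to worst so the cap drops the weakest ones.
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        cap = LINK_TYPE_CAPS[cand["link_type"]]
        if cap is None or used[cand["link_type"]] < cap:
            kept.append(cand)
            used[cand["link_type"]] += 1
    return kept
```

Given three horizontal candidates, only the two with the highest scores survive; uncapped types pass through unchanged.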

Link Density Rules (outbound per page type, by word count):

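| Page Type | <1000 words | 1000-2000 words | 2000+ words |
|-----------|-------------|-----------------|-------------|
| Hub (cluster_hub) | 5-10 | 10-15 | 15-20 |
| Blog (article/guide/etc.) | 2-5 | 3-8 | 4-12 |
| Product/Service | 2-3 | 3-5 | 3-5 |
| Term Page (taxonomy) | 3+ | 3+ | unlimited |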
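
A minimal sketch of how the density table could be looked up in code. The page-type keys and function names are hypothetical. Two assumptions are made where the table is ambiguous: a page of exactly 2000 words falls in the top band, and both "3+" and "unlimited" are modelled as a minimum of 3 with no maximum.

```python
# (min, max) outbound links per word-count band: <1000, 1000-1999, 2000+.
# max=None means no upper bound.
DENSITY_RULES = {
    "hub": [(5, 10), (10, 15), (15, 20)],
    "blog": [(2, 5), (3, 8), (4, 12)],
    "product_service": [(2, 3), (3, 5), (3, 5)],
    "term": [(3, None), (3, None), (3, None)],
}


def density_range(page_type, word_count):
    """Return the (min, max) outbound link range for a page."""
    if word_count < 1000:
        band = 0
    elif word_count < 2000:
        band = 1
    else:
        band = 2  # assumption: exactly 2000 words counts as "2000+"
    return DENSITY_RULES[page_type][band]


def density_status(page_type, word_count, outbound_links):
    """Classify a page as under_linked, over_linked, or ok."""
    low, high = density_range(page_type, word_count)
    if outbound_links < low:
        return "under_linked"
    if high is not None and outbound_links > high:
        return "over_linked"
    return "ok"
```

The same lookup can serve both the planner (to stop adding links at the maximum) and the audit (to flag pages outside the range).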

Each candidate link target receives a score (0-100):

| Factor | Weight | Description |
|--------|--------|-------------|
| Shared attribute values | 40% | Count of SAGAttribute values shared between source and target clusters |
| Target page authority | 25% | Inbound link count of target page (from LinkMap) |
| Keyword overlap | 20% | Common keywords between source cluster and target content |
| Content recency | 10% | Newer content gets a boost (exponential decay over 6 months) |
| Link count gap | 5% | Pages with fewest inbound links get a priority boost |

Threshold: Score ≥ 60 qualifies for automatic linking. Scores 40-59 are suggested for manual review.
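
The weighting and thresholds can be sketched as a weighted sum. This is an illustration under stated assumptions, not the definitive algorithm: each factor is assumed to be pre-normalized to the range 0.0-1.0, and the recency decay uses a 90-day half-life, which is one possible reading of "exponential decay over 6 months".

```python
SCORE_WEIGHTS = {
    "shared_attributes": 0.40,
    "target_authority": 0.25,
    "keyword_overlap": 0.20,
    "content_recency": 0.10,
    "link_count_gap": 0.05,
}
AUTO_LINK_THRESHOLD = 60
REVIEW_THRESHOLD = 40


def recency_factor(age_days, half_life_days=90.0):
    """Exponential decay: 1.0 for new content, 0.5 after one half-life.

    The 90-day half-life is an assumption; the spec only says
    "exponential decay over 6 months".
    """
    return 0.5 ** (max(age_days, 0) / half_life_days)


def score_candidate(factors):
    """Weighted sum of normalized factors (each 0.0-1.0), scaled to 0-100."""
    total = 0.0
    for name, weight in SCORE_WEIGHTS.items():
        value = min(max(factors.get(name, 0.0), 0.0), 1.0)  # clamp to 0-1
        total += weight * value
    return round(total * 100, 2)


def classify(score):
    """Map a score to auto-link, manual review, or skip."""
    if score >= AUTO_LINK_THRESHOLD:
        return "auto"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "skip"
```

How the raw counts (shared attributes, inbound links, common keywords) are normalized to 0-1 is left open here, since the spec does not define it.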

2.3 Anchor Text Rules

| Rule | Value |
|------|-------|
| Min length | 2 words |
| Max length | 8 words |
| Grammatically natural | Must read naturally in surrounding sentence |
| No exact-match overuse | Same exact anchor cannot be used >3 times to same target URL |
| Anchor distribution per target | Primary keyword 60%, page title 30%, natural phrase 10% |
| Diversification audit | Flag if any single anchor accounts for >40% of links to a target |

Anchor Types:

  • primary_keyword — cluster primary keyword
  • page_title — target content's title (or shortened version)
  • natural — AI-selected contextually appropriate phrase
  • branded — brand/site name (for homepage links)
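
The mechanical anchor rules (length, exact-match reuse, diversification) lend themselves to simple checks. The helper names below are hypothetical, and case-insensitive comparison of anchors is an assumption; the "grammatically natural" rule is not covered because it needs AI or human judgment.

```python
from collections import Counter

MIN_WORDS, MAX_WORDS = 2, 8
MAX_EXACT_REPEATS = 3          # same anchor may point at a target at most 3 times
DIVERSIFICATION_LIMIT = 0.40   # flag anchors above 40% of links to a target


def anchor_length_ok(anchor):
    """True when the anchor has between 2 and 8 words."""
    return MIN_WORDS <= len(anchor.split()) <= MAX_WORDS


def can_reuse_anchor(anchor, existing_anchors):
    """True when adding `anchor` keeps its use for this target at 3 or fewer.

    `existing_anchors` lists the anchors already pointing at the same
    target URL. Comparison ignores case and surrounding whitespace
    (an assumption).
    """
    normalized = anchor.strip().lower()
    count = sum(1 for a in existing_anchors if a.strip().lower() == normalized)
    return count < MAX_EXACT_REPEATS


def overused_anchors(existing_anchors):
    """Anchors that account for more than 40% of links to one target."""
    if not existing_anchors:
        return []
    counts = Counter(a.strip().lower() for a in existing_anchors)
    total = len(existing_anchors)
    return sorted(a for a, n in counts.items() if n / total > DIVERSIFICATION_LIMIT)
```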

2.4 Two Operating Modes

A. New Content Mode (Pipeline Integration)

Runs after Stage 4 (content generated), before Stage 7 (publish):

  1. Content generated by pipeline → link planning triggers
  2. Calculate link targets using scoring algorithm
  3. Insert links into content_html at natural positions
  4. Store link plan in SAGLink records
  5. If content is a hub → auto-generate "Related Articles" section with links to all supporting articles in cluster
  6. Mandatory check: if content is a supporting article, verify vertical_up link to hub exists; insert if missing
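
Step 6 can be sketched as a guard that runs over the link plan before insertion. `PlannedLink` is a hypothetical in-memory stand-in for a SAGLink record, and the function signature is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlannedLink:
    """Hypothetical stand-in for a SAGLink row with status='planned'."""
    source_id: int
    target_id: int
    link_type: str
    is_mandatory: bool = False


def ensure_mandatory_uplink(content_id, hub_id, is_hub, plan):
    """Return the plan with a vertical_up link to the hub, adding one if missing.

    Hubs are returned unchanged: the mandatory uplink applies only to
    supporting articles.
    """
    if is_hub:
        return list(plan)
    has_uplink = any(
        link.link_type == "vertical_up" and link.target_id == hub_id
        for link in plan
    )
    if has_uplink:
        return list(plan)
    # Put the mandatory link first so it is placed before optional links.
    uplink = PlannedLink(content_id, hub_id, "vertical_up", is_mandatory=True)
    return [uplink] + list(plan)
```

Running the guard twice is harmless: once the uplink is present, the plan is returned as is.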

B. Existing Content Remediation (Audit + Fix)

For already-published content without proper internal linking:

  1. Crawl phase: Scan all published content for a site, extract all <a> tags, build LinkMap
  2. Audit analysis:
    • Orphan pages: 0 inbound internal links
    • Over-linked pages: outbound > density max for page type/word count
    • Under-linked pages: outbound < density min
    • Missing mandatory links: supporting articles without hub uplink
    • Broken links: target URL returns 4xx/5xx
  3. Recommendation generation: Priority-scored fix recommendations with AI-suggested anchor text
  4. Batch application: Insert missing links across multiple content records
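
Once the crawl has produced a list of internal links, the orphan and density checks in step 2 reduce to counting edges. The sketch below assumes links are available as `(source_id, target_id)` pairs and that a per-page density range has already been resolved; both input shapes are hypothetical. Self-links are ignored, which is also an assumption.

```python
from collections import Counter


def audit_link_counts(content_ids, internal_edges, density_limits):
    """Classify pages as orphans, under-linked, or over-linked.

    content_ids:    iterable of content IDs to audit
    internal_edges: iterable of (source_id, target_id) pairs
    density_limits: {content_id: (min_links, max_links or None)}
    """
    # Materialize once so generators can be passed in; drop self-links.
    edges = [(s, t) for s, t in internal_edges if s != t]
    inbound = Counter(t for _, t in edges)
    outbound = Counter(s for s, _ in edges)

    report = {"orphans": [], "under_linked": [], "over_linked": []}
    for cid in content_ids:
        if inbound[cid] == 0:
            report["orphans"].append(cid)
        low, high = density_limits[cid]
        if outbound[cid] < low:
            report["under_linked"].append(cid)
        elif high is not None and outbound[cid] > high:
            report["over_linked"].append(cid)
    return report
```

Broken-link detection is not covered here because it requires HTTP checks against the target URLs.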

Per-cluster health score (0-100) for link coverage:

| Factor | Points |
|--------|--------|
| Hub published and linked (has outbound + inbound links) | 25 |
| All supporting articles have mandatory uplink to hub | 25 |
| At least 1 cross-cluster link from hub | 15 |
| Term pages link to hub | 15 |
| No broken links in cluster | 10 |
| Link density within range for all pages | 10 |

Site-wide link health = average of all cluster scores. Feeds into SAG health monitoring (01G).
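
The health score is a sum of fixed points per satisfied factor, and the site-wide figure is the mean across clusters. A minimal sketch follows; the check names are hypothetical, and each factor is treated as all-or-nothing because the spec does not describe partial credit.

```python
# Points per factor; the values sum to 100.
HEALTH_POINTS = {
    "hub_published_and_linked": 25,
    "all_uplinks_present": 25,
    "hub_has_cross_cluster_link": 15,
    "term_pages_link_to_hub": 15,
    "no_broken_links": 10,
    "density_in_range": 10,
}


def cluster_health(checks):
    """Sum the points of every factor whose check passed (0-100)."""
    return sum(
        points for name, points in HEALTH_POINTS.items()
        if checks.get(name, False)
    )


def site_health(cluster_scores):
    """Average of all cluster scores; 0.0 when there are no clusters."""
    if not cluster_scores:
        return 0.0
    return sum(cluster_scores.values()) / len(cluster_scores)
```

For example, a cluster whose hub is linked, whose uplinks are complete, and which has no broken links, but fails the other three checks, scores 25 + 25 + 10 = 60.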


3. DATA MODELS & APIS

3.1 New Models

SAGLink (linker app)

class SAGLink(SiteSectorBaseModel):
    """
    Represents a planned or inserted internal link between two content pages.
    Tracks link type, anchor text, score, and status through lifecycle.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='sag_links'
    )
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='outbound_sag_links'
    )
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.CASCADE,
        related_name='inbound_sag_links'
    )
    link_type = models.CharField(
        max_length=20,
        choices=[
            ('vertical_up', 'Vertical Upward'),
            ('vertical_down', 'Vertical Downward'),
            ('horizontal', 'Horizontal Sibling'),
            ('cross_cluster', 'Cross-Cluster'),
            ('taxonomy', 'Taxonomy Contextual'),
            ('breadcrumb', 'Breadcrumb'),
            ('related', 'Related Content'),
        ]
    )
    anchor_text = models.CharField(max_length=200)
    anchor_type = models.CharField(
        max_length=20,
        choices=[
            ('primary_keyword', 'Primary Keyword'),
            ('page_title', 'Page Title'),
            ('natural', 'Natural Phrase'),
            ('branded', 'Branded'),
        ]
    )
    placement_zone = models.CharField(
        max_length=20,
        choices=[
            ('in_body', 'In Body'),
            ('related_section', 'Related Section'),
            ('breadcrumb', 'Breadcrumb'),
            ('sidebar', 'Sidebar'),
        ]
    )
    placement_position = models.IntegerField(
        null=True,
        blank=True,
        help_text='Paragraph number for in_body placement'
    )
    score = models.FloatField(
        default=0,
        help_text='Link scoring algorithm result (0-100)'
    )
    status = models.CharField(
        max_length=15,
        choices=[
            ('planned', 'Planned'),
            ('inserted', 'Inserted'),
            ('verified', 'Verified'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='planned'
    )
    is_mandatory = models.BooleanField(
        default=False,
        help_text='True for vertical_up links (supporting → hub)'
    )
    inserted_at = models.DateTimeField(null=True, blank=True)

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_links'

PK: BigAutoField (integer) — inherits from SiteSectorBaseModel

SAGLinkAudit (linker app)

class SAGLinkAudit(SiteSectorBaseModel):
    """
    Stores results of a site-wide or cluster-level link audit.
    """
    blueprint = models.ForeignKey(
        'planner.SAGBlueprint',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='link_audits'
    )
    audit_date = models.DateTimeField(auto_now_add=True)
    total_links = models.IntegerField(default=0)
    missing_mandatory = models.IntegerField(default=0)
    orphan_pages = models.IntegerField(default=0)
    broken_links = models.IntegerField(default=0)
    over_linked_pages = models.IntegerField(default=0)
    under_linked_pages = models.IntegerField(default=0)
    cluster_scores = models.JSONField(
        default=dict,
        help_text='{cluster_id: {score, missing, issues[]}}'
    )
    recommendations = models.JSONField(
        default=list,
        help_text='[{content_id, action, link_type, target_id, anchor_suggestion, priority}]'
    )
    overall_health_score = models.FloatField(
        default=0,
        help_text='Average of cluster scores (0-100)'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_sag_link_audits'

PK: BigAutoField (integer) — inherits from SiteSectorBaseModel

LinkMap (linker app)

class LinkMap(SiteSectorBaseModel):
    """
    Full link map of all internal (and external) links found in published content.
    Built by crawling content_html of all published content records.
    """
    source_url = models.URLField()
    source_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='outbound_link_map'
    )
    target_url = models.URLField()
    target_content = models.ForeignKey(
        'writer.Content',
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name='inbound_link_map'
    )
    anchor_text = models.CharField(max_length=500)
    is_internal = models.BooleanField(default=True)
    is_follow = models.BooleanField(default=True)
    position = models.CharField(
        max_length=20,
        choices=[
            ('in_content', 'In Content'),
            ('navigation', 'Navigation'),
            ('footer', 'Footer'),
            ('sidebar', 'Sidebar'),
        ],
        default='in_content'
    )
    last_verified = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=15,
        choices=[
            ('active', 'Active'),
            ('broken', 'Broken'),
            ('removed', 'Removed'),
        ],
        default='active'
    )

    class Meta:
        app_label = 'linker'
        db_table = 'igny8_link_map'

PK: BigAutoField (integer) — inherits from SiteSectorBaseModel

3.2 Modified Models

Content (writer app) — add 4 fields

# Add to Content model:
link_plan = models.JSONField(
    null=True,
    blank=True,
    help_text='Planned links before insertion: [{target_id, link_type, anchor, score}]'
)
links_inserted = models.BooleanField(
    default=False,
    help_text='Whether link plan has been applied to content_html'
)
inbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of inbound internal links'
)
outbound_link_count = models.IntegerField(
    default=0,
    help_text='Cached count of outbound internal links'
)

3.3 New App Registration

Create linker app:

  • App config: igny8_core/modules/linker/apps.py with label = 'linker' (Django's AppConfig attribute is label; app_label is the model Meta option used in the models above)
  • Add to INSTALLED_APPS in igny8_core/settings.py

3.4 Migration

igny8_core/migrations/XXXX_add_linker_models.py

Operations:

  1. CreateModel('SAGLink', ...) — with indexes on source_content, target_content, link_type, status
  2. CreateModel('SAGLinkAudit', ...)
  3. CreateModel('LinkMap', ...) — with index on source_url, target_url
  4. AddField('Content', 'link_plan', JSONField(null=True, blank=True))
  5. AddField('Content', 'links_inserted', BooleanField(default=False))
  6. AddField('Content', 'inbound_link_count', IntegerField(default=0))
  7. AddField('Content', 'outbound_link_count', IntegerField(default=0))

3.5 API Endpoints

All endpoints under /api/v1/linker/:

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/v1/linker/links/?site_id=X | List all SAGLink records with filters (link_type, status, cluster_id, source_content_id) |
| POST | /api/v1/linker/links/plan/ | Generate link plan for a content piece. Body: {content_id}. Returns planned SAGLink records. |
| POST | /api/v1/linker/links/insert/ | Insert planned links into content_html. Body: {content_id}. Modifies Content.content_html. |
| POST | /api/v1/linker/links/batch-insert/ | Batch insert for multiple content. Body: {content_ids: [int]}. Queues Celery task. |
| GET | /api/v1/linker/content/{id}/links/ | All inbound + outbound links for a specific content piece. |
| GET | /api/v1/linker/audit/?site_id=X | Latest SAGLinkAudit results. |
| POST | /api/v1/linker/audit/run/ | Trigger site-wide link audit. Body: {site_id}. Queues Celery task. Returns task ID. |
| GET | /api/v1/linker/audit/recommendations/?site_id=X | Get fix recommendations from latest audit. |
| POST | /api/v1/linker/audit/apply/ | Apply recommended fixes in batch. Body: {site_id, recommendation_ids: [int]}. |
| GET | /api/v1/linker/link-map/?site_id=X | Full LinkMap for site with pagination. |
| GET | /api/v1/linker/orphans/?site_id=X | List orphan pages (0 inbound internal links). |
| GET | /api/v1/linker/health/?site_id=X | Cluster-level link health scores. |

Permissions: All endpoints use SiteSectorModelViewSet permission patterns.

Location: igny8_core/business/link_planning.py

class LinkPlanningService:
    """
    Generates internal link plans for content based on SAG hierarchy
    and scoring algorithm.
    """

    SCORE_WEIGHTS = {
        'shared_attributes': 0.40,
        'target_authority': 0.25,
        'keyword_overlap': 0.20,
        'content_recency': 0.10,
        'link_count_gap': 0.05,
    }

    AUTO_LINK_THRESHOLD = 60
    REVIEW_THRESHOLD = 40

    def plan(self, content_id):
        """
        Generate link plan for a content piece.
        1. Identify content's cluster and role (hub vs supporting)
        2. Determine mandatory links (vertical_up for supporting)
        3. Score all candidate targets
        4. Select targets within density limits
        5. Generate anchor text per link
        6. Create SAGLink records with status='planned'
        Returns list of planned SAGLink records.
        """
        pass

    def _get_mandatory_links(self, content, cluster):
        """Vertical upward: supporting → hub. Always added."""
        pass

    def _get_candidates(self, content, cluster, blueprint):
        """Gather all potential link targets from cluster and related clusters."""
        pass

    def _score_candidate(self, source_content, target_content, source_cluster,
                         target_cluster, blueprint):
        """Calculate 0-100 score using 5-factor algorithm."""
        pass

    def _select_within_density(self, content, scored_candidates):
        """Filter candidates to stay within density limits for page type and word count."""
        pass

    def _generate_anchor_text(self, source_content, target_content, link_type):
        """AI-generate contextually appropriate anchor text."""
        pass

Location: igny8_core/business/link_insertion.py

class LinkInsertionService:
    """
    Inserts planned links into content_html.
    Handles placement, anchor text insertion, and collision avoidance.
    """

    def insert(self, content_id):
        """
        Insert all planned SAGLink records into Content.content_html.
        1. Load all SAGLinks where source_content=content_id, status='planned'
        2. Parse content_html
        3. For each link, find insertion point based on placement_zone + position
        4. Insert <a> tag with anchor text
        5. Update SAGLink status='inserted', set inserted_at
        6. Update Content.content_html, links_inserted=True, outbound_link_count
        7. Update target Content.inbound_link_count
        """
        pass

    def _find_insertion_point(self, html_tree, link):
        """
        Find best insertion point in parsed HTML:
        - in_body: find paragraph at placement_position, find natural spot for anchor
        - related_section: append to "Related Articles" section (create if missing)
        - breadcrumb: insert breadcrumb trail at top
        """
        pass

    def _insert_link(self, html_tree, position, anchor_text, target_url):
        """Insert <a href> tag at position without breaking existing HTML."""
        pass

Location: igny8_core/business/link_audit.py

class LinkAuditService:
    """
    Runs site-wide link audits: builds link map, identifies issues,
    generates recommendations.
    """

    def run_audit(self, site_id):
        """
        Full audit:
        1. Crawl all published Content for site
        2. Extract all <a> tags, build/update LinkMap records
        3. Identify orphan pages, over/under-linked, missing mandatory, broken
        4. Calculate per-cluster health scores
        5. Generate prioritized recommendations
        6. Create SAGLinkAudit record
        Returns SAGLinkAudit instance.
        """
        pass

    def _build_link_map(self, site_id):
        """Extract links from all published content_html, create LinkMap records."""
        pass

    def _find_orphans(self, site_id):
        """Content with 0 inbound internal links."""
        pass

    def _check_density(self, site_id):
        """Compare outbound counts against density rules per page type."""
        pass

    def _check_mandatory(self, site_id):
        """Verify all supporting articles have vertical_up link to their hub."""
        pass

    def _calculate_cluster_health(self, site_id, cluster):
        """Calculate 0-100 health score per cluster."""
        pass

    def _generate_recommendations(self, issues):
        """Priority-scored recommendations with AI-suggested anchor text."""
        pass
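
The extraction step behind `_build_link_map` can be sketched with the standard library alone. The function and class names are hypothetical, and two assumptions are made: a link counts as internal when its host matches the host of the source URL, and it counts as followed unless `rel` contains `nofollow`. The returned dict keys mirror the LinkMap field names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href, anchor text, and follow status for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # link being read while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return  # named anchors without href are not links
        rel = (attrs.get("rel") or "").lower().split()
        self._current = {"href": href, "text": "", "is_follow": "nofollow" not in rel}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the anchor text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def extract_links(content_html, source_url):
    """Return LinkMap-style dicts for every link found in content_html."""
    parser = LinkExtractor()
    parser.feed(content_html)
    parser.close()
    site_host = urlparse(source_url).netloc
    rows = []
    for link in parser.links:
        target_url = urljoin(source_url, link["href"])  # resolve relative hrefs
        rows.append({
            "source_url": source_url,
            "target_url": target_url,
            "anchor_text": link["text"],
            "is_internal": urlparse(target_url).netloc == site_host,
            "is_follow": link["is_follow"],
        })
    return rows
```

Each returned dict could then be matched to a Content record by URL to fill `source_content` and `target_content`; that lookup is outside the scope of this sketch.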

3.9 Celery Tasks

Location: igny8_core/tasks/linker_tasks.py

@shared_task(name='generate_link_plan')
def generate_link_plan(content_id):
    """Runs after content generation, before publish. Creates SAGLink records."""
    pass

@shared_task(name='run_link_audit')
def run_link_audit(site_id):
    """Scheduled weekly or triggered manually. Full site-wide audit."""
    pass

@shared_task(name='verify_links')
def verify_links(site_id):
    """Check for broken links via HTTP status checks on LinkMap URLs."""
    pass

@shared_task(name='rebuild_link_map')
def rebuild_link_map(site_id):
    """Full crawl of published content to rebuild LinkMap from scratch."""
    pass

Beat Schedule Additions:

| Task | Schedule | Notes |
|------|----------|-------|
| run_link_audit | Weekly (Sunday 1:00 AM) | Site-wide audit for all active sites |
| verify_links | Weekly (Wednesday 2:00 AM) | HTTP check all active LinkMap entries |

4. IMPLEMENTATION STEPS

Step 1: Create Linker App

  1. Create igny8_core/modules/linker/ directory with __init__.py and apps.py
  2. Add linker to INSTALLED_APPS in settings.py
  3. Create models: SAGLink, SAGLinkAudit, LinkMap

Step 2: Migration

  1. Create migration for 3 new models
  2. Add 4 new fields to Content model (link_plan, links_inserted, inbound_link_count, outbound_link_count)
  3. Run migration

Step 3: Services

  1. Implement LinkPlanningService in igny8_core/business/link_planning.py
  2. Implement LinkInsertionService in igny8_core/business/link_insertion.py
  3. Implement LinkAuditService in igny8_core/business/link_audit.py

Step 4: Pipeline Integration

Insert link planning + insertion between Stage 4 and Stage 7:

# After content generation completes in pipeline:
def post_content_generation(content_id):
    # 02G: Generate schema + SERP elements
    # ...
    # 02D: Plan and insert internal links
    link_service = LinkPlanningService()
    link_service.plan(content_id)
    insertion_service = LinkInsertionService()
    insertion_service.insert(content_id)

Step 5: API Endpoints

  1. Create igny8_core/urls/linker.py with link, audit, and health endpoints
  2. Create views extending SiteSectorModelViewSet
  3. Register URL patterns under /api/v1/linker/

Step 6: Celery Tasks

  1. Implement all 4 tasks in igny8_core/tasks/linker_tasks.py
  2. Add run_link_audit and verify_links to Celery beat schedule

Step 7: Serializers & Admin

  1. Create DRF serializers for SAGLink, SAGLinkAudit, LinkMap
  2. Register models in Django admin

Step 8: Credit Cost Configuration

Add to CreditCostConfig (billing app):

| operation_type | default_cost | description |
|----------------|--------------|-------------|
| link_audit | 1 | Site-wide link audit |
| link_generation | 0.5 | Generate 1-5 links with AI anchor text |
| link_audit_full | 3-5 | Full site audit with recommendations |

5. ACCEPTANCE CRITERIA

  • Vertical upward link (supporting → hub) automatically inserted for all supporting articles
  • Vertical downward links (hub → supporting) generated with "Related Articles" section
  • Horizontal sibling links (max 2) between same-cluster supporting articles
  • Cross-cluster links (max 2) between hubs sharing SAGAttribute values
  • Taxonomy contextual links from term pages to all relevant cluster hubs
  • Breadcrumb chain generated from SAG hierarchy for all content
  • Related content section (2-3 links) generated at end of article
  • 5-factor scoring algorithm produces 0-100 scores
  • Links with score ≥ 60 auto-inserted
  • Links with score 40-59 suggested for manual review
  • Score algorithm uses: shared attributes (40%), authority (25%), keyword overlap (20%), recency (10%), gap boost (5%)

Anchor Text

  • Anchor text 2-8 words, grammatically natural
  • Same exact anchor not used >3 times to same target
  • Distribution per target: 60% primary keyword, 30% page title, 10% natural
  • Diversification audit flags if any anchor >40% of links to a target

Link Density

  • Hub pages: 5-20 outbound links based on word count
  • Blog pages: 2-12 outbound links based on word count
  • Product/Service pages: 2-5 outbound links
  • Term pages: 3+ outbound, unlimited for taxonomy contextual

Audit & Remediation

  • Link audit identifies orphan pages, over/under-linked, missing mandatory, broken links
  • Cluster-level health score (0-100) calculated per cluster
  • Recommendations generated with priority scores and AI-suggested anchors
  • Batch application of recommendations modifies content_html correctly

Pipeline Integration

  • Link plan generated automatically after content generation in pipeline
  • Links inserted before publish stage
  • Mandatory vertical_up link verified before allowing publish
  • Content.inbound_link_count and outbound_link_count updated on insert

6. CLAUDE CODE INSTRUCTIONS

File Locations

igny8_core/
├── modules/
│   └── linker/
│       ├── __init__.py
│       ├── apps.py                    # app_label = 'linker'
│       └── models.py                  # SAGLink, SAGLinkAudit, LinkMap
├── business/
│   ├── link_planning.py               # LinkPlanningService
│   ├── link_insertion.py              # LinkInsertionService
│   └── link_audit.py                  # LinkAuditService
├── tasks/
│   └── linker_tasks.py                # Celery tasks
├── urls/
│   └── linker.py                      # Linker endpoints
└── migrations/
    └── XXXX_add_linker_models.py

Conventions

  • PKs: BigAutoField (integer) — do NOT use UUIDs
  • Table prefix: igny8_ on all new tables
  • App label: linker (new app)
  • Celery app name: igny8_core
  • URL pattern: /api/v1/linker/...
  • Permissions: Use SiteSectorModelViewSet permission pattern
  • Model inheritance: SAGLink, SAGLinkAudit, and LinkMap all extend SiteSectorBaseModel
  • Frontend: .tsx files with Zustand stores for state management

Cross-References

Doc Relationship
01A SAGBlueprint/SAGCluster/SAGAttribute provide hierarchy and cross-cluster relationships
01E Pipeline integration — link planning hooks after Stage 4, before Stage 7
01G SAG health monitoring incorporates cluster link health scores
02B ContentTaxonomy cluster mapping enables taxonomy contextual links
02E External backlinks complement internal links; authority distributed by internal links
02F Optimizer identifies internal link opportunities and feeds to linker
03A WP plugin standalone mode has its own internal linking module — separate from this
03C Theme renders breadcrumbs and related content sections generated by linker

Key Decisions

  1. New linker app — Separate app because linking is a distinct domain with its own models, not tightly coupled to writer or planner
  2. SAGLink stores planned AND inserted — Single model tracks the full lifecycle from planning through insertion to verification
  3. LinkMap is separate from SAGLink — LinkMap stores the actual crawled link state (including non-SAG links); SAGLink stores the planned/managed links
  4. Cached counts on Content: inbound_link_count and outbound_link_count are denormalized for fast queries; updated on insert/removal
  5. HTML parsing for insertion — Use Python HTML parser (BeautifulSoup or lxml) for safe link insertion without corrupting content_html