01F: IGNY8 Phase 1 — Existing Site Analysis (Case 1)

Document Type: Build Specification
Phase: Phase 1: Existing Site Analysis
Use Case: Case 1 (Users with existing sites)
Status: Active Development
Last Updated: 2026-03-23


1. Current State

1.1 Existing IGNY8 WordPress Plugin

The IGNY8 WordPress plugin is currently operational with the following capabilities:

Current Data Collection:

  • Post status tracking
  • Site metadata (domain, WordPress version, plugin count, theme)
  • Keyword mapping and analysis
  • Site structure analysis
  • Taxonomy sync across registered taxonomies
  • 7 active cron jobs managing periodic data updates

Current Plugin Endpoint:

  • GET /wp-json/igny8/v1/health — basic health check
  • Plugin location: WordPress plugins directory
  • Sync frequency: Configurable via cron (daily default)

Limitations:

  • Does not collect detailed product data (WooCommerce stores)
  • Does not analyze product descriptions for attribute patterns
  • No collection of custom attribute assignments
  • No menu structure analysis
  • No blog content summary extraction
  • No confidence scoring for discovered patterns
  • Manual attribute creation required post-analysis

1.2 Case 1 User Journey

Trigger: User logs into IGNY8 platform with existing WordPress site (WooCommerce-based)

Current Flow:

  1. User connects WordPress site via API key
  2. Plugin syncs basic site data
  3. User manually creates SAG blueprint
  4. User manually defines attributes
  5. User manually tags existing products

Desired Flow:

  1. User connects WordPress site via API key
  2. Plugin collects comprehensive site data (products, categories, content)
  3. AI automatically extracts attributes from product titles/descriptions
  4. System generates SAG blueprint with discovered attributes
  5. System performs gap analysis (what's missing vs. SAG template)
  6. User reviews and confirms blueprint
  7. System auto-tags existing products
  8. Blueprint feeds into content pipeline (01E) and cluster formation (01C)

1.3 Dependencies & Prerequisites

  • WordPress 5.8+ with WooCommerce 5.0+
  • IGNY8 plugin v2.0+ installed and activated
  • OpenAI API or compatible LLM for attribute extraction
  • Celery for async task processing (analysis may take 2-5 minutes)
  • Database schema supports site analysis metadata storage
  • Sector templates (01B) available for validation

2. What to Build

2.1 Enhanced Plugin: Site Data Collection

Objective: Extend WordPress plugin to collect comprehensive site data for SAG analysis.

New Plugin Endpoint:

GET /wp-json/igny8/v1/sag/site-analysis
Headers: Authorization: Bearer {IGNY8_API_TOKEN}
Query Parameters:
  - limit_products: 500 (max products to analyze; default 500)
  - include_drafts: false (include draft products; default false)
  - cache_ttl: 3600 (cache results for N seconds; default 3600)

Response: 200 OK with payload (see section 2.3)

Data Collection Modules:

| Module | Responsibility | Data Returned |
| --- | --- | --- |
| ProductCollector | Extract all products with metadata | titles, descriptions, prices, categories, tags, images, custom attributes, SKU |
| CategoryCollector | Map product category hierarchy | names, slugs, parent-child hierarchy, descriptions, product counts |
| TaxonomyCollector | Enumerate all custom taxonomies | taxonomy names, all registered terms, term hierarchies, term metadata |
| AttributeCollector | Extract WooCommerce attributes | attribute names, attribute types (select/text/color), all values, product assignments |
| PageCollector | Identify key pages | titles, URLs, content summaries (first 500 chars), page type detection |
| PostCollector | Extract blog posts | titles, URLs, content summaries, categories, tags, publish date |
| MenuCollector | Analyze navigation structure | menu items, hierarchy, target URLs/categories |
| PluginCollector | Document site technical stack | active plugins, theme, WordPress version, WooCommerce version |

Implementation:

  • Location: plugins/igny8-sync/includes/collectors/
  • Each collector implements DataCollectorInterface with collect() and sanitize() methods
  • Data sanitization: Remove PII, HTML tags, limit text length
  • Error handling: Log failures per collector, return partial data if one collector fails
  • Performance: Optimize queries to avoid site slowdown (use transients, batch operations)
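The collector contract and the partial-data rule above can be sketched as follows. The plugin itself would be written in PHP; this Python version only illustrates the control flow. The names DataCollector, FailingCollector, run_collectors, clean_text, and the 500-character cap are assumptions made for illustration, and PII removal is not shown.

```python
import logging
import re
from abc import ABC, abstractmethod
from typing import Any, Dict, List

logger = logging.getLogger("igny8.collectors")  # hypothetical logger name

MAX_TEXT_LENGTH = 500  # assumed cap; the spec only says "limit text length"
_TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text: str, max_length: int = MAX_TEXT_LENGTH) -> str:
    """Strip HTML tags, collapse whitespace, and truncate to max_length."""
    without_tags = _TAG_RE.sub(" ", text)
    return " ".join(without_tags.split())[:max_length]


class DataCollector(ABC):
    """Python stand-in for the plugin's DataCollectorInterface."""

    name = "base"

    @abstractmethod
    def collect(self) -> List[Dict[str, Any]]:
        """Fetch raw records."""

    @abstractmethod
    def sanitize(self, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Return cleaned copies of the raw records."""


class PageCollector(DataCollector):
    name = "pages"

    def __init__(self, raw_pages: List[Dict[str, Any]]) -> None:
        self._raw_pages = raw_pages

    def collect(self) -> List[Dict[str, Any]]:
        return list(self._raw_pages)

    def sanitize(self, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [
            {"title": clean_text(r["title"]), "content_summary": clean_text(r["content"])}
            for r in records
        ]


class FailingCollector(DataCollector):
    """Simulates a collector whose underlying query fails."""

    name = "menus"

    def collect(self) -> List[Dict[str, Any]]:
        raise RuntimeError("menu lookup failed")

    def sanitize(self, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return records


def run_collectors(collectors: List[DataCollector]) -> Dict[str, Any]:
    """Run every collector; a failure is logged and yields an empty section."""
    data: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for collector in collectors:
        try:
            data[collector.name] = collector.sanitize(collector.collect())
        except Exception as exc:  # one bad collector must not sink the payload
            logger.warning("collector %s failed: %s", collector.name, exc)
            errors[collector.name] = str(exc)
            data[collector.name] = []
    return {"data": data, "errors": errors}
```

Returning the errors alongside the data lets the backend tell an empty section apart from a failed one.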

Plugin Cron Job Addition:

  • New job: igny8_sync_sag_site_analysis (optional; runs only when the user triggers an analysis)
  • Frequency: On-demand via API call; it is not added to the recurring cron schedule
  • Timeout: 60 seconds (analysis itself happens server-side via Celery)

2.2 AI Attribute Extraction Service

File: sag/ai_functions/attribute_extraction.py
Register Key: extract_site_attributes
Input Type: SiteAnalysisPayload
Output Type: AttributeExtractionResult

Function Signature:

from typing import Optional

def extract_site_attributes(
    site_data: SiteAnalysisPayload,
    sector_template: Optional[SectorTemplate] = None,
    confidence_threshold: float = 0.6,
    max_attributes: int = 20
) -> AttributeExtractionResult:
    """
    Analyze site data to discover attributes.

    Args:
        site_data: Raw site data from WordPress plugin
        sector_template: Optional sector template for validation
        confidence_threshold: Min confidence to include attribute (0.0-1.0)
        max_attributes: Max attributes to return

    Returns:
        AttributeExtractionResult with discovered attributes, frequencies, confidence scores
    """

Algorithm:

  1. Text Analysis Phase

    • Concatenate product titles and descriptions
    • Apply tokenization and noun phrase extraction
    • Identify recurring modifiers and descriptors
    • Extract from category names and tags
    • Extract from custom attribute values (if any exist)
  2. Pattern Recognition Phase

    • Group similar terms (e.g., "back pain" + "back relief" + "lower back" → "back/spine")
    • Calculate frequency across product dataset
    • Identify dimensional axes (e.g., "target area," "device type")
    • Score statistical significance
  3. Validation Phase

    • Cross-reference against sector template (if provided)
    • Validate against common attribute taxonomies
    • Flag conflicting or ambiguous discoveries
    • Assign confidence scores based on:
      • Frequency (how often appears)
      • Consistency (appears across multiple products)
      • Specificity (not too vague)
      • Template alignment (matches known attributes)
  4. Ranking Phase

    • Rank by frequency and confidence
    • Assign dimensionality (Primary/Secondary/Tertiary)
    • Cap results at max_attributes
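The validation and ranking phases can be sketched as below. The spec names the four confidence signals but not how they are combined, so the weighted sum and the weights here are assumptions, as are the names Candidate, score_confidence, and rank_candidates. Dimension assignment (Primary/Secondary/Tertiary) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed weights: the spec lists the signals but gives no formula.
WEIGHTS = {
    "frequency": 0.3,
    "consistency": 0.3,
    "specificity": 0.2,
    "template_alignment": 0.2,
}


@dataclass
class Candidate:
    name: str
    frequency: int             # number of products mentioning the attribute
    signals: Dict[str, float]  # each signal normalised to 0.0-1.0


def score_confidence(signals: Dict[str, float]) -> float:
    """Weighted sum of the four signals; a missing signal counts as 0."""
    total = sum(weight * signals.get(key, 0.0) for key, weight in WEIGHTS.items())
    return round(total, 2)


def rank_candidates(
    candidates: List[Candidate],
    confidence_threshold: float = 0.6,
    max_attributes: int = 20,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (kept, low_confidence) as lists of (name, confidence)."""
    scored = [(c, score_confidence(c.signals)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= confidence_threshold]
    low = [(c.name, s) for c, s in scored if s < confidence_threshold]
    # Highest confidence first; frequency breaks ties.
    kept.sort(key=lambda pair: (pair[1], pair[0].frequency), reverse=True)
    top = [(c.name, s) for c, s in kept[:max_attributes]]
    return top, low
```

Candidates below the threshold are returned separately so they can be reported as low_confidence_discoveries rather than silently dropped.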

Output Structure:

{
  "analysis_id": "uuid",
  "site_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "analysis_confidence": 0.82,
  "attributes": [
    {
      "name": "Target Area",
      "dimension": "Primary",
      "confidence": 0.95,
      "frequency": 32,
      "discovered_from": ["product_titles", "product_descriptions", "categories"],
      "values": [
        {
          "value": "Neck",
          "frequency": 12,
          "example_products": ["Product A", "Product B"]
        },
        {
          "value": "Back",
          "frequency": 8,
          "example_products": ["Product C"]
        },
        {
          "value": "Foot",
          "frequency": 25,
          "example_products": ["Product D", "Product E"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "body_region",
        "alignment_score": 0.98
      }
    },
    {
      "name": "Device Type",
      "dimension": "Primary",
      "confidence": 0.88,
      "frequency": 28,
      "discovered_from": ["product_titles", "product_descriptions"],
      "values": [
        {
          "value": "Shiatsu",
          "frequency": 18,
          "example_products": ["Product F"]
        },
        {
          "value": "EMS",
          "frequency": 7,
          "example_products": ["Product G"]
        },
        {
          "value": "Percussion",
          "frequency": 3,
          "example_products": ["Product H"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "therapy_type",
        "alignment_score": 0.91
      }
    },
    {
      "name": "Heat Setting",
      "dimension": "Secondary",
      "confidence": 0.72,
      "frequency": 15,
      "discovered_from": ["product_descriptions"],
      "values": [
        {
          "value": "Heated",
          "frequency": 15,
          "example_products": ["Product I", "Product J"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "heat_enabled",
        "alignment_score": 0.85
      }
    }
  ],
  "low_confidence_discoveries": [
    {
      "name": "Brand",
      "confidence": 0.55,
      "reason": "High variability, many single-mention values"
    }
  ],
  "analysis_notes": {
    "total_products_analyzed": 50,
    "total_categories": 8,
    "total_tags": 23,
    "extraction_method": "llm_analysis",
    "model_used": "gpt-4-turbo"
  }
}

Error Handling:

  • Insufficient data: Log warning, return empty attributes list
  • LLM API failure: Retry with exponential backoff (3 retries)
  • Timeout (>5 minutes): Abort and return partial results
  • Invalid sector template: Log error, continue analysis without validation
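The retry rule for LLM failures could look like the sketch below. It reads "3 retries" as one initial attempt plus up to three more, with delays doubling from an assumed one-second base. LLMCallError and call_with_backoff are hypothetical names, not part of any LLM client library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LLMCallError(Exception):
    """Hypothetical error type standing in for an LLM API failure."""


def call_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on LLMCallError wait base_delay * 2**attempt and retry."""
    attempt = 0
    while True:
        try:
            return call()
        except LLMCallError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

The sleep function is injectable so tests can record the delays instead of waiting.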

Performance Considerations:

  • Cache sector templates in memory
  • Batch LLM calls (process 5-10 products per API call)
  • Store extraction results in database for audit trail
  • Return results within 2-5 minutes for typical sites
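Batching products for LLM calls can be sketched with a small generator. The helper name batched and the default batch size of 10 (the upper end of the 5-10 range above) are assumptions.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

With 500 products and a batch size of 10, this produces 50 batches, which gives a rough upper bound on the number of LLM requests per analysis.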

2.3 Data Models

SiteAnalysisPayload

from dataclasses import dataclass
from typing import List, Dict, Optional

@dataclass
class Product:
    id: str
    title: str
    description: str
    sku: str
    price: float
    categories: List[str]
    tags: List[str]
    custom_attributes: Dict[str, List[str]]
    image_urls: List[str]

@dataclass
class Category:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    product_count: int

@dataclass
class Taxonomy:
    name: str
    label: str
    is_hierarchical: bool
    terms: List['Term']

@dataclass
class Term:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    count: int

@dataclass
class Page:
    id: str
    title: str
    url: str
    content_summary: str
    page_type: str  # e.g., "shop", "landing", "faq"

@dataclass
class Post:
    id: str
    title: str
    url: str
    content_summary: str
    categories: List[str]
    tags: List[str]
    publish_date: str

@dataclass
class MenuItem:
    id: str
    title: str
    url: str
    target: str
    parent_id: Optional[str]

@dataclass
class Menu:
    # Mirrors the "menus" objects in the plugin response (section 3.2)
    id: str
    title: str
    items: List[MenuItem]

@dataclass
class SiteMetadata:
    site_id: str
    domain: str
    wordpress_version: str
    woocommerce_version: str
    total_products: int
    total_categories: int
    total_pages: int
    total_posts: int
    active_plugins: List[str]
    theme: str

@dataclass
class SiteAnalysisPayload:
    metadata: SiteMetadata
    products: List[Product]
    categories: List[Category]
    taxonomies: List[Taxonomy]
    pages: List[Page]
    posts: List[Post]
    menus: List[Menu]
    collected_at: str  # ISO 8601 timestamp

AttributeExtractionResult

@dataclass
class AttributeValue:
    value: str
    frequency: int
    example_products: List[str]

@dataclass
class TemplateValidation:
    matched_sector: str
    matched_attribute: str
    alignment_score: float

@dataclass
class DiscoveredAttribute:
    name: str
    dimension: str  # "Primary", "Secondary", "Tertiary"
    confidence: float  # 0.0-1.0
    frequency: int
    discovered_from: List[str]  # ["product_titles", "product_descriptions", ...]
    values: List[AttributeValue]
    template_validation: Optional[TemplateValidation]

@dataclass
class LowConfidenceDiscovery:
    name: str
    confidence: float
    reason: str

@dataclass
class AnalysisNotes:
    total_products_analyzed: int
    total_categories: int
    total_tags: int
    extraction_method: str
    model_used: str

@dataclass
class AttributeExtractionResult:
    analysis_id: str
    site_id: str
    timestamp: str
    analysis_confidence: float
    attributes: List[DiscoveredAttribute]
    low_confidence_discoveries: List[LowConfidenceDiscovery]
    analysis_notes: AnalysisNotes

2.4 Gap Analysis Service

File: sag/services/gap_analysis_service.py
Class: GapAnalysisService
Method: analyze_gap(site_data: SiteAnalysisPayload, blueprint: SAGBlueprint) -> GapAnalysisReport

Purpose: Compare existing site structure against SAG blueprint to identify gaps.

Analysis Dimensions:

  1. Attribute Coverage Gap

    • SAG blueprint specifies X attributes
    • Site currently has Y custom attributes assigned to products
    • Gap: Missing attributes or low coverage (% of products with attribute values)
  2. Hub Page Gap

    • Blueprint specifies Z cluster hubs
    • Site analysis identifies M existing pages
    • Gap: Missing hub pages (authority pages for attribute clusters)
  3. Term Landing Page Gap

    • Blueprint specifies N attribute values requiring term landing pages
    • Site has existing category/tag pages
    • Gap: Missing term landing pages (one per attribute value)
  4. Blog Content Gap

    • Blueprint specifies recommended blog posts per cluster
    • Site has P existing blog posts
    • Gap: Missing blog content aligned to clusters and keyword targets
  5. Internal Linking Gap

    • Blueprint specifies internal linking strategy
    • Site has current internal link structure
    • Gap: Missing cross-cluster and term-to-hub links
  6. Product Enrichment Gap

    • Products lacking attribute assignments
    • Products missing description optimization
    • Products missing images
  7. Technical SEO Gap

    • Missing schema markup for products
    • Category pages lacking optimization
    • Menu structure not optimized for crawlability
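Two of the simpler calculations behind these dimensions, attribute coverage and count-based gaps, can be sketched as follows. Products are plain dicts shaped like the plugin payload; the function names and the choice to clamp gaps at zero are assumptions.

```python
from typing import Any, Dict, List


def attribute_coverage(products: List[Dict[str, Any]], attribute: str) -> Dict[str, int]:
    """Count products that carry at least one value for the attribute."""
    total = len(products)
    covered = sum(
        1 for product in products
        if product.get("custom_attributes", {}).get(attribute)
    )
    percent = round(100 * covered / total) if total else 0
    return {
        "covered": covered,
        "total": total,
        "percent": percent,
        "missing": total - covered,
    }


def count_gap(current: int, blueprint: int) -> int:
    """Items the blueprint calls for that the site lacks (never negative)."""
    return max(blueprint - current, 0)
```

For a catalogue of 50 products where 40 have a Device Type value, attribute_coverage reports 80 percent coverage with 10 products missing an assignment.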

Output Structure:

{
  "analysis_id": "uuid",
  "site_id": "uuid",
  "blueprint_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "summary": {
    "products_current": 50,
    "products_gap": 0,
    "attributes_current": 3,
    "attributes_blueprint": 8,
    "attributes_gap": 5,
    "hub_pages_current": 2,
    "hub_pages_blueprint": 4,
    "hub_pages_gap": 2,
    "term_pages_current": 12,
    "term_pages_blueprint": 35,
    "term_pages_gap": 23,
    "blog_posts_current": 8,
    "blog_posts_blueprint": 24,
    "blog_posts_gap": 16,
    "overall_gap_percentage": 62
  },
  "attributes_gap_detail": [
    {
      "attribute": "Target Area",
      "coverage_current": "100% (50/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "None — attribute well-covered"
    },
    {
      "attribute": "Device Type",
      "coverage_current": "80% (40/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "10 products missing Device Type assignment"
    }
  ],
  "hub_pages_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "status": "EXISTS",
      "url": "/shop/foot-massagers",
      "optimization_notes": "Good; consider adding testimonials section"
    },
    {
      "cluster": "Neck & Shoulder Relief",
      "status": "MISSING",
      "recommendation": "Create hub page at /neck-shoulder-relief"
    }
  ],
  "term_pages_gap_detail": [
    {
      "attribute": "Target Area",
      "term": "Neck",
      "status": "MISSING",
      "recommendation": "Create term page at /target-area/neck (products filter + blog links)"
    }
  ],
  "blog_posts_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "recommended_posts": [
        "Best Foot Massagers for Neuropathy",
        "How to Use Shiatsu Foot Massagers",
        "Foot Massage Benefits"
      ],
      "existing_posts": [
        "Foot Massage 101"
      ],
      "gap": 2
    }
  ],
  "internal_linking_gap": {
    "status": "High gaps identified",
    "recommendation": "Blueprint specifies 3-5 internal links per hub page; current average: 1.2",
    "priority_links": [
      "Neck hub → Foot hub (shared body region cluster)",
      "Device Type pages → Hub pages",
      "Blog posts → Related term pages"
    ]
  },
  "actionable_recommendations": [
    "IMMEDIATE: Assign Device Type to 10 untagged products",
    "WEEK 1: Create 2 missing hub pages",
    "WEEK 2: Create 23 term landing pages via script",
    "WEEK 3: Bulk create 16 blog posts (outline + AI generation)",
    "WEEK 4: Implement internal linking strategy"
  ]
}

2.5 Product Auto-Tagging Service

File: sag/services/auto_tagger_service.py
Class: ProductAutoTagger
Method: generate_tag_suggestions(products: List[Product], attributes: List[DiscoveredAttribute], blueprint: SAGBlueprint) -> List[TagSuggestion]

Purpose: Generate batch product-to-attribute assignments based on product titles/descriptions.

Algorithm:

  1. For each product:

    • Extract key terms from title and description
    • Match against attribute values (fuzzy matching allowed)
    • Score confidence for each attribute assignment
    • Rank by confidence
  2. For each attribute:

    • Verify assignment makes semantic sense
    • Check for conflicting assignments (e.g., can't be both "Shiatsu" and "EMS")
    • Return ranked list
  3. Group by product for review UI
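The matching and conflict steps above can be sketched with the standard library's difflib. This is a minimal illustration under several assumptions: values are single words, an exact token match scores 1.0, fuzzy matches need a similarity of at least 0.8, and the caller says which attributes are mutually exclusive. The function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Any, Dict, FrozenSet, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_confidence(value: str, tokens: List[str]) -> float:
    """1.0 for an exact token match, else the best fuzzy ratio."""
    target = value.lower()
    if target in tokens:
        return 1.0
    best = max((SequenceMatcher(None, target, t).ratio() for t in tokens), default=0.0)
    return round(best, 2)


def suggest_tags(
    product: Dict[str, Any],
    attributes: Dict[str, List[str]],
    exclusive: FrozenSet[str] = frozenset(),
    min_confidence: float = 0.8,
) -> Dict[str, Any]:
    """Propose attribute values for one product and flag conflicts."""
    tokens = tokenize(f"{product['title']} {product['description']}")
    proposed: List[Dict[str, Any]] = []
    conflicts: List[Dict[str, Any]] = []
    for attribute, values in attributes.items():
        matches = []
        for value in values:
            confidence = match_confidence(value, tokens)
            if confidence >= min_confidence:
                matches.append(
                    {"attribute": attribute, "value": value, "confidence": confidence}
                )
        matches.sort(key=lambda m: m["confidence"], reverse=True)
        if attribute in exclusive and len(matches) > 1:
            # Several values matched an attribute that allows only one.
            conflicts.append(
                {"attribute": attribute, "values": [m["value"] for m in matches]}
            )
            matches = matches[:1]
        proposed.extend(matches)
    return {
        "product_id": product["id"],
        "proposed_tags": proposed,
        "conflicts": conflicts,
    }
```

Matching on whole tokens rather than substrings matters here: a brand name such as "Nekteck" resembles "Neck" closely enough to pass a loose fuzzy threshold, which is why this sketch uses 0.8 rather than the 0.6 default applied elsewhere.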

Output Structure:

{
  "batch_id": "uuid",
  "site_id": "uuid",
  "blueprint_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "total_products": 50,
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": "woo_123",
      "product_title": "Nekteck Foot Massager with Heat",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        },
        {
          "attribute": "Device Type",
          "value": "Shiatsu",
          "confidence": 0.82,
          "reasoning": "Description mentions shiatsu nodes"
        },
        {
          "attribute": "Heat Setting",
          "value": "Heated",
          "confidence": 0.95,
          "reasoning": "Title explicitly states 'with Heat'"
        }
      ],
      "status": "pending_review"
    }
  ],
  "summary": {
    "high_confidence_suggestions": 72,
    "medium_confidence_suggestions": 12,
    "low_confidence_suggestions": 3,
    "conflicts_detected": 0,
    "ready_to_apply": true
  }
}

3. APIs & Endpoints

3.1 Backend API Endpoints

All endpoints are authenticated via the Authorization: Bearer {IGNY8_API_TOKEN} header.

POST /api/v1/sag/sites/{site_id}/analyze/

Purpose: Trigger comprehensive site analysis (async).

Request:

{
  "include_draft_products": false,
  "product_limit": 500,
  "sector_template_id": "optional_uuid",
  "webhook_url": "optional_https_url_for_completion_notification"
}

Response: 202 Accepted

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "queued",
  "estimated_duration_seconds": 120,
  "check_status_url": "/api/v1/sag/sites/{site_id}/analysis-status/?task_id={task_id}"
}

Error Responses:

  • 400: Invalid parameters
  • 401: Unauthorized
  • 404: Site not found
  • 429: Rate limited (max 1 analysis per 30 minutes per site)
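The 429 rule (one analysis per site per 30 minutes) can be sketched as an in-memory limiter. A real deployment would presumably keep this state in a shared store so that every API worker sees it; the class and method names are assumptions.

```python
import math
import time
from typing import Callable, Dict, Tuple

ANALYSIS_COOLDOWN_SECONDS = 30 * 60


class AnalysisRateLimiter:
    """Tracks the last accepted analysis start per site."""

    def __init__(
        self,
        cooldown_seconds: float = ANALYSIS_COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_started: Dict[str, float] = {}

    def check(self, site_id: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) and record accepted starts."""
        now = self._clock()
        last = self._last_started.get(site_id)
        if last is not None:
            remaining = self._cooldown - (now - last)
            if remaining > 0:
                return False, math.ceil(remaining)
        self._last_started[site_id] = now
        return True, 0
```

When check returns False, the endpoint would respond with 429; the second value is suitable for a Retry-After header.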

GET /api/v1/sag/sites/{site_id}/analysis-status/

Purpose: Check analysis progress.

Query Parameters:

  • task_id (required): Celery task ID from analysis trigger

Response: 200 OK

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "processing",
  "progress_percent": 45,
  "current_step": "Analyzing product attributes",
  "elapsed_seconds": 32,
  "estimated_remaining_seconds": 48
}

Status Values:

  • queued — waiting to start
  • processing — actively analyzing
  • complete — analysis finished
  • failed — analysis error (see error message)
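A client can poll this endpoint until it reports a terminal status. In the sketch below, fetch_status is a hypothetical callable that performs the GET and returns the decoded JSON body; the two-second interval and five-minute timeout are assumed defaults.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = ("complete", "failed")


def wait_for_analysis(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the task is complete or failed, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError("analysis did not finish before the timeout")
        sleep(poll_interval)
```

If the trigger request supplied a webhook_url, polling is only needed as a fallback.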

GET /api/v1/sag/sites/{site_id}/analysis-results/

Purpose: Retrieve completed analysis results.

Response: 200 OK

{
  "analysis_id": "uuid",
  "site_id": "site_uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "site_data_summary": {
    "total_products": 50,
    "total_categories": 8,
    "total_pages": 12,
    "total_posts": 8
  },
  "extracted_attributes": {
    "analysis_confidence": 0.82,
    "attributes_count": 8,
    "attributes": [
      { "name": "Target Area", "dimension": "Primary", "confidence": 0.95, ... }
    ]
  },
  "gap_analysis": {
    "overall_gap_percentage": 62,
    "summary": { ... }
  },
  "status": "ready_for_review"
}

Status Values:

  • ready_for_review: user should review before confirming
  • confirmed: user has accepted analysis
  • archived: superseded by a newer analysis

POST /api/v1/sag/sites/{site_id}/confirm-analysis/

Purpose: User confirms analysis; creates SAG blueprint.

Request:

{
  "analysis_id": "uuid",
  "approved_attributes": [
    {
      "name": "Target Area",
      "approved_values": ["Neck", "Back", "Foot"],
      "exclude_values": []
    }
  ],
  "confirmed_by_user_id": "user_uuid"
}

Response: 201 Created

{
  "blueprint_id": "uuid",
  "site_id": "site_uuid",
  "analysis_id": "uuid",
  "status": "created",
  "attributes_count": 8,
  "attribute_values_count": 45,
  "created_at": "2026-03-23T14:32:00Z",
  "next_steps": [
    "Review auto-tagging suggestions",
    "Approve product tags",
    "Start content pipeline (01E)"
  ]
}
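One way the approval logic might turn the request into blueprint attributes is sketched below. The spec does not define how unknown names or values are handled, so this sketch assumes that an attribute absent from the analysis is an error and that approved values not found in the analysis are dropped. The function name is hypothetical.

```python
from typing import Any, Dict, List


def build_blueprint_attributes(
    discovered: List[Dict[str, Any]],
    approved_attributes: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Keep approved values that the analysis found and the user did not exclude."""
    known = {
        attribute["name"]: {value["value"] for value in attribute["values"]}
        for attribute in discovered
    }
    attributes = []
    for item in approved_attributes:
        name = item["name"]
        if name not in known:
            raise ValueError(f"Attribute not in analysis: {name}")
        excluded = set(item.get("exclude_values", []))
        values = [
            value for value in item["approved_values"]
            if value in known[name] and value not in excluded
        ]
        attributes.append({"name": name, "values": values})
    return {
        "attributes": attributes,
        "attributes_count": len(attributes),
        "attribute_values_count": sum(len(a["values"]) for a in attributes),
    }
```

The two counts correspond to attributes_count and attribute_values_count in the 201 response.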

GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/

Purpose: Retrieve product auto-tagging suggestions.

Query Parameters:

  • blueprint_id (required): ID of confirmed blueprint
  • confidence_min (optional): Filter by minimum confidence (0.0-1.0, default 0.6)
  • limit (optional): Max suggestions per product (default 5)

Response: 200 OK

{
  "batch_id": "uuid",
  "blueprint_id": "blueprint_uuid",
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": "woo_123",
      "product_title": "Nekteck Foot Massager",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        }
      ]
    }
  ]
}

POST /api/v1/sag/sites/{site_id}/auto-tag/apply/

Purpose: Apply approved product tags to site (async bulk operation).

Request:

{
  "blueprint_id": "uuid",
  "approved_suggestions": [
    {
      "product_id": "woo_123",
      "approved_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot"
        }
      ]
    }
  ],
  "skip_existing_values": true
}

Response: 202 Accepted

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "blueprint_id": "blueprint_uuid",
  "status": "processing",
  "products_to_tag": 47,
  "tags_to_apply": 87,
  "check_status_url": "/api/v1/sag/sites/{site_id}/auto-tag/status/?task_id={task_id}"
}
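The effect of skip_existing_values on a single product can be sketched as follows. The spec does not spell out the flag's semantics; this sketch assumes that when it is true, an attribute that already has any value is left untouched, and when it is false, the approved values replace the existing ones. apply_tags is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def apply_tags(
    current: Dict[str, List[str]],
    approved_tags: List[Dict[str, str]],
    skip_existing_values: bool = True,
) -> Tuple[Dict[str, List[str]], int]:
    """Return (updated attributes, number of tag values applied)."""
    # Group approved values by attribute, dropping duplicates.
    grouped: Dict[str, List[str]] = {}
    for tag in approved_tags:
        values = grouped.setdefault(tag["attribute"], [])
        if tag["value"] not in values:
            values.append(tag["value"])

    # Work on a copy so the caller's data is never mutated.
    updated = {name: list(values) for name, values in current.items()}
    applied = 0
    for attribute, values in grouped.items():
        if skip_existing_values and updated.get(attribute):
            continue  # product already has a value for this attribute
        updated[attribute] = values
        applied += len(values)
    return updated, applied
```

Returning a copy keeps the operation safe to retry if the bulk task fails partway through.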

GET /api/v1/sag/sites/{site_id}/auto-tag/status/

Purpose: Check auto-tagging progress.

Query Parameters:

  • task_id (required): Celery task ID

Response: 200 OK

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "processing",
  "progress_percent": 62,
  "products_tagged": 29,
  "total_products": 47,
  "tags_applied": 54,
  "estimated_remaining_seconds": 30
}

3.2 WordPress Plugin Endpoint

GET /wp-json/igny8/v1/sag/site-analysis

Purpose: Collect comprehensive site data for analysis.

Headers:

  • Authorization: Bearer {IGNY8_API_TOKEN}
  • X-IGNY8-Request-ID: {uuid} (optional, for request tracking)

Query Parameters:

  • limit_products: int (1-1000, default 500)
  • include_drafts: boolean (default false)
  • cache_ttl: int (seconds, default 3600)
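Parameter validation for this endpoint might follow the pattern below, shown in Python for illustration although the plugin would implement it in PHP. The accepted boolean spellings and the non-negative lower bound on cache_ttl are assumptions; the other ranges and defaults come from the list above.

```python
from typing import Dict, Mapping, Union

DEFAULTS: Dict[str, Union[int, bool]] = {
    "limit_products": 500,
    "include_drafts": False,
    "cache_ttl": 3600,
}
_TRUE = {"true", "1"}    # assumed spellings for boolean query values
_FALSE = {"false", "0"}


def parse_site_analysis_query(params: Mapping[str, str]) -> Dict[str, Union[int, bool]]:
    """Validate raw query strings; raise ValueError (a 400) on bad input."""
    parsed = dict(DEFAULTS)
    errors = []

    if "limit_products" in params:
        try:
            limit = int(params["limit_products"])
        except ValueError:
            errors.append("limit_products must be an integer")
        else:
            if 1 <= limit <= 1000:
                parsed["limit_products"] = limit
            else:
                errors.append("limit_products must be between 1 and 1000")

    if "include_drafts" in params:
        raw = params["include_drafts"].strip().lower()
        if raw in _TRUE:
            parsed["include_drafts"] = True
        elif raw in _FALSE:
            parsed["include_drafts"] = False
        else:
            errors.append("include_drafts must be true or false")

    if "cache_ttl" in params:
        try:
            ttl = int(params["cache_ttl"])
        except ValueError:
            errors.append("cache_ttl must be an integer")
        else:
            if ttl >= 0:
                parsed["cache_ttl"] = ttl
            else:
                errors.append("cache_ttl must not be negative")

    if errors:
        raise ValueError("; ".join(errors))
    return parsed
```

Collecting every problem before raising lets the 400 response list all invalid parameters at once.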

Response: 200 OK

{
  "metadata": {
    "site_id": "uuid",
    "domain": "example-store.com",
    "wordpress_version": "6.4.2",
    "woocommerce_version": "8.5.0",
    "total_products": 50,
    "total_categories": 8,
    "total_pages": 12,
    "total_posts": 8,
    "active_plugins": ["woocommerce", "yoast-seo", ...],
    "theme": "storefront"
  },
  "products": [
    {
      "id": "woo_123",
      "title": "Nekteck Foot Massager with Heat",
      "description": "Premium foot massage device...",
      "sku": "NEKTECK-FM-001",
      "price": 79.99,
      "categories": ["Foot Massagers", "Massage Devices"],
      "tags": ["heated", "cordless"],
      "custom_attributes": {
        "brand": ["Nekteck"],
        "color": ["Black"],
        "warranty": ["2 Year"]
      },
      "image_urls": ["image1.jpg", "image2.jpg"]
    }
  ],
  "categories": [
    {
      "id": "cat_1",
      "name": "Foot Massagers",
      "slug": "foot-massagers",
      "parent_id": null,
      "description": "Electronic foot massage devices",
      "product_count": 12
    }
  ],
  "taxonomies": [
    {
      "name": "brand",
      "label": "Brand",
      "is_hierarchical": false,
      "terms": [
        {
          "id": "brand_1",
          "name": "Nekteck",
          "slug": "nekteck",
          "parent_id": null,
          "description": "",
          "count": 5
        }
      ]
    }
  ],
  "pages": [
    {
      "id": "page_1",
      "title": "Shop",
      "url": "/shop",
      "content_summary": "Browse our selection of massage devices",
      "page_type": "shop"
    }
  ],
  "posts": [
    {
      "id": "post_1",
      "title": "Benefits of Foot Massage",
      "url": "/blog/foot-massage-benefits",
      "content_summary": "Learn why foot massage is beneficial...",
      "categories": ["Health"],
      "tags": ["foot", "massage"],
      "publish_date": "2026-03-15"
    }
  ],
  "menus": [
    {
      "id": "menu_1",
      "title": "Main Menu",
      "items": [
        {
          "id": "item_1",
          "title": "Shop",
          "url": "/shop",
          "target": "_self",
          "parent_id": null
        }
      ]
    }
  ],
  "collected_at": "2026-03-23T14:30:00Z"
}

Error Responses:

  • 400: Invalid query parameters
  • 401: Invalid or missing API token
  • 500: Plugin error (logged on WordPress side)

Performance:

  • Response time target: <5 seconds for sites with <500 products
  • Data is cached for 1 hour (configurable via cache_ttl)
  • Uses WordPress transients API for caching
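The caching behaviour can be sketched as a read-through cache with a time-to-live. In the plugin, WordPress's get_transient and set_transient would play the role of the TTLCache class below; the class, the function get_site_analysis, and the cache key are illustrative assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal in-memory stand-in for the transients API."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as a cache miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def get_site_analysis(
    cache: TTLCache,
    collect: Callable[[], Dict[str, Any]],
    cache_ttl: int = 3600,
    key: str = "igny8_sag_site_analysis",  # hypothetical cache key
) -> Dict[str, Any]:
    """Return the cached payload, or collect and cache a fresh one."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    payload = collect()
    cache.set(key, payload, cache_ttl)
    return payload
```

Because limit_products and include_drafts change the payload, a real cache key would likely need to include them.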

4. Implementation Steps

Phase 1: Plugin Enhancement (Week 1)

Tasks:

  1. Create collector classes in plugins/igny8-sync/includes/collectors/

    • ProductCollector
    • CategoryCollector
    • TaxonomyCollector
    • AttributeCollector
    • PageCollector
    • PostCollector
    • MenuCollector
    • PluginCollector
  2. Implement DataCollectorInterface

    • collect() method (fetches raw data)
    • sanitize() method (removes PII, normalizes format)
    • Error handling per collector
  3. Add /wp-json/igny8/v1/sag/site-analysis endpoint

    • Route definition
    • Parameter validation
    • Response formatting
    • Caching logic
  4. Add unit tests for collectors

    • Mock data tests
    • Error condition tests
    • Performance tests

Acceptance Criteria:

  • Endpoint returns valid JSON payload matching schema
  • All 8 collectors implemented and tested
  • Response time <5 seconds for 500 products
  • Caching works correctly
  • Error handling tested

Phase 2: AI Attribute Extraction (Week 1-2)

Tasks:

  1. Implement attribute_extraction.py

    • Text analysis functions
    • Pattern recognition logic
    • Confidence scoring
    • Validation against sector templates
  2. Register with LLM framework

    • Implement extract_site_attributes function
    • Add input/output validation
    • Error handling (retry logic)
  3. Create data models

    • DiscoveredAttribute
    • AttributeValue
    • TemplateValidation
    • AttributeExtractionResult
  4. Add unit and integration tests

    • Mock LLM responses
    • Test with real site data
    • Confidence scoring validation
    • Performance tests (2-5 minute runtime)

Acceptance Criteria:

  • Extracts 5-20 attributes from sample site data
  • Confidence scores accurate and meaningful
  • Sector template validation works
  • Low-confidence discoveries flagged
  • Results auditable (model used, reasoning provided)

Phase 3: Gap Analysis Service (Week 2)

Tasks:

  1. Implement gap_analysis_service.py

    • GapAnalysisService class
    • analyze_gap() method
    • All 7 gap dimensions analyzed
  2. Create gap analysis models

    • GapAnalysisReport
    • Recommendation structures
    • Detail sections
  3. Integrate with blueprint comparison

    • Query SAG blueprint
    • Compare against site data
    • Calculate gap percentages
  4. Add unit tests

    • Test each gap dimension
    • Test recommendation generation
    • Test report structure

Acceptance Criteria:

  • All 7 gap dimensions analyzed
  • Report clearly identifies missing elements
  • Actionable recommendations provided
  • Report generated in <1 second

Phase 4: API Endpoints (Week 2-3)

Tasks:

  1. Implement analysis trigger endpoint

    • POST /api/v1/sag/sites/{site_id}/analyze/
    • Celery task queueing
    • Webhook support
  2. Implement status check endpoint

    • GET /api/v1/sag/sites/{site_id}/analysis-status/
    • Real-time progress updates
  3. Implement results retrieval endpoint

    • GET /api/v1/sag/sites/{site_id}/analysis-results/
    • Caching of results
  4. Implement blueprint confirmation endpoint

    • POST /api/v1/sag/sites/{site_id}/confirm-analysis/
    • Attribute approval logic
    • Blueprint creation
  5. Add request/response validation

    • Marshmallow schemas
    • Error responses
  6. Add authentication/authorization checks

    • API token validation
    • User site ownership verification

Acceptance Criteria:

  • All 4 endpoints implemented
  • Endpoints return correct status codes
  • Validation working
  • Authentication required and checked
  • Error responses follow standard format

Phase 5: Product Auto-Tagging (Week 3)

Tasks:

  1. Implement auto_tagger_service.py

    • ProductAutoTagger class
    • generate_tag_suggestions() method
    • Confidence scoring
  2. Create auto-tagging endpoints

    • GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/
    • POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
    • GET /api/v1/sag/sites/{site_id}/auto-tag/status/
  3. Implement Celery task for bulk tagging

    • Batch product processing
    • Conflict detection
    • Error handling
  4. Add unit tests

    • Test suggestion generation
    • Test bulk tagging
    • Test conflict detection

Acceptance Criteria:

  • Suggestions endpoint returns valid suggestions
  • Confidence scores reasonable (0.6+)
  • Bulk tagging applies tags correctly to products
  • Progress tracking works
  • 47+ products can be tagged in <2 minutes

Phase 6: Frontend Components (Week 3-4)

Tasks:

  1. Implement SiteAnalysisPanel

    • Trigger analysis button
    • Progress indicator
    • Error messaging
  2. Implement DiscoveredAttributesReview

    • Display discovered attributes
    • Show confidence scores
    • Allow approval/rejection per attribute
    • Show example products
  3. Implement GapAnalysisReport

    • Visual representation of gaps
    • Actionable recommendations
    • Priority ordering
  4. Implement AutoTagReviewPanel

    • Display product suggestions
    • Batch selection/deselection
    • Apply tags button
    • Progress tracking
  5. Add styling and UX polish

    • Responsive design
    • Loading states
    • Error states
    • Success confirmations

Acceptance Criteria:

  • All 4 components implemented
  • Responsive on desktop/tablet
  • Accessible (WCAG 2.1 AA)
  • User can complete workflow without errors
  • Loading/error states clearly communicated

Phase 7: Integration & Testing (Week 4)

Tasks:

  1. End-to-end testing

    • Connect real WordPress site
    • Run full analysis workflow
    • Confirm blueprint created
    • Verify auto-tagging works
  2. Performance testing

    • Benchmark analysis with various site sizes
    • Optimize slow operations
    • Load testing on API endpoints
  3. Documentation

    • API documentation (OpenAPI/Swagger)
    • Plugin setup guide
    • User guide for Case 1 workflow
    • Developer setup guide
  4. Bug fixing and refinement

    • Fix integration issues
    • Refine UI/UX based on testing
    • Improve error messages

Acceptance Criteria:

  • End-to-end workflow works without errors
  • Performance meets targets (analysis <5 min for 500 products)
  • Documentation complete
  • All bugs fixed
  • Ready for beta testing

5. Acceptance Criteria

5.1 Functional Requirements

Site Data Collection:

  • Plugin collects all 8 data types (products, categories, taxonomies, pages, posts, menus, attributes, metadata)
  • Data is valid JSON matching defined schema
  • All product titles/descriptions included
  • Custom attribute values extracted correctly
  • Menu hierarchy preserved

Attribute Extraction:

  • AI identifies 5-20 attributes from site data
  • Confidence scores meaningful and accurate
  • Low-confidence discoveries flagged
  • Sector template validation working
  • Results include frequency counts and example products
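
Flagging low-confidence discoveries can be sketched as a simple partition over the extraction results. The `DiscoveredAttribute` fields and the function name below are assumptions for illustration, and the 0.6 threshold is borrowed from the auto-tagging criteria in this document rather than confirmed for attribute extraction.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DiscoveredAttribute:
    """Hypothetical shape of one extraction result."""
    name: str
    confidence: float  # expected range: 0.0-1.0
    frequency: int = 0
    example_products: List[str] = field(default_factory=list)


def split_by_confidence(
    attributes: List[DiscoveredAttribute],
    threshold: float = 0.6,  # assumed cut-off
) -> Tuple[List[DiscoveredAttribute], List[DiscoveredAttribute]]:
    """Return (accepted, flagged_for_review), preserving input order."""
    accepted: List[DiscoveredAttribute] = []
    flagged: List[DiscoveredAttribute] = []
    for attr in attributes:
        if not 0.0 <= attr.confidence <= 1.0:
            raise ValueError(f"confidence out of range for {attr.name!r}")
        if attr.confidence >= threshold:
            accepted.append(attr)
        else:
            flagged.append(attr)
    return accepted, flagged
```

Flagged items would then surface in the review panel for manual approval rather than being dropped.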

Gap Analysis:

  • All 7 gap dimensions analyzed
  • Missing hubs, term pages, blog posts clearly identified
  • Product attribute coverage calculated
  • Internal linking gaps identified
  • Actionable recommendations provided
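
As a worked illustration of the product attribute coverage metric, the sketch below counts, for each attribute, the share of products that carry a non-empty value. The function name and the `product_id -> {attribute: value}` input shape are assumptions, and the spec may define coverage differently (for example, weighted by attribute priority).

```python
from typing import Dict, List


def attribute_coverage(
    products: Dict[str, Dict[str, str]],
    attributes: List[str],
) -> Dict[str, float]:
    """Return, per attribute, the fraction of products with a non-empty value."""
    total = len(products)
    coverage: Dict[str, float] = {}
    for attribute in attributes:
        tagged = sum(1 for values in products.values() if values.get(attribute))
        # Avoid division by zero for sites with no products.
        coverage[attribute] = tagged / total if total else 0.0
    return coverage
```

For a catalog of four products where three have a "Target Area" value, the coverage for that attribute is 3 / 4 = 0.75.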

Blueprint Creation:

  • Confirmed analysis creates valid SAGBlueprint
  • Attributes and values recorded correctly
  • Gap analysis linked to blueprint
  • Blueprint feeds into cluster formation (01C)

Product Auto-Tagging:

  • Suggestions generated for 90%+ of products
  • Suggestion confidence scores are 0.6 or higher
  • Bulk tagging applies tags correctly
  • No data loss or corruption
  • Existing tags not overwritten (configurable)

API Endpoints:

  • All 4 analysis endpoints implemented
  • All 3 auto-tagging endpoints implemented
  • Correct HTTP status codes
  • Valid error responses
  • Authentication required

Frontend Components:

  • SiteAnalysisPanel triggers analysis and shows progress
  • DiscoveredAttributesReview allows attribute approval
  • GapAnalysisReport displays gaps clearly
  • AutoTagReviewPanel allows batch product tagging
  • All components responsive and accessible

5.2 Non-Functional Requirements

Performance:

  • Site analysis completes in <5 minutes for typical sites (50-500 products)
  • WordPress plugin endpoint responds in <5 seconds
  • API endpoints respond in <2 seconds
  • Frontend components load in <3 seconds

Reliability:

  • Plugin handles errors gracefully (missing products, etc.)
  • Partial failures return partial data with warnings
  • Celery tasks have retry logic
  • Webhook notifications reliable
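
The "partial data with warnings" behavior can be sketched as a loop that isolates each collector's failure. The plugin itself runs in WordPress, so this Python version only illustrates the response shape; the function name, the `collectors` mapping, and the `data` / `warnings` / `partial` keys are assumptions, not the plugin's actual schema.

```python
from typing import Callable, Dict, List


def collect_site_data(
    collectors: Dict[str, Callable[[], object]],
) -> Dict[str, object]:
    """Run every collector; a failure in one does not abort the others."""
    data: Dict[str, object] = {}
    warnings: List[str] = []
    for name, collector in collectors.items():
        try:
            data[name] = collector()
        except Exception as exc:  # deliberately broad: isolate any collector error
            warnings.append(f"{name}: {exc}")
    return {"data": data, "warnings": warnings, "partial": bool(warnings)}
```

A caller can check the `partial` flag to decide whether to proceed with analysis or ask the user to fix the failing collector first.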

Security:

  • API token authentication required
  • User can only access own sites
  • No PII in logs
  • HTTPS enforced
  • Input validation on all endpoints

Scalability:

  • Plugin handles 1000+ products
  • API handles 100+ concurrent analysis requests
  • Database indexes optimized for queries
  • Caching prevents redundant processing

Data Quality:

  • Analysis results auditable (model used, timestamps, reasoning)
  • No duplicate attribute suggestions
  • Confidence scores calibrated
  • Low-confidence results flagged for review
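
One way to satisfy the "no duplicate attribute suggestions" requirement is to normalize attribute names and keep the highest-confidence entry per name. The sketch below assumes suggestions arrive as `(name, confidence)` pairs and that duplicates differ only by case or whitespace; both are illustrative assumptions, and semantic duplicates (synonyms) would need additional handling.

```python
from typing import Dict, List, Tuple


def dedupe_suggestions(
    suggestions: List[Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Collapse names that match after lowercasing and whitespace cleanup.

    For each normalized name, the entry with the highest confidence wins.
    Output order follows the first appearance of each normalized name.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for name, confidence in suggestions:
        key = " ".join(name.lower().split())
        if key not in best or confidence > best[key][1]:
            best[key] = (name, confidence)
    return list(best.values())
```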

5.3 User Experience Requirements

Clarity:

  • User understands analysis process and time required
  • Gap analysis clearly shows what's missing
  • Recommendations are actionable
  • Error messages explain what went wrong

Simplicity:

  • Workflow is at most 5 steps (analyze → review → confirm → auto-tag → apply)
  • One button to trigger analysis
  • Clear next steps after each stage

Feedback:

  • Real-time progress updates during analysis
  • Success/error notifications
  • Ability to view raw analysis results
  • Audit trail of approvals

6. Claude Code Instructions

6.1 Skill Development

Skill Name: igny8-case1-analysis Version: 2.0 Prerequisites: IGNY8 platform deployed, WordPress plugin v2.0+, Celery configured

Skill Workflow:

Trigger: User connects existing WordPress site to IGNY8

Step 1: Collect Site Data
  - Call: POST /api/v1/sag/sites/{site_id}/analyze/
  - Output: task_id for tracking
  - Wait: Poll /api/v1/sag/sites/{site_id}/analysis-status/ every 10 seconds until the task finishes
  - Timeout: 5 minutes

Step 2: Retrieve Analysis Results
  - Call: GET /api/v1/sag/sites/{site_id}/analysis-results/
  - Parse: extracted_attributes, gap_analysis
  - Display: DiscoveredAttributesReview panel
  - User action: Approve/reject attributes

Step 3: Confirm Analysis
  - Call: POST /api/v1/sag/sites/{site_id}/confirm-analysis/
  - Payload: approved_attributes from user review
  - Output: blueprint_id
  - Display: Gap analysis report
  - Next: Show auto-tagging recommendations

Step 4: Generate Auto-Tag Suggestions
  - Call: GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/?blueprint_id={blueprint_id}
  - Display: AutoTagReviewPanel
  - User action: Select products to tag

Step 5: Apply Auto-Tags
  - Call: POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
  - Wait: Poll /api/v1/sag/sites/{site_id}/auto-tag/status/ every 5 seconds
  - Timeout: 10 minutes
  - Output: Number of tags applied, products tagged

Step 6: Complete & Next Steps
  - Display: Success message
  - Recommendations: Run cluster formation (01C), start content pipeline (01E)
  - Links: View blueprint, view gap report, start cluster creation
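
The poll-with-timeout pattern used in Steps 1 and 5 can be sketched as a small helper. This is a hedged sketch: the `state` field and its `"completed"` / `"failed"` values are assumptions about the status payload, and `fetch_status` stands in for whatever HTTP call retrieves the status endpoint. The sleep and clock functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, object]],
    interval_s: float,
    timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Call `fetch_status` every `interval_s` seconds until it reports a
    terminal state, or raise TimeoutError once `timeout_s` would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") in ("completed", "failed"):  # assumed terminal states
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"no terminal state within {timeout_s} seconds")
        sleep(interval_s)
```

Under these assumptions, Step 1 would call the helper with `interval_s=10, timeout_s=300` and Step 5 with `interval_s=5, timeout_s=600`.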

6.2 Development Checklist

Code Quality:

  • All functions have docstrings
  • Type hints on all function parameters and returns
  • Logging at DEBUG, INFO, WARNING levels as appropriate
  • Error handling with specific exception types
  • No hardcoded values (use config/env vars)

Testing:

  • Unit tests for each service (>80% coverage)
  • Integration tests for API endpoints
  • Fixtures for sample site data
  • Mock LLM responses for deterministic tests
  • Performance tests for analysis (time and memory)

Documentation:

  • Docstrings follow Google style
  • README with setup instructions
  • API documentation in OpenAPI format
  • Example requests/responses for each endpoint
  • Troubleshooting guide for common errors

Security:

  • API token validation on all endpoints
  • User ownership checks before accessing site data
  • Input validation with Marshmallow
  • SQL injection prevention (use ORM)
  • No credentials in logs or errors

Performance:

  • Database queries indexed
  • Caching implemented for plugin endpoint
  • Celery task optimization
  • LLM API call batching
  • Frontend component lazy loading

6.3 Debugging & Troubleshooting

Common Issues:

Issue: Analysis hangs or times out

  • Check: Celery worker status (celery -A sag inspect active)
  • Check: Redis/message queue status
  • Check: LLM API rate limits
  • Solution: Reduce product limit, retry analysis

Issue: Plugin endpoint returns partial data

  • Check: Specific collector failure (check logs)
  • Solution: Fix the failing collector, then re-run the analysis (the re-run bypasses the cache)
  • Note: Partial data is returned if one collector fails

Issue: Auto-tagging misses products

  • Check: Product title/description quality (missing keywords)
  • Check: Confidence threshold (lower if needed)
  • Solution: Review low-confidence suggestions, adjust threshold

Issue: Gap analysis shows 100% gaps

  • Check: Blueprint created correctly
  • Check: Gap analysis query (verify site_id matches)
  • Solution: Re-run analysis, confirm blueprint

6.4 Integration Checkpoints

Integration with 01A (SAGBlueprint):

  • Confirmed analysis creates SAGBlueprint via POST /api/v1/sag/sites/{site_id}/confirm-analysis/
  • Blueprint includes extracted attributes and values
  • Blueprint links to analysis for audit trail
  • Blueprint ready for cluster formation (01C)

Integration with 01B (Sector Templates):

  • Attribute extraction uses sector template for validation (optional parameter)
  • Alignment scores show how closely discovered attributes match template
  • Low-confidence discoveries flagged if they don't align with template
  • Template selection based on site category detection

Integration with 01C (Cluster Formation):

  • Blueprint created from Case 1 analysis feeds into cluster formation
  • Attributes and values used to create cluster hierarchies
  • Cluster formation references blueprint_id for traceability
  • Generated clusters can be overridden manually if needed

Integration with 01E (Content Pipeline):

  • Blueprint creation triggers content pipeline pre-planning
  • Gap analysis informs content prioritization
  • Hub page templates created for missing clusters
  • Blog post outlines generated for content gaps

Integration with 01G (Health Monitoring):

  • Analysis metrics stored for health dashboard
  • Gap analysis metrics tracked over time
  • Product attribute coverage tracked
  • Auto-tagging success rate monitored

7. Related Documents

  • 01A: SAGBlueprint Definition — Output of Case 1 analysis
  • 01B: Sector Templates — Used for attribute validation
  • 01C: Cluster Formation — Consumes SAGBlueprint from Case 1
  • 01D: Case 2 Wizard — Alternative path for new sites
  • 01E: Content Pipeline — Consumes the blueprint and gap analysis from Case 1
  • 01G: Health Monitoring — Tracks analysis and enrichment metrics

8. Glossary

  • SAG: Semantic Attribute Grid — the structured product attribute framework
  • Attribute: A dimension of product information (e.g., "Target Area," "Device Type")
  • Attribute Value: A specific instance of an attribute (e.g., "Foot" for Target Area)
  • Cluster: A group of related attribute values forming a content hub
  • Gap: Missing element compared to SAG blueprint (hub pages, term pages, blog posts, etc.)
  • Confidence Score: AI's confidence in discovered attribute (0.0-1.0)
  • Dimension: Priority level of attribute (Primary, Secondary, Tertiary)
  • Term Landing Page: A single page optimized for a specific attribute value
  • Hub Page: Authority page for entire attribute cluster
  • Auto-Tagging: Bulk assignment of attributes to products

Document Status: Ready for Development Last Review: 2026-03-23 Next Review: Post-Phase 2 Development