IGNY8 VPS (Salman) 128b186865 temproary docs uplaoded

2026-03-23 09:02:49 +00:00

01F: IGNY8 Phase 1 — Existing Site Analysis (Case 1)

Document Type: Build Specification
Phase: Phase 1: Existing Site Analysis
Use Case: Case 1 (Users with existing sites)
Status: Active Development
Last Updated: 2026-03-23

1. Current State

1.1 Existing IGNY8 WordPress Plugin

The IGNY8 WordPress plugin is currently operational with the following capabilities:

Current Data Collection:

Post status tracking
Site metadata (domain, WordPress version, plugin count, theme)
Keyword mapping and analysis
Site structure analysis
Taxonomy sync across registered taxonomies
7 active cron jobs managing periodic data updates

Current Plugin Endpoint:

GET /wp-json/igny8/v1/health — basic health check
Plugin location: WordPress plugins directory
Sync frequency: Configurable via cron (daily default)

Limitations:

Does not collect detailed product data (WooCommerce stores)
Does not analyze product descriptions for attribute patterns
No collection of custom attribute assignments
No menu structure analysis
No blog content summary extraction
No confidence scoring for discovered patterns
Manual attribute creation required post-analysis

1.2 Case 1 User Journey

Trigger: User logs into IGNY8 platform with existing WordPress site (WooCommerce-based)

Current Flow:

User connects WordPress site via API key
Plugin syncs basic site data
User manually creates SAG blueprint
User manually defines attributes
User manually tags existing products

Desired Flow:

User connects WordPress site via API key
Plugin collects comprehensive site data (products, categories, content)
AI automatically extracts attributes from product titles/descriptions
System generates SAG blueprint with discovered attributes
System performs gap analysis (what's missing vs. SAG template)
User reviews and confirms blueprint
System auto-tags existing products
Blueprint feeds into content pipeline (01E) and cluster formation (01C)

1.3 Dependencies & Prerequisites

WordPress 5.8+ with WooCommerce 5.0+
IGNY8 plugin v2.0+ installed and activated
OpenAI API or compatible LLM for attribute extraction
Celery for async task processing (analysis may take 2-5 minutes)
Database schema supports site analysis metadata storage
Sector templates (01B) available for validation

2. What to Build

2.1 Enhanced Plugin: Site Data Collection

Objective: Extend WordPress plugin to collect comprehensive site data for SAG analysis.

New Plugin Endpoint:

GET /wp-json/igny8/v1/sag/site-analysis
Headers: Authorization: Bearer {IGNY8_API_TOKEN}
Query Parameters:
  - limit_products: 500 (max products to analyze; default 500)
  - include_drafts: false (include draft products; default false)
  - cache_ttl: 3600 (cache results for N seconds; default 3600)

Response: 200 OK with payload (see section 2.3)

Data Collection Modules:

Module	Responsibility	Data Returned
ProductCollector	Extract all products with metadata	titles, descriptions, prices, categories, tags, images, custom attributes, sku
CategoryCollector	Map product category hierarchy	names, slugs, parent-child hierarchy, descriptions, product counts
TaxonomyCollector	Enumerate all custom taxonomies	taxonomy names, all registered terms, term hierarchies, term metadata
AttributeCollector	Extract WooCommerce attributes	attribute names, attribute types (select/text/color), all values, product assignments
PageCollector	Identify key pages	titles, URLs, content summaries (first 500 chars), page type detection
PostCollector	Extract blog posts	titles, URLs, content summaries, categories, tags, publish date
MenuCollector	Analyze navigation structure	menu items, hierarchy, target URLs/categories
PluginCollector	Document site technical stack	active plugins, theme, WordPress version, WooCommerce version

Implementation:

Location: plugins/igny8-sync/includes/collectors/
Each collector implements DataCollectorInterface with collect() and sanitize() methods
Data sanitization: Remove PII, HTML tags, limit text length
Error handling: Log failures per collector, return partial data if one collector fails
Performance: Optimize queries to avoid site slowdown (use transients, batch operations)
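The "return partial data if one collector fails" rule can be illustrated with a short sketch. The plugin itself is PHP, so this Python version only models the control flow; run_collectors, the collector names, and the failing collector are hypothetical and not part of the plugin.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("igny8.collectors")  # hypothetical logger name


def run_collectors(
    collectors: Dict[str, Callable[[], Any]],
) -> Tuple[Dict[str, Any], List[str]]:
    """Run every collector independently and keep whatever succeeded."""
    data: Dict[str, Any] = {}
    failed: List[str] = []
    for name, collect in collectors.items():
        try:
            data[name] = collect()
        except Exception:
            # Log the failure for this collector only, then continue with the rest.
            logger.exception("Collector %s failed", name)
            failed.append(name)
    return data, failed


def _broken_menu_collector() -> Any:
    # Stand-in for a collector that hits an error on the WordPress side.
    raise RuntimeError("menu lookup failed")


data, failed = run_collectors({
    "products": lambda: [{"id": "woo_123"}],
    "menus": _broken_menu_collector,
})
```

Here data still holds the products while failed lists "menus", so the endpoint could return the partial payload together with a warning.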

Plugin Cron Job Addition:

New job: igny8_sync_sag_site_analysis (optional, runs if user triggers analysis)
Frequency: On-demand via API call, not scheduled
Timeout: 60 seconds (analysis itself happens server-side via Celery)

2.2 AI Attribute Extraction Service

File: sag/ai_functions/attribute_extraction.py
Register Key: extract_site_attributes
Input Type: SiteAnalysisPayload
Output Type: AttributeExtractionResult

Function Signature:

from typing import Optional

def extract_site_attributes(
    site_data: SiteAnalysisPayload,
    sector_template: Optional[SectorTemplate] = None,
    confidence_threshold: float = 0.6,
    max_attributes: int = 20
) -> AttributeExtractionResult:
    """
    Analyze site data to discover attributes.

    Args:
        site_data: Raw site data from WordPress plugin
        sector_template: Optional sector template for validation
        confidence_threshold: Min confidence to include attribute (0.0-1.0)
        max_attributes: Max attributes to return

    Returns:
        AttributeExtractionResult with discovered attributes, frequencies, confidence scores
    """

Algorithm:

Text Analysis Phase
- Concatenate product titles and descriptions
- Apply tokenization and noun phrase extraction
- Identify recurring modifiers and descriptors
- Extract from category names and tags
- Extract from custom attribute values (if any exist)
Pattern Recognition Phase
- Group similar terms (e.g., "back pain" + "back relief" + "lower back" → "back/spine")
- Calculate frequency across product dataset
- Identify dimensional axes (e.g., "target area," "device type")
- Score statistical significance
Validation Phase
- Cross-reference against sector template (if provided)
- Validate against common attribute taxonomies
- Flag conflicting or ambiguous discoveries
- Assign confidence scores based on:
  - Frequency (how often appears)
  - Consistency (appears across multiple products)
  - Specificity (not too vague)
  - Template alignment (matches known attributes)
Ranking Phase
- Rank by frequency and confidence
- Assign dimensionality (Primary/Secondary/Tertiary)
- Cap results at max_attributes
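The validation and ranking phases above can be sketched as follows. This is a minimal illustration, not the specified scoring model: the Candidate fields, the WEIGHTS values, and the function names are assumptions, and the spec does not define how the four signals are combined.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    frequency: int             # number of products mentioning the attribute
    consistency: float         # 0.0-1.0, spread across multiple products
    specificity: float         # 0.0-1.0, higher means less vague
    template_alignment: float  # 0.0-1.0, match against the sector template


# Hypothetical weights; the spec only names the four signals.
WEIGHTS = {
    "frequency": 0.4,
    "consistency": 0.2,
    "specificity": 0.2,
    "template_alignment": 0.2,
}


def confidence(c: Candidate, total_products: int) -> float:
    """Weighted sum of the four signals, with frequency normalized to 0.0-1.0."""
    freq_share = min(c.frequency / total_products, 1.0) if total_products else 0.0
    return (
        WEIGHTS["frequency"] * freq_share
        + WEIGHTS["consistency"] * c.consistency
        + WEIGHTS["specificity"] * c.specificity
        + WEIGHTS["template_alignment"] * c.template_alignment
    )


def rank_attributes(
    candidates: List[Candidate],
    total_products: int,
    confidence_threshold: float = 0.6,
    max_attributes: int = 20,
) -> List[Tuple[str, float]]:
    """Drop low-confidence candidates, sort by confidence then frequency, cap the list."""
    scored = [(confidence(c, total_products), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= confidence_threshold]
    kept.sort(key=lambda pair: (pair[0], pair[1].frequency), reverse=True)
    return [(c.name, s) for s, c in kept[:max_attributes]]
```

Candidates that fall below confidence_threshold would go into low_confidence_discoveries rather than being discarded; that step is left out here for brevity.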

Output Structure:

{
  "analysis_id": "uuid",
  "site_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "analysis_confidence": 0.82,
  "attributes": [
    {
      "name": "Target Area",
      "dimension": "Primary",
      "confidence": 0.95,
      "frequency": 32,
      "discovered_from": ["product_titles", "product_descriptions", "categories"],
      "values": [
        {
          "value": "Neck",
          "frequency": 12,
          "example_products": ["Product A", "Product B"]
        },
        {
          "value": "Back",
          "frequency": 8,
          "example_products": ["Product C"]
        },
        {
          "value": "Foot",
          "frequency": 25,
          "example_products": ["Product D", "Product E"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "body_region",
        "alignment_score": 0.98
      }
    },
    {
      "name": "Device Type",
      "dimension": "Primary",
      "confidence": 0.88,
      "frequency": 28,
      "discovered_from": ["product_titles", "product_descriptions"],
      "values": [
        {
          "value": "Shiatsu",
          "frequency": 18,
          "example_products": ["Product F"]
        },
        {
          "value": "EMS",
          "frequency": 7,
          "example_products": ["Product G"]
        },
        {
          "value": "Percussion",
          "frequency": 3,
          "example_products": ["Product H"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "therapy_type",
        "alignment_score": 0.91
      }
    },
    {
      "name": "Heat Setting",
      "dimension": "Secondary",
      "confidence": 0.72,
      "frequency": 15,
      "discovered_from": ["product_descriptions"],
      "values": [
        {
          "value": "Heated",
          "frequency": 15,
          "example_products": ["Product I", "Product J"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "heat_enabled",
        "alignment_score": 0.85
      }
    }
  ],
  "low_confidence_discoveries": [
    {
      "name": "Brand",
      "confidence": 0.55,
      "reason": "High variability, many single-mention values"
    }
  ],
  "analysis_notes": {
    "total_products_analyzed": 50,
    "total_categories": 8,
    "total_tags": 23,
    "extraction_method": "llm_analysis",
    "model_used": "gpt-4-turbo"
  }
}

Error Handling:

Insufficient data: Log warning, return empty attributes list
LLM API failure: Retry with exponential backoff (3 retries)
Timeout (>5 minutes): Abort and return partial results
Invalid sector template: Log error, continue analysis without validation
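The LLM retry rule above can be sketched as a small helper. It reads "3 retries" as one initial attempt plus up to three more; call_with_backoff, base_delay, and the injectable sleep parameter are assumptions for illustration, and production code would catch the LLM client's specific error type rather than Exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`; on failure wait base_delay * 2**attempt and try again."""
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= retries:
                # Retries exhausted: surface the last error to the caller.
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s with the defaults
            attempt += 1
```

Passing a no-op sleep makes the helper easy to unit test without real delays.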

Performance Considerations:

Cache sector templates in memory
Batch LLM calls (process 5-10 products per API call)
Store extraction results in database for audit trail
Return results within 2-5 minutes for typical sites
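Batching products for LLM calls could look like the sketch below. The helper name batched and the default batch size of 10 (the upper end of the 5-10 range above) are assumptions.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 50 products in batches of 10 means 5 LLM calls instead of 50.
product_ids = [f"woo_{i}" for i in range(50)]
batches = list(batched(product_ids, batch_size=10))
```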

2.3 Data Models

SiteAnalysisPayload

from dataclasses import dataclass
from typing import List, Dict, Optional

@dataclass
class Product:
    id: str
    title: str
    description: str
    sku: str
    price: float
    categories: List[str]
    tags: List[str]
    custom_attributes: Dict[str, List[str]]
    image_urls: List[str]

@dataclass
class Category:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    product_count: int

@dataclass
class Taxonomy:
    name: str
    label: str
    is_hierarchical: bool
    terms: List['Term']

@dataclass
class Term:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    count: int

@dataclass
class Page:
    id: str
    title: str
    url: str
    content_summary: str
    page_type: str  # e.g., "shop", "landing", "faq"

@dataclass
class Post:
    id: str
    title: str
    url: str
    content_summary: str
    categories: List[str]
    tags: List[str]
    publish_date: str

@dataclass
class MenuItem:
    id: str
    title: str
    url: str
    target: str
    parent_id: Optional[str]

@dataclass
class Menu:
    id: str
    title: str
    items: List[MenuItem]  # matches the nested "menus" entries in the plugin response (section 3.2)

@dataclass
class SiteMetadata:
    site_id: str
    domain: str
    wordpress_version: str
    woocommerce_version: str
    total_products: int
    total_categories: int
    total_pages: int
    total_posts: int
    active_plugins: List[str]
    theme: str

@dataclass
class SiteAnalysisPayload:
    metadata: SiteMetadata
    products: List[Product]
    categories: List[Category]
    taxonomies: List[Taxonomy]
    pages: List[Page]
    posts: List[Post]
    menus: List[Menu]
    collected_at: str  # ISO 8601 timestamp

AttributeExtractionResult

@dataclass
class AttributeValue:
    value: str
    frequency: int
    example_products: List[str]

@dataclass
class TemplateValidation:
    matched_sector: str
    matched_attribute: str
    alignment_score: float

@dataclass
class DiscoveredAttribute:
    name: str
    dimension: str  # "Primary", "Secondary", "Tertiary"
    confidence: float  # 0.0-1.0
    frequency: int
    discovered_from: List[str]  # ["product_titles", "product_descriptions", ...]
    values: List[AttributeValue]
    template_validation: Optional[TemplateValidation]

@dataclass
class LowConfidenceDiscovery:
    name: str
    confidence: float
    reason: str

@dataclass
class AnalysisNotes:
    total_products_analyzed: int
    total_categories: int
    total_tags: int
    extraction_method: str
    model_used: str

@dataclass
class AttributeExtractionResult:
    analysis_id: str
    site_id: str
    timestamp: str
    analysis_confidence: float
    attributes: List[DiscoveredAttribute]
    low_confidence_discoveries: List[LowConfidenceDiscovery]
    analysis_notes: AnalysisNotes

2.4 Gap Analysis Service

File: sag/services/gap_analysis_service.py
Class: GapAnalysisService
Method: analyze_gap(site_data: SiteAnalysisPayload, blueprint: SAGBlueprint) -> GapAnalysisReport

Purpose: Compare existing site structure against SAG blueprint to identify gaps.

Analysis Dimensions:

Attribute Coverage Gap
- SAG blueprint specifies X attributes
- Site currently has Y custom attributes assigned to products
- Gap: Missing attributes or low coverage (% of products with attribute values)
Hub Page Gap
- Blueprint specifies Z cluster hubs
- Site analysis identifies M existing pages
- Gap: Missing hub pages (authority pages for attribute clusters)
Term Landing Page Gap
- Blueprint specifies N attribute values requiring term landing pages
- Site has existing category/tag pages
- Gap: Missing term landing pages (one per attribute value)
Blog Content Gap
- Blueprint specifies recommended blog posts per cluster
- Site has P existing blog posts
- Gap: Blog content aligned to clusters and keyword targets
Internal Linking Gap
- Blueprint specifies internal linking strategy
- Site has current internal link structure
- Gap: Missing cross-cluster and term-to-hub links
Product Enrichment Gap
- Products lacking attribute assignments
- Products missing description optimization
- Products missing images
Technical SEO Gap
- Missing schema markup for products
- Category pages lacking optimization
- Menu structure not optimized for crawlability
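Two of the calculations behind these dimensions are simple enough to sketch: a count gap (blueprint minus current) and per-attribute product coverage. The function names are hypothetical, products are reduced to their custom_attributes dictionaries, and the spec does not define how overall_gap_percentage is derived, so that figure is not computed here.

```python
from typing import Dict, List


def count_gap(current: int, blueprint: int) -> int:
    """Items the blueprint calls for that the site does not have yet."""
    return max(blueprint - current, 0)


def attribute_coverage(
    products: List[Dict[str, List[str]]], attribute: str
) -> str:
    """Share of products with at least one value for the attribute."""
    total = len(products)
    covered = sum(1 for attrs in products if attrs.get(attribute))
    percent = round(100 * covered / total) if total else 0
    return f"{percent}% ({covered}/{total})"


# 40 products carry a Device Type value, 10 do not.
sample = [{"Device Type": ["Shiatsu"]}] * 40 + [{}] * 10
device_type_coverage = attribute_coverage(sample, "Device Type")
```

The coverage string uses the same "80% (40/50)" format as the coverage_current field in the report.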

Output Structure:

{
  "analysis_id": "uuid",
  "site_id": "uuid",
  "blueprint_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "summary": {
    "products_current": 50,
    "products_gap": 0,
    "attributes_current": 3,
    "attributes_blueprint": 8,
    "attributes_gap": 5,
    "hub_pages_current": 2,
    "hub_pages_blueprint": 4,
    "hub_pages_gap": 2,
    "term_pages_current": 12,
    "term_pages_blueprint": 35,
    "term_pages_gap": 23,
    "blog_posts_current": 8,
    "blog_posts_blueprint": 24,
    "blog_posts_gap": 16,
    "overall_gap_percentage": 62
  },
  "attributes_gap_detail": [
    {
      "attribute": "Target Area",
      "coverage_current": "100% (50/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "None — attribute well-covered"
    },
    {
      "attribute": "Device Type",
      "coverage_current": "80% (40/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "10 products missing Device Type assignment"
    }
  ],
  "hub_pages_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "status": "EXISTS",
      "url": "/shop/foot-massagers",
      "optimization_notes": "Good; consider adding testimonials section"
    },
    {
      "cluster": "Neck & Shoulder Relief",
      "status": "MISSING",
      "recommendation": "Create hub page at /neck-shoulder-relief"
    }
  ],
  "term_pages_gap_detail": [
    {
      "attribute": "Target Area",
      "term": "Neck",
      "status": "MISSING",
      "recommendation": "Create term page at /target-area/neck (products filter + blog links)"
    }
  ],
  "blog_posts_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "recommended_posts": [
        "Best Foot Massagers for Neuropathy",
        "How to Use Shiatsu Foot Massagers",
        "Foot Massage Benefits"
      ],
      "existing_posts": [
        "Foot Massage 101"
      ],
      "gap": 2
    }
  ],
  "internal_linking_gap": {
    "status": "High gaps identified",
    "recommendation": "Blueprint specifies 3-5 internal links per hub page; current average: 1.2",
    "priority_links": [
      "Neck hub → Foot hub (shared body region cluster)",
      "Device Type pages → Hub pages",
      "Blog posts → Related term pages"
    ]
  },
  "actionable_recommendations": [
    "IMMEDIATE: Assign Device Type to 10 untagged products",
    "WEEK 1: Create 2 missing hub pages",
    "WEEK 2: Create 23 term landing pages via script",
    "WEEK 3: Bulk create 16 blog posts (outline + AI generation)",
    "WEEK 4: Implement internal linking strategy"
  ]
}

2.5 Product Auto-Tagging Service

File: sag/services/auto_tagger_service.py
Class: ProductAutoTagger
Method: generate_tag_suggestions(products: List[Product], attributes: List[DiscoveredAttribute], blueprint: SAGBlueprint) -> List[TagSuggestion]

Purpose: Generate batch product-to-attribute assignments based on product titles/descriptions.

Algorithm:

For each product:
- Extract key terms from title and description
- Match against attribute values (fuzzy matching allowed)
- Score confidence for each attribute assignment
- Rank by confidence
For each attribute:
- Verify assignment makes semantic sense
- Check for conflicting assignments (e.g., can't be both "Shiatsu" and "EMS")
- Return ranked list
Group by product for review UI

Output Structure:

{
  "batch_id": "uuid",
  "site_id": "uuid",
  "blueprint_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "total_products": 50,
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": "woo_123",
      "product_title": "Nekteck Foot Massager with Heat",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        },
        {
          "attribute": "Device Type",
          "value": "Shiatsu",
          "confidence": 0.82,
          "reasoning": "Description mentions shiatsu nodes"
        },
        {
          "attribute": "Heat Setting",
          "value": "Heated",
          "confidence": 0.95,
          "reasoning": "Title explicitly states 'with Heat'"
        }
      ],
      "status": "pending_review"
    }
  ],
  "summary": {
    "high_confidence_suggestions": 72,
    "medium_confidence_suggestions": 12,
    "low_confidence_suggestions": 3,
    "conflicts_detected": 0,
    "ready_to_apply": true
  }
}

3. APIs & Endpoints

3.1 Backend API Endpoints

All endpoints are authenticated via Authorization: Bearer {IGNY8_API_TOKEN} header.

POST /api/v1/sag/sites/{site_id}/analyze/

Purpose: Trigger comprehensive site analysis (async).

Request:

{
  "include_draft_products": false,
  "product_limit": 500,
  "sector_template_id": "optional_uuid",
  "webhook_url": "optional_https_url_for_completion_notification"
}

Response: 202 Accepted

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "queued",
  "estimated_duration_seconds": 120,
  "check_status_url": "/api/v1/sag/sites/{site_id}/analysis-status/?task_id={task_id}"
}

Error Responses:

400: Invalid parameters
401: Unauthorized
404: Site not found
429: Rate limited (max 1 analysis per 30 minutes per site)
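The 429 rule can be sketched as a per-site cooldown check. The in-memory dictionary and the function name are assumptions for illustration; a multi-worker deployment would need a shared store instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

ANALYSIS_COOLDOWN = timedelta(minutes=30)

# Hypothetical in-process record of the last accepted analysis per site.
_last_analysis: Dict[str, datetime] = {}


def try_start_analysis(site_id: str, now: Optional[datetime] = None) -> int:
    """Return the HTTP status to send: 202 if accepted, 429 if rate limited."""
    now = now or datetime.now(timezone.utc)
    last = _last_analysis.get(site_id)
    if last is not None and now - last < ANALYSIS_COOLDOWN:
        return 429
    _last_analysis[site_id] = now
    return 202
```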

GET /api/v1/sag/sites/{site_id}/analysis-status/

Purpose: Check analysis progress.

Query Parameters:

task_id (required): Celery task ID from analysis trigger

Response: 200 OK

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "processing",
  "progress_percent": 45,
  "current_step": "Analyzing product attributes",
  "elapsed_seconds": 32,
  "estimated_remaining_seconds": 48
}

Status Values:

queued — waiting to start
processing — actively analyzing
complete — analysis finished
failed — analysis error (see error message)
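A client could poll this endpoint until one of the terminal status values is reached. The sketch below takes the HTTP call as an injected function so it stays independent of any particular client library; wait_for_analysis, the 5-second interval, and the 300-second timeout are assumptions.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_analysis(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the task is complete or failed, or raise after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError("analysis did not finish in time")
        sleep(poll_interval)
```

In practice fetch_status would wrap a GET request to the analysis-status URL returned by the trigger endpoint.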

GET /api/v1/sag/sites/{site_id}/analysis-results/

Purpose: Retrieve completed analysis results.

Response: 200 OK

{
  "analysis_id": "uuid",
  "site_id": "site_uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "site_data_summary": {
    "total_products": 50,
    "total_categories": 8,
    "total_pages": 12,
    "total_posts": 8
  },
  "extracted_attributes": {
    "analysis_confidence": 0.82,
    "attributes_count": 8,
    "attributes": [
      { "name": "Target Area", "dimension": "Primary", "confidence": 0.95, ... }
    ]
  },
  "gap_analysis": {
    "overall_gap_percentage": 62,
    "summary": { ... }
  },
  "status": "ready_for_review"
}

Status Values:

ready_for_review — user should review before confirming
confirmed — user has accepted analysis
archived — superseded by newer analysis

POST /api/v1/sag/sites/{site_id}/confirm-analysis/

Purpose: User confirms analysis; creates SAG blueprint.

Request:

{
  "analysis_id": "uuid",
  "approved_attributes": [
    {
      "name": "Target Area",
      "approved_values": ["Neck", "Back", "Foot"],
      "exclude_values": []
    }
  ],
  "confirmed_by_user_id": "user_uuid"
}

Response: 201 Created

{
  "blueprint_id": "uuid",
  "site_id": "site_uuid",
  "analysis_id": "uuid",
  "status": "created",
  "attributes_count": 8,
  "attribute_values_count": 45,
  "created_at": "2026-03-23T14:32:00Z",
  "next_steps": [
    "Review auto-tagging suggestions",
    "Approve product tags",
    "Start content pipeline (01E)"
  ]
}

GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/

Purpose: Retrieve product auto-tagging suggestions.

Query Parameters:

blueprint_id (required): ID of confirmed blueprint
confidence_min (optional): Filter by minimum confidence (0.0-1.0, default 0.6)
limit (optional): Max suggestions per product (default 5)
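The effect of confidence_min and limit on a single product's proposed tags can be sketched as follows; the function name is hypothetical and the sort order (highest confidence first) is an assumption.

```python
from typing import Dict, List


def filter_proposed_tags(
    proposed_tags: List[Dict],
    confidence_min: float = 0.6,
    limit: int = 5,
) -> List[Dict]:
    """Keep tags at or above confidence_min, highest first, at most `limit` of them."""
    if not 0.0 <= confidence_min <= 1.0:
        raise ValueError("confidence_min must be between 0.0 and 1.0")
    kept = [t for t in proposed_tags if t["confidence"] >= confidence_min]
    kept.sort(key=lambda t: t["confidence"], reverse=True)
    return kept[:limit]


proposed = [
    {"attribute": "Device Type", "value": "Shiatsu", "confidence": 0.82},
    {"attribute": "Target Area", "value": "Foot", "confidence": 0.98},
    {"attribute": "Brand", "value": "Nekteck", "confidence": 0.55},
]
visible = filter_proposed_tags(proposed)
```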

Response: 200 OK

{
  "batch_id": "uuid",
  "blueprint_id": "blueprint_uuid",
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": "woo_123",
      "product_title": "Nekteck Foot Massager",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        }
      ]
    }
  ]
}

POST /api/v1/sag/sites/{site_id}/auto-tag/apply/

Purpose: Apply approved product tags to site (async bulk operation).

Request:

{
  "blueprint_id": "uuid",
  "approved_suggestions": [
    {
      "product_id": "woo_123",
      "approved_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot"
        }
      ]
    }
  ],
  "skip_existing_values": true
}
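One plausible reading of skip_existing_values is that an approved tag is not written when the product already has a value for that attribute. The sketch below follows that reading; the spec does not define the flag's exact semantics, and tags_to_write is a hypothetical helper.

```python
from typing import Dict, List


def tags_to_write(
    existing: Dict[str, List[str]],
    approved_tags: List[Dict[str, str]],
    skip_existing_values: bool = True,
) -> List[Dict[str, str]]:
    """Drop approved tags whose attribute already has a value on the product."""
    result = []
    for tag in approved_tags:
        if skip_existing_values and existing.get(tag["attribute"]):
            continue  # leave the product's current value untouched
        result.append(tag)
    return result


existing_attributes = {"Target Area": ["Foot"], "Device Type": []}
approved = [
    {"attribute": "Target Area", "value": "Neck"},
    {"attribute": "Device Type", "value": "Shiatsu"},
]
pending = tags_to_write(existing_attributes, approved)
```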

Response: 202 Accepted

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "blueprint_id": "blueprint_uuid",
  "status": "processing",
  "products_to_tag": 47,
  "tags_to_apply": 87,
  "check_status_url": "/api/v1/sag/sites/{site_id}/auto-tag/status/?task_id={task_id}"
}

GET /api/v1/sag/sites/{site_id}/auto-tag/status/

Purpose: Check auto-tagging progress.

Query Parameters:

task_id (required): Celery task ID

Response: 200 OK

{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "processing",
  "progress_percent": 62,
  "products_tagged": 29,
  "total_products": 47,
  "tags_applied": 54,
  "estimated_remaining_seconds": 30
}

3.2 WordPress Plugin Endpoint

GET /wp-json/igny8/v1/sag/site-analysis

Purpose: Collect comprehensive site data for analysis.

Headers:

Authorization: Bearer {IGNY8_API_TOKEN}
X-IGNY8-Request-ID: {uuid} (optional, for request tracking)

Query Parameters:

limit_products: int (1-1000, default 500)
include_drafts: boolean (default false)
cache_ttl: int (seconds, default 3600)

Response: 200 OK

{
  "metadata": {
    "site_id": "uuid",
    "domain": "example-store.com",
    "wordpress_version": "6.4.2",
    "woocommerce_version": "8.5.0",
    "total_products": 50,
    "total_categories": 8,
    "total_pages": 12,
    "total_posts": 8,
    "active_plugins": ["woocommerce", "yoast-seo", ...],
    "theme": "storefront"
  },
  "products": [
    {
      "id": "woo_123",
      "title": "Nekteck Foot Massager with Heat",
      "description": "Premium foot massage device...",
      "sku": "NEKTECK-FM-001",
      "price": 79.99,
      "categories": ["Foot Massagers", "Massage Devices"],
      "tags": ["heated", "cordless"],
      "custom_attributes": {
        "brand": ["Nekteck"],
        "color": ["Black"],
        "warranty": ["2 Year"]
      },
      "image_urls": ["image1.jpg", "image2.jpg"]
    }
  ],
  "categories": [
    {
      "id": "cat_1",
      "name": "Foot Massagers",
      "slug": "foot-massagers",
      "parent_id": null,
      "description": "Electronic foot massage devices",
      "product_count": 12
    }
  ],
  "taxonomies": [
    {
      "name": "brand",
      "label": "Brand",
      "is_hierarchical": false,
      "terms": [
        {
          "id": "brand_1",
          "name": "Nekteck",
          "slug": "nekteck",
          "parent_id": null,
          "description": "",
          "count": 5
        }
      ]
    }
  ],
  "pages": [
    {
      "id": "page_1",
      "title": "Shop",
      "url": "/shop",
      "content_summary": "Browse our selection of massage devices",
      "page_type": "shop"
    }
  ],
  "posts": [
    {
      "id": "post_1",
      "title": "Benefits of Foot Massage",
      "url": "/blog/foot-massage-benefits",
      "content_summary": "Learn why foot massage is beneficial...",
      "categories": ["Health"],
      "tags": ["foot", "massage"],
      "publish_date": "2026-03-15"
    }
  ],
  "menus": [
    {
      "id": "menu_1",
      "title": "Main Menu",
      "items": [
        {
          "id": "item_1",
          "title": "Shop",
          "url": "/shop",
          "target": "_self",
          "parent_id": null
        }
      ]
    }
  ],
  "collected_at": "2026-03-23T14:30:00Z"
}

Error Responses:

400: Invalid query parameters
401: Invalid or missing API token
500: Plugin error (logged on WordPress side)

Performance:

Response time target: <5 seconds for sites with <500 products
Data is cached for 1 hour (configurable via cache_ttl)
Uses WordPress transients API for caching

4. Implementation Steps

Phase 1: Plugin Enhancement (Week 1)

Tasks:

Create collector classes in plugins/igny8-sync/includes/collectors/
- ProductCollector
- CategoryCollector
- TaxonomyCollector
- AttributeCollector
- PageCollector
- PostCollector
- MenuCollector
- PluginCollector
Implement DataCollectorInterface
- collect() method (fetches raw data)
- sanitize() method (removes PII, normalizes format)
- Error handling per collector
Add /wp-json/igny8/v1/sag/site-analysis endpoint
- Route definition
- Parameter validation
- Response formatting
- Caching logic
Add unit tests for collectors
- Mock data tests
- Error condition tests
- Performance tests

Acceptance Criteria:

Endpoint returns valid JSON payload matching schema
All 8 collectors implemented and tested
Response time <5 seconds for 500 products
Caching works correctly
Error handling tested

Phase 2: AI Attribute Extraction (Week 1-2)

Tasks:

Implement attribute_extraction.py
- Text analysis functions
- Pattern recognition logic
- Confidence scoring
- Validation against sector templates
Register with LLM framework
- Implement extract_site_attributes function
- Add input/output validation
- Error handling (retry logic)
Create data models
- DiscoveredAttribute
- AttributeValue
- TemplateValidation
- AttributeExtractionResult
Add unit and integration tests
- Mock LLM responses
- Test with real site data
- Confidence scoring validation
- Performance tests (2-5 minute runtime)

Acceptance Criteria:

Extracts 5-20 attributes from sample site data
Confidence scores accurate and meaningful
Sector template validation works
Low-confidence discoveries flagged
Results auditable (model used, reasoning provided)

Phase 3: Gap Analysis Service (Week 2)

Tasks:

Implement gap_analysis_service.py
- GapAnalysisService class
- analyze_gap() method
- All 7 gap dimensions analyzed
Create gap analysis models
- GapAnalysisReport
- Recommendation structures
- Detail sections
Integrate with blueprint comparison
- Query SAG blueprint
- Compare against site data
- Calculate gap percentages
Add unit tests
- Test each gap dimension
- Test recommendation generation
- Test report structure

Acceptance Criteria:

All 7 gap dimensions analyzed
Report clearly identifies missing elements
Actionable recommendations provided
Report generated in <1 second

Phase 4: API Endpoints (Week 2-3)

Tasks:

Implement analysis trigger endpoint
- POST /api/v1/sag/sites/{site_id}/analyze/
- Celery task queueing
- Webhook support
Implement status check endpoint
- GET /api/v1/sag/sites/{site_id}/analysis-status/
- Real-time progress updates
Implement results retrieval endpoint
- GET /api/v1/sag/sites/{site_id}/analysis-results/
- Caching of results
Implement blueprint confirmation endpoint
- POST /api/v1/sag/sites/{site_id}/confirm-analysis/
- Attribute approval logic
- Blueprint creation
Add request/response validation
- Marshmallow schemas
- Error responses
Add authentication/authorization checks
- API token validation
- User site ownership verification

Acceptance Criteria:

All 4 endpoints implemented
Endpoints return correct status codes
Validation working
Authentication required and checked
Error responses follow standard format

Phase 5: Product Auto-Tagging (Week 3)

Tasks:

Implement auto_tagger_service.py
- ProductAutoTagger class
- generate_tag_suggestions() method
- Confidence scoring
Create auto-tagging endpoints
- GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/
- POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
- GET /api/v1/sag/sites/{site_id}/auto-tag/status/
Implement Celery task for bulk tagging
- Batch product processing
- Conflict detection
- Error handling
Add unit tests
- Test suggestion generation
- Test bulk tagging
- Test conflict detection

Acceptance Criteria:

Suggestions endpoint returns valid suggestions
Confidence scores reasonable (0.6+)
Bulk tagging applies tags correctly to products
Progress tracking works
47+ products can be tagged in <2 minutes

Phase 6: Frontend Components (Week 3-4)

Tasks:

Implement SiteAnalysisPanel
- Trigger analysis button
- Progress indicator
- Error messaging
Implement DiscoveredAttributesReview
- Display discovered attributes
- Show confidence scores
- Allow approval/rejection per attribute
- Show example products
Implement GapAnalysisReport
- Visual representation of gaps
- Actionable recommendations
- Priority ordering
Implement AutoTagReviewPanel
- Display product suggestions
- Batch selection/deselection
- Apply tags button
- Progress tracking
Add styling and UX polish
- Responsive design
- Loading states
- Error states
- Success confirmations

Acceptance Criteria:

All 4 components implemented
Responsive on desktop/tablet
Accessible (WCAG 2.1 AA)
User can complete workflow without errors
Loading/error states clearly communicated

Phase 7: Integration & Testing (Week 4)

Tasks:

End-to-end testing
- Connect real WordPress site
- Run full analysis workflow
- Confirm blueprint created
- Verify auto-tagging works
Performance testing
- Benchmark analysis with various site sizes
- Optimize slow operations
- Load testing on API endpoints
Documentation
- API documentation (OpenAPI/Swagger)
- Plugin setup guide
- User guide for Case 1 workflow
- Developer setup guide
Bug fixing and refinement
- Fix integration issues
- Refine UI/UX based on testing
- Improve error messages

Acceptance Criteria:

End-to-end workflow works without errors
Performance meets targets (analysis <5 min for 500 products)
Documentation complete
All bugs fixed
Ready for beta testing

5. Acceptance Criteria

5.1 Functional Requirements

Site Data Collection:

Plugin collects all 8 data types (products, categories, taxonomies, pages, posts, menus, attributes, metadata)
Data is valid JSON matching defined schema
All product titles/descriptions included
Custom attribute values extracted correctly
Menu hierarchy preserved

Attribute Extraction:

AI identifies 5-20 attributes from site data
Confidence scores meaningful and accurate
Low-confidence discoveries flagged
Sector template validation working
Results include frequency counts and example products

Gap Analysis:

All 7 gap dimensions analyzed
Missing hubs, term pages, blog posts clearly identified
Product attribute coverage calculated
Internal linking gaps identified
Actionable recommendations provided

Blueprint Creation:

Confirmed analysis creates valid SAGBlueprint
Attributes and values recorded correctly
Gap analysis linked to blueprint
Blueprint feeds into cluster formation (01C)

Product Auto-Tagging:

Suggestions generated for 90%+ of products
Confidence scores reasonable (0.6+)
Bulk tagging applies tags correctly
No data loss or corruption
Existing tags not overwritten (configurable)
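
The "existing tags not overwritten (configurable)" rule, combined with the 0.6 confidence floor, could look roughly like the following. This is a minimal sketch: the suggestion shape (`attribute`, `value`, `confidence`) and the `overwrite_existing` flag name are assumptions made for illustration.

```python
from typing import Dict, List


def merge_suggested_tags(
    existing: Dict[str, str],
    suggestions: List[dict],
    min_confidence: float = 0.6,
    overwrite_existing: bool = False,
) -> Dict[str, str]:
    """Merge AI tag suggestions into a product's current attribute tags.

    Suggestions below min_confidence are skipped. Tags already present on the
    product are kept unless overwrite_existing is True.
    """
    merged = dict(existing)  # copy, so the caller's data is never mutated
    for suggestion in suggestions:
        if suggestion["confidence"] < min_confidence:
            continue
        attr = suggestion["attribute"]
        if attr in merged and not overwrite_existing:
            continue
        merged[attr] = suggestion["value"]
    return merged
```

Working on a copy rather than in place is one simple way to support the "no data loss or corruption" criterion: the original tags stay available until the merged result is explicitly saved.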

API Endpoints:

All 4 analysis endpoints implemented
All 3 auto-tagging endpoints implemented
Correct HTTP status codes
Valid error responses
Authentication required

Frontend Components:

SiteAnalysisPanel triggers analysis and shows progress
DiscoveredAttributesReview allows attribute approval
GapAnalysisReport displays gaps clearly
AutoTagReviewPanel allows batch product tagging
All components responsive and accessible

5.2 Non-Functional Requirements

Performance:

Site analysis completes in <5 minutes for typical sites (50-500 products)
WordPress plugin endpoint responds in <5 seconds
API endpoints respond in <2 seconds
Frontend components load in <3 seconds

Reliability:

Plugin handles errors gracefully (missing products, etc.)
Partial failures return partial data with warnings
Celery tasks have retry logic
Webhook notifications reliable
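
The "partial data with warnings" behaviour can be illustrated with a small collector loop. The sketch is written in Python for consistency with the backend examples; the plugin itself would implement the same pattern in PHP. The collector names and the response keys (`data`, `warnings`, `partial`) are hypothetical.

```python
from typing import Any, Callable, Dict, List


def collect_site_data(collectors: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run each collector independently so one failure does not abort the rest."""
    data: Dict[str, Any] = {}
    warnings: List[str] = []
    for name, collect in collectors.items():
        try:
            data[name] = collect()
        except Exception as exc:  # broad on purpose: any collector error becomes a warning
            warnings.append(f"{name}: {exc}")
    return {"data": data, "warnings": warnings, "partial": bool(warnings)}
```

Returning an explicit `partial` flag lets the caller decide whether to proceed with the analysis or ask the user to fix the failing collector and re-run.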

Security:

API token authentication required
User can only access own sites
No PII in logs
HTTPS enforced
Input validation on all endpoints

Scalability:

Plugin handles 1000+ products
API handles 100+ concurrent analysis requests
Database indexes optimized for queries
Caching prevents redundant processing
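
One possible way to keep caching from triggering redundant processing is to key cached analysis results on a hash of the collected payload, so an unchanged site maps to the same key. The key prefix `sag:analysis` and the function name below are assumptions, not part of the spec.

```python
import hashlib
import json


def payload_cache_key(site_id: int, payload: dict) -> str:
    """Build a cache key that changes only when the site payload changes."""
    # sort_keys gives a canonical form, so key order in the payload does not matter.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sag:analysis:{site_id}:{digest}"
```

With this approach a re-run on identical data can return the stored result, while any change in products, menus, or taxonomies produces a new key and a fresh analysis.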

Data Quality:

Analysis results auditable (model used, timestamps, reasoning)
No duplicate attribute suggestions
Confidence scores calibrated
Low-confidence results flagged for review
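
The two checks "no duplicate attribute suggestions" and "low-confidence results flagged for review" can be combined in one pass. The following is a sketch under assumptions: suggestions are dictionaries with `name` and `confidence`, duplicates are detected by case-insensitive name, and the 0.6 review threshold is borrowed from the auto-tagging criterion rather than specified for attributes.

```python
from typing import Dict, List


def dedupe_and_flag(suggestions: List[dict], review_threshold: float = 0.6) -> List[dict]:
    """Keep the highest-confidence suggestion per attribute name and flag weak ones."""
    best: Dict[str, dict] = {}
    for suggestion in suggestions:
        # Normalize so "Target Area" and " target area" count as the same attribute.
        key = suggestion["name"].strip().casefold()
        if key not in best or suggestion["confidence"] > best[key]["confidence"]:
            best[key] = suggestion
    # Return copies with a needs_review marker; the inputs are left untouched.
    return [dict(s, needs_review=s["confidence"] < review_threshold) for s in best.values()]
```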

5.3 User Experience Requirements

Clarity:

User understands analysis process and time required
Gap analysis clearly shows what's missing
Recommendations are actionable
Error messages explain what went wrong

Simplicity:

Workflow is 5 steps (analyze → review → confirm → auto-tag → apply)
One button to trigger analysis
Clear next steps after each stage

Feedback:

Real-time progress updates during analysis
Success/error notifications
Ability to view raw analysis results
Audit trail of approvals

6. Claude Code Instructions

6.1 Skill Development

Skill Name: igny8-case1-analysis
Version: 2.0
Prerequisites: IGNY8 platform deployed, WordPress plugin v2.0+, Celery configured

Skill Workflow:

Trigger: User connects existing WordPress site to IGNY8

Step 1: Collect Site Data
  - Call: POST /api/v1/sag/sites/{site_id}/analyze/
  - Wait: Poll /api/v1/sag/sites/{site_id}/analysis-status/ every 10 seconds
  - Timeout: 5 minutes
  - Output: task_id for tracking

Step 2: Retrieve Analysis Results
  - Call: GET /api/v1/sag/sites/{site_id}/analysis-results/
  - Parse: extracted_attributes, gap_analysis
  - Display: DiscoveredAttributesReview panel
  - User action: Approve/reject attributes

Step 3: Confirm Analysis
  - Call: POST /api/v1/sag/sites/{site_id}/confirm-analysis/
  - Payload: approved_attributes from user review
  - Output: blueprint_id
  - Display: Gap analysis report
  - Next: Show auto-tagging recommendations

Step 4: Generate Auto-Tag Suggestions
  - Call: GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/?blueprint_id={blueprint_id}
  - Display: AutoTagReviewPanel
  - User action: Select products to tag

Step 5: Apply Auto-Tags
  - Call: POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
  - Wait: Poll /api/v1/sag/sites/{site_id}/auto-tag/status/ every 5 seconds
  - Timeout: 10 minutes
  - Output: Number of tags applied, products tagged

Step 6: Complete & Next Steps
  - Display: Success message
  - Recommendations: Run cluster formation (01C), start content pipeline (01E)
  - Links: View blueprint, view gap report, start cluster creation
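
The poll-and-timeout behaviour in Steps 1 and 5 can be sketched as a small helper. The status field name (`status`) and its terminal values (`completed`, `failed`) are assumptions, since the response schema is defined elsewhere; the HTTP call is passed in as a callable so the helper stays independent of any client library.

```python
import time
from typing import Callable, Dict


class AnalysisTimeout(Exception):
    """Raised when no terminal status arrives within the allowed time."""


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval_s: float = 10.0,   # Step 1 polls every 10 seconds
    timeout_s: float = 300.0,   # Step 1 allows 5 minutes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status repeatedly until it reports a terminal state or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        # "completed" and "failed" are assumed terminal values, not confirmed by the spec.
        if status.get("status") in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise AnalysisTimeout(f"no terminal state after {timeout_s} seconds")
        sleep(interval_s)
```

For Step 5 the same helper would be called with `interval_s=5` and `timeout_s=600`. Injecting `sleep` and `clock` keeps the loop testable without real waiting.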

6.2 Development Checklist

Code Quality:

All functions have docstrings
Type hints on all function parameters and returns
Logging at DEBUG, INFO, WARNING levels as appropriate
Error handling with specific exception types
No hardcoded values (use config/env vars)

Testing:

Unit tests for each service (>80% coverage)
Integration tests for API endpoints
Fixtures for sample site data
Mock LLM responses for deterministic tests
Performance tests for analysis (time and memory)
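
"Mock LLM responses for deterministic tests" could be done with the standard library's `unittest.mock`. In the sketch below, `extract_attributes` and the `llm_client.complete` method are hypothetical stand-ins for the real extraction service and LLM client, which are not shown in this section.

```python
import json
import unittest
from unittest.mock import Mock


def extract_attributes(product_titles, llm_client):
    """Hypothetical extraction service: asks the LLM and parses its JSON reply."""
    prompt = "Extract attributes from: " + "; ".join(product_titles)
    raw = llm_client.complete(prompt=prompt)
    return json.loads(raw)["attributes"]


class ExtractAttributesTest(unittest.TestCase):
    def test_parses_canned_llm_response(self):
        # The mock returns a fixed payload, so the test never calls a real model.
        llm = Mock()
        llm.complete.return_value = json.dumps(
            {"attributes": [{"name": "Target Area", "confidence": 0.92}]}
        )
        result = extract_attributes(["Foot Massager Pro"], llm)
        self.assertEqual(result[0]["name"], "Target Area")
        llm.complete.assert_called_once()
```

Because the LLM reply is canned, the assertion on parsed output is stable across runs, which is what makes the coverage target in the unit-test item measurable.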

Documentation:

Docstrings follow Google style
README with setup instructions
API documentation in OpenAPI format
Example requests/responses for each endpoint
Troubleshooting guide for common errors

Security:

API token validation on all endpoints
User ownership checks before accessing site data
Input validation with Marshmallow
SQL injection prevention (use ORM)
No credentials in logs or errors

Performance:

Database queries indexed
Caching implemented for plugin endpoint
Celery task optimization
LLM API call batching
Frontend component lazy loading

6.3 Debugging & Troubleshooting

Common Issues:

Issue: Analysis hangs or times out

Check: Celery worker status (celery -A sag inspect active)
Check: Redis/message queue status
Check: LLM API rate limits
Solution: Reduce product limit, retry analysis

Issue: Plugin endpoint returns partial data

Check: Specific collector failure (check logs)
Solution: Fix collector, re-run analysis (uses cache bypass)
Note: Partial data is returned if one collector fails

Issue: Auto-tagging misses products

Check: Product title/description quality (missing keywords)
Check: Confidence threshold (lower if needed)
Solution: Review low-confidence suggestions, adjust threshold

Issue: Gap analysis shows 100% gaps

Check: Blueprint created correctly
Check: Gap analysis query (verify site_id matches)
Solution: Re-run analysis, confirm blueprint

6.4 Integration Checkpoints

Integration with 01A (SAGBlueprint):

Confirmed analysis creates SAGBlueprint via POST /api/v1/sag/sites/{site_id}/confirm-analysis/
Blueprint includes extracted attributes and values
Blueprint links to analysis for audit trail
Blueprint ready for cluster formation (01C)

Integration with 01B (Sector Templates):

Attribute extraction uses sector template for validation (optional parameter)
Alignment scores show how closely discovered attributes match template
Low-confidence discoveries flagged if they don't align with template
Template selection based on site category detection

Integration with 01C (Cluster Formation):

Blueprint created from Case 1 analysis feeds into cluster formation
Attributes and values used to create cluster hierarchies
Cluster formation references blueprint_id for traceability
Users can override generated clusters if needed

Integration with 01E (Content Pipeline):

Blueprint creation triggers content pipeline pre-planning
Gap analysis informs content prioritization
Hub page templates created for missing clusters
Blog post outlines generated for content gaps

Integration with 01G (Health Monitoring):

Analysis metrics stored for health dashboard
Gap analysis metrics tracked over time
Product attribute coverage tracked
Auto-tagging success rate monitored

7. Related Documents

01A: SAGBlueprint Definition — Output of Case 1 analysis
01B: Sector Templates — Used for attribute validation
01C: Cluster Formation — Consumes SAGBlueprint from Case 1
01D: Case 2 Wizard — Alternative path for new sites
01E: Content Pipeline — Feeds blueprint and gap analysis
01G: Health Monitoring — Tracks analysis and enrichment metrics

8. Glossary

SAG: Semantic Attribute Grid — the structured product attribute framework
Attribute: A dimension of product information (e.g., "Target Area," "Device Type")
Attribute Value: A specific instance of an attribute (e.g., "Foot" for Target Area)
Cluster: A group of related attribute values forming a content hub
Gap: Missing element compared to SAG blueprint (hub pages, term pages, blog posts, etc.)
Confidence Score: AI's confidence in discovered attribute (0.0-1.0)
Dimension: Priority level of attribute (Primary, Secondary, Tertiary)
Term Landing Page: A single page optimized for a specific attribute value
Hub Page: Authority page for entire attribute cluster
Auto-Tagging: Bulk assignment of attributes to products

Document Status: Ready for Development
Last Review: 2026-03-23
Next Review: Post-Phase 2 Development
