01F: IGNY8 Phase 1 — Existing Site Analysis (Case 1)
Document Type: Build Specification
Phase: Phase 1: Existing Site Analysis
Use Case: Case 1 (Users with existing sites)
Status: Active Development
Last Updated: 2026-03-23
1. Current State
1.1 Existing IGNY8 WordPress Plugin
The IGNY8 WordPress plugin is currently operational with the following capabilities:
Current Data Collection:
- Post status tracking
- Site metadata (domain, WordPress version, plugin count, theme)
- Keyword mapping and analysis
- Site structure analysis
- Taxonomy sync across registered taxonomies
- 7 active cron jobs managing periodic data updates
Current Plugin Endpoint:
- GET /wp-json/igny8/v1/health: basic health check
- Plugin location: WordPress plugins directory
- Sync frequency: Configurable via cron (daily default)
Limitations:
- Does not collect detailed product data (WooCommerce stores)
- Does not analyze product descriptions for attribute patterns
- No collection of custom attribute assignments
- No menu structure analysis
- No blog content summary extraction
- No confidence scoring for discovered patterns
- Manual attribute creation required post-analysis
1.2 Case 1 User Journey
Trigger: User logs into IGNY8 platform with existing WordPress site (WooCommerce-based)
Current Flow:
- User connects WordPress site via API key
- Plugin syncs basic site data
- User manually creates SAG blueprint
- User manually defines attributes
- User manually tags existing products
Desired Flow:
- User connects WordPress site via API key
- Plugin collects comprehensive site data (products, categories, content)
- AI automatically extracts attributes from product titles/descriptions
- System generates SAG blueprint with discovered attributes
- System performs gap analysis (what's missing vs. SAG template)
- User reviews and confirms blueprint
- System auto-tags existing products
- Blueprint feeds into content pipeline (01E) and cluster formation (01C)
1.3 Dependencies & Prerequisites
- WordPress 5.8+ with WooCommerce 5.0+
- IGNY8 plugin v2.0+ installed and activated
- OpenAI API or compatible LLM for attribute extraction
- Celery for async task processing (analysis may take 2-5 minutes)
- Database schema supports site analysis metadata storage
- Sector templates (01B) available for validation
2. What to Build
2.1 Enhanced Plugin: Site Data Collection
Objective: Extend WordPress plugin to collect comprehensive site data for SAG analysis.
New Plugin Endpoint:
GET /wp-json/igny8/v1/sag/site-analysis
Headers: Authorization: Bearer {IGNY8_API_TOKEN}
Query Parameters:
- limit_products: 500 (max products to analyze; default 500)
- include_drafts: false (include draft products; default false)
- cache_ttl: 3600 (cache results for N seconds; default 3600)
Response: 200 OK with payload (see section 2.3)
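As an illustration of how a backend caller might address this endpoint, here is a minimal sketch that builds (but does not send) the request. The helper name `build_site_analysis_request` and the lowercase `true`/`false` serialization of `include_drafts` are assumptions, not part of the plugin contract.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_site_analysis_request(site_url: str, api_token: str,
                                limit_products: int = 500,
                                include_drafts: bool = False,
                                cache_ttl: int = 3600) -> Request:
    """Build the GET request for the site-analysis endpoint (hypothetical helper)."""
    query = urlencode({
        "limit_products": limit_products,
        # Assumption: booleans travel as lowercase strings.
        "include_drafts": str(include_drafts).lower(),
        "cache_ttl": cache_ttl,
    })
    url = f"{site_url.rstrip('/')}/wp-json/igny8/v1/sag/site-analysis?{query}"
    return Request(url, headers={"Authorization": f"Bearer {api_token}"}, method="GET")


req = build_site_analysis_request("https://example-store.com/", "TOKEN")
print(req.full_url)
```

The request object can then be passed to `urllib.request.urlopen` or translated to whichever HTTP client the backend already uses.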
Data Collection Modules:
| Module | Responsibility | Data Returned |
|---|---|---|
| ProductCollector | Extract all products with metadata | titles, descriptions, prices, categories, tags, images, custom attributes, sku |
| CategoryCollector | Map product category hierarchy | names, slugs, parent-child hierarchy, descriptions, product counts |
| TaxonomyCollector | Enumerate all custom taxonomies | taxonomy names, all registered terms, term hierarchies, term metadata |
| AttributeCollector | Extract WooCommerce attributes | attribute names, attribute types (select/text/color), all values, product assignments |
| PageCollector | Identify key pages | titles, URLs, content summaries (first 500 chars), page type detection |
| PostCollector | Extract blog posts | titles, URLs, content summaries, categories, tags, publish date |
| MenuCollector | Analyze navigation structure | menu items, hierarchy, target URLs/categories |
| PluginCollector | Document site technical stack | active plugins, theme, WordPress version, WooCommerce version |
Implementation:
- Location: plugins/igny8-sync/includes/collectors/
- Each collector implements DataCollectorInterface with collect() and sanitize() methods
- Data sanitization: Remove PII, HTML tags, limit text length
- Error handling: Log failures per collector, return partial data if one collector fails
- Performance: Optimize queries to avoid site slowdown (use transients, batch operations)
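The "partial data on failure" rule can be sketched as follows. The plugin itself is a WordPress plugin, so this Python version only illustrates the control flow; `run_collectors` and the logger name are hypothetical.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("igny8.collectors")  # hypothetical logger name


def run_collectors(
    collectors: Dict[str, Callable[[], Any]],
) -> Tuple[Dict[str, Any], List[str]]:
    """Run every collector; a failing collector is logged and skipped."""
    data: Dict[str, Any] = {}
    warnings: List[str] = []
    for name, collect in collectors.items():
        try:
            data[name] = collect()
        except Exception as exc:  # one failure must not abort the whole payload
            logger.warning("collector %s failed: %s", name, exc)
            warnings.append(f"{name}: {exc}")
    return data, warnings
```

Returning the warnings alongside the data lets the endpoint report which sections are missing instead of failing the whole request.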
Plugin Cron Job Addition:
- New job: igny8_sync_sag_site_analysis (optional, runs if user triggers analysis)
- Frequency: On-demand via API call, not scheduled
- Timeout: 60 seconds (analysis itself happens server-side via Celery)
2.2 AI Attribute Extraction Service
File: sag/ai_functions/attribute_extraction.py
Register Key: extract_site_attributes
Input Type: SiteAnalysisPayload
Output Type: AttributeExtractionResult
Function Signature:
def extract_site_attributes(
    site_data: SiteAnalysisPayload,
    sector_template: Optional[SectorTemplate] = None,
    confidence_threshold: float = 0.6,
    max_attributes: int = 20
) -> AttributeExtractionResult:
    """
    Analyze site data to discover attributes.

    Args:
        site_data: Raw site data from WordPress plugin
        sector_template: Optional sector template for validation
        confidence_threshold: Min confidence to include attribute (0.0-1.0)
        max_attributes: Max attributes to return

    Returns:
        AttributeExtractionResult with discovered attributes, frequencies, confidence scores
    """
Algorithm:
- Text Analysis Phase
  - Concatenate product titles and descriptions
  - Apply tokenization and noun phrase extraction
  - Identify recurring modifiers and descriptors
  - Extract from category names and tags
  - Extract from custom attribute values (if any exist)
- Pattern Recognition Phase
  - Group similar terms (e.g., "back pain" + "back relief" + "lower back" → "back/spine")
  - Calculate frequency across product dataset
  - Identify dimensional axes (e.g., "target area," "device type")
  - Score statistical significance
- Validation Phase
  - Cross-reference against sector template (if provided)
  - Validate against common attribute taxonomies
  - Flag conflicting or ambiguous discoveries
  - Assign confidence scores based on:
    - Frequency (how often appears)
    - Consistency (appears across multiple products)
    - Specificity (not too vague)
    - Template alignment (matches known attributes)
- Ranking Phase
  - Rank by frequency and confidence
  - Assign dimensionality (Primary/Secondary/Tertiary)
  - Cap results at max_attributes
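The scoring and ranking phases could be sketched as a weighted combination of the four signals followed by a threshold filter and cap. The weights, the `AttributeSignals` class, and both function names are illustrative assumptions; the spec does not fix a formula.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AttributeSignals:
    """Each signal is assumed to be normalized to the range 0.0-1.0."""
    frequency: float
    consistency: float
    specificity: float
    template_alignment: float


# Hypothetical weights; they only need to sum to 1.0.
WEIGHTS = {"frequency": 0.3, "consistency": 0.3,
           "specificity": 0.2, "template_alignment": 0.2}


def confidence_score(signals: AttributeSignals) -> float:
    """Weighted sum of the four signals, clamped to 0.0-1.0."""
    raw = sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())
    return round(min(max(raw, 0.0), 1.0), 4)


def rank_attributes(scored: Dict[str, float],
                    confidence_threshold: float = 0.6,
                    max_attributes: int = 20) -> List[Tuple[str, float]]:
    """Drop low-confidence attributes, sort by confidence, cap the result."""
    kept = [(name, conf) for name, conf in scored.items()
            if conf >= confidence_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_attributes]
```

Attributes that fall below the threshold would go to `low_confidence_discoveries` rather than being discarded silently.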
Output Structure:
{
"analysis_id": "uuid",
"site_id": "uuid",
"timestamp": "2026-03-23T14:30:00Z",
"analysis_confidence": 0.82,
"attributes": [
{
"name": "Target Area",
"dimension": "Primary",
"confidence": 0.95,
"frequency": 32,
"discovered_from": ["product_titles", "product_descriptions", "categories"],
"values": [
{
"value": "Neck",
"frequency": 12,
"example_products": ["Product A", "Product B"]
},
{
"value": "Back",
"frequency": 8,
"example_products": ["Product C"]
},
{
"value": "Foot",
"frequency": 25,
"example_products": ["Product D", "Product E"]
}
],
"template_validation": {
"matched_sector": "massage_devices",
"matched_attribute": "body_region",
"alignment_score": 0.98
}
},
{
"name": "Device Type",
"dimension": "Primary",
"confidence": 0.88,
"frequency": 28,
"discovered_from": ["product_titles", "product_descriptions"],
"values": [
{
"value": "Shiatsu",
"frequency": 18,
"example_products": ["Product F"]
},
{
"value": "EMS",
"frequency": 7,
"example_products": ["Product G"]
},
{
"value": "Percussion",
"frequency": 3,
"example_products": ["Product H"]
}
],
"template_validation": {
"matched_sector": "massage_devices",
"matched_attribute": "therapy_type",
"alignment_score": 0.91
}
},
{
"name": "Heat Setting",
"dimension": "Secondary",
"confidence": 0.72,
"frequency": 15,
"discovered_from": ["product_descriptions"],
"values": [
{
"value": "Heated",
"frequency": 15,
"example_products": ["Product I", "Product J"]
}
],
"template_validation": {
"matched_sector": "massage_devices",
"matched_attribute": "heat_enabled",
"alignment_score": 0.85
}
}
],
"low_confidence_discoveries": [
{
"name": "Brand",
"confidence": 0.55,
"reason": "High variability, many single-mention values"
}
],
"analysis_notes": {
"total_products_analyzed": 50,
"total_categories": 8,
"total_tags": 23,
"extraction_method": "llm_analysis",
"model_used": "gpt-4-turbo"
}
}
Error Handling:
- Insufficient data: Log warning, return empty attributes list
- LLM API failure: Retry with exponential backoff (3 retries)
- Timeout (>5 minutes): Abort and return partial results
- Invalid sector template: Log error, continue analysis without validation
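A minimal sketch of the retry rule, assuming "3 retries" means one initial attempt plus up to three more, with the delay doubling each time. `call_with_backoff` and the 1-second base delay are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(call: Callable[[], T],
                      retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`; on failure wait base_delay * 2**attempt and try again."""
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

In production the `except` clause would be narrowed to the LLM client's transient error types; catching `Exception` keeps the sketch client-agnostic.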
Performance Considerations:
- Cache sector templates in memory
- Batch LLM calls (process 5-10 products per API call)
- Store extraction results in database for audit trail
- Return results within 2-5 minutes for typical sites
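Batching products for LLM calls can be as simple as slicing the product list. The sketch below assumes a batch size of 10, the upper end of the stated range; `batched` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 23 products become batches of 10, 10 and 3.
sizes = [len(batch) for batch in batched(list(range(23)))]
```

Each batch would then become one prompt, which keeps the number of API calls at roughly one tenth of the product count.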
2.3 Data Models
SiteAnalysisPayload
from dataclasses import dataclass
from typing import List, Dict, Optional


@dataclass
class Product:
    id: str
    title: str
    description: str
    sku: str
    price: float
    categories: List[str]
    tags: List[str]
    custom_attributes: Dict[str, List[str]]
    image_urls: List[str]


@dataclass
class Category:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    product_count: int


@dataclass
class Taxonomy:
    name: str
    label: str
    is_hierarchical: bool
    terms: List['Term']


@dataclass
class Term:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    count: int


@dataclass
class Page:
    id: str
    title: str
    url: str
    content_summary: str
    page_type: str  # e.g., "shop", "landing", "faq"


@dataclass
class Post:
    id: str
    title: str
    url: str
    content_summary: str
    categories: List[str]
    tags: List[str]
    publish_date: str


@dataclass
class MenuItem:
    id: str
    title: str
    url: str
    target: str
    parent_id: Optional[str]


@dataclass
class SiteMetadata:
    site_id: str
    domain: str
    wordpress_version: str
    woocommerce_version: str
    total_products: int
    total_categories: int
    total_pages: int
    total_posts: int
    active_plugins: List[str]
    theme: str


@dataclass
class SiteAnalysisPayload:
    metadata: SiteMetadata
    products: List[Product]
    categories: List[Category]
    taxonomies: List[Taxonomy]
    pages: List[Page]
    posts: List[Post]
    menus: List[MenuItem]
    collected_at: str  # ISO 8601 timestamp
AttributeExtractionResult
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttributeValue:
    value: str
    frequency: int
    example_products: List[str]


@dataclass
class TemplateValidation:
    matched_sector: str
    matched_attribute: str
    alignment_score: float


@dataclass
class DiscoveredAttribute:
    name: str
    dimension: str  # "Primary", "Secondary", "Tertiary"
    confidence: float  # 0.0-1.0
    frequency: int
    discovered_from: List[str]  # ["product_titles", "product_descriptions", ...]
    values: List[AttributeValue]
    template_validation: Optional[TemplateValidation]


@dataclass
class LowConfidenceDiscovery:
    name: str
    confidence: float
    reason: str


@dataclass
class AnalysisNotes:
    total_products_analyzed: int
    total_categories: int
    total_tags: int
    extraction_method: str
    model_used: str


@dataclass
class AttributeExtractionResult:
    analysis_id: str
    site_id: str
    timestamp: str
    analysis_confidence: float
    attributes: List[DiscoveredAttribute]
    low_confidence_discoveries: List[LowConfidenceDiscovery]
    analysis_notes: AnalysisNotes
2.4 Gap Analysis Service
File: sag/services/gap_analysis_service.py
Class: GapAnalysisService
Method: analyze_gap(site_data: SiteAnalysisPayload, blueprint: SAGBlueprint) -> GapAnalysisReport
Purpose: Compare existing site structure against SAG blueprint to identify gaps.
Analysis Dimensions:
- Attribute Coverage Gap
  - SAG blueprint specifies X attributes
  - Site currently has Y custom attributes assigned to products
  - Gap: Missing attributes or low coverage (% of products with attribute values)
- Hub Page Gap
  - Blueprint specifies Z cluster hubs
  - Site analysis identifies M existing pages
  - Gap: Missing hub pages (authority pages for attribute clusters)
- Term Landing Page Gap
  - Blueprint specifies N attribute values requiring term landing pages
  - Site has existing category/tag pages
  - Gap: Missing term landing pages (one per attribute value)
- Blog Content Gap
  - Blueprint specifies recommended blog posts per cluster
  - Site has P existing blog posts
  - Gap: Blog content aligned to clusters and keyword targets
- Internal Linking Gap
  - Blueprint specifies internal linking strategy
  - Site has current internal link structure
  - Gap: Missing cross-cluster and term-to-hub links
- Product Enrichment Gap
  - Products lacking attribute assignments
  - Products missing description optimization
  - Products missing images
- Technical SEO Gap
  - Missing schema markup for products
  - Category pages lacking optimization
  - Menu structure not optimized for crawlability
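The attribute coverage and count-based gaps reduce to simple arithmetic. A minimal sketch, assuming each product is represented by its `custom_attributes` mapping; `attribute_coverage` and `gap_count` are hypothetical helper names.

```python
from typing import Dict, List


def attribute_coverage(products: List[Dict[str, List[str]]],
                       attribute: str) -> Dict[str, object]:
    """Share of products that carry at least one value for `attribute`."""
    total = len(products)
    covered = sum(1 for attrs in products if attrs.get(attribute))
    percent = round(100 * covered / total) if total else 0
    return {
        "attribute": attribute,
        "covered": covered,
        "total": total,
        "coverage_percent": percent,
        "missing": total - covered,
    }


def gap_count(current: int, blueprint: int) -> int:
    """Items the blueprint calls for that the site does not have yet."""
    return max(blueprint - current, 0)
```

With 40 of 50 products carrying a Device Type value, this gives 80% coverage and 10 missing assignments; `gap_count(3, 8)` gives the attribute gap of 5.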
Output Structure:
{
"analysis_id": "uuid",
"site_id": "uuid",
"blueprint_id": "uuid",
"timestamp": "2026-03-23T14:30:00Z",
"summary": {
"products_current": 50,
"products_gap": 0,
"attributes_current": 3,
"attributes_blueprint": 8,
"attributes_gap": 5,
"hub_pages_current": 2,
"hub_pages_blueprint": 4,
"hub_pages_gap": 2,
"term_pages_current": 12,
"term_pages_blueprint": 35,
"term_pages_gap": 23,
"blog_posts_current": 8,
"blog_posts_blueprint": 24,
"blog_posts_gap": 16,
"overall_gap_percentage": 62
},
"attributes_gap_detail": [
{
"attribute": "Target Area",
"coverage_current": "100% (50/50)",
"coverage_blueprint": "100% (50/50)",
"gap": "None — attribute well-covered"
},
{
"attribute": "Device Type",
"coverage_current": "80% (40/50)",
"coverage_blueprint": "100% (50/50)",
"gap": "10 products missing Device Type assignment"
}
],
"hub_pages_gap_detail": [
{
"cluster": "Foot Massagers",
"status": "EXISTS",
"url": "/shop/foot-massagers",
"optimization_notes": "Good; consider adding testimonials section"
},
{
"cluster": "Neck & Shoulder Relief",
"status": "MISSING",
"recommendation": "Create hub page at /neck-shoulder-relief"
}
],
"term_pages_gap_detail": [
{
"attribute": "Target Area",
"term": "Neck",
"status": "MISSING",
"recommendation": "Create term page at /target-area/neck (products filter + blog links)"
}
],
"blog_posts_gap_detail": [
{
"cluster": "Foot Massagers",
"recommended_posts": [
"Best Foot Massagers for Neuropathy",
"How to Use Shiatsu Foot Massagers",
"Foot Massage Benefits"
],
"existing_posts": [
"Foot Massage 101"
],
"gap": 2
}
],
"internal_linking_gap": {
"status": "High gaps identified",
"recommendation": "Blueprint specifies 3-5 internal links per hub page; current average: 1.2",
"priority_links": [
"Neck hub → Foot hub (shared body region cluster)",
"Device Type pages → Hub pages",
"Blog posts → Related term pages"
]
},
"actionable_recommendations": [
"IMMEDIATE: Assign Device Type to 10 untagged products",
"WEEK 1: Create 2 missing hub pages",
"WEEK 2: Create 23 term landing pages via script",
"WEEK 3: Bulk create 16 blog posts (outline + AI generation)",
"WEEK 4: Implement internal linking strategy"
]
}
2.5 Product Auto-Tagging Service
File: sag/services/auto_tagger_service.py
Class: ProductAutoTagger
Method: generate_tag_suggestions(products: List[Product], attributes: List[DiscoveredAttribute], blueprint: SAGBlueprint) -> List[TagSuggestion]
Purpose: Generate batch product-to-attribute assignments based on product titles/descriptions.
Algorithm:
- For each product:
  - Extract key terms from title and description
  - Match against attribute values (fuzzy matching allowed)
  - Score confidence for each attribute assignment
  - Rank by confidence
- For each attribute:
  - Verify assignment makes semantic sense
  - Check for conflicting assignments (e.g., can't be both "Shiatsu" and "EMS")
  - Return ranked list
- Group by product for review UI
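One way to approximate the matching step without an LLM is token-level fuzzy matching with the standard library's `difflib`. This is a sketch under assumptions: `match_confidence` and `suggest_tags` are hypothetical, only single-word values are handled, and conflicts are avoided by keeping the single best value per attribute.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List


def match_confidence(text: str, value: str) -> float:
    """1.0 for an exact token hit, otherwise the best fuzzy ratio to any token."""
    target = value.lower()
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if target in tokens:
        return 1.0
    return max((SequenceMatcher(None, token, target).ratio() for token in tokens),
               default=0.0)


def suggest_tags(text: str, attributes: Dict[str, List[str]],
                 confidence_min: float = 0.6) -> List[dict]:
    """Propose at most one value per attribute, ranked by confidence."""
    suggestions = []
    for attribute, values in attributes.items():
        if not values:
            continue
        # Keeping only the top value prevents "Shiatsu" and "EMS" on one product.
        score, value = max((match_confidence(text, v), v) for v in values)
        if score >= confidence_min:
            suggestions.append({"attribute": attribute, "value": value,
                                "confidence": round(score, 2)})
    return sorted(suggestions, key=lambda s: s["confidence"], reverse=True)
```

Fuzzy ratios can produce near-misses (for example, "Neck" scores fairly high against the brand token "Nekteck"), which is why the best-value rule and the review step both matter.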
Output Structure:
{
"batch_id": "uuid",
"site_id": "uuid",
"blueprint_id": "uuid",
"timestamp": "2026-03-23T14:30:00Z",
"total_products": 50,
"total_suggestions": 87,
"suggestions": [
{
"product_id": "woo_123",
"product_title": "Nekteck Foot Massager with Heat",
"proposed_tags": [
{
"attribute": "Target Area",
"value": "Foot",
"confidence": 0.98,
"reasoning": "Title contains 'Foot Massager'"
},
{
"attribute": "Device Type",
"value": "Shiatsu",
"confidence": 0.82,
"reasoning": "Description mentions shiatsu nodes"
},
{
"attribute": "Heat Setting",
"value": "Heated",
"confidence": 0.95,
"reasoning": "Title explicitly states 'with Heat'"
}
],
"status": "pending_review"
}
],
"summary": {
"high_confidence_suggestions": 72,
"medium_confidence_suggestions": 12,
"low_confidence_suggestions": 3,
"conflicts_detected": 0,
"ready_to_apply": true
}
}
3. APIs & Endpoints
3.1 Backend API Endpoints
All endpoints are authenticated via Authorization: Bearer {IGNY8_API_TOKEN} header.
POST /api/v1/sag/sites/{site_id}/analyze/
Purpose: Trigger comprehensive site analysis (async).
Request:
{
"include_draft_products": false,
"product_limit": 500,
"sector_template_id": "optional_uuid",
"webhook_url": "optional_https_url_for_completion_notification"
}
Response: 202 Accepted
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"status": "queued",
"estimated_duration_seconds": 120,
"check_status_url": "/api/v1/sag/sites/{site_id}/analysis-status/?task_id={task_id}"
}
Error Responses:
- 400: Invalid parameters
- 401: Unauthorized
- 404: Site not found
- 429: Rate limited (max 1 analysis per 30 minutes per site)
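The 429 rule (one analysis per 30 minutes per site) might be checked as below before queueing the task. `seconds_until_allowed` is a hypothetical helper, and where the last start time is stored is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_INTERVAL = timedelta(minutes=30)


def seconds_until_allowed(last_started: Optional[datetime], now: datetime) -> int:
    """Return 0 if a new analysis may start, else the seconds left to wait."""
    if last_started is None:
        return 0
    remaining = MIN_INTERVAL - (now - last_started)
    return max(0, int(remaining.total_seconds()))


now = datetime(2026, 3, 23, 14, 30, tzinfo=timezone.utc)
wait = seconds_until_allowed(now - timedelta(minutes=10), now)  # 1200 seconds
```

A non-zero result could be returned to the client in a `Retry-After` header alongside the 429 response.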
GET /api/v1/sag/sites/{site_id}/analysis-status/
Purpose: Check analysis progress.
Query Parameters:
- task_id (required): Celery task ID from analysis trigger
Response: 200 OK
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"status": "processing",
"progress_percent": 45,
"current_step": "Analyzing product attributes",
"elapsed_seconds": 32,
"estimated_remaining_seconds": 48
}
Status Values:
- queued: waiting to start
- processing: actively analyzing
- complete: analysis finished
- failed: analysis error (see error message)
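A client would typically poll this endpoint until it sees a terminal status. The sketch below takes the HTTP call as an injected function so it stays transport-agnostic; `poll_until_done` is hypothetical, and the 10-second interval and 5-minute timeout are defaults borrowed from the skill workflow in section 6.1.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"complete", "failed"}


def poll_until_done(fetch_status: Callable[[], Dict],
                    interval_seconds: float = 10.0,
                    timeout_seconds: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Dict:
    """Call fetch_status until the task is complete or failed, or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError("analysis did not finish before the timeout")
        sleep(interval_seconds)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.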
GET /api/v1/sag/sites/{site_id}/analysis-results/
Purpose: Retrieve completed analysis results.
Response: 200 OK
{
"analysis_id": "uuid",
"site_id": "site_uuid",
"timestamp": "2026-03-23T14:30:00Z",
"site_data_summary": {
"total_products": 50,
"total_categories": 8,
"total_pages": 12,
"total_posts": 8
},
"extracted_attributes": {
"analysis_confidence": 0.82,
"attributes_count": 8,
"attributes": [
{ "name": "Target Area", "dimension": "Primary", "confidence": 0.95, ... }
]
},
"gap_analysis": {
"overall_gap_percentage": 62,
"summary": { ... }
},
"status": "ready_for_review"
}
Status Values:
- ready_for_review: user should review before confirming
- confirmed: user has accepted analysis
- archived: superseded by newer analysis
POST /api/v1/sag/sites/{site_id}/confirm-analysis/
Purpose: User confirms analysis; creates SAG blueprint.
Request:
{
"analysis_id": "uuid",
"approved_attributes": [
{
"name": "Target Area",
"approved_values": ["Neck", "Back", "Foot"],
"exclude_values": []
}
],
"confirmed_by_user_id": "user_uuid"
}
Response: 201 Created
{
"blueprint_id": "uuid",
"site_id": "site_uuid",
"analysis_id": "uuid",
"status": "created",
"attributes_count": 8,
"attribute_values_count": 45,
"created_at": "2026-03-23T14:32:00Z",
"next_steps": [
"Review auto-tagging suggestions",
"Approve product tags",
"Start content pipeline (01E)"
]
}
GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/
Purpose: Retrieve product auto-tagging suggestions.
Query Parameters:
- blueprint_id (required): ID of confirmed blueprint
- confidence_min (optional): Filter by minimum confidence (0.0-1.0, default 0.6)
- limit (optional): Max suggestions per product (default 5)
Response: 200 OK
{
"batch_id": "uuid",
"blueprint_id": "blueprint_uuid",
"total_suggestions": 87,
"suggestions": [
{
"product_id": "woo_123",
"product_title": "Nekteck Foot Massager",
"proposed_tags": [
{
"attribute": "Target Area",
"value": "Foot",
"confidence": 0.98,
"reasoning": "Title contains 'Foot Massager'"
}
]
}
]
}
POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
Purpose: Apply approved product tags to site (async bulk operation).
Request:
{
"blueprint_id": "uuid",
"approved_suggestions": [
{
"product_id": "woo_123",
"approved_tags": [
{
"attribute": "Target Area",
"value": "Foot"
}
]
}
],
"skip_existing_values": true
}
Response: 202 Accepted
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"blueprint_id": "blueprint_uuid",
"status": "processing",
"products_to_tag": 47,
"tags_to_apply": 87,
"check_status_url": "/api/v1/sag/sites/{site_id}/auto-tag/status/?task_id={task_id}"
}
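The effect of `skip_existing_values` on a single product could look like the sketch below. The semantics are an assumption: when true, an attribute that already has values on the product is left untouched; when false, approved values replace the existing ones. `merge_tags` is a hypothetical helper.

```python
from typing import Dict, List


def merge_tags(existing: Dict[str, List[str]],
               approved: List[Dict[str, str]],
               skip_existing_values: bool = True) -> Dict[str, List[str]]:
    """Combine a product's current attribute values with approved tags."""
    approved_by_attribute: Dict[str, List[str]] = {}
    for tag in approved:
        approved_by_attribute.setdefault(tag["attribute"], []).append(tag["value"])

    merged = {name: list(values) for name, values in existing.items()}
    for attribute, values in approved_by_attribute.items():
        if skip_existing_values and merged.get(attribute):
            continue  # assumed meaning: never touch attributes that have values
        merged[attribute] = values
    return merged
```

The input mapping is copied rather than mutated, so the caller can diff the before and after states for the audit trail.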
GET /api/v1/sag/sites/{site_id}/auto-tag/status/
Purpose: Check auto-tagging progress.
Query Parameters:
- task_id (required): Celery task ID
Response: 200 OK
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"status": "processing",
"progress_percent": 62,
"products_tagged": 29,
"total_products": 47,
"tags_applied": 54,
"estimated_remaining_seconds": 30
}
3.2 WordPress Plugin Endpoint
GET /wp-json/igny8/v1/sag/site-analysis
Purpose: Collect comprehensive site data for analysis.
Headers:
- Authorization: Bearer {IGNY8_API_TOKEN}
- X-IGNY8-Request-ID: {uuid} (optional, for request tracking)
Query Parameters:
- limit_products: int (1-1000, default 500)
- include_drafts: boolean (default false)
- cache_ttl: int (seconds, default 3600)
Response: 200 OK
{
"metadata": {
"site_id": "uuid",
"domain": "example-store.com",
"wordpress_version": "6.4.2",
"woocommerce_version": "8.5.0",
"total_products": 50,
"total_categories": 8,
"total_pages": 12,
"total_posts": 8,
"active_plugins": ["woocommerce", "yoast-seo", ...],
"theme": "storefront"
},
"products": [
{
"id": "woo_123",
"title": "Nekteck Foot Massager with Heat",
"description": "Premium foot massage device...",
"sku": "NEKTECK-FM-001",
"price": 79.99,
"categories": ["Foot Massagers", "Massage Devices"],
"tags": ["heated", "cordless"],
"custom_attributes": {
"brand": ["Nekteck"],
"color": ["Black"],
"warranty": ["2 Year"]
},
"image_urls": ["image1.jpg", "image2.jpg"]
}
],
"categories": [
{
"id": "cat_1",
"name": "Foot Massagers",
"slug": "foot-massagers",
"parent_id": null,
"description": "Electronic foot massage devices",
"product_count": 12
}
],
"taxonomies": [
{
"name": "brand",
"label": "Brand",
"is_hierarchical": false,
"terms": [
{
"id": "brand_1",
"name": "Nekteck",
"slug": "nekteck",
"parent_id": null,
"description": "",
"count": 5
}
]
}
],
"pages": [
{
"id": "page_1",
"title": "Shop",
"url": "/shop",
"content_summary": "Browse our selection of massage devices",
"page_type": "shop"
}
],
"posts": [
{
"id": "post_1",
"title": "Benefits of Foot Massage",
"url": "/blog/foot-massage-benefits",
"content_summary": "Learn why foot massage is beneficial...",
"categories": ["Health"],
"tags": ["foot", "massage"],
"publish_date": "2026-03-15"
}
],
"menus": [
{
"id": "menu_1",
"title": "Main Menu",
"items": [
{
"id": "item_1",
"title": "Shop",
"url": "/shop",
"target": "_self",
"parent_id": null
}
]
}
],
"collected_at": "2026-03-23T14:30:00Z"
}
Error Responses:
- 400: Invalid query parameters
- 401: Invalid or missing API token
- 500: Plugin error (logged on WordPress side)
Performance:
- Response time target: <5 seconds for sites with <500 products
- Data is cached for 1 hour (configurable via cache_ttl)
- Uses WordPress transients API for caching
4. Implementation Steps
Phase 1: Plugin Enhancement (Week 1)
Tasks:
- Create collector classes in plugins/igny8-sync/includes/collectors/
  - ProductCollector
  - CategoryCollector
  - TaxonomyCollector
  - AttributeCollector
  - PageCollector
  - PostCollector
  - MenuCollector
  - PluginCollector
- Implement DataCollectorInterface
  - collect() method (fetches raw data)
  - sanitize() method (removes PII, normalizes format)
  - Error handling per collector
- Add /wp-json/igny8/v1/sag/site-analysis endpoint
  - Route definition
  - Parameter validation
  - Response formatting
  - Caching logic
- Add unit tests for collectors
  - Mock data tests
  - Error condition tests
  - Performance tests
Acceptance Criteria:
- Endpoint returns valid JSON payload matching schema
- All 8 collectors implemented and tested
- Response time <5 seconds for 500 products
- Caching works correctly
- Error handling tested
Phase 2: AI Attribute Extraction (Week 1-2)
Tasks:
- Implement attribute_extraction.py
  - Text analysis functions
  - Pattern recognition logic
  - Confidence scoring
  - Validation against sector templates
- Register with LLM framework
  - Implement extract_site_attributes function
  - Add input/output validation
  - Error handling (retry logic)
- Create data models
  - DiscoveredAttribute
  - AttributeValue
  - TemplateValidation
  - AttributeExtractionResult
- Add unit and integration tests
  - Mock LLM responses
  - Test with real site data
  - Confidence scoring validation
  - Performance tests (2-5 minute runtime)
Acceptance Criteria:
- Extracts 5-20 attributes from sample site data
- Confidence scores accurate and meaningful
- Sector template validation works
- Low-confidence discoveries flagged
- Results auditable (model used, reasoning provided)
Phase 3: Gap Analysis Service (Week 2)
Tasks:
- Implement gap_analysis_service.py
  - GapAnalysisService class
  - analyze_gap() method
  - All 7 gap dimensions analyzed
- Create gap analysis models
  - GapAnalysisReport
  - Recommendation structures
  - Detail sections
- Integrate with blueprint comparison
  - Query SAG blueprint
  - Compare against site data
  - Calculate gap percentages
- Add unit tests
  - Test each gap dimension
  - Test recommendation generation
  - Test report structure
Acceptance Criteria:
- All 7 gap dimensions analyzed
- Report clearly identifies missing elements
- Actionable recommendations provided
- Report generated in <1 second
Phase 4: API Endpoints (Week 2-3)
Tasks:
- Implement analysis trigger endpoint
  - POST /api/v1/sag/sites/{site_id}/analyze/
  - Celery task queueing
  - Webhook support
- Implement status check endpoint
  - GET /api/v1/sag/sites/{site_id}/analysis-status/
  - Real-time progress updates
- Implement results retrieval endpoint
  - GET /api/v1/sag/sites/{site_id}/analysis-results/
  - Caching of results
- Implement blueprint confirmation endpoint
  - POST /api/v1/sag/sites/{site_id}/confirm-analysis/
  - Attribute approval logic
  - Blueprint creation
- Add request/response validation
  - Marshmallow schemas
  - Error responses
- Add authentication/authorization checks
  - API token validation
  - User site ownership verification
Acceptance Criteria:
- All 4 endpoints implemented
- Endpoints return correct status codes
- Validation working
- Authentication required and checked
- Error responses follow standard format
Phase 5: Product Auto-Tagging (Week 3)
Tasks:
- Implement auto_tagger_service.py
  - ProductAutoTagger class
  - generate_tag_suggestions() method
  - Confidence scoring
- Create auto-tagging endpoints
  - GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/
  - POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
  - GET /api/v1/sag/sites/{site_id}/auto-tag/status/
- Implement Celery task for bulk tagging
  - Batch product processing
  - Conflict detection
  - Error handling
- Add unit tests
  - Test suggestion generation
  - Test bulk tagging
  - Test conflict detection
Acceptance Criteria:
- Suggestions endpoint returns valid suggestions
- Confidence scores reasonable (0.6+)
- Bulk tagging applies tags correctly to products
- Progress tracking works
- 47+ products can be tagged in <2 minutes
Phase 6: Frontend Components (Week 3-4)
Tasks:
- Implement SiteAnalysisPanel
  - Trigger analysis button
  - Progress indicator
  - Error messaging
- Implement DiscoveredAttributesReview
  - Display discovered attributes
  - Show confidence scores
  - Allow approval/rejection per attribute
  - Show example products
- Implement GapAnalysisReport
  - Visual representation of gaps
  - Actionable recommendations
  - Priority ordering
- Implement AutoTagReviewPanel
  - Display product suggestions
  - Batch selection/deselection
  - Apply tags button
  - Progress tracking
- Add styling and UX polish
  - Responsive design
  - Loading states
  - Error states
  - Success confirmations
Acceptance Criteria:
- All 4 components implemented
- Responsive on desktop/tablet
- Accessible (WCAG 2.1 AA)
- User can complete workflow without errors
- Loading/error states clearly communicated
Phase 7: Integration & Testing (Week 4)
Tasks:
- End-to-end testing
  - Connect real WordPress site
  - Run full analysis workflow
  - Confirm blueprint created
  - Verify auto-tagging works
- Performance testing
  - Benchmark analysis with various site sizes
  - Optimize slow operations
  - Load testing on API endpoints
- Documentation
  - API documentation (OpenAPI/Swagger)
  - Plugin setup guide
  - User guide for Case 1 workflow
  - Developer setup guide
- Bug fixing and refinement
  - Fix integration issues
  - Refine UI/UX based on testing
  - Improve error messages
Acceptance Criteria:
- End-to-end workflow works without errors
- Performance meets targets (analysis <5 min for 500 products)
- Documentation complete
- All bugs fixed
- Ready for beta testing
5. Acceptance Criteria
5.1 Functional Requirements
Site Data Collection:
- Plugin collects all 8 data types (products, categories, taxonomies, pages, posts, menus, attributes, metadata)
- Data is valid JSON matching defined schema
- All product titles/descriptions included
- Custom attribute values extracted correctly
- Menu hierarchy preserved
Attribute Extraction:
- AI identifies 5-20 attributes from site data
- Confidence scores meaningful and accurate
- Low-confidence discoveries flagged
- Sector template validation working
- Results include frequency counts and example products
Gap Analysis:
- All 7 gap dimensions analyzed
- Missing hubs, term pages, blog posts clearly identified
- Product attribute coverage calculated
- Internal linking gaps identified
- Actionable recommendations provided
Blueprint Creation:
- Confirmed analysis creates valid SAGBlueprint
- Attributes and values recorded correctly
- Gap analysis linked to blueprint
- Blueprint feeds into cluster formation (01C)
Product Auto-Tagging:
- Suggestions generated for 90%+ of products
- Confidence scores reasonable (0.6+)
- Bulk tagging applies tags correctly
- No data loss or corruption
- Existing tags not overwritten (configurable)
API Endpoints:
- All 4 analysis endpoints implemented
- All 3 auto-tagging endpoints implemented
- Correct HTTP status codes
- Valid error responses
- Authentication required
Frontend Components:
- SiteAnalysisPanel triggers analysis and shows progress
- DiscoveredAttributesReview allows attribute approval
- GapAnalysisReport displays gaps clearly
- AutoTagReviewPanel allows batch product tagging
- All components responsive and accessible
5.2 Non-Functional Requirements
Performance:
- Site analysis completes in <5 minutes for typical sites (50-500 products)
- WordPress plugin endpoint responds in <5 seconds
- API endpoints respond in <2 seconds
- Frontend components load in <3 seconds
Reliability:
- Plugin handles errors gracefully (missing products, etc.)
- Partial failures return partial data with warnings
- Celery tasks have retry logic
- Webhook notifications reliable
Security:
- API token authentication required
- User can only access own sites
- No PII in logs
- HTTPS enforced
- Input validation on all endpoints
Scalability:
- Plugin handles 1000+ products
- API handles 100+ concurrent analysis requests
- Database indexes optimized for queries
- Caching prevents redundant processing
Data Quality:
- Analysis results auditable (model used, timestamps, reasoning)
- No duplicate attribute suggestions
- Confidence scores calibrated
- Low-confidence results flagged for review
5.3 User Experience Requirements
Clarity:
- User understands analysis process and time required
- Gap analysis clearly shows what's missing
- Recommendations are actionable
- Error messages explain what went wrong
Simplicity:
- Workflow is 4-5 steps (analyze → review → confirm → auto-tag → apply)
- One button to trigger analysis
- Clear next steps after each stage
Feedback:
- Real-time progress updates during analysis
- Success/error notifications
- Ability to view raw analysis results
- Audit trail of approvals
6. Claude Code Instructions
6.1 Skill Development
Skill Name: igny8-case1-analysis
Version: 2.0
Prerequisites: IGNY8 platform deployed, WordPress plugin v2.0+, Celery configured
Skill Workflow:
Trigger: User connects existing WordPress site to IGNY8
Step 1: Collect Site Data
- Call: POST /api/v1/sag/sites/{site_id}/analyze/
- Wait: Poll /api/v1/sag/sites/{site_id}/analysis-status/ every 10 seconds
- Timeout: 5 minutes
- Output: task_id for tracking
Step 2: Retrieve Analysis Results
- Call: GET /api/v1/sag/sites/{site_id}/analysis-results/
- Parse: extracted_attributes, gap_analysis
- Display: DiscoveredAttributesReview panel
- User action: Approve/reject attributes
Step 3: Confirm Analysis
- Call: POST /api/v1/sag/sites/{site_id}/confirm-analysis/
- Payload: approved_attributes from user review
- Output: blueprint_id
- Display: Gap analysis report
- Next: Show auto-tagging recommendations
Step 4: Generate Auto-Tag Suggestions
- Call: GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/?blueprint_id={blueprint_id}
- Display: AutoTagReviewPanel
- User action: Select products to tag
Step 5: Apply Auto-Tags
- Call: POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
- Wait: Poll /api/v1/sag/sites/{site_id}/auto-tag/status/ every 5 seconds
- Timeout: 10 minutes
- Output: Number of tags applied, products tagged
Step 6: Complete & Next Steps
- Display: Success message
- Recommendations: Run cluster formation (01C), start content pipeline (01E)
- Links: View blueprint, view gap report, start cluster creation
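The poll-with-timeout pattern used in Steps 1 and 5 can be sketched as a small helper. This is an illustration under stated assumptions: the `poll_until_done` name, the `state` field, and the terminal values `completed` and `failed` are hypothetical, since the status response format is not shown here. The HTTP call is injected as a function so that the loop stays testable.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    interval_s: float,
    timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Call fetch_status every interval_s seconds until a terminal state or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") in ("completed", "failed"):  # assumed terminal states
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"no terminal state within {timeout_s} s")
        sleep(interval_s)


# Simulated status responses standing in for the analysis-status endpoint
_responses = iter([{"state": "running"}, {"state": "running"}, {"state": "completed"}])
final = poll_until_done(
    lambda: next(_responses),
    interval_s=10,    # Step 1: poll every 10 seconds
    timeout_s=300,    # Step 1: 5-minute timeout
    sleep=lambda s: None,  # skip real waiting in this demo
)
```

Step 5 would reuse the same helper with `interval_s=5` and `timeout_s=600`.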
6.2 Development Checklist
Code Quality:
- All functions have docstrings
- Type hints on all function parameters and returns
- Logging at DEBUG, INFO, WARNING levels as appropriate
- Error handling with specific exception types
- No hardcoded values (use config/env vars)
Testing:
- Unit tests for each service (>80% coverage)
- Integration tests for API endpoints
- Fixtures for sample site data
- Mock LLM responses for deterministic tests
- Performance tests for analysis (time and memory)
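Mocking the LLM for deterministic tests might look like the sketch below, which uses only the standard library's `unittest.mock`. The `extract_attributes` function and the client's `complete` method are hypothetical stand-ins, not the project's actual service or any real SDK's API.

```python
import json
from unittest.mock import Mock


def extract_attributes(llm_client, title: str) -> dict:
    """Ask a (hypothetical) LLM client for attributes and parse its JSON reply."""
    reply = llm_client.complete(f"Extract product attributes as JSON: {title}")
    return json.loads(reply)


def test_extract_attributes_is_deterministic():
    # The mock returns a canned reply, so the test never hits a real LLM
    fake_llm = Mock()
    fake_llm.complete.return_value = '{"Target Area": "Foot", "Device Type": "Massager"}'

    result = extract_attributes(fake_llm, "Shiatsu Foot Massager")

    assert result == {"Target Area": "Foot", "Device Type": "Massager"}
    fake_llm.complete.assert_called_once()


test_extract_attributes_is_deterministic()
```

Because the reply is fixed, the same test can also cover malformed-JSON handling by swapping in a different canned string.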
Documentation:
- Docstrings follow Google style
- README with setup instructions
- API documentation in OpenAPI format
- Example requests/responses for each endpoint
- Troubleshooting guide for common errors
Security:
- API token validation on all endpoints
- User ownership checks before accessing site data
- Input validation with Marshmallow
- SQL injection prevention (use ORM)
- No credentials in logs or errors
Performance:
- Database queries indexed
- Caching implemented for plugin endpoint
- Celery task optimization
- LLM API call batching
- Frontend component lazy loading
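For the LLM batching item, a simple chunking helper is often enough. The sketch below assumes product titles are sent in fixed-size groups; the `batched` helper name and the batch size of 3 are illustrative only, and the right size depends on the model's context and rate limits.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


titles = [f"Product {n}" for n in range(1, 8)]  # 7 sample titles
batches = list(batched(titles, 3))              # 3 LLM calls instead of 7
```

Each batch would then be sent as a single prompt, which reduces the number of API calls at the cost of larger requests.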
6.3 Debugging & Troubleshooting
Common Issues:
Issue: Analysis hangs or times out
- Check: Celery worker status (celery -A sag inspect active)
- Check: Redis/message queue status
- Check: LLM API rate limits
- Solution: Reduce product limit, retry analysis
Issue: Plugin endpoint returns partial data
- Check: Specific collector failure (check logs)
- Solution: Fix collector, re-run analysis (uses cache bypass)
- Note: Partial data is returned if one collector fails
Issue: Auto-tagging misses products
- Check: Product title/description quality (missing keywords)
- Check: Confidence threshold (lower if needed)
- Solution: Review low-confidence suggestions, adjust threshold
Issue: Gap analysis shows 100% gaps
- Check: Blueprint created correctly
- Check: Gap analysis query (verify site_id matches)
- Solution: Re-run analysis, confirm blueprint
6.4 Integration Checkpoints
Integration with 01A (SAGBlueprint):
- Confirmed analysis creates SAGBlueprint via POST /api/v1/sag/sites/{site_id}/confirm-analysis/
- Blueprint includes extracted attributes and values
- Blueprint links to analysis for audit trail
- Blueprint ready for cluster formation (01C)
Integration with 01B (Sector Templates):
- Attribute extraction uses sector template for validation (optional parameter)
- Alignment scores show how closely discovered attributes match template
- Low-confidence discoveries flagged if they don't align with template
- Template selection based on site category detection
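The spec does not define how alignment scores are computed. As one possible reading, the sketch below scores the share of discovered attribute names that also appear in the sector template, compared case-insensitively. The `alignment_score` function and this formula are assumptions, not the documented method.

```python
from typing import Iterable


def alignment_score(discovered: Iterable[str], template: Iterable[str]) -> float:
    """Fraction of discovered attribute names that also appear in the template."""
    found = {name.strip().lower() for name in discovered}
    expected = {name.strip().lower() for name in template}
    if not found:
        return 0.0  # nothing discovered, so nothing aligns
    return len(found & expected) / len(found)


score = alignment_score(
    ["Target Area", "Device Type", "Color"],
    ["target area", "device type", "Power Source", "Price Range"],
)
```

Two of the three discovered attributes match the template here, so `score` is 2/3; "Color" would be a candidate for flagging as not aligned.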
Integration with 01C (Cluster Formation):
- Blueprint created from Case 1 analysis feeds into cluster formation
- Attributes and values used to create cluster hierarchies
- Cluster formation references blueprint_id for traceability
- Generated clusters can be overridden if needed
Integration with 01E (Content Pipeline):
- Blueprint creation triggers content pipeline pre-planning
- Gap analysis informs content prioritization
- Hub page templates created for missing clusters
- Blog post outlines generated for content gaps
Integration with 01G (Health Monitoring):
- Analysis metrics stored for health dashboard
- Gap analysis metrics tracked over time
- Product attribute coverage tracked
- Auto-tagging success rate monitored
7. Related Documents
- 01A: SAGBlueprint Definition — Output of Case 1 analysis
- 01B: Sector Templates — Used for attribute validation
- 01C: Cluster Formation — Consumes SAGBlueprint from Case 1
- 01D: Case 2 Wizard — Alternative path for new sites
- 01E: Content Pipeline — Consumes blueprint and gap analysis from Case 1
- 01G: Health Monitoring — Tracks analysis and enrichment metrics
8. Glossary
- SAG: Semantic Attribute Grid — the structured product attribute framework
- Attribute: A dimension of product information (e.g., "Target Area," "Device Type")
- Attribute Value: A specific instance of an attribute (e.g., "Foot" for Target Area)
- Cluster: A group of related attribute values forming a content hub
- Gap: Missing element compared to SAG blueprint (hub pages, term pages, blog posts, etc.)
- Confidence Score: AI's confidence in discovered attribute (0.0-1.0)
- Dimension: Priority level of attribute (Primary, Secondary, Tertiary)
- Term Landing Page: Single page optimized for a specific attribute value
- Hub Page: Authority page for entire attribute cluster
- Auto-Tagging: Bulk assignment of attributes to products
Document Status: Ready for Development Last Review: 2026-03-23 Next Review: Post-Phase 2 Development