# 01F: IGNY8 Phase 1 — Existing Site Analysis (Case 1)

> **Version:** 1.1 (codebase-verified)
> **Source of Truth:** Codebase at `/data/app/igny8/backend/`
> **Last Verified:** 2025-07-14

**Document Type:** Build Specification
**Phase:** Phase 1: Existing Site Analysis
**Use Case:** Case 1 (Users with existing sites)
**Status:** Active Development
**Last Updated:** 2026-03-23

---

## 1. Current State

### 1.1 Existing IGNY8 WordPress Plugin

The IGNY8 WordPress plugin is currently operational with the following capabilities:

**Current Data Collection:**
- Post status tracking
- Site metadata (domain, WordPress version, plugin count, theme)
- Keyword mapping and analysis
- Site structure analysis
- Taxonomy sync across registered taxonomies
- 7 active cron jobs managing periodic data updates

**Current Plugin Endpoint:**
- `GET /wp-json/igny8/v1/health` — basic health check
- Plugin location: WordPress plugins directory
- Sync frequency: Configurable via cron (daily default)

**Limitations:**
- Does not collect detailed product data (WooCommerce stores)
- Does not analyze product descriptions for attribute patterns
- No collection of custom attribute assignments
- No menu structure analysis
- No blog content summary extraction
- No confidence scoring for discovered patterns
- Manual attribute creation required post-analysis

### 1.2 Case 1 User Journey

**Trigger:** User logs into the IGNY8 platform with an existing WordPress site (WooCommerce-based)

**Current Flow:**
1. User connects WordPress site via API key
2. Plugin syncs basic site data
3. User manually creates SAG blueprint
4. User manually defines attributes
5. User manually tags existing products

**Desired Flow:**
1. User connects WordPress site via API key
2. Plugin collects comprehensive site data (products, categories, content)
3. AI automatically extracts attributes from product titles/descriptions
4. System generates SAG blueprint with discovered attributes
5. System performs gap analysis (what's missing vs. SAG template)
6. User reviews and confirms blueprint
7. System auto-tags existing products
8. Blueprint feeds into content pipeline (01E) and cluster formation (01C)

### 1.3 Dependencies & Prerequisites

- WordPress 5.8+ with WooCommerce 5.0+
- IGNY8 plugin v2.0+ installed and activated
- OpenAI API or compatible LLM for attribute extraction
- Celery for async task processing (analysis may take 2-5 minutes)
- Database schema supports site analysis metadata storage
- Sector templates (01B) available for validation

---

## 2. What to Build

### 2.1 Enhanced Plugin: Site Data Collection

**Objective:** Extend the WordPress plugin to collect comprehensive site data for SAG analysis.

**New Plugin Endpoint:**
```
GET /wp-json/igny8/v1/sag/site-analysis
Headers:
  Authorization: Bearer {IGNY8_API_TOKEN}
Query Parameters:
  - limit_products: 500 (max products to analyze; default 500)
  - include_drafts: false (include draft products; default false)
  - cache_ttl: 3600 (cache results for N seconds; default 3600)
Response: 200 OK with payload (see section 2.3)
```

**Data Collection Modules:**

| Module | Responsibility | Data Returned |
|--------|----------------|---------------|
| ProductCollector | Extract all products with metadata | titles, descriptions, prices, categories, tags, images, custom attributes, sku |
| CategoryCollector | Map product category hierarchy | names, slugs, parent-child hierarchy, descriptions, product counts |
| TaxonomyCollector | Enumerate all custom taxonomies | taxonomy names, all registered terms, term hierarchies, term metadata |
| AttributeCollector | Extract WooCommerce attributes | attribute names, attribute types (select/text/color), all values, product assignments |
| PageCollector | Identify key pages | titles, URLs, content summaries (first 500 chars), page type detection |
| PostCollector | Extract blog posts | titles, URLs, content summaries, categories, tags, publish date |
| MenuCollector | Analyze navigation structure | menu items, hierarchy, target URLs/categories |
| PluginCollector | Document site technical stack | active plugins, theme, WordPress version, WooCommerce version |

**Implementation:**
- Location: `plugins/igny8-sync/includes/collectors/`
- Each collector implements `DataCollectorInterface` with `collect()` and `sanitize()` methods
- Data sanitization: Remove PII and HTML tags; limit text length
- Error handling: Log failures per collector; return partial data if one collector fails
- Performance: Optimize queries to avoid slowing the site down (use transients, batch operations)

**Plugin Cron Job Addition:**
- New job: `igny8_sync_sag_site_analysis` (optional; runs only when the user triggers an analysis)
- Frequency: On-demand via API call, not scheduled
- Timeout: 60 seconds (the analysis itself runs server-side via Celery)

### 2.2 AI Attribute Extraction Service

**File:** `sag/ai_functions/attribute_extraction.py`
**Register Key:** `extract_site_attributes`
**Input Type:** SiteAnalysisPayload
**Output Type:** AttributeExtractionResult

**Function Signature:**
```python
def extract_site_attributes(
    site_data: SiteAnalysisPayload,
    sector_template: Optional[SectorTemplate] = None,
    confidence_threshold: float = 0.6,
    max_attributes: int = 20
) -> AttributeExtractionResult:
    """
    Analyze site data to discover attributes.

    Args:
        site_data: Raw site data from WordPress plugin
        sector_template: Optional sector template for validation
        confidence_threshold: Min confidence to include attribute (0.0-1.0)
        max_attributes: Max attributes to return

    Returns:
        AttributeExtractionResult with discovered attributes, frequencies, confidence scores
    """
```

**Algorithm:**

1. **Text Analysis Phase**
   - Concatenate product titles and descriptions
   - Apply tokenization and noun phrase extraction
   - Identify recurring modifiers and descriptors
   - Extract from category names and tags
   - Extract from custom attribute values (if any exist)
2. **Pattern Recognition Phase**
   - Group similar terms (e.g., "back pain" + "back relief" + "lower back" → "back/spine")
   - Calculate frequency across the product dataset
   - Identify dimensional axes (e.g., "target area," "device type")
   - Score statistical significance
3. **Validation Phase**
   - Cross-reference against sector template (if provided)
   - Validate against common attribute taxonomies
   - Flag conflicting or ambiguous discoveries
   - Assign confidence scores based on:
     - Frequency (how often the term appears)
     - Consistency (appears across multiple products)
     - Specificity (not too vague)
     - Template alignment (matches known attributes)
4. **Ranking Phase**
   - Rank by frequency and confidence
   - Assign dimensionality (Primary/Secondary/Tertiary)
   - Cap results at `max_attributes`

**Output Structure:**
```json
{
  "analysis_id": 42,
  "site_id": 7,
  "timestamp": "2026-03-23T14:30:00Z",
  "analysis_confidence": 0.82,
  "attributes": [
    {
      "name": "Target Area",
      "dimension": "Primary",
      "confidence": 0.95,
      "frequency": 32,
      "discovered_from": ["product_titles", "product_descriptions", "categories"],
      "values": [
        { "value": "Neck", "frequency": 12, "example_products": ["Product A", "Product B"] },
        { "value": "Back", "frequency": 8, "example_products": ["Product C"] },
        { "value": "Foot", "frequency": 25, "example_products": ["Product D", "Product E"] }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "body_region",
        "alignment_score": 0.98
      }
    },
    {
      "name": "Device Type",
      "dimension": "Primary",
      "confidence": 0.88,
      "frequency": 28,
      "discovered_from": ["product_titles", "product_descriptions"],
      "values": [
        { "value": "Shiatsu", "frequency": 18, "example_products": ["Product F"] },
        { "value": "EMS", "frequency": 7, "example_products": ["Product G"] },
        { "value": "Percussion", "frequency": 3, "example_products": ["Product H"] }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "therapy_type",
        "alignment_score": 0.91
      }
    },
    {
      "name": "Heat Setting",
      "dimension": "Secondary",
      "confidence": 0.72,
      "frequency": 15,
      "discovered_from": ["product_descriptions"],
      "values": [
        { "value": "Heated", "frequency": 15, "example_products": ["Product I", "Product J"] }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "heat_enabled",
        "alignment_score": 0.85
      }
    }
  ],
  "low_confidence_discoveries": [
    {
      "name": "Brand",
      "confidence": 0.55,
      "reason": "High variability, many single-mention values"
    }
  ],
  "analysis_notes": {
    "total_products_analyzed": 50,
    "total_categories": 8,
    "total_tags": 23,
    "extraction_method": "llm_analysis",
    "model_used": "gpt-4-turbo"
  }
}
```

**Error Handling:**
- Insufficient data: Log a warning and return an empty attributes list
- LLM API failure: Retry with exponential backoff (3 retries)
- Timeout (>5 minutes): Abort and return partial results
- Invalid sector template: Log an error and continue the analysis without validation

**Performance Considerations:**
- Cache sector templates in memory
- Batch LLM calls (process 5-10 products per API call)
- Store extraction results in the database for an audit trail
- Return results within 2-5 minutes for typical sites

### 2.3 Data Models

#### SiteAnalysisPayload

```python
from dataclasses import dataclass
from typing import List, Dict, Optional


@dataclass
class Product:
    id: int
    title: str
    description: str
    sku: str
    price: float
    categories: List[str]
    tags: List[str]
    custom_attributes: Dict[str, List[str]]
    image_urls: List[str]


@dataclass
class Category:
    id: int
    name: str
    slug: str
    parent_id: Optional[int]
    description: str
    product_count: int


@dataclass
class Taxonomy:
    name: str
    label: str
    is_hierarchical: bool
    terms: List['Term']


@dataclass
class Term:
    id: int
    name: str
    slug: str
    parent_id: Optional[int]
    description: str
    count: int


@dataclass
class Page:
    id: int
    title: str
    url: str
    content_summary: str
    page_type: str  # e.g., "shop", "landing", "faq"


@dataclass
class Post:
    id: int
    title: str
    url: str
    content_summary: str
    categories: List[str]
    tags: List[str]
    publish_date: str


@dataclass
class MenuItem:
    id: int
    title: str
    url: str
    target: str
    parent_id: Optional[int]


@dataclass
class SiteMetadata:
    site_id: int
    domain: str
    wordpress_version: str
    woocommerce_version: str
    total_products: int
    total_categories: int
    total_pages: int
    total_posts: int
    active_plugins: List[str]
    theme: str


@dataclass
class SiteAnalysisPayload:
    metadata: SiteMetadata
    products: List[Product]
    categories: List[Category]
    taxonomies: List[Taxonomy]
    pages: List[Page]
    posts: List[Post]
    menus: List[MenuItem]
    collected_at: str  # ISO 8601 timestamp
```

#### AttributeExtractionResult

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttributeValue:
    value: str
    frequency: int
    example_products: List[str]


@dataclass
class TemplateValidation:
    matched_sector: str
    matched_attribute: str
    alignment_score: float


@dataclass
class DiscoveredAttribute:
    name: str
    dimension: str  # "Primary", "Secondary", "Tertiary"
    confidence: float  # 0.0-1.0
    frequency: int
    discovered_from: List[str]  # ["product_titles", "product_descriptions", ...]
    values: List[AttributeValue]
    template_validation: Optional[TemplateValidation]


@dataclass
class LowConfidenceDiscovery:
    name: str
    confidence: float
    reason: str


@dataclass
class AnalysisNotes:
    total_products_analyzed: int
    total_categories: int
    total_tags: int
    extraction_method: str
    model_used: str


@dataclass
class AttributeExtractionResult:
    analysis_id: int
    site_id: int
    timestamp: str
    analysis_confidence: float
    attributes: List[DiscoveredAttribute]
    low_confidence_discoveries: List[LowConfidenceDiscovery]
    analysis_notes: AnalysisNotes
```

### 2.4 Gap Analysis Service

**File:** `sag/services/gap_analysis_service.py`
**Class:** `GapAnalysisService`
**Method:** `analyze_gap(site_data: SiteAnalysisPayload, blueprint: SAGBlueprint) -> GapAnalysisReport`

**Purpose:** Compare the existing site structure against the SAG blueprint to identify gaps.

**Analysis Dimensions:**

1. **Attribute Coverage Gap**
   - SAG blueprint specifies X attributes
   - Site currently has Y custom attributes assigned to products
   - Gap: Missing attributes or low coverage (% of products with attribute values)
2. **Hub Page Gap**
   - Blueprint specifies Z cluster hubs
   - Site analysis identifies M existing pages
   - Gap: Missing hub pages (authority pages for attribute clusters)
3. **Term Landing Page Gap**
   - Blueprint specifies N attribute values requiring term landing pages
   - Site has existing category/tag pages
   - Gap: Missing term landing pages (one per attribute value)
4. **Blog Content Gap**
   - Blueprint specifies recommended blog posts per cluster
   - Site has P existing blog posts
   - Gap: Blog content aligned to clusters and keyword targets
5. **Internal Linking Gap**
   - Blueprint specifies internal linking strategy
   - Site has current internal link structure
   - Gap: Missing cross-cluster and term-to-hub links
6. **Product Enrichment Gap**
   - Products lacking attribute assignments
   - Products missing description optimization
   - Products missing images
7. **Technical SEO Gap**
   - Missing schema markup for products
   - Category pages lacking optimization
   - Menu structure not optimized for crawlability

**Output Structure:**
```json
{
  "analysis_id": 42,
  "site_id": 7,
  "blueprint_id": 15,
  "timestamp": "2026-03-23T14:30:00Z",
  "summary": {
    "products_current": 50,
    "products_gap": 0,
    "attributes_current": 3,
    "attributes_blueprint": 8,
    "attributes_gap": 5,
    "hub_pages_current": 2,
    "hub_pages_blueprint": 4,
    "hub_pages_gap": 2,
    "term_pages_current": 12,
    "term_pages_blueprint": 35,
    "term_pages_gap": 23,
    "blog_posts_current": 8,
    "blog_posts_blueprint": 24,
    "blog_posts_gap": 16,
    "overall_gap_percentage": 62
  },
  "attributes_gap_detail": [
    {
      "attribute": "Target Area",
      "coverage_current": "100% (50/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "None — attribute well-covered"
    },
    {
      "attribute": "Device Type",
      "coverage_current": "80% (40/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "10 products missing Device Type assignment"
    }
  ],
  "hub_pages_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "status": "EXISTS",
      "url": "/shop/foot-massagers",
      "optimization_notes": "Good; consider adding testimonials section"
    },
    {
      "cluster": "Neck & Shoulder Relief",
      "status": "MISSING",
      "recommendation": "Create hub page at /neck-shoulder-relief"
    }
  ],
  "term_pages_gap_detail": [
    {
      "attribute": "Target Area",
      "term": "Neck",
      "status": "MISSING",
      "recommendation": "Create term page at /target-area/neck (products filter + blog links)"
    }
  ],
  "blog_posts_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "recommended_posts": [
        "Best Foot Massagers for Neuropathy",
        "How to Use Shiatsu Foot Massagers",
        "Foot Massage Benefits"
      ],
      "existing_posts": ["Foot Massage 101"],
      "gap": 2
    }
  ],
  "internal_linking_gap": {
    "status": "High gaps identified",
    "recommendation": "Blueprint specifies 3-5 internal links per hub page; current average: 1.2",
    "priority_links": [
      "Neck hub → Foot hub (shared body region cluster)",
      "Device Type pages → Hub pages",
      "Blog posts → Related term pages"
    ]
  },
  "actionable_recommendations": [
    "IMMEDIATE: Assign Device Type to 10 untagged products",
    "WEEK 1: Create 2 missing hub pages",
    "WEEK 2: Create 23 term landing pages via script",
    "WEEK 3: Bulk create 16 blog posts (outline + AI generation)",
    "WEEK 4: Implement internal linking strategy"
  ]
}
```

### 2.5 Product Auto-Tagging Service

**File:** `sag/services/auto_tagger_service.py`
**Class:** `ProductAutoTagger`
**Method:** `generate_tag_suggestions(products: List[Product], attributes: List[DiscoveredAttribute], blueprint: SAGBlueprint) -> List[TagSuggestion]`

**Purpose:** Generate batch product-to-attribute assignments based on product titles/descriptions.

**Algorithm:**

1. For each product:
   - Extract key terms from title and description
   - Match against attribute values (fuzzy matching allowed)
   - Score confidence for each attribute assignment
   - Rank by confidence
2. For each attribute:
   - Verify the assignment makes semantic sense
   - Check for conflicting assignments (e.g., a product can't be both "Shiatsu" and "EMS")
   - Return a ranked list
3. Group by product for the review UI

**Output Structure:**
```json
{
  "batch_id": 23,
  "site_id": 7,
  "blueprint_id": 15,
  "timestamp": "2026-03-23T14:30:00Z",
  "total_products": 50,
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": 123,
      "product_title": "Nekteck Foot Massager with Heat",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        },
        {
          "attribute": "Device Type",
          "value": "Shiatsu",
          "confidence": 0.82,
          "reasoning": "Description mentions shiatsu nodes"
        },
        {
          "attribute": "Heat Setting",
          "value": "Heated",
          "confidence": 0.95,
          "reasoning": "Title explicitly states 'with Heat'"
        }
      ],
      "status": "pending_review"
    }
  ],
  "summary": {
    "high_confidence_suggestions": 72,
    "medium_confidence_suggestions": 12,
    "low_confidence_suggestions": 3,
    "conflicts_detected": 0,
    "ready_to_apply": true
  }
}
```

---

## 3. APIs & Endpoints

### 3.1 Backend API Endpoints

All endpoints are authenticated via the `Authorization: Bearer {IGNY8_API_TOKEN}` header.

#### POST /api/v1/sag/sites/{site_id}/analyze/

**Purpose:** Trigger comprehensive site analysis (async).

**Request:**
```json
{
  "include_draft_products": false,
  "product_limit": 500,
  "sector_template_id": null,
  "webhook_url": "optional_https_url_for_completion_notification"
}
```

**Response:** 202 Accepted
```json
{
  "task_id": "celery_task_uuid",
  "site_id": 7,
  "status": "queued",
  "estimated_duration_seconds": 120,
  "check_status_url": "/api/v1/sag/sites/{site_id}/analysis-status/?task_id={task_id}"
}
```

**Error Responses:**
- 400: Invalid parameters
- 401: Unauthorized
- 404: Site not found
- 429: Rate limited (max 1 analysis per 30 minutes per site)

---

#### GET /api/v1/sag/sites/{site_id}/analysis-status/

**Purpose:** Check analysis progress.

**Query Parameters:**
- `task_id` (required): Celery task ID from the analysis trigger

**Response:** 200 OK
```json
{
  "task_id": "celery_task_uuid",
  "site_id": 7,
  "status": "processing",
  "progress_percent": 45,
  "current_step": "Analyzing product attributes",
  "elapsed_seconds": 32,
  "estimated_remaining_seconds": 48
}
```

**Status Values:**
- `queued` — waiting to start
- `processing` — actively analyzing
- `complete` — analysis finished
- `failed` — analysis error (see error message)

---

#### GET /api/v1/sag/sites/{site_id}/analysis-results/

**Purpose:** Retrieve completed analysis results.

**Response:** 200 OK
```json
{
  "analysis_id": 42,
  "site_id": 7,
  "timestamp": "2026-03-23T14:30:00Z",
  "site_data_summary": {
    "total_products": 50,
    "total_categories": 8,
    "total_pages": 12,
    "total_posts": 8
  },
  "extracted_attributes": {
    "analysis_confidence": 0.82,
    "attributes_count": 8,
    "attributes": [
      {
        "name": "Target Area",
        "dimension": "Primary",
        "confidence": 0.95,
        ...
      }
    ]
  },
  "gap_analysis": {
    "overall_gap_percentage": 62,
    "summary": { ... }
  },
  "status": "ready_for_review"
}
```

**Status Values:**
- `ready_for_review` — user should review before confirming
- `confirmed` — user has accepted analysis
- `archived` — superseded by newer analysis

---

#### POST /api/v1/sag/sites/{site_id}/confirm-analysis/

**Purpose:** User confirms the analysis; this creates the SAG blueprint.

**Request:**
```json
{
  "analysis_id": 42,
  "approved_attributes": [
    {
      "name": "Target Area",
      "approved_values": ["Neck", "Back", "Foot"],
      "exclude_values": []
    }
  ],
  "confirmed_by_user_id": 3
}
```

**Response:** 201 Created
```json
{
  "blueprint_id": 15,
  "site_id": 7,
  "analysis_id": 42,
  "status": "created",
  "attributes_count": 8,
  "attribute_values_count": 45,
  "created_at": "2026-03-23T14:32:00Z",
  "next_steps": [
    "Review auto-tagging suggestions",
    "Approve product tags",
    "Start content pipeline (01E)"
  ]
}
```

---

#### GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/

**Purpose:** Retrieve product auto-tagging suggestions.

**Query Parameters:**
- `blueprint_id` (required): ID of confirmed blueprint
- `confidence_min` (optional): Filter by minimum confidence (0.0-1.0, default 0.6)
- `limit` (optional): Max suggestions per product (default 5)

**Response:** 200 OK
```json
{
  "batch_id": 23,
  "blueprint_id": 15,
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": 123,
      "product_title": "Nekteck Foot Massager",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        }
      ]
    }
  ]
}
```

---

#### POST /api/v1/sag/sites/{site_id}/auto-tag/apply/

**Purpose:** Apply approved product tags to the site (async bulk operation).
**Request:** ```json { "blueprint_id": 15, "approved_suggestions": [ { "product_id": 123, "approved_tags": [ { "attribute": "Target Area", "value": "Foot" } ] } ], "skip_existing_values": true } ``` **Response:** 202 Accepted ```json { "task_id": "celery_task_uuid", "site_id": 7, "blueprint_id": 15, "status": "processing", "products_to_tag": 47, "tags_to_apply": 87, "check_status_url": "/api/v1/sag/sites/{site_id}/auto-tag/status/?task_id={task_id}" } ``` --- #### GET /api/v1/sag/sites/{site_id}/auto-tag/status/ **Purpose:** Check auto-tagging progress. **Query Parameters:** - `task_id` (required): Celery task ID **Response:** 200 OK ```json { "task_id": "celery_task_uuid", "site_id": 7, "status": "processing", "progress_percent": 62, "products_tagged": 29, "total_products": 47, "tags_applied": 54, "estimated_remaining_seconds": 30 } ``` --- ### 3.2 WordPress Plugin Endpoint #### GET /wp-json/igny8/v1/sag/site-analysis **Purpose:** Collect comprehensive site data for analysis. **Headers:** - `Authorization: Bearer {IGNY8_API_TOKEN}` - `X-IGNY8-Request-ID: {uuid}` (optional, for request tracking) **Query Parameters:** - `limit_products`: int (1-1000, default 500) - `include_drafts`: boolean (default false) - `cache_ttl`: int (seconds, default 3600) **Response:** 200 OK ```json { "metadata": { "site_id": 7, "domain": "example-store.com", "wordpress_version": "6.4.2", "woocommerce_version": "8.5.0", "total_products": 50, "total_categories": 8, "total_pages": 12, "total_posts": 8, "active_plugins": ["woocommerce", "yoast-seo", ...], "theme": "storefront" }, "products": [ { "id": 123, "title": "Nekteck Foot Massager with Heat", "description": "Premium foot massage device...", "sku": "NEKTECK-FM-001", "price": 79.99, "categories": ["Foot Massagers", "Massage Devices"], "tags": ["heated", "cordless"], "custom_attributes": { "brand": ["Nekteck"], "color": ["Black"], "warranty": ["2 Year"] }, "image_urls": ["image1.jpg", "image2.jpg"] } ], "categories": [ { "id": 1, "name": 
"Foot Massagers", "slug": "foot-massagers", "parent_id": null, "description": "Electronic foot massage devices", "product_count": 12 } ], "taxonomies": [ { "name": "brand", "label": "Brand", "is_hierarchical": false, "terms": [ { "id": 1, "name": "Nekteck", "slug": "nekteck", "parent_id": null, "description": "", "count": 5 } ] } ], "pages": [ { "id": 1, "title": "Shop", "url": "/shop", "content_summary": "Browse our selection of massage devices", "page_type": "shop" } ], "posts": [ { "id": 1, "title": "Benefits of Foot Massage", "url": "/blog/foot-massage-benefits", "content_summary": "Learn why foot massage is beneficial...", "categories": ["Health"], "tags": ["foot", "massage"], "publish_date": "2026-03-15" } ], "menus": [ { "id": 1, "title": "Main Menu", "items": [ { "id": 1, "title": "Shop", "url": "/shop", "target": "_self", "parent_id": null } ] } ], "collected_at": "2026-03-23T14:30:00Z" } ``` **Error Responses:** - 400: Invalid query parameters - 401: Invalid or missing API token - 500: Plugin error (logged on WordPress side) **Performance:** - Response time target: <5 seconds for sites with <500 products - Data is cached for 1 hour (configurable via `cache_ttl`) - Uses WordPress transients API for caching --- ## 4. Implementation Steps ### Phase 1: Plugin Enhancement (Week 1) **Tasks:** 1. Create collector classes in `plugins/igny8-sync/includes/collectors/` - ProductCollector - CategoryCollector - TaxonomyCollector - AttributeCollector - PageCollector - PostCollector - MenuCollector - PluginCollector 2. Implement `DataCollectorInterface` - `collect()` method (fetches raw data) - `sanitize()` method (removes PII, normalizes format) - Error handling per collector 3. Add `/wp-json/igny8/v1/sag/site-analysis` endpoint - Route definition - Parameter validation - Response formatting - Caching logic 4. 
Add unit tests for collectors - Mock data tests - Error condition tests - Performance tests **Acceptance Criteria:** - Endpoint returns valid JSON payload matching schema - All 8 collectors implemented and tested - Response time <5 seconds for 500 products - Caching works correctly - Error handling tested --- ### Phase 2: AI Attribute Extraction (Week 1-2) **Tasks:** 1. Implement `attribute_extraction.py` - Text analysis functions - Pattern recognition logic - Confidence scoring - Validation against sector templates 2. Register with LLM framework - Implement `extract_site_attributes` function - Add input/output validation - Error handling (retry logic) 3. Create data models - DiscoveredAttribute - AttributeValue - TemplateValidation - AttributeExtractionResult 4. Add unit and integration tests - Mock LLM responses - Test with real site data - Confidence scoring validation - Performance tests (2-5 minute runtime) **Acceptance Criteria:** - Extracts 5-20 attributes from sample site data - Confidence scores accurate and meaningful - Sector template validation works - Low-confidence discoveries flagged - Results auditable (model used, reasoning provided) --- ### Phase 3: Gap Analysis Service (Week 2) **Tasks:** 1. Implement `gap_analysis_service.py` - GapAnalysisService class - analyze_gap() method - All 7 gap dimensions analyzed 2. Create gap analysis models - GapAnalysisReport - Recommendation structures - Detail sections 3. Integrate with blueprint comparison - Query SAG blueprint - Compare against site data - Calculate gap percentages 4. Add unit tests - Test each gap dimension - Test recommendation generation - Test report structure **Acceptance Criteria:** - All 7 gap dimensions analyzed - Report clearly identifies missing elements - Actionable recommendations provided - Report generated in <1 second --- ### Phase 4: API Endpoints (Week 2-3) **Tasks:** 1. 
Implement analysis trigger endpoint - POST /api/v1/sag/sites/{site_id}/analyze/ - Celery task queueing - Webhook support 2. Implement status check endpoint - GET /api/v1/sag/sites/{site_id}/analysis-status/ - Real-time progress updates 3. Implement results retrieval endpoint - GET /api/v1/sag/sites/{site_id}/analysis-results/ - Caching of results 4. Implement blueprint confirmation endpoint - POST /api/v1/sag/sites/{site_id}/confirm-analysis/ - Attribute approval logic - Blueprint creation 5. Add request/response validation - Marshmallow schemas - Error responses 6. Add authentication/authorization checks - API token validation - User site ownership verification **Acceptance Criteria:** - All 4 endpoints implemented - Endpoints return correct status codes - Validation working - Authentication required and checked - Error responses follow standard format --- ### Phase 5: Product Auto-Tagging (Week 3) **Tasks:** 1. Implement `auto_tagger_service.py` - ProductAutoTagger class - generate_tag_suggestions() method - Confidence scoring 2. Create auto-tagging endpoints - GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/ - POST /api/v1/sag/sites/{site_id}/auto-tag/apply/ - GET /api/v1/sag/sites/{site_id}/auto-tag/status/ 3. Implement Celery task for bulk tagging - Batch product processing - Conflict detection - Error handling 4. Add unit tests - Test suggestion generation - Test bulk tagging - Test conflict detection **Acceptance Criteria:** - Suggestions endpoint returns valid suggestions - Confidence scores reasonable (0.6+) - Bulk tagging applies tags correctly to products - Progress tracking works - 47+ products can be tagged in <2 minutes --- ### Phase 6: Frontend Components — React + TypeScript (Week 3-4) > **Tech Stack:** React ^19.0.0, TypeScript ~5.7.2, Vite ^6.1.0, Zustand ^5.0.8, Tailwind ^4.0.8 > All components are `.tsx` files in the `frontend/src/` directory. **Tasks:** 1. 
Implement SiteAnalysisPanel - Trigger analysis button - Progress indicator - Error messaging 2. Implement DiscoveredAttributesReview - Display discovered attributes - Show confidence scores - Allow approval/rejection per attribute - Show example products 3. Implement GapAnalysisReport - Visual representation of gaps - Actionable recommendations - Priority ordering 4. Implement AutoTagReviewPanel - Display product suggestions - Batch selection/deselection - Apply tags button - Progress tracking 5. Add styling and UX polish - Responsive design - Loading states - Error states - Success confirmations **Acceptance Criteria:** - All 4 components implemented - Responsive on desktop/tablet - Accessible (WCAG 2.1 AA) - User can complete workflow without errors - Loading/error states clearly communicated --- ### Phase 7: Integration & Testing (Week 4) **Tasks:** 1. End-to-end testing - Connect real WordPress site - Run full analysis workflow - Confirm blueprint created - Verify auto-tagging works 2. Performance testing - Benchmark analysis with various site sizes - Optimize slow operations - Load testing on API endpoints 3. Documentation - API documentation (OpenAPI/Swagger) - Plugin setup guide - User guide for Case 1 workflow - Developer setup guide 4. Bug fixing and refinement - Fix integration issues - Refine UI/UX based on testing - Improve error messages **Acceptance Criteria:** - End-to-end workflow works without errors - Performance meets targets (analysis <5 min for 500 products) - Documentation complete - All bugs fixed - Ready for beta testing --- ## 5. 
Acceptance Criteria ### 5.1 Functional Requirements **Site Data Collection:** - Plugin collects all 8 data types (products, categories, taxonomies, pages, posts, menus, attributes, metadata) - Data is valid JSON matching defined schema - All product titles/descriptions included - Custom attribute values extracted correctly - Menu hierarchy preserved **Attribute Extraction:** - AI identifies 5-20 attributes from site data - Confidence scores meaningful and accurate - Low-confidence discoveries flagged - Sector template validation working - Results include frequency counts and example products **Gap Analysis:** - All 7 gap dimensions analyzed - Missing hubs, term pages, blog posts clearly identified - Product attribute coverage calculated - Internal linking gaps identified - Actionable recommendations provided **Blueprint Creation:** - Confirmed analysis creates valid SAGBlueprint - Attributes and values recorded correctly - Gap analysis linked to blueprint - Blueprint feeds into cluster formation (01C) **Product Auto-Tagging:** - Suggestions generated for 90%+ of products - Confidence scores reasonable (0.6+) - Bulk tagging applies tags correctly - No data loss or corruption - Existing tags not overwritten (configurable) **API Endpoints:** - All 4 analysis endpoints implemented - All 3 auto-tagging endpoints implemented - Correct HTTP status codes - Valid error responses - Authentication required **Frontend Components:** - SiteAnalysisPanel triggers analysis and shows progress - DiscoveredAttributesReview allows attribute approval - GapAnalysisReport displays gaps clearly - AutoTagReviewPanel allows batch product tagging - All components responsive and accessible ### 5.2 Non-Functional Requirements **Performance:** - Site analysis completes in <5 minutes for typical sites (50-500 products) - WordPress plugin endpoint responds in <5 seconds - API endpoints respond in <2 seconds - Frontend components load in <3 seconds **Reliability:** - Plugin handles errors 
  gracefully (missing products, etc.)
- Partial failures return partial data with warnings
- Celery tasks have retry logic
- Webhook notifications reliable

**Security:**

- API token authentication required
- Users can only access their own sites
- No PII in logs
- HTTPS enforced
- Input validation on all endpoints

**Scalability:**

- Plugin handles 1000+ products
- API handles 100+ concurrent analysis requests
- Database indexes optimized for queries
- Caching prevents redundant processing

**Data Quality:**

- Analysis results auditable (model used, timestamps, reasoning)
- No duplicate attribute suggestions
- Confidence scores calibrated
- Low-confidence results flagged for review

### 5.3 User Experience Requirements

**Clarity:**

- User understands the analysis process and the time it requires
- Gap analysis clearly shows what's missing
- Recommendations are actionable
- Error messages explain what went wrong

**Simplicity:**

- Workflow has at most 5 steps (analyze → review → confirm → auto-tag → apply)
- One button to trigger analysis
- Clear next steps after each stage

**Feedback:**

- Real-time progress updates during analysis
- Success/error notifications
- Ability to view raw analysis results
- Audit trail of approvals

---

## 6. Claude Code Instructions

### 6.1 Skill Development

**Skill Name:** `igny8-case1-analysis`
**Version:** 2.0
**Prerequisites:** IGNY8 platform deployed, WordPress plugin v2.0+, Celery configured

**Skill Workflow:**

```yaml
Trigger: User connects existing WordPress site to IGNY8

Steps:
  - Step 1: Collect Site Data
    Call: POST /api/v1/sag/sites/{site_id}/analyze/
    Wait: Poll /api/v1/sag/sites/{site_id}/analysis-status/ every 10 seconds
    Timeout: 5 minutes
    Output: task_id for tracking

  - Step 2: Retrieve Analysis Results
    Call: GET /api/v1/sag/sites/{site_id}/analysis-results/
    Parse: extracted_attributes, gap_analysis
    Display: DiscoveredAttributesReview panel
    User action: Approve/reject attributes

  - Step 3: Confirm Analysis
    Call: POST /api/v1/sag/sites/{site_id}/confirm-analysis/
    Payload: approved_attributes from user review
    Output: blueprint_id
    Display: Gap analysis report
    Next: Show auto-tagging recommendations

  - Step 4: Generate Auto-Tag Suggestions
    Call: GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/?blueprint_id={blueprint_id}
    Display: AutoTagReviewPanel
    User action: Select products to tag

  - Step 5: Apply Auto-Tags
    Call: POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
    Wait: Poll /api/v1/sag/sites/{site_id}/auto-tag/status/ every 5 seconds
    Timeout: 10 minutes
    Output: Number of tags applied, products tagged

  - Step 6: Complete & Next Steps
    Display: Success message
    Recommendations: Run cluster formation (01C), start content pipeline (01E)
    Links: View blueprint, view gap report, start cluster creation
```

### 6.2 Development Checklist

**Code Quality:**

- [ ] All functions have docstrings
- [ ] Type hints on all function parameters and returns
- [ ] Logging at DEBUG, INFO, WARNING levels as appropriate
- [ ] Error handling with specific exception types
- [ ] No hardcoded values (use config/env vars)

**Testing:**

- [ ] Unit tests for each service (>80% coverage)
- [ ] Integration tests for API endpoints
- [ ] Fixtures for sample site data
- [ ] Mock LLM responses for deterministic tests
- [ ] Performance tests for analysis (time and memory)

**Documentation:**

- [ ] Docstrings follow Google style
- [ ] README with setup instructions
- [ ] API documentation in OpenAPI format
- [ ] Example requests/responses for each endpoint
- [ ] Troubleshooting guide for common errors

**Security:**

- [ ] API token validation on all endpoints
- [ ] User ownership checks before accessing site data
- [ ] Input validation with Marshmallow
- [ ] SQL injection prevention (use ORM)
- [ ] No credentials in logs or errors

**Performance:**

- [ ] Database queries indexed
- [ ] Caching implemented for plugin endpoint
- [ ] Celery task optimization
- [ ] LLM API call batching
- [ ] Frontend component lazy loading

### 6.3 Debugging & Troubleshooting

**Common Issues:**

**Issue:** Analysis hangs or times out

- Check: Celery worker status (`celery -A igny8_core inspect active`)
- Check: Redis/message queue status
- Check: LLM API rate limits
- Solution: Reduce the product limit (`limit_products`), then retry the analysis

**Issue:** Plugin endpoint returns partial data

- Check: Specific collector failure (check logs)
- Solution: Fix the collector, then re-run the analysis (uses cache bypass)
- Note: Partial data is returned if one collector fails

**Issue:** Auto-tagging misses products

- Check: Product title/description quality (missing keywords)
- Check: Confidence threshold (lower if needed)
- Solution: Review low-confidence suggestions, adjust threshold

**Issue:** Gap analysis shows 100% gaps

- Check: Blueprint created correctly
- Check: Gap analysis query (verify `site_id` matches)
- Solution: Re-run analysis, confirm blueprint

### 6.4 Integration Checkpoints

**Integration with 01A (SAGBlueprint):**

- Confirmed analysis creates SAGBlueprint via `POST /api/v1/sag/sites/{site_id}/confirm-analysis/`
- Blueprint includes extracted attributes and values
- Blueprint links to analysis for audit trail
- Blueprint ready for cluster formation (01C)

**Integration with 01B (Sector Templates):**

- Attribute extraction uses sector template for validation (optional parameter)
- Alignment scores show how closely discovered attributes match the template
- Low-confidence discoveries flagged if they don't align with the template
- Template selection based on site category detection

**Integration with 01C (Cluster Formation):**

- Blueprint created from Case 1 analysis feeds into cluster formation
- Attributes and values used to create cluster hierarchies
- Cluster formation references `blueprint_id` for traceability
- Users can override clusters if needed

**Integration with 01E (Content Pipeline):**

- Blueprint creation triggers content pipeline pre-planning
- Gap analysis informs content prioritization
- Hub page templates created for missing clusters
- Blog post outlines generated for content gaps

**Integration with 01G (Health Monitoring):**

- Analysis metrics stored for health dashboard
- Gap analysis metrics tracked over time
- Product attribute coverage tracked
- Auto-tagging success rate monitored

---

## 7. Related Documents

- **01A:** SAGBlueprint Definition — Output of Case 1 analysis
- **01B:** Sector Templates — Used for attribute validation
- **01C:** Cluster Formation — Consumes SAGBlueprint from Case 1
- **01D:** Case 2 Wizard — Alternative path for new sites
- **01E:** Content Pipeline — Consumes blueprint and gap analysis
- **01G:** Health Monitoring — Tracks analysis and enrichment metrics

---

## 8. Glossary

- **SAG:** Semantic Attribute Grid — the structured product attribute framework
- **Attribute:** A dimension of product information (e.g., "Target Area," "Device Type")
- **Attribute Value:** A specific instance of an attribute (e.g., "Foot" for Target Area)
- **Cluster:** A group of related attribute values forming a content hub
- **Gap:** An element missing relative to the SAG blueprint (hub pages, term pages, blog posts, etc.)
- **Confidence Score:** The AI's confidence in a discovered attribute (0.0-1.0)
- **Dimension:** Priority level of an attribute (Primary, Secondary, Tertiary)
- **Term Landing Page:** A single page optimized for a specific attribute value
- **Hub Page:** An authority page for an entire attribute cluster
- **Auto-Tagging:** Bulk assignment of attributes to products

---

**Document Status:** Ready for Development
**Last Review:** 2026-03-23
**Next Review:** Post-Phase 2 Development
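Steps 1 and 5 of the 6.1 skill workflow both poll a status endpoint at a fixed interval until the task finishes or a timeout expires. Analysis uses 10 seconds with a 5 minute limit, and auto-tagging uses 5 seconds with a 10 minute limit. Below is a minimal Python sketch of that loop. It is not taken from the IGNY8 codebase. It assumes the status payload is a dict whose `status` field ends as `"completed"` or `"failed"`. Those state names, `poll_until_done`, and `AnalysisTimeoutError` are hypothetical. The HTTP request is injected as `fetch_status`, and `sleep`/`clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal values of the "status" field; not confirmed by the IGNY8 API.
TERMINAL_STATES = frozenset({"completed", "failed"})


class AnalysisTimeoutError(TimeoutError):
    """Hypothetical error raised when a task misses its polling deadline."""


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 10.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll fetch_status() until it reports a terminal state or the timeout expires.

    Args:
        fetch_status: Returns the decoded status payload, e.g. from
            GET /api/v1/sag/sites/{site_id}/analysis-status/.
        interval_s: Seconds to wait between polls.
        timeout_s: Total time budget in seconds.
        sleep: Injectable sleep function (for tests).
        clock: Injectable monotonic clock (for tests).

    Returns:
        The first payload whose "status" is a terminal state.

    Raises:
        AnalysisTimeoutError: If no terminal state is seen within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATES:
            return payload
        # Give up rather than scheduling a poll that would land after the deadline.
        if clock() + interval_s > deadline:
            raise AnalysisTimeoutError(
                f"task not finished after {timeout_s:g} seconds"
            )
        sleep(interval_s)
```

For the analysis step this would be called as `poll_until_done(fetch, interval_s=10, timeout_s=300)`, and for auto-tagging as `poll_until_done(fetch, interval_s=5, timeout_s=600)`. Measuring the deadline with `time.monotonic` instead of wall-clock time keeps the timeout correct if the system clock is adjusted mid-poll.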