# 01F: IGNY8 Phase 1 — Existing Site Analysis (Case 1)
**Document Type:** Build Specification
**Phase:** Phase 1: Existing Site Analysis
**Use Case:** Case 1 (Users with existing sites)
**Status:** Active Development
**Last Updated:** 2026-03-23

---
## 1. Current State
### 1.1 Existing IGNY8 WordPress Plugin
The IGNY8 WordPress plugin is currently operational with the following capabilities:
**Current Data Collection:**
- Post status tracking
- Site metadata (domain, WordPress version, plugin count, theme)
- Keyword mapping and analysis
- Site structure analysis
- Taxonomy sync across registered taxonomies
- 7 active cron jobs managing periodic data updates
**Current Plugin Endpoint:**
- `GET /wp-json/igny8/v1/health` — basic health check
- Plugin location: WordPress plugins directory
- Sync frequency: Configurable via cron (daily default)
**Limitations:**
- Does not collect detailed product data (WooCommerce stores)
- Does not analyze product descriptions for attribute patterns
- No collection of custom attribute assignments
- No menu structure analysis
- No blog content summary extraction
- No confidence scoring for discovered patterns
- Manual attribute creation required post-analysis
### 1.2 Case 1 User Journey
**Trigger:** User logs into the IGNY8 platform with an existing WooCommerce-based WordPress site
**Current Flow:**
1. User connects WordPress site via API key
2. Plugin syncs basic site data
3. User manually creates SAG blueprint
4. User manually defines attributes
5. User manually tags existing products
**Desired Flow:**
1. User connects WordPress site via API key
2. Plugin collects comprehensive site data (products, categories, content)
3. AI automatically extracts attributes from product titles/descriptions
4. System generates SAG blueprint with discovered attributes
5. System performs gap analysis (what's missing vs. SAG template)
6. User reviews and confirms blueprint
7. System auto-tags existing products
8. Blueprint feeds into content pipeline (01E) and cluster formation (01C)
### 1.3 Dependencies & Prerequisites
- WordPress 5.8+ with WooCommerce 5.0+
- IGNY8 plugin v2.0+ installed and activated
- OpenAI API or compatible LLM for attribute extraction
- Celery for async task processing (analysis may take 2-5 minutes)
- Database schema supports site analysis metadata storage
- Sector templates (01B) available for validation
---
## 2. What to Build
### 2.1 Enhanced Plugin: Site Data Collection
**Objective:** Extend WordPress plugin to collect comprehensive site data for SAG analysis.
**New Plugin Endpoint:**
```
GET /wp-json/igny8/v1/sag/site-analysis
Headers: Authorization: Bearer {IGNY8_API_TOKEN}
Query Parameters:
- limit_products: 500 (max products to analyze; default 500)
- include_drafts: false (include draft products; default false)
- cache_ttl: 3600 (cache results for N seconds; default 3600)
Response: 200 OK with payload (see section 2.3)
```
**Data Collection Modules:**
| Module | Responsibility | Data Returned |
|--------|-----------------|----------------|
| ProductCollector | Extract all products with metadata | titles, descriptions, prices, categories, tags, images, custom attributes, sku |
| CategoryCollector | Map product category hierarchy | names, slugs, parent-child hierarchy, descriptions, product counts |
| TaxonomyCollector | Enumerate all custom taxonomies | taxonomy names, all registered terms, term hierarchies, term metadata |
| AttributeCollector | Extract WooCommerce attributes | attribute names, attribute types (select/text/color), all values, product assignments |
| PageCollector | Identify key pages | titles, URLs, content summaries (first 500 chars), page type detection |
| PostCollector | Extract blog posts | titles, URLs, content summaries, categories, tags, publish date |
| MenuCollector | Analyze navigation structure | menu items, hierarchy, target URLs/categories |
| PluginCollector | Document site technical stack | active plugins, theme, WordPress version, WooCommerce version |
**Implementation:**
- Location: `plugins/igny8-sync/includes/collectors/`
- Each collector implements `DataCollectorInterface` with `collect()` and `sanitize()` methods
- Data sanitization: Remove PII, HTML tags, limit text length
- Error handling: Log failures per collector, return partial data if one collector fails
- Performance: Optimize queries to avoid site slowdown (use transients, batch operations)
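The plugin itself is PHP, so the following Python sketch only illustrates the control flow described above: each collector runs independently, a failure is logged for that collector alone, and the remaining data is still returned. The function name `run_collectors` and the `warnings` key are hypothetical and not part of the plugin.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("igny8.collectors")


def run_collectors(collectors: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run every collector; keep partial data when one of them fails."""
    payload: Dict[str, Any] = {}
    warnings: Dict[str, str] = {}
    for name, collect in collectors.items():
        try:
            payload[name] = collect()
        except Exception as exc:  # log per collector, then continue
            logger.warning("collector %s failed: %s", name, exc)
            warnings[name] = str(exc)
    payload["warnings"] = warnings  # hypothetical field for reporting failures
    return payload


def broken_menu_collector():
    raise RuntimeError("menu table missing")


result = run_collectors({
    "products": lambda: [{"id": "woo_123"}],
    "menus": broken_menu_collector,
})
```

Here `result` still contains the product data, while the failed `menus` collector appears only under `warnings`.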
**Plugin Cron Job Addition:**
- New job: `igny8_sync_sag_site_analysis` (optional, runs if user triggers analysis)
- Frequency: On-demand via API call, not scheduled
- Timeout: 60 seconds (analysis itself happens server-side via Celery)
### 2.2 AI Attribute Extraction Service
**File:** `sag/ai_functions/attribute_extraction.py`
**Register Key:** `extract_site_attributes`
**Input Type:** SiteAnalysisPayload
**Output Type:** AttributeExtractionResult
**Function Signature:**
```python
from typing import Optional


def extract_site_attributes(
    site_data: SiteAnalysisPayload,
    sector_template: Optional[SectorTemplate] = None,
    confidence_threshold: float = 0.6,
    max_attributes: int = 20,
) -> AttributeExtractionResult:
    """
    Analyze site data to discover attributes.

    Args:
        site_data: Raw site data from WordPress plugin
        sector_template: Optional sector template for validation
        confidence_threshold: Min confidence to include attribute (0.0-1.0)
        max_attributes: Max attributes to return

    Returns:
        AttributeExtractionResult with discovered attributes, frequencies, confidence scores
    """
    ...  # signature only; see the algorithm below
```
**Algorithm:**
1. **Text Analysis Phase**
- Concatenate product titles and descriptions
- Apply tokenization and noun phrase extraction
- Identify recurring modifiers and descriptors
- Extract from category names and tags
- Extract from custom attribute values (if any exist)
2. **Pattern Recognition Phase**
- Group similar terms (e.g., "back pain" + "back relief" + "lower back" → "back/spine")
- Calculate frequency across product dataset
- Identify dimensional axes (e.g., "target area," "device type")
- Score statistical significance
3. **Validation Phase**
- Cross-reference against sector template (if provided)
- Validate against common attribute taxonomies
- Flag conflicting or ambiguous discoveries
- Assign confidence scores based on:
- Frequency (how often the term appears)
- Consistency (how many distinct products it appears in)
- Specificity (not too vague)
- Template alignment (matches known attributes)
4. **Ranking Phase**
- Rank by frequency and confidence
- Assign dimensionality (Primary/Secondary/Tertiary)
- Cap results at `max_attributes`
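The validation and ranking phases can be sketched as follows. The specification names the four confidence signals but not how they are combined, so the weighted average and the weights in `WEIGHTS` are assumptions, as is the `Candidate` class. The sketch ranks by the combined confidence only; frequency enters through its weight.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    frequency: float           # each signal is normalized to 0.0-1.0
    consistency: float
    specificity: float
    template_alignment: float


# Assumed weights; the spec lists the signals but not their relative importance.
WEIGHTS = {"frequency": 0.3, "consistency": 0.3,
           "specificity": 0.2, "template_alignment": 0.2}


def confidence(c: Candidate) -> float:
    """Weighted average of the four signals."""
    return sum(getattr(c, signal) * weight for signal, weight in WEIGHTS.items())


def rank_attributes(candidates: List[Candidate],
                    confidence_threshold: float = 0.6,
                    max_attributes: int = 20) -> List[Candidate]:
    """Drop candidates below the threshold, sort by confidence, cap the list."""
    kept = [c for c in candidates if confidence(c) >= confidence_threshold]
    kept.sort(key=confidence, reverse=True)
    return kept[:max_attributes]


ranked = rank_attributes([
    Candidate("Brand", 0.4, 0.3, 0.9, 0.5),
    Candidate("Target Area", 1.0, 0.9, 0.9, 1.0),
    Candidate("Device Type", 0.9, 0.9, 0.8, 0.9),
])
```

In a fuller implementation, the candidates dropped here would presumably be reported under `low_confidence_discoveries` rather than discarded.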
**Output Structure:**
```json
{
"analysis_id": "uuid",
"site_id": "uuid",
"timestamp": "2026-03-23T14:30:00Z",
"analysis_confidence": 0.82,
"attributes": [
{
"name": "Target Area",
"dimension": "Primary",
"confidence": 0.95,
"frequency": 32,
"discovered_from": ["product_titles", "product_descriptions", "categories"],
"values": [
{
"value": "Neck",
"frequency": 12,
"example_products": ["Product A", "Product B"]
},
{
"value": "Back",
"frequency": 8,
"example_products": ["Product C"]
},
{
"value": "Foot",
"frequency": 25,
"example_products": ["Product D", "Product E"]
}
],
"template_validation": {
"matched_sector": "massage_devices",
"matched_attribute": "body_region",
"alignment_score": 0.98
}
},
{
"name": "Device Type",
"dimension": "Primary",
"confidence": 0.88,
"frequency": 28,
"discovered_from": ["product_titles", "product_descriptions"],
"values": [
{
"value": "Shiatsu",
"frequency": 18,
"example_products": ["Product F"]
},
{
"value": "EMS",
"frequency": 7,
"example_products": ["Product G"]
},
{
"value": "Percussion",
"frequency": 3,
"example_products": ["Product H"]
}
],
"template_validation": {
"matched_sector": "massage_devices",
"matched_attribute": "therapy_type",
"alignment_score": 0.91
}
},
{
"name": "Heat Setting",
"dimension": "Secondary",
"confidence": 0.72,
"frequency": 15,
"discovered_from": ["product_descriptions"],
"values": [
{
"value": "Heated",
"frequency": 15,
"example_products": ["Product I", "Product J"]
}
],
"template_validation": {
"matched_sector": "massage_devices",
"matched_attribute": "heat_enabled",
"alignment_score": 0.85
}
}
],
"low_confidence_discoveries": [
{
"name": "Brand",
"confidence": 0.55,
"reason": "High variability, many single-mention values"
}
],
"analysis_notes": {
"total_products_analyzed": 50,
"total_categories": 8,
"total_tags": 23,
"extraction_method": "llm_analysis",
"model_used": "gpt-4-turbo"
}
}
```
**Error Handling:**
- Insufficient data: Log warning, return empty attributes list
- LLM API failure: Retry with exponential backoff (3 retries)
- Timeout (>5 minutes): Abort and return partial results
- Invalid sector template: Log error, continue analysis without validation
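A minimal sketch of the retry rule for LLM failures, assuming a one-second base delay that doubles after each failed attempt (the spec fixes only the retry count). `call_with_backoff` is a hypothetical helper; production code would likely catch only transient error types instead of every `Exception`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(call: Callable[[], T], retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`; on failure retry up to `retries` times, doubling the delay."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted; let the caller decide
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Usage with a fake LLM call that fails twice before succeeding.
attempts = {"count": 0}
delays = []


def flaky_llm_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = call_with_backoff(flaky_llm_call, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.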
**Performance Considerations:**
- Cache sector templates in memory
- Batch LLM calls (process 5-10 products per API call)
- Store extraction results in database for audit trail
- Return results within 2-5 minutes for typical sites
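Batching products before each LLM call could look like the sketch below. The helper name `batched` and the sample titles are illustrative; the batch size of 10 is the upper end of the range given above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


product_titles = [f"Product {i}" for i in range(23)]
batches = list(batched(product_titles, batch_size=10))
```

With 23 products this produces three batches, the last one holding the remaining three titles.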
### 2.3 Data Models
#### SiteAnalysisPayload
```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Product:
    id: str
    title: str
    description: str
    sku: str
    price: float
    categories: List[str]
    tags: List[str]
    custom_attributes: Dict[str, List[str]]
    image_urls: List[str]


@dataclass
class Category:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    product_count: int


@dataclass
class Taxonomy:
    name: str
    label: str
    is_hierarchical: bool
    terms: List['Term']  # forward reference; Term is defined below


@dataclass
class Term:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    count: int


@dataclass
class Page:
    id: str
    title: str
    url: str
    content_summary: str
    page_type: str  # e.g., "shop", "landing", "faq"


@dataclass
class Post:
    id: str
    title: str
    url: str
    content_summary: str
    categories: List[str]
    tags: List[str]
    publish_date: str


@dataclass
class MenuItem:
    id: str
    title: str
    url: str
    target: str
    parent_id: Optional[str]


@dataclass
class Menu:
    # Matches the `menus` entries in the section 3.2 example response,
    # where each menu wraps its own list of items.
    id: str
    title: str
    items: List[MenuItem]


@dataclass
class SiteMetadata:
    site_id: str
    domain: str
    wordpress_version: str
    woocommerce_version: str
    total_products: int
    total_categories: int
    total_pages: int
    total_posts: int
    active_plugins: List[str]
    theme: str


@dataclass
class SiteAnalysisPayload:
    metadata: SiteMetadata
    products: List[Product]
    categories: List[Category]
    taxonomies: List[Taxonomy]
    pages: List[Page]
    posts: List[Post]
    menus: List[Menu]
    collected_at: str  # ISO 8601 timestamp
```
#### AttributeExtractionResult
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttributeValue:
    value: str
    frequency: int
    example_products: List[str]


@dataclass
class TemplateValidation:
    matched_sector: str
    matched_attribute: str
    alignment_score: float


@dataclass
class DiscoveredAttribute:
    name: str
    dimension: str  # "Primary", "Secondary", "Tertiary"
    confidence: float  # 0.0-1.0
    frequency: int
    discovered_from: List[str]  # ["product_titles", "product_descriptions", ...]
    values: List[AttributeValue]
    template_validation: Optional[TemplateValidation]


@dataclass
class LowConfidenceDiscovery:
    name: str
    confidence: float
    reason: str


@dataclass
class AnalysisNotes:
    total_products_analyzed: int
    total_categories: int
    total_tags: int
    extraction_method: str
    model_used: str


@dataclass
class AttributeExtractionResult:
    analysis_id: str
    site_id: str
    timestamp: str
    analysis_confidence: float
    attributes: List[DiscoveredAttribute]
    low_confidence_discoveries: List[LowConfidenceDiscovery]
    analysis_notes: AnalysisNotes
```
### 2.4 Gap Analysis Service
**File:** `sag/services/gap_analysis_service.py`
**Class:** `GapAnalysisService`
**Method:** `analyze_gap(site_data: SiteAnalysisPayload, blueprint: SAGBlueprint) -> GapAnalysisReport`
**Purpose:** Compare existing site structure against SAG blueprint to identify gaps.
**Analysis Dimensions:**
1. **Attribute Coverage Gap**
- SAG blueprint specifies X attributes
- Site currently has Y custom attributes assigned to products
- Gap: Missing attributes or low coverage (% of products with attribute values)
2. **Hub Page Gap**
- Blueprint specifies Z cluster hubs
- Site analysis identifies M existing pages
- Gap: Missing hub pages (authority pages for attribute clusters)
3. **Term Landing Page Gap**
- Blueprint specifies N attribute values requiring term landing pages
- Site has existing category/tag pages
- Gap: Missing term landing pages (one per attribute value)
4. **Blog Content Gap**
- Blueprint specifies recommended blog posts per cluster
- Site has P existing blog posts
- Gap: Missing blog content aligned to clusters and keyword targets
5. **Internal Linking Gap**
- Blueprint specifies internal linking strategy
- Site has current internal link structure
- Gap: Missing cross-cluster and term-to-hub links
6. **Product Enrichment Gap**
- Products lacking attribute assignments
- Products missing description optimization
- Products missing images
7. **Technical SEO Gap**
- Missing schema markup for products
- Category pages lacking optimization
- Menu structure not optimized for crawlability
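As one concrete instance of these dimensions, attribute coverage (the share of products that carry a value for an attribute) might be computed as in this sketch. It assumes products are plain dicts shaped like the plugin payload; `attribute_coverage` and the `missing_products` key are hypothetical names.

```python
from typing import Dict, List


def attribute_coverage(products: List[Dict], attribute: str) -> Dict:
    """Share of products that carry at least one value for `attribute`."""
    total = len(products)
    covered = sum(1 for p in products
                  if p.get("custom_attributes", {}).get(attribute))
    percent = round(100 * covered / total) if total else 0
    return {"attribute": attribute,
            "coverage_current": f"{percent}% ({covered}/{total})",
            "missing_products": total - covered}


# 40 products with a Device Type value, 10 without.
products = (
    [{"id": f"woo_{i}", "custom_attributes": {"Device Type": ["Shiatsu"]}}
     for i in range(40)]
    + [{"id": f"woo_{i}", "custom_attributes": {}} for i in range(40, 50)]
)
report = attribute_coverage(products, "Device Type")
```

The remaining dimensions would need their own comparisons against the blueprint; this sketch covers the attribute coverage case only.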
**Output Structure:**
```json
{
"analysis_id": "uuid",
"site_id": "uuid",
"blueprint_id": "uuid",
"timestamp": "2026-03-23T14:30:00Z",
"summary": {
"products_current": 50,
"products_gap": 0,
"attributes_current": 3,
"attributes_blueprint": 8,
"attributes_gap": 5,
"hub_pages_current": 2,
"hub_pages_blueprint": 4,
"hub_pages_gap": 2,
"term_pages_current": 12,
"term_pages_blueprint": 35,
"term_pages_gap": 23,
"blog_posts_current": 8,
"blog_posts_blueprint": 24,
"blog_posts_gap": 16,
"overall_gap_percentage": 62
},
"attributes_gap_detail": [
{
"attribute": "Target Area",
"coverage_current": "100% (50/50)",
"coverage_blueprint": "100% (50/50)",
"gap": "None — attribute well-covered"
},
{
"attribute": "Device Type",
"coverage_current": "80% (40/50)",
"coverage_blueprint": "100% (50/50)",
"gap": "10 products missing Device Type assignment"
}
],
"hub_pages_gap_detail": [
{
"cluster": "Foot Massagers",
"status": "EXISTS",
"url": "/shop/foot-massagers",
"optimization_notes": "Good; consider adding testimonials section"
},
{
"cluster": "Neck & Shoulder Relief",
"status": "MISSING",
"recommendation": "Create hub page at /neck-shoulder-relief"
}
],
"term_pages_gap_detail": [
{
"attribute": "Target Area",
"term": "Neck",
"status": "MISSING",
"recommendation": "Create term page at /target-area/neck (products filter + blog links)"
}
],
"blog_posts_gap_detail": [
{
"cluster": "Foot Massagers",
"recommended_posts": [
"Best Foot Massagers for Neuropathy",
"How to Use Shiatsu Foot Massagers",
"Foot Massage Benefits"
],
"existing_posts": [
"Foot Massage 101"
],
"gap": 2
}
],
"internal_linking_gap": {
"status": "High gaps identified",
"recommendation": "Blueprint specifies 3-5 internal links per hub page; current average: 1.2",
"priority_links": [
"Neck hub → Foot hub (shared body region cluster)",
"Device Type pages → Hub pages",
"Blog posts → Related term pages"
]
},
"actionable_recommendations": [
"IMMEDIATE: Assign Device Type to 10 untagged products",
"WEEK 1: Create 2 missing hub pages",
"WEEK 2: Create 23 term landing pages via script",
"WEEK 3: Bulk create 16 blog posts (outline + AI generation)",
"WEEK 4: Implement internal linking strategy"
]
}
```
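The `*_gap` fields in the summary are the difference between the blueprint target and the current count. A small sketch of that arithmetic follows, with `summarize_gaps` as a hypothetical helper. It does not derive `overall_gap_percentage`, because this document does not state how that figure is calculated.

```python
from typing import Dict


def summarize_gaps(current: Dict[str, int], blueprint: Dict[str, int]) -> Dict[str, int]:
    """Build the current / blueprint / gap triple for each tracked element."""
    summary: Dict[str, int] = {}
    for key, target in blueprint.items():
        have = current.get(key, 0)
        summary[f"{key}_current"] = have
        summary[f"{key}_blueprint"] = target
        summary[f"{key}_gap"] = max(target - have, 0)  # never negative
    return summary


summary = summarize_gaps(
    current={"attributes": 3, "hub_pages": 2, "term_pages": 12, "blog_posts": 8},
    blueprint={"attributes": 8, "hub_pages": 4, "term_pages": 35, "blog_posts": 24},
)
```

The inputs are the counts from the example report above, so the resulting gaps match its summary block.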
### 2.5 Product Auto-Tagging Service
**File:** `sag/services/auto_tagger_service.py`
**Class:** `ProductAutoTagger`
**Method:** `generate_tag_suggestions(products: List[Product], attributes: List[DiscoveredAttribute], blueprint: SAGBlueprint) -> List[TagSuggestion]`
**Purpose:** Generate batch product-to-attribute assignments based on product titles/descriptions.
**Algorithm:**
1. For each product:
- Extract key terms from title and description
- Match against attribute values (fuzzy matching allowed)
- Score confidence for each attribute assignment
- Rank by confidence
2. For each attribute:
- Verify assignment makes semantic sense
- Check for conflicting assignments (e.g., can't be both "Shiatsu" and "EMS")
- Return ranked list
3. Group by product for review UI
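A minimal sketch of the matching step, using `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever fuzzy matcher the service ends up using. It assumes each attribute takes a single value per product, so keeping only the best-scoring value also resolves conflicts such as "Shiatsu" versus "EMS". The function name and the similarity score used as confidence are illustrative.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List


def suggest_tags(text: str, attribute_values: Dict[str, List[str]],
                 min_confidence: float = 0.6) -> List[Dict]:
    """Suggest at most one value per attribute for a product text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    suggestions = []
    for attribute, values in attribute_values.items():
        best_value, best_score = None, 0.0
        for value in values:
            # Best similarity between the value and any single word of the text.
            score = max((SequenceMatcher(None, value.lower(), w).ratio()
                         for w in words), default=0.0)
            if score > best_score:
                best_value, best_score = value, score
        if best_value is not None and best_score >= min_confidence:
            suggestions.append({"attribute": attribute, "value": best_value,
                                "confidence": round(best_score, 2)})
    return sorted(suggestions, key=lambda s: s["confidence"], reverse=True)


tags = suggest_tags(
    "Nekteck Foot Massager with Heat, shiatsu nodes",
    {"Target Area": ["Neck", "Back", "Foot"], "Device Type": ["Shiatsu", "EMS"]},
)
```

Word-level matching cannot handle multi-word values such as "Lower Back"; those would need phrase matching, which this sketch omits.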
**Output Structure:**
```json
{
"batch_id": "uuid",
"site_id": "uuid",
"blueprint_id": "uuid",
"timestamp": "2026-03-23T14:30:00Z",
"total_products": 50,
"total_suggestions": 87,
"suggestions": [
{
"product_id": "woo_123",
"product_title": "Nekteck Foot Massager with Heat",
"proposed_tags": [
{
"attribute": "Target Area",
"value": "Foot",
"confidence": 0.98,
"reasoning": "Title contains 'Foot Massager'"
},
{
"attribute": "Device Type",
"value": "Shiatsu",
"confidence": 0.82,
"reasoning": "Description mentions shiatsu nodes"
},
{
"attribute": "Heat Setting",
"value": "Heated",
"confidence": 0.95,
"reasoning": "Title explicitly states 'with Heat'"
}
],
"status": "pending_review"
}
],
"summary": {
"high_confidence_suggestions": 72,
"medium_confidence_suggestions": 12,
"low_confidence_suggestions": 3,
"conflicts_detected": 0,
"ready_to_apply": true
}
}
```
---
## 3. APIs & Endpoints
### 3.1 Backend API Endpoints
All endpoints are authenticated via `Authorization: Bearer {IGNY8_API_TOKEN}` header.
#### POST /api/v1/sag/sites/{site_id}/analyze/
**Purpose:** Trigger comprehensive site analysis (async).
**Request:**
```json
{
"include_draft_products": false,
"product_limit": 500,
"sector_template_id": "optional_uuid",
"webhook_url": "optional_https_url_for_completion_notification"
}
```
**Response:** 202 Accepted
```json
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"status": "queued",
"estimated_duration_seconds": 120,
"check_status_url": "/api/v1/sag/sites/{site_id}/analysis-status/?task_id={task_id}"
}
```
**Error Responses:**
- 400: Invalid parameters
- 401: Unauthorized
- 404: Site not found
- 429: Rate limited (max 1 analysis per 30 minutes per site)
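The 429 rule could be enforced with a check along these lines. This is an in-memory sketch with a hypothetical `AnalysisRateLimiter` class; a deployed service would need storage shared across workers, which is outside what this document specifies.

```python
from typing import Dict, Optional

RATE_LIMIT_SECONDS = 30 * 60  # one analysis per 30 minutes per site


class AnalysisRateLimiter:
    """In-memory limiter keyed by site ID."""

    def __init__(self) -> None:
        self._last_run: Dict[str, float] = {}

    def try_acquire(self, site_id: str, now: float) -> Optional[int]:
        """Return None if allowed, else seconds to wait (for a 429 response)."""
        last = self._last_run.get(site_id)
        if last is not None and now - last < RATE_LIMIT_SECONDS:
            return int(RATE_LIMIT_SECONDS - (now - last))
        self._last_run[site_id] = now
        return None


limiter = AnalysisRateLimiter()
first = limiter.try_acquire("site_a", now=0.0)           # allowed
second = limiter.try_acquire("site_a", now=600.0)        # blocked, 20 minutes left
other_site = limiter.try_acquire("site_b", now=600.0)    # allowed, separate site
after_window = limiter.try_acquire("site_a", now=1800.0)  # allowed again
```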
---
#### GET /api/v1/sag/sites/{site_id}/analysis-status/
**Purpose:** Check analysis progress.
**Query Parameters:**
- `task_id` (required): Celery task ID from analysis trigger
**Response:** 200 OK
```json
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"status": "processing",
"progress_percent": 45,
"current_step": "Analyzing product attributes",
"elapsed_seconds": 32,
"estimated_remaining_seconds": 48
}
```
**Status Values:**
- `queued` — waiting to start
- `processing` — actively analyzing
- `complete` — analysis finished
- `failed` — analysis error (see error message)
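A client-side polling loop over these status values might look like the sketch below. The 10-second interval and 5-minute timeout are taken from the skill workflow in section 6.1. `fetch_status` is a hypothetical callable wrapping the GET request, injected so the loop can run without a live server.

```python
import time
from typing import Callable, Dict


def wait_for_analysis(fetch_status: Callable[[], Dict],
                      poll_interval: float = 10.0,
                      timeout: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> Dict:
    """Poll until status is `complete` or `failed`, or raise on timeout."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["status"] in ("complete", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("analysis did not finish in time")
        sleep(poll_interval)


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "queued"},
    {"status": "processing", "progress_percent": 45},
    {"status": "complete", "progress_percent": 100},
])
final = wait_for_analysis(lambda: next(responses), sleep=lambda _: None)
```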
---
#### GET /api/v1/sag/sites/{site_id}/analysis-results/
**Purpose:** Retrieve completed analysis results.
**Response:** 200 OK
```json
{
"analysis_id": "uuid",
"site_id": "site_uuid",
"timestamp": "2026-03-23T14:30:00Z",
"site_data_summary": {
"total_products": 50,
"total_categories": 8,
"total_pages": 12,
"total_posts": 8
},
"extracted_attributes": {
"analysis_confidence": 0.82,
"attributes_count": 8,
"attributes": [
{ "name": "Target Area", "dimension": "Primary", "confidence": 0.95, ... }
]
},
"gap_analysis": {
"overall_gap_percentage": 62,
"summary": { ... }
},
"status": "ready_for_review"
}
```
**Status Values:**
- `ready_for_review` — user should review before confirming
- `confirmed` — user has accepted analysis
- `archived` — superseded by newer analysis
---
#### POST /api/v1/sag/sites/{site_id}/confirm-analysis/
**Purpose:** User confirms analysis; creates SAG blueprint.
**Request:**
```json
{
"analysis_id": "uuid",
"approved_attributes": [
{
"name": "Target Area",
"approved_values": ["Neck", "Back", "Foot"],
"exclude_values": []
}
],
"confirmed_by_user_id": "user_uuid"
}
```
**Response:** 201 Created
```json
{
"blueprint_id": "uuid",
"site_id": "site_uuid",
"analysis_id": "uuid",
"status": "created",
"attributes_count": 8,
"attribute_values_count": 45,
"created_at": "2026-03-23T14:32:00Z",
"next_steps": [
"Review auto-tagging suggestions",
"Approve product tags",
"Start content pipeline (01E)"
]
}
```
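How `approved_values` and `exclude_values` combine is not spelled out here. The sketch below assumes excluded values are removed from the approved list and that an attribute left with no values is dropped from the blueprint. `build_blueprint_attributes` is a hypothetical helper.

```python
from typing import Dict, List


def build_blueprint_attributes(approved_attributes: List[Dict]) -> Dict[str, List[str]]:
    """Map attribute name -> final values (approved minus excluded, in order)."""
    result: Dict[str, List[str]] = {}
    for item in approved_attributes:
        excluded = set(item.get("exclude_values", []))
        values = [v for v in item["approved_values"] if v not in excluded]
        if values:  # assumption: an attribute with no remaining values is dropped
            result[item["name"]] = values
    return result


blueprint_attributes = build_blueprint_attributes([
    {"name": "Target Area", "approved_values": ["Neck", "Back", "Foot"],
     "exclude_values": []},
    {"name": "Device Type", "approved_values": ["Shiatsu", "EMS"],
     "exclude_values": ["EMS"]},
])
# Counterpart of the `attribute_values_count` field in the response.
attribute_values_count = sum(len(v) for v in blueprint_attributes.values())
```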
---
#### GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/
**Purpose:** Retrieve product auto-tagging suggestions.
**Query Parameters:**
- `blueprint_id` (required): ID of confirmed blueprint
- `confidence_min` (optional): Filter by minimum confidence (0.0-1.0, default 0.6)
- `limit` (optional): Max suggestions per product (default 5)
**Response:** 200 OK
```json
{
"batch_id": "uuid",
"blueprint_id": "blueprint_uuid",
"total_suggestions": 87,
"suggestions": [
{
"product_id": "woo_123",
"product_title": "Nekteck Foot Massager",
"proposed_tags": [
{
"attribute": "Target Area",
"value": "Foot",
"confidence": 0.98,
"reasoning": "Title contains 'Foot Massager'"
}
]
}
]
}
```
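The `confidence_min` and `limit` parameters could be applied to stored suggestions roughly as follows. The sketch assumes that tags are ordered by descending confidence before the per-product limit is applied, and that products with no remaining tags are omitted; neither behavior is confirmed by this spec.

```python
from typing import Dict, List


def filter_suggestions(suggestions: List[Dict], confidence_min: float = 0.6,
                       limit: int = 5) -> List[Dict]:
    """Keep the top `limit` tags per product at or above `confidence_min`."""
    filtered = []
    for suggestion in suggestions:
        tags = [t for t in suggestion["proposed_tags"]
                if t["confidence"] >= confidence_min]
        tags.sort(key=lambda t: t["confidence"], reverse=True)
        if tags:
            filtered.append({**suggestion, "proposed_tags": tags[:limit]})
    return filtered


sample = [{
    "product_id": "woo_123",
    "proposed_tags": [
        {"attribute": "Device Type", "value": "Shiatsu", "confidence": 0.82},
        {"attribute": "Brand", "value": "Nekteck", "confidence": 0.55},
        {"attribute": "Target Area", "value": "Foot", "confidence": 0.98},
    ],
}]
visible = filter_suggestions(sample, confidence_min=0.6, limit=5)
```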
---
#### POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
**Purpose:** Apply approved product tags to site (async bulk operation).
**Request:**
```json
{
"blueprint_id": "uuid",
"approved_suggestions": [
{
"product_id": "woo_123",
"approved_tags": [
{
"attribute": "Target Area",
"value": "Foot"
}
]
}
],
"skip_existing_values": true
}
```
**Response:** 202 Accepted
```json
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"blueprint_id": "blueprint_uuid",
"status": "processing",
"products_to_tag": 47,
"tags_to_apply": 87,
"check_status_url": "/api/v1/sag/sites/{site_id}/auto-tag/status/?task_id={task_id}"
}
```
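One plausible reading of `skip_existing_values` is that an attribute already set on a product is left untouched when the flag is true. This matches the requirement elsewhere in this spec that existing tags are not overwritten, but the exact semantics are an assumption. `merge_tags` is a hypothetical helper operating on plain dicts.

```python
from typing import Dict, List


def merge_tags(existing: Dict[str, List[str]], approved_tags: List[Dict],
               skip_existing_values: bool = True) -> Dict[str, List[str]]:
    """Return the product's attributes after applying approved tags."""
    merged = {name: list(values) for name, values in existing.items()}
    for tag in approved_tags:
        attribute, value = tag["attribute"], tag["value"]
        if skip_existing_values and merged.get(attribute):
            continue  # keep what the store owner already set
        merged[attribute] = [value]
    return merged


existing = {"Target Area": ["Leg"], "brand": ["Nekteck"]}
approved = [{"attribute": "Target Area", "value": "Foot"},
            {"attribute": "Heat Setting", "value": "Heated"}]
kept = merge_tags(existing, approved, skip_existing_values=True)
overwritten = merge_tags(existing, approved, skip_existing_values=False)
```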
---
#### GET /api/v1/sag/sites/{site_id}/auto-tag/status/
**Purpose:** Check auto-tagging progress.
**Query Parameters:**
- `task_id` (required): Celery task ID
**Response:** 200 OK
```json
{
"task_id": "celery_task_uuid",
"site_id": "site_uuid",
"status": "processing",
"progress_percent": 62,
"products_tagged": 29,
"total_products": 47,
"tags_applied": 54,
"estimated_remaining_seconds": 30
}
```
---
### 3.2 WordPress Plugin Endpoint
#### GET /wp-json/igny8/v1/sag/site-analysis
**Purpose:** Collect comprehensive site data for analysis.
**Headers:**
- `Authorization: Bearer {IGNY8_API_TOKEN}`
- `X-IGNY8-Request-ID: {uuid}` (optional, for request tracking)
**Query Parameters:**
- `limit_products`: int (1-1000, default 500)
- `include_drafts`: boolean (default false)
- `cache_ttl`: int (seconds, default 3600)
**Response:** 200 OK
```json
{
"metadata": {
"site_id": "uuid",
"domain": "example-store.com",
"wordpress_version": "6.4.2",
"woocommerce_version": "8.5.0",
"total_products": 50,
"total_categories": 8,
"total_pages": 12,
"total_posts": 8,
"active_plugins": ["woocommerce", "yoast-seo", ...],
"theme": "storefront"
},
"products": [
{
"id": "woo_123",
"title": "Nekteck Foot Massager with Heat",
"description": "Premium foot massage device...",
"sku": "NEKTECK-FM-001",
"price": 79.99,
"categories": ["Foot Massagers", "Massage Devices"],
"tags": ["heated", "cordless"],
"custom_attributes": {
"brand": ["Nekteck"],
"color": ["Black"],
"warranty": ["2 Year"]
},
"image_urls": ["image1.jpg", "image2.jpg"]
}
],
"categories": [
{
"id": "cat_1",
"name": "Foot Massagers",
"slug": "foot-massagers",
"parent_id": null,
"description": "Electronic foot massage devices",
"product_count": 12
}
],
"taxonomies": [
{
"name": "brand",
"label": "Brand",
"is_hierarchical": false,
"terms": [
{
"id": "brand_1",
"name": "Nekteck",
"slug": "nekteck",
"parent_id": null,
"description": "",
"count": 5
}
]
}
],
"pages": [
{
"id": "page_1",
"title": "Shop",
"url": "/shop",
"content_summary": "Browse our selection of massage devices",
"page_type": "shop"
}
],
"posts": [
{
"id": "post_1",
"title": "Benefits of Foot Massage",
"url": "/blog/foot-massage-benefits",
"content_summary": "Learn why foot massage is beneficial...",
"categories": ["Health"],
"tags": ["foot", "massage"],
"publish_date": "2026-03-15"
}
],
"menus": [
{
"id": "menu_1",
"title": "Main Menu",
"items": [
{
"id": "item_1",
"title": "Shop",
"url": "/shop",
"target": "_self",
"parent_id": null
}
]
}
],
"collected_at": "2026-03-23T14:30:00Z"
}
```
**Error Responses:**
- 400: Invalid query parameters
- 401: Invalid or missing API token
- 500: Plugin error (logged on WordPress side)
**Performance:**
- Response time target: <5 seconds for sites with <500 products
- Data is cached for 1 hour (configurable via `cache_ttl`)
- Uses WordPress transients API for caching
---
## 4. Implementation Steps
### Phase 1: Plugin Enhancement (Week 1)
**Tasks:**
1. Create collector classes in `plugins/igny8-sync/includes/collectors/`
- ProductCollector
- CategoryCollector
- TaxonomyCollector
- AttributeCollector
- PageCollector
- PostCollector
- MenuCollector
- PluginCollector
2. Implement `DataCollectorInterface`
- `collect()` method (fetches raw data)
- `sanitize()` method (removes PII, normalizes format)
- Error handling per collector
3. Add `/wp-json/igny8/v1/sag/site-analysis` endpoint
- Route definition
- Parameter validation
- Response formatting
- Caching logic
4. Add unit tests for collectors
- Mock data tests
- Error condition tests
- Performance tests
**Acceptance Criteria:**
- Endpoint returns valid JSON payload matching schema
- All 8 collectors implemented and tested
- Response time <5 seconds for 500 products
- Caching works correctly
- Error handling tested
---
### Phase 2: AI Attribute Extraction (Week 1-2)
**Tasks:**
1. Implement `attribute_extraction.py`
- Text analysis functions
- Pattern recognition logic
- Confidence scoring
- Validation against sector templates
2. Register with LLM framework
- Implement `extract_site_attributes` function
- Add input/output validation
- Error handling (retry logic)
3. Create data models
- DiscoveredAttribute
- AttributeValue
- TemplateValidation
- AttributeExtractionResult
4. Add unit and integration tests
- Mock LLM responses
- Test with real site data
- Confidence scoring validation
- Performance tests (2-5 minute runtime)
**Acceptance Criteria:**
- Extracts 5-20 attributes from sample site data
- Confidence scores accurate and meaningful
- Sector template validation works
- Low-confidence discoveries flagged
- Results auditable (model used, reasoning provided)
---
### Phase 3: Gap Analysis Service (Week 2)
**Tasks:**
1. Implement `gap_analysis_service.py`
- GapAnalysisService class
- analyze_gap() method
- All 7 gap dimensions analyzed
2. Create gap analysis models
- GapAnalysisReport
- Recommendation structures
- Detail sections
3. Integrate with blueprint comparison
- Query SAG blueprint
- Compare against site data
- Calculate gap percentages
4. Add unit tests
- Test each gap dimension
- Test recommendation generation
- Test report structure
**Acceptance Criteria:**
- All 7 gap dimensions analyzed
- Report clearly identifies missing elements
- Actionable recommendations provided
- Report generated in <1 second
---
### Phase 4: API Endpoints (Week 2-3)
**Tasks:**
1. Implement analysis trigger endpoint
- POST /api/v1/sag/sites/{site_id}/analyze/
- Celery task queueing
- Webhook support
2. Implement status check endpoint
- GET /api/v1/sag/sites/{site_id}/analysis-status/
- Real-time progress updates
3. Implement results retrieval endpoint
- GET /api/v1/sag/sites/{site_id}/analysis-results/
- Caching of results
4. Implement blueprint confirmation endpoint
- POST /api/v1/sag/sites/{site_id}/confirm-analysis/
- Attribute approval logic
- Blueprint creation
5. Add request/response validation
- Marshmallow schemas
- Error responses
6. Add authentication/authorization checks
- API token validation
- User site ownership verification
**Acceptance Criteria:**
- All 4 endpoints implemented
- Endpoints return correct status codes
- Validation working
- Authentication required and checked
- Error responses follow standard format
---
### Phase 5: Product Auto-Tagging (Week 3)
**Tasks:**
1. Implement `auto_tagger_service.py`
- ProductAutoTagger class
- generate_tag_suggestions() method
- Confidence scoring
2. Create auto-tagging endpoints
- GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/
- POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
- GET /api/v1/sag/sites/{site_id}/auto-tag/status/
3. Implement Celery task for bulk tagging
- Batch product processing
- Conflict detection
- Error handling
4. Add unit tests
- Test suggestion generation
- Test bulk tagging
- Test conflict detection
**Acceptance Criteria:**
- Suggestions endpoint returns valid suggestions
- Confidence scores reasonable (0.6+)
- Bulk tagging applies tags correctly to products
- Progress tracking works
- 47+ products can be tagged in <2 minutes
---
### Phase 6: Frontend Components (Week 3-4)
**Tasks:**
1. Implement SiteAnalysisPanel
- Trigger analysis button
- Progress indicator
- Error messaging
2. Implement DiscoveredAttributesReview
- Display discovered attributes
- Show confidence scores
- Allow approval/rejection per attribute
- Show example products
3. Implement GapAnalysisReport
- Visual representation of gaps
- Actionable recommendations
- Priority ordering
4. Implement AutoTagReviewPanel
- Display product suggestions
- Batch selection/deselection
- Apply tags button
- Progress tracking
5. Add styling and UX polish
- Responsive design
- Loading states
- Error states
- Success confirmations
**Acceptance Criteria:**
- All 4 components implemented
- Responsive on desktop/tablet
- Accessible (WCAG 2.1 AA)
- User can complete workflow without errors
- Loading/error states clearly communicated
---
### Phase 7: Integration & Testing (Week 4)
**Tasks:**
1. End-to-end testing
- Connect real WordPress site
- Run full analysis workflow
- Confirm blueprint created
- Verify auto-tagging works
2. Performance testing
- Benchmark analysis with various site sizes
- Optimize slow operations
- Load testing on API endpoints
3. Documentation
- API documentation (OpenAPI/Swagger)
- Plugin setup guide
- User guide for Case 1 workflow
- Developer setup guide
4. Bug fixing and refinement
- Fix integration issues
- Refine UI/UX based on testing
- Improve error messages
**Acceptance Criteria:**
- End-to-end workflow works without errors
- Performance meets targets (analysis <5 min for 500 products)
- Documentation complete
- All bugs fixed
- Ready for beta testing
---
## 5. Acceptance Criteria
### 5.1 Functional Requirements
**Site Data Collection:**
- Plugin collects all 8 data types (products, categories, taxonomies, pages, posts, menus, attributes, metadata)
- Data is valid JSON matching defined schema
- All product titles/descriptions included
- Custom attribute values extracted correctly
- Menu hierarchy preserved
**Attribute Extraction:**
- AI identifies 5-20 attributes from site data
- Confidence scores meaningful and accurate
- Low-confidence discoveries flagged
- Sector template validation working
- Results include frequency counts and example products
**Gap Analysis:**
- All 7 gap dimensions analyzed
- Missing hubs, term pages, blog posts clearly identified
- Product attribute coverage calculated
- Internal linking gaps identified
- Actionable recommendations provided
**Blueprint Creation:**
- Confirmed analysis creates valid SAGBlueprint
- Attributes and values recorded correctly
- Gap analysis linked to blueprint
- Blueprint feeds into cluster formation (01C)
**Product Auto-Tagging:**
- Suggestions generated for 90%+ of products
- Confidence scores reasonable (0.6+)
- Bulk tagging applies tags correctly
- No data loss or corruption
- Existing tags not overwritten (configurable)
**API Endpoints:**
- All 4 analysis endpoints implemented
- All 3 auto-tagging endpoints implemented
- Correct HTTP status codes
- Valid error responses
- Authentication required
**Frontend Components:**
- SiteAnalysisPanel triggers analysis and shows progress
- DiscoveredAttributesReview allows attribute approval
- GapAnalysisReport displays gaps clearly
- AutoTagReviewPanel allows batch product tagging
- All components responsive and accessible
### 5.2 Non-Functional Requirements
**Performance:**
- Site analysis completes in <5 minutes for typical sites (50-500 products)
- WordPress plugin endpoint responds in <5 seconds
- API endpoints respond in <2 seconds
- Frontend components load in <3 seconds
**Reliability:**
- Plugin handles errors gracefully (missing products, etc.)
- Partial failures return partial data with warnings
- Celery tasks have retry logic
- Webhook notifications reliable
**Security:**
- API token authentication required
- User can only access own sites
- No PII in logs
- HTTPS enforced
- Input validation on all endpoints
**Scalability:**
- Plugin handles 1000+ products
- API handles 100+ concurrent analysis requests
- Database indexes optimized for queries
- Caching prevents redundant processing
**Data Quality:**
- Analysis results auditable (model used, timestamps, reasoning)
- No duplicate attribute suggestions
- Confidence scores calibrated
- Low-confidence results flagged for review
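
Duplicate suggestions often differ only in case or punctuation, so one possible approach is to normalize names before comparing them. The sketch below assumes suggestions are dicts with `name` and `confidence` keys and reuses 0.6 as a review threshold; both are assumptions for illustration.

```python
import re


def normalize_name(name):
    """Lowercase and collapse punctuation so 'Target-Area' matches 'target area'."""
    return re.sub(r"[\W_]+", " ", name).strip().lower()


def dedupe_attributes(suggestions, review_threshold=0.6):
    """Keep the highest-confidence suggestion per normalized name."""
    best = {}
    for s in suggestions:
        key = normalize_name(s["name"])
        if key not in best or s["confidence"] > best[key]["confidence"]:
            best[key] = s
    # Flag survivors that still fall below the review threshold
    return [
        {**s, "needs_review": s["confidence"] < review_threshold}
        for s in best.values()
    ]
```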
### 5.3 User Experience Requirements
**Clarity:**
- User understands analysis process and time required
- Gap analysis clearly shows what's missing
- Recommendations are actionable
- Error messages explain what went wrong
**Simplicity:**
- Workflow is 5 steps (analyze → review → confirm → auto-tag → apply)
- One button to trigger analysis
- Clear next steps after each stage
**Feedback:**
- Real-time progress updates during analysis
- Success/error notifications
- Ability to view raw analysis results
- Audit trail of approvals
---
## 6. Claude Code Instructions
### 6.1 Skill Development
**Skill Name:** `igny8-case1-analysis`
**Version:** 2.0
**Prerequisites:** IGNY8 platform deployed, WordPress plugin v2.0+, Celery configured
**Skill Workflow:**
```yaml
Trigger: User connects existing WordPress site to IGNY8
Step 1: Collect Site Data
- Call: POST /api/v1/sag/sites/{site_id}/analyze/
- Wait: Poll /api/v1/sag/sites/{site_id}/analysis-status/ every 10 seconds
- Timeout: 5 minutes
- Output: task_id for tracking
Step 2: Retrieve Analysis Results
- Call: GET /api/v1/sag/sites/{site_id}/analysis-results/
- Parse: extracted_attributes, gap_analysis
- Display: DiscoveredAttributesReview panel
- User action: Approve/reject attributes
Step 3: Confirm Analysis
- Call: POST /api/v1/sag/sites/{site_id}/confirm-analysis/
- Payload: approved_attributes from user review
- Output: blueprint_id
- Display: Gap analysis report
- Next: Show auto-tagging recommendations
Step 4: Generate Auto-Tag Suggestions
- Call: GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/?blueprint_id={blueprint_id}
- Display: AutoTagReviewPanel
- User action: Select products to tag
Step 5: Apply Auto-Tags
- Call: POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
- Wait: Poll /api/v1/sag/sites/{site_id}/auto-tag/status/ every 5 seconds
- Timeout: 10 minutes
- Output: Number of tags applied, products tagged
Step 6: Complete & Next Steps
- Display: Success message
- Recommendations: Run cluster formation (01C), start content pipeline (01E)
- Links: View blueprint, view gap report, start cluster creation
```
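
Steps 1 and 5 above share the same poll-with-timeout pattern. A minimal sketch of that loop follows, with the HTTP call injected as `fetch_status` so no client library is assumed; the terminal state names `completed` and `failed` are hypothetical, since the status payload is not defined in this section.

```python
import time


def poll_until_done(fetch_status, interval_s, timeout_s,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        # "completed" / "failed" are assumed state names
        if status.get("state") in ("completed", "failed"):
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"no terminal state after {timeout_s}s")
        sleep(interval_s)
```

Under these assumptions, Step 1 would call it with `interval_s=10, timeout_s=300` and Step 5 with `interval_s=5, timeout_s=600`. Injecting `sleep` and `clock` keeps the loop testable without real waiting.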
### 6.2 Development Checklist
**Code Quality:**
- [ ] All functions have docstrings
- [ ] Type hints on all function parameters and returns
- [ ] Logging at DEBUG, INFO, WARNING levels as appropriate
- [ ] Error handling with specific exception types
- [ ] No hardcoded values (use config/env vars)
**Testing:**
- [ ] Unit tests for each service (>80% coverage)
- [ ] Integration tests for API endpoints
- [ ] Fixtures for sample site data
- [ ] Mock LLM responses for deterministic tests
- [ ] Performance tests for analysis (time and memory)
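
For the "mock LLM responses" item, one option is to pass the model call into the service as a plain callable so tests can substitute a canned reply. The names `extract_attributes`, `llm_complete`, and `fake_llm`, along with the JSON shape, are hypothetical and only illustrate the pattern.

```python
import json


def extract_attributes(product_titles, llm_complete):
    """Ask the injected LLM callable for attributes and parse its JSON reply."""
    prompt = "Extract product attributes as JSON:\n" + "\n".join(product_titles)
    raw = llm_complete(prompt)
    parsed = json.loads(raw)
    return [
        {"name": item["name"], "confidence": float(item["confidence"])}
        for item in parsed["attributes"]
    ]


def fake_llm(prompt):
    """Canned response standing in for the real model during tests."""
    return json.dumps({"attributes": [{"name": "Target Area", "confidence": 0.92}]})
```

A test then calls `extract_attributes(["Foot Massager Pro"], fake_llm)` and gets the same result on every run, with no network access.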
**Documentation:**
- [ ] Docstrings follow Google style
- [ ] README with setup instructions
- [ ] API documentation in OpenAPI format
- [ ] Example requests/responses for each endpoint
- [ ] Troubleshooting guide for common errors
**Security:**
- [ ] API token validation on all endpoints
- [ ] User ownership checks before accessing site data
- [ ] Input validation with Marshmallow
- [ ] SQL injection prevention (use ORM)
- [ ] No credentials in logs or errors
**Performance:**
- [ ] Database queries indexed
- [ ] Caching implemented for plugin endpoint
- [ ] Celery task optimization
- [ ] LLM API call batching
- [ ] Frontend component lazy loading
### 6.3 Debugging & Troubleshooting
**Common Issues:**
**Issue:** Analysis hangs or times out
- Check: Celery worker status (`celery -A sag inspect active`)
- Check: Redis/message queue status
- Check: LLM API rate limits
- Solution: Reduce product limit, retry analysis
**Issue:** Plugin endpoint returns partial data
- Check: Which collector failed (see logs)
- Solution: Fix the failing collector, then re-run the analysis (the re-run bypasses the cache)
- Note: Partial data is expected when a single collector fails; the remaining collectors still return their results
**Issue:** Auto-tagging misses products
- Check: Product title/description quality (missing keywords)
- Check: Confidence threshold (lower if needed)
- Solution: Review low-confidence suggestions, adjust threshold
**Issue:** Gap analysis shows 100% gaps
- Check: Blueprint created correctly
- Check: Gap analysis query (verify site_id matches)
- Solution: Re-run analysis, confirm blueprint
### 6.4 Integration Checkpoints
**Integration with 01A (SAGBlueprint):**
- Confirmed analysis creates SAGBlueprint via POST /api/v1/sag/sites/{site_id}/confirm-analysis/
- Blueprint includes extracted attributes and values
- Blueprint links to analysis for audit trail
- Blueprint ready for cluster formation (01C)
**Integration with 01B (Sector Templates):**
- Attribute extraction uses sector template for validation (optional parameter)
- Alignment scores show how closely discovered attributes match template
- Low-confidence discoveries flagged if they don't align with template
- Template selection based on site category detection
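
This section does not define how alignment scores are computed. As one possible choice, the sketch below uses Jaccard overlap between discovered values and template values; the function name and the case-insensitive matching are assumptions, not the specified method.

```python
def alignment_score(discovered_values, template_values):
    """Jaccard overlap between discovered and template values, case-insensitive."""
    found = {v.strip().lower() for v in discovered_values}
    expected = {v.strip().lower() for v in template_values}
    if not found and not expected:
        return 1.0  # nothing discovered and nothing expected: treated as aligned
    return len(found & expected) / len(found | expected)
```

With this definition, discovering `Foot`, `Back`, and `Neck` against a template of `foot`, `back`, and `Shoulder` shares two of four distinct values.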
**Integration with 01C (Cluster Formation):**
- Blueprint created from Case 1 analysis feeds into cluster formation
- Attributes and values used to create cluster hierarchies
- Cluster formation references blueprint_id for traceability
- Users can override generated clusters if needed
**Integration with 01E (Content Pipeline):**
- Blueprint creation triggers content pipeline pre-planning
- Gap analysis informs content prioritization
- Hub page templates created for missing clusters
- Blog post outlines generated for content gaps
**Integration with 01G (Health Monitoring):**
- Analysis metrics stored for health dashboard
- Gap analysis metrics tracked over time
- Product attribute coverage tracked
- Auto-tagging success rate monitored
---
## 7. Related Documents
- **01A:** SAGBlueprint Definition — Output of Case 1 analysis
- **01B:** Sector Templates — Used for attribute validation
- **01C:** Cluster Formation — Consumes SAGBlueprint from Case 1
- **01D:** Case 2 Wizard — Alternative path for new sites
- **01E:** Content Pipeline — Consumes the blueprint and gap analysis
- **01G:** Health Monitoring — Tracks analysis and enrichment metrics
---
## 8. Glossary
- **SAG:** Semantic Attribute Grid — the structured product attribute framework
- **Attribute:** A dimension of product information (e.g., "Target Area," "Device Type")
- **Attribute Value:** A specific instance of an attribute (e.g., "Foot" for Target Area)
- **Cluster:** A group of related attribute values forming a content hub
- **Gap:** Missing element compared to SAG blueprint (hub pages, term pages, blog posts, etc.)
- **Confidence Score:** AI's confidence in discovered attribute (0.0-1.0)
- **Dimension:** Priority level of attribute (Primary, Secondary, Tertiary)
- **Term Landing Page:** Single page optimized for a specific attribute value
- **Hub Page:** Authority page for entire attribute cluster
- **Auto-Tagging:** Bulk assignment of attributes to products
---
**Document Status:** Ready for Development
**Last Review:** 2026-03-23
**Next Review:** Post-Phase 2 Development