# 01F: IGNY8 Phase 1 — Existing Site Analysis (Case 1)

**Document Type:** Build Specification
**Phase:** Phase 1: Existing Site Analysis
**Use Case:** Case 1 (Users with existing sites)
**Status:** Active Development
**Last Updated:** 2026-03-23

---

## 1. Current State

### 1.1 Existing IGNY8 WordPress Plugin

The IGNY8 WordPress plugin is currently operational with the following capabilities:

**Current Data Collection:**
- Post status tracking
- Site metadata (domain, WordPress version, plugin count, theme)
- Keyword mapping and analysis
- Site structure analysis
- Taxonomy sync across registered taxonomies
- 7 active cron jobs managing periodic data updates

**Current Plugin Endpoint:**
- `GET /wp-json/igny8/v1/health` — basic health check
- Plugin location: WordPress plugins directory
- Sync frequency: Configurable via cron (daily default)

**Limitations:**
- Does not collect detailed product data (WooCommerce stores)
- Does not analyze product descriptions for attribute patterns
- No collection of custom attribute assignments
- No menu structure analysis
- No blog content summary extraction
- No confidence scoring for discovered patterns
- Manual attribute creation required post-analysis

### 1.2 Case 1 User Journey

**Trigger:** User logs into IGNY8 platform with existing WordPress site (WooCommerce-based)

**Current Flow:**
1. User connects WordPress site via API key
2. Plugin syncs basic site data
3. User manually creates SAG blueprint
4. User manually defines attributes
5. User manually tags existing products

**Desired Flow:**
1. User connects WordPress site via API key
2. Plugin collects comprehensive site data (products, categories, content)
3. AI automatically extracts attributes from product titles/descriptions
4. System generates SAG blueprint with discovered attributes
5. System performs gap analysis (what's missing vs. SAG template)
6. User reviews and confirms blueprint
7. System auto-tags existing products
8. Blueprint feeds into content pipeline (01E) and cluster formation (01C)

### 1.3 Dependencies & Prerequisites

- WordPress 5.8+ with WooCommerce 5.0+
- IGNY8 plugin v2.0+ installed and activated
- OpenAI API or compatible LLM for attribute extraction
- Celery for async task processing (analysis may take 2-5 minutes)
- Database schema supports site analysis metadata storage
- Sector templates (01B) available for validation

---

## 2. What to Build

### 2.1 Enhanced Plugin: Site Data Collection

**Objective:** Extend WordPress plugin to collect comprehensive site data for SAG analysis.

**New Plugin Endpoint:**

```
GET /wp-json/igny8/v1/sag/site-analysis
Headers: Authorization: Bearer {IGNY8_API_TOKEN}
Query Parameters:
- limit_products: 500 (max products to analyze; default 500)
- include_drafts: false (include draft products; default false)
- cache_ttl: 3600 (cache results for N seconds; default 3600)

Response: 200 OK with payload (see section 2.3)
```

**Data Collection Modules:**

| Module | Responsibility | Data Returned |
|--------|-----------------|----------------|
| ProductCollector | Extract all products with metadata | titles, descriptions, prices, categories, tags, images, custom attributes, sku |
| CategoryCollector | Map product category hierarchy | names, slugs, parent-child hierarchy, descriptions, product counts |
| TaxonomyCollector | Enumerate all custom taxonomies | taxonomy names, all registered terms, term hierarchies, term metadata |
| AttributeCollector | Extract WooCommerce attributes | attribute names, attribute types (select/text/color), all values, product assignments |
| PageCollector | Identify key pages | titles, URLs, content summaries (first 500 chars), page type detection |
| PostCollector | Extract blog posts | titles, URLs, content summaries, categories, tags, publish date |
| MenuCollector | Analyze navigation structure | menu items, hierarchy, target URLs/categories |
| PluginCollector | Document site technical stack | active plugins, theme, WordPress version, WooCommerce version |

**Implementation:**
- Location: `plugins/igny8-sync/includes/collectors/`
- Each collector implements `DataCollectorInterface` with `collect()` and `sanitize()` methods
- Data sanitization: Remove PII, HTML tags, limit text length
- Error handling: Log failures per collector, return partial data if one collector fails
- Performance: Optimize queries to avoid site slowdown (use transients, batch operations)

**Plugin Cron Job Addition:**
- New job: `igny8_sync_sag_site_analysis` (optional, runs if user triggers analysis)
- Frequency: On-demand via API call, not scheduled
- Timeout: 60 seconds (analysis itself happens server-side via Celery)

### 2.2 AI Attribute Extraction Service

**File:** `sag/ai_functions/attribute_extraction.py`
**Register Key:** `extract_site_attributes`
**Input Type:** SiteAnalysisPayload
**Output Type:** AttributeExtractionResult

**Function Signature:**

```python
from typing import Optional


def extract_site_attributes(
    site_data: SiteAnalysisPayload,
    sector_template: Optional[SectorTemplate] = None,
    confidence_threshold: float = 0.6,
    max_attributes: int = 20
) -> AttributeExtractionResult:
    """
    Analyze site data to discover attributes.

    Args:
        site_data: Raw site data from WordPress plugin
        sector_template: Optional sector template for validation
        confidence_threshold: Min confidence to include attribute (0.0-1.0)
        max_attributes: Max attributes to return

    Returns:
        AttributeExtractionResult with discovered attributes, frequencies, confidence scores
    """
```

**Algorithm:**

1. **Text Analysis Phase**
   - Concatenate product titles and descriptions
   - Apply tokenization and noun phrase extraction
   - Identify recurring modifiers and descriptors
   - Extract from category names and tags
   - Extract from custom attribute values (if any exist)

2. **Pattern Recognition Phase**
   - Group similar terms (e.g., "back pain" + "back relief" + "lower back" → "back/spine")
   - Calculate frequency across product dataset
   - Identify dimensional axes (e.g., "target area," "device type")
   - Score statistical significance

3. **Validation Phase**
   - Cross-reference against sector template (if provided)
   - Validate against common attribute taxonomies
   - Flag conflicting or ambiguous discoveries
   - Assign confidence scores based on:
     - Frequency (how often appears)
     - Consistency (appears across multiple products)
     - Specificity (not too vague)
     - Template alignment (matches known attributes)

4. **Ranking Phase**
   - Rank by frequency and confidence
   - Assign dimensionality (Primary/Secondary/Tertiary)
   - Cap results at `max_attributes`

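The four confidence signals listed in the Validation Phase could be combined in several ways; the spec does not say how. As a minimal sketch, assuming each signal has already been normalized to the range 0.0-1.0, a weighted average is one option. The `AttributeSignals` container, the `score_confidence` helper, and the weights are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AttributeSignals:
    """Hypothetical container for the four signals named in the Validation Phase."""
    frequency: float           # share of products mentioning the attribute, 0.0-1.0
    consistency: float         # how broadly it appears across products, 0.0-1.0
    specificity: float         # 1.0 = specific term, 0.0 = too vague
    template_alignment: float  # match against the sector template, 0.0-1.0


# Illustrative weights only; the spec does not define how signals are combined.
WEIGHTS = {
    "frequency": 0.3,
    "consistency": 0.3,
    "specificity": 0.2,
    "template_alignment": 0.2,
}


def score_confidence(signals: AttributeSignals) -> float:
    """Weighted average of the four signals, clamped to [0.0, 1.0]."""
    total = (
        WEIGHTS["frequency"] * signals.frequency
        + WEIGHTS["consistency"] * signals.consistency
        + WEIGHTS["specificity"] * signals.specificity
        + WEIGHTS["template_alignment"] * signals.template_alignment
    )
    return max(0.0, min(1.0, total))


def passes_threshold(signals: AttributeSignals, confidence_threshold: float = 0.6) -> bool:
    """Mirror of the `confidence_threshold` parameter: keep the attribute or not."""
    return score_confidence(signals) >= confidence_threshold
```

Under this sketch, an attribute that falls below the threshold would be a candidate for `low_confidence_discoveries` rather than `attributes`.
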
**Output Structure:**

```json
{
  "analysis_id": "uuid",
  "site_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "analysis_confidence": 0.82,
  "attributes": [
    {
      "name": "Target Area",
      "dimension": "Primary",
      "confidence": 0.95,
      "frequency": 32,
      "discovered_from": ["product_titles", "product_descriptions", "categories"],
      "values": [
        {
          "value": "Neck",
          "frequency": 12,
          "example_products": ["Product A", "Product B"]
        },
        {
          "value": "Back",
          "frequency": 8,
          "example_products": ["Product C"]
        },
        {
          "value": "Foot",
          "frequency": 25,
          "example_products": ["Product D", "Product E"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "body_region",
        "alignment_score": 0.98
      }
    },
    {
      "name": "Device Type",
      "dimension": "Primary",
      "confidence": 0.88,
      "frequency": 28,
      "discovered_from": ["product_titles", "product_descriptions"],
      "values": [
        {
          "value": "Shiatsu",
          "frequency": 18,
          "example_products": ["Product F"]
        },
        {
          "value": "EMS",
          "frequency": 7,
          "example_products": ["Product G"]
        },
        {
          "value": "Percussion",
          "frequency": 3,
          "example_products": ["Product H"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "therapy_type",
        "alignment_score": 0.91
      }
    },
    {
      "name": "Heat Setting",
      "dimension": "Secondary",
      "confidence": 0.72,
      "frequency": 15,
      "discovered_from": ["product_descriptions"],
      "values": [
        {
          "value": "Heated",
          "frequency": 15,
          "example_products": ["Product I", "Product J"]
        }
      ],
      "template_validation": {
        "matched_sector": "massage_devices",
        "matched_attribute": "heat_enabled",
        "alignment_score": 0.85
      }
    }
  ],
  "low_confidence_discoveries": [
    {
      "name": "Brand",
      "confidence": 0.55,
      "reason": "High variability, many single-mention values"
    }
  ],
  "analysis_notes": {
    "total_products_analyzed": 50,
    "total_categories": 8,
    "total_tags": 23,
    "extraction_method": "llm_analysis",
    "model_used": "gpt-4-turbo"
  }
}
```

**Error Handling:**
- Insufficient data: Log warning, return empty attributes list
- LLM API failure: Retry with exponential backoff (3 retries)
- Timeout (>5 minutes): Abort and return partial results
- Invalid sector template: Log error, continue analysis without validation

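The retry rule for LLM API failures can be sketched as a small wrapper. This is an illustration under assumptions, not the service's actual code: `call_with_backoff`, the 1-second base delay, and the doubling factor are hypothetical, and a real implementation would likely catch only the LLM client's transient error types instead of every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on failure retry up to max_retries times, doubling the delay each time."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:  # assumption: narrow this to transient API errors in practice
            if attempt >= max_retries:
                raise  # retries exhausted; surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s with the defaults
            attempt += 1
```

The `sleep` parameter is injectable so the wrapper can be unit-tested without real delays.
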
**Performance Considerations:**
- Cache sector templates in memory
- Batch LLM calls (process 5-10 products per API call)
- Store extraction results in database for audit trail
- Return results within 2-5 minutes for typical sites

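Batching products for LLM calls amounts to slicing the product list into fixed-size groups. The helper below is a minimal sketch; the name `batched` and the default size of 10 (the upper end of the 5-10 range above) are assumptions.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

With the 50-product example site used throughout this document and a batch size of 10, this would produce 5 batches, so 5 LLM calls instead of 50.
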
### 2.3 Data Models

#### SiteAnalysisPayload

```python
from dataclasses import dataclass
from typing import List, Dict, Optional


@dataclass
class Product:
    id: str
    title: str
    description: str
    sku: str
    price: float
    categories: List[str]
    tags: List[str]
    custom_attributes: Dict[str, List[str]]
    image_urls: List[str]


@dataclass
class Category:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    product_count: int


@dataclass
class Taxonomy:
    name: str
    label: str
    is_hierarchical: bool
    terms: List['Term']


@dataclass
class Term:
    id: str
    name: str
    slug: str
    parent_id: Optional[str]
    description: str
    count: int


@dataclass
class Page:
    id: str
    title: str
    url: str
    content_summary: str
    page_type: str  # e.g., "shop", "landing", "faq"


@dataclass
class Post:
    id: str
    title: str
    url: str
    content_summary: str
    categories: List[str]
    tags: List[str]
    publish_date: str


@dataclass
class MenuItem:
    id: str
    title: str
    url: str
    target: str
    parent_id: Optional[str]


@dataclass
class SiteMetadata:
    site_id: str
    domain: str
    wordpress_version: str
    woocommerce_version: str
    total_products: int
    total_categories: int
    total_pages: int
    total_posts: int
    active_plugins: List[str]
    theme: str


@dataclass
class SiteAnalysisPayload:
    metadata: SiteMetadata
    products: List[Product]
    categories: List[Category]
    taxonomies: List[Taxonomy]
    pages: List[Page]
    posts: List[Post]
    menus: List[MenuItem]
    collected_at: str  # ISO 8601 timestamp
```

#### AttributeExtractionResult

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttributeValue:
    value: str
    frequency: int
    example_products: List[str]


@dataclass
class TemplateValidation:
    matched_sector: str
    matched_attribute: str
    alignment_score: float


@dataclass
class DiscoveredAttribute:
    name: str
    dimension: str  # "Primary", "Secondary", "Tertiary"
    confidence: float  # 0.0-1.0
    frequency: int
    discovered_from: List[str]  # ["product_titles", "product_descriptions", ...]
    values: List[AttributeValue]
    template_validation: Optional[TemplateValidation]


@dataclass
class LowConfidenceDiscovery:
    name: str
    confidence: float
    reason: str


@dataclass
class AnalysisNotes:
    total_products_analyzed: int
    total_categories: int
    total_tags: int
    extraction_method: str
    model_used: str


@dataclass
class AttributeExtractionResult:
    analysis_id: str
    site_id: str
    timestamp: str
    analysis_confidence: float
    attributes: List[DiscoveredAttribute]
    low_confidence_discoveries: List[LowConfidenceDiscovery]
    analysis_notes: AnalysisNotes
```

### 2.4 Gap Analysis Service

**File:** `sag/services/gap_analysis_service.py`
**Class:** `GapAnalysisService`
**Method:** `analyze_gap(site_data: SiteAnalysisPayload, blueprint: SAGBlueprint) -> GapAnalysisReport`

**Purpose:** Compare existing site structure against SAG blueprint to identify gaps.

**Analysis Dimensions:**

1. **Attribute Coverage Gap**
   - SAG blueprint specifies X attributes
   - Site currently has Y custom attributes assigned to products
   - Gap: Missing attributes or low coverage (% of products with attribute values)

2. **Hub Page Gap**
   - Blueprint specifies Z cluster hubs
   - Site analysis identifies M existing pages
   - Gap: Missing hub pages (authority pages for attribute clusters)

3. **Term Landing Page Gap**
   - Blueprint specifies N attribute values requiring term landing pages
   - Site has existing category/tag pages
   - Gap: Missing term landing pages (one per attribute value)

4. **Blog Content Gap**
   - Blueprint specifies recommended blog posts per cluster
   - Site has P existing blog posts
   - Gap: Blog content aligned to clusters and keyword targets

5. **Internal Linking Gap**
   - Blueprint specifies internal linking strategy
   - Site has current internal link structure
   - Gap: Missing cross-cluster and term-to-hub links

6. **Product Enrichment Gap**
   - Products lacking attribute assignments
   - Products missing description optimization
   - Products missing images

7. **Technical SEO Gap**
   - Missing schema markup for products
   - Category pages lacking optimization
   - Menu structure not optimized for crawlability

**Output Structure:**

```json
{
  "analysis_id": "uuid",
  "site_id": "uuid",
  "blueprint_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "summary": {
    "products_current": 50,
    "products_gap": 0,
    "attributes_current": 3,
    "attributes_blueprint": 8,
    "attributes_gap": 5,
    "hub_pages_current": 2,
    "hub_pages_blueprint": 4,
    "hub_pages_gap": 2,
    "term_pages_current": 12,
    "term_pages_blueprint": 35,
    "term_pages_gap": 23,
    "blog_posts_current": 8,
    "blog_posts_blueprint": 24,
    "blog_posts_gap": 16,
    "overall_gap_percentage": 62
  },
  "attributes_gap_detail": [
    {
      "attribute": "Target Area",
      "coverage_current": "100% (50/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "None — attribute well-covered"
    },
    {
      "attribute": "Device Type",
      "coverage_current": "80% (40/50)",
      "coverage_blueprint": "100% (50/50)",
      "gap": "10 products missing Device Type assignment"
    }
  ],
  "hub_pages_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "status": "EXISTS",
      "url": "/shop/foot-massagers",
      "optimization_notes": "Good; consider adding testimonials section"
    },
    {
      "cluster": "Neck & Shoulder Relief",
      "status": "MISSING",
      "recommendation": "Create hub page at /neck-shoulder-relief"
    }
  ],
  "term_pages_gap_detail": [
    {
      "attribute": "Target Area",
      "term": "Neck",
      "status": "MISSING",
      "recommendation": "Create term page at /target-area/neck (products filter + blog links)"
    }
  ],
  "blog_posts_gap_detail": [
    {
      "cluster": "Foot Massagers",
      "recommended_posts": [
        "Best Foot Massagers for Neuropathy",
        "How to Use Shiatsu Foot Massagers",
        "Foot Massage Benefits"
      ],
      "existing_posts": [
        "Foot Massage 101"
      ],
      "gap": 2
    }
  ],
  "internal_linking_gap": {
    "status": "High gaps identified",
    "recommendation": "Blueprint specifies 3-5 internal links per hub page; current average: 1.2",
    "priority_links": [
      "Neck hub → Foot hub (shared body region cluster)",
      "Device Type pages → Hub pages",
      "Blog posts → Related term pages"
    ]
  },
  "actionable_recommendations": [
    "IMMEDIATE: Assign Device Type to 10 untagged products",
    "WEEK 1: Create 2 missing hub pages",
    "WEEK 2: Create 23 term landing pages via script",
    "WEEK 3: Bulk create 16 blog posts (outline + AI generation)",
    "WEEK 4: Implement internal linking strategy"
  ]
}
```

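The per-dimension `*_gap` counts in the summary above are consistent with "blueprint target minus current count". The sketch below reproduces them under that assumption; `count_gaps` is a hypothetical helper, and it deliberately does not compute `overall_gap_percentage`, because this spec does not define that formula.

```python
from typing import Dict


def count_gaps(current: Dict[str, int], blueprint: Dict[str, int]) -> Dict[str, int]:
    """Per dimension: how many blueprint items the site is still missing (never negative)."""
    return {
        name: max(0, target - current.get(name, 0))
        for name, target in blueprint.items()
    }


# Counts taken from the example summary in this section.
current = {"attributes": 3, "hub_pages": 2, "term_pages": 12, "blog_posts": 8}
blueprint = {"attributes": 8, "hub_pages": 4, "term_pages": 35, "blog_posts": 24}

gaps = count_gaps(current, blueprint)
# gaps == {"attributes": 5, "hub_pages": 2, "term_pages": 23, "blog_posts": 16}
```

Clamping at zero means a site that already exceeds the blueprint in one dimension reports no gap there instead of a negative number.
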
### 2.5 Product Auto-Tagging Service

**File:** `sag/services/auto_tagger_service.py`
**Class:** `ProductAutoTagger`
**Method:** `generate_tag_suggestions(products: List[Product], attributes: List[DiscoveredAttribute], blueprint: SAGBlueprint) -> List[TagSuggestion]`

**Purpose:** Generate batch product-to-attribute assignments based on product titles/descriptions.

**Algorithm:**

1. For each product:
   - Extract key terms from title and description
   - Match against attribute values (fuzzy matching allowed)
   - Score confidence for each attribute assignment
   - Rank by confidence

2. For each attribute:
   - Verify assignment makes semantic sense
   - Check for conflicting assignments (e.g., can't be both "Shiatsu" and "EMS")
   - Return ranked list

3. Group by product for review UI

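The matching and conflict steps above can be approximated with plain string similarity. The sketch below is only a stand-in: it uses `difflib.SequenceMatcher` from the Python standard library as the fuzzy matcher, keeps the single best value per attribute as a simple way to avoid conflicting assignments, and does no semantic verification. The helper names `best_match_score` and `suggest_tags` are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def best_match_score(value: str, text: str) -> float:
    """Highest similarity between an attribute value and any word in the text (0.0-1.0)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    target = value.lower()
    return max(
        (SequenceMatcher(None, target, word).ratio() for word in words),
        default=0.0,
    )


def suggest_tags(
    text: str,
    attributes: Dict[str, List[str]],
    confidence_min: float = 0.6,
) -> List[Tuple[str, str, float]]:
    """Return (attribute, value, score) tuples, one best value per attribute, ranked by score."""
    suggestions = []
    for attribute, values in attributes.items():
        scored = [(best_match_score(value, text), value) for value in values]
        score, value = max(scored, default=(0.0, ""))
        if score >= confidence_min:
            suggestions.append((attribute, value, round(score, 2)))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)
```

For a title such as "Nekteck Foot Massager with Heat", the word "foot" matches the value "Foot" exactly, so "Foot" wins over "Neck" even though "Neck" partially resembles the brand name. Word-level similarity like this would miss multi-word values and synonyms, which is where the LLM-based scoring described in this section would be needed.
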
**Output Structure:**

```json
{
  "batch_id": "uuid",
  "site_id": "uuid",
  "blueprint_id": "uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "total_products": 50,
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": "woo_123",
      "product_title": "Nekteck Foot Massager with Heat",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        },
        {
          "attribute": "Device Type",
          "value": "Shiatsu",
          "confidence": 0.82,
          "reasoning": "Description mentions shiatsu nodes"
        },
        {
          "attribute": "Heat Setting",
          "value": "Heated",
          "confidence": 0.95,
          "reasoning": "Title explicitly states 'with Heat'"
        }
      ],
      "status": "pending_review"
    }
  ],
  "summary": {
    "high_confidence_suggestions": 72,
    "medium_confidence_suggestions": 12,
    "low_confidence_suggestions": 3,
    "conflicts_detected": 0,
    "ready_to_apply": true
  }
}
```

---

## 3. APIs & Endpoints

### 3.1 Backend API Endpoints

All endpoints are authenticated via `Authorization: Bearer {IGNY8_API_TOKEN}` header.

#### POST /api/v1/sag/sites/{site_id}/analyze/

**Purpose:** Trigger comprehensive site analysis (async).

**Request:**
```json
{
  "include_draft_products": false,
  "product_limit": 500,
  "sector_template_id": "optional_uuid",
  "webhook_url": "optional_https_url_for_completion_notification"
}
```

**Response:** 202 Accepted
```json
{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "queued",
  "estimated_duration_seconds": 120,
  "check_status_url": "/api/v1/sag/sites/{site_id}/analysis-status/?task_id={task_id}"
}
```

**Error Responses:**
- 400: Invalid parameters
- 401: Unauthorized
- 404: Site not found
- 429: Rate limited (max 1 analysis per 30 minutes per site)

---

#### GET /api/v1/sag/sites/{site_id}/analysis-status/

**Purpose:** Check analysis progress.

**Query Parameters:**
- `task_id` (required): Celery task ID from analysis trigger

**Response:** 200 OK
```json
{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "processing",
  "progress_percent": 45,
  "current_step": "Analyzing product attributes",
  "elapsed_seconds": 32,
  "estimated_remaining_seconds": 48
}
```

**Status Values:**
- `queued` — waiting to start
- `processing` — actively analyzing
- `complete` — analysis finished
- `failed` — analysis error (see error message)

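A client would typically poll this endpoint until the task reaches `complete` or `failed`. The loop below is a minimal sketch under assumptions: `fetch_status` is a hypothetical callable that performs the `GET .../analysis-status/?task_id=...` request and returns the parsed JSON body, and the 5-second interval is illustrative. The 300-second default timeout mirrors the 5-minute limit given in section 2.2.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_analysis(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the task reports a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError("analysis did not finish before the timeout")
        sleep(poll_interval)
```

Callers still need to inspect the returned `status` field, since `failed` is returned rather than raised. Where a `webhook_url` was supplied on the trigger request, polling may be unnecessary.
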
---

#### GET /api/v1/sag/sites/{site_id}/analysis-results/

**Purpose:** Retrieve completed analysis results.

**Response:** 200 OK
```json
{
  "analysis_id": "uuid",
  "site_id": "site_uuid",
  "timestamp": "2026-03-23T14:30:00Z",
  "site_data_summary": {
    "total_products": 50,
    "total_categories": 8,
    "total_pages": 12,
    "total_posts": 8
  },
  "extracted_attributes": {
    "analysis_confidence": 0.82,
    "attributes_count": 8,
    "attributes": [
      { "name": "Target Area", "dimension": "Primary", "confidence": 0.95, ... }
    ]
  },
  "gap_analysis": {
    "overall_gap_percentage": 62,
    "summary": { ... }
  },
  "status": "ready_for_review"
}
```

**Status Values:**
- `ready_for_review` — user should review before confirming
- `confirmed` — user has accepted analysis
- `archived` — superseded by newer analysis

---

#### POST /api/v1/sag/sites/{site_id}/confirm-analysis/

**Purpose:** User confirms analysis; creates SAG blueprint.

**Request:**
```json
{
  "analysis_id": "uuid",
  "approved_attributes": [
    {
      "name": "Target Area",
      "approved_values": ["Neck", "Back", "Foot"],
      "exclude_values": []
    }
  ],
  "confirmed_by_user_id": "user_uuid"
}
```

**Response:** 201 Created
```json
{
  "blueprint_id": "uuid",
  "site_id": "site_uuid",
  "analysis_id": "uuid",
  "status": "created",
  "attributes_count": 8,
  "attribute_values_count": 45,
  "created_at": "2026-03-23T14:32:00Z",
  "next_steps": [
    "Review auto-tagging suggestions",
    "Approve product tags",
    "Start content pipeline (01E)"
  ]
}
```

---

#### GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/

**Purpose:** Retrieve product auto-tagging suggestions.

**Query Parameters:**
- `blueprint_id` (required): ID of confirmed blueprint
- `confidence_min` (optional): Filter by minimum confidence (0.0-1.0, default 0.6)
- `limit` (optional): Max suggestions per product (default 5)

**Response:** 200 OK
```json
{
  "batch_id": "uuid",
  "blueprint_id": "blueprint_uuid",
  "total_suggestions": 87,
  "suggestions": [
    {
      "product_id": "woo_123",
      "product_title": "Nekteck Foot Massager",
      "proposed_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot",
          "confidence": 0.98,
          "reasoning": "Title contains 'Foot Massager'"
        }
      ]
    }
  ]
}
```

---

#### POST /api/v1/sag/sites/{site_id}/auto-tag/apply/

**Purpose:** Apply approved product tags to site (async bulk operation).

**Request:**
```json
{
  "blueprint_id": "uuid",
  "approved_suggestions": [
    {
      "product_id": "woo_123",
      "approved_tags": [
        {
          "attribute": "Target Area",
          "value": "Foot"
        }
      ]
    }
  ],
  "skip_existing_values": true
}
```

**Response:** 202 Accepted
```json
{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "blueprint_id": "blueprint_uuid",
  "status": "processing",
  "products_to_tag": 47,
  "tags_to_apply": 87,
  "check_status_url": "/api/v1/sag/sites/{site_id}/auto-tag/status/?task_id={task_id}"
}
```

---

#### GET /api/v1/sag/sites/{site_id}/auto-tag/status/

**Purpose:** Check auto-tagging progress.

**Query Parameters:**
- `task_id` (required): Celery task ID

**Response:** 200 OK
```json
{
  "task_id": "celery_task_uuid",
  "site_id": "site_uuid",
  "status": "processing",
  "progress_percent": 62,
  "products_tagged": 29,
  "total_products": 47,
  "tags_applied": 54,
  "estimated_remaining_seconds": 30
}
```

---

### 3.2 WordPress Plugin Endpoint

#### GET /wp-json/igny8/v1/sag/site-analysis

**Purpose:** Collect comprehensive site data for analysis.

**Headers:**
- `Authorization: Bearer {IGNY8_API_TOKEN}`
- `X-IGNY8-Request-ID: {uuid}` (optional, for request tracking)

**Query Parameters:**
- `limit_products`: int (1-1000, default 500)
- `include_drafts`: boolean (default false)
- `cache_ttl`: int (seconds, default 3600)

**Response:** 200 OK
```json
{
  "metadata": {
    "site_id": "uuid",
    "domain": "example-store.com",
    "wordpress_version": "6.4.2",
    "woocommerce_version": "8.5.0",
    "total_products": 50,
    "total_categories": 8,
    "total_pages": 12,
    "total_posts": 8,
    "active_plugins": ["woocommerce", "yoast-seo", ...],
    "theme": "storefront"
  },
  "products": [
    {
      "id": "woo_123",
      "title": "Nekteck Foot Massager with Heat",
      "description": "Premium foot massage device...",
      "sku": "NEKTECK-FM-001",
      "price": 79.99,
      "categories": ["Foot Massagers", "Massage Devices"],
      "tags": ["heated", "cordless"],
      "custom_attributes": {
        "brand": ["Nekteck"],
        "color": ["Black"],
        "warranty": ["2 Year"]
      },
      "image_urls": ["image1.jpg", "image2.jpg"]
    }
  ],
  "categories": [
    {
      "id": "cat_1",
      "name": "Foot Massagers",
      "slug": "foot-massagers",
      "parent_id": null,
      "description": "Electronic foot massage devices",
      "product_count": 12
    }
  ],
  "taxonomies": [
    {
      "name": "brand",
      "label": "Brand",
      "is_hierarchical": false,
      "terms": [
        {
          "id": "brand_1",
          "name": "Nekteck",
          "slug": "nekteck",
          "parent_id": null,
          "description": "",
          "count": 5
        }
      ]
    }
  ],
  "pages": [
    {
      "id": "page_1",
      "title": "Shop",
      "url": "/shop",
      "content_summary": "Browse our selection of massage devices",
      "page_type": "shop"
    }
  ],
  "posts": [
    {
      "id": "post_1",
      "title": "Benefits of Foot Massage",
      "url": "/blog/foot-massage-benefits",
      "content_summary": "Learn why foot massage is beneficial...",
      "categories": ["Health"],
      "tags": ["foot", "massage"],
      "publish_date": "2026-03-15"
    }
  ],
  "menus": [
    {
      "id": "menu_1",
      "title": "Main Menu",
      "items": [
        {
          "id": "item_1",
          "title": "Shop",
          "url": "/shop",
          "target": "_self",
          "parent_id": null
        }
      ]
    }
  ],
  "collected_at": "2026-03-23T14:30:00Z"
}
```

**Error Responses:**
- 400: Invalid query parameters
- 401: Invalid or missing API token
- 500: Plugin error (logged on WordPress side)

**Performance:**
- Response time target: <5 seconds for sites with <500 products
- Data is cached for 1 hour (configurable via `cache_ttl`)
- Uses WordPress transients API for caching

---

## 4. Implementation Steps

### Phase 1: Plugin Enhancement (Week 1)

**Tasks:**
1. Create collector classes in `plugins/igny8-sync/includes/collectors/`
   - ProductCollector
   - CategoryCollector
   - TaxonomyCollector
   - AttributeCollector
   - PageCollector
   - PostCollector
   - MenuCollector
   - PluginCollector

2. Implement `DataCollectorInterface`
   - `collect()` method (fetches raw data)
   - `sanitize()` method (removes PII, normalizes format)
   - Error handling per collector

3. Add `/wp-json/igny8/v1/sag/site-analysis` endpoint
   - Route definition
   - Parameter validation
   - Response formatting
   - Caching logic

4. Add unit tests for collectors
   - Mock data tests
   - Error condition tests
   - Performance tests

**Acceptance Criteria:**
- Endpoint returns valid JSON payload matching schema
- All 8 collectors implemented and tested
- Response time <5 seconds for 500 products
- Caching works correctly
- Error handling tested

---

### Phase 2: AI Attribute Extraction (Week 1-2)

**Tasks:**
1. Implement `attribute_extraction.py`
   - Text analysis functions
   - Pattern recognition logic
   - Confidence scoring
   - Validation against sector templates

2. Register with LLM framework
   - Implement `extract_site_attributes` function
   - Add input/output validation
   - Error handling (retry logic)

3. Create data models
   - DiscoveredAttribute
   - AttributeValue
   - TemplateValidation
   - AttributeExtractionResult

4. Add unit and integration tests
   - Mock LLM responses
   - Test with real site data
   - Confidence scoring validation
   - Performance tests (2-5 minute runtime)

**Acceptance Criteria:**
- Extracts 5-20 attributes from sample site data
- Confidence scores accurate and meaningful
- Sector template validation works
- Low-confidence discoveries flagged
- Results auditable (model used, reasoning provided)

---

### Phase 3: Gap Analysis Service (Week 2)

**Tasks:**
1. Implement `gap_analysis_service.py`
   - GapAnalysisService class
   - analyze_gap() method
   - All 7 gap dimensions analyzed

2. Create gap analysis models
   - GapAnalysisReport
   - Recommendation structures
   - Detail sections

3. Integrate with blueprint comparison
   - Query SAG blueprint
   - Compare against site data
   - Calculate gap percentages

4. Add unit tests
   - Test each gap dimension
   - Test recommendation generation
   - Test report structure

**Acceptance Criteria:**
- All 7 gap dimensions analyzed
- Report clearly identifies missing elements
- Actionable recommendations provided
- Report generated in <1 second

---

### Phase 4: API Endpoints (Week 2-3)

**Tasks:**
1. Implement analysis trigger endpoint
   - POST /api/v1/sag/sites/{site_id}/analyze/
   - Celery task queueing
   - Webhook support

2. Implement status check endpoint
   - GET /api/v1/sag/sites/{site_id}/analysis-status/
   - Real-time progress updates

3. Implement results retrieval endpoint
   - GET /api/v1/sag/sites/{site_id}/analysis-results/
   - Caching of results

4. Implement blueprint confirmation endpoint
   - POST /api/v1/sag/sites/{site_id}/confirm-analysis/
   - Attribute approval logic
   - Blueprint creation

5. Add request/response validation
   - Marshmallow schemas
   - Error responses

6. Add authentication/authorization checks
   - API token validation
   - User site ownership verification

**Acceptance Criteria:**
- All 4 endpoints implemented
- Endpoints return correct status codes
- Validation working
- Authentication required and checked
- Error responses follow standard format

---

### Phase 5: Product Auto-Tagging (Week 3)
|
|
|
|
**Tasks:**
|
|
1. Implement `auto_tagger_service.py`
|
|
- ProductAutoTagger class
|
|
- generate_tag_suggestions() method
|
|
- Confidence scoring
|
|
|
|
2. Create auto-tagging endpoints
|
|
- GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/
|
|
- POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
|
|
- GET /api/v1/sag/sites/{site_id}/auto-tag/status/
|
|
|
|
3. Implement Celery task for bulk tagging
|
|
- Batch product processing
|
|
- Conflict detection
|
|
- Error handling
|
|
|
|
4. Add unit tests
|
|
- Test suggestion generation
|
|
- Test bulk tagging
|
|
- Test conflict detection
|
|
|
|
**Acceptance Criteria:**
|
|
- Suggestions endpoint returns valid suggestions
|
|
- Confidence scores reasonable (0.6+)
|
|
- Bulk tagging applies tags correctly to products
|
|
- Progress tracking works
|
|
- 47+ products can be tagged in <2 minutes
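
One simple way to produce suggestions with confidence scores is whole-word matching of blueprint attribute values against product text. The sketch below assumes that approach; it is not the `ProductAutoTagger` implementation. The 0.9 and 0.6 weights, the `TagSuggestion` type, and the sample products are invented for the example, and a real scorer would more likely come from the AI extraction step.

```python
import re
from dataclasses import dataclass

MIN_CONFIDENCE = 0.6  # threshold taken from the acceptance criteria above


@dataclass(frozen=True)
class TagSuggestion:
    product_id: int
    attribute: str
    value: str
    confidence: float


def _contains_word(text: str, word: str) -> bool:
    # Whole-word match so that "foot" does not match "football".
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


def generate_tag_suggestions(products, attributes, min_confidence=MIN_CONFIDENCE):
    """products: list of dicts with "id", "title", and optional "description".
    attributes: {attribute_name: [values]} taken from the confirmed blueprint."""
    suggestions = []
    for product in products:
        title = product["title"].lower()
        description = product.get("description", "").lower()
        for attribute, values in attributes.items():
            for value in values:
                needle = value.lower()
                if _contains_word(title, needle):
                    confidence = 0.9  # illustrative weight: match in the title
                elif _contains_word(description, needle):
                    confidence = 0.6  # illustrative weight: description-only match
                else:
                    continue
                if confidence >= min_confidence:
                    suggestions.append(
                        TagSuggestion(product["id"], attribute, value, confidence))
    return suggestions


products = [
    {"id": 1, "title": "Shiatsu Foot Massager", "description": "Heated device for home use"},
    {"id": 2, "title": "Percussion Gun", "description": "Targets the neck and back"},
]
suggestions = generate_tag_suggestions(products, {"Target Area": ["Foot", "Neck"]})
for s in suggestions:
    print(s.product_id, s.attribute, s.value, s.confidence)
```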

---

### Phase 6: Frontend Components (Week 3-4)

**Tasks:**
1. Implement SiteAnalysisPanel
- Trigger analysis button
- Progress indicator
- Error messaging

2. Implement DiscoveredAttributesReview
- Display discovered attributes
- Show confidence scores
- Allow approval/rejection per attribute
- Show example products

3. Implement GapAnalysisReport
- Visual representation of gaps
- Actionable recommendations
- Priority ordering

4. Implement AutoTagReviewPanel
- Display product suggestions
- Batch selection/deselection
- Apply tags button
- Progress tracking

5. Add styling and UX polish
- Responsive design
- Loading states
- Error states
- Success confirmations

**Acceptance Criteria:**
- All 4 components implemented
- Responsive on desktop/tablet
- Accessible (WCAG 2.1 AA)
- User can complete workflow without errors
- Loading/error states clearly communicated

---

### Phase 7: Integration & Testing (Week 4)

**Tasks:**
1. End-to-end testing
- Connect real WordPress site
- Run full analysis workflow
- Confirm blueprint created
- Verify auto-tagging works

2. Performance testing
- Benchmark analysis with various site sizes
- Optimize slow operations
- Load testing on API endpoints

3. Documentation
- API documentation (OpenAPI/Swagger)
- Plugin setup guide
- User guide for Case 1 workflow
- Developer setup guide

4. Bug fixing and refinement
- Fix integration issues
- Refine UI/UX based on testing
- Improve error messages

**Acceptance Criteria:**
- End-to-end workflow works without errors
- Performance meets targets (analysis <5 min for 500 products)
- Documentation complete
- All bugs fixed
- Ready for beta testing

---

## 5. Acceptance Criteria

### 5.1 Functional Requirements

**Site Data Collection:**
- Plugin collects all 8 data types (products, categories, taxonomies, pages, posts, menus, attributes, metadata)
- Data is valid JSON matching defined schema
- All product titles/descriptions included
- Custom attribute values extracted correctly
- Menu hierarchy preserved

**Attribute Extraction:**
- AI identifies 5-20 attributes from site data
- Confidence scores meaningful and accurate
- Low-confidence discoveries flagged
- Sector template validation working
- Results include frequency counts and example products

**Gap Analysis:**
- All 7 gap dimensions analyzed
- Missing hubs, term pages, blog posts clearly identified
- Product attribute coverage calculated
- Internal linking gaps identified
- Actionable recommendations provided

**Blueprint Creation:**
- Confirmed analysis creates valid SAGBlueprint
- Attributes and values recorded correctly
- Gap analysis linked to blueprint
- Blueprint feeds into cluster formation (01C)

**Product Auto-Tagging:**
- Suggestions generated for 90%+ of products
- Confidence scores reasonable (0.6+)
- Bulk tagging applies tags correctly
- No data loss or corruption
- Existing tags not overwritten (configurable)
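
The "existing tags not overwritten (configurable)" requirement pairs naturally with conflict detection. The following sketch shows one possible merge rule. The `merge_tags` name, the `overwrite` flag, and the flat `{attribute: value}` shape are assumptions for illustration; the plugin's real attribute storage may allow several values per attribute.

```python
def merge_tags(existing: dict[str, str],
               suggested: dict[str, str],
               overwrite: bool = False) -> tuple[dict[str, str], dict[str, tuple[str, str]]]:
    """Merge suggested attribute values into a product's existing tags.

    Returns (merged, conflicts). A conflict is an attribute that already has a
    different value; it is reported whether or not it is overwritten.
    """
    merged = dict(existing)  # never mutate the caller's data
    conflicts: dict[str, tuple[str, str]] = {}
    for attribute, value in suggested.items():
        current = existing.get(attribute)
        if current is None:
            merged[attribute] = value  # new attribute: always safe to add
        elif current != value:
            conflicts[attribute] = (current, value)
            if overwrite:
                merged[attribute] = value
    return merged, conflicts


existing = {"Target Area": "Foot"}
suggested = {"Target Area": "Leg", "Device Type": "Roller"}

merged, conflicts = merge_tags(existing, suggested)
print(merged)     # {'Target Area': 'Foot', 'Device Type': 'Roller'}
print(conflicts)  # {'Target Area': ('Foot', 'Leg')}
```

Defaulting `overwrite` to `False` makes the safe behavior the one callers get when they do nothing, which supports the "no data loss or corruption" criterion.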

**API Endpoints:**
- All 4 analysis endpoints implemented
- All 3 auto-tagging endpoints implemented
- Correct HTTP status codes
- Valid error responses
- Authentication required

**Frontend Components:**
- SiteAnalysisPanel triggers analysis and shows progress
- DiscoveredAttributesReview allows attribute approval
- GapAnalysisReport displays gaps clearly
- AutoTagReviewPanel allows batch product tagging
- All components responsive and accessible

### 5.2 Non-Functional Requirements

**Performance:**
- Site analysis completes in <5 minutes for typical sites (50-500 products)
- WordPress plugin endpoint responds in <5 seconds
- API endpoints respond in <2 seconds
- Frontend components load in <3 seconds

**Reliability:**
- Plugin handles errors gracefully (missing products, etc.)
- Partial failures return partial data with warnings
- Celery tasks have retry logic
- Webhook notifications reliable
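
The "partial data with warnings" behavior can be illustrated by running each collector in isolation and recording failures instead of aborting. The WordPress plugin itself is PHP, so the Python below only demonstrates the pattern in the same language as the backend services. The `collect_site_data` name, the response keys, and the sample collectors are hypothetical.

```python
def collect_site_data(collectors):
    """Run every collector; a failure in one must not discard the others.

    collectors: {data_type: zero-argument callable returning that data}.
    """
    data, warnings = {}, []
    for name, collector in collectors.items():
        try:
            data[name] = collector()
        except Exception as exc:  # broad on purpose: isolate any collector failure
            warnings.append({"collector": name, "error": str(exc)})
    return {"data": data, "warnings": warnings, "partial": bool(warnings)}


def broken_menu_collector():
    # Stand-in for a collector that fails at runtime.
    raise RuntimeError("menu locations not registered")


result = collect_site_data({
    "products": lambda: [{"id": 1, "title": "Shiatsu Foot Massager"}],
    "menus": broken_menu_collector,
})
print(result["partial"])   # True
print(result["warnings"])  # [{'collector': 'menus', 'error': 'menu locations not registered'}]
```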

**Security:**
- API token authentication required
- User can only access own sites
- No PII in logs
- HTTPS enforced
- Input validation on all endpoints

**Scalability:**
- Plugin handles 1000+ products
- API handles 100+ concurrent analysis requests
- Database indexes optimized for queries
- Caching prevents redundant processing

**Data Quality:**
- Analysis results auditable (model used, timestamps, reasoning)
- No duplicate attribute suggestions
- Confidence scores calibrated
- Low-confidence results flagged for review
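
The "no duplicate attribute suggestions" and "low-confidence results flagged" checks could be applied in one pass over the extraction output, as sketched here. The `dedupe_and_flag` name, the dict shape, the case-insensitive name matching, and the 0.6 review threshold are assumptions; the spec does not state which threshold applies to attribute review.

```python
def dedupe_and_flag(suggestions, review_threshold=0.6):
    """Collapse duplicate attribute names and mark low-confidence entries.

    suggestions: list of dicts with "name" and "confidence" (0.0-1.0).
    Names are compared case-insensitively with surrounding whitespace ignored;
    the highest-confidence duplicate is kept.
    """
    best = {}
    for suggestion in suggestions:
        key = suggestion["name"].strip().lower()
        if key not in best or suggestion["confidence"] > best[key]["confidence"]:
            best[key] = suggestion
    return [
        {**s, "needs_review": s["confidence"] < review_threshold}
        for s in best.values()
    ]


raw = [
    {"name": "Target Area", "confidence": 0.9},
    {"name": "target area ", "confidence": 0.7},  # duplicate of the first entry
    {"name": "Color", "confidence": 0.4},         # below the review threshold
]
cleaned = dedupe_and_flag(raw)
for item in cleaned:
    print(item)
```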

### 5.3 User Experience Requirements

**Clarity:**
- User understands analysis process and time required
- Gap analysis clearly shows what's missing
- Recommendations are actionable
- Error messages explain what went wrong

**Simplicity:**
- Workflow is 5 steps (analyze → review → confirm → auto-tag → apply)
- One button to trigger analysis
- Clear next steps after each stage

**Feedback:**
- Real-time progress updates during analysis
- Success/error notifications
- Ability to view raw analysis results
- Audit trail of approvals

---

## 6. Claude Code Instructions

### 6.1 Skill Development

**Skill Name:** `igny8-case1-analysis`
**Version:** 2.0
**Prerequisites:** IGNY8 platform deployed, WordPress plugin v2.0+, Celery configured

**Skill Workflow:**

```yaml
Trigger: User connects existing WordPress site to IGNY8

Step 1 (Collect Site Data):
  - Call: POST /api/v1/sag/sites/{site_id}/analyze/
  - Wait: Poll /api/v1/sag/sites/{site_id}/analysis-status/ every 10 seconds
  - Timeout: 5 minutes
  - Output: task_id for tracking

Step 2 (Retrieve Analysis Results):
  - Call: GET /api/v1/sag/sites/{site_id}/analysis-results/
  - Parse: extracted_attributes, gap_analysis
  - Display: DiscoveredAttributesReview panel
  - User action: Approve/reject attributes

Step 3 (Confirm Analysis):
  - Call: POST /api/v1/sag/sites/{site_id}/confirm-analysis/
  - Payload: approved_attributes from user review
  - Output: blueprint_id
  - Display: Gap analysis report
  - Next: Show auto-tagging recommendations

Step 4 (Generate Auto-Tag Suggestions):
  - Call: GET /api/v1/sag/sites/{site_id}/auto-tag/suggestions/?blueprint_id={blueprint_id}
  - Display: AutoTagReviewPanel
  - User action: Select products to tag

Step 5 (Apply Auto-Tags):
  - Call: POST /api/v1/sag/sites/{site_id}/auto-tag/apply/
  - Wait: Poll /api/v1/sag/sites/{site_id}/auto-tag/status/ every 5 seconds
  - Timeout: 10 minutes
  - Output: Number of tags applied, products tagged

Step 6 (Complete & Next Steps):
  - Display: Success message
  - Recommendations: Run cluster formation (01C), start content pipeline (01E)
  - Links: View blueprint, view gap report, start cluster creation
```
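
The poll-and-timeout behavior in Steps 1 and 5 of the workflow can be sketched as a small helper. This is an illustration under stated assumptions: the `poll_until_done` name and the terminal status values `"completed"` and `"failed"` are hypothetical, since the status payload is defined elsewhere in this spec, and the HTTP call is left to a caller-supplied `fetch_status` function.

```python
import time


def poll_until_done(fetch_status, interval_s=10.0, timeout_s=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() every interval_s seconds until a terminal status.

    fetch_status: zero-argument callable returning a dict with a "status" key,
                  for example a wrapper around GET .../analysis-status/.
    Raises TimeoutError if no terminal status arrives within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):  # assumed terminal values
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"no terminal status within {timeout_s} seconds")
        sleep(interval_s)


# Demo with canned responses and a no-op sleep, so nothing actually waits.
responses = iter([{"status": "running"}, {"status": "running"}, {"status": "completed"}])
final = poll_until_done(lambda: next(responses), sleep=lambda seconds: None)
print(final)  # {'status': 'completed'}
```

For Step 1 the defaults match the workflow (10-second interval, 5-minute timeout); Step 5 would pass `interval_s=5.0, timeout_s=600.0`. Injecting `sleep` and `clock` keeps the helper testable without real delays.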

### 6.2 Development Checklist

**Code Quality:**
- [ ] All functions have docstrings
- [ ] Type hints on all function parameters and returns
- [ ] Logging at DEBUG, INFO, WARNING levels as appropriate
- [ ] Error handling with specific exception types
- [ ] No hardcoded values (use config/env vars)

**Testing:**
- [ ] Unit tests for each service (>80% coverage)
- [ ] Integration tests for API endpoints
- [ ] Fixtures for sample site data
- [ ] Mock LLM responses for deterministic tests
- [ ] Performance tests for analysis (time and memory)

**Documentation:**
- [ ] Docstrings follow Google style
- [ ] README with setup instructions
- [ ] API documentation in OpenAPI format
- [ ] Example requests/responses for each endpoint
- [ ] Troubleshooting guide for common errors

**Security:**
- [ ] API token validation on all endpoints
- [ ] User ownership checks before accessing site data
- [ ] Input validation with Marshmallow
- [ ] SQL injection prevention (use ORM)
- [ ] No credentials in logs or errors

**Performance:**
- [ ] Database queries indexed
- [ ] Caching implemented for plugin endpoint
- [ ] Celery task optimization
- [ ] LLM API call batching
- [ ] Frontend component lazy loading

### 6.3 Debugging & Troubleshooting

**Common Issues:**

**Issue:** Analysis hangs or times out
- Check: Celery worker status (`celery -A sag inspect active`)
- Check: Redis/message queue status
- Check: LLM API rate limits
- Solution: Reduce product limit, retry analysis

**Issue:** Plugin endpoint returns partial data
- Check: Specific collector failure (check logs)
- Solution: Fix the collector, then re-run the analysis (the re-run bypasses the cache)
- Note: Partial data is returned if one collector fails

**Issue:** Auto-tagging misses products
- Check: Product title/description quality (missing keywords)
- Check: Confidence threshold (lower if needed)
- Solution: Review low-confidence suggestions, adjust threshold

**Issue:** Gap analysis shows 100% gaps
- Check: Blueprint created correctly
- Check: Gap analysis query (verify site_id matches)
- Solution: Re-run analysis, confirm blueprint

### 6.4 Integration Checkpoints

**Integration with 01A (SAGBlueprint):**
- Confirmed analysis creates SAGBlueprint via POST /api/v1/sag/sites/{site_id}/confirm-analysis/
- Blueprint includes extracted attributes and values
- Blueprint links to analysis for audit trail
- Blueprint ready for cluster formation (01C)

**Integration with 01B (Sector Templates):**
- Attribute extraction uses sector template for validation (optional parameter)
- Alignment scores show how closely discovered attributes match template
- Low-confidence discoveries flagged if they don't align with template
- Template selection based on site category detection

**Integration with 01C (Cluster Formation):**
- Blueprint created from Case 1 analysis feeds into cluster formation
- Attributes and values used to create cluster hierarchies
- Cluster formation references blueprint_id for traceability
- Can override clusters if needed

**Integration with 01E (Content Pipeline):**
- Blueprint creation triggers content pipeline pre-planning
- Gap analysis informs content prioritization
- Hub page templates created for missing clusters
- Blog post outlines generated for content gaps

**Integration with 01G (Health Monitoring):**
- Analysis metrics stored for health dashboard
- Gap analysis metrics tracked over time
- Product attribute coverage tracked
- Auto-tagging success rate monitored

---

## 7. Related Documents

- **01A:** SAGBlueprint Definition — Output of Case 1 analysis
- **01B:** Sector Templates — Used for attribute validation
- **01C:** Cluster Formation — Consumes SAGBlueprint from Case 1
- **01D:** Case 2 Wizard — Alternative path for new sites
- **01E:** Content Pipeline — Consumes the blueprint and gap analysis
- **01G:** Health Monitoring — Tracks analysis and enrichment metrics

---

## 8. Glossary

- **SAG:** Semantic Attribute Grid — the structured product attribute framework
- **Attribute:** A dimension of product information (e.g., "Target Area," "Device Type")
- **Attribute Value:** A specific instance of an attribute (e.g., "Foot" for Target Area)
- **Cluster:** A group of related attribute values forming a content hub
- **Gap:** A missing element compared to the SAG blueprint (hub pages, term pages, blog posts, etc.)
- **Confidence Score:** The AI's confidence in a discovered attribute (0.0-1.0)
- **Dimension:** Priority level of an attribute (Primary, Secondary, Tertiary)
- **Term Landing Page:** A single page optimized for a specific attribute value
- **Hub Page:** Authority page for an entire attribute cluster
- **Auto-Tagging:** Bulk assignment of attributes to products

---

**Document Status:** Ready for Development
**Last Review:** 2026-03-23
**Next Review:** Post-Phase 2 Development