# IGNY8 Phase 1: Cluster Formation & Keyword Engine (Doc 01C)

**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** Phase 1 - Foundation & Intelligence
**Status:** Build Ready

---

## 1. Current State

### Existing Components
- **SAGBlueprint** (01A): Data model with status tracking, blueprint lifecycle management
- **SAGAttribute** & **SAGCluster** models (01A): Schema definitions for attributes and topic clusters
- **SectorAttributeTemplate** (01B): Pre-configured attribute framework with keyword templates per site_type
- **Setup Wizard** (01D): Collects sector, site_type, and populated attribute values from user
- **Blueprint Service** (01G - earlier iteration): Basic blueprint assembly, denormalization

### Current Limitations
- No automated cluster formation from attribute intersection logic
- No keyword generation from templates
- No conflict resolution for multi-cluster keyword assignments
- No cluster type classification (product, condition, feature, etc.)
- No validation of cluster viability (size, coherence, user demand)
- No hub title and supporting content plan generation

### Dependencies Ready
- ✅ Sector attribute templates loaded with keyword templates
- ✅ Setup wizard populates attributes
- ✅ Data models support cluster and keyword storage
- ✅ Blueprint lifecycle framework exists

---

## 2. What to Build

### 2.1 Cluster Formation AI Function
**File:** `sag/ai_functions/cluster_formation.py`
**Register Key:** `'form_clusters'`
**Triggering Context:** After user populates attributes in setup wizard; before keyword assignment

#### Input Contract
```python
{
    "populated_attributes": [
        {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
        {"name": "Pet Type", "values": ["Dogs", "Cats"]},
        {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "sector_name": str
    },
    "constraints": {
        "max_clusters": 50,  # hard cap per sector
        "min_keywords_per_cluster": 5,
        "max_keywords_per_cluster": 20,
        "optimal_keywords_per_cluster": (7, 15)  # inclusive range; a bare 7-15 would evaluate to -8
    }
}
```

#### Output Contract
```python
{
    "clusters": [
        {
            "id": "cluster_001",
            "title": "Dog Arthritis Relief Solutions",
            "type": "product_category",  # or condition_problem, feature, brand, informational, comparison
            "dimensions": {
                "primary": ["Pet Type: Dogs", "Health Condition: Arthritis"],
                "secondary": ["Target Audience: Pet Owners"]
            },
            "intersection_depth": 3,  # count of dimensional intersections
            "viability_score": 0.92,  # 0-1 based on coherence + demand assessment
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [
                "Senior Dog Arthritis: Causes & Prevention",
                "Dog Arthritis Medications: Complete Guide",
                "Physical Therapy Exercises for Dogs with Arthritis",
                "Diet Changes to Support Joint Health",
                "When to See a Vet About Dog Joint Pain"
            ],
            "keywords": [],  # populated in keyword generation phase
            "dimension_count": 3,
            "validation": {
                "is_real_topical_ecosystem": True,
                "has_search_demand": True,
                "can_support_content_plan": True,
                "sufficient_differentiation": True
            }
        },
        # ... more clusters
    ],
    "summary": {
        "total_clusters_formed": 12,
        "type_distribution": {
            "product_category": 6,
            "condition_problem": 4,
            "feature": 1,
            "brand": 0,
            "informational": 1,
            "comparison": 0
        },
        "avg_intersection_depth": 2.3,
        "clusters_below_viability_threshold": 0
    }
}
```

#### Algorithm (Pseudocode)

```
FUNCTION form_clusters(populated_attributes, sector_context, constraints):

    # STEP 1: Generate all 2-value intersections
    # (one value from each of two different attributes)
    all_intersections = []
    for each attribute_pair in COMBINATIONS(populated_attributes, size=2):
        for value1 in attribute_pair[0].values:
            for value2 in attribute_pair[1].values:
                intersection = {
                    "dimensions": [value1, value2],
                    "attribute_names": [attribute_pair[0].name, attribute_pair[1].name]
                }
                all_intersections.append(intersection)

    # Also generate 3-value intersections for strong coherence
    for each attribute_triplet in COMBINATIONS(populated_attributes, size=3):
        for value1 in attribute_triplet[0].values:
            for value2 in attribute_triplet[1].values:
                for value3 in attribute_triplet[2].values:
                    intersection = {
                        "dimensions": [value1, value2, value3],
                        "attribute_names": [attribute_triplet[0].name,
                                            attribute_triplet[1].name,
                                            attribute_triplet[2].name]
                    }
                    all_intersections.append(intersection)

    # STEP 2: AI evaluates each intersection
    valid_clusters = []
    for intersection in all_intersections:
        # AI_EVALUATE_INTERSECTION answers:
        #   - Is this a real topical ecosystem?
        #   - Would users search for this combination?
        #   - Can we build a hub + 3-10 supporting articles?
        #   - Is there sufficient differentiation from other clusters?
        #   - Does the combination make semantic sense?
        evaluation = AI_EVALUATE_INTERSECTION(intersection, sector_context)

        if evaluation.is_valid:
            # STEP 3: Classify cluster type as one of:
            #   product_category, condition_problem, feature, brand,
            #   informational, comparison
            cluster_type = AI_CLASSIFY_TYPE(intersection)

            # STEP 4: Generate hub title + supporting content plan
            hub_title = AI_GENERATE_HUB_TITLE(intersection, sector_context)
            supporting_titles = AI_GENERATE_SUPPORTING_TITLES(
                hub_title,
                intersection,
                count=5 to 8
            )

            # Create cluster object
            cluster = {
                "dimensions": intersection.dimensions,
                "intersection_depth": len(intersection.dimensions),
                "type": cluster_type,
                "viability_score": evaluation.confidence_score,
                "hub_title": hub_title,
                "supporting_content_plan": supporting_titles,
                "validation": evaluation
            }
            valid_clusters.append(cluster)

    # STEP 5: Apply constraints & filtering
    sorted_clusters = SORT_BY_VIABILITY_SCORE(valid_clusters)  # highest score first
    final_clusters = sorted_clusters[0:constraints.max_clusters]

    # STEP 6: Validate distribution & completeness
    distribution = CALCULATE_TYPE_DISTRIBUTION(final_clusters)

    # Flag if any type is severely under-represented
    if distribution.imbalance > THRESHOLD:
        LOG_WARNING("Type distribution may be suboptimal")

    # STEP 7: Return with summary (keys match the Output Contract)
    return {
        "clusters": final_clusters,
        "summary": {
            "total_clusters_formed": len(final_clusters),
            "type_distribution": distribution,
            "avg_intersection_depth": MEAN(intersection_depth of final_clusters),
            "clusters_below_viability_threshold": COUNT(final_clusters with viability_score < 0.70)
        }
    }

END FUNCTION
```
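
Step 1 of the pseudocode can be sketched in plain Python. This is a minimal illustration, not the production function: the name `generate_intersections` and the `sizes` parameter are assumptions, and the `"Attribute: Value"` dimension strings follow the format shown in the Output Contract.

```python
from itertools import combinations, product


def generate_intersections(populated_attributes, sizes=(2, 3)):
    """Enumerate every cross-attribute value combination of the given sizes.

    Hypothetical helper: each candidate takes exactly one value from each
    attribute in the group, so two values of the same attribute never meet.
    """
    intersections = []
    for size in sizes:
        # Pick `size` distinct attributes, then one value from each of them.
        for attr_group in combinations(populated_attributes, size):
            names = [attr["name"] for attr in attr_group]
            for values in product(*(attr["values"] for attr in attr_group)):
                intersections.append({
                    "dimensions": [f"{n}: {v}" for n, v in zip(names, values)],
                    "attribute_names": names,
                })
    return intersections


attributes = [
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]},
]
candidates = generate_intersections(attributes)
print(len(candidates))  # 16 two-value + 12 three-value candidates
```

The candidate count grows multiplicatively with attribute and value counts, and every candidate costs one AI evaluation in Step 2, so it may be worth batching or pre-filtering candidates before evaluation.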

#### AI Evaluation Criteria
For each intersection, the AI must answer:

1. **Real Topical Ecosystem?**
   - Do the dimensions naturally connect in user intent?
   - Is there an existing product/service/solution category?
   - Example: YES - "Dog Arthritis Relief" (real problem + real solutions)
   - Example: NO - "Vegetarian Chainsaw" (nonsensical combination)

2. **User Search Demand?**
   - Would users actively search for this combination?
   - Check: keyword templates, search volume patterns, user forums
   - Target: ≥500 monthly searches for hub keyword

3. **Content Support?**
   - Can we create 1 hub + 3-10 supporting articles?
   - Is there enough subtopic depth?
   - Example: YES - "Dog Arthritis" can have medication, exercise, diet, vet visits
   - Example: NO - "Red Dog Collar" (too niche, limited subtopics)

4. **Sufficient Differentiation?**
   - Does this cluster stand apart from others?
   - Avoid near-duplicate clusters (e.g., "Dog Joint Health" vs "Dog Arthritis")
   - Decision: merge or reject the weaker one

5. **Dimensional Clarity**
   - Do all dimensions contribute meaningfully?
   - Remove secondary dimensions that don't add coherence

#### Hard Constraints
- **Maximum Clusters:** 50 per sector (enforce in sorting/filtering)
- **Minimum Keywords per Cluster:** 5 (checked in keyword generation)
- **Maximum Keywords per Cluster:** 20 (checked in keyword generation)
- **Optimal Range:** 7-15 keywords per cluster
- **No Keyword Duplication:** Each keyword in exactly one cluster (enforced in conflict resolution)
- **Type Distribution Target:**
  - Product/Service Type: 40-50%
  - Condition/Problem: 20-30%
  - Feature: 10-15%
  - Brand: 5-10%
  - Life Stage/Audience: 5-10%
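
The type distribution target can be checked mechanically once clusters are formed. The sketch below is one possible shape for that check; `check_type_distribution` is a hypothetical name, and the mapping from the labels above to type keys such as `product_category` and `life_stage` is an assumption based on the cluster type values used elsewhere in this document.

```python
from collections import Counter

# Target share of all clusters per type, copied from the list above.
# Assumption: "Product/Service Type" maps to the `product_category` key and
# "Life Stage/Audience" to `life_stage`.
TYPE_TARGETS = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}


def check_type_distribution(clusters, targets=TYPE_TARGETS):
    """Report whether each cluster type is under, over, or within its target share."""
    counts = Counter(cluster["type"] for cluster in clusters)
    total = len(clusters)
    report = {}
    for cluster_type, (low, high) in targets.items():
        share = counts.get(cluster_type, 0) / total if total else 0.0
        if share < low:
            status = "under"
        elif share > high:
            status = "over"
        else:
            status = "ok"
        report[cluster_type] = {"share": round(share, 2), "status": status}
    return report


# The 12-cluster distribution from the Output Contract example.
example_counts = {"product_category": 6, "condition_problem": 4,
                  "feature": 1, "informational": 1}
example_clusters = [{"type": t} for t, n in example_counts.items() for _ in range(n)]
report = check_type_distribution(example_clusters)
print(report["condition_problem"])
```

A report like this fits the "log a warning" behaviour in the algorithm: it flags imbalance without rejecting clusters.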

---

### 2.2 Keyword Auto-Generation AI Function
**File:** `sag/ai_functions/keyword_generation.py`
**Register Key:** `'generate_keywords'`
**Triggering Context:** After cluster formation; before blueprint assembly

#### Input Contract
```python
{
    "clusters": [  # output from cluster_formation
        {
            "id": "cluster_001",
            "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"],
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [...]
        }
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "site_intent": "sell|inform|book|download"
    },
    "keyword_templates": {  # loaded from SectorAttributeTemplate
        "template_001": "best {health_condition} treatment for {pet_type}",
        "template_002": "{pet_type} {health_condition} treatment",
        # ... more templates
    },
    "constraints": {
        "min_keywords_per_cluster": 10,
        "max_keywords_per_cluster": 25,
        "total_target": "300-500"
    }
}
```

#### Output Contract
```python
{
    "keywords_per_cluster": {
        "cluster_001": {
            "keywords": [
                {
                    "keyword": "best arthritis treatment for dogs",
                    "search_volume": 1200,
                    "difficulty": "medium",
                    "intent": "informational",
                    "generated_from": "template_001",
                    "variant_type": "long_tail"
                },
                {
                    "keyword": "dog arthritis remedies",
                    "search_volume": 800,
                    "difficulty": "easy",
                    "intent": "informational",
                    "generated_from": "template_002",
                    "variant_type": "base"
                },
                # ... 13-23 more keywords
            ],
            "keyword_count": 15,
            "primary_intent": "informational",
            "search_volume_total": 12500
        }
    },
    "deduplication": {
        "duplicates_removed": 8,
        "flagged_conflicts": 3  # keywords fitting multiple clusters
    },
    "summary": {
        "total_unique_keywords": 342,
        "per_cluster_avg": 14.25,
        "total_search_volume": 892000,
        "within_constraints": True
    }
}
```

#### Algorithm (Pseudocode)

```
FUNCTION generate_keywords(clusters, sector_context, keyword_templates):

    all_keywords = {}

    FOR EACH cluster IN clusters:

        # STEP 1: Extract attribute values from cluster dimensions
        attribute_values = EXTRACT_ATTRIBUTE_VALUES(cluster.dimensions)
        # Output: {"Pet Type": "Dogs", "Health Condition": "Arthritis", ...}

        base_keywords = []

        # STEP 2: Substitute values into templates
        FOR EACH template IN keyword_templates:

            # Use a template only if every variable it needs is available
            required_attrs = PARSE_TEMPLATE_VARIABLES(template)
            if ALL_ATTRS_AVAILABLE(required_attrs, attribute_values):

                # Substitute values
                base_keyword = SUBSTITUTE_VALUES(template, attribute_values)
                base_keywords.append({
                    "keyword": base_keyword,
                    "generated_from": template.id,
                    "variant_type": "base"
                })

        # STEP 3: Generate long-tail variants
        # Iterate over base_keywords only, so variants never spawn further variants
        cluster_keywords = COPY(base_keywords)

        FOR EACH base IN base_keywords:

            # e.g. base.keyword = "arthritis treatment for dogs"
            variants = []

            # Variant: Add "best" (skip if the keyword already starts with it)
            if NOT STARTS_WITH(base.keyword, "best "):
                variants.append("best " + base.keyword)

            # Variant: Add "review"
            variants.append(base.keyword + " review")

            # Variant: Add "vs" (comparison)
            if cluster.type in [product_category, comparison]:
                variants.append(base.keyword + " vs alternatives")

            # Variant: Add "for" (audience)
            variants.append(base.keyword + " for seniors")

            # Variant: Add "how to"
            variants.append("how to " + base.keyword)

            # Variant: Add "cost" (ecommerce intent)
            if sector_context.site_intent == "sell":
                variants.append(base.keyword + " cost")

            FOR EACH variant IN variants:
                if NOT_DUPLICATE(variant, cluster_keywords):
                    cluster_keywords.append({
                        "keyword": variant,
                        "generated_from": base.generated_from,
                        "variant_type": "long_tail",
                        "parent": base.keyword
                    })

        # STEP 4: Enrich keywords with metadata
        enriched_keywords = []
        FOR EACH kw IN cluster_keywords:
            enriched = {
                "keyword": kw.keyword,
                "search_volume": ESTIMATE_SEARCH_VOLUME(kw.keyword, sector_context),
                "difficulty": ESTIMATE_DIFFICULTY(kw.keyword, sector_context),
                "intent": CLASSIFY_INTENT(kw.keyword),  # informational, transactional, navigational
                "generated_from": kw.generated_from,
                "variant_type": kw.variant_type
            }
            enriched_keywords.append(enriched)

        # STEP 5: Filter & sort
        filtered_keywords = SORT_BY_SEARCH_VOLUME(enriched_keywords)  # highest volume first

        # Keep top 10-25 per cluster
        cluster_keywords_final = filtered_keywords[0:25]

        # Validate minimum: top up to 10 if the cluster came out short
        if LEN(cluster_keywords_final) < 10:
            ADD_SUPPLEMENTARY_KEYWORDS(cluster_keywords_final, 10 - LEN(cluster_keywords_final))

        all_keywords[cluster.id] = {
            "keywords": cluster_keywords_final,
            "keyword_count": len(cluster_keywords_final),
            "primary_intent": MODE(intent from all keywords),
            "search_volume_total": SUM(all search volumes)
        }

    # STEP 6: Global deduplication
    all_keywords_flat = FLATTEN(all_keywords)
    duplicates = FIND_DUPLICATES(all_keywords_flat)

    FOR EACH duplicate_set IN duplicates:
        primary_cluster = PRIMARY_CLUSTER(duplicate_set)  # best fit by dimensions
        REASSIGN_DUPLICATES_TO_PRIMARY(duplicate_set, primary_cluster)

    # Counts and totals change after reassignment
    RECOMPUTE keyword_count and search_volume_total for each cluster

    # STEP 7: Validate constraints
    total_keywords = SUM(keyword_count for each cluster)

    validation = {
        "within_min_per_cluster": all clusters >= 10,
        "within_max_per_cluster": all clusters <= 25,
        "total_within_target": total_keywords between 300-500,
        # Re-check after reassignment; `duplicates` itself still lists what was found
        "no_duplicates": len(FIND_DUPLICATES(FLATTEN(all_keywords))) == 0
    }

    if NOT validation.all_true:
        LOG_WARNING("Keyword generation constraints not fully met")

    # STEP 8: Return results
    return {
        "keywords_per_cluster": all_keywords,
        "deduplication": {
            "duplicates_removed": len(duplicates),
            "flagged_conflicts": identify_multi_cluster_fits()
        },
        "summary": {
            "total_unique_keywords": total_keywords,
            "per_cluster_avg": total_keywords / len(clusters),
            "total_search_volume": sum of all volumes,
            "within_constraints": validation.all_true
        }
    }

END FUNCTION
```
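
The global deduplication in Step 6 could look roughly like the sketch below. It is an illustration under stated assumptions: `deduplicate_across_clusters`, `normalize`, and the `fit_score` callable are hypothetical stand-ins for `FIND_DUPLICATES`, `PRIMARY_CLUSTER`, and `REASSIGN_DUPLICATES_TO_PRIMARY`, and the toy scoring function in the example only counts occurrences of the cluster ID in the keyword.

```python
from collections import defaultdict


def normalize(keyword):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(keyword.lower().split())


def deduplicate_across_clusters(keywords_by_cluster, fit_score):
    """Keep each keyword in exactly one cluster.

    `fit_score(keyword, cluster_id)` is a caller-supplied (hypothetical) measure
    of how well a keyword fits a cluster; the highest score wins, and ties go to
    the cluster that appears first in `keywords_by_cluster`.
    """
    claimed_by = defaultdict(list)
    for cluster_id, keywords in keywords_by_cluster.items():
        for kw in keywords:
            key = normalize(kw["keyword"])
            if cluster_id not in claimed_by[key]:
                claimed_by[key].append(cluster_id)

    deduped = {cluster_id: [] for cluster_id in keywords_by_cluster}
    removed = 0
    for cluster_id, keywords in keywords_by_cluster.items():
        kept = set()
        for kw in keywords:
            key = normalize(kw["keyword"])
            winner = max(claimed_by[key], key=lambda cid: fit_score(key, cid))
            if winner == cluster_id and key not in kept:
                kept.add(key)
                deduped[cluster_id].append(kw)
            else:
                removed += 1
    return deduped, removed


def count_cluster_name(keyword, cluster_id):
    # Toy scoring for the example only: how often the cluster ID appears.
    return keyword.count(cluster_id)


raw = {
    "dog": [{"keyword": "Arthritis relief for dogs"}, {"keyword": "dog joint pain"}],
    "cat": [{"keyword": "arthritis relief  for dogs"}, {"keyword": "cat joint pain"}],
}
deduped, removed = deduplicate_across_clusters(raw, count_cluster_name)
print(removed, [kw["keyword"] for kw in deduped["cat"]])
```

Because removal changes per-cluster counts, `keyword_count` and the per-cluster totals need to be recomputed from the deduplicated lists before constraint validation.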

#### Keyword Template Structure (from SectorAttributeTemplate, 01B)
```python
# Example for Pet Health ecommerce site
keyword_templates = {
    "site_type": "ecommerce",
    "templates": [
        {
            "id": "template_001",
            "pattern": "best {health_condition} treatment for {pet_type}",
            "weight": 5,  # prioritize this template
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        {
            "id": "template_002",
            "pattern": "{pet_type} {health_condition} medication",
            "weight": 4,
            "min_required_attrs": ["pet_type", "health_condition"]
        },
        {
            "id": "template_003",
            "pattern": "affordable {health_condition} relief for {pet_type}",
            "weight": 3,
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        # ... more templates
    ]
}
```
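
To make the template mechanics concrete, here is a small sketch of Steps 1-2 of the keyword algorithm applied to this structure. The function names (`template_variables`, `to_variable`, `expand_templates`) are hypothetical, and the rule that an attribute name like "Health Condition" maps to the `{health_condition}` variable is an assumption inferred from the examples.

```python
from string import Formatter


def template_variables(pattern):
    """Return the set of {placeholder} names used in a template pattern."""
    return {field for _, field, _, _ in Formatter().parse(pattern) if field}


def to_variable(attribute_name):
    """Assumed naming rule: 'Health Condition' -> 'health_condition'."""
    return attribute_name.strip().lower().replace(" ", "_")


def expand_templates(templates, dimensions):
    """Fill every template whose variables are all covered by the cluster dimensions."""
    values = {}
    for dimension in dimensions:  # e.g. "Pet Type: Dogs"
        name, _, value = dimension.partition(": ")
        values[to_variable(name)] = value.lower()

    keywords = []
    # Higher-weight templates first, so they survive any later truncation.
    for template in sorted(templates, key=lambda t: t["weight"], reverse=True):
        if template_variables(template["pattern"]).issubset(values):
            keywords.append({
                "keyword": template["pattern"].format(**values),
                "generated_from": template["id"],
                "variant_type": "base",
            })
    return keywords


templates = [
    {"id": "template_002", "pattern": "{pet_type} {health_condition} medication", "weight": 4},
    {"id": "template_001", "pattern": "best {health_condition} treatment for {pet_type}", "weight": 5},
    {"id": "template_900", "pattern": "{brand} {pet_type} supplements", "weight": 1},  # needs a brand
]
base_keywords = expand_templates(templates, ["Pet Type: Dogs", "Health Condition: Arthritis"])
for kw in base_keywords:
    print(kw["generated_from"], "->", kw["keyword"])
```

Naive substitution keeps the attribute value as entered ("dogs arthritis medication"), so a real implementation would likely need singular/plural handling to produce phrases such as "dog arthritis medication".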

#### Long-tail Variant Rules

| Variant Type | Pattern | Use Case | Example |
|---|---|---|---|
| Base | {keyword} | All clusters | "dog arthritis relief" |
| Best/Top | best {keyword} | All clusters | "best dog arthritis relief" |
| Review | {keyword} review | Product clusters | "arthritis supplement for dogs review" |
| Comparison | {keyword} vs {alternative} | Comparison intent | "arthritis medication vs supplement for dogs" |
| Audience | {keyword} for {audience} | Audience-specific | "dog arthritis relief for senior dogs" |
| How-to | how to {verb} {keyword} | Problem-solution | "how to manage dog arthritis" |
| Cost/Price | {keyword} cost | Ecommerce intent | "arthritis treatment for dogs cost" |
| Quick | fast {keyword} | Urgency-driven | "fast arthritis relief for dogs" |
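
A few of the rules in this table translate directly into string operations. The sketch below covers only the purely mechanical patterns; `long_tail_variants` and its parameters are hypothetical, and the How-to and Comparison rows are left out because they need a verb or an alternative that the table does not say how to obtain.

```python
def long_tail_variants(base_keyword, cluster_type, site_intent, audience=None):
    """Apply the mechanical variant patterns to one base keyword.

    Hypothetical helper: rule applicability (e.g. reviews only for product
    clusters) is read off the "Use Case" column and is an interpretation.
    """
    variants = []
    if not base_keyword.startswith("best "):
        variants.append(f"best {base_keyword}")             # Best/Top
    if cluster_type == "product_category":
        variants.append(f"{base_keyword} review")            # Review
    if site_intent == "sell":
        variants.append(f"{base_keyword} cost")              # Cost/Price
    if audience:
        variants.append(f"{base_keyword} for {audience}")    # Audience
    variants.append(f"fast {base_keyword}")                  # Quick
    # Drop repeats while preserving order.
    return list(dict.fromkeys(variants))


variants = long_tail_variants(
    "dog arthritis relief",
    cluster_type="product_category",
    site_intent="sell",
    audience="senior dogs",
)
print(variants)
```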

---

### 2.3 Blueprint Assembly Service
**File:** `sag/services/blueprint_service.py`
**Primary Function:** `assemble_blueprint(site, attributes, clusters, keywords)`
**Triggering Context:** After keyword generation; creates SAGBlueprint (status=draft)

#### Input Contract
```python
assemble_blueprint(
    site: Website,                             # from 01A
    attributes: List[Tuple[str, List[str]]],   # user-populated (name, values) pairs
    clusters: List[Dict],                      # from cluster_formation()
    keywords: Dict[str, List[Dict]]            # cluster_id -> keyword dicts, from generate_keywords()
)
```

#### Execution Steps

1. **Create SAGBlueprint Record**
   ```python
   from django.utils import timezone

   blueprint = SAGBlueprint.objects.create(
       site=site,
       status='draft',
       phase='phase_1_foundation',
       sector_id=site.sector_id,
       created_by=current_user,  # the requesting user, supplied by the caller
       metadata={
           'version': '1.0',
           # ISO strings: JSONField cannot serialize datetime objects by default
           'created_date': timezone.now().isoformat(),
           'last_modified': timezone.now().isoformat(),
       },
   )
   ```

2. **Create SAGAttribute Records**
   ```python
   # DETERMINE_PRIMACY is a helper defined elsewhere
   for attribute_name, values in attributes:
       attribute = SAGAttribute.objects.create(
           blueprint=blueprint,
           name=attribute_name,
           values=values,  # stored as JSON array
           is_primary=DETERMINE_PRIMACY(attribute_name, site.site_type),
           source='user_input',
       )
   ```

3. **Create SAGCluster Records from Formed Clusters**
   ```python
   # GENERATE_CLUSTER_DESC is a helper defined elsewhere
   for cluster in clusters:
       db_cluster = SAGCluster.objects.create(
           blueprint=blueprint,
           cluster_key=cluster['id'],
           title=cluster['hub_title'],
           description=GENERATE_CLUSTER_DESC(cluster),
           cluster_type=cluster['type'],
           dimensions=cluster['dimensions'],  # JSON
           intersection_depth=cluster['intersection_depth'],
           viability_score=cluster['viability_score'],
           hub_title=cluster['hub_title'],
           supporting_content_plan=cluster['supporting_content_plan'],  # JSON array
           status='draft',
           keyword_count=0,  # updated in next step
       )
   ```

4. **Populate auto_generated_keywords on Each Cluster**
   ```python
   for cluster_id, keyword_list in keywords.items():
       # Scope the lookup to this blueprint: cluster_key is only unique per blueprint
       cluster = SAGCluster.objects.get(blueprint=blueprint, cluster_key=cluster_id)

       keyword_records = []
       for kw_data in keyword_list:
           keyword = SAGKeyword.objects.create(
               cluster=cluster,
               keyword_text=kw_data['keyword'],
               search_volume=kw_data['search_volume'],
               difficulty=kw_data['difficulty'],
               intent=kw_data['intent'],
               generated_from=kw_data['generated_from'],
               variant_type=kw_data['variant_type'],
               source='auto_generated',
           )
           keyword_records.append(keyword)

       cluster.auto_generated_keywords.set(keyword_records)
       cluster.keyword_count = len(keyword_records)
       cluster.save()
   ```

5. **Generate Taxonomy Plan**
   ```python
   from django.utils.text import slugify

   taxonomy_plan = {
       'wp_categories': [],
       'wp_tags': [],
       'hierarchy': {}
   }

   blueprint_attributes = list(blueprint.sagattribute_set.all())
   # Assumption: tags attach to the first primary attribute; adjust if several are primary
   primary_attr_name = next(
       (attr.name for attr in blueprint_attributes if attr.is_primary), None
   )

   for attribute in blueprint_attributes:
       if attribute.is_primary:
           category = {
               'name': attribute.name,
               'slug': slugify(attribute.name),
               'description': f"Posts about {attribute.name}"
           }
           taxonomy_plan['wp_categories'].append(category)
       else:
           # One tag per attribute value
           for v in attribute.values:
               tag = {
                   'name': v,
                   'slug': slugify(v),
                   'parent_category': primary_attr_name
               }
               taxonomy_plan['wp_tags'].append(tag)

   blueprint.taxonomy_plan = taxonomy_plan  # JSON field
   ```

6. **Generate Execution Priority (Phased Approach)**
   ```python
   execution_priority = {
       'phase': 'phase_1_hubs',
       'content_sequence': []
   }
   draft_clusters = blueprint.sagcluster_set.filter(status='draft')

   # Phase 1: Hub pages (1 per cluster)
   hub_items = []
   for cluster in draft_clusters:
       hub_items.append({
           'type': 'hub_page',
           'cluster_id': cluster.id,
           'title': cluster.hub_title,
           'priority': 1,
           'estimated_effort': 'high',
           'SEO_impact': 'critical'
       })

   execution_priority['content_sequence'].extend(hub_items)

   # Phase 2: Supporting content (5-8 articles per cluster)
   supporting_items = []
   for cluster in draft_clusters:
       for content_title in cluster.supporting_content_plan:
           supporting_items.append({
               'type': 'supporting_article',
               'cluster_id': cluster.id,
               'parent_hub': cluster.hub_title,
               'title': content_title,
               'priority': 2,
               'estimated_effort': 'medium',
               'SEO_impact': 'supporting'
           })

   execution_priority['content_sequence'].extend(supporting_items)

   # Phase 3: Term/pillar pages (keywords + long-tail)
   term_items = []
   for cluster in draft_clusters:
       for keyword in cluster.auto_generated_keywords.all():
           term_items.append({
               'type': 'term_page',
               'cluster_id': cluster.id,
               'keyword': keyword.keyword_text,
               'priority': 3,
               'estimated_effort': 'low',
               'SEO_impact': 'supportive'
           })

   execution_priority['content_sequence'].extend(term_items)

   blueprint.execution_priority = execution_priority  # JSON field
   ```

7. **Populate Denormalized JSON Fields**
   ```python
   blueprint.attributes_json = {
       'total_attributes': blueprint.sagattribute_set.count(),
       'summary': [
           {
               'name': attr.name,
               'value_count': len(attr.values),
               'values': attr.values,
               'is_primary': attr.is_primary
           }
           for attr in blueprint.sagattribute_set.all()
       ]
   }

   blueprint.clusters_json = {
       'total_clusters': blueprint.sagcluster_set.count(),
       'summary': [
           {
               'id': cluster.cluster_key,
               'title': cluster.title,
               'type': cluster.cluster_type,
               'keyword_count': cluster.keyword_count,
               'viability_score': cluster.viability_score
           }
           for cluster in blueprint.sagcluster_set.all()
       ]
   }

   blueprint.save()
   ```

8. **Return Blueprint ID & Status**
   ```python
   return {
       'blueprint_id': blueprint.id,
       'status': 'draft',
       'created_at': blueprint.created_at,
       'summary': {
           'total_attributes': blueprint.sagattribute_set.count(),
           'total_clusters': blueprint.sagcluster_set.count(),
           'total_keywords': SAGKeyword.objects.filter(cluster__blueprint=blueprint).count(),
           'next_step': 'review blueprint in 01E (Pipeline Configuration)'
       }
   }
   ```
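
For readers who want to try the taxonomy step (step 5) without a Django project, the sketch below reproduces its logic on plain dictionaries. The local `slugify` is a simplified stand-in for Django's `django.utils.text.slugify`, and attaching every tag to the first primary attribute is an assumption, since the steps above do not say how a tag's parent category is chosen.

```python
import re


def slugify(text):
    """Simplified stand-in for django.utils.text.slugify (ASCII only)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_taxonomy_plan(attributes):
    """Primary attributes become WordPress categories; other values become tags."""
    primary = [attr for attr in attributes if attr["is_primary"]]
    # Assumption: every tag hangs off the first primary attribute.
    parent = primary[0]["name"] if primary else None

    plan = {"wp_categories": [], "wp_tags": [], "hierarchy": {}}
    for attr in attributes:
        if attr["is_primary"]:
            plan["wp_categories"].append({
                "name": attr["name"],
                "slug": slugify(attr["name"]),
                "description": f"Posts about {attr['name']}",
            })
        else:
            for value in attr["values"]:
                plan["wp_tags"].append({
                    "name": value,
                    "slug": slugify(value),
                    "parent_category": parent,
                })
    return plan


plan = build_taxonomy_plan([
    {"name": "Pet Type", "values": ["Dogs", "Cats"], "is_primary": True},
    {"name": "Health Condition", "values": ["Allergies", "Joint Pain"], "is_primary": False},
])
print(plan["wp_categories"][0]["slug"], [tag["slug"] for tag in plan["wp_tags"]])
```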

---

### 2.4 Manual Keyword Supplementation (User Interface)

#### Feature: Add Keywords from Multiple Sources

1. **IGNY8 Library Integration**
   - Users browse pre-curated keyword library per site_type
   - Select keywords → auto-map to clusters by attribute match
   - Unmatched keywords → flagged for review

2. **Manual Entry**
   - Form field: paste or type keywords (comma-separated)
   - System deduplicates against existing keywords
   - Prompts user to assign to cluster(s)

3. **CSV Import**
   - Upload CSV with columns: keyword, search_volume (optional), difficulty (optional)
   - Preview & validate before import
   - Bulk assign to clusters or mark for review

4. **Keyword API Integration** (optional in Phase 1)
   - Connect to SEMrush, Ahrefs, or similar
   - Fetch keyword suggestions for cluster dimensions
   - User approves additions
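
The "preview & validate" step of the CSV import could be built on Python's standard `csv` module, as in the sketch below. The function name `parse_keyword_csv`, the error messages, and the choice to reject rather than repair bad rows are all assumptions; only the three column names come from the list above.

```python
import csv
import io


def parse_keyword_csv(text):
    """Split uploaded CSV text into importable rows and per-line errors.

    Hypothetical validator: expects a header row with `keyword` and optional
    `search_volume` / `difficulty` columns. Line numbers assume one record per
    physical line (no multi-line quoted fields).
    """
    valid, errors, seen = [], [], set()
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        keyword = " ".join((row.get("keyword") or "").lower().split())
        if not keyword:
            errors.append((line_no, "missing keyword"))
            continue
        if keyword in seen:
            errors.append((line_no, "duplicate keyword"))
            continue
        volume_raw = (row.get("search_volume") or "").strip()
        if volume_raw and not volume_raw.isdigit():
            errors.append((line_no, "search_volume must be a whole number"))
            continue
        seen.add(keyword)
        valid.append({
            "keyword": keyword,
            "search_volume": int(volume_raw) if volume_raw else None,
            "difficulty": (row.get("difficulty") or "").strip().lower(),
            "source": "csv_import",
        })
    return valid, errors


sample = (
    "keyword,search_volume,difficulty\n"
    "Dog Arthritis Relief,1200,medium\n"
    "dog arthritis relief,900,easy\n"
    "cat allergy food,,\n"
    ",50,hard\n"
    "joint supplement,lots,\n"
)
valid, errors = parse_keyword_csv(sample)
print(len(valid), errors)
```

Returning errors alongside valid rows lets the UI show a preview in which the user can fix or discard rejected lines before confirming the import.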

#### Keyword Mapping Logic
```python
def map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):
    """Return every cluster whose dimensions are similar enough to the keyword.

    EXTRACT_ATTRIBUTES and CALCULATE_SIMILARITY are helpers defined elsewhere.
    """
    matches = []

    for cluster in clusters:
        # Extract all attribute values from cluster dimensions
        cluster_attrs = EXTRACT_ATTRIBUTES(cluster.dimensions)

        # Calculate semantic similarity
        similarity = CALCULATE_SIMILARITY(new_keyword, cluster_attrs)

        if similarity > threshold:
            matches.append({
                'cluster_id': cluster.id,
                'cluster_title': cluster.title,
                'similarity_score': similarity
            })

    return matches  # May be 0, 1, or multiple matches
```
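
The mapping logic leaves the similarity measure open. As a rough illustration only, the sketch below swaps in a simple token-overlap score in place of true semantic similarity; `overlap_similarity`, `tokens`, and the crude trailing-"s" stripping are assumptions made for the example, and a production version would more plausibly use embeddings or a stemmer.

```python
import re


def tokens(text):
    """Lowercased word set with a crude plural fold (strips trailing 's')."""
    return {word.rstrip("s") for word in re.findall(r"[a-z0-9]+", text.lower())}


def overlap_similarity(keyword, dimensions):
    """Fraction of cluster dimension values whose words all appear in the keyword."""
    values = [dim.partition(": ")[2] or dim for dim in dimensions]
    if not values:
        return 0.0
    keyword_tokens = tokens(keyword)
    hits = sum(1 for value in values if tokens(value) <= keyword_tokens)
    return hits / len(values)


def map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):
    """Plain-dict version of the mapping logic, using the overlap score above."""
    matches = []
    for cluster in clusters:
        similarity = overlap_similarity(new_keyword, cluster["dimensions"])
        if similarity > threshold:
            matches.append({
                "cluster_id": cluster["id"],
                "cluster_title": cluster["title"],
                "similarity_score": similarity,
            })
    return sorted(matches, key=lambda m: m["similarity_score"], reverse=True)


clusters = [
    {"id": "cluster_001", "title": "Dog Arthritis Relief Solutions",
     "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"]},
    {"id": "cluster_002", "title": "Cat Arthritis Relief Solutions",
     "dimensions": ["Pet Type: Cats", "Health Condition: Arthritis"]},
]
print(map_keyword_to_clusters("dog arthritis relief", clusters))
print(map_keyword_to_clusters("arthritis relief for pets", clusters))
```

With this score, "arthritis relief for pets" matches only half of each cluster's dimensions and falls below the 0.70 threshold for both, so it would surface as an unmatched keyword for review.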

#### Conflict Resolution: Multi-Cluster Keyword Assignment

**Problem:** A keyword fits multiple clusters (e.g., "arthritis relief for pets" fits both Dog Cluster and Cat Cluster)

**Resolution Algorithm:**

1. **Identify Multi-Fit Keywords**
   ```python
   potential_conflicts = []
   for new_keyword in keywords_to_add:
       matching_clusters = map_keyword_to_clusters(new_keyword, all_clusters)
       if len(matching_clusters) > 1:
           potential_conflicts.append({
               'keyword': new_keyword,
               'matching_clusters': matching_clusters
           })
   ```

2. **Apply Decision Criteria (in order)**
   - **Criterion 1: Dimensional Intersection Count**
     - Assign to cluster with MOST dimensional intersections
     - Example: "dog arthritis relief" → Dog cluster has 3 dimensions (pet type, condition, audience); Cat cluster has 2 → assign to Dog cluster

   - **Criterion 2: Specificity**
     - If tied on intersection count, assign to MORE SPECIFIC cluster
     - Example: "arthritis relief" (general) vs "dog arthritis relief" (specific) → assign to Dog cluster

   - **Criterion 3: Primary User Intent Match**
     - If still tied, assign to cluster whose hub_title best matches user intent
     - Example: Both Dog & Cat clusters have "arthritis relief" hub; Dog hub is "Best Arthritis Treatments for Dogs" → assign to Dog

   - **Criterion 4: Last Resort - Create New Cluster**
     - If keyword doesn't fit any cluster well, flag as "potential_new_cluster"
     - User reviews and decides: split existing cluster, merge, or create new

3. **Implementation**
   ```python
   def resolve_keyword_conflict(keyword, matching_clusters):
       # CALC_SPECIFICITY and CALC_INTENT_MATCH are scoring helpers defined elsewhere.
       criteria = [
           lambda c: c.intersection_depth,                     # Step 1: intersection depth
           lambda c: CALC_SPECIFICITY(c, keyword),             # Step 2: specificity
           lambda c: CALC_INTENT_MATCH(c.hub_title, keyword),  # Step 3: intent match
       ]

       # Each criterion only breaks ties left over by the previous one
       candidates = list(matching_clusters)
       for score in criteria:
           scores = [score(c) for c in candidates]
           best = max(scores)
           candidates = [c for c, s in zip(candidates, scores) if s == best]
           if len(candidates) == 1:
               return candidates[0]

       # Step 4: Still tied, flag for user review
       return {
           'status': 'flagged_for_review',
           'keyword': keyword,
           'candidates': matching_clusters,
           'reason': 'ambiguous_assignment'
       }
   ```
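
The tie-break cascade can be exercised end to end with plain dictionaries. In the sketch below the criteria are passed in as an ordered list of callables; `by_depth` and `by_specificity` are toy stand-ins for the scoring helpers named in this section, and the `terms` field on each cluster is a hypothetical convenience for the example.

```python
def resolve_keyword_conflict(keyword, matching_clusters, criteria):
    """Apply scoring criteria in order; each one only breaks remaining ties."""
    candidates = list(matching_clusters)
    if len(candidates) == 1:
        return {"status": "assigned", "cluster": candidates[0]}
    for score in criteria:
        if not candidates:
            break
        scores = [score(cluster, keyword) for cluster in candidates]
        best = max(scores)
        candidates = [c for c, s in zip(candidates, scores) if s == best]
        if len(candidates) == 1:
            return {"status": "assigned", "cluster": candidates[0]}
    return {
        "status": "flagged_for_review",
        "keyword": keyword,
        "candidates": candidates,  # the clusters still tied after every criterion
        "reason": "ambiguous_assignment",
    }


def by_depth(cluster, keyword):
    return cluster["intersection_depth"]


def by_specificity(cluster, keyword):
    # Toy specificity: how many of the cluster's terms occur in the keyword.
    return sum(term in keyword for term in cluster["terms"])


dog = {"id": "dog", "intersection_depth": 2, "terms": ["dog", "arthritis"]}
cat = {"id": "cat", "intersection_depth": 2, "terms": ["cat", "arthritis"]}

specific = resolve_keyword_conflict("dog arthritis relief", [dog, cat], [by_depth, by_specificity])
generic = resolve_keyword_conflict("arthritis relief", [dog, cat], [by_depth, by_specificity])
print(specific["status"], generic["status"])
```

Passing the criteria as a list keeps the ordering explicit and makes it easy to add or reorder tie-breakers without touching the control flow.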

---

## 3. Data Models / APIs

### 3.1 Database Models (Django ORM)

#### SAGBlueprint (existing from 01A, extended)
```python
from django.db import models


class SAGBlueprint(models.Model):
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('cluster_formation_complete', 'Cluster Formation Complete'),
        ('keyword_generation_complete', 'Keyword Generation Complete'),
        ('keyword_supplemented', 'Keywords Supplemented'),
        ('ready_for_pipeline', 'Ready for Pipeline'),
        ('published', 'Published'),
    )

    site = models.ForeignKey(Website, on_delete=models.CASCADE)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    phase = models.CharField(max_length=50, default='phase_1_foundation')
    sector_id = models.CharField(max_length=100)

    # Denormalized JSON for fast access
    attributes_json = models.JSONField(default=dict, blank=True)
    clusters_json = models.JSONField(default=dict, blank=True)
    taxonomy_plan = models.JSONField(default=dict, blank=True)
    execution_priority = models.JSONField(default=dict, blank=True)
    # Set by assemble_blueprint() in 2.3 (version and timestamps)
    metadata = models.JSONField(default=dict, blank=True)

    created_by = models.ForeignKey(User, on_delete=models.SET_NULL, null=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_blueprint'
        ordering = ['-created_at']
```

#### SAGAttribute (existing from 01A, no changes required)
```python
class SAGAttribute(models.Model):
    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    values = models.JSONField()  # array of strings
    is_primary = models.BooleanField(default=False)
    source = models.CharField(max_length=50)  # 'user_input', 'template', 'api'
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = 'sag_attribute'
        unique_together = ('blueprint', 'name')
```

#### SAGCluster (existing from 01A, extended)
```python
class SAGCluster(models.Model):
    TYPE_CHOICES = (
        ('product_category', 'Product/Service Category'),
        ('condition_problem', 'Condition/Problem'),
        ('feature', 'Feature'),
        ('brand', 'Brand'),
        ('informational', 'Informational'),
        ('comparison', 'Comparison'),
        ('life_stage', 'Life Stage/Audience'),
    )

    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('validated', 'Validated'),
        ('keyword_assigned', 'Keywords Assigned'),
        ('content_created', 'Content Created'),
    )

    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    cluster_key = models.CharField(max_length=100)  # unique ID from cluster formation
    title = models.CharField(max_length=255)
    description = models.TextField(blank=True)

    cluster_type = models.CharField(max_length=50, choices=TYPE_CHOICES)
    dimensions = models.JSONField()  # ["dimension1", "dimension2", ...]
    intersection_depth = models.IntegerField()  # count of intersecting dimensions
    viability_score = models.FloatField()  # 0-1

    hub_title = models.CharField(max_length=255)
    supporting_content_plan = models.JSONField()  # array of content titles

    auto_generated_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_auto',
        blank=True
    )
    supplemented_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_supplemented',
        blank=True
    )

    keyword_count = models.IntegerField(default=0)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_cluster'
        unique_together = ('blueprint', 'cluster_key')
        ordering = ['-viability_score']
```

#### SAGKeyword (new)
```python
class SAGKeyword(models.Model):
    INTENT_CHOICES = (
        ('informational', 'Informational'),
        ('transactional', 'Transactional'),
        ('navigational', 'Navigational'),
        ('commercial', 'Commercial Intent'),
    )

    VARIANT_TYPES = (
        ('base', 'Base Keyword'),
        ('long_tail', 'Long-tail Variant'),
        ('brand', 'Brand Variant'),
        ('comparison', 'Comparison'),
        ('review', 'Review'),
        ('how_to', 'How-to'),
    )

    SOURCE_CHOICES = (
        ('auto_generated', 'Auto-Generated'),
        ('manual_entry', 'Manual Entry'),
        ('csv_import', 'CSV Import'),
        ('api_fetch', 'API Fetch'),
        ('library', 'IGNY8 Library'),
    )

    cluster = models.ForeignKey(
        SAGCluster,
        on_delete=models.CASCADE,
        related_name='all_keywords'
    )
    keyword_text = models.CharField(max_length=255)
    search_volume = models.IntegerField(null=True, blank=True)
    difficulty = models.CharField(max_length=50, blank=True)  # 'easy', 'medium', 'hard'
    intent = models.CharField(max_length=50, choices=INTENT_CHOICES)

    generated_from = models.CharField(max_length=100, blank=True)  # template ID or source
    variant_type = models.CharField(max_length=50, choices=VARIANT_TYPES)
    source = models.CharField(max_length=50, choices=SOURCE_CHOICES)

    cpc = models.FloatField(null=True, blank=True)  # if available from API
    competition = models.CharField(max_length=50, blank=True)  # 'low', 'medium', 'high'

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_keyword'
        unique_together = ('cluster', 'keyword_text')
        ordering = ['-search_volume']
```
|
||
|
||
---
|
||
|
||
### 3.2 API Endpoints

#### POST /api/v1/blueprints/{blueprint_id}/clusters/form/
**Purpose:** Trigger cluster formation AI function
**Authentication:** Required (JWT)
**Input:**
```json
{
  "populated_attributes": [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis"]}
  ],
  "max_clusters": 50
}
```

**Output:**
```json
{
  "clusters": [...],
  "summary": {
    "total_clusters_formed": 12,
    "type_distribution": {...}
  },
  "status": "success"
}
```

**Error Cases:**
- 400: Invalid attributes structure
- 403: Forbidden (requester does not own the blueprint)
- 422: Insufficient attributes for cluster formation (< 2 dimensions)

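
The 400 and 422 cases above can be checked before the AI function is ever called. The following is a minimal sketch, not the project's actual validator: `validate_form_request` is a hypothetical helper name, and the rule that `max_clusters` must lie between 1 and 50 is an assumption derived from the 50-cluster cap. The 403 ownership check is left out because it depends on the authenticated user.

```python
from typing import Any


def validate_form_request(payload: dict[str, Any]) -> tuple[int, str]:
    """Return an (HTTP status, message) pair for a cluster-formation request body."""
    attributes = payload.get("populated_attributes")
    if not isinstance(attributes, list):
        return 400, "populated_attributes must be a list"

    for attr in attributes:
        if not isinstance(attr, dict):
            return 400, "each attribute must be an object"
        name, values = attr.get("name"), attr.get("values")
        if not isinstance(name, str) or not name.strip():
            return 400, "each attribute needs a non-empty name"
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            return 400, f"values for {name!r} must be a list of strings"

    # Only attributes that actually carry values count as dimensions.
    dimensions = [a for a in attributes if a["values"]]
    if len(dimensions) < 2:
        return 422, "at least 2 populated dimensions are required"

    # Assumption: the request may lower the cap but never exceed 50.
    max_clusters = payload.get("max_clusters", 50)
    if isinstance(max_clusters, bool) or not isinstance(max_clusters, int):
        return 400, "max_clusters must be an integer"
    if not 1 <= max_clusters <= 50:
        return 400, "max_clusters must be between 1 and 50"
    return 200, "ok"
```
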
---

#### POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
**Purpose:** Trigger keyword generation AI function
**Authentication:** Required
**Input:**
```json
{
  "use_cluster_ids": ["cluster_001", "cluster_002"],
  "target_keywords_per_cluster": 15,
  "include_long_tail_variants": true
}
```

**Output:**
```json
{
  "keywords_per_cluster": {...},
  "deduplication": {
    "duplicates_removed": 5
  },
  "summary": {
    "total_unique_keywords": 180,
    "within_constraints": true
  }
}
```

---

#### POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
**Purpose:** Add manual, CSV, library, or API-sourced keywords
**Authentication:** Required
**Input (Multiple Scenarios):**

**Scenario 1: Manual Entry**
```json
{
  "source": "manual_entry",
  "keywords": ["arthritis relief dogs", "joint pain dogs"],
  "cluster_id": "cluster_001"
}
```

**Scenario 2: CSV Import**
```json
{
  "source": "csv_import",
  "csv_url": "https://example.com/keywords.csv",
  "auto_cluster": true
}
```

**Scenario 3: Library Selection**
```json
{
  "source": "library",
  "library_keyword_ids": [123, 456, 789],
  "auto_cluster": true
}
```

**Output:**
```json
{
  "added_keywords": 10,
  "auto_clustered": 9,
  "flagged_for_review": 1,
  "conflicts_resolved": {
    "reassigned": 2,
    "deferred": 1
  }
}
```

---

#### POST /api/v1/blueprints/{blueprint_id}/assemble/
**Purpose:** Trigger blueprint assembly (create final SAGBlueprint with all records)
**Authentication:** Required
**Input:**
```json
{
  "finalize_keyword_review": true,
  "set_status": "ready_for_pipeline"
}
```

**Output:**
```json
{
  "blueprint_id": 42,
  "status": "ready_for_pipeline",
  "summary": {
    "total_attributes": 4,
    "total_clusters": 12,
    "total_keywords": 180,
    "execution_priority_phases": 3
  }
}
```

---

#### GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft&type=product_category
**Purpose:** List clusters with filtering
**Query Params:**
- `status`: draft, validated, keyword_assigned, content_created
- `type`: product_category, condition_problem, feature, brand, informational, comparison, life_stage
- `min_viability`: 0.70
- `limit`: 50, `offset`: 0

**Output:**
```json
{
  "results": [
    {
      "id": 1,
      "cluster_key": "cluster_001",
      "title": "Dog Arthritis Relief Solutions",
      "hub_title": "Best Arthritis Treatments for Dogs",
      "keyword_count": 15,
      "viability_score": 0.92,
      "type": "product_category"
    }
  ],
  "total_count": 12,
  "total_keywords": 180
}
```

---

#### GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=cluster_001&source=auto_generated
**Purpose:** List keywords for a cluster
**Query Params:**
- `cluster_id`: filter by cluster
- `source`: auto_generated, manual_entry, csv_import, api_fetch, library
- `intent`: informational, transactional, navigational, commercial
- `min_search_volume`: 100
- `order_by`: search_volume (DESC), difficulty, intent

**Output:**
```json
{
  "results": [
    {
      "id": 1,
      "keyword_text": "best arthritis treatment for dogs",
      "search_volume": 1200,
      "difficulty": "medium",
      "intent": "informational",
      "variant_type": "long_tail",
      "source": "auto_generated"
    }
  ],
  "total_count": 15
}
```

---

#### DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
**Purpose:** Remove a keyword (before assembly)
**Authentication:** Required
**Status:** Only available if blueprint.status='draft' or 'keyword_generation_complete'

---

## 4. Implementation Steps

### Phase 1: AI Functions Development (Week 1-2)

#### Step 1.1: Set up cluster_formation.py structure
- [ ] Create `sag/ai_functions/cluster_formation.py`
- [ ] Define input/output contracts
- [ ] Implement intersection generation logic (2-value, 3-value)
- [ ] Stub out AI evaluation function (ready for Claude integration)
- [ ] Implement constraint filtering & sorting

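
The intersection generation step in Step 1.1 can be sketched with `itertools`. This is an illustrative outline under assumptions, not the final `cluster_formation.py`: the function name `generate_intersections` and the dict-per-intersection return shape are hypothetical, and each intersection takes one value from each of `depth` different attributes.

```python
from itertools import combinations, product


def generate_intersections(populated_attributes, depth=2):
    """Return candidate intersections, one value from each of `depth` distinct attributes."""
    # Skip attributes without values so they never produce empty combinations.
    populated = [a for a in populated_attributes if a["values"]]
    results = []
    for attr_group in combinations(populated, depth):
        names = [a["name"] for a in attr_group]
        # Cartesian product of the chosen attributes' values.
        for values in product(*(a["values"] for a in attr_group)):
            results.append(dict(zip(names, values)))
    return results
```

Calling it once with `depth=2` and once with `depth=3` yields the two candidate pools that the AI evaluation stub would then score.
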
#### Step 1.2: Implement cluster formation AI logic
- [ ] Integrate Claude AI API for cluster viability evaluation
  - Real topical ecosystem check
  - User search demand validation
  - Content support assessment
  - Differentiation evaluation
- [ ] Implement cluster type classification (using embeddings or rule-based logic)
- [ ] Implement hub title & supporting content plan generation
- [ ] Add viability scoring (0-1 scale)
- [ ] Implement distribution validation

#### Step 1.3: Unit tests for cluster formation
- [ ] Test intersection generation (2-value, 3-value)
- [ ] Test AI evaluation with mock responses
- [ ] Test constraint filtering (max 50 clusters)
- [ ] Test type distribution analysis
- [ ] Test handling of edge cases (0 intersections, all rejected, etc.)

#### Step 1.4: Create keyword_generation.py structure
- [ ] Create `sag/ai_functions/keyword_generation.py`
- [ ] Define input/output contracts
- [ ] Implement template substitution logic
- [ ] Implement long-tail variant generation
- [ ] Implement deduplication logic

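
Template substitution from Step 1.4 might look like the sketch below. It assumes that template placeholders are the attribute names lowercased with spaces replaced by underscores (for example `Pet Type` becomes `{pet_type}`); that mapping and the helper names are assumptions, since the real placeholder convention is defined by the 01B templates. Templates whose placeholders are not all covered by the cluster are skipped rather than raising.

```python
from string import Formatter


def slugify_attribute(name):
    """Map an attribute name such as 'Pet Type' to a placeholder such as 'pet_type'."""
    return name.strip().lower().replace(" ", "_")


def fill_templates(templates, dimensions):
    """Substitute a cluster's dimension values into every template it can satisfy."""
    values = {slugify_attribute(k): v.lower() for k, v in dimensions.items()}
    keywords = []
    for template in templates:
        # Collect the placeholder names the template requires.
        required = {field for _, field, _, _ in Formatter().parse(template) if field}
        if not required <= values.keys():
            continue  # skip templates with missing attributes
        keywords.append(template.format(**values))
    return keywords
```
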
#### Step 1.5: Implement keyword generation AI logic
- [ ] Integrate template loading from SectorAttributeTemplate (01B)
- [ ] Implement keyword enrichment (search volume, difficulty, intent)
- [ ] Implement filtering & sorting by search volume
- [ ] Implement constraint validation (10-25 per cluster, 300-500 total)
- [ ] Implement global deduplication & conflict resolution

#### Step 1.6: Unit tests for keyword generation
- [ ] Test template substitution with various attribute combinations
- [ ] Test long-tail variant generation
- [ ] Test deduplication across clusters
- [ ] Test constraint validation
- [ ] Test conflict resolution (multi-cluster keywords)

---

### Phase 2: Data Models & Service Layer (Week 2-3)

#### Step 2.1: Database migrations
- [ ] Create SAGKeyword model
- [ ] Add ManyToMany relations to SAGCluster (auto_generated_keywords, supplemented_keywords)
- [ ] Extend SAGBlueprint with denormalized JSON fields (attributes_json, clusters_json, taxonomy_plan, execution_priority)
- [ ] Extend SAGCluster with cluster_key, cluster_type, intersection_depth, viability_score, hub_title, supporting_content_plan
- [ ] Run and test migrations on dev database

#### Step 2.2: Implement blueprint_service.py
- [ ] Create `sag/services/blueprint_service.py`
- [ ] Implement assemble_blueprint() function with 8 steps
- [ ] Implement SAGBlueprint creation & status management
- [ ] Implement SAGAttribute creation from user input
- [ ] Implement SAGCluster creation from cluster formation results
- [ ] Implement SAGKeyword creation & assignment
- [ ] Implement taxonomy_plan generation
- [ ] Implement execution_priority generation
- [ ] Implement denormalized JSON population

#### Step 2.3: Unit tests for blueprint_service
- [ ] Test blueprint creation & status transitions
- [ ] Test attribute record creation
- [ ] Test cluster record creation with all fields
- [ ] Test keyword assignment to clusters
- [ ] Test taxonomy plan generation
- [ ] Test execution priority generation
- [ ] Test denormalized JSON accuracy

---

### Phase 3: API Endpoints & Integration (Week 3-4)

#### Step 3.1: Implement cluster formation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/clusters/form/
- [ ] Validate input attributes
- [ ] Call cluster_formation() AI function
- [ ] Return results with summary
- [ ] Error handling (400, 403, 422)

#### Step 3.2: Implement keyword generation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
- [ ] Validate input & cluster availability
- [ ] Call keyword_generation() AI function
- [ ] Return results with deduplication summary
- [ ] Error handling

#### Step 3.3: Implement keyword supplementation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
- [ ] Support multiple input sources (manual, CSV, library, API)
- [ ] Implement auto-clustering via map_keyword_to_clusters()
- [ ] Implement conflict resolution via resolve_keyword_conflict()
- [ ] Return summary of added, clustered, flagged keywords

#### Step 3.4: Implement blueprint assembly API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/assemble/
- [ ] Call blueprint_service.assemble_blueprint()
- [ ] Manage status transitions
- [ ] Return blueprint summary with next steps

#### Step 3.5: Implement read endpoints
- [ ] Create GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft
- [ ] Create GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=...
- [ ] Implement filtering & pagination
- [ ] Add ordering options

#### Step 3.6: Implement keyword removal endpoint
- [ ] Create DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
- [ ] Validate blueprint status (draft or keyword_generation_complete, per Section 3.2)
- [ ] Cascade delete as needed

---

### Phase 4: Integration with 01D & Testing (Week 4-5)

#### Step 4.1: Integrate with Setup Wizard (01D)
- [ ] Call cluster_formation() after user populates attributes
- [ ] Display clusters to user for review (optional: allow edits)
- [ ] Call keyword_generation() if user confirms clusters
- [ ] Display keywords for review
- [ ] Allow manual supplementation before final assembly

#### Step 4.2: End-to-end testing
- [ ] Test full flow: attributes → clusters → keywords → blueprint
- [ ] Test with various sector/site_type combinations
- [ ] Test constraint enforcement
- [ ] Test conflict resolution with real scenarios
- [ ] Performance test with large attribute sets (100+ values)

#### Step 4.3: Integration with 01E (Pipeline Configuration)
- [ ] Verify blueprint is available to pipeline service
- [ ] Test taxonomy plan usage in content generation
- [ ] Test execution_priority ordering in pipeline

---

## 5. Acceptance Criteria

### Cluster Formation AI Function (01C-CF)
- [ ] **CF-1:** Generates all 2-value intersections from populated attributes
- [ ] **CF-2:** Generates relevant 3-value intersections (at least 50% of possible combinations)
- [ ] **CF-3:** AI evaluates each intersection on 5 decision criteria (ecosystem, demand, content support, differentiation, clarity)
- [ ] **CF-4:** Classification assigns correct cluster type (product_category, condition_problem, feature, brand, informational, comparison, life_stage)
- [ ] **CF-5:** Hub titles are specific, actionable, and 5-12 words long
- [ ] **CF-6:** Supporting content plans contain 5-8 titles, semantically related to hub, covering different angles
- [ ] **CF-7:** Viability scores accurately reflect cluster strength (0-1 scale, with clear rationale)
- [ ] **CF-8:** Hard constraint enforced: max 50 clusters per sector, sorted by viability score
- [ ] **CF-9:** Type distribution meets targets: Product/Service 40-50%, Condition/Problem 20-30%, Feature 10-15%, Brand 5-10%, Life Stage 5-10%
- [ ] **CF-10:** Clusters have 3+ dimensional intersections for strong coherence
- [ ] **CF-11:** No duplicative clusters (semantic coherence check prevents near-duplicates like "Dog Joint Health" + "Dog Arthritis")
- [ ] **CF-12:** API response includes summary with cluster count, type distribution, avg intersection depth

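
CF-8 and CF-9 can be verified together once clusters carry a type and a viability score. The sketch below assumes clusters are plain dicts with `cluster_type` and `viability_score` keys. The function name `cap_and_check_distribution` and the decision to report out-of-range shares rather than rebalance them are illustrative choices, not part of the spec.

```python
from collections import Counter

# Target share ranges taken from CF-9.
TYPE_TARGETS = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}


def cap_and_check_distribution(clusters, max_clusters=50):
    """Keep the top clusters by viability and report types outside their target range."""
    kept = sorted(clusters, key=lambda c: c["viability_score"], reverse=True)[:max_clusters]
    counts = Counter(c["cluster_type"] for c in kept)
    total = len(kept)
    out_of_range = {}
    for cluster_type, (low, high) in TYPE_TARGETS.items():
        share = counts[cluster_type] / total if total else 0.0
        if not low <= share <= high:
            out_of_range[cluster_type] = round(share, 2)
    return kept, out_of_range
```

An empty `out_of_range` mapping means every targeted type falls inside its range.
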
### Keyword Generation AI Function (01C-KG)
- [ ] **KG-1:** Loads keyword templates from SectorAttributeTemplate for correct site_type
- [ ] **KG-2:** Substitutes attribute values into templates to generate base keywords
- [ ] **KG-3:** Generates long-tail variants (best, review, vs, for, how to) for each base keyword
- [ ] **KG-4:** Deduplicates keywords across all clusters (no keyword appears twice)
- [ ] **KG-5:** Global deduplication identifies multi-cluster keywords and reassigns via conflict resolution
- [ ] **KG-6:** Per-cluster keyword count: 10-25 keywords (soft target 15)
- [ ] **KG-7:** Total keyword count: 300-500+ for site (configurable per sector)
- [ ] **KG-8:** Keywords enriched with search volume, difficulty, intent classification
- [ ] **KG-9:** API response includes per-cluster breakdown, deduplication summary, total keyword count
- [ ] **KG-10:** Handles missing attribute values gracefully (skips template if required attrs not present)

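
A rough sketch of the deduplication and per-cluster cap in KG-4 and KG-6 follows. It assumes keywords are dicts with `keyword_text` and an optional `search_volume`, and it uses a simple first-cluster-wins rule as a stand-in for the conflict resolution that KG-5 requires. The function name and the normalization (lowercasing and collapsing whitespace) are assumptions.

```python
def deduplicate_keywords(keywords_per_cluster, max_per_cluster=25):
    """Drop keywords already claimed by an earlier cluster and cap each cluster's list."""
    seen = set()
    result = {}
    duplicates_removed = 0
    for cluster_id, keywords in keywords_per_cluster.items():
        kept = []
        # Highest search volume first, so the cap drops the weakest keywords.
        ranked = sorted(keywords, key=lambda k: k.get("search_volume") or 0, reverse=True)
        for kw in ranked:
            normalized = " ".join(kw["keyword_text"].lower().split())
            if normalized in seen:
                duplicates_removed += 1
                continue
            seen.add(normalized)
            kept.append({**kw, "keyword_text": normalized})
        result[cluster_id] = kept[:max_per_cluster]
    return result, duplicates_removed
```
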
### Keyword Conflict Resolution (01C-CR)
- [ ] **CR-1:** Identifies keywords matching multiple clusters (≥2 matches)
- [ ] **CR-2:** Decision Criterion 1: assigns to cluster with most dimensional intersections
- [ ] **CR-3:** Decision Criterion 2 (tiebreaker): assigns to more specific cluster
- [ ] **CR-4:** Decision Criterion 3 (tiebreaker): assigns by primary user intent match
- [ ] **CR-5:** Decision Criterion 4 (last resort): flags for user review with clear reasoning
- [ ] **CR-6:** Reassignment logic preserves keyword integrity (no loss, duplication, or orphaning)

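
The four decision criteria can be read as a cascade of tiebreakers. The sketch below is one possible shape for `resolve_keyword_conflict()`, under assumptions: each candidate is a dict with hypothetical `matched_dimensions` and `primary_intent` fields, and "more specific" is approximated by a higher `intersection_depth`. None of these field choices is confirmed by the spec.

```python
def resolve_keyword_conflict(keyword, candidates):
    """Pick one cluster for a multi-match keyword; return (cluster_key or None, reason)."""
    # Criterion 1: the cluster sharing the most dimensions with the keyword.
    best = max(c["matched_dimensions"] for c in candidates)
    pool = [c for c in candidates if c["matched_dimensions"] == best]
    if len(pool) == 1:
        return pool[0]["cluster_key"], "most dimensional intersections"

    # Criterion 2 (tiebreaker): the more specific cluster, approximated by depth.
    deepest = max(c["intersection_depth"] for c in pool)
    pool = [c for c in pool if c["intersection_depth"] == deepest]
    if len(pool) == 1:
        return pool[0]["cluster_key"], "more specific cluster"

    # Criterion 3 (tiebreaker): the cluster whose primary intent matches the keyword.
    intent_matches = [c for c in pool if c["primary_intent"] == keyword["intent"]]
    if len(intent_matches) == 1:
        return intent_matches[0]["cluster_key"], "primary intent match"

    # Criterion 4 (last resort): leave unassigned and flag for the user.
    tied = ", ".join(c["cluster_key"] for c in pool)
    return None, f"flagged for review: tie between {tied}"
```

Returning the reason alongside the decision keeps the "clear reasoning" requirement of CR-5 visible to the caller.
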
### Blueprint Assembly Service (01C-BA)
- [ ] **BA-1:** Creates SAGBlueprint record with status='draft'
- [ ] **BA-2:** Creates SAGAttribute records from populated attributes (preserves name, values, is_primary flag)
- [ ] **BA-3:** Creates SAGCluster records from cluster formation output (all fields populated)
- [ ] **BA-4:** Creates SAGKeyword records from keyword generation output (all fields preserved)
- [ ] **BA-5:** Associates keywords to clusters via ManyToMany relations
- [ ] **BA-6:** Generates taxonomy_plan with WP categories (primary attributes) and tags (secondary)
- [ ] **BA-7:** Generates execution_priority with 3 phases: hubs first, supporting articles, term pages
- [ ] **BA-8:** Populates denormalized JSON fields (attributes_json, clusters_json) for fast queries
- [ ] **BA-9:** Returns blueprint ID and summary (attribute count, cluster count, keyword count, next steps)
- [ ] **BA-10:** Status transitions correctly: draft → ready_for_pipeline (or intermediate statuses as needed)

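
BA-6 and BA-7 describe two derived structures. The sketch below shows one way they could be built from plain dicts. The JSON shapes (`categories`/`tags` lists, a list of phase objects) and the choice to order hubs by viability score are assumptions for illustration; the spec does not fix the stored format.

```python
def build_taxonomy_plan(attributes):
    """Primary attribute values become categories; secondary values become tags."""
    plan = {"categories": [], "tags": []}
    for attr in attributes:
        bucket = "categories" if attr.get("is_primary") else "tags"
        plan[bucket].extend(attr["values"])
    return plan


def build_execution_priority(clusters, taxonomy_plan):
    """Three phases: hub pages, then supporting articles, then taxonomy term pages."""
    ordered = sorted(clusters, key=lambda c: c["viability_score"], reverse=True)
    supporting = [title for c in ordered for title in c["supporting_content_plan"]]
    return [
        {"phase": 1, "content_type": "hub", "items": [c["hub_title"] for c in ordered]},
        {"phase": 2, "content_type": "supporting", "items": supporting},
        {
            "phase": 3,
            "content_type": "term_page",
            "items": taxonomy_plan["categories"] + taxonomy_plan["tags"],
        },
    ]
```
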
### Manual Keyword Supplementation (01C-MKS)
- [ ] **MKS-1:** Users can add keywords via: manual entry, CSV import, library selection, API fetch
- [ ] **MKS-2:** Manual entry accepts comma-separated keywords, validates against duplicates
- [ ] **MKS-3:** CSV import validates file structure (keyword, search_volume optional, difficulty optional)
- [ ] **MKS-4:** Library integration allows browsing & selection per site_type
- [ ] **MKS-5:** Auto-clustering maps new keywords to clusters via attribute similarity matching
- [ ] **MKS-6:** Unmatched keywords flagged for user review: gap analysis, potential new cluster, or outlier
- [ ] **MKS-7:** User can assign unmatched keywords to specific cluster or create new cluster
- [ ] **MKS-8:** API returns summary: added count, auto-clustered count, flagged count, conflicts resolved

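
For MKS-2, the manual-entry path reduces to splitting, normalizing, and checking against what the blueprint already holds. A minimal sketch, assuming duplicates are detected case-insensitively after collapsing whitespace; `parse_manual_keywords` is a hypothetical helper name:

```python
def parse_manual_keywords(raw, existing):
    """Split comma-separated input into (accepted, rejected_as_duplicate) keyword lists."""
    known = {" ".join(k.lower().split()) for k in existing}
    accepted, rejected = [], []
    for part in raw.split(","):
        normalized = " ".join(part.lower().split())
        if not normalized:
            continue  # ignore empty entries such as a trailing comma
        if normalized in known:
            rejected.append(normalized)
        else:
            known.add(normalized)
            accepted.append(normalized)
    return accepted, rejected
```

The rejected list can feed the supplementation summary so users see which entries were skipped.
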
### API Endpoints (01C-API)
- [ ] **API-1:** POST /api/v1/blueprints/{blueprint_id}/clusters/form/ returns 200 + cluster results
- [ ] **API-2:** POST /api/v1/blueprints/{blueprint_id}/keywords/generate/ returns 200 + keyword results
- [ ] **API-3:** POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ returns 200 + supplementation summary
- [ ] **API-4:** POST /api/v1/blueprints/{blueprint_id}/assemble/ returns 200 + blueprint summary
- [ ] **API-5:** GET /api/v1/blueprints/{blueprint_id}/clusters/ supports status, type, min_viability filters
- [ ] **API-6:** GET /api/v1/blueprints/{blueprint_id}/keywords/ supports cluster_id, source, intent, min_search_volume filters
- [ ] **API-7:** DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/ only works before assembly (blueprint status draft or keyword_generation_complete)
- [ ] **API-8:** Error handling: 400 (bad input), 403 (forbidden), 404 (not found), 422 (unprocessable)

### Data Integrity (01C-DI)
- [ ] **DI-1:** No keyword appears in multiple clusters within a blueprint. The `unique_together = ('cluster', 'keyword_text')` constraint on SAGKeyword only blocks duplicates inside a single cluster, so cross-cluster uniqueness must be enforced by global deduplication (KG-4, KG-5)
- [ ] **DI-2:** Deleted clusters cascade-delete associated keywords (no orphaned keywords)
- [ ] **DI-3:** Deleted blueprints cascade-delete all attributes, clusters, keywords
- [ ] **DI-4:** Blueprint status transitions prevent invalid operations (e.g., can't supplement keywords on published blueprint)
- [ ] **DI-5:** Denormalized JSON fields stay in sync with normalized records (updated on every change)

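
DI-4 amounts to a lookup from blueprint status to permitted operations. The table below is a hypothetical sketch: only `draft`, `keyword_generation_complete`, `ready_for_pipeline`, and `published` are mentioned in this document, the operation names are invented for illustration, and the authoritative status list lives in 01A.

```python
# Assumed mapping; adjust to the status lifecycle defined in 01A.
ALLOWED_OPERATIONS = {
    "draft": {
        "form_clusters",
        "generate_keywords",
        "supplement_keywords",
        "delete_keyword",
        "assemble",
    },
    "keyword_generation_complete": {"supplement_keywords", "delete_keyword", "assemble"},
    "ready_for_pipeline": set(),
    "published": set(),
}


def is_operation_allowed(status, operation):
    """Return True when `operation` may run on a blueprint in `status`."""
    # Unknown statuses allow nothing, which fails closed.
    return operation in ALLOWED_OPERATIONS.get(status, set())
```
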
### Performance (01C-PERF)
- [ ] **PERF-1:** Cluster formation completes in <5 seconds for 100+ intersection combinations
- [ ] **PERF-2:** Keyword generation completes in <10 seconds for 50 clusters
- [ ] **PERF-3:** Blueprint assembly completes in <3 seconds (DB writes + JSON generation)
- [ ] **PERF-4:** GET endpoints with filters return results in <2 seconds
- [ ] **PERF-5:** CSV import (1000 keywords) completes in <15 seconds

---

## 6. Claude Code Instructions

### 6.1 Generating Cluster Formation Logic

**Prompt Template for Claude:**
```
Generate the cluster formation algorithm for an AI-powered content planning system.

Input:
- populated_attributes: List of attributes with values from user setup wizard
  Example: [
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Birds"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
  ]
- sector_context: Information about the sector (e.g., "pet health e-commerce")

Task:
1. Generate all meaningful 2-value intersections, taking one value from each of two different attributes (e.g., Pet Type × Health Condition)
2. For each intersection, use Claude's reasoning to evaluate:
   - Is this a real topical ecosystem? (do the dimensions naturally fit together?)
   - Would users search for this? (assess search demand)
   - Can we build 1 hub + 3-8 supporting articles?
   - Is it differentiated from other clusters?
3. Classify valid clusters by type: product_category, condition_problem, feature, brand, informational, comparison, life_stage
4. Generate a compelling hub title and 5-8 supporting content titles
5. Assign a viability score (0-1) based on coherence, search demand, content potential

Output:
- clusters: Array of cluster objects with all fields from the spec
- summary: Total clusters, type distribution, viability analysis

Constraints:
- Max 50 clusters per sector
- Minimum 3 dimensional intersections for strong clusters
- Quality over quantity: prefer 5 strong clusters over 15 weak ones
```

### 6.2 Generating Keyword Generation Logic

**Prompt Template for Claude:**
```
Generate keywords for content clusters using templates and AI-driven expansion.

Input:
- clusters: Array of clusters from cluster formation (with dimensions and hub title)
- keyword_templates: Pre-configured templates for site_type
  Example: [
    "best {health_condition} for {pet_type}",
    "{pet_type} {health_condition} treatment",
    "affordable {health_condition} relief for {pet_type}"
  ]
- sector_context: Site type (ecommerce, blog, saas, etc.)

Task:
1. Load keyword templates filtered by sector site_type
2. For each cluster:
   - Extract dimension values
   - Substitute values into matching templates
   - Generate long-tail variants: best, review, vs, for, how to
   - Enrich with search volume, difficulty, intent (informational, transactional, etc.)
3. Deduplicate globally across all clusters
4. Identify multi-cluster keywords and resolve conflicts via:
   - Highest dimensional intersection count
   - Most specific cluster (tiebreaker)
   - Primary user intent match (tiebreaker)
5. Validate constraints: 10-25 per cluster, 300-500 total

Output:
- keywords_per_cluster: Keywords organized by cluster ID
- deduplication: Count of duplicates removed, conflicts flagged
- summary: Total unique keywords, per-cluster average, search volume total

Constraints:
- Do NOT generate more than 25 keywords per cluster
- Do NOT allow duplicates
- Prioritize high search volume keywords
- Ensure diversity: mix of base keywords and long-tail variants
```

### 6.3 Integrating with Setup Wizard (01D)

**Implementation Notes:**
1. After user completes attribute population in wizard:
   - Call `POST /api/v1/blueprints/{blueprint_id}/clusters/form/`
   - Display clusters to user (preview mode)
   - Allow user to: review, edit (rename hub titles, remove clusters), or confirm

2. After user confirms clusters:
   - Call `POST /api/v1/blueprints/{blueprint_id}/keywords/generate/`
   - Display keywords grouped by cluster (preview mode)
   - Allow user to: supplement keywords, remove outliers, or confirm

3. Before finalizing blueprint:
   - Optionally allow manual keyword supplementation (CSV, library, manual entry)
   - Call `POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/` for each source
   - Resolve conflicts (auto or manual)
   - Call `POST /api/v1/blueprints/{blueprint_id}/assemble/` to finalize

### 6.4 Testing with Sample Data

**Test Case 1: Pet Health E-commerce Site**
```python
populated_attributes = [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Arthritis", "Allergies", "Obesity"]},
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]}
]

sector_context = {
    "sector_id": "pet_health",
    "site_type": "ecommerce",
    "sector_name": "Pet Health Products"
}

# Expected clusters:
# 1. Dog Arthritis Relief (product_category)
# 2. Cat Allergies Nutrition (product_category)
# 3. Senior Dog Joint Support (life_stage; assumes a life-stage value such as "Senior" is also supplied)
# ... etc.
```

**Test Case 2: Local Service (Veterinary Clinic)**
```python
populated_attributes = [
    {"name": "Service Type", "values": ["Surgery", "Preventive Care", "Emergency"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Exotic"]},
    {"name": "Location", "values": ["Downtown", "Suburbs"]}
]

sector_context = {
    "sector_id": "vet_clinic",
    "site_type": "local_service",
    "sector_name": "Veterinary Clinic"
}

# Expected clusters (local_service is the site_type, not a cluster type):
# 1. Emergency Dog Surgery Downtown (product_category)
# 2. Preventive Cat Care Suburbs (informational)
# ... etc.
```

---

## 7. Cross-Document References

### Upstream Dependencies
- **01A (SAG Master Data Models):** Provides SAGBlueprint, SAGAttribute, SAGCluster base models
- **01B (Sector Attribute Templates):** Provides attribute framework, keyword templates, site_type configurations

### Downstream Consumers
- **01D (Setup Wizard):** Triggers cluster formation & keyword generation after attribute population
- **01E (Blueprint-aware Pipeline):** Uses clusters, keywords, taxonomy_plan, execution_priority for content generation
- **01F (Existing Site Analysis):** May feed competitor/existing keywords into supplementation process
- **01G (Health Monitoring):** Tracks cluster completeness, keyword coverage, content generation progress against blueprint

---

## 8. Appendix: Algorithm Complexity & Performance Estimates

### Cluster Formation Complexity
- **Input:** N attributes with M average values each
- **Intersections Generated:** C(N,2) × M² = O(N²M²) for 2-value, C(N,3) × M³ = O(N³M³) for 3-value
- **AI Evaluations:** One call per candidate intersection, so the same order as the intersection count (largest cost)
- **Time Estimate:** ~1-2 seconds per 100 intersections (depending on Claude API latency)
- **Bottleneck:** Claude API response time for viability evaluation

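
The intersection counts above can be computed exactly for uneven attribute sizes, which helps when estimating how many AI evaluations a blueprint will trigger. A small sketch; `count_intersections` is a hypothetical helper that takes the number of values per attribute:

```python
from itertools import combinations
from math import prod


def count_intersections(value_counts, depth):
    """Count intersections that take one value from each of `depth` distinct attributes."""
    return sum(prod(group) for group in combinations(value_counts, depth))
```

For four attributes with five values each, this gives 6 × 25 = 150 two-value and 4 × 125 = 500 three-value candidates.
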
### Keyword Generation Complexity
- **Input:** C clusters, T keyword templates per cluster
- **Base Keywords:** O(C × T) (template substitution)
- **Long-tail Variants:** O(C × T × V) where V ≈ 6 (base + 5 variants: best, review, vs, for, how to)
- **Deduplication:** O(K log K) where K = total keywords (sort-based)
- **Time Estimate:** ~3-5 seconds for 300+ keywords

### Blueprint Assembly Complexity
- **DB Writes:** O(A + C + K) where A=attributes, C=clusters, K=keywords
- **JSON Generation:** O(A + C + K) for denormalization
- **Time Estimate:** <1 second for typical blueprints (< 10 MB JSON)

---

**Document Complete**
**Status:** Ready for Development
**Next Step:** Implement Phase 1 (AI Functions) per Section 4