# IGNY8 Phase 1: Cluster Formation & Keyword Engine (Doc 01C)
> **Version:** 1.1 (codebase-verified)
> **Source of Truth:** Codebase at `/data/app/igny8/backend/`
> **Last Verified:** 2025-07-14
**Document Version:** 1.1
**Date:** 2026-03-23
**Phase:** Phase 1 - Foundation & Intelligence
**Status:** Build Ready
---
## 1. Current State
### Existing Components
- **SAGBlueprint** (01A): Data model with status tracking, blueprint lifecycle management
- **SAGAttribute** & **SAGCluster** models (01A): Schema definitions for attributes and topic clusters
- **SectorAttributeTemplate** (01B): Pre-configured attribute framework with keyword templates per site_type
- **Setup Wizard** (01D): Collects sector, site_type, and populated attribute values from user
- **Blueprint Service** (01G - earlier iteration): Basic blueprint assembly, denormalization
### Current Limitations
- No automated cluster formation from attribute intersection logic
- No keyword generation from templates
- No conflict resolution for multi-cluster keyword assignments
- No cluster type classification (product, condition, feature, etc.)
- No validation of cluster viability (size, coherence, user demand)
- No hub title and supporting content plan generation
### Dependencies Ready
- ✅ Sector attribute templates loaded with keyword templates
- ✅ Setup wizard populates attributes
- ✅ Data models support cluster and keyword storage
- ✅ Blueprint lifecycle framework exists
---
## 2. What to Build
### 2.1 Cluster Formation AI Function
**File:** `sag/ai_functions/cluster_formation.py`
**Register Key:** `'form_clusters'`
**Triggering Context:** After user populates attributes in setup wizard; before keyword assignment
#### Input Contract
```python
{
"populated_attributes": [
{"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
{"name": "Pet Type", "values": ["Dogs", "Cats"]},
{"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
],
"sector_context": {
"sector_id": int, # FK to igny8_core_auth.Sector (BigAutoField PK)
"site_type": "ecommerce|saas|blog|local_service",
"sector_name": str
},
"constraints": {
"max_clusters": 50, # hard cap per sector
"min_keywords_per_cluster": 5,
"max_keywords_per_cluster": 20,
"optimal_keywords_per_cluster": (7, 15)  # inclusive (min, max); a bare 7-15 would evaluate to -8
}
}
```
#### Output Contract
```python
{
"clusters": [
{
"id": "cluster_001",
"title": "Dog Arthritis Relief Solutions",
"type": "product_category", # or condition_problem, feature, brand, informational, comparison
"dimensions": {
"primary": ["Pet Type: Dogs", "Health Condition: Arthritis"],
"secondary": ["Target Audience: Pet Owners"]
},
"intersection_depth": 3, # count of dimensional intersections
"viability_score": 0.92, # 0-1 based on coherence + demand assessment
"hub_title": "Best Arthritis Treatments for Dogs",
"supporting_content_plan": [
"Senior Dog Arthritis: Causes & Prevention",
"Dog Arthritis Medications: Complete Guide",
"Physical Therapy Exercises for Dogs with Arthritis",
"Diet Changes to Support Joint Health",
"When to See a Vet About Dog Joint Pain"
],
"keywords": [], # populated in keyword generation phase
"dimension_count": 3,
"validation": {
"is_real_topical_ecosystem": True,
"has_search_demand": True,
"can_support_content_plan": True,
"sufficient_differentiation": True
}
},
# ... more clusters
],
"summary": {
"total_clusters_formed": 12,
"type_distribution": {
"product_category": 6,
"condition_problem": 4,
"feature": 1,
"brand": 0,
"informational": 1,
"comparison": 0
},
"avg_intersection_depth": 2.3,
"clusters_below_viability_threshold": 0
}
}
```
#### Algorithm (Pseudocode)
```
FUNCTION form_clusters(populated_attributes, sector_context, constraints):
    # STEP 1: Generate all 2-value intersections
    all_intersections = []
    for each attribute_pair in PAIRS(populated_attributes):
        for value1 in attribute_pair[0].values:
            for value2 in attribute_pair[1].values:
                intersection = {
                    "dimensions": [value1, value2],
                    "attribute_names": [attribute_pair[0].name, attribute_pair[1].name]
                }
                all_intersections.append(intersection)
    # Also generate 3-value intersections for strong coherence
    for each attribute_triplet in TRIPLETS(populated_attributes):
        for value1 in attribute_triplet[0].values:
            for value2 in attribute_triplet[1].values:
                for value3 in attribute_triplet[2].values:
                    intersection = {
                        "dimensions": [value1, value2, value3],
                        "attribute_names": [name of each attribute in attribute_triplet]
                    }
                    all_intersections.append(intersection)
    # STEP 2: AI evaluates each intersection
    valid_clusters = []
    for intersection in all_intersections:
        evaluation = AI_EVALUATE_INTERSECTION(intersection, sector_context):
            - Is this a real topical ecosystem?
            - Would users search for this combination?
            - Can we build a hub + 3-10 supporting articles?
            - Is there sufficient differentiation from other clusters?
            - Does the combination make semantic sense?
        if evaluation.is_valid:
            # STEP 3: Classify cluster type
            cluster_type = AI_CLASSIFY_TYPE(intersection)
                → product_category, condition_problem, feature, brand,
                  informational, comparison
            # STEP 4: Generate hub title + supporting content plan
            hub_title = AI_GENERATE_HUB_TITLE(intersection, sector_context)
            supporting_titles = AI_GENERATE_SUPPORTING_TITLES(
                hub_title,
                intersection,
                count=5 to 8
            )
            # Create cluster object
            cluster = {
                "dimensions": intersection.dimensions,
                "intersection_depth": LEN(intersection.dimensions),
                "type": cluster_type,
                "viability_score": evaluation.confidence_score,
                "hub_title": hub_title,
                "supporting_content_plan": supporting_titles,
                "validation": evaluation
            }
            valid_clusters.append(cluster)
    # STEP 5: Apply constraints & filtering
    sorted_clusters = SORT_BY_VIABILITY_SCORE(valid_clusters)  # highest score first
    final_clusters = sorted_clusters[0:constraints.max_clusters]
    # STEP 6: Validate distribution & completeness
    distribution = CALCULATE_TYPE_DISTRIBUTION(final_clusters)
    # Flag if any type is severely under-represented
    if distribution.imbalance > THRESHOLD:
        LOG_WARNING("Type distribution may be suboptimal")
    # STEP 7: Return with summary (keys match the Output Contract)
    return {
        "clusters": final_clusters,
        "summary": {
            "total_clusters_formed": len(final_clusters),
            "type_distribution": distribution,
            "avg_intersection_depth": MEAN(intersection_depth of final_clusters),
            "clusters_below_viability_threshold": COUNT(final_clusters with score < 0.70)
        }
    }
END FUNCTION
```
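Step 1 of the pseudocode can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the function name `generate_intersections` and its `sizes` parameter are assumptions, and the input uses the `populated_attributes` shape from the Input Contract.

```python
from itertools import combinations, product

def generate_intersections(populated_attributes, sizes=(2, 3)):
    """Return every cross-attribute value combination of the given sizes.

    Hypothetical helper: the name and the `sizes` parameter are illustrative.
    """
    intersections = []
    for size in sizes:
        # Choose `size` distinct attributes, then one value from each of them
        for attrs in combinations(populated_attributes, size):
            for values in product(*(a["values"] for a in attrs)):
                intersections.append({
                    "dimensions": list(values),
                    "attribute_names": [a["name"] for a in attrs],
                })
    return intersections

attributes = [
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]},
]
candidates = generate_intersections(attributes)
print(len(candidates))  # 28: 16 two-value pairs plus 12 three-value triples
```

The candidate count grows multiplicatively with the number of values per attribute, so the AI evaluation in Step 2 is likely the expensive part; batching or pre-filtering candidates before evaluation may be worth considering.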
#### AI Evaluation Criteria
For each intersection, the AI must answer:
1. **Real Topical Ecosystem?**
- Do the dimensions naturally connect in user intent?
- Is there an existing product/service/solution category?
- Example: YES - "Dog Arthritis Relief" (real problem + real solutions)
- Example: NO - "Vegetarian Chainsaw" (nonsensical combination)
2. **User Search Demand?**
- Would users actively search for this combination?
- Check: keyword templates, search volume patterns, user forums
- Target: ≥500 monthly searches for hub keyword
3. **Content Support?**
- Can we create 1 hub + 3-10 supporting articles?
- Is there enough subtopic depth?
- Example: YES - "Dog Arthritis" can have medication, exercise, diet, vet visits
- Example: NO - "Red Dog Collar" (too niche, limited subtopics)
4. **Sufficient Differentiation?**
- Does this cluster stand apart from others?
- Avoid near-duplicate clusters (e.g., "Dog Joint Health" vs "Dog Arthritis")
- Decision: merge or reject the weaker one
5. **Dimensional Clarity**
- Do all dimensions contribute meaningfully?
- Remove secondary dimensions that don't add coherence
#### Hard Constraints
- **Maximum Clusters:** 50 per sector (enforce in sorting/filtering)
- **Minimum Keywords per Cluster:** 5 (checked in keyword generation)
- **Maximum Keywords per Cluster:** 20 (checked in keyword generation)
- **Optimal Range:** 7-15 keywords per cluster
- **No Keyword Duplication:** Each keyword in exactly one cluster (enforced in conflict resolution)
- **Type Distribution Target:**
- Product/Service Type: 40-50%
- Condition/Problem: 20-30%
- Feature: 10-15%
- Brand: 5-10%
- Life Stage/Audience: 5-10%
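
The type distribution target can be checked mechanically after filtering. The sketch below is an assumption-laden illustration: `TARGET_SHARES` and `distribution_warnings` are hypothetical names, and the mapping of "Product/Service Type" to `product_category` and "Life Stage/Audience" to `life_stage` is inferred from the cluster type identifiers used elsewhere in this document.

```python
from collections import Counter

# Target share per cluster type as (min, max) fractions, taken from the list above.
# The type keys are assumed to match the cluster type identifiers.
TARGET_SHARES = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}

def distribution_warnings(cluster_types):
    """Return one warning string per type whose share falls outside its target range."""
    counts = Counter(cluster_types)
    total = len(cluster_types)
    warnings = []
    for cluster_type, (low, high) in TARGET_SHARES.items():
        share = counts[cluster_type] / total if total else 0.0
        if not low <= share <= high:
            warnings.append(f"{cluster_type}: {share:.0%} outside {low:.0%}-{high:.0%}")
    return warnings

# Distribution from the Output Contract example: 6 / 4 / 1 / 0 / 1 / 0
types = ["product_category"] * 6 + ["condition_problem"] * 4 + ["feature", "informational"]
for warning in distribution_warnings(types):
    print(warning)
```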
---
### 2.2 Keyword Auto-Generation AI Function
**File:** `sag/ai_functions/keyword_generation.py`
**Register Key:** `'generate_keywords'`
**Triggering Context:** After cluster formation; before blueprint assembly
#### Input Contract
```python
{
"clusters": [ # output from cluster_formation
{
"id": "cluster_001",
"dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"],
"hub_title": "Best Arthritis Treatments for Dogs",
"supporting_content_plan": [...]
}
],
"sector_context": {
"sector_id": int, # FK to igny8_core_auth.Sector (BigAutoField PK)
"site_type": "ecommerce|saas|blog|local_service",
"site_intent": "sell|inform|book|download"
},
"keyword_templates": { # loaded from SectorAttributeTemplate
"template_001": "best {health_condition} treatment for {pet_type}",
"template_002": "{pet_type} {health_condition} treatment",
# ... more templates
},
"constraints": {
"min_keywords_per_cluster": 10,
"max_keywords_per_cluster": 25,
"total_target": "300-500"
}
}
```
#### Output Contract
```python
{
"keywords_per_cluster": {
"cluster_001": {
"keywords": [
{
"keyword": "best arthritis treatment for dogs",
"search_volume": 1200,
"difficulty": "medium",
"intent": "informational",
"generated_from": "template_001",
"variant_type": "long_tail"
},
{
"keyword": "dog arthritis remedies",
"search_volume": 800,
"difficulty": "easy",
"intent": "informational",
"generated_from": "template_002",
"variant_type": "base"
},
# ... 13-23 more keywords
],
"keyword_count": 15,
"primary_intent": "informational",
"search_volume_total": 12500
}
},
"deduplication": {
"duplicates_removed": 8,
"flagged_conflicts": 3 # keywords fitting multiple clusters
},
"summary": {
"total_unique_keywords": 342,
"per_cluster_avg": 14.25,
"total_search_volume": 892000,
"within_constraints": True
}
}
```
#### Algorithm (Pseudocode)
```
FUNCTION generate_keywords(clusters, sector_context, keyword_templates):
    all_keywords = {}
    FOR EACH cluster IN clusters:
        # STEP 1: Extract attribute values from cluster dimensions
        attribute_values = EXTRACT_ATTRIBUTE_VALUES(cluster.dimensions)
        # Output: {"Pet Type": "Dogs", "Health Condition": "Arthritis", ...}
        cluster_keywords = []
        # STEP 2: Substitute values into templates
        FOR EACH template IN keyword_templates:
            # Use a template only if every variable it needs has a value
            required_attrs = PARSE_TEMPLATE_VARIABLES(template)
            if ALL_ATTRS_AVAILABLE(required_attrs, attribute_values):
                # Substitute values
                base_keyword = SUBSTITUTE_VALUES(template, attribute_values)
                cluster_keywords.append({
                    "keyword": base_keyword,
                    "generated_from": template.id,
                    "variant_type": "base"
                })
        # STEP 3: Generate long-tail variants
        # Iterate over a snapshot so appended variants are not expanded again
        base_keywords = COPY(cluster_keywords)
        FOR EACH base IN base_keywords:
            base_keyword = base.keyword  # e.g. "dog arthritis treatment"
            variants = []
            # Variant: Add "best" (skip if the keyword already starts with it)
            if NOT STARTS_WITH(base_keyword, "best "):
                variants.append("best " + base_keyword)
            # Variant: Add "review"
            variants.append(base_keyword + " review")
            # Variant: Add "vs" (comparison)
            if cluster.type in [product_category, comparison]:
                variants.append(base_keyword + " vs alternatives")
            # Variant: Add "for" (audience)
            variants.append(base_keyword + " for seniors")
            # Variant: Add "how to"
            variants.append("how to " + base_keyword)
            # Variant: Add "cost" (ecommerce intent)
            if sector_context.site_intent == "sell":
                variants.append(base_keyword + " cost")
            FOR EACH variant IN variants:
                if NOT_DUPLICATE(variant, cluster_keywords):
                    cluster_keywords.append({
                        "keyword": variant,
                        "generated_from": base.generated_from,
                        "variant_type": "long_tail",
                        "parent": base_keyword
                    })
        # STEP 4: Enrich keywords with metadata
        enriched_keywords = []
        FOR EACH kw IN cluster_keywords:
            enriched = {
                "keyword": kw.keyword,
                "search_volume": ESTIMATE_SEARCH_VOLUME(kw.keyword, sector_context),
                "difficulty": ESTIMATE_DIFFICULTY(kw.keyword, sector_context),
                "intent": CLASSIFY_INTENT(kw.keyword),  # informational, transactional, navigational
                "generated_from": kw.generated_from,
                "variant_type": kw.variant_type
            }
            enriched_keywords.append(enriched)
        # STEP 5: Filter & sort
        filtered_keywords = SORT_BY_SEARCH_VOLUME(enriched_keywords)  # highest first
        # Keep top 10-25 per cluster
        cluster_keywords_final = filtered_keywords[0:25]
        # Validate minimum: top up to 10 if the templates produced too few
        if LEN(cluster_keywords_final) < 10:
            ADD_SUPPLEMENTARY_KEYWORDS(cluster_keywords_final, 10 - LEN(cluster_keywords_final))
        all_keywords[cluster.id] = {
            "keywords": cluster_keywords_final,
            "keyword_count": len(cluster_keywords_final),
            "primary_intent": MODE(intent from all keywords),
            "search_volume_total": SUM(all search volumes)
        }
    # STEP 6: Global deduplication
    all_keywords_flat = FLATTEN(all_keywords)
    duplicates = FIND_DUPLICATES(all_keywords_flat)
    FOR EACH duplicate_set IN duplicates:
        primary_cluster = PRIMARY_CLUSTER(duplicate_set)  # best fit by dimensions
        REASSIGN_DUPLICATES_TO_PRIMARY(duplicate_set, primary_cluster)
    RECOMPUTE keyword_count and search_volume_total for each affected cluster
    # STEP 7: Validate constraints
    total_keywords = SUM(keyword_count for each cluster)
    validation = {
        "within_min_per_cluster": all clusters >= 10,
        "within_max_per_cluster": all clusters <= 25,
        "total_within_target": total_keywords between 300-500,
        "no_duplicates": FIND_DUPLICATES(FLATTEN(all_keywords)) is empty
    }
    if NOT validation.all_true:
        LOG_WARNING("Keyword generation constraints not fully met")
    # STEP 8: Return results
    return {
        "keywords_per_cluster": all_keywords,
        "deduplication": {
            "duplicates_removed": len(duplicates),
            "flagged_conflicts": identify_multi_cluster_fits()
        },
        "summary": {
            "total_unique_keywords": total_keywords,
            "per_cluster_avg": total_keywords / len(clusters),
            "total_search_volume": sum of all volumes,
            "within_constraints": validation.all_true
        }
    }
END FUNCTION
```
#### Keyword Template Structure (from SectorAttributeTemplate, 01B)
```python
# Example for Pet Health ecommerce site
keyword_templates = {
"site_type": "ecommerce",
"templates": [
{
"id": "template_001",
"pattern": "best {health_condition} treatment for {pet_type}",
"weight": 5, # prioritize this template
"min_required_attrs": ["health_condition", "pet_type"]
},
{
"id": "template_002",
"pattern": "{pet_type} {health_condition} medication",
"weight": 4,
"min_required_attrs": ["pet_type", "health_condition"]
},
{
"id": "template_003",
"pattern": "affordable {health_condition} relief for {pet_type}",
"weight": 3,
"min_required_attrs": ["health_condition", "pet_type"]
},
# ... more templates
]
}
```
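Template substitution (Step 2 of the keyword algorithm) might look like the following sketch. The helper names `template_variables` and `render_template` are hypothetical, and the normalization of attribute names ("Health Condition" to `health_condition`) and the lower-casing of values are assumptions about how template variables map to attributes.

```python
from string import Formatter

def template_variables(pattern):
    """Names of the {placeholders} that appear in a template pattern."""
    return [field for _, field, _, _ in Formatter().parse(pattern) if field]

def render_template(template, attribute_values):
    """Return the rendered keyword, or None when a required attribute is missing."""
    # Assumption: "Health Condition" maps to the placeholder "health_condition"
    values = {
        name.lower().replace(" ", "_"): value.lower()
        for name, value in attribute_values.items()
    }
    if not all(var in values for var in template_variables(template["pattern"])):
        return None
    return template["pattern"].format(**values)

template = {
    "id": "template_001",
    "pattern": "best {health_condition} treatment for {pet_type}",
}
print(render_template(template, {"Pet Type": "Dogs", "Health Condition": "Arthritis"}))
# prints: best arthritis treatment for dogs
print(render_template(template, {"Pet Type": "Dogs"}))
# prints: None
```

Deriving the required variables from the pattern itself avoids drift between `pattern` and `min_required_attrs`; whether the stored list or the parsed list should be authoritative is a design decision this sketch does not settle.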
#### Long-tail Variant Rules
| Variant Type | Pattern | Use Case | Example |
|---|---|---|---|
| Base | {keyword} | All clusters | "dog arthritis relief" |
| Best/Top | best {keyword} | All clusters | "best dog arthritis relief" |
| Review | {keyword} review | Product clusters | "arthritis supplement for dogs review" |
| Comparison | {keyword} vs {alternative} | Comparison intent | "arthritis medication vs supplement for dogs" |
| Audience | {keyword} for {audience} | Audience-specific | "dog arthritis relief for senior dogs" |
| How-to | how to {verb} {keyword} | Problem-solution | "how to manage dog arthritis" |
| Cost/Price | {keyword} cost | Ecommerce intent | "arthritis treatment for dogs cost" |
| Quick | fast {keyword} | Urgency-driven | "fast arthritis relief for dogs" |
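
A few of the rules in the table can be sketched as a small generator. This is illustrative only: `long_tail_variants` and its parameters are hypothetical, only four of the rows are covered, and the conditions (review variants for product clusters, cost variants when `site_intent` is `"sell"`) follow the Use Case column.

```python
def long_tail_variants(base_keyword, cluster_type, site_intent, audience=None):
    """Apply a subset of the variant rules to one base keyword (hypothetical helper)."""
    variants = []
    # Best/Top: skip when the base keyword already starts with "best"
    if not base_keyword.startswith("best "):
        variants.append(f"best {base_keyword}")
    # Review: product clusters only
    if cluster_type == "product_category":
        variants.append(f"{base_keyword} review")
    # Audience: only when an audience value is supplied
    if audience:
        variants.append(f"{base_keyword} for {audience}")
    # Cost/Price: ecommerce intent only
    if site_intent == "sell":
        variants.append(f"{base_keyword} cost")
    return variants

for variant in long_tail_variants(
    "dog arthritis relief", "product_category", "sell", audience="senior dogs"
):
    print(variant)
```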
---
### 2.3 Blueprint Assembly Service
**File:** `sag/services/blueprint_service.py`
**Primary Function:** `assemble_blueprint(site, sector, attributes, clusters, keywords)`
**Triggering Context:** After keyword generation; creates SAGBlueprint (status=draft)
#### Input Contract
```python
from typing import Dict, List, Tuple

def assemble_blueprint(
    site: Site,      # igny8_core_auth.Site (integer PK)
    sector: Sector,  # igny8_core_auth.Sector (integer PK)
    attributes: List[Tuple[str, List[str]]],  # (name, values) pairs, user-populated
    clusters: List[Dict],                     # from cluster_formation()
    keywords: Dict[str, List[Dict]],          # cluster_id -> keywords, from generate_keywords()
) -> Dict:
    ...
```
#### Execution Steps
1. **Create SAGBlueprint Record**
```python
blueprint = SAGBlueprint.objects.create(
site=site,
status='draft',
phase='phase_1_foundation',
sector=sector,
created_by=current_user,
metadata={
'version': '1.0',
'created_date': now(),
'last_modified': now()
}
)
```
2. **Create SAGAttribute Records**
```python
for attribute_name, values in attributes:
    attribute = SAGAttribute.objects.create(
        blueprint=blueprint,
        name=attribute_name,
        values=values,  # stored as JSON array
        is_primary=DETERMINE_PRIMACY(attribute_name, site.site_type),
        source='user_input'
    )
```
3. **Create SAGCluster Records from Formed Clusters**
```python
for cluster in clusters:
    db_cluster = SAGCluster.objects.create(
        blueprint=blueprint,
        cluster_key=cluster['id'],
        title=cluster['hub_title'],
        description=GENERATE_CLUSTER_DESC(cluster),
        cluster_type=cluster['type'],
        dimensions=cluster['dimensions'],  # JSON
        intersection_depth=cluster['intersection_depth'],
        viability_score=cluster['viability_score'],
        hub_title=cluster['hub_title'],
        supporting_content_plan=cluster['supporting_content_plan'],  # JSON array
        status='draft',
        keyword_count=0  # updated in next step
    )
```
4. **Populate auto_generated_keywords on Each Cluster**
```python
for cluster_id, keyword_list in keywords.items():
    # cluster_key is only unique per blueprint, so scope the lookup to it
    cluster = SAGCluster.objects.get(blueprint=blueprint, cluster_key=cluster_id)
    keyword_records = []
    for kw_data in keyword_list:
        keyword = SAGKeyword.objects.create(
            cluster=cluster,
            keyword_text=kw_data['keyword'],
            search_volume=kw_data['search_volume'],
            difficulty=kw_data['difficulty'],
            intent=kw_data['intent'],
            generated_from=kw_data['generated_from'],
            variant_type=kw_data['variant_type'],
            source='auto_generated'
        )
        keyword_records.append(keyword)
    cluster.auto_generated_keywords.set(keyword_records)
    cluster.keyword_count = len(keyword_records)
    cluster.save()
```
5. **Generate Taxonomy Plan**
```python
taxonomy_plan = {
    'wp_categories': [],
    'wp_tags': [],
    'hierarchy': {}
}
attributes = list(blueprint.sagattribute_set.all())
# Assumption: tags hang off the first primary attribute; adjust if several are primary
primary_attr_name = next((a.name for a in attributes if a.is_primary), None)
for attribute in attributes:
    if attribute.is_primary:
        category = {
            'name': attribute.name,
            'slug': slugify(attribute.name),
            'description': f"Posts about {attribute.name}"
        }
        taxonomy_plan['wp_categories'].append(category)
    else:
        # One tag per attribute value
        for v in attribute.values:
            tag = {
                'name': v,
                'slug': slugify(v),
                'parent_category': primary_attr_name
            }
            taxonomy_plan['wp_tags'].append(tag)
blueprint.taxonomy_plan = taxonomy_plan  # JSON field
```
6. **Generate Execution Priority (Phased Approach)**
```python
execution_priority = {
    'phase': 'phase_1_hubs',
    'content_sequence': []
}
# Phase 1: Hub pages (1 per cluster)
hub_items = []
for cluster in blueprint.sagcluster_set.filter(status='draft'):
    hub_items.append({
        'type': 'hub_page',
        'cluster_id': cluster.id,
        'title': cluster.hub_title,
        'priority': 1,
        'estimated_effort': 'high',
        'SEO_impact': 'critical'
    })
execution_priority['content_sequence'].extend(hub_items)
# Phase 2: Supporting content (5-8 articles per cluster)
supporting_items = []
for cluster in blueprint.sagcluster_set.filter(status='draft'):
    for content_title in cluster.supporting_content_plan:
        supporting_items.append({
            'type': 'supporting_article',
            'cluster_id': cluster.id,
            'parent_hub': cluster.hub_title,
            'title': content_title,
            'priority': 2,
            'estimated_effort': 'medium',
            'SEO_impact': 'supporting'
        })
execution_priority['content_sequence'].extend(supporting_items)
# Phase 3: Term/pillar pages (keywords + long-tail)
term_items = []
for cluster in blueprint.sagcluster_set.filter(status='draft'):
    for keyword in cluster.auto_generated_keywords.all():
        term_items.append({
            'type': 'term_page',
            'cluster_id': cluster.id,
            'keyword': keyword.keyword_text,
            'priority': 3,
            'estimated_effort': 'low',
            'SEO_impact': 'supportive'
        })
execution_priority['content_sequence'].extend(term_items)
blueprint.execution_priority = execution_priority  # JSON field
```
7. **Populate Denormalized JSON Fields**
```python
blueprint.attributes_json = {
    'total_attributes': blueprint.sagattribute_set.count(),
    'summary': [
        {
            'name': attr.name,
            'value_count': len(attr.values),
            'values': attr.values,
            'is_primary': attr.is_primary
        }
        for attr in blueprint.sagattribute_set.all()
    ]
}
blueprint.clusters_json = {
    'total_clusters': blueprint.sagcluster_set.count(),
    'summary': [
        {
            'id': cluster.cluster_key,
            'title': cluster.title,
            'type': cluster.cluster_type,
            'keyword_count': cluster.keyword_count,
            'viability_score': cluster.viability_score
        }
        for cluster in blueprint.sagcluster_set.all()
    ]
}
blueprint.save()
```
8. **Return Blueprint ID & Status**
```python
return {
'blueprint_id': blueprint.id,
'status': 'draft',
'created_at': blueprint.created_at,
'summary': {
'total_attributes': blueprint.sagattribute_set.count(),
'total_clusters': blueprint.sagcluster_set.count(),
'total_keywords': SAGKeyword.objects.filter(cluster__blueprint=blueprint).count(),
'next_step': 'review blueprint in 01E (Pipeline Configuration)'
}
}
```
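The three-phase ordering from step 6 can be exercised without the ORM. The following is a minimal sketch over plain dictionaries; `build_content_sequence` and the dictionary keys are assumptions chosen to mirror the cluster fields used in the steps.

```python
def build_content_sequence(clusters):
    """Order work items: all hub pages, then supporting articles, then term pages."""
    hubs, supporting, terms = [], [], []
    for cluster in clusters:
        hubs.append({
            "type": "hub_page",
            "cluster_id": cluster["id"],
            "title": cluster["hub_title"],
            "priority": 1,
        })
        for title in cluster["supporting_content_plan"]:
            supporting.append({
                "type": "supporting_article",
                "cluster_id": cluster["id"],
                "parent_hub": cluster["hub_title"],
                "title": title,
                "priority": 2,
            })
        for keyword in cluster["keywords"]:
            terms.append({
                "type": "term_page",
                "cluster_id": cluster["id"],
                "keyword": keyword,
                "priority": 3,
            })
    # Concatenating the phase lists keeps priorities in non-decreasing order
    return hubs + supporting + terms

clusters = [{
    "id": "cluster_001",
    "hub_title": "Best Arthritis Treatments for Dogs",
    "supporting_content_plan": ["Dog Arthritis Medications: Complete Guide"],
    "keywords": ["dog arthritis remedies"],
}]
sequence = build_content_sequence(clusters)
print([item["type"] for item in sequence])
# prints: ['hub_page', 'supporting_article', 'term_page']
```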
---
### 2.4 Manual Keyword Supplementation (User Interface)
#### Feature: Add Keywords from Multiple Sources
1. **IGNY8 Library Integration**
- Users browse pre-curated keyword library per site_type
- Select keywords → auto-map to clusters by attribute match
- Unmatched keywords → flagged for review
2. **Manual Entry**
- Form field: paste or type keywords (comma-separated)
- System deduplicates against existing
- Prompts user to assign to cluster(s)
3. **CSV Import**
- Upload CSV with columns: keyword, search_volume (optional), difficulty (optional)
- Preview & validate before import
- Bulk assign to clusters or mark for review
4. **Keyword API Integration** (optional in Phase 1)
- Connect to SEMrush, Ahrefs, or similar
- Fetch keyword suggestions for cluster dimensions
- User approves additions
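
The CSV import path (validate, deduplicate, then hand off for cluster assignment) could be sketched with the standard library as below. The function name `parse_keyword_csv` and the choice to reject rows with a non-numeric `search_volume` are assumptions; the column names come from the list above.

```python
import csv
import io

def parse_keyword_csv(text, existing=()):
    """Split CSV rows into importable keywords and rejected rows (hypothetical helper)."""
    seen = {k.strip().lower() for k in existing}
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        keyword = (row.get("keyword") or "").strip().lower()
        volume = (row.get("search_volume") or "").strip()
        # Reject empty keywords, duplicates, and non-numeric search volumes
        if not keyword or keyword in seen or (volume and not volume.isdigit()):
            rejected.append(row)
            continue
        seen.add(keyword)
        accepted.append({
            "keyword": keyword,
            "search_volume": int(volume) if volume else None,
            "difficulty": (row.get("difficulty") or "").strip(),
        })
    return accepted, rejected

sample = (
    "keyword,search_volume,difficulty\n"
    "joint pain dogs,800,easy\n"
    "Joint Pain Dogs,,\n"
    "cat allergies,abc,\n"
)
accepted, rejected = parse_keyword_csv(sample)
print(len(accepted), len(rejected))  # 1 2
```

Returning the rejected rows alongside the accepted ones supports the "preview & validate before import" step, since the UI can show the user exactly which rows were dropped.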
#### Keyword Mapping Logic
```python
def map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):
    matches = []
    for cluster in clusters:
        # Extract all attribute values from cluster dimensions
        cluster_attrs = EXTRACT_ATTRIBUTES(cluster.dimensions)
        # Calculate semantic similarity
        similarity = CALCULATE_SIMILARITY(new_keyword, cluster_attrs)
        if similarity > threshold:
            matches.append({
                'cluster_id': cluster.id,
                'cluster_title': cluster.title,
                'similarity_score': similarity
            })
    return matches  # May be 0, 1, or multiple matches
```
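The document does not specify how `CALCULATE_SIMILARITY` works. As a stand-in for experimentation, the sketch below uses simple token overlap; this is a swapped-in technique, not the intended semantic similarity, and `token_overlap_similarity` is a hypothetical name.

```python
def token_overlap_similarity(keyword, cluster_values):
    """Fraction of cluster attribute values that are mentioned in the keyword."""
    if not cluster_values:
        return 0.0
    keyword_tokens = set(keyword.lower().split())
    hits = 0
    for value in cluster_values:
        value_tokens = set(value.lower().split())
        # Naive plural handling: "dogs" also matches "dog"
        value_tokens |= {token.rstrip("s") for token in value_tokens}
        if value_tokens & keyword_tokens:
            hits += 1
    return hits / len(cluster_values)

print(token_overlap_similarity("dog arthritis relief", ["Dogs", "Arthritis"]))  # 1.0
print(token_overlap_similarity("dog arthritis relief", ["Cats", "Arthritis"]))  # 0.5
```

With the default threshold of 0.70, the keyword would match the Dogs cluster and not the Cats cluster. An embedding-based score would handle synonyms ("canine", "joint pain") that token overlap misses.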
#### Conflict Resolution: Multi-Cluster Keyword Assignment
**Problem:** A keyword fits multiple clusters (e.g., "arthritis relief for pets" fits both Dog Cluster and Cat Cluster)
**Resolution Algorithm:**
1. **Identify Multi-Fit Keywords**
```python
potential_conflicts = []
for new_keyword in keywords_to_add:
    matching_clusters = map_keyword_to_clusters(new_keyword, all_clusters)
    if len(matching_clusters) > 1:
        potential_conflicts.append({
            'keyword': new_keyword,
            'matching_clusters': matching_clusters
        })
```
2. **Apply Decision Criteria (in order)**
- **Criterion 1: Dimensional Intersection Count**
- Assign to cluster with MOST dimensional intersections
- Example: "dog arthritis relief" → Dog cluster has 3 dimensions (pet type, condition, audience); Cat cluster has 2 → assign to Dog cluster
- **Criterion 2: Specificity**
- If tied on intersection count, assign to MORE SPECIFIC cluster
- Example: "arthritis relief" (general) vs "dog arthritis relief" (specific) → assign to Dog cluster
- **Criterion 3: Primary User Intent Match**
- If still tied, assign to cluster whose hub_title best matches user intent
- Example: Both Dog & Cat clusters have "arthritis relief" hub; Dog hub is "Best Arthritis Treatments for Dogs" → assign to Dog
- **Criterion 4: Last Resort - Create New Cluster**
- If keyword doesn't fit any cluster well, flag as "potential_new_cluster"
- User reviews and decides: split existing cluster, merge, or create new
3. **Implementation**
```python
def resolve_keyword_conflict(keyword, matching_clusters):
    # Step 1: Compare intersection depth; a unique maximum wins outright
    top_depth = max(c.intersection_depth for c in matching_clusters)
    candidates = [c for c in matching_clusters if c.intersection_depth == top_depth]
    if len(candidates) == 1:
        return candidates[0]
    # Step 2: Compare specificity among the clusters still tied
    specificity_scores = [CALC_SPECIFICITY(c, keyword) for c in candidates]
    best = max(specificity_scores)
    candidates = [c for c, s in zip(candidates, specificity_scores) if s == best]
    if len(candidates) == 1:
        return candidates[0]
    # Step 3: Compare intent match among the clusters still tied
    intent_scores = [CALC_INTENT_MATCH(c.hub_title, keyword) for c in candidates]
    best = max(intent_scores)
    candidates = [c for c, s in zip(candidates, intent_scores) if s == best]
    if len(candidates) == 1:
        return candidates[0]
    # Step 4: Flag for user review
    return {
        'status': 'flagged_for_review',
        'keyword': keyword,
        'candidates': matching_clusters,
        'reason': 'ambiguous_assignment'
    }
```
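Because the criteria apply in strict order, the same decision can be expressed as a lexicographic comparison of score tuples. The sketch below assumes the three scores have already been computed and stored on each candidate; `pick_cluster` and the `specificity` and `intent_match` keys are hypothetical.

```python
def pick_cluster(candidates):
    """Return the winning candidate, or None when the top two are fully tied.

    Each candidate is a dict with precomputed intersection_depth,
    specificity, and intent_match values (an assumption of this sketch).
    """
    def rank(candidate):
        # Tuples compare element by element, which mirrors criteria 1 to 3
        return (
            candidate["intersection_depth"],
            candidate["specificity"],
            candidate["intent_match"],
        )

    ordered = sorted(candidates, key=rank, reverse=True)
    if not ordered:
        return None
    if len(ordered) > 1 and rank(ordered[0]) == rank(ordered[1]):
        return None  # ambiguous: flag for user review
    return ordered[0]

dog = {"cluster_id": "dog", "intersection_depth": 3, "specificity": 0.5, "intent_match": 0.5}
cat = {"cluster_id": "cat", "intersection_depth": 2, "specificity": 0.9, "intent_match": 0.9}
print(pick_cluster([cat, dog])["cluster_id"])  # dog: depth outranks the later criteria
```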
---
## 3. Data Models / APIs
### 3.1 Database Models (Django ORM)
#### SAGBlueprint (existing from 01A, extended)
```python
from django.conf import settings
from django.db import models

# Inherits account, created_at, updated_at from AccountBaseModel
# (import AccountBaseModel from its module in the codebase)
class SAGBlueprint(AccountBaseModel):
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('cluster_formation_complete', 'Cluster Formation Complete'),
        ('keyword_generation_complete', 'Keyword Generation Complete'),
        ('keyword_supplemented', 'Keywords Supplemented'),
        ('ready_for_pipeline', 'Ready for Pipeline'),
        ('published', 'Published'),
    )
    site = models.ForeignKey('igny8_core_auth.Site', on_delete=models.CASCADE)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    phase = models.CharField(max_length=50, default='phase_1_foundation')
    sector = models.ForeignKey('igny8_core_auth.Sector', on_delete=models.CASCADE)
    # Denormalized JSON for fast access
    attributes_json = models.JSONField(default=dict, blank=True)
    clusters_json = models.JSONField(default=dict, blank=True)
    taxonomy_plan = models.JSONField(default=dict, blank=True)
    execution_priority = models.JSONField(default=dict, blank=True)
    created_by = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.SET_NULL, null=True)
    # created_at, updated_at inherited from AccountBaseModel

    class Meta:
        db_table = 'sag_blueprint'
        ordering = ['-created_at']
```
#### SAGAttribute (existing from 01A, no changes required)
```python
# Inherits account, created_at, updated_at from AccountBaseModel
class SAGAttribute(AccountBaseModel):
    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    values = models.JSONField()  # array of strings
    is_primary = models.BooleanField(default=False)
    source = models.CharField(max_length=50)  # 'user_input', 'template', 'api'
    # created_at, updated_at inherited from AccountBaseModel

    class Meta:
        db_table = 'sag_attribute'
        unique_together = ('blueprint', 'name')
```
#### SAGCluster (existing from 01A, extended)
```python
# Inherits account, created_at, updated_at from AccountBaseModel
class SAGCluster(AccountBaseModel):
    TYPE_CHOICES = (
        ('product_category', 'Product/Service Category'),
        ('condition_problem', 'Condition/Problem'),
        ('feature', 'Feature'),
        ('brand', 'Brand'),
        ('informational', 'Informational'),
        ('comparison', 'Comparison'),
        ('life_stage', 'Life Stage/Audience'),
    )
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('validated', 'Validated'),
        ('keyword_assigned', 'Keywords Assigned'),
        ('content_created', 'Content Created'),
    )
    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    cluster_key = models.CharField(max_length=100)  # unique ID from cluster formation
    title = models.CharField(max_length=255)
    description = models.TextField(blank=True)
    cluster_type = models.CharField(max_length=50, choices=TYPE_CHOICES)
    dimensions = models.JSONField()  # ["dimension1", "dimension2", ...]
    intersection_depth = models.IntegerField()  # count of intersecting dimensions
    viability_score = models.FloatField()  # 0-1
    hub_title = models.CharField(max_length=255)
    supporting_content_plan = models.JSONField()  # array of content titles
    auto_generated_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_auto',
        blank=True
    )
    supplemented_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_supplemented',
        blank=True
    )
    keyword_count = models.IntegerField(default=0)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    # created_at, updated_at inherited from AccountBaseModel

    class Meta:
        db_table = 'sag_cluster'
        unique_together = ('blueprint', 'cluster_key')
        ordering = ['-viability_score']
```
#### SAGKeyword (new)
```python
# Inherits account, created_at, updated_at from AccountBaseModel
class SAGKeyword(AccountBaseModel):
    INTENT_CHOICES = (
        ('informational', 'Informational'),
        ('transactional', 'Transactional'),
        ('navigational', 'Navigational'),
        ('commercial', 'Commercial Intent'),
    )
    VARIANT_TYPES = (
        ('base', 'Base Keyword'),
        ('long_tail', 'Long-tail Variant'),
        ('brand', 'Brand Variant'),
        ('comparison', 'Comparison'),
        ('review', 'Review'),
        ('how_to', 'How-to'),
    )
    SOURCE_CHOICES = (
        ('auto_generated', 'Auto-Generated'),
        ('manual_entry', 'Manual Entry'),
        ('csv_import', 'CSV Import'),
        ('api_fetch', 'API Fetch'),
        ('library', 'IGNY8 Library'),
    )
    cluster = models.ForeignKey(
        SAGCluster,
        on_delete=models.CASCADE,
        related_name='all_keywords'
    )
    keyword_text = models.CharField(max_length=255)
    search_volume = models.IntegerField(null=True, blank=True)
    difficulty = models.CharField(max_length=50, blank=True)  # 'easy', 'medium', 'hard'
    intent = models.CharField(max_length=50, choices=INTENT_CHOICES)
    generated_from = models.CharField(max_length=100, blank=True)  # template ID or source
    variant_type = models.CharField(max_length=50, choices=VARIANT_TYPES)
    source = models.CharField(max_length=50, choices=SOURCE_CHOICES)
    cpc = models.FloatField(null=True, blank=True)  # if available from API
    competition = models.CharField(max_length=50, blank=True)  # 'low', 'medium', 'high'
    # created_at, updated_at inherited from AccountBaseModel

    class Meta:
        db_table = 'sag_keyword'
        unique_together = ('cluster', 'keyword_text')
        ordering = ['-search_volume']
```
---
### 3.2 API Endpoints
#### POST /api/v1/blueprints/{blueprint_id}/clusters/form/
**Purpose:** Trigger cluster formation AI function
**Authentication:** Required (JWT)
**Input:**
```json
{
"populated_attributes": [
{"name": "Pet Type", "values": ["Dogs", "Cats"]},
{"name": "Health Condition", "values": ["Allergies", "Arthritis"]}
],
"max_clusters": 50
}
```
**Output:**
```json
{
"clusters": [...],
"summary": {
"total_clusters_formed": 12,
"type_distribution": {...}
},
"status": "success"
}
```
**Error Cases:**
- 400: Invalid attributes structure
- 403: Forbidden (requester does not own the blueprint)
- 422: Insufficient attributes for cluster formation (< 2 dimensions)
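
The 400 and 422 cases can be separated by a small validation step before the AI function runs. The sketch below is framework-agnostic and illustrative: `validate_form_request` is a hypothetical name, and treating attributes with an empty `values` list as unpopulated is an assumption. The 403 ownership check belongs to the permission layer and is not covered here.

```python
def validate_form_request(payload):
    """Return (status_code, message) for a cluster-formation request body."""
    attributes = payload.get("populated_attributes")
    if not isinstance(attributes, list):
        return 400, "populated_attributes must be a list"
    for attr in attributes:
        well_formed = (
            isinstance(attr, dict)
            and isinstance(attr.get("name"), str)
            and isinstance(attr.get("values"), list)
        )
        if not well_formed:
            return 400, "each attribute needs a string name and a list of values"
    # Structure is valid; now check there is enough to intersect
    populated = [attr for attr in attributes if attr["values"]]
    if len(populated) < 2:
        return 422, "at least 2 populated attributes are required"
    return 200, "ok"

print(validate_form_request({"populated_attributes": [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis"]},
]}))
# prints: (200, 'ok')
```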
---
#### POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
**Purpose:** Trigger keyword generation AI function
**Authentication:** Required
**Input:**
```json
{
"use_cluster_ids": ["cluster_001", "cluster_002"],
"target_keywords_per_cluster": 15,
"include_long_tail_variants": true
}
```
**Output:**
```json
{
"keywords_per_cluster": {...},
"deduplication": {
"duplicates_removed": 5
},
"summary": {
"total_unique_keywords": 180,
"within_constraints": true
}
}
```
---
#### POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
**Purpose:** Add manual, CSV, library, or API-sourced keywords
**Authentication:** Required
**Input (Multiple Scenarios):**
**Scenario 1: Manual Entry**
```json
{
"source": "manual_entry",
"keywords": ["arthritis relief dogs", "joint pain dogs"],
"cluster_id": "cluster_001"
}
```
**Scenario 2: CSV Import**
```json
{
"source": "csv_import",
"csv_url": "https://example.com/keywords.csv",
"auto_cluster": true
}
```
**Scenario 3: Library Selection**
```json
{
"source": "library",
"library_keyword_ids": [123, 456, 789],
"auto_cluster": true
}
```
**Output:**
```json
{
"added_keywords": 10,
"auto_clustered": 9,
"flagged_for_review": 1,
"conflicts_resolved": {
"reassigned": 2,
"deferred": 1
}
}
```
---
#### POST /api/v1/blueprints/{blueprint_id}/assemble/
**Purpose:** Trigger blueprint assembly (create final SAGBlueprint with all records)
**Authentication:** Required
**Input:**
```json
{
"finalize_keyword_review": true,
"set_status": "ready_for_pipeline"
}
```
**Output:**
```json
{
"blueprint_id": 42,
"status": "ready_for_pipeline",
"summary": {
"total_attributes": 4,
"total_clusters": 12,
"total_keywords": 180,
"execution_priority_phases": 3
}
}
```
---
#### GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft&type=product_category
**Purpose:** List clusters with filtering
**Query Params:**
- `status`: draft, validated, keyword_assigned, content_created
- `type`: product_category, condition_problem, feature, brand, informational, comparison
- `min_viability`: 0.70
- `limit`: 50, `offset`: 0
**Output:**
```json
{
"results": [
{
"id": 1,
"cluster_key": "cluster_001",
"title": "Dog Arthritis Relief Solutions",
"hub_title": "Best Arthritis Treatments for Dogs",
"keyword_count": 15,
"viability_score": 0.92,
"type": "product_category"
}
],
"total_count": 12,
"total_keywords": 180
}
```
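
The filtering and pagination behind this endpoint might look roughly like the sketch below. It works on plain dicts rather than a queryset; the `filter_clusters` name, the presence of a `status` key on each cluster, and the choice to compute `total_count` and `total_keywords` over the filtered set (before pagination) are assumptions made for illustration.

```python
from typing import Optional


def filter_clusters(
    clusters: list[dict],
    status: Optional[str] = None,
    type_: Optional[str] = None,
    min_viability: float = 0.0,
    limit: int = 50,
    offset: int = 0,
) -> dict:
    """Apply the documented query params and return a paginated response body."""
    matched = [
        c
        for c in clusters
        if (status is None or c.get("status") == status)
        and (type_ is None or c.get("type") == type_)
        and c.get("viability_score", 0.0) >= min_viability
    ]
    return {
        "results": matched[offset:offset + limit],
        # Assumption: totals describe the filtered set, not just the current page.
        "total_count": len(matched),
        "total_keywords": sum(c.get("keyword_count", 0) for c in matched),
    }
```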
---
#### GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=cluster_001&source=auto_generated
**Purpose:** List a blueprint's keywords, optionally filtered by cluster
**Query Params:**
- `cluster_id`: filter by cluster
- `source`: auto_generated, manual_entry, csv_import, api_fetch, library
- `intent`: informational, transactional, navigational
- `min_search_volume`: 100
- `order_by`: search_volume (DESC), difficulty, intent
**Output:**
```json
{
"results": [
{
"id": 1,
"keyword_text": "best arthritis treatment for dogs",
"search_volume": 1200,
"difficulty": "medium",
"intent": "informational",
"variant_type": "long_tail",
"source": "auto_generated"
}
],
"total_count": 15
}
```
---
#### DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
**Purpose:** Remove a keyword (before assembly)
**Authentication:** Required
**Status:** Only available while `blueprint.status` is `draft` or `keyword_generation_complete`

---
## 4. Implementation Steps
### Phase 1: AI Functions Development (Week 1-2)
#### Step 1.1: Set up cluster_formation.py structure
- [ ] Create `sag/ai_functions/cluster_formation.py`
- [ ] Define input/output contracts
- [ ] Implement intersection generation logic (2-value, 3-value)
- [ ] Stub out AI evaluation function (ready for Claude integration)
- [ ] Implement constraint filtering & sorting
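
The intersection generation step in Step 1.1 can be sketched with the standard library. This is one possible reading, not the final algorithm: it assumes each intersection takes one value from each of `depth` different attributes (same-attribute pairs are excluded), and the name `generate_intersections` is hypothetical.

```python
from itertools import combinations, product


def generate_intersections(attributes: list[dict], depth: int) -> list[tuple]:
    """Return every combination of `depth` values, each drawn from a different attribute.

    Each intersection is a tuple of (attribute_name, value) pairs.
    """
    populated = [a for a in attributes if a["values"]]
    intersections = []
    for attr_group in combinations(populated, depth):
        # Tag each value with its attribute name so the origin stays traceable.
        value_lists = [[(a["name"], v) for v in a["values"]] for a in attr_group]
        intersections.extend(product(*value_lists))
    return intersections
```

Calling it with `depth=2` and then `depth=3` yields the two candidate pools that the AI evaluation step would score.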
#### Step 1.2: Implement cluster formation AI logic
- [ ] Integrate Claude AI API for cluster viability evaluation
- Real topical ecosystem check
- User search demand validation
- Content support assessment
- Differentiation evaluation
- [ ] Implement cluster type classification (using embeddings or rule-based logic)
- [ ] Implement hub title & supporting content plan generation
- [ ] Add viability scoring (0-1 scale)
- [ ] Implement distribution validation
#### Step 1.3: Unit tests for cluster formation
- [ ] Test intersection generation (2-value, 3-value)
- [ ] Test AI evaluation with mock responses
- [ ] Test constraint filtering (max 50 clusters)
- [ ] Test type distribution analysis
- [ ] Test handling of edge cases (0 intersections, all rejected, etc.)
#### Step 1.4: Create keyword_generation.py structure
- [ ] Create `sag/ai_functions/keyword_generation.py`
- [ ] Define input/output contracts
- [ ] Implement template substitution logic
- [ ] Implement long-tail variant generation
- [ ] Implement deduplication logic
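
Template substitution from Step 1.4 could be sketched as below, using the `{placeholder}` template style shown in Section 6.2. The function name `substitute_templates` and the lowercasing of the output are assumptions; long-tail variant generation is left out because its exact rules are not specified here.

```python
from string import Formatter


def substitute_templates(templates: list[str], values: dict[str, str]) -> list[str]:
    """Fill each template with attribute values, skipping templates with missing fields."""
    keywords = []
    for template in templates:
        # Collect the placeholder names the template requires.
        fields = {name for _, name, _, _ in Formatter().parse(template) if name}
        if not fields.issubset(values):
            continue  # a required attribute value is absent, so skip this template
        # Assumption: keywords are stored lowercased.
        keywords.append(template.format(**values).lower())
    return keywords
```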
#### Step 1.5: Implement keyword generation AI logic
- [ ] Integrate template loading from SectorAttributeTemplate (01B)
- [ ] Implement keyword enrichment (search volume, difficulty, intent)
- [ ] Implement filtering & sorting by search volume
- [ ] Implement constraint validation (10-25 per cluster, 300-500 total)
- [ ] Implement global deduplication & conflict resolution
#### Step 1.6: Unit tests for keyword generation
- [ ] Test template substitution with various attribute combinations
- [ ] Test long-tail variant generation
- [ ] Test deduplication across clusters
- [ ] Test constraint validation
- [ ] Test conflict resolution (multi-cluster keywords)
---
### Phase 2: Data Models & Service Layer (Week 2-3)
#### Step 2.1: Database migrations
- [ ] Create SAGKeyword model
- [ ] Add ManyToMany relations to SAGCluster (auto_generated_keywords, supplemented_keywords)
- [ ] Extend SAGBlueprint with denormalized JSON fields (attributes_json, clusters_json, taxonomy_plan, execution_priority)
- [ ] Extend SAGCluster with cluster_key, type, intersection_depth, viability_score, hub_title, supporting_content_plan
- [ ] Run and test migrations on dev database
#### Step 2.2: Implement blueprint_service.py
- [ ] Create `sag/services/blueprint_service.py`
- [ ] Implement assemble_blueprint() function with 8 steps
- [ ] Implement SAGBlueprint creation & status management
- [ ] Implement SAGAttribute creation from user input
- [ ] Implement SAGCluster creation from cluster formation results
- [ ] Implement SAGKeyword creation & assignment
- [ ] Implement taxonomy_plan generation
- [ ] Implement execution_priority generation
- [ ] Implement denormalized JSON population
#### Step 2.3: Unit tests for blueprint_service
- [ ] Test blueprint creation & status transitions
- [ ] Test attribute record creation
- [ ] Test cluster record creation with all fields
- [ ] Test keyword assignment to clusters
- [ ] Test taxonomy plan generation
- [ ] Test execution priority generation
- [ ] Test denormalized JSON accuracy
---
### Phase 3: API Endpoints & Integration (Week 3-4)
#### Step 3.1: Implement cluster formation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/clusters/form/
- [ ] Validate input attributes
- [ ] Call cluster_formation() AI function
- [ ] Return results with summary
- [ ] Error handling (400, 403, 422)
#### Step 3.2: Implement keyword generation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
- [ ] Validate input & cluster availability
- [ ] Call keyword_generation() AI function
- [ ] Return results with deduplication summary
- [ ] Error handling
#### Step 3.3: Implement keyword supplementation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
- [ ] Support multiple input sources (manual, CSV, library, API)
- [ ] Implement auto-clustering via map_keyword_to_clusters()
- [ ] Implement conflict resolution via resolve_keyword_conflict()
- [ ] Return summary of added, clustered, flagged keywords
#### Step 3.4: Implement blueprint assembly API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/assemble/
- [ ] Call blueprint_service.assemble_blueprint()
- [ ] Manage status transitions
- [ ] Return blueprint summary with next steps
#### Step 3.5: Implement read endpoints
- [ ] Create GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft
- [ ] Create GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=...
- [ ] Implement filtering & pagination
- [ ] Add ordering options
#### Step 3.6: Implement keyword removal endpoint
- [ ] Create DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
- [ ] Validate blueprint status (only `draft` or `keyword_generation_complete`, per the endpoint spec)
- [ ] Cascade delete as needed
---
### Phase 4: Integration with 01D & Testing (Week 4-5)
#### Step 4.1: Integrate with Setup Wizard (01D)
- [ ] Call cluster_formation() after user populates attributes
- [ ] Display clusters to user for review (optional: allow edits)
- [ ] Call keyword_generation() if user confirms clusters
- [ ] Display keywords for review
- [ ] Allow manual supplementation before final assembly
#### Step 4.2: End-to-end testing
- [ ] Test full flow: attributes → clusters → keywords → blueprint
- [ ] Test with various sector/site_type combinations
- [ ] Test constraint enforcement
- [ ] Test conflict resolution with real scenarios
- [ ] Performance test with large attribute sets (100+ values)
#### Step 4.3: Integration with 01E (Pipeline Configuration)
- [ ] Verify blueprint is available to pipeline service
- [ ] Test taxonomy plan usage in content generation
- [ ] Test execution_priority ordering in pipeline
---
## 5. Acceptance Criteria
### Cluster Formation AI Function (01C-CF)
- [ ] **CF-1:** Generates all 2-value intersections from populated attributes
- [ ] **CF-2:** Generates relevant 3-value intersections (at least 50% of possible combinations)
- [ ] **CF-3:** AI evaluates each intersection on 5 decision criteria (ecosystem, demand, content support, differentiation, clarity)
- [ ] **CF-4:** Classification assigns correct cluster type (product_category, condition_problem, feature, brand, informational, comparison)
- [ ] **CF-5:** Hub titles are specific, actionable, and 5-12 words long
- [ ] **CF-6:** Supporting content plans contain 5-8 titles, semantically related to hub, covering different angles
- [ ] **CF-7:** Viability scores accurately reflect cluster strength (0-1 scale, with clear rationale)
- [ ] **CF-8:** Hard constraint enforced: max 50 clusters per sector, sorted by viability score
- [ ] **CF-9:** Type distribution meets targets: Product/Service 40-50%, Condition/Problem 20-30%, Feature 10-15%, Brand 5-10%, Life Stage 5-10%
- [ ] **CF-10:** Clusters have 3+ dimensional intersections for strong coherence
- [ ] **CF-11:** No duplicative clusters (semantic coherence check prevents near-duplicates like "Dog Joint Health" + "Dog Arthritis")
- [ ] **CF-12:** API response includes summary with cluster count, type distribution, avg intersection depth
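
CF-8 and CF-12 together suggest a post-processing step like the sketch below: keep the top clusters by viability, then summarize what survived. The function name `cap_and_summarize` and the `avg_intersection_depth` key are assumptions; `viability_score`, `type`, and `intersection_depth` follow the SAGCluster fields listed in Step 2.1.

```python
from collections import Counter

MAX_CLUSTERS = 50  # hard cap from CF-8


def cap_and_summarize(clusters: list[dict], max_clusters: int = MAX_CLUSTERS) -> dict:
    """Keep the highest-viability clusters and build the response summary."""
    kept = sorted(clusters, key=lambda c: c["viability_score"], reverse=True)[:max_clusters]
    total = len(kept)
    counts = Counter(c["type"] for c in kept)
    # Share of each cluster type, as a fraction of the kept clusters.
    distribution = {t: round(n / total, 2) for t, n in counts.items()} if total else {}
    avg_depth = sum(c["intersection_depth"] for c in kept) / total if total else 0.0
    return {
        "clusters": kept,
        "summary": {
            "total_clusters_formed": total,
            "type_distribution": distribution,
            "avg_intersection_depth": avg_depth,
        },
    }
```

Comparing `type_distribution` against the CF-9 target ranges is left to the caller in this sketch.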
### Keyword Generation AI Function (01C-KG)
- [ ] **KG-1:** Loads keyword templates from SectorAttributeTemplate for correct site_type
- [ ] **KG-2:** Substitutes attribute values into templates to generate base keywords
- [ ] **KG-3:** Generates long-tail variants (best, review, vs, for, how to) for each base keyword
- [ ] **KG-4:** Deduplicates keywords across all clusters (no keyword appears twice)
- [ ] **KG-5:** Global deduplication identifies multi-cluster keywords and reassigns via conflict resolution
- [ ] **KG-6:** Per-cluster keyword count: 10-25 keywords (soft target 15)
- [ ] **KG-7:** Total keyword count: 300-500+ for site (configurable per sector)
- [ ] **KG-8:** Keywords enriched with search volume, difficulty, intent classification
- [ ] **KG-9:** API response includes per-cluster breakdown, deduplication summary, total keyword count
- [ ] **KG-10:** Handles missing attribute values gracefully (skips template if required attrs not present)
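
The count constraints in KG-6 and KG-7 could be checked with a helper like this one. It is a sketch under assumptions: the name `validate_keyword_counts` is hypothetical, the input is assumed to be already deduplicated, and because KG-7 reads "300-500+", only the lower total bound is treated as a failure condition.

```python
def validate_keyword_counts(
    keywords_per_cluster: dict[str, list[str]],
    per_cluster: tuple[int, int] = (10, 25),
    min_total: int = 300,
) -> dict:
    """Report clusters outside the per-cluster range and whether the total is sufficient."""
    low, high = per_cluster
    out_of_range = {
        cluster_id: len(keywords)
        for cluster_id, keywords in keywords_per_cluster.items()
        if not low <= len(keywords) <= high
    }
    total = sum(len(keywords) for keywords in keywords_per_cluster.values())
    return {
        "total_keywords": total,
        "clusters_out_of_range": out_of_range,
        # Assumption: the upper total bound is soft, so only the minimum is enforced.
        "within_constraints": not out_of_range and total >= min_total,
    }
```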
### Keyword Conflict Resolution (01C-CR)
- [ ] **CR-1:** Identifies keywords matching multiple clusters (≥2 matches)
- [ ] **CR-2:** Decision Criterion 1: assigns to cluster with most dimensional intersections
- [ ] **CR-3:** Decision Criterion 2 (tiebreaker): assigns to more specific cluster
- [ ] **CR-4:** Decision Criterion 3 (tiebreaker): assigns by primary user intent match
- [ ] **CR-5:** Decision Criterion 4 (last resort): flags for user review with clear reasoning
- [ ] **CR-6:** Reassignment logic preserves keyword integrity (no loss, duplication, or orphaning)
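
The four-step cascade in CR-2 through CR-5 can be illustrated as follows. This is a sketch, not the `resolve_keyword_conflict()` implementation: the name `resolve_conflict_sketch`, the numeric `specificity` field, and the `primary_intent` field are hypothetical stand-ins for however the real models express those ideas.

```python
from typing import Optional


def resolve_conflict_sketch(keyword_intent: str, candidates: list[dict]) -> Optional[str]:
    """Return the winning cluster_key, or None when the keyword needs user review."""
    if not candidates:
        return None
    # Criterion 1: most dimensional intersections.
    top_depth = max(c["intersection_depth"] for c in candidates)
    remaining = [c for c in candidates if c["intersection_depth"] == top_depth]
    # Criterion 2 (tiebreaker): more specific cluster.
    if len(remaining) > 1:
        top_specificity = max(c["specificity"] for c in remaining)
        remaining = [c for c in remaining if c["specificity"] == top_specificity]
    # Criterion 3 (tiebreaker): primary user intent match.
    if len(remaining) > 1:
        intent_matches = [c for c in remaining if c["primary_intent"] == keyword_intent]
        if len(intent_matches) == 1:
            remaining = intent_matches
    # Criterion 4 (last resort): no single winner, so flag for review.
    return remaining[0]["cluster_key"] if len(remaining) == 1 else None
```

Returning `None` keeps the "flag for user review" decision explicit, so the caller can attach the reasoning required by CR-5.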
### Blueprint Assembly Service (01C-BA)
- [ ] **BA-1:** Creates SAGBlueprint record with status='draft'
- [ ] **BA-2:** Creates SAGAttribute records from populated attributes (preserves name, values, is_primary flag)
- [ ] **BA-3:** Creates SAGCluster records from cluster formation output (all fields populated)
- [ ] **BA-4:** Creates SAGKeyword records from keyword generation output (all fields preserved)
- [ ] **BA-5:** Associates keywords to clusters via ManyToMany relations
- [ ] **BA-6:** Generates taxonomy_plan with WP categories (primary attributes) and tags (secondary)
- [ ] **BA-7:** Generates execution_priority with 3 phases: hubs first, supporting articles, term pages
- [ ] **BA-8:** Populates denormalized JSON fields (attributes_json, clusters_json) for fast queries
- [ ] **BA-9:** Returns blueprint ID and summary (attribute count, cluster count, keyword count, next steps)
- [ ] **BA-10:** Status transitions correctly: draft → ready_for_pipeline (or intermediate statuses as needed)
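
BA-6 and BA-7 could be sketched as two small builders. The output shapes are assumptions, since this section does not fix the JSON layout: here attribute values (rather than attribute names) become terms, `supporting_content_plan` is assumed to be a list of titles, and the `categories`, `tags`, `phase`, `kind`, and `items` keys are illustrative.

```python
def build_taxonomy_plan(attributes: list[dict]) -> dict:
    """Split attribute values into WP categories (primary) and tags (secondary)."""
    plan: dict[str, list[str]] = {"categories": [], "tags": []}
    for attr in attributes:
        bucket = "categories" if attr.get("is_primary") else "tags"
        plan[bucket].extend(attr["values"])
    return plan


def build_execution_priority(clusters: list[dict], taxonomy_plan: dict) -> list[dict]:
    """Order work into three phases: hubs, supporting articles, term pages."""
    hubs = [c["hub_title"] for c in clusters]
    supporting = [title for c in clusters for title in c.get("supporting_content_plan", [])]
    terms = taxonomy_plan["categories"] + taxonomy_plan["tags"]
    return [
        {"phase": 1, "kind": "hubs", "items": hubs},
        {"phase": 2, "kind": "supporting_articles", "items": supporting},
        {"phase": 3, "kind": "term_pages", "items": terms},
    ]
```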
### Manual Keyword Supplementation (01C-MKS)
- [ ] **MKS-1:** Users can add keywords via: manual entry, CSV import, library selection, API fetch
- [ ] **MKS-2:** Manual entry accepts comma-separated keywords, validates against duplicates
- [ ] **MKS-3:** CSV import validates file structure (keyword, search_volume optional, difficulty optional)
- [ ] **MKS-4:** Library integration allows browsing & selection per site_type
- [ ] **MKS-5:** Auto-clustering maps new keywords to clusters via attribute similarity matching
- [ ] **MKS-6:** Unmatched keywords flagged for user review: gap analysis, potential new cluster, or outlier
- [ ] **MKS-7:** User can assign unmatched keywords to specific cluster or create new cluster
- [ ] **MKS-8:** API returns summary: added count, auto-clustered count, flagged count, conflicts resolved
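
The CSV validation described in MKS-3 might be implemented along these lines with the standard `csv` module. The sketch assumes a header row with a required `keyword` column and optional `search_volume` and `difficulty` columns; the function name `parse_keyword_csv`, the error message wording, and the lowercase normalization are illustrative choices.

```python
import csv
import io


def parse_keyword_csv(text: str) -> tuple[list[dict], list[str]]:
    """Parse CSV text into keyword rows plus a list of per-line validation errors."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "keyword" not in reader.fieldnames:
        raise ValueError("CSV must contain a 'keyword' column")
    rows, errors, seen = [], [], set()
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        keyword = (row.get("keyword") or "").strip().lower()
        volume = (row.get("search_volume") or "").strip()
        difficulty = (row.get("difficulty") or "").strip()
        if not keyword:
            errors.append(f"line {line_no}: empty keyword")
        elif keyword in seen:
            errors.append(f"line {line_no}: duplicate keyword '{keyword}'")
        elif volume and not volume.isdigit():
            errors.append(f"line {line_no}: search_volume must be a whole number")
        else:
            seen.add(keyword)
            rows.append({
                "keyword_text": keyword,
                "search_volume": int(volume) if volume else None,
                "difficulty": difficulty or None,
                "source": "csv_import",
            })
    return rows, errors
```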
### API Endpoints (01C-API)
- [ ] **API-1:** POST /api/v1/blueprints/{blueprint_id}/clusters/form/ returns 200 + cluster results
- [ ] **API-2:** POST /api/v1/blueprints/{blueprint_id}/keywords/generate/ returns 200 + keyword results
- [ ] **API-3:** POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ returns 200 + supplementation summary
- [ ] **API-4:** POST /api/v1/blueprints/{blueprint_id}/assemble/ returns 200 + blueprint summary
- [ ] **API-5:** GET /api/v1/blueprints/{blueprint_id}/clusters/ supports status, type, min_viability filters
- [ ] **API-6:** GET /api/v1/blueprints/{blueprint_id}/keywords/ supports cluster_id, source, intent, min_search_volume filters
- [ ] **API-7:** DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/ only works while blueprint status is `draft` or `keyword_generation_complete`
- [ ] **API-8:** Error handling: 400 (bad input), 403 (unauthorized), 404 (not found), 422 (unprocessable)
### Data Integrity (01C-DI)
- [ ] **DI-1:** No keyword appears in multiple clusters (enforced via unique_together in SAGKeyword)
- [ ] **DI-2:** Deleted clusters cascade-delete associated keywords (no orphaned keywords)
- [ ] **DI-3:** Deleted blueprints cascade-delete all attributes, clusters, keywords
- [ ] **DI-4:** Blueprint status transitions prevent invalid operations (e.g., can't supplement keywords on published blueprint)
- [ ] **DI-5:** Denormalized JSON fields stay in sync with normalized records (updated on every change)
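
DI-4 implies a status guard in front of mutating operations. A minimal sketch follows; the exception class, the `ALLOWED_STATUSES` table, and the operation names are hypothetical. The statuses for keyword deletion follow the DELETE endpoint spec, while the set for supplementation is an assumption.

```python
class InvalidBlueprintOperation(Exception):
    """Raised when an operation is not permitted in the blueprint's current status."""


ALLOWED_STATUSES = {
    "delete_keyword": {"draft", "keyword_generation_complete"},
    # Assumption: supplementation is allowed in the same pre-assembly statuses.
    "supplement_keywords": {"draft", "keyword_generation_complete"},
}


def ensure_operation_allowed(status: str, operation: str) -> None:
    """Raise if `operation` may not run while the blueprint is in `status`."""
    allowed = ALLOWED_STATUSES.get(operation, set())
    if status not in allowed:
        raise InvalidBlueprintOperation(
            f"'{operation}' is not allowed while blueprint status is '{status}'"
        )
```

Unknown operations are rejected by default, which errs on the side of protecting finalized blueprints.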
### Performance (01C-PERF)
- [ ] **PERF-1:** Cluster formation completes in <5 seconds for 100+ intersection combinations
- [ ] **PERF-2:** Keyword generation completes in <10 seconds for 50 clusters
- [ ] **PERF-3:** Blueprint assembly completes in <3 seconds (DB writes + JSON generation)
- [ ] **PERF-4:** GET endpoints with filters return results in <2 seconds
- [ ] **PERF-5:** CSV import (1000 keywords) completes in <15 seconds
---
## 6. Claude Code Instructions
### 6.1 Generating Cluster Formation Logic
**Prompt Template for Claude:**
```
Generate the cluster formation algorithm for an AI-powered content planning system.
Input:
- populated_attributes: List of attributes with values from user setup wizard
Example: [
{"name": "Pet Type", "values": ["Dogs", "Cats", "Birds"]},
{"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
]
- sector_context: Information about the sector (e.g., "pet health e-commerce")
Task:
1. Generate all meaningful 2-value intersections (Pet Type × Health Condition, Pet Type × Pet Type, etc.)
2. For each intersection, use Claude's reasoning to evaluate:
- Is this a real topical ecosystem? (do the dimensions naturally fit together?)
- Would users search for this? (assess search demand)
- Can we build 1 hub + 3-8 supporting articles?
- Is it differentiated from other clusters?
3. Classify valid clusters by type: product_category, condition_problem, feature, brand, informational, comparison
4. Generate a compelling hub title and 5-8 supporting content titles
5. Assign a viability score (0-1) based on coherence, search demand, content potential
Output:
- clusters: Array of cluster objects with all fields from the spec
- summary: Total clusters, type distribution, viability analysis
Constraints:
- Max 50 clusters per sector
- Minimum 3 dimensional intersections for strong clusters
- Quality over quantity: prefer 5 strong clusters over 15 weak ones
```
### 6.2 Generating Keyword Generation Logic
**Prompt Template for Claude:**
```
Generate keywords for content clusters using templates and AI-driven expansion.
Input:
- clusters: Array of clusters from cluster formation (with dimensions and hub title)
- keyword_templates: Pre-configured templates for site_type
Example: [
"best {health_condition} for {pet_type}",
"{pet_type} {health_condition} treatment",
"affordable {health_condition} relief for {pet_type}"
]
- sector_context: Site type (ecommerce, blog, saas, etc.)
Task:
1. Load keyword templates filtered by sector site_type
2. For each cluster:
- Extract dimension values
- Substitute values into matching templates
- Generate long-tail variants: best, review, vs, for, how to
- Enrich with search volume, difficulty, intent (informational, transactional, etc.)
3. Deduplicate globally across all clusters
4. Identify multi-cluster keywords and resolve conflicts via:
- Highest dimensional intersection count
- Most specific cluster (tiebreaker)
- Primary user intent match (tiebreaker)
5. Validate constraints: 10-25 per cluster, 300-500 total
Output:
- keywords_per_cluster: Keywords organized by cluster ID
- deduplication: Count of duplicates removed, conflicts flagged
- summary: Total unique keywords, per-cluster average, search volume total
Constraints:
- Do NOT generate more than 25 keywords per cluster
- Do NOT allow duplicates
- Prioritize high search volume keywords
- Ensure diversity: mix of base keywords and long-tail variants
```
### 6.3 Integrating with Setup Wizard (01D)
**Implementation Notes:**
1. After user completes attribute population in wizard:
- Call `POST /api/v1/blueprints/{blueprint_id}/clusters/form/`
- Display clusters to user (preview mode)
- Allow user to: review, edit (rename hub titles, remove clusters), or confirm
2. After user confirms clusters:
- Call `POST /api/v1/blueprints/{blueprint_id}/keywords/generate/`
- Display keywords grouped by cluster (preview mode)
- Allow user to: supplement keywords, remove outliers, or confirm
3. Before finalizing blueprint:
- Optionally allow manual keyword supplementation (CSV, library, manual entry)
- Call `POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/` for each source
- Resolve conflicts (auto or manual)
- Call `POST /api/v1/blueprints/{blueprint_id}/assemble/` to finalize
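
The call sequence above can be sketched as a single orchestration function. It is illustrative only: user review steps are omitted, the HTTP client is injected as a `post(path, payload)` callable so no particular library is assumed, `attributes_payload` stands in for the cluster formation request body, and each returned cluster is assumed to carry a `cluster_key`.

```python
from typing import Callable


def run_wizard_flow(
    blueprint_id: int,
    attributes_payload: dict,
    supplements: list[dict],
    post: Callable[[str, dict], dict],
) -> dict:
    """Run form -> generate -> supplement -> assemble and return the assembly response."""
    base = f"/api/v1/blueprints/{blueprint_id}"
    # Step 1: form clusters (the user's preview and confirmation are skipped here).
    formed = post(f"{base}/clusters/form/", attributes_payload)
    cluster_ids = [c["cluster_key"] for c in formed["clusters"]]
    # Step 2: generate keywords for the confirmed clusters.
    post(f"{base}/keywords/generate/", {
        "use_cluster_ids": cluster_ids,
        "target_keywords_per_cluster": 15,
        "include_long_tail_variants": True,
    })
    # Step 3: one supplement call per source, then finalize.
    for payload in supplements:
        post(f"{base}/keywords/supplement/", payload)
    return post(f"{base}/assemble/", {
        "finalize_keyword_review": True,
        "set_status": "ready_for_pipeline",
    })
```

Injecting `post` also makes the flow easy to exercise in tests with a fake client.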
### 6.4 Testing with Sample Data
**Test Case 1: Pet Health E-commerce Site**
```python
populated_attributes = [
{"name": "Pet Type", "values": ["Dogs", "Cats"]},
{"name": "Health Condition", "values": ["Arthritis", "Allergies", "Obesity"]},
{"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]}
]
sector_context = {
"sector_id": 1, # integer PK (BigAutoField)
"site_type": "ecommerce",
"sector_name": "Pet Health Products"
}
# Expected clusters:
# 1. Dog Arthritis Relief (product_category)
# 2. Cat Allergies Nutrition (product_category)
# 3. Senior Dog Joint Support (life_stage)
# ... etc.
```
**Test Case 2: Local Service (Veterinary Clinic)**
```python
populated_attributes = [
{"name": "Service Type", "values": ["Surgery", "Preventive Care", "Emergency"]},
{"name": "Pet Type", "values": ["Dogs", "Cats", "Exotic"]},
{"name": "Location", "values": ["Downtown", "Suburbs"]}
]
sector_context = {
"sector_id": 2, # integer PK (BigAutoField)
"site_type": "local_service",
"sector_name": "Veterinary Clinic"
}
# Expected clusters:
# 1. Emergency Dog Surgery Downtown (local_service + product_category)
# 2. Preventive Cat Care Suburbs (informational + local_service)
# ... etc.
```
---
## 7. Cross-Document References
### Upstream Dependencies
- **01A (SAG Master Data Models):** Provides SAGBlueprint, SAGAttribute, SAGCluster base models
- **01B (Sector Attribute Templates):** Provides attribute framework, keyword templates, site_type configurations
### Downstream Consumers
- **01D (Setup Wizard):** Triggers cluster formation & keyword generation after attribute population
- **01E (Blueprint-aware Pipeline):** Uses clusters, keywords, taxonomy_plan, execution_priority for content generation
- **01F (Existing Site Analysis):** May feed competitor/existing keywords into supplementation process
- **01G (Health Monitoring):** Tracks cluster completeness, keyword coverage, content generation progress against blueprint
---
## 8. Appendix: Algorithm Complexity & Performance Estimates
### Cluster Formation Complexity
- **Input:** N attributes with M average values each
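- **Intersections Generated:** O(N² · M²) for 2-value (attribute pairs × value pairs), O(N³ · M³) for 3-value; with N treated as a small constant these reduce to O(M²) and O(M³)
- **AI Evaluations:** One call per intersection, so O(N² · M²) or O(N³ · M³) function calls (largest cost)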
- **Time Estimate:** ~1-2 seconds per 100 intersections (depending on Claude API latency)
- **Bottleneck:** Claude API response time for viability evaluation
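
To size the AI evaluation workload before calling the API, the number of intersections can be computed directly from the value counts. This sketch assumes intersections combine values from different attributes only; `count_intersections` is a hypothetical helper name.

```python
from itertools import combinations
from math import prod


def count_intersections(value_counts: list[int], depth: int) -> int:
    """Count intersections that take one value from each of `depth` distinct attributes."""
    return sum(prod(group) for group in combinations(value_counts, depth))


# Test Case 1 from Section 6.4: attributes with 2, 3, and 2 values.
print(count_intersections([2, 3, 2], 2))  # 2*3 + 2*2 + 3*2 = 16
print(count_intersections([2, 3, 2], 3))  # 2*3*2 = 12
```

With N attributes of M values each, the same function gives C(N, 2) · M² pairs and C(N, 3) · M³ triples, which is where the growth in evaluation calls comes from.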
### Keyword Generation Complexity
- **Input:** C clusters, T keyword templates per cluster
- **Base Keywords:** O(C × T) (template substitution)
- **Long-tail Variants:** O(C × T × V) where V ≈ 7 (base + 6 variants)
- **Deduplication:** O(K log K) where K = total keywords (sort-based)
- **Time Estimate:** ~3-5 seconds for 300+ keywords
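
The sort-based deduplication behind the O(K log K) figure can be sketched as follows. The normalization rule (lowercase, collapsed whitespace) and the name `deduplicate_sorted` are assumptions; a set-based pass would also work and runs in O(K) on average, at the cost of losing the sorted order.

```python
def deduplicate_sorted(keywords: list[str]) -> tuple[list[str], int]:
    """Return (unique keywords in sorted order, number of duplicates removed)."""
    # Normalize so that case and spacing differences count as duplicates.
    normalized = sorted(" ".join(k.lower().split()) for k in keywords)
    unique: list[str] = []
    for keyword in normalized:
        # After sorting, duplicates are adjacent, so compare with the last kept item.
        if not unique or unique[-1] != keyword:
            unique.append(keyword)
    return unique, len(normalized) - len(unique)
```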
### Blueprint Assembly Complexity
- **DB Writes:** O(A + C + K) where A=attributes, C=clusters, K=keywords
- **JSON Generation:** O(A + C + K) for denormalization
- **Time Estimate:** <1 second for typical blueprints (< 10 MB JSON)
---
**Document Complete**
**Status:** Ready for Development
**Next Step:** Implement Phase 1 (AI Functions) per Section 4