IGNY8 Phase 1: Cluster Formation & Keyword Engine (Doc 01C)
Document Version: 1.0
Date: 2026-03-23
Phase: Phase 1 - Foundation & Intelligence
Status: Build Ready
1. Current State
Existing Components
- SAGBlueprint (01A): Data model with status tracking, blueprint lifecycle management
- SAGAttribute & SAGCluster models (01A): Schema definitions for attributes and topic clusters
- SectorAttributeTemplate (01B): Pre-configured attribute framework with keyword templates per site_type
- Setup Wizard (01D): Collects sector, site_type, and populated attribute values from user
- Blueprint Service (01G - earlier iteration): Basic blueprint assembly, denormalization
Current Limitations
- No automated cluster formation from attribute intersection logic
- No keyword generation from templates
- No conflict resolution for multi-cluster keyword assignments
- No cluster type classification (product, condition, feature, etc.)
- No validation of cluster viability (size, coherence, user demand)
- No hub title and supporting content plan generation
Dependencies Ready
- ✅ Sector attribute templates loaded with keyword templates
- ✅ Setup wizard populates attributes
- ✅ Data models support cluster and keyword storage
- ✅ Blueprint lifecycle framework exists
2. What to Build
2.1 Cluster Formation AI Function
File: sag/ai_functions/cluster_formation.py
Register Key: 'form_clusters'
Triggering Context: After user populates attributes in setup wizard; before keyword assignment
Input Contract
{
"populated_attributes": [
{"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
{"name": "Pet Type", "values": ["Dogs", "Cats"]},
{"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
],
"sector_context": {
"sector_id": str,
"site_type": "ecommerce|saas|blog|local_service",
"sector_name": str
},
"constraints": {
"max_clusters": 50, # hard cap per sector
"min_keywords_per_cluster": 5,
"max_keywords_per_cluster": 20,
"optimal_keywords_per_cluster": "7-15" # target range, given as a string
}
}
Output Contract
{
"clusters": [
{
"id": "cluster_001",
"title": "Dog Arthritis Relief Solutions",
"type": "product_category", # or condition_problem, feature, brand, informational, comparison, life_stage
"dimensions": {
"primary": ["Pet Type: Dogs", "Health Condition: Arthritis"],
"secondary": ["Target Audience: Pet Owners"]
},
"intersection_depth": 3, # count of dimensional intersections
"viability_score": 0.92, # 0-1 based on coherence + demand assessment
"hub_title": "Best Arthritis Treatments for Dogs",
"supporting_content_plan": [
"Senior Dog Arthritis: Causes & Prevention",
"Dog Arthritis Medications: Complete Guide",
"Physical Therapy Exercises for Dogs with Arthritis",
"Diet Changes to Support Joint Health",
"When to See a Vet About Dog Joint Pain"
],
"keywords": [], # populated in keyword generation phase
"dimension_count": 3,
"validation": {
"is_real_topical_ecosystem": true,
"has_search_demand": true,
"can_support_content_plan": true,
"sufficient_differentiation": true
}
},
// ... more clusters
],
"summary": {
"total_clusters_formed": 12,
"type_distribution": {
"product_category": 6,
"condition_problem": 4,
"feature": 1,
"brand": 0,
"informational": 1,
"comparison": 0
},
"avg_intersection_depth": 2.3,
"clusters_below_viability_threshold": 0
}
}
Algorithm (Pseudocode)
FUNCTION form_clusters(populated_attributes, sector_context, constraints):
    # STEP 1: Generate all 2-value intersections (one value from each of two distinct attributes)
    all_intersections = []
    for each attribute_pair in COMBINATIONS(populated_attributes, size=2):
        for value1 in attribute_pair[0].values:
            for value2 in attribute_pair[1].values:
                intersection = {
                    "dimensions": [value1, value2],
                    "attribute_names": [attribute_pair[0].name, attribute_pair[1].name]
                }
                all_intersections.append(intersection)
    # Also generate 3-value intersections for strong coherence
    for each attribute_triplet in COMBINATIONS(populated_attributes, size=3):
        for value1 in attribute_triplet[0].values:
            for value2 in attribute_triplet[1].values:
                for value3 in attribute_triplet[2].values:
                    intersection = {
                        "dimensions": [value1, value2, value3],
                        "attribute_names": [
                            attribute_triplet[0].name,
                            attribute_triplet[1].name,
                            attribute_triplet[2].name
                        ]
                    }
                    all_intersections.append(intersection)
    # STEP 2: AI evaluates each intersection. The evaluation answers:
    #   - Is this a real topical ecosystem?
    #   - Would users search for this combination?
    #   - Can we build a hub + 3-10 supporting articles?
    #   - Is there sufficient differentiation from other clusters?
    #   - Does the combination make semantic sense?
    valid_clusters = []
    for intersection in all_intersections:
        evaluation = AI_EVALUATE_INTERSECTION(intersection, sector_context)
        if evaluation.is_valid:
            # STEP 3: Classify cluster type as one of: product_category, condition_problem,
            # feature, brand, informational, comparison
            cluster_type = AI_CLASSIFY_TYPE(intersection)
            # STEP 4: Generate hub title + supporting content plan
            hub_title = AI_GENERATE_HUB_TITLE(intersection, sector_context)
            supporting_titles = AI_GENERATE_SUPPORTING_TITLES(
                hub_title,
                intersection,
                min_count=5,
                max_count=8
            )
            # Create cluster object
            cluster = {
                "dimensions": intersection.dimensions,
                "intersection_depth": len(intersection.dimensions),
                "type": cluster_type,
                "viability_score": evaluation.confidence_score,
                "hub_title": hub_title,
                "supporting_content_plan": supporting_titles,
                "validation": evaluation
            }
            valid_clusters.append(cluster)
    # STEP 5: Apply constraints & filtering (highest viability first)
    sorted_clusters = SORT_BY_VIABILITY_SCORE(valid_clusters, DESC)
    final_clusters = sorted_clusters[0:constraints.max_clusters]
    # STEP 6: Validate distribution & completeness
    distribution = CALCULATE_TYPE_DISTRIBUTION(final_clusters)
    # Flag if any type is severely under-represented
    if distribution.imbalance > THRESHOLD:
        LOG_WARNING("Type distribution may be suboptimal")
    # STEP 7: Return with summary (keys match the Output Contract)
    return {
        "clusters": final_clusters,
        "summary": {
            "total_clusters_formed": len(final_clusters),
            "type_distribution": distribution,
            "avg_intersection_depth": MEAN(intersection_depth of final_clusters),
            "clusters_below_viability_threshold": COUNT(final_clusters with viability_score < 0.70)
        }
    }
END FUNCTION
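Step 1 of the pseudocode above can be sketched in plain Python with `itertools`. This is a minimal illustration, not the final implementation: the function name `generate_intersections` and the `"Attribute: Value"` string format for dimensions (borrowed from the Output Contract example) are assumptions.

```python
from itertools import combinations, product


def generate_intersections(populated_attributes, sizes=(2, 3)):
    """Enumerate value intersections across distinct attributes.

    For each group of `size` attributes, take one value from each attribute.
    """
    intersections = []
    for size in sizes:
        for attrs in combinations(populated_attributes, size):
            names = [a["name"] for a in attrs]
            for values in product(*(a["values"] for a in attrs)):
                intersections.append({
                    "attribute_names": names,
                    "dimensions": [f"{n}: {v}" for n, v in zip(names, values)],
                })
    return intersections


attributes = [
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]},
]
candidates = generate_intersections(attributes)
print(len(candidates))  # 16 two-value + 12 three-value = 28
```

The candidate count grows multiplicatively with the number of values per attribute, so it may be worth estimating it before sending every intersection to the AI evaluation step.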
AI Evaluation Criteria
For each intersection, the AI must answer:
-
Real Topical Ecosystem?
- Do the dimensions naturally connect in user intent?
- Is there an existing product/service/solution category?
- Example: YES - "Dog Arthritis Relief" (real problem + real solutions)
- Example: NO - "Vegetarian Chainsaw" (nonsensical combination)
-
User Search Demand?
- Would users actively search for this combination?
- Check: keyword templates, search volume patterns, user forums
- Target: ≥500 monthly searches for hub keyword
-
Content Support?
- Can we create 1 hub + 3-10 supporting articles?
- Is there enough subtopic depth?
- Example: YES - "Dog Arthritis" can have medication, exercise, diet, vet visits
- Example: NO - "Red Dog Collar" (too niche, limited subtopics)
-
Sufficient Differentiation?
- Does this cluster stand apart from others?
- Avoid near-duplicate clusters (e.g., "Dog Joint Health" vs "Dog Arthritis")
- Decision: merge or reject the weaker one
-
Dimensional Clarity
- Do all dimensions contribute meaningfully?
- Remove secondary dimensions that don't add coherence
Hard Constraints
- Maximum Clusters: 50 per sector (enforce in sorting/filtering)
- Minimum Keywords per Cluster: 5 (checked in keyword generation)
- Maximum Keywords per Cluster: 20 (checked in keyword generation)
- Optimal Range: 7-15 keywords per cluster
- No Keyword Duplication: Each keyword in exactly one cluster (enforced in conflict resolution)
- Type Distribution Target:
- Product/Service Type: 40-50%
- Condition/Problem: 20-30%
- Feature: 10-15%
- Brand: 5-10%
- Life Stage/Audience: 5-10%
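The type distribution target can be checked mechanically once clusters are formed. The sketch below is one possible check, under the assumption that the five target labels map to the type keys `product_category`, `condition_problem`, `feature`, `brand`, and `life_stage`; the function name `distribution_gaps` is hypothetical.

```python
from collections import Counter

# Target share per cluster type as (low, high) fractions of all clusters.
# The mapping from the labels in the list above to these keys is an assumption.
TARGET_SHARE = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}


def distribution_gaps(clusters, targets=TARGET_SHARE):
    """Return the types whose share falls outside the target range."""
    counts = Counter(c["type"] for c in clusters)
    total = len(clusters)
    gaps = {}
    if total == 0:
        return gaps
    for cluster_type, (low, high) in targets.items():
        share = counts.get(cluster_type, 0) / total
        if share < low:
            gaps[cluster_type] = ("under", round(share, 3))
        elif share > high:
            gaps[cluster_type] = ("over", round(share, 3))
    return gaps


# Same counts as the summary example in the Output Contract (12 clusters).
sample = (
    [{"type": "product_category"}] * 6
    + [{"type": "condition_problem"}] * 4
    + [{"type": "feature"}]
    + [{"type": "informational"}]
)
gaps = distribution_gaps(sample)
print(gaps)
```

With those counts, product_category sits exactly at the 50% upper bound and passes, while condition_problem is reported as over target and feature, brand, and life_stage as under.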
2.2 Keyword Auto-Generation AI Function
File: sag/ai_functions/keyword_generation.py
Register Key: 'generate_keywords'
Triggering Context: After cluster formation; before blueprint assembly
Input Contract
{
"clusters": [ # output from cluster_formation
{
"id": "cluster_001",
"dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"],
"hub_title": "Best Arthritis Treatments for Dogs",
"supporting_content_plan": [...]
}
],
"sector_context": {
"sector_id": str,
"site_type": "ecommerce|saas|blog|local_service",
"site_intent": "sell|inform|book|download"
},
"keyword_templates": { # loaded from SectorAttributeTemplate
"template_001": "best {health_condition} treatment for {pet_type}",
"template_002": "{pet_type} {health_condition} treatment",
// ... more templates
},
"constraints": {
"min_keywords_per_cluster": 10,
"max_keywords_per_cluster": 25,
"total_target": "300-500"
}
}
Output Contract
{
"keywords_per_cluster": {
"cluster_001": {
"keywords": [
{
"keyword": "best arthritis treatment for dogs",
"search_volume": 1200,
"difficulty": "medium",
"intent": "informational",
"generated_from": "template_001",
"variant_type": "long_tail"
},
{
"keyword": "dog arthritis remedies",
"search_volume": 800,
"difficulty": "easy",
"intent": "informational",
"generated_from": "template_002",
"variant_type": "base"
},
// ... 13-23 more keywords
],
"keyword_count": 15,
"primary_intent": "informational",
"search_volume_total": 12500
}
},
"deduplication": {
"duplicates_removed": 8,
"flagged_conflicts": 3 # keywords fitting multiple clusters
},
"summary": {
"total_unique_keywords": 342,
"per_cluster_avg": 14.25,
"total_search_volume": 892000,
"within_constraints": true
}
}
Algorithm (Pseudocode)
FUNCTION generate_keywords(clusters, sector_context, keyword_templates):
    all_keywords = {}
    FOR EACH cluster IN clusters:
        # STEP 1: Extract attribute values from cluster dimensions
        attribute_values = EXTRACT_ATTRIBUTE_VALUES(cluster.dimensions)
        # Output: {"Pet Type": "Dogs", "Health Condition": "Arthritis", ...}
        cluster_keywords = []
        # STEP 2: Substitute values into templates
        FOR EACH template IN keyword_templates:
            # Use the template only if every variable it needs has a value
            required_attrs = PARSE_TEMPLATE_VARIABLES(template)
            if ALL_ATTRS_AVAILABLE(required_attrs, attribute_values):
                base_keyword = SUBSTITUTE_VALUES(template, attribute_values)
                cluster_keywords.append({
                    "keyword": base_keyword,
                    "generated_from": template.id,
                    "variant_type": "base"
                })
        # STEP 3: Generate long-tail variants
        # Iterate over a snapshot of the base keywords so that variants appended
        # below are not themselves expanded again.
        base_keywords = COPY(cluster_keywords)
        FOR EACH base IN base_keywords:
            # e.g. base.keyword = "arthritis treatment for dogs"
            variants = []
            # Variant: Add "best" (skip if the template already starts with it)
            if NOT STARTS_WITH(base.keyword, "best "):
                variants.append("best " + base.keyword)
            # Variant: Add "review"
            variants.append(base.keyword + " review")
            # Variant: Add "vs" (comparison)
            if cluster.type in [product_category, comparison]:
                variants.append(base.keyword + " vs alternatives")
            # Variant: Add "for" (audience)
            variants.append(base.keyword + " for seniors")
            # Variant: Add "how to"
            variants.append("how to " + base.keyword)
            # Variant: Add "cost" (ecommerce intent)
            if sector_context.site_intent == "sell":
                variants.append(base.keyword + " cost")
            FOR EACH variant IN variants:
                if NOT_DUPLICATE(variant, cluster_keywords):
                    cluster_keywords.append({
                        "keyword": variant,
                        "generated_from": base.generated_from,
                        "variant_type": "long_tail",
                        "parent": base.keyword
                    })
        # STEP 4: Enrich keywords with metadata
        enriched_keywords = []
        FOR EACH kw IN cluster_keywords:
            enriched = {
                "keyword": kw.keyword,
                "search_volume": ESTIMATE_SEARCH_VOLUME(kw.keyword, sector_context),
                "difficulty": ESTIMATE_DIFFICULTY(kw.keyword, sector_context),
                "intent": CLASSIFY_INTENT(kw.keyword),  # informational, transactional, navigational
                "generated_from": kw.generated_from,
                "variant_type": kw.variant_type
            }
            enriched_keywords.append(enriched)
        # STEP 5: Filter & sort
        filtered_keywords = SORT_BY_SEARCH_VOLUME(enriched_keywords, DESC)
        # Keep top 10-25 per cluster
        cluster_keywords_final = filtered_keywords[0:25]
        # Validate minimum: top up to 10 if the cluster came out short
        if LEN(cluster_keywords_final) < 10:
            ADD_SUPPLEMENTARY_KEYWORDS(cluster_keywords_final, 10 - LEN(cluster_keywords_final))
        all_keywords[cluster.id] = {
            "keywords": cluster_keywords_final,
            "keyword_count": len(cluster_keywords_final),
            "primary_intent": MODE(intent from all keywords),
            "search_volume_total": SUM(all search volumes)
        }
    # STEP 6: Global deduplication (each keyword ends up in exactly one cluster)
    all_keywords_flat = FLATTEN(all_keywords)
    duplicates = FIND_DUPLICATES(all_keywords_flat)
    duplicates_removed = 0
    FOR EACH duplicate_set IN duplicates:
        primary_cluster = PRIMARY_CLUSTER(duplicate_set)  # best fit by dimensions
        REASSIGN_DUPLICATES_TO_PRIMARY(duplicate_set, primary_cluster)
        duplicates_removed += LEN(duplicate_set) - 1
    # Deduplication can shrink clusters, so refresh the per-cluster totals
    FOR EACH cluster_id IN all_keywords:
        RECALCULATE_TOTALS(all_keywords[cluster_id])  # keyword_count, primary_intent, search_volume_total
    # STEP 7: Validate constraints
    total_keywords = SUM(keyword_count for each cluster)
    validation = {
        "within_min_per_cluster": all clusters >= 10,
        "within_max_per_cluster": all clusters <= 25,
        "total_within_target": total_keywords between 300-500,
        "no_duplicates": LEN(FIND_DUPLICATES(FLATTEN(all_keywords))) == 0
    }
    if NOT validation.all_true:
        LOG_WARNING("Keyword generation constraints not fully met")
    # STEP 8: Return results
    return {
        "keywords_per_cluster": all_keywords,
        "deduplication": {
            "duplicates_removed": duplicates_removed,
            "flagged_conflicts": identify_multi_cluster_fits()
        },
        "summary": {
            "total_unique_keywords": total_keywords,
            "per_cluster_avg": total_keywords / len(clusters),
            "total_search_volume": sum of all volumes,
            "within_constraints": validation.all_true
        }
    }
END FUNCTION
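The global deduplication step could look roughly like the sketch below. It assumes the caller supplies a `fit_score(cluster_id, keyword)` function standing in for the "best fit by dimensions" decision; that function, the token-overlap scorer used in the example, and the name `deduplicate_globally` are all hypothetical.

```python
def deduplicate_globally(keywords_per_cluster, fit_score):
    """Keep each keyword in exactly one cluster: the one with the highest fit_score.

    Keywords are compared case-insensitively with whitespace collapsed.
    On a tie, the cluster that appears first keeps the keyword.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    owner = {}  # normalized keyword -> (score, cluster_id)
    for cluster_id, keywords in keywords_per_cluster.items():
        for kw in keywords:
            key = normalize(kw["keyword"])
            score = fit_score(cluster_id, kw["keyword"])
            if key not in owner or score > owner[key][0]:
                owner[key] = (score, cluster_id)

    result, removed = {}, 0
    for cluster_id, keywords in keywords_per_cluster.items():
        kept, seen = [], set()
        for kw in keywords:
            key = normalize(kw["keyword"])
            if owner[key][1] == cluster_id and key not in seen:
                kept.append(kw)
                seen.add(key)
            else:
                removed += 1
        result[cluster_id] = kept
    return result, removed


# Illustrative scorer: count cluster dimension words found in the keyword.
dims = {"cluster_001": {"dogs", "arthritis"}, "cluster_002": {"cats", "arthritis"}}

def token_overlap(cluster_id, keyword):
    return len(dims[cluster_id] & set(keyword.lower().split()))

data = {
    "cluster_001": [{"keyword": "arthritis treatment for dogs"}, {"keyword": "arthritis relief"}],
    "cluster_002": [{"keyword": "Arthritis treatment for dogs"}, {"keyword": "arthritis relief"}],
}
deduped, removed = deduplicate_globally(data, token_overlap)
```

In this example both keywords stay in cluster_001. The tie on "arthritis relief" is broken by order here only to keep the sketch short; a real implementation would more likely count such ties as flagged conflicts.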
Keyword Template Structure (from SectorAttributeTemplate, 01B)
# Example for Pet Health ecommerce site
keyword_templates = {
"site_type": "ecommerce",
"templates": [
{
"id": "template_001",
"pattern": "best {health_condition} treatment for {pet_type}",
"weight": 5, # prioritize this template
"min_required_attrs": ["health_condition", "pet_type"]
},
{
"id": "template_002",
"pattern": "{pet_type} {health_condition} medication",
"weight": 4,
"min_required_attrs": ["pet_type", "health_condition"]
},
{
"id": "template_003",
"pattern": "affordable {health_condition} relief for {pet_type}",
"weight": 3,
"min_required_attrs": ["health_condition", "pet_type"]
},
# ... more templates
]
}
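Template substitution against this structure can be sketched with the standard library's `string.Formatter`, which exposes the `{variable}` names in a pattern. The helper names, the rule that turns an attribute name such as "Pet Type" into the variable `pet_type`, and the fourth template (`template_brand`) are assumptions made for illustration.

```python
from string import Formatter


def template_variables(pattern):
    """Return the {variable} names used in a template pattern."""
    return [field for _, field, _, _ in Formatter().parse(pattern) if field]


def to_variable(attribute_name):
    """Assumed naming rule: "Pet Type" -> "pet_type"."""
    return attribute_name.strip().lower().replace(" ", "_")


def substitute(templates, attribute_values):
    """Fill every template whose variables are all available, highest weight first."""
    values = {to_variable(k): v.lower() for k, v in attribute_values.items()}
    keywords = []
    for template in sorted(templates, key=lambda t: -t["weight"]):
        needed = template_variables(template["pattern"])
        if all(name in values for name in needed):
            keywords.append({
                "keyword": template["pattern"].format(**values),
                "generated_from": template["id"],
                "variant_type": "base",
            })
    return keywords


templates = [
    {"id": "template_001", "pattern": "best {health_condition} treatment for {pet_type}", "weight": 5},
    {"id": "template_002", "pattern": "{pet_type} {health_condition} medication", "weight": 4},
    {"id": "template_003", "pattern": "affordable {health_condition} relief for {pet_type}", "weight": 3},
    # Hypothetical template that needs an attribute this cluster lacks
    {"id": "template_brand", "pattern": "{brand} {health_condition} chews", "weight": 2},
]
base_keywords = substitute(templates, {"Pet Type": "Dogs", "Health Condition": "Arthritis"})
print([k["keyword"] for k in base_keywords])
```

The second result comes out as "dogs arthritis medication", which shows that plain substitution does not handle singular/plural forms; some normalization of attribute values would likely be needed on top of this.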
Long-tail Variant Rules
| Variant Type | Pattern | Use Case | Example |
|---|---|---|---|
| Base | {keyword} | All clusters | "dog arthritis relief" |
| Best/Top | best {keyword} | All clusters | "best dog arthritis relief" |
| Review | {keyword} review | Product clusters | "arthritis supplement for dogs review" |
| Comparison | {keyword} vs | Comparison intent | "arthritis medication vs supplement for dogs" |
| Audience | {keyword} for {audience} | Audience-specific | "dog arthritis relief for senior dogs" |
| How-to | how to {verb} {keyword} | Problem-solution | "how to manage dog arthritis" |
| Cost/Price | {keyword} cost | Ecommerce intent | "arthritis treatment for dogs cost" |
| Quick | fast {keyword} | Urgency-driven | "fast arthritis relief for dogs" |
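A few of the rows in the table can be turned into a small generator, as in the sketch below. It covers only the Best/Top, Review, Audience, and Cost/Price rows; the Comparison and How-to rows need an alternative term or a verb and are left out. The function name and parameters are assumptions.

```python
def long_tail_variants(base_keyword, cluster_type, site_intent, audience=None):
    """Build long-tail variants for one base keyword from a subset of the table rows."""
    variants = []
    # Best/Top: skip when the base keyword already starts with "best"
    if not base_keyword.startswith("best "):
        variants.append(f"best {base_keyword}")
    # Review: product clusters only
    if cluster_type == "product_category":
        variants.append(f"{base_keyword} review")
    # Audience: only when an audience phrase is supplied
    if audience:
        variants.append(f"{base_keyword} for {audience}")
    # Cost/Price: ecommerce intent
    if site_intent == "sell":
        variants.append(f"{base_keyword} cost")
    # Remove repeats while preserving order
    return list(dict.fromkeys(variants))


product_variants = long_tail_variants(
    "dog arthritis relief", "product_category", "sell", audience="senior dogs"
)
print(product_variants)
```

Passing the audience in as a parameter, rather than hard-coding a phrase, keeps the Audience row tied to the values the user actually entered.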
2.3 Blueprint Assembly Service
File: sag/services/blueprint_service.py
Primary Function: assemble_blueprint(site, attributes, clusters, keywords)
Triggering Context: After keyword generation; creates SAGBlueprint (status=draft)
Input Contract
assemble_blueprint(
site: Website, # from 01A
attributes: List[Tuple[name, values]], # user-populated
clusters: List[Dict], # from cluster_formation()
keywords: Dict[cluster_id, List[Dict]] # per-cluster "keywords" lists from generate_keywords()
)
Execution Steps
-
Create SAGBlueprint Record
blueprint = SAGBlueprint.objects.create(
    site=site,
    status='draft',
    phase='phase_1_foundation',
    sector_id=site.sector_id,
    created_by=current_user,
    metadata={
        'version': '1.0',
        'created_date': now(),
        'last_modified': now()
    }
)
-
Create SAGAttribute Records
FOR EACH (attribute_name, values) IN attributes:
    attribute = SAGAttribute.objects.create(
        blueprint=blueprint,
        name=attribute_name,
        values=values,  # stored as JSON array
        is_primary=DETERMINE_PRIMACY(attribute_name, site.site_type),
        source='user_input'
    )
-
Create SAGCluster Records from Formed Clusters
FOR EACH cluster IN clusters:
    db_cluster = SAGCluster.objects.create(
        blueprint=blueprint,
        cluster_key=cluster['id'],
        title=cluster['title'],  # cluster title; the hub title is stored separately below
        description=GENERATE_CLUSTER_DESC(cluster),
        cluster_type=cluster['type'],
        dimensions=cluster['dimensions'],  # JSON
        intersection_depth=cluster['intersection_depth'],
        viability_score=cluster['viability_score'],
        hub_title=cluster['hub_title'],
        supporting_content_plan=cluster['supporting_content_plan'],  # JSON array
        status='draft',
        keyword_count=0  # updated in next step
    )
-
Populate auto_generated_keywords on Each Cluster
FOR EACH (cluster_id, keyword_list) IN keywords.items():
    # cluster_key is unique only within a blueprint, so scope the lookup to this one
    cluster = SAGCluster.objects.get(blueprint=blueprint, cluster_key=cluster_id)
    keyword_records = []
    FOR EACH kw_data IN keyword_list:
        keyword = SAGKeyword.objects.create(
            cluster=cluster,
            keyword_text=kw_data['keyword'],
            search_volume=kw_data['search_volume'],
            difficulty=kw_data['difficulty'],
            intent=kw_data['intent'],
            generated_from=kw_data['generated_from'],
            variant_type=kw_data['variant_type'],
            source='auto_generated'
        )
        keyword_records.append(keyword)
    cluster.auto_generated_keywords.set(keyword_records)
    cluster.keyword_count = len(keyword_records)
    cluster.save()
-
Generate Taxonomy Plan
taxonomy_plan = {
    'wp_categories': [],
    'wp_tags': [],
    'hierarchy': {}
}
FOR EACH attribute IN blueprint.sagattribute_set.all():
    if attribute.is_primary:
        category = {
            'name': attribute.name,
            'slug': slugify(attribute.name),
            'description': f"Posts about {attribute.name}"
        }
        taxonomy_plan['wp_categories'].append(category)
    else:
        # Build one tag per value; the tag must be created inside the loop
        FOR EACH v IN attribute.values:
            tag = {
                'name': v,
                'slug': slugify(v),
                'parent_category': primary_attr_name  # name of the primary attribute, resolved beforehand
            }
            taxonomy_plan['wp_tags'].append(tag)
blueprint.taxonomy_plan = taxonomy_plan  # JSON field
-
Generate Execution Priority (Phased Approach)
execution_priority = {
    'phase': 'phase_1_hubs',
    'content_sequence': []
}
# Phase 1: Hub pages (1 per cluster)
hub_items = []
FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
    hub_items.append({
        'type': 'hub_page',
        'cluster_id': cluster.id,
        'title': cluster.hub_title,
        'priority': 1,
        'estimated_effort': 'high',
        'SEO_impact': 'critical'
    })
execution_priority['content_sequence'].extend(hub_items)
# Phase 2: Supporting content (5-8 articles per cluster)
supporting_items = []
FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
    FOR EACH content_title IN cluster.supporting_content_plan:
        supporting_items.append({
            'type': 'supporting_article',
            'cluster_id': cluster.id,
            'parent_hub': cluster.hub_title,
            'title': content_title,
            'priority': 2,
            'estimated_effort': 'medium',
            'SEO_impact': 'supporting'
        })
execution_priority['content_sequence'].extend(supporting_items)
# Phase 3: Term/pillar pages (keywords + long-tail)
term_items = []
FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
    FOR EACH keyword IN cluster.auto_generated_keywords.all():
        term_items.append({
            'type': 'term_page',
            'cluster_id': cluster.id,
            'keyword': keyword.keyword_text,
            'priority': 3,
            'estimated_effort': 'low',
            'SEO_impact': 'supportive'
        })
execution_priority['content_sequence'].extend(term_items)
blueprint.execution_priority = execution_priority  # JSON field
-
Populate Denormalized JSON Fields
blueprint.attributes_json = {
    'total_attributes': blueprint.sagattribute_set.count(),
    'summary': [
        {
            'name': attr.name,
            'value_count': len(attr.values),
            'values': attr.values,
            'is_primary': attr.is_primary
        }
        FOR EACH attr IN blueprint.sagattribute_set.all()
    ]
}
blueprint.clusters_json = {
    'total_clusters': blueprint.sagcluster_set.count(),
    'summary': [
        {
            'id': cluster.cluster_key,
            'title': cluster.title,
            'type': cluster.cluster_type,
            'keyword_count': cluster.keyword_count,
            'viability_score': cluster.viability_score
        }
        FOR EACH cluster IN blueprint.sagcluster_set.all()
    ]
}
blueprint.save()
-
Return Blueprint ID & Status
return {
    'blueprint_id': blueprint.id,
    'status': 'draft',
    'created_at': blueprint.created_at,
    'summary': {
        'total_attributes': blueprint.sagattribute_set.count(),
        'total_clusters': blueprint.sagcluster_set.count(),
        'total_keywords': SAGKeyword.objects.filter(cluster__blueprint=blueprint).count(),
        'next_step': 'review blueprint in 01E (Pipeline Configuration)'
    }
}
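The taxonomy plan step can be illustrated without the ORM. The sketch below works on plain dictionaries instead of `SAGAttribute` rows, uses a small local `slugify` so that it runs without Django, and assumes that tags are filed under the first primary attribute; none of these choices come from the spec.

```python
import re


def slugify(text):
    """Minimal local slug helper so the example runs without Django."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_taxonomy_plan(attributes):
    """Primary attributes become categories; values of the others become tags."""
    plan = {"wp_categories": [], "wp_tags": [], "hierarchy": {}}
    primary_names = [a["name"] for a in attributes if a["is_primary"]]
    # Assumption: every tag is parented to the first primary attribute, if any
    parent = primary_names[0] if primary_names else None
    for attr in attributes:
        if attr["is_primary"]:
            plan["wp_categories"].append({
                "name": attr["name"],
                "slug": slugify(attr["name"]),
                "description": f"Posts about {attr['name']}",
            })
        else:
            for value in attr["values"]:
                plan["wp_tags"].append({
                    "name": value,
                    "slug": slugify(value),
                    "parent_category": parent,
                })
    return plan


plan = build_taxonomy_plan([
    {"name": "Pet Type", "values": ["Dogs", "Cats"], "is_primary": True},
    {"name": "Health Condition", "values": ["Allergies", "Joint Pain"], "is_primary": False},
])
print(plan["wp_tags"])
```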
2.4 Manual Keyword Supplementation (User Interface)
Feature: Add Keywords from Multiple Sources
-
IGNY8 Library Integration
- Users browse pre-curated keyword library per site_type
- Select keywords → auto-map to clusters by attribute match
- Unmatched keywords → flagged for review
-
Manual Entry
- Form field: paste or type keywords (comma-separated)
- System deduplicates against existing
- Prompts user to assign to cluster(s)
-
CSV Import
- Upload CSV with columns: keyword, search_volume (optional), difficulty (optional)
- Preview & validate before import
- Bulk assign to clusters or mark for review
-
Keyword API Integration (optional in Phase 1)
- Connect to SEMrush, Ahrefs, or similar
- Fetch keyword suggestions for cluster dimensions
- User approves additions
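For the CSV import path, the preview-and-validate step might be sketched as follows with the standard `csv` module. The function name, the error messages, and the rule that duplicate rows within the file are rejected are assumptions; only the column names come from the list above.

```python
import csv
import io


def parse_keyword_csv(text):
    """Parse CSV text into keyword rows plus a list of (line_number, problem) errors."""
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames or "keyword" not in reader.fieldnames:
        raise ValueError("CSV must contain a 'keyword' column")
    rows, errors, seen = [], [], set()
    # Line 1 is the header, so data rows start at line 2
    for line_no, row in enumerate(reader, start=2):
        keyword = " ".join((row.get("keyword") or "").lower().split())
        if not keyword:
            errors.append((line_no, "empty keyword"))
            continue
        if keyword in seen:
            errors.append((line_no, "duplicate keyword"))
            continue
        volume_raw = (row.get("search_volume") or "").strip()
        if volume_raw and not volume_raw.isdigit():
            errors.append((line_no, "search_volume is not a whole number"))
            continue
        seen.add(keyword)
        rows.append({
            "keyword": keyword,
            "search_volume": int(volume_raw) if volume_raw else None,
            "difficulty": (row.get("difficulty") or "").strip().lower(),
        })
    return rows, errors


sample_csv = "\n".join([
    "keyword,search_volume,difficulty",
    "Arthritis relief dogs,1200,medium",
    "joint pain dogs,,",
    "arthritis  relief dogs,900,easy",
    "cat allergies,lots,hard",
])
rows, errors = parse_keyword_csv(sample_csv)
```

Returning the errors alongside the accepted rows lets the UI show a preview in which the user can fix or discard problem lines before anything is written.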
Keyword Mapping Logic
FUNCTION map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):
matches = []
FOR EACH cluster IN clusters:
# Extract all attribute values from cluster dimensions
cluster_attrs = EXTRACT_ATTRIBUTES(cluster.dimensions)
# Calculate semantic similarity
similarity = CALCULATE_SIMILARITY(new_keyword, cluster_attrs)
if similarity > threshold:
matches.append({
'cluster_id': cluster.id,
'cluster_title': cluster.title,
'similarity_score': similarity
})
return matches # May be 0, 1, or multiple matches
END FUNCTION
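The spec leaves `CALCULATE_SIMILARITY` open. As a placeholder, the sketch below scores a keyword by the fraction of a cluster's dimension values whose words appear in it, with a crude plural strip. This token-overlap measure is an assumption chosen for illustration; an embedding-based similarity may well be what is intended.

```python
def _tokens(text):
    """Lowercased words with trailing "s" stripped (a very crude plural normalizer)."""
    return {t.rstrip("s") for t in text.lower().replace("-", " ").split()}


def dimension_values(dimensions):
    """'Pet Type: Dogs' -> 'Dogs'."""
    return [d.split(":", 1)[-1].strip() for d in dimensions]


def similarity(keyword, dimensions):
    """Fraction of dimension values whose words all occur in the keyword."""
    keyword_tokens = _tokens(keyword)
    values = dimension_values(dimensions)
    if not values:
        return 0.0
    hits = sum(1 for v in values if _tokens(v) <= keyword_tokens)
    return hits / len(values)


def map_keyword_to_clusters(keyword, clusters, threshold=0.70):
    matches = []
    for cluster in clusters:
        score = similarity(keyword, cluster["dimensions"])
        if score > threshold:
            matches.append({
                "cluster_id": cluster["id"],
                "cluster_title": cluster["title"],
                "similarity_score": score,
            })
    return matches


clusters = [
    {"id": "cluster_001", "title": "Dog Arthritis Relief Solutions",
     "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"]},
    {"id": "cluster_002", "title": "Cat Arthritis Relief Solutions",
     "dimensions": ["Pet Type: Cats", "Health Condition: Arthritis"]},
]
specific = map_keyword_to_clusters("dog arthritis relief", clusters)
generic = map_keyword_to_clusters("arthritis relief for pets", clusters, threshold=0.4)
```

Here `specific` contains only the dog cluster, while the generic keyword matches both clusters once the threshold is lowered, which is the multi-fit situation that conflict resolution has to handle.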
Conflict Resolution: Multi-Cluster Keyword Assignment
Problem: A keyword fits multiple clusters (e.g., "arthritis relief for pets" fits both Dog Cluster and Cat Cluster)
Resolution Algorithm:
-
Identify Multi-Fit Keywords
potential_conflicts = []
FOR EACH new_keyword IN keywords_to_add:
    matching_clusters = map_keyword_to_clusters(new_keyword, all_clusters)
    if len(matching_clusters) > 1:
        potential_conflicts.append({
            'keyword': new_keyword,
            'matching_clusters': matching_clusters
        })
-
Apply Decision Criteria (in order)
-
Criterion 1: Dimensional Intersection Count
- Assign to cluster with MOST dimensional intersections
- Example: "dog arthritis relief" → Dog cluster has 3 dimensions (pet type, condition, audience); Cat cluster has 2 → assign to Dog cluster
-
Criterion 2: Specificity
- If tied on intersection count, assign to MORE SPECIFIC cluster
- Example: "arthritis relief" (general) vs "dog arthritis relief" (specific) → assign to Dog cluster
-
Criterion 3: Primary User Intent Match
- If still tied, assign to cluster whose hub_title best matches user intent
- Example: Both Dog & Cat clusters have "arthritis relief" hub; Dog hub is "Best Arthritis Treatments for Dogs" → assign to Dog
-
Criterion 4: Last Resort - Create New Cluster
- If keyword doesn't fit any cluster well, flag as "potential_new_cluster"
- User reviews and decides: split existing cluster, merge, or create new
-
Implementation
FUNCTION resolve_keyword_conflict(keyword, matching_clusters):
    # Step 1: Compare intersection depth
    sorted_by_depth = SORT_BY(matching_clusters, 'intersection_depth', DESC)
    best_by_depth = sorted_by_depth[0]
    if sorted_by_depth[0].intersection_depth > sorted_by_depth[1].intersection_depth:
        return best_by_depth
    # Only clusters tied at the greatest depth remain candidates
    tied = [c for c in sorted_by_depth if c.intersection_depth == best_by_depth.intersection_depth]
    # Step 2: Compare specificity (assign only if there is a single top score)
    specificity_scores = [CALC_SPECIFICITY(cluster, keyword) for cluster in tied]
    best_by_specificity = tied[ARGMAX(specificity_scores)]
    if COUNT(specificity_scores, MAX(specificity_scores)) == 1:
        return best_by_specificity
    # Step 3: Compare intent match (again requiring a single top score)
    intent_scores = [CALC_INTENT_MATCH(cluster.hub_title, keyword) for cluster in tied]
    best_by_intent = tied[ARGMAX(intent_scores)]
    if COUNT(intent_scores, MAX(intent_scores)) == 1:
        return best_by_intent
    # Step 4: Flag for user review
    return {
        'status': 'flagged_for_review',
        'keyword': keyword,
        'candidates': matching_clusters,
        'reason': 'ambiguous_assignment'
    }
END FUNCTION
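The ordered criteria can also be expressed as a generic tie-break cascade: apply one scoring function at a time, keep only the candidates tied for the top score, and stop as soon as one remains. The sketch below is an illustration under assumptions; the three scoring functions are simple word-overlap stand-ins for `CALC_SPECIFICITY` and `CALC_INTENT_MATCH`, which the spec does not define.

```python
def by_depth(cluster, keyword):
    return cluster["intersection_depth"]


def by_specificity(cluster, keyword):
    """Stand-in: how many single-word dimension values the keyword mentions."""
    words = set(keyword.lower().split())
    values = [d.split(":", 1)[-1].strip().lower() for d in cluster["dimensions"]]
    return sum(1 for v in values if v in words or v.rstrip("s") in words)


def by_intent(cluster, keyword):
    """Stand-in: words shared between the hub title and the keyword."""
    return len(set(cluster["hub_title"].lower().split()) & set(keyword.lower().split()))


def resolve_keyword_conflict(keyword, candidates, criteria=(by_depth, by_specificity, by_intent)):
    remaining = list(candidates)
    for criterion in criteria:
        if not remaining:
            break
        scores = [criterion(c, keyword) for c in remaining]
        best = max(scores)
        remaining = [c for c, s in zip(remaining, scores) if s == best]
        if len(remaining) == 1:
            return {"status": "assigned", "cluster_id": remaining[0]["id"]}
    return {
        "status": "flagged_for_review",
        "keyword": keyword,
        "candidates": [c["id"] for c in remaining],
        "reason": "ambiguous_assignment",
    }


dog = {"id": "cluster_001", "intersection_depth": 2,
       "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"],
       "hub_title": "Best Arthritis Treatments for Dogs"}
cat = {"id": "cluster_002", "intersection_depth": 2,
       "dimensions": ["Pet Type: Cats", "Health Condition: Arthritis"],
       "hub_title": "Best Arthritis Treatments for Cats"}

decided = resolve_keyword_conflict("dog arthritis relief", [dog, cat])
undecided = resolve_keyword_conflict("arthritis relief for pets", [dog, cat])
```

With equal depth, "dog arthritis relief" is settled by specificity, while "arthritis relief for pets" ties on every criterion and is flagged for review.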
3. Data Models / APIs
3.1 Database Models (Django ORM)
SAGBlueprint (existing from 01A, extended)
class SAGBlueprint(models.Model):
STATUS_CHOICES = (
('draft', 'Draft'),
('cluster_formation_complete', 'Cluster Formation Complete'),
('keyword_generation_complete', 'Keyword Generation Complete'),
('keyword_supplemented', 'Keywords Supplemented'),
('ready_for_pipeline', 'Ready for Pipeline'),
('published', 'Published'),
)
site = models.ForeignKey(Website, on_delete=models.CASCADE)
status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
phase = models.CharField(max_length=50, default='phase_1_foundation')
sector_id = models.CharField(max_length=100)
# Denormalized JSON for fast access
attributes_json = models.JSONField(default=dict, blank=True)
clusters_json = models.JSONField(default=dict, blank=True)
taxonomy_plan = models.JSONField(default=dict, blank=True)
execution_priority = models.JSONField(default=dict, blank=True)
created_by = models.ForeignKey(User, on_delete=models.SET_NULL, null=True)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
class Meta:
db_table = 'sag_blueprint'
ordering = ['-created_at']
SAGAttribute (existing from 01A, no changes required)
class SAGAttribute(models.Model):
blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
name = models.CharField(max_length=255)
values = models.JSONField() # array of strings
is_primary = models.BooleanField(default=False)
source = models.CharField(max_length=50) # 'user_input', 'template', 'api'
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
db_table = 'sag_attribute'
unique_together = ('blueprint', 'name')
SAGCluster (existing from 01A, extended)
class SAGCluster(models.Model):
TYPE_CHOICES = (
('product_category', 'Product/Service Category'),
('condition_problem', 'Condition/Problem'),
('feature', 'Feature'),
('brand', 'Brand'),
('informational', 'Informational'),
('comparison', 'Comparison'),
('life_stage', 'Life Stage/Audience'),
)
STATUS_CHOICES = (
('draft', 'Draft'),
('validated', 'Validated'),
('keyword_assigned', 'Keywords Assigned'),
('content_created', 'Content Created'),
)
blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
cluster_key = models.CharField(max_length=100) # unique ID from cluster formation
title = models.CharField(max_length=255)
description = models.TextField(blank=True)
cluster_type = models.CharField(max_length=50, choices=TYPE_CHOICES)
dimensions = models.JSONField() # ["dimension1", "dimension2", ...]
intersection_depth = models.IntegerField() # count of intersecting dimensions
viability_score = models.FloatField() # 0-1
hub_title = models.CharField(max_length=255)
supporting_content_plan = models.JSONField() # array of content titles
auto_generated_keywords = models.ManyToManyField(
'SAGKeyword',
related_name='clusters_auto',
blank=True
)
supplemented_keywords = models.ManyToManyField(
'SAGKeyword',
related_name='clusters_supplemented',
blank=True
)
keyword_count = models.IntegerField(default=0)
status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
class Meta:
db_table = 'sag_cluster'
unique_together = ('blueprint', 'cluster_key')
ordering = ['-viability_score']
SAGKeyword (new)
class SAGKeyword(models.Model):
INTENT_CHOICES = (
('informational', 'Informational'),
('transactional', 'Transactional'),
('navigational', 'Navigational'),
('commercial', 'Commercial Intent'),
)
VARIANT_TYPES = (
('base', 'Base Keyword'),
('long_tail', 'Long-tail Variant'),
('brand', 'Brand Variant'),
('comparison', 'Comparison'),
('review', 'Review'),
('how_to', 'How-to'),
)
SOURCE_CHOICES = (
('auto_generated', 'Auto-Generated'),
('manual_entry', 'Manual Entry'),
('csv_import', 'CSV Import'),
('api_fetch', 'API Fetch'),
('library', 'IGNY8 Library'),
)
cluster = models.ForeignKey(
SAGCluster,
on_delete=models.CASCADE,
related_name='all_keywords'
)
keyword_text = models.CharField(max_length=255)
search_volume = models.IntegerField(null=True, blank=True)
difficulty = models.CharField(max_length=50, blank=True) # 'easy', 'medium', 'hard'
intent = models.CharField(max_length=50, choices=INTENT_CHOICES)
generated_from = models.CharField(max_length=100, blank=True) # template ID or source
variant_type = models.CharField(max_length=50, choices=VARIANT_TYPES)
source = models.CharField(max_length=50, choices=SOURCE_CHOICES)
cpc = models.FloatField(null=True, blank=True) # if available from API
competition = models.CharField(max_length=50, blank=True) # 'low', 'medium', 'high'
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
class Meta:
db_table = 'sag_keyword'
unique_together = ('cluster', 'keyword_text')
ordering = ['-search_volume']
3.2 API Endpoints
POST /api/v1/blueprints/{blueprint_id}/clusters/form/
Purpose: Trigger cluster formation AI function
Authentication: Required (JWT)
Input:
{
"populated_attributes": [
{"name": "Pet Type", "values": ["Dogs", "Cats"]},
{"name": "Health Condition", "values": ["Allergies", "Arthritis"]}
],
"max_clusters": 50
}
Output:
{
"clusters": [...],
"summary": {
"total_clusters_formed": 12,
"type_distribution": {...}
},
"status": "success"
}
Error Cases:
- 400: Invalid attributes structure
- 403: Forbidden (requester does not own the blueprint)
- 422: Insufficient attributes for cluster formation (< 2 dimensions)
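The 400 and 422 cases can be separated with a small payload check before the AI function is invoked. The sketch below is framework-agnostic and returns a `(status_code, message)` pair; the function name, the messages, and the rule that `max_clusters` must lie between 1 and 50 are assumptions (the 403 case depends on authentication and is not covered).

```python
MAX_CLUSTERS = 50  # hard cap per sector


def validate_form_request(payload):
    """Return (status_code, message) for a cluster formation request body."""
    attributes = payload.get("populated_attributes")
    if not isinstance(attributes, list):
        return 400, "populated_attributes must be a list"
    for attribute in attributes:
        well_formed = (
            isinstance(attribute, dict)
            and isinstance(attribute.get("name"), str)
            and isinstance(attribute.get("values"), list)
            and all(isinstance(v, str) for v in attribute["values"])
        )
        if not well_formed:
            return 400, "each attribute needs a string name and a list of string values"
    max_clusters = payload.get("max_clusters", MAX_CLUSTERS)
    if not isinstance(max_clusters, int) or not 1 <= max_clusters <= MAX_CLUSTERS:
        return 400, f"max_clusters must be an integer between 1 and {MAX_CLUSTERS}"
    # Structure is valid; now check there is enough to intersect
    populated = [a for a in attributes if a["values"]]
    if len(populated) < 2:
        return 422, "at least two attributes with values are required"
    return 200, "ok"


ok_status, _ = validate_form_request({
    "populated_attributes": [
        {"name": "Pet Type", "values": ["Dogs", "Cats"]},
        {"name": "Health Condition", "values": ["Allergies", "Arthritis"]},
    ],
    "max_clusters": 50,
})
```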
POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
Purpose: Trigger keyword generation AI function
Authentication: Required
Input:
{
"use_cluster_ids": ["cluster_001", "cluster_002"],
"target_keywords_per_cluster": 15,
"include_long_tail_variants": true
}
Output:
{
"keywords_per_cluster": {...},
"deduplication": {
"duplicates_removed": 5
},
"summary": {
"total_unique_keywords": 180,
"within_constraints": true
}
}
POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
Purpose: Add manual, CSV, library, or API-sourced keywords
Authentication: Required
Input (Multiple Scenarios):
Scenario 1: Manual Entry
{
"source": "manual_entry",
"keywords": ["arthritis relief dogs", "joint pain dogs"],
"cluster_id": "cluster_001"
}
Scenario 2: CSV Import
{
"source": "csv_import",
"csv_url": "https://example.com/keywords.csv",
"auto_cluster": true
}
Scenario 3: Library Selection
{
"source": "library",
"library_keyword_ids": [123, 456, 789],
"auto_cluster": true
}
Output:
{
"added_keywords": 10,
"auto_clustered": 9,
"flagged_for_review": 1,
"conflicts_resolved": {
"reassigned": 2,
"deferred": 1
}
}
POST /api/v1/blueprints/{blueprint_id}/assemble/
Purpose: Trigger blueprint assembly (create final SAGBlueprint with all records)
Authentication: Required
Input:
{
"finalize_keyword_review": true,
"set_status": "ready_for_pipeline"
}
Output:
{
"blueprint_id": 42,
"status": "ready_for_pipeline",
"summary": {
"total_attributes": 4,
"total_clusters": 12,
"total_keywords": 180,
"execution_priority_phases": 3
}
}
GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft&type=product_category
Purpose: List clusters with filtering
Query Params:
- status: draft, validated, keyword_assigned, content_created
- type: product_category, condition_problem, feature, brand, informational, comparison
- min_viability: 0.70
- limit: 50, offset: 0
Output:
{
"results": [
{
"id": 1,
"cluster_key": "cluster_001",
"title": "Dog Arthritis Relief Solutions",
"hub_title": "Best Arthritis Treatments for Dogs",
"keyword_count": 15,
"viability_score": 0.92,
"type": "product_category"
}
],
"total_count": 12,
"total_keywords": 180
}
GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=cluster_001&source=auto_generated
Purpose: List keywords for a cluster
Query Params:
- cluster_id: filter by cluster
- source: auto_generated, manual_entry, csv_import, api_fetch, library
- intent: informational, transactional, navigational
- min_search_volume: 100
- order_by: search_volume (DESC), difficulty, intent
Output:
{
"results": [
{
"id": 1,
"keyword_text": "best arthritis treatment for dogs",
"search_volume": 1200,
"difficulty": "medium",
"intent": "informational",
"variant_type": "long_tail",
"source": "auto_generated"
}
],
"total_count": 15
}
DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
Purpose: Remove a keyword (before assembly)
Authentication: Required
Status: Only available if blueprint.status='draft' or 'keyword_generation_complete'
4. Implementation Steps
Phase 1: AI Functions Development (Week 1-2)
Step 1.1: Set up cluster_formation.py structure
- Create sag/ai_functions/cluster_formation.py
- Define input/output contracts
- Implement intersection generation logic (2-value, 3-value)
- Stub out AI evaluation function (ready for Claude integration)
- Implement constraint filtering & sorting
Step 1.2: Implement cluster formation AI logic
- Integrate Claude AI API for cluster viability evaluation
- Real topical ecosystem check
- User search demand validation
- Content support assessment
- Differentiation evaluation
- Implement cluster type classification (using embeddings or rule-based logic)
- Implement hub title & supporting content plan generation
- Add viability scoring (0-1 scale)
- Implement distribution validation
Step 1.3: Unit tests for cluster formation
- Test intersection generation (2-value, 3-value)
- Test AI evaluation with mock responses
- Test constraint filtering (max 50 clusters)
- Test type distribution analysis
- Test handling of edge cases (0 intersections, all rejected, etc.)
Step 1.4: Create keyword_generation.py structure
- Create sag/ai_functions/keyword_generation.py
- Define input/output contracts
- Implement template substitution logic
- Implement long-tail variant generation
- Implement deduplication logic
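A rough sketch of the Step 1.4 building blocks, assuming templates use Python-style `{placeholder}` fields keyed by snake_case attribute names. The variant patterns and the function names are hypothetical; the spec only lists the modifiers (best, review, vs, for, how to), not their exact phrasing.

```python
from string import Formatter

# Hypothetical variant patterns, chosen for illustration only.
VARIANT_PATTERNS = ["best {kw}", "{kw} review"]


def substitute(template, dimensions):
    """Fill a template, or return None when a required placeholder has no value."""
    required = {field for _, field, _, _ in Formatter().parse(template) if field}
    if not required.issubset(dimensions):
        return None
    return template.format(**dimensions).lower()


def keywords_for_cluster(templates, dimensions):
    """Base keywords from templates, plus long-tail variants, without duplicates."""
    base = [kw for kw in (substitute(t, dimensions) for t in templates) if kw]
    variants = [p.format(kw=kw) for kw in base for p in VARIANT_PATTERNS]
    # dict.fromkeys removes exact duplicates while keeping first-seen order
    return list(dict.fromkeys(base + variants))


templates = [
    "{pet_type} {health_condition} treatment",
    "{pet_type} {health_condition} {brand}",  # skipped: no brand value supplied
]
dims = {"pet_type": "Dogs", "health_condition": "Arthritis"}
keywords = keywords_for_cluster(templates, dims)
```

Returning `None` for templates with unfilled placeholders keeps the caller simple and matches the "skip template if required attrs not present" behaviour described in the acceptance criteria.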
Step 1.5: Implement keyword generation AI logic
- Integrate template loading from SectorAttributeTemplate (01B)
- Implement keyword enrichment (search volume, difficulty, intent)
- Implement filtering & sorting by search volume
- Implement constraint validation (10-25 per cluster, 300-500 total)
- Implement global deduplication & conflict resolution
Step 1.6: Unit tests for keyword generation
- Test template substitution with various attribute combinations
- Test long-tail variant generation
- Test deduplication across clusters
- Test constraint validation
- Test conflict resolution (multi-cluster keywords)
Phase 2: Data Models & Service Layer (Week 2-3)
Step 2.1: Database migrations
- Create SAGKeyword model
- Add ManyToMany relations to SAGCluster (auto_generated_keywords, supplemented_keywords)
- Extend SAGBlueprint with denormalized JSON fields (attributes_json, clusters_json, taxonomy_plan, execution_priority)
- Extend SAGCluster with cluster_key, type, intersection_depth, viability_score, hub_title, supporting_content_plan
- Run and test migrations on dev database
Step 2.2: Implement blueprint_service.py
- Create sag/services/blueprint_service.py
- Implement assemble_blueprint() function with 8 steps
- Implement SAGBlueprint creation & status management
- Implement SAGAttribute creation from user input
- Implement SAGCluster creation from cluster formation results
- Implement SAGKeyword creation & assignment
- Implement taxonomy_plan generation
- Implement execution_priority generation
- Implement denormalized JSON population
Step 2.3: Unit tests for blueprint_service
- Test blueprint creation & status transitions
- Test attribute record creation
- Test cluster record creation with all fields
- Test keyword assignment to clusters
- Test taxonomy plan generation
- Test execution priority generation
- Test denormalized JSON accuracy
Phase 3: API Endpoints & Integration (Week 3-4)
Step 3.1: Implement cluster formation API endpoint
- Create POST /api/v1/blueprints/{blueprint_id}/clusters/form/
- Validate input attributes
- Call cluster_formation() AI function
- Return results with summary
- Error handling (400, 403, 422)
Step 3.2: Implement keyword generation API endpoint
- Create POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
- Validate input & cluster availability
- Call keyword_generation() AI function
- Return results with deduplication summary
- Error handling
Step 3.3: Implement keyword supplementation API endpoint
- Create POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
- Support multiple input sources (manual, CSV, library, API)
- Implement auto-clustering via map_keyword_to_clusters()
- Implement conflict resolution via resolve_keyword_conflict()
- Return summary of added, clustered, flagged keywords
Step 3.4: Implement blueprint assembly API endpoint
- Create POST /api/v1/blueprints/{blueprint_id}/assemble/
- Call blueprint_service.assemble_blueprint()
- Manage status transitions
- Return blueprint summary with next steps
Step 3.5: Implement read endpoints
- Create GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft
- Create GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=...
- Implement filtering & pagination
- Add ordering options
Step 3.6: Implement keyword removal endpoint
- Create DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
- Validate blueprint status (only 'draft' or 'keyword_generation_complete', per the endpoint spec in Section 3)
- Cascade delete as needed
Phase 4: Integration with 01D & Testing (Week 4-5)
Step 4.1: Integrate with Setup Wizard (01D)
- Call cluster_formation() after user populates attributes
- Display clusters to user for review (optional: allow edits)
- Call keyword_generation() if user confirms clusters
- Display keywords for review
- Allow manual supplementation before final assembly
Step 4.2: End-to-end testing
- Test full flow: attributes → clusters → keywords → blueprint
- Test with various sector/site_type combinations
- Test constraint enforcement
- Test conflict resolution with real scenarios
- Performance test with large attribute sets (100+ values)
Step 4.3: Integration with 01E (Pipeline Configuration)
- Verify blueprint is available to pipeline service
- Test taxonomy plan usage in content generation
- Test execution_priority ordering in pipeline
5. Acceptance Criteria
Cluster Formation AI Function (01C-CF)
- CF-1: Generates all 2-value intersections from populated attributes
- CF-2: Generates relevant 3-value intersections (at least 50% of possible combinations)
- CF-3: AI evaluates each intersection on 5 decision criteria (ecosystem, demand, content support, differentiation, clarity)
- CF-4: Classification assigns correct cluster type (product_category, condition_problem, feature, brand, informational, comparison)
- CF-5: Hub titles are specific, actionable, and 5-12 words long
- CF-6: Supporting content plans contain 5-8 titles, semantically related to hub, covering different angles
- CF-7: Viability scores accurately reflect cluster strength (0-1 scale, with clear rationale)
- CF-8: Hard constraint enforced: max 50 clusters per sector, sorted by viability score
- CF-9: Type distribution meets targets: Product/Service 40-50%, Condition/Problem 20-30%, Feature 10-15%, Brand 5-10%, Life Stage 5-10%
- CF-10: Clusters have 3+ dimensional intersections for strong coherence
- CF-11: No duplicative clusters (semantic coherence check prevents near-duplicates like "Dog Joint Health" + "Dog Arthritis")
- CF-12: API response includes summary with cluster count, type distribution, avg intersection depth
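The CF-9 distribution targets could be checked with a small helper like the one below. It is a sketch only: the function name is hypothetical, and mapping the labels "Product/Service" and "Life Stage" to the type keys `product_category` and `life_stage` is an assumption.

```python
from collections import Counter

# Target share ranges from CF-9; the type keys are assumed mappings of the labels.
TARGETS = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}


def check_type_distribution(clusters):
    """Report each type's share of all clusters and whether it is inside its target range."""
    counts = Counter(c["type"] for c in clusters)
    total = len(clusters)
    report = {}
    for cluster_type, (low, high) in TARGETS.items():
        share = counts[cluster_type] / total if total else 0.0
        report[cluster_type] = {"share": share, "in_range": low <= share <= high}
    return report


clusters = (
    [{"type": "product_category"}] * 18
    + [{"type": "condition_problem"}] * 10
    + [{"type": "feature"}] * 5
    + [{"type": "brand"}] * 3
    + [{"type": "life_stage"}] * 3
)
report = check_type_distribution(clusters)
```

A report like this fits naturally into the summary returned by the cluster formation endpoint, where out-of-range types can be surfaced as warnings rather than hard failures.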
Keyword Generation AI Function (01C-KG)
- KG-1: Loads keyword templates from SectorAttributeTemplate for correct site_type
- KG-2: Substitutes attribute values into templates to generate base keywords
- KG-3: Generates long-tail variants (best, review, vs, for, how to) for each base keyword
- KG-4: Deduplicates keywords across all clusters (no keyword appears twice)
- KG-5: Global deduplication identifies multi-cluster keywords and reassigns via conflict resolution
- KG-6: Per-cluster keyword count: 10-25 keywords (soft target 15)
- KG-7: Total keyword count: 300-500+ for site (configurable per sector)
- KG-8: Keywords enriched with search volume, difficulty, intent classification
- KG-9: API response includes per-cluster breakdown, deduplication summary, total keyword count
- KG-10: Handles missing attribute values gracefully (skips template if required attrs not present)
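One way to enforce the KG-6 per-cluster range is to rank by search volume, keep the top entries, and report whether the minimum was reached. The helper name and return shape below are assumptions made for illustration.

```python
def apply_cluster_limits(keywords, min_count=10, max_count=25):
    """Keep the highest-volume keywords up to max_count; flag clusters below min_count."""
    ranked = sorted(keywords, key=lambda kw: kw["search_volume"], reverse=True)
    kept = ranked[:max_count]
    return kept, len(kept) >= min_count


# 30 synthetic keywords with search volumes 1..30
candidates = [
    {"keyword_text": f"keyword {i}", "search_volume": i} for i in range(1, 31)
]
kept, meets_minimum = apply_cluster_limits(candidates)
```

Clusters that come back with `meets_minimum` set to `False` are natural candidates for manual supplementation rather than automatic rejection.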
Keyword Conflict Resolution (01C-CR)
- CR-1: Identifies keywords matching multiple clusters (≥2 matches)
- CR-2: Decision Criterion 1: assigns to cluster with most dimensional intersections
- CR-3: Decision Criterion 2 (tiebreaker): assigns to more specific cluster
- CR-4: Decision Criterion 3 (tiebreaker): assigns by primary user intent match
- CR-5: Decision Criterion 4 (last resort): flags for user review with clear reasoning
- CR-6: Reassignment logic preserves keyword integrity (no loss, duplication, or orphaning)
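The four-step cascade in CR-2 through CR-5 can be sketched as successive narrowing of the candidate clusters. `resolve_keyword_conflict()` is named in Step 3.3, but its signature here is an assumption, as are the `specificity` and `primary_intent` fields on each cluster.

```python
def resolve_keyword_conflict(keyword, candidates):
    """Narrow candidate clusters criterion by criterion; flag for review if still tied."""
    remaining = list(candidates)
    steps = [
        ("intersection_depth", lambda c: c["intersection_depth"]),  # CR-2
        ("specificity", lambda c: c["specificity"]),                # CR-3 (assumed score)
        ("intent_match", lambda c: int(c["primary_intent"] == keyword["intent"])),  # CR-4
    ]
    for reason, score in steps:
        best = max(score(c) for c in remaining)
        remaining = [c for c in remaining if score(c) == best]
        if len(remaining) == 1:
            return {"cluster_key": remaining[0]["cluster_key"], "decided_by": reason}
    # CR-5: nothing separated the candidates, so hand the decision to the user
    return {
        "cluster_key": None,
        "decided_by": "user_review",
        "candidates": [c["cluster_key"] for c in remaining],
    }


keyword = {"keyword_text": "dog arthritis treatment", "intent": "transactional"}
deep = {"cluster_key": "cluster_001", "intersection_depth": 3,
        "specificity": 2, "primary_intent": "informational"}
shallow = {"cluster_key": "cluster_002", "intersection_depth": 2,
           "specificity": 3, "primary_intent": "transactional"}
decision = resolve_keyword_conflict(keyword, [deep, shallow])
```

Recording `decided_by` alongside the winning cluster gives the "clear reasoning" that CR-5 asks for and makes the outcome auditable in tests.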
Blueprint Assembly Service (01C-BA)
- BA-1: Creates SAGBlueprint record with status='draft'
- BA-2: Creates SAGAttribute records from populated attributes (preserves name, values, is_primary flag)
- BA-3: Creates SAGCluster records from cluster formation output (all fields populated)
- BA-4: Creates SAGKeyword records from keyword generation output (all fields preserved)
- BA-5: Associates keywords to clusters via ManyToMany relations
- BA-6: Generates taxonomy_plan with WP categories (primary attributes) and tags (secondary)
- BA-7: Generates execution_priority with 3 phases: hubs first, supporting articles, term pages
- BA-8: Populates denormalized JSON fields (attributes_json, clusters_json) for fast queries
- BA-9: Returns blueprint ID and summary (attribute count, cluster count, keyword count, next steps)
- BA-10: Status transitions correctly: draft → ready_for_pipeline (or intermediate statuses as needed)
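BA-6 and BA-7 might be sketched as two pure functions over the assembled data. The output shapes are assumptions: here categories and tags are attribute names, hubs are ordered by viability score, and term pages are taken to be one per attribute value. None of these details is fixed by the criteria above.

```python
def build_taxonomy_plan(attributes):
    """Primary attributes become WP categories, secondary ones become tags (BA-6)."""
    return {
        "categories": [a["name"] for a in attributes if a.get("is_primary")],
        "tags": [a["name"] for a in attributes if not a.get("is_primary")],
    }


def build_execution_priority(clusters, attributes):
    """Three phases: hubs first, then supporting articles, then term pages (BA-7)."""
    ranked = sorted(clusters, key=lambda c: c["viability_score"], reverse=True)
    return [
        {"phase": 1, "content_type": "hub",
         "items": [c["hub_title"] for c in ranked]},
        {"phase": 2, "content_type": "supporting_article",
         "items": [title for c in ranked for title in c["supporting_content_plan"]]},
        {"phase": 3, "content_type": "term_page",  # assumed: one page per attribute value
         "items": [value for a in attributes for value in a["values"]]},
    ]


attributes = [
    {"name": "Pet Type", "values": ["Dogs", "Cats"], "is_primary": True},
    {"name": "Health Condition", "values": ["Arthritis"], "is_primary": False},
]
clusters = [
    {"hub_title": "Cat Allergy Guide", "viability_score": 0.70,
     "supporting_content_plan": ["Cat Allergy Symptoms", "Cat Allergy Diets"]},
    {"hub_title": "Best Arthritis Treatments for Dogs", "viability_score": 0.92,
     "supporting_content_plan": ["Dog Joint Supplements"]},
]
taxonomy_plan = build_taxonomy_plan(attributes)
execution_priority = build_execution_priority(clusters, attributes)
```

Keeping both builders free of database access means they can be unit tested in isolation and their output written straight into the denormalized JSON fields.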
Manual Keyword Supplementation (01C-MKS)
- MKS-1: Users can add keywords via: manual entry, CSV import, library selection, API fetch
- MKS-2: Manual entry accepts comma-separated keywords, validates against duplicates
- MKS-3: CSV import validates file structure (keyword, search_volume optional, difficulty optional)
- MKS-4: Library integration allows browsing & selection per site_type
- MKS-5: Auto-clustering maps new keywords to clusters via attribute similarity matching
- MKS-6: Unmatched keywords flagged for user review: gap analysis, potential new cluster, or outlier
- MKS-7: User can assign unmatched keywords to specific cluster or create new cluster
- MKS-8: API returns summary: added count, auto-clustered count, flagged count, conflicts resolved
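The CSV validation in MKS-3 could look roughly like this, using only the standard library. The function name, the choice to skip blank and repeated keywords silently, and the output field names are assumptions for illustration.

```python
import csv
import io


def parse_keyword_csv(text):
    """Validate the header, then return one dict per unique, non-empty keyword."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "keyword" not in reader.fieldnames:
        raise ValueError("CSV must have a 'keyword' column")
    rows, seen = [], set()
    for row in reader:
        keyword = (row["keyword"] or "").strip().lower()
        if not keyword or keyword in seen:
            continue  # skip blanks and duplicates within the file
        seen.add(keyword)
        volume = (row.get("search_volume") or "").strip()
        rows.append({
            "keyword_text": keyword,
            "search_volume": int(volume) if volume.isdigit() else None,
            "difficulty": (row.get("difficulty") or "").strip() or None,
            "source": "csv_import",
        })
    return rows


upload = (
    "keyword,search_volume,difficulty\n"
    "Dog Arthritis Treatment,1200,medium\n"
    "dog arthritis treatment,900,low\n"
    "cat allergies food,,\n"
)
imported = parse_keyword_csv(upload)
```

Optional columns resolve to `None` when absent or empty, so later enrichment can fill them in without distinguishing the two cases.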
API Endpoints (01C-API)
- API-1: POST /api/v1/blueprints/{blueprint_id}/clusters/form/ returns 200 + cluster results
- API-2: POST /api/v1/blueprints/{blueprint_id}/keywords/generate/ returns 200 + keyword results
- API-3: POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ returns 200 + supplementation summary
- API-4: POST /api/v1/blueprints/{blueprint_id}/assemble/ returns 200 + blueprint summary
- API-5: GET /api/v1/blueprints/{blueprint_id}/clusters/ supports status, type, min_viability filters
- API-6: GET /api/v1/blueprints/{blueprint_id}/keywords/ supports cluster_id, source, intent, min_search_volume filters
- API-7: DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/ only works before assembly (blueprint status 'draft' or 'keyword_generation_complete')
- API-8: Error handling: 400 (bad input), 403 (forbidden), 404 (not found), 422 (unprocessable)
Data Integrity (01C-DI)
- DI-1: No keyword appears in multiple clusters (enforced via unique_together in SAGKeyword)
- DI-2: Deleted clusters cascade-delete associated keywords (no orphaned keywords)
- DI-3: Deleted blueprints cascade-delete all attributes, clusters, keywords
- DI-4: Blueprint status transitions prevent invalid operations (e.g., can't supplement keywords on published blueprint)
- DI-5: Denormalized JSON fields stay in sync with normalized records (updated on every change)
Performance (01C-PERF)
- PERF-1: Cluster formation completes in <5 seconds for 100+ intersection combinations
- PERF-2: Keyword generation completes in <10 seconds for 50 clusters
- PERF-3: Blueprint assembly completes in <3 seconds (DB writes + JSON generation)
- PERF-4: GET endpoints with filters return results in <2 seconds
- PERF-5: CSV import (1000 keywords) completes in <15 seconds
6. Claude Code Instructions
6.1 Generating Cluster Formation Logic
Prompt Template for Claude:
Generate the cluster formation algorithm for an AI-powered content planning system.
Input:
- populated_attributes: List of attributes with values from user setup wizard
Example: [
{"name": "Pet Type", "values": ["Dogs", "Cats", "Birds"]},
{"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
]
- sector_context: Information about the sector (e.g., "pet health e-commerce")
Task:
1. Generate all meaningful 2-value intersections between different attributes (Pet Type × Health Condition, etc.)
2. For each intersection, use Claude's reasoning to evaluate:
- Is this a real topical ecosystem? (do the dimensions naturally fit together?)
- Would users search for this? (assess search demand)
- Can we build 1 hub + 3-8 supporting articles?
- Is it differentiated from other clusters?
3. Classify valid clusters by type: product_category, condition_problem, feature, brand, informational, comparison
4. Generate a compelling hub title and 5-8 supporting content titles
5. Assign a viability score (0-1) based on coherence, search demand, content potential
Output:
- clusters: Array of cluster objects with all fields from the spec
- summary: Total clusters, type distribution, viability analysis
Constraints:
- Max 50 clusters per sector
- Minimum 3 dimensional intersections for strong clusters
- Quality over quantity: prefer 5 strong clusters over 15 weak ones
6.2 Generating Keyword Generation Logic
Prompt Template for Claude:
Generate keywords for content clusters using templates and AI-driven expansion.
Input:
- clusters: Array of clusters from cluster formation (with dimensions and hub title)
- keyword_templates: Pre-configured templates for site_type
Example: [
"best {health_condition} for {pet_type}",
"{pet_type} {health_condition} treatment",
"affordable {health_condition} relief for {pet_type}"
]
- sector_context: Site type (ecommerce, blog, saas, etc.)
Task:
1. Load keyword templates filtered by sector site_type
2. For each cluster:
- Extract dimension values
- Substitute values into matching templates
- Generate long-tail variants: best, review, vs, for, how to
- Enrich with search volume, difficulty, intent (informational, transactional, etc.)
3. Deduplicate globally across all clusters
4. Identify multi-cluster keywords and resolve conflicts via:
- Highest dimensional intersection count
- Most specific cluster (tiebreaker)
- Primary user intent match (tiebreaker)
5. Validate constraints: 10-25 per cluster, 300-500 total
Output:
- keywords_per_cluster: Keywords organized by cluster ID
- deduplication: Count of duplicates removed, conflicts flagged
- summary: Total unique keywords, per-cluster average, search volume total
Constraints:
- Do NOT generate more than 25 keywords per cluster
- Do NOT allow duplicates
- Prioritize high search volume keywords
- Ensure diversity: mix of base keywords and long-tail variants
6.3 Integrating with Setup Wizard (01D)
Implementation Notes:
- After user completes attribute population in wizard:
  - Call POST /api/v1/blueprints/{blueprint_id}/clusters/form/
  - Display clusters to user (preview mode)
  - Allow user to: review, edit (rename hub titles, remove clusters), or confirm
- After user confirms clusters:
  - Call POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
  - Display keywords grouped by cluster (preview mode)
  - Allow user to: supplement keywords, remove outliers, or confirm
- Before finalizing blueprint:
  - Optionally allow manual keyword supplementation (CSV, library, manual entry)
  - Call POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ for each source
  - Resolve conflicts (auto or manual)
  - Call POST /api/v1/blueprints/{blueprint_id}/assemble/ to finalize
6.4 Testing with Sample Data
Test Case 1: Pet Health E-commerce Site
populated_attributes = [
{"name": "Pet Type", "values": ["Dogs", "Cats"]},
{"name": "Health Condition", "values": ["Arthritis", "Allergies", "Obesity"]},
{"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]}
]
sector_context = {
"sector_id": "pet_health",
"site_type": "ecommerce",
"sector_name": "Pet Health Products"
}
# Expected clusters:
# 1. Dog Arthritis Relief (product_category)
# 2. Cat Allergies Nutrition (product_category)
# 3. Senior Dog Joint Support (life_stage)
# ... etc.
Test Case 2: Local Service (Veterinary Clinic)
populated_attributes = [
{"name": "Service Type", "values": ["Surgery", "Preventive Care", "Emergency"]},
{"name": "Pet Type", "values": ["Dogs", "Cats", "Exotic"]},
{"name": "Location", "values": ["Downtown", "Suburbs"]}
]
sector_context = {
"sector_id": "vet_clinic",
"site_type": "local_service",
"sector_name": "Veterinary Clinic"
}
# Expected clusters:
# 1. Emergency Dog Surgery Downtown (local_service + product_category)
# 2. Preventive Cat Care Suburbs (informational + local_service)
# ... etc.
7. Cross-Document References
Upstream Dependencies
- 01A (SAG Master Data Models): Provides SAGBlueprint, SAGAttribute, SAGCluster base models
- 01B (Sector Attribute Templates): Provides attribute framework, keyword templates, site_type configurations
Downstream Consumers
- 01D (Setup Wizard): Triggers cluster formation & keyword generation after attribute population
- 01E (Blueprint-aware Pipeline): Uses clusters, keywords, taxonomy_plan, execution_priority for content generation
- 01F (Existing Site Analysis): May feed competitor/existing keywords into supplementation process
- 01G (Health Monitoring): Tracks cluster completeness, keyword coverage, content generation progress against blueprint
8. Appendix: Algorithm Complexity & Performance Estimates
Cluster Formation Complexity
- Input: N attributes with M average values each
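- Intersections Generated: O(N²·M²) for 2-value (one value from each of 2 distinct attributes), O(N³·M³) for 3-value; for a fixed, small N this reduces to O(M²) and O(M³)
- AI Evaluations: one function call per intersection, so O(N²·M²) or O(N³·M³) calls (largest cost)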
- Time Estimate: ~1-2 seconds per 100 intersections (depending on Claude API latency)
- Bottleneck: Claude API response time for viability evaluation
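To size a run before calling the AI, the number of candidate intersections can be computed directly. Assuming an intersection takes one value from each of d distinct attributes, the count is the sum over all d-attribute groups of the product of their value counts, which equals C(N, d) · M^d when every attribute has M values. The helper names are hypothetical, and the timing helper simply scales the ~1-2 seconds per 100 intersections estimate above.

```python
from itertools import combinations
from math import comb, prod


def count_intersections(value_counts, depth):
    """Sum, over every group of `depth` attributes, the product of their value counts."""
    return sum(prod(group) for group in combinations(value_counts, depth))


def estimate_seconds(n_intersections, per_hundred=(1.0, 2.0)):
    """Scale the ~1-2 s per 100 intersections estimate to a (low, high) range."""
    low, high = per_hundred
    return n_intersections / 100 * low, n_intersections / 100 * high


pairs = count_intersections([2, 2, 3], depth=2)   # 2*2 + 2*3 + 2*3 = 16
uniform = count_intersections([5] * 4, depth=3)   # C(4, 3) * 5**3 = 500
```

With four attributes of five values each, the 3-value case alone yields 500 candidates, so the estimate already lands around 5-10 seconds. That is a reason to evaluate in batches or prune before calling the AI.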
Keyword Generation Complexity
- Input: C clusters, T keyword templates per cluster
- Base Keywords: O(C × T) (template substitution)
- Long-tail Variants: O(C × T × V) where V ≈ 7 (base + 6 variants)
- Deduplication: O(K log K) where K = total keywords (sort-based)
- Time Estimate: ~3-5 seconds for 300+ keywords
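The sort-based O(K log K) deduplication mentioned above could be sketched like this. The normalization rule (lowercase, collapsed whitespace) and the function names are assumptions; a production version might also handle plurals or word order.

```python
from itertools import groupby


def normalize(keyword):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(keyword.lower().split())


def dedupe_sorted(keywords):
    """Sort once (O(K log K)), then drop adjacent repeats in a single pass."""
    ordered = sorted(normalize(k) for k in keywords)
    unique = [key for key, _ in groupby(ordered)]
    return unique, len(ordered) - len(unique)


unique, removed = dedupe_sorted(
    ["Dog  Arthritis Treatment", "dog arthritis treatment", "cat allergies food"]
)
```

A hash-set pass would be O(K) on average; the sorted form is shown because it matches the stated bound and yields a stable, alphabetized list for the deduplication summary.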
Blueprint Assembly Complexity
- DB Writes: O(A + C + K) where A=attributes, C=clusters, K=keywords
- JSON Generation: O(A + C + K) for denormalization
- Time Estimate: <1 second for typical blueprints (< 10 MB JSON)
Document Complete
Status: Ready for Development
Next Step: Implement Phase 1 (AI Functions) per Section 4