# IGNY8 Phase 1: Cluster Formation & Keyword Engine (Doc 01C)

**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** Phase 1 - Foundation & Intelligence
**Status:** Build Ready

---

## 1. Current State

### Existing Components

- **SAGBlueprint** (01A): Data model with status tracking, blueprint lifecycle management
- **SAGAttribute** & **SAGCluster** models (01A): Schema definitions for attributes and topic clusters
- **SectorAttributeTemplate** (01B): Pre-configured attribute framework with keyword templates per site_type
- **Setup Wizard** (01D): Collects sector, site_type, and populated attribute values from user
- **Blueprint Service** (01G - earlier iteration): Basic blueprint assembly, denormalization

### Current Limitations

- No automated cluster formation from attribute intersection logic
- No keyword generation from templates
- No conflict resolution for multi-cluster keyword assignments
- No cluster type classification (product, condition, feature, etc.)
- No validation of cluster viability (size, coherence, user demand)
- No hub title and supporting content plan generation

### Dependencies Ready

- ✅ Sector attribute templates loaded with keyword templates
- ✅ Setup wizard populates attributes
- ✅ Data models support cluster and keyword storage
- ✅ Blueprint lifecycle framework exists

---

## 2. What to Build

### 2.1 Cluster Formation AI Function

**File:** `sag/ai_functions/cluster_formation.py`
**Register Key:** `'form_clusters'`
**Triggering Context:** After the user populates attributes in the setup wizard; before keyword assignment

#### Input Contract

```python
{
    "populated_attributes": [
        {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
        {"name": "Pet Type", "values": ["Dogs", "Cats"]},
        {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "sector_name": str
    },
    "constraints": {
        "max_clusters": 50,  # hard cap per sector
        "min_keywords_per_cluster": 5,
        "max_keywords_per_cluster": 20,
        "optimal_keywords_per_cluster": "7-15"  # range, kept as a string
    }
}
```

#### Output Contract

```python
{
    "clusters": [
        {
            "id": "cluster_001",
            "title": "Dog Arthritis Relief Solutions",
            "type": "product_category",  # or condition_problem, feature, brand, informational, comparison
            "dimensions": {
                "primary": ["Pet Type: Dogs", "Health Condition: Arthritis"],
                "secondary": ["Target Audience: Pet Owners"]
            },
            "intersection_depth": 3,  # count of dimensional intersections
            "viability_score": 0.92,  # 0-1 based on coherence + demand assessment
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [
                "Senior Dog Arthritis: Causes & Prevention",
                "Dog Arthritis Medications: Complete Guide",
                "Physical Therapy Exercises for Dogs with Arthritis",
                "Diet Changes to Support Joint Health",
                "When to See a Vet About Dog Joint Pain"
            ],
            "keywords": [],  # populated in keyword generation phase
            "dimension_count": 3,
            "validation": {
                "is_real_topical_ecosystem": True,
                "has_search_demand": True,
                "can_support_content_plan": True,
                "sufficient_differentiation": True
            }
        },
        # ...
        # more clusters
    ],
    "summary": {
        "total_clusters_formed": 12,
        "type_distribution": {
            "product_category": 6,
            "condition_problem": 4,
            "feature": 1,
            "brand": 0,
            "informational": 1,
            "comparison": 0
        },
        "avg_intersection_depth": 2.3,
        "clusters_below_viability_threshold": 0
    }
}
```

#### Algorithm (Pseudocode)

```
FUNCTION form_clusters(populated_attributes, sector_context):

    # STEP 1: Generate all 2-value intersections
    all_intersections = []
    for each attribute_pair in populated_attributes (size=2):
        for value1 in attribute_pair[0].values:
            for value2 in attribute_pair[1].values:
                intersection = {
                    "dimensions": [value1, value2],
                    "attribute_names": [attribute_pair[0].name, attribute_pair[1].name]
                }
                all_intersections.append(intersection)

    # Also generate 3-value intersections for strong coherence
    for each attribute_triplet in populated_attributes (size=3):
        for value1 in attribute_triplet[0].values:
            for value2 in attribute_triplet[1].values:
                for value3 in attribute_triplet[2].values:
                    intersection = {
                        "dimensions": [value1, value2, value3],
                        "attribute_names": [
                            attribute_triplet[0].name,
                            attribute_triplet[1].name,
                            attribute_triplet[2].name
                        ]
                    }
                    all_intersections.append(intersection)

    # STEP 2: AI evaluates each intersection
    valid_clusters = []
    for intersection in all_intersections:
        evaluation = AI_EVALUATE_INTERSECTION(intersection, sector_context):
            - Is this a real topical ecosystem?
            - Would users search for this combination?
            - Can we build a hub + 3-10 supporting articles?
            - Is there sufficient differentiation from other clusters?
            - Does the combination make semantic sense?
        if evaluation.is_valid:
            # STEP 3: Classify cluster type
            cluster_type = AI_CLASSIFY_TYPE(intersection)
                → product_category, condition_problem, feature, brand, informational, comparison

            # STEP 4: Generate hub title + supporting content plan
            hub_title = AI_GENERATE_HUB_TITLE(intersection, sector_context)
            supporting_titles = AI_GENERATE_SUPPORTING_TITLES(
                hub_title, intersection, count=5-8
            )

            # Create cluster object
            cluster = {
                "dimensions": intersection.dimensions,
                "type": cluster_type,
                "viability_score": evaluation.confidence_score,
                "hub_title": hub_title,
                "supporting_content_plan": supporting_titles,
                "validation": evaluation
            }
            valid_clusters.append(cluster)

    # STEP 5: Apply constraints & filtering
    sorted_clusters = SORT_BY_VIABILITY_SCORE(valid_clusters)
    final_clusters = sorted_clusters[0:max_clusters]

    # STEP 6: Validate distribution & completeness
    distribution = CALCULATE_TYPE_DISTRIBUTION(final_clusters)
    # Flag if any type is severely under-represented
    if distribution.imbalance > THRESHOLD:
        LOG_WARNING("Type distribution may be suboptimal")

    # STEP 7: Return with summary
    return {
        "clusters": final_clusters,
        "summary": {
            "total_clusters_formed": len(final_clusters),
            "type_distribution": distribution,
            "viability_threshold_met": all clusters have score >= 0.70
        }
    }
END FUNCTION
```

#### AI Evaluation Criteria

For each intersection, the AI must answer:

1. **Real Topical Ecosystem?**
   - Do the dimensions naturally connect in user intent?
   - Is there an existing product/service/solution category?
   - Example: YES - "Dog Arthritis Relief" (real problem + real solutions)
   - Example: NO - "Vegetarian Chainsaw" (nonsensical combination)

2. **User Search Demand?**
   - Would users actively search for this combination?
   - Check: keyword templates, search volume patterns, user forums
   - Target: ≥500 monthly searches for hub keyword

3. **Content Support?**
   - Can we create 1 hub + 3-10 supporting articles?
   - Is there enough subtopic depth?
   - Example: YES - "Dog Arthritis" can have medication, exercise, diet, vet visits
   - Example: NO - "Red Dog Collar" (too niche, limited subtopics)

4. **Sufficient Differentiation?**
   - Does this cluster stand apart from others?
   - Avoid near-duplicate clusters (e.g., "Dog Joint Health" vs "Dog Arthritis")
   - Decision: merge or reject the weaker one

5. **Dimensional Clarity**
   - Do all dimensions contribute meaningfully?
   - Remove secondary dimensions that don't add coherence

#### Hard Constraints

- **Maximum Clusters:** 50 per sector (enforce in sorting/filtering)
- **Minimum Keywords per Cluster:** 5 (checked in keyword generation)
- **Maximum Keywords per Cluster:** 20 (checked in keyword generation)
- **Optimal Range:** 7-15 keywords per cluster
- **No Keyword Duplication:** Each keyword in exactly one cluster (enforced in conflict resolution)
- **Type Distribution Target:**
  - Product/Service Type: 40-50%
  - Condition/Problem: 20-30%
  - Feature: 10-15%
  - Brand: 5-10%
  - Life Stage/Audience: 5-10%

---

### 2.2 Keyword Auto-Generation AI Function

**File:** `sag/ai_functions/keyword_generation.py`
**Register Key:** `'generate_keywords'`
**Triggering Context:** After cluster formation; before blueprint assembly

#### Input Contract

```python
{
    "clusters": [  # output from cluster_formation
        {
            "id": "cluster_001",
            "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"],
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [...]
        }
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "site_intent": "sell|inform|book|download"
    },
    "keyword_templates": {  # loaded from SectorAttributeTemplate
        "template_001": "best {health_condition} for {pet_type}",
        "template_002": "{pet_type} {health_condition} treatment",
        # ...
        # more templates
    },
    "constraints": {
        "min_keywords_per_cluster": 10,
        "max_keywords_per_cluster": 25,
        "total_target": "300-500"
    }
}
```

#### Output Contract

```python
{
    "keywords_per_cluster": {
        "cluster_001": {
            "keywords": [
                {
                    "keyword": "best arthritis treatment for dogs",
                    "search_volume": 1200,
                    "difficulty": "medium",
                    "intent": "informational",
                    "generated_from": "template_001",
                    "variant_type": "long_tail"
                },
                {
                    "keyword": "dog arthritis remedies",
                    "search_volume": 800,
                    "difficulty": "easy",
                    "intent": "informational",
                    "generated_from": "template_002",
                    "variant_type": "base"
                },
                # ... 13-23 more keywords
            ],
            "keyword_count": 15,
            "primary_intent": "informational",
            "search_volume_total": 12500
        }
    },
    "deduplication": {
        "duplicates_removed": 8,
        "flagged_conflicts": 3  # keywords fitting multiple clusters
    },
    "summary": {
        "total_unique_keywords": 342,
        "per_cluster_avg": 14.25,
        "total_search_volume": 892000,
        "within_constraints": True
    }
}
```

#### Algorithm (Pseudocode)

```
FUNCTION generate_keywords(clusters, sector_context, keyword_templates):
    all_keywords = {}

    FOR EACH cluster IN clusters:
        # STEP 1: Extract attribute values from cluster dimensions
        attribute_values = EXTRACT_ATTRIBUTE_VALUES(cluster.dimensions)
        # Output: {"Pet Type": "Dogs", "Health Condition": "Arthritis", ...}

        cluster_keywords = []

        # STEP 2: Substitute values into templates
        FOR EACH template IN keyword_templates:
            # Check if template requires all attribute values present
            required_attrs = PARSE_TEMPLATE_VARIABLES(template)
            if ALL_ATTRS_AVAILABLE(required_attrs, attribute_values):
                # Substitute values
                base_keyword = SUBSTITUTE_VALUES(template, attribute_values)
                cluster_keywords.append({
                    "keyword": base_keyword,
                    "generated_from": template.id,
                    "variant_type": "base"
                })

        # STEP 3: Generate long-tail variants
        long_tail_variants = []
        FOR EACH base_keyword IN cluster_keywords:
            # "best arthritis treatment for dogs"
            variants = []

            # Variant: Add "best"
            variants.append("best " + base_keyword)

            # Variant: Add "review"
            variants.append(base_keyword + " review")

            # Variant: Add "vs" (comparison)
            if CLUSTER_TYPE in [product_category, comparison]:
                variants.append(base_keyword + " vs alternatives")

            # Variant: Add "for" (audience)
            variants.append(base_keyword + " for seniors")

            # Variant: Add "how to"
            variants.append("how to " + base_keyword)

            # Variant: Add "cost" (ecommerce intent)
            if site_intent == "sell":
                variants.append(base_keyword + " cost")

            FOR EACH variant IN variants:
                if NOT_DUPLICATE(variant, cluster_keywords):
                    cluster_keywords.append({
                        "keyword": variant,
                        "variant_type": "long_tail",
                        "parent": base_keyword
                    })

        # STEP 4: Enrich keywords with metadata
        enriched_keywords = []
        FOR EACH kw IN cluster_keywords:
            enriched = {
                "keyword": kw.keyword,
                "search_volume": ESTIMATE_SEARCH_VOLUME(kw.keyword, sector),
                "difficulty": ESTIMATE_DIFFICULTY(kw.keyword, sector),
                "intent": CLASSIFY_INTENT(kw.keyword),  # informational, transactional, navigational
                "generated_from": kw.generated_from,
                "variant_type": kw.variant_type
            }
            enriched_keywords.append(enriched)

        # STEP 5: Filter & sort
        filtered_keywords = SORT_BY_SEARCH_VOLUME(enriched_keywords)
        # Keep top 10-25 per cluster
        cluster_keywords_final = filtered_keywords[0:25]

        # Validate minimum
        if LEN(cluster_keywords_final) < 10:
            ADD_SUPPLEMENTARY_KEYWORDS(cluster_keywords_final, 5)

        all_keywords[cluster.id] = {
            "keywords": cluster_keywords_final,
            "keyword_count": len(cluster_keywords_final),
            "primary_intent": MODE(intent from all keywords),
            "search_volume_total": SUM(all search volumes)
        }

    # STEP 6: Global deduplication
    all_keywords_flat = FLATTEN(all_keywords)
    duplicates = FIND_DUPLICATES(all_keywords_flat)
    FOR EACH duplicate_set IN duplicates:
        primary_cluster = PRIMARY_CLUSTER(duplicate_set)  # best fit by dimensions
        REASSIGN_DUPLICATES_TO_PRIMARY(duplicate_set, primary_cluster)

    # STEP 7: Validate constraints
    total_keywords = SUM(keyword_count for each cluster)
    validation = {
        "within_min_per_cluster": all clusters >= 10,
        "within_max_per_cluster": all clusters <= 25,
        "total_within_target": total_keywords between 300-500,
        "no_duplicates": len(duplicates) == 0
    }
    if NOT validation.all_true:
        LOG_WARNING("Keyword generation constraints not fully met")

    # STEP 8: Return results
    return {
        "keywords_per_cluster": all_keywords,
        "deduplication": {
            "duplicates_removed": len(duplicates),
            "flagged_conflicts": identify_multi_cluster_fits()
        },
        "summary": {
            "total_unique_keywords": total_keywords,
            "per_cluster_avg": total_keywords / len(clusters),
            "total_search_volume": sum of all volumes,
            "within_constraints": validation.all_true
        }
    }
END FUNCTION
```

#### Keyword Template Structure (from SectorAttributeTemplate, 01B)

```python
# Example for Pet Health ecommerce site
keyword_templates = {
    "site_type": "ecommerce",
    "templates": [
        {
            "id": "template_001",
            "pattern": "best {health_condition} treatment for {pet_type}",
            "weight": 5,  # prioritize this template
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        {
            "id": "template_002",
            "pattern": "{pet_type} {health_condition} medication",
            "weight": 4,
            "min_required_attrs": ["pet_type", "health_condition"]
        },
        {
            "id": "template_003",
            "pattern": "affordable {health_condition} relief for {pet_type}",
            "weight": 3,
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        # ...
        # more templates
    ]
}
```

#### Long-tail Variant Rules

| Variant Type | Pattern | Use Case | Example |
|---|---|---|---|
| Base | {keyword} | All clusters | "dog arthritis relief" |
| Best/Top | best {keyword} | All clusters | "best dog arthritis relief" |
| Review | {keyword} review | Product clusters | "arthritis supplement for dogs review" |
| Comparison | {keyword} vs | Comparison intent | "arthritis medication vs supplement for dogs" |
| Audience | {keyword} for {audience} | Audience-specific | "dog arthritis relief for senior dogs" |
| How-to | how to {verb} {keyword} | Problem-solution | "how to manage dog arthritis" |
| Cost/Price | {keyword} cost | Ecommerce intent | "arthritis treatment for dogs cost" |
| Quick | fast {keyword} | Urgency-driven | "fast arthritis relief for dogs" |

---

### 2.3 Blueprint Assembly Service

**File:** `sag/services/blueprint_service.py`
**Primary Function:** `assemble_blueprint(site, attributes, clusters, keywords)`
**Triggering Context:** After keyword generation; creates SAGBlueprint (status=draft)

#### Input Contract

```python
def assemble_blueprint(
    site: Website,                            # from 01A
    attributes: List[Tuple[str, List[str]]],  # (name, values), user-populated
    clusters: List[Dict],                     # from cluster_formation()
    keywords: Dict[str, List[Dict]],          # cluster_id -> keywords, from generate_keywords()
):
    ...
```

#### Execution Steps

1. **Create SAGBlueprint Record**

   ```python
   blueprint = SAGBlueprint.objects.create(
       site=site,
       status='draft',
       phase='phase_1_foundation',
       sector_id=site.sector_id,
       created_by=current_user,
       metadata={
           'version': '1.0',
           'created_date': now(),
           'last_modified': now()
       }
   )
   ```

2. **Create SAGAttribute Records**

   ```python
   for attribute_name, values in attributes:
       attribute = SAGAttribute.objects.create(
           blueprint=blueprint,
           name=attribute_name,
           values=values,  # stored as JSON array
           is_primary=DETERMINE_PRIMACY(attribute_name, site.site_type),
           source='user_input'
       )
   ```

3.
   **Create SAGCluster Records from Formed Clusters**

   ```python
   for cluster in clusters:
       db_cluster = SAGCluster.objects.create(
           blueprint=blueprint,
           cluster_key=cluster['id'],
           title=cluster['hub_title'],
           description=GENERATE_CLUSTER_DESC(cluster),
           cluster_type=cluster['type'],
           dimensions=cluster['dimensions'],  # JSON
           intersection_depth=cluster['intersection_depth'],
           viability_score=cluster['viability_score'],
           hub_title=cluster['hub_title'],
           supporting_content_plan=cluster['supporting_content_plan'],  # JSON array
           status='draft',
           keyword_count=0  # updated in next step
       )
   ```

4. **Populate auto_generated_keywords on Each Cluster**

   ```python
   for cluster_id, keyword_list in keywords.items():
       # cluster_key is only unique per blueprint, so scope the lookup
       cluster = SAGCluster.objects.get(blueprint=blueprint, cluster_key=cluster_id)
       keyword_records = []
       for kw_data in keyword_list:
           keyword = SAGKeyword.objects.create(
               cluster=cluster,
               keyword_text=kw_data['keyword'],
               search_volume=kw_data['search_volume'],
               difficulty=kw_data['difficulty'],
               intent=kw_data['intent'],
               generated_from=kw_data['generated_from'],
               variant_type=kw_data['variant_type'],
               source='auto_generated'
           )
           keyword_records.append(keyword)
       cluster.auto_generated_keywords.set(keyword_records)
       cluster.keyword_count = len(keyword_records)
       cluster.save()
   ```

5. **Generate Taxonomy Plan**

   ```python
   taxonomy_plan = {
       'wp_categories': [],
       'wp_tags': [],
       'hierarchy': {}
   }
   for attribute in blueprint.sagattribute_set.all():
       if attribute.is_primary:
           category = {
               'name': attribute.name,
               'slug': slugify(attribute.name),
               'description': f"Posts about {attribute.name}"
           }
           taxonomy_plan['wp_categories'].append(category)
       else:
           # Build one tag per value (the tag must be created inside the loop)
           for v in attribute.values:
               tag = {
                   'name': v,
                   'slug': slugify(v),
                   # primary_attr_name is not defined in this snippet; it is
                   # assumed to hold the name of the parent primary attribute
                   'parent_category': primary_attr_name
               }
               taxonomy_plan['wp_tags'].append(tag)
   blueprint.taxonomy_plan = taxonomy_plan  # JSON field
   ```

6.
   **Generate Execution Priority (Phased Approach)**

   ```python
   execution_priority = {
       'phase': 'phase_1_hubs',
       'content_sequence': []
   }

   # Phase 1: Hub pages (1 per cluster)
   hub_items = []
   for cluster in blueprint.sagcluster_set.filter(status='draft'):
       hub_items.append({
           'type': 'hub_page',
           'cluster_id': cluster.id,
           'title': cluster.hub_title,
           'priority': 1,
           'estimated_effort': 'high',
           'SEO_impact': 'critical'
       })
   execution_priority['content_sequence'].extend(hub_items)

   # Phase 2: Supporting content (5-8 articles per cluster)
   supporting_items = []
   for cluster in blueprint.sagcluster_set.filter(status='draft'):
       for content_title in cluster.supporting_content_plan:
           supporting_items.append({
               'type': 'supporting_article',
               'cluster_id': cluster.id,
               'parent_hub': cluster.hub_title,
               'title': content_title,
               'priority': 2,
               'estimated_effort': 'medium',
               'SEO_impact': 'supporting'
           })
   execution_priority['content_sequence'].extend(supporting_items)

   # Phase 3: Term/pillar pages (keywords + long-tail)
   term_items = []
   for cluster in blueprint.sagcluster_set.filter(status='draft'):
       for keyword in cluster.auto_generated_keywords.all():
           term_items.append({
               'type': 'term_page',
               'cluster_id': cluster.id,
               'keyword': keyword.keyword_text,
               'priority': 3,
               'estimated_effort': 'low',
               'SEO_impact': 'supportive'
           })
   execution_priority['content_sequence'].extend(term_items)

   blueprint.execution_priority = execution_priority  # JSON field
   ```

7.
   **Populate Denormalized JSON Fields**

   ```python
   blueprint.attributes_json = {
       'total_attributes': blueprint.sagattribute_set.count(),
       'summary': [
           {
               'name': attr.name,
               'value_count': len(attr.values),
               'values': attr.values,
               'is_primary': attr.is_primary
           }
           for attr in blueprint.sagattribute_set.all()
       ]
   }
   blueprint.clusters_json = {
       'total_clusters': blueprint.sagcluster_set.count(),
       'summary': [
           {
               'id': cluster.cluster_key,
               'title': cluster.title,
               'type': cluster.cluster_type,
               'keyword_count': cluster.keyword_count,
               'viability_score': cluster.viability_score
           }
           for cluster in blueprint.sagcluster_set.all()
       ]
   }
   blueprint.save()
   ```

8. **Return Blueprint ID & Status**

   ```python
   return {
       'blueprint_id': blueprint.id,
       'status': 'draft',
       'created_at': blueprint.created_at,
       'summary': {
           'total_attributes': blueprint.sagattribute_set.count(),
           'total_clusters': blueprint.sagcluster_set.count(),
           'total_keywords': SAGKeyword.objects.filter(cluster__blueprint=blueprint).count(),
           'next_step': 'review blueprint in 01E (Pipeline Configuration)'
       }
   }
   ```

---

### 2.4 Manual Keyword Supplementation (User Interface)

#### Feature: Add Keywords from Multiple Sources

1. **IGNY8 Library Integration**
   - Users browse pre-curated keyword library per site_type
   - Select keywords → auto-map to clusters by attribute match
   - Unmatched keywords → flagged for review

2. **Manual Entry**
   - Form field: paste or type keywords (comma-separated)
   - System deduplicates against existing keywords
   - Prompts user to assign to cluster(s)

3. **CSV Import**
   - Upload CSV with columns: keyword, search_volume (optional), difficulty (optional)
   - Preview & validate before import
   - Bulk assign to clusters or mark for review

4.
   **Keyword API Integration** (optional in Phase 1)
   - Connect to SEMrush, Ahrefs, or similar
   - Fetch keyword suggestions for cluster dimensions
   - User approves additions

#### Keyword Mapping Logic

```python
def map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):
    matches = []
    for cluster in clusters:
        # Extract all attribute values from cluster dimensions
        cluster_attrs = EXTRACT_ATTRIBUTES(cluster.dimensions)

        # Calculate semantic similarity
        similarity = CALCULATE_SIMILARITY(new_keyword, cluster_attrs)

        if similarity > threshold:
            matches.append({
                'cluster_id': cluster.id,
                'cluster_title': cluster.title,
                'similarity_score': similarity
            })
    return matches  # May be 0, 1, or multiple matches
```

#### Conflict Resolution: Multi-Cluster Keyword Assignment

**Problem:** A keyword fits multiple clusters (e.g., "arthritis relief for pets" fits both Dog Cluster and Cat Cluster)

**Resolution Algorithm:**

1. **Identify Multi-Fit Keywords**

   ```python
   potential_conflicts = []
   for new_keyword in keywords_to_add:
       matching_clusters = map_keyword_to_clusters(new_keyword, all_clusters)
       if len(matching_clusters) > 1:
           potential_conflicts.append({
               'keyword': new_keyword,
               'matching_clusters': matching_clusters
           })
   ```

2.
   **Apply Decision Criteria (in order)**
   - **Criterion 1: Dimensional Intersection Count**
     - Assign to cluster with MOST dimensional intersections
     - Example: "dog arthritis relief" → Dog cluster has 3 dimensions (pet type, condition, audience); Cat cluster has 2 → assign to Dog cluster
   - **Criterion 2: Specificity**
     - If tied on intersection count, assign to MORE SPECIFIC cluster
     - Example: "arthritis relief" (general) vs "dog arthritis relief" (specific) → assign to Dog cluster
   - **Criterion 3: Primary User Intent Match**
     - If still tied, assign to cluster whose hub_title best matches user intent
     - Example: Both Dog & Cat clusters have "arthritis relief" hub; Dog hub is "Best Arthritis Treatments for Dogs" → assign to Dog
   - **Criterion 4: Last Resort - Create New Cluster**
     - If keyword doesn't fit any cluster well, flag as "potential_new_cluster"
     - User reviews and decides: split existing cluster, merge, or create new

3. **Implementation**

   ```python
   def resolve_keyword_conflict(keyword, matching_clusters):
       # Called only for conflicts, so matching_clusters has at least 2 entries.
       # Each candidate is expected to expose intersection_depth and hub_title.

       # Step 1: Compare intersection depth; a strictly deeper cluster wins
       sorted_by_depth = sorted(
           matching_clusters, key=lambda c: c.intersection_depth, reverse=True
       )
       if sorted_by_depth[0].intersection_depth > sorted_by_depth[1].intersection_depth:
           return sorted_by_depth[0]

       # Step 2: Compare specificity; return the top scorer only if it is a strict winner
       specificity_scores = [CALC_SPECIFICITY(cluster, keyword) for cluster in sorted_by_depth]
       top, runner_up = sorted(specificity_scores, reverse=True)[:2]
       if top > runner_up:
           return sorted_by_depth[specificity_scores.index(top)]

       # Step 3: Compare intent match, again requiring a strict winner
       intent_scores = [CALC_INTENT_MATCH(cluster.hub_title, keyword) for cluster in sorted_by_depth]
       top, runner_up = sorted(intent_scores, reverse=True)[:2]
       if top > runner_up:
           return sorted_by_depth[intent_scores.index(top)]

       # Step 4: Flag for user review
       return {
           'status': 'flagged_for_review',
           'keyword': keyword,
           'candidates': matching_clusters,
           'reason': 'ambiguous_assignment'
       }
   ```

---

## 3. Data Models / APIs

### 3.1 Database Models (Django ORM)

#### SAGBlueprint (existing from 01A, extended)

```python
from django.db import models

# Website and User are imported from their own apps (not shown here).


class SAGBlueprint(models.Model):
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('cluster_formation_complete', 'Cluster Formation Complete'),
        ('keyword_generation_complete', 'Keyword Generation Complete'),
        ('keyword_supplemented', 'Keywords Supplemented'),
        ('ready_for_pipeline', 'Ready for Pipeline'),
        ('published', 'Published'),
    )

    site = models.ForeignKey(Website, on_delete=models.CASCADE)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    phase = models.CharField(max_length=50, default='phase_1_foundation')
    sector_id = models.CharField(max_length=100)

    # Denormalized JSON for fast access
    attributes_json = models.JSONField(default=dict, blank=True)
    clusters_json = models.JSONField(default=dict, blank=True)
    taxonomy_plan = models.JSONField(default=dict, blank=True)
    execution_priority = models.JSONField(default=dict, blank=True)

    created_by = models.ForeignKey(User, on_delete=models.SET_NULL, null=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_blueprint'
        ordering = ['-created_at']
```

#### SAGAttribute (existing from 01A, no changes required)

```python
class SAGAttribute(models.Model):
    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    values = models.JSONField()  # array of strings
    is_primary = models.BooleanField(default=False)
    source = models.CharField(max_length=50)  # 'user_input', 'template', 'api'
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = 'sag_attribute'
        unique_together = ('blueprint', 'name')
```

#### SAGCluster (existing from 01A, extended)

```python
class SAGCluster(models.Model):
    TYPE_CHOICES = (
        ('product_category', 'Product/Service Category'),
        ('condition_problem', 'Condition/Problem'),
        ('feature', 'Feature'),
        ('brand', 'Brand'),
        ('informational', 'Informational'),
        ('comparison', 'Comparison'),
        ('life_stage', 'Life Stage/Audience'),
    )
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('validated', 'Validated'),
        ('keyword_assigned', 'Keywords Assigned'),
        ('content_created', 'Content Created'),
    )

    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    cluster_key = models.CharField(max_length=100)  # unique ID from cluster formation
    title = models.CharField(max_length=255)
    description = models.TextField(blank=True)
    cluster_type = models.CharField(max_length=50, choices=TYPE_CHOICES)
    dimensions = models.JSONField()  # ["dimension1", "dimension2", ...]
    intersection_depth = models.IntegerField()  # count of intersecting dimensions
    viability_score = models.FloatField()  # 0-1
    hub_title = models.CharField(max_length=255)
    supporting_content_plan = models.JSONField()  # array of content titles

    auto_generated_keywords = models.ManyToManyField(
        'SAGKeyword', related_name='clusters_auto', blank=True
    )
    supplemented_keywords = models.ManyToManyField(
        'SAGKeyword', related_name='clusters_supplemented', blank=True
    )
    keyword_count = models.IntegerField(default=0)

    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_cluster'
        unique_together = ('blueprint', 'cluster_key')
        ordering = ['-viability_score']
```

#### SAGKeyword (new)

```python
class SAGKeyword(models.Model):
    INTENT_CHOICES = (
        ('informational', 'Informational'),
        ('transactional', 'Transactional'),
        ('navigational', 'Navigational'),
        ('commercial', 'Commercial Intent'),
    )
    VARIANT_TYPES = (
        ('base', 'Base Keyword'),
        ('long_tail', 'Long-tail Variant'),
        ('brand', 'Brand Variant'),
        ('comparison', 'Comparison'),
        ('review', 'Review'),
        ('how_to', 'How-to'),
    )
    SOURCE_CHOICES = (
        ('auto_generated', 'Auto-Generated'),
        ('manual_entry', 'Manual Entry'),
        ('csv_import', 'CSV Import'),
        ('api_fetch', 'API Fetch'),
        ('library', 'IGNY8 Library'),
    )

    cluster = models.ForeignKey(
        SAGCluster, on_delete=models.CASCADE, related_name='all_keywords'
    )
    keyword_text = models.CharField(max_length=255)
    search_volume = models.IntegerField(null=True, blank=True)
    difficulty = models.CharField(max_length=50, blank=True)  # 'easy', 'medium', 'hard'
    intent = models.CharField(max_length=50, choices=INTENT_CHOICES)
    generated_from = models.CharField(max_length=100, blank=True)  # template ID or source
    variant_type = models.CharField(max_length=50, choices=VARIANT_TYPES)
    source = models.CharField(max_length=50, choices=SOURCE_CHOICES)
    cpc = models.FloatField(null=True, blank=True)  # if available from API
    competition = models.CharField(max_length=50, blank=True)  # 'low', 'medium', 'high'
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_keyword'
        unique_together = ('cluster', 'keyword_text')
        ordering = ['-search_volume']
```

---

### 3.2 API Endpoints

#### POST /api/v1/blueprints/{blueprint_id}/clusters/form/

**Purpose:** Trigger cluster formation AI function
**Authentication:** Required (JWT)

**Input:**

```json
{
  "populated_attributes": [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis"]}
  ],
  "max_clusters": 50
}
```

**Output:**

```json
{
  "clusters": [...],
  "summary": {
    "total_clusters_formed": 12,
    "type_distribution": {...}
  },
  "status": "success"
}
```

**Error Cases:**

- 400: Invalid attributes structure
- 403: Forbidden (requester is not the blueprint owner)
- 422: Insufficient attributes for cluster formation (< 2 dimensions)

---

#### POST /api/v1/blueprints/{blueprint_id}/keywords/generate/

**Purpose:** Trigger keyword generation AI function
**Authentication:** Required

**Input:**

```json
{
  "use_cluster_ids": ["cluster_001", "cluster_002"],
  "target_keywords_per_cluster": 15,
  "include_long_tail_variants": true
}
```

**Output:**

```json
{
  "keywords_per_cluster":
    {...},
  "deduplication": {
    "duplicates_removed": 5
  },
  "summary": {
    "total_unique_keywords": 180,
    "within_constraints": true
  }
}
```

---

#### POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/

**Purpose:** Add manual, CSV, library, or API-sourced keywords
**Authentication:** Required

**Input (Multiple Scenarios):**

**Scenario 1: Manual Entry**

```json
{
  "source": "manual_entry",
  "keywords": ["arthritis relief dogs", "joint pain dogs"],
  "cluster_id": "cluster_001"
}
```

**Scenario 2: CSV Import**

```json
{
  "source": "csv_import",
  "csv_url": "https://example.com/keywords.csv",
  "auto_cluster": true
}
```

**Scenario 3: Library Selection**

```json
{
  "source": "library",
  "library_keyword_ids": [123, 456, 789],
  "auto_cluster": true
}
```

**Output:**

```json
{
  "added_keywords": 10,
  "auto_clustered": 9,
  "flagged_for_review": 1,
  "conflicts_resolved": {
    "reassigned": 2,
    "deferred": 1
  }
}
```

---

#### POST /api/v1/blueprints/{blueprint_id}/assemble/

**Purpose:** Trigger blueprint assembly (create final SAGBlueprint with all records)
**Authentication:** Required

**Input:**

```json
{
  "finalize_keyword_review": true,
  "set_status": "ready_for_pipeline"
}
```

**Output:**

```json
{
  "blueprint_id": 42,
  "status": "ready_for_pipeline",
  "summary": {
    "total_attributes": 4,
    "total_clusters": 12,
    "total_keywords": 180,
    "execution_priority_phases": 3
  }
}
```

---

#### GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft&type=product_category

**Purpose:** List clusters with filtering

**Query Params:**

- `status`: draft, validated, keyword_assigned, content_created
- `type`: product_category, condition_problem, feature, brand, informational, comparison
- `min_viability`: 0.70
- `limit`: 50, `offset`: 0

**Output:**

```json
{
  "results": [
    {
      "id": 1,
      "cluster_key": "cluster_001",
      "title": "Dog Arthritis Relief Solutions",
      "hub_title": "Best Arthritis Treatments for Dogs",
      "keyword_count": 15,
      "viability_score": 0.92,
      "type": "product_category"
    }
  ],
  "total_count": 12,
  "total_keywords": 180
}
```

---

#### GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=cluster_001&source=auto_generated

**Purpose:** List keywords for a cluster

**Query Params:**

- `cluster_id`: filter by cluster
- `source`: auto_generated, manual_entry, csv_import, api_fetch, library
- `intent`: informational, transactional, navigational
- `min_search_volume`: 100
- `order_by`: search_volume (DESC), difficulty, intent

**Output:**

```json
{
  "results": [
    {
      "id": 1,
      "keyword_text": "best arthritis treatment for dogs",
      "search_volume": 1200,
      "difficulty": "medium",
      "intent": "informational",
      "variant_type": "long_tail",
      "source": "auto_generated"
    }
  ],
  "total_count": 15
}
```

---

#### DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/

**Purpose:** Remove a keyword (before assembly)
**Authentication:** Required
**Status:** Only available if blueprint.status='draft' or 'keyword_generation_complete'

---

## 4. Implementation Steps

### Phase 1: AI Functions Development (Week 1-2)

#### Step 1.1: Set up cluster_formation.py structure

- [ ] Create `sag/ai_functions/cluster_formation.py`
- [ ] Define input/output contracts
- [ ] Implement intersection generation logic (2-value, 3-value)
- [ ] Stub out AI evaluation function (ready for Claude integration)
- [ ] Implement constraint filtering & sorting

#### Step 1.2: Implement cluster formation AI logic

- [ ] Integrate Claude AI API for cluster viability evaluation
  - Real topical ecosystem check
  - User search demand validation
  - Content support assessment
  - Differentiation evaluation
- [ ] Implement cluster type classification (using embeddings or rule-based logic)
- [ ] Implement hub title & supporting content plan generation
- [ ] Add viability scoring (0-1 scale)
- [ ] Implement distribution validation

#### Step 1.3: Unit tests for cluster formation

- [ ] Test intersection generation (2-value, 3-value)
- [ ] Test AI evaluation with mock responses
- [ ] Test constraint filtering (max 50 clusters)
- [ ] Test type distribution analysis
- [ ] Test handling of edge cases (0 intersections, all rejected, etc.)

#### Step 1.4: Create keyword_generation.py structure

- [ ] Create `sag/ai_functions/keyword_generation.py`
- [ ] Define input/output contracts
- [ ] Implement template substitution logic
- [ ] Implement long-tail variant generation
- [ ] Implement deduplication logic

#### Step 1.5: Implement keyword generation AI logic

- [ ] Integrate template loading from SectorAttributeTemplate (01B)
- [ ] Implement keyword enrichment (search volume, difficulty, intent)
- [ ] Implement filtering & sorting by search volume
- [ ] Implement constraint validation (10-25 per cluster, 300-500 total)
- [ ] Implement global deduplication & conflict resolution

#### Step 1.6: Unit tests for keyword generation

- [ ] Test template substitution with various attribute combinations
- [ ] Test long-tail variant generation
- [ ] Test deduplication across clusters
- [ ] Test constraint validation
- [ ] Test conflict resolution (multi-cluster keywords)

---

### Phase 2: Data Models & Service Layer (Week 2-3)

#### Step 2.1: Database migrations

- [ ] Create SAGKeyword model
- [ ] Add ManyToMany relations to SAGCluster (auto_generated_keywords, supplemented_keywords)
- [ ] Extend SAGBlueprint with denormalized JSON fields (attributes_json, clusters_json, taxonomy_plan, execution_priority)
- [ ] Extend SAGCluster with cluster_key, type, intersection_depth, viability_score, hub_title, supporting_content_plan
- [ ] Run and test migrations on dev database

#### Step 2.2: Implement blueprint_service.py

- [ ] Create `sag/services/blueprint_service.py`
- [ ] Implement assemble_blueprint() function with 8 steps
- [ ] Implement SAGBlueprint creation & status management
- [ ] Implement SAGAttribute creation from user input
- [ ] Implement SAGCluster creation from cluster formation results
- [ ] Implement SAGKeyword creation & assignment
- [ ] Implement taxonomy_plan generation
- [ ] Implement
  execution_priority generation
- [ ] Implement denormalized JSON population

#### Step 2.3: Unit tests for blueprint_service
- [ ] Test blueprint creation & status transitions
- [ ] Test attribute record creation
- [ ] Test cluster record creation with all fields
- [ ] Test keyword assignment to clusters
- [ ] Test taxonomy plan generation
- [ ] Test execution priority generation
- [ ] Test denormalized JSON accuracy

---

### Phase 3: API Endpoints & Integration (Week 3-4)

#### Step 3.1: Implement cluster formation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/clusters/form/
- [ ] Validate input attributes
- [ ] Call cluster_formation() AI function
- [ ] Return results with summary
- [ ] Error handling (400, 403, 422)

#### Step 3.2: Implement keyword generation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
- [ ] Validate input & cluster availability
- [ ] Call keyword_generation() AI function
- [ ] Return results with deduplication summary
- [ ] Error handling

#### Step 3.3: Implement keyword supplementation API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
- [ ] Support multiple input sources (manual, CSV, library, API)
- [ ] Implement auto-clustering via map_keyword_to_clusters()
- [ ] Implement conflict resolution via resolve_keyword_conflict()
- [ ] Return summary of added, clustered, flagged keywords

#### Step 3.4: Implement blueprint assembly API endpoint
- [ ] Create POST /api/v1/blueprints/{blueprint_id}/assemble/
- [ ] Call blueprint_service.assemble_blueprint()
- [ ] Manage status transitions
- [ ] Return blueprint summary with next steps

#### Step 3.5: Implement read endpoints
- [ ] Create GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft
- [ ] Create GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=...
- [ ] Implement filtering & pagination
- [ ] Add ordering options

#### Step 3.6: Implement keyword removal endpoint
- [ ] Create DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
- [ ] Validate blueprint status (only draft)
- [ ] Cascade delete as needed

---

### Phase 4: Integration with 01D & Testing (Week 4-5)

#### Step 4.1: Integrate with Setup Wizard (01D)
- [ ] Call cluster_formation() after user populates attributes
- [ ] Display clusters to user for review (optional: allow edits)
- [ ] Call keyword_generation() if user confirms clusters
- [ ] Display keywords for review
- [ ] Allow manual supplementation before final assembly

#### Step 4.2: End-to-end testing
- [ ] Test full flow: attributes → clusters → keywords → blueprint
- [ ] Test with various sector/site_type combinations
- [ ] Test constraint enforcement
- [ ] Test conflict resolution with real scenarios
- [ ] Performance test with large attribute sets (100+ values)

#### Step 4.3: Integration with 01E (Pipeline Configuration)
- [ ] Verify blueprint is available to pipeline service
- [ ] Test taxonomy plan usage in content generation
- [ ] Test execution_priority ordering in pipeline

---

## 5. Acceptance Criteria

### Cluster Formation AI Function (01C-CF)
- [ ] **CF-1:** Generates all 2-value intersections from populated attributes
- [ ] **CF-2:** Generates relevant 3-value intersections (at least 50% of possible combinations)
- [ ] **CF-3:** AI evaluates each intersection on 5 decision criteria (ecosystem, demand, content support, differentiation, clarity)
- [ ] **CF-4:** Classification assigns correct cluster type (product_category, condition_problem, feature, brand, informational, comparison)
- [ ] **CF-5:** Hub titles are specific, actionable, and 5-12 words long
- [ ] **CF-6:** Supporting content plans contain 5-8 titles, semantically related to hub, covering different angles
- [ ] **CF-7:** Viability scores accurately reflect cluster strength (0-1 scale, with clear rationale)
- [ ] **CF-8:** Hard constraint enforced: max 50 clusters per sector, sorted by viability score
- [ ] **CF-9:** Type distribution meets targets: Product/Service 40-50%, Condition/Problem 20-30%, Feature 10-15%, Brand 5-10%, Life Stage 5-10%
- [ ] **CF-10:** Clusters have 3+ dimensional intersections for strong coherence
- [ ] **CF-11:** No duplicative clusters (semantic coherence check prevents near-duplicates like "Dog Joint Health" + "Dog Arthritis")
- [ ] **CF-12:** API response includes summary with cluster count, type distribution, avg intersection depth

### Keyword Generation AI Function (01C-KG)
- [ ] **KG-1:** Loads keyword templates from SectorAttributeTemplate for correct site_type
- [ ] **KG-2:** Substitutes attribute values into templates to generate base keywords
- [ ] **KG-3:** Generates long-tail variants (best, review, vs, for, how to) for each base keyword
- [ ] **KG-4:** Deduplicates keywords across all clusters (no keyword appears twice)
- [ ] **KG-5:** Global deduplication identifies multi-cluster keywords and reassigns via conflict resolution
- [ ] **KG-6:** Per-cluster keyword count: 10-25 keywords (soft target 15)
- [ ] **KG-7:** Total keyword count:
  300-500+ for site (configurable per sector)
- [ ] **KG-8:** Keywords enriched with search volume, difficulty, intent classification
- [ ] **KG-9:** API response includes per-cluster breakdown, deduplication summary, total keyword count
- [ ] **KG-10:** Handles missing attribute values gracefully (skips template if required attrs not present)

### Keyword Conflict Resolution (01C-CR)
- [ ] **CR-1:** Identifies keywords matching multiple clusters (≥2 matches)
- [ ] **CR-2:** Decision Criterion 1: assigns to cluster with most dimensional intersections
- [ ] **CR-3:** Decision Criterion 2 (tiebreaker): assigns to more specific cluster
- [ ] **CR-4:** Decision Criterion 3 (tiebreaker): assigns by primary user intent match
- [ ] **CR-5:** Decision Criterion 4 (last resort): flags for user review with clear reasoning
- [ ] **CR-6:** Reassignment logic preserves keyword integrity (no loss, duplication, or orphaning)

### Blueprint Assembly Service (01C-BA)
- [ ] **BA-1:** Creates SAGBlueprint record with status='draft'
- [ ] **BA-2:** Creates SAGAttribute records from populated attributes (preserves name, values, is_primary flag)
- [ ] **BA-3:** Creates SAGCluster records from cluster formation output (all fields populated)
- [ ] **BA-4:** Creates SAGKeyword records from keyword generation output (all fields preserved)
- [ ] **BA-5:** Associates keywords to clusters via ManyToMany relations
- [ ] **BA-6:** Generates taxonomy_plan with WP categories (primary attributes) and tags (secondary)
- [ ] **BA-7:** Generates execution_priority with 3 phases: hubs first, supporting articles, term pages
- [ ] **BA-8:** Populates denormalized JSON fields (attributes_json, clusters_json) for fast queries
- [ ] **BA-9:** Returns blueprint ID and summary (attribute count, cluster count, keyword count, next steps)
- [ ] **BA-10:** Status transitions correctly: draft → ready_for_pipeline (or intermediate statuses as needed)

### Manual Keyword Supplementation (01C-MKS)
- [ ] **MKS-1:** Users
  can add keywords via: manual entry, CSV import, library selection, API fetch
- [ ] **MKS-2:** Manual entry accepts comma-separated keywords, validates against duplicates
- [ ] **MKS-3:** CSV import validates file structure (keyword, search_volume optional, difficulty optional)
- [ ] **MKS-4:** Library integration allows browsing & selection per site_type
- [ ] **MKS-5:** Auto-clustering maps new keywords to clusters via attribute similarity matching
- [ ] **MKS-6:** Unmatched keywords flagged for user review: gap analysis, potential new cluster, or outlier
- [ ] **MKS-7:** User can assign unmatched keywords to specific cluster or create new cluster
- [ ] **MKS-8:** API returns summary: added count, auto-clustered count, flagged count, conflicts resolved

### API Endpoints (01C-API)
- [ ] **API-1:** POST /api/v1/blueprints/{blueprint_id}/clusters/form/ returns 200 + cluster results
- [ ] **API-2:** POST /api/v1/blueprints/{blueprint_id}/keywords/generate/ returns 200 + keyword results
- [ ] **API-3:** POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ returns 200 + supplementation summary
- [ ] **API-4:** POST /api/v1/blueprints/{blueprint_id}/assemble/ returns 200 + blueprint summary
- [ ] **API-5:** GET /api/v1/blueprints/{blueprint_id}/clusters/ supports status, type, min_viability filters
- [ ] **API-6:** GET /api/v1/blueprints/{blueprint_id}/keywords/ supports cluster_id, source, intent, min_search_volume filters
- [ ] **API-7:** DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/ only works on draft blueprints
- [ ] **API-8:** Error handling: 400 (bad input), 403 (forbidden), 404 (not found), 422 (unprocessable)

### Data Integrity (01C-DI)
- [ ] **DI-1:** No keyword appears in multiple clusters (enforced via unique_together in SAGKeyword)
- [ ] **DI-2:** Deleted clusters cascade-delete associated keywords (no orphaned keywords)
- [ ] **DI-3:** Deleted blueprints cascade-delete all attributes, clusters, keywords
- [ ] **DI-4:** Blueprint
  status transitions prevent invalid operations (e.g., can't supplement keywords on a published blueprint)
- [ ] **DI-5:** Denormalized JSON fields stay in sync with normalized records (updated on every change)

### Performance (01C-PERF)
- [ ] **PERF-1:** Cluster formation completes in <5 seconds for 100+ intersection combinations
- [ ] **PERF-2:** Keyword generation completes in <10 seconds for 50 clusters
- [ ] **PERF-3:** Blueprint assembly completes in <3 seconds (DB writes + JSON generation)
- [ ] **PERF-4:** GET endpoints with filters return results in <2 seconds
- [ ] **PERF-5:** CSV import (1000 keywords) completes in <15 seconds

---

## 6. Claude Code Instructions

### 6.1 Generating Cluster Formation Logic

**Prompt Template for Claude:**

```
Generate the cluster formation algorithm for an AI-powered content planning system.

Input:
- populated_attributes: List of attributes with values from user setup wizard
  Example: [
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Birds"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
  ]
- sector_context: Information about the sector (e.g., "pet health e-commerce")

Task:
1. Generate all meaningful 2-value intersections (Pet Type × Health Condition, Pet Type × Pet Type, etc.)
2. For each intersection, use Claude's reasoning to evaluate:
   - Is this a real topical ecosystem? (do the dimensions naturally fit together?)
   - Would users search for this? (assess search demand)
   - Can we build 1 hub + 3-8 supporting articles?
   - Is it differentiated from other clusters?
3. Classify valid clusters by type: product_category, condition_problem, feature, brand, informational, comparison
4. Generate a compelling hub title and 5-8 supporting content titles
5.
   Assign a viability score (0-1) based on coherence, search demand, content potential

Output:
- clusters: Array of cluster objects with all fields from the spec
- summary: Total clusters, type distribution, viability analysis

Constraints:
- Max 50 clusters per sector
- Minimum 3 dimensional intersections for strong clusters
- Quality over quantity: prefer 5 strong clusters over 15 weak ones
```

### 6.2 Generating Keyword Generation Logic

**Prompt Template for Claude:**

```
Generate keywords for content clusters using templates and AI-driven expansion.

Input:
- clusters: Array of clusters from cluster formation (with dimensions and hub title)
- keyword_templates: Pre-configured templates for site_type
  Example: [
    "best {health_condition} for {pet_type}",
    "{pet_type} {health_condition} treatment",
    "affordable {health_condition} relief for {pet_type}"
  ]
- sector_context: Site type (ecommerce, blog, saas, etc.)

Task:
1. Load keyword templates filtered by sector site_type
2. For each cluster:
   - Extract dimension values
   - Substitute values into matching templates
   - Generate long-tail variants: best, review, vs, for, how to
   - Enrich with search volume, difficulty, intent (informational, transactional, etc.)
3. Deduplicate globally across all clusters
4. Identify multi-cluster keywords and resolve conflicts via:
   - Highest dimensional intersection count
   - Most specific cluster (tiebreaker)
   - Primary user intent match (tiebreaker)
5. Validate constraints: 10-25 per cluster, 300-500 total

Output:
- keywords_per_cluster: Keywords organized by cluster ID
- deduplication: Count of duplicates removed, conflicts flagged
- summary: Total unique keywords, per-cluster average, search volume total

Constraints:
- Do NOT generate more than 25 keywords per cluster
- Do NOT allow duplicates
- Prioritize high search volume keywords
- Ensure diversity: mix of base keywords and long-tail variants
```

### 6.3 Integrating with Setup Wizard (01D)

**Implementation Notes:**

1.
   After user completes attribute population in wizard:
   - Call `POST /api/v1/blueprints/{blueprint_id}/clusters/form/`
   - Display clusters to user (preview mode)
   - Allow user to: review, edit (rename hub titles, remove clusters), or confirm

2. After user confirms clusters:
   - Call `POST /api/v1/blueprints/{blueprint_id}/keywords/generate/`
   - Display keywords grouped by cluster (preview mode)
   - Allow user to: supplement keywords, remove outliers, or confirm

3. Before finalizing blueprint:
   - Optionally allow manual keyword supplementation (CSV, library, manual entry)
   - Call `POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/` for each source
   - Resolve conflicts (auto or manual)
   - Call `POST /api/v1/blueprints/{blueprint_id}/assemble/` to finalize

### 6.4 Testing with Sample Data

**Test Case 1: Pet Health E-commerce Site**

```python
populated_attributes = [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Arthritis", "Allergies", "Obesity"]},
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]}
]

sector_context = {
    "sector_id": "pet_health",
    "site_type": "ecommerce",
    "sector_name": "Pet Health Products"
}

# Expected clusters:
# 1. Dog Arthritis Relief (product_category)
# 2. Cat Allergies Nutrition (product_category)
# 3. Senior Dog Joint Support (life_stage)
# ... etc.
```

**Test Case 2: Local Service (Veterinary Clinic)**

```python
populated_attributes = [
    {"name": "Service Type", "values": ["Surgery", "Preventive Care", "Emergency"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Exotic"]},
    {"name": "Location", "values": ["Downtown", "Suburbs"]}
]

sector_context = {
    "sector_id": "vet_clinic",
    "site_type": "local_service",
    "sector_name": "Veterinary Clinic"
}

# Expected clusters:
# 1. Emergency Dog Surgery Downtown (local_service + product_category)
# 2. Preventive Cat Care Suburbs (informational + local_service)
# ... etc.
```

---

## 7. Cross-Document References

### Upstream Dependencies
- **01A (SAG Master Data Models):** Provides SAGBlueprint, SAGAttribute, SAGCluster base models
- **01B (Sector Attribute Templates):** Provides attribute framework, keyword templates, site_type configurations

### Downstream Consumers
- **01D (Setup Wizard):** Triggers cluster formation & keyword generation after attribute population
- **01E (Blueprint-aware Pipeline):** Uses clusters, keywords, taxonomy_plan, execution_priority for content generation
- **01F (Existing Site Analysis):** May feed competitor/existing keywords into supplementation process
- **01G (Health Monitoring):** Tracks cluster completeness, keyword coverage, content generation progress against blueprint

---

## 8. Appendix: Algorithm Complexity & Performance Estimates

### Cluster Formation Complexity
- **Input:** N attributes with M average values each
- **Intersections Generated:** O(N²·M²) for 2-value, O(N³·M³) for 3-value (each intersection takes one value from each of 2 or 3 distinct attributes)
- **AI Evaluations:** One call per intersection, so O(N²·M²) to O(N³·M³) function calls (largest cost)
- **Time Estimate:** ~1-2 seconds per 100 intersections (depending on Claude API latency)
- **Bottleneck:** Claude API response time for viability evaluation

### Keyword Generation Complexity
- **Input:** C clusters, T keyword templates per cluster
- **Base Keywords:** O(C × T) (template substitution)
- **Long-tail Variants:** O(C × T × V) where V ≈ 7 (base + 6 variants)
- **Deduplication:** O(K log K) where K = total keywords (sort-based)
- **Time Estimate:** ~3-5 seconds for 300+ keywords

### Blueprint Assembly Complexity
- **DB Writes:** O(A + C + K) where A=attributes, C=clusters, K=keywords
- **JSON Generation:** O(A + C + K) for denormalization
- **Time Estimate:** <1 second for typical blueprints (< 10 MB JSON)

---

**Document Complete**

**Status:** Ready for Development

**Next Step:** Implement Phase 1 (AI Functions) per Section 4
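
**Illustrative sketch: counting intersections**

To make the intersection counts from the complexity appendix concrete, here is a minimal sketch that enumerates 2-value and 3-value intersections for the Test Case 1 attributes. It assumes an intersection takes one value from each of several distinct attributes. `generate_intersections` is a hypothetical helper name, not part of the spec, and the real cluster formation function would also run the AI viability evaluation on each candidate.

```python
from itertools import combinations, product


def generate_intersections(populated_attributes, depth):
    """Enumerate cross-attribute value combinations of the given depth.

    Hypothetical helper: each result takes one value from each of `depth`
    distinct attributes, formatted as "Attribute: Value" strings in the
    style of the `dimensions` field of the output contract.
    """
    results = []
    # Choose every group of `depth` distinct attributes...
    for attrs in combinations(populated_attributes, depth):
        value_lists = [
            [f"{attr['name']}: {value}" for value in attr["values"]]
            for attr in attrs
        ]
        # ...then take the Cartesian product of their values.
        results.extend(product(*value_lists))
    return results


populated_attributes = [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Arthritis", "Allergies", "Obesity"]},
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
]

pairs = generate_intersections(populated_attributes, 2)
triples = generate_intersections(populated_attributes, 3)
print(len(pairs), len(triples))  # prints: 16 12
```

For these three attributes (2, 3, and 2 values), the sketch yields 16 two-value and 12 three-value candidates, so the number of AI evaluations grows with both the attribute count and the number of values per attribute.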