IGNY8 Phase 1: Cluster Formation & Keyword Engine (Doc 01C)

Document Version: 1.0
Date: 2026-03-23
Phase: Phase 1 - Foundation & Intelligence
Status: Build Ready


1. Current State

Existing Components

  • SAGBlueprint (01A): Data model with status tracking, blueprint lifecycle management
  • SAGAttribute & SAGCluster models (01A): Schema definitions for attributes and topic clusters
  • SectorAttributeTemplate (01B): Pre-configured attribute framework with keyword templates per site_type
  • Setup Wizard (01D): Collects sector, site_type, and populated attribute values from user
  • Blueprint Service (01G - earlier iteration): Basic blueprint assembly, denormalization

Current Limitations

  • No automated cluster formation from attribute intersection logic
  • No keyword generation from templates
  • No conflict resolution for multi-cluster keyword assignments
  • No cluster type classification (product, condition, feature, etc.)
  • No validation of cluster viability (size, coherence, user demand)
  • No hub title and supporting content plan generation

Dependencies Ready

  • Sector attribute templates loaded with keyword templates
  • Setup wizard populates attributes
  • Data models support cluster and keyword storage
  • Blueprint lifecycle framework exists

2. What to Build

2.1 Cluster Formation AI Function

File: sag/ai_functions/cluster_formation.py
Register Key: 'form_clusters'
Triggering Context: After user populates attributes in setup wizard; before keyword assignment

Input Contract

{
    "populated_attributes": [
        {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
        {"name": "Pet Type", "values": ["Dogs", "Cats"]},
        {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "sector_name": str
    },
    "constraints": {
        "max_clusters": 50,  # hard cap per sector
        "min_keywords_per_cluster": 5,
        "max_keywords_per_cluster": 20,
        "optimal_keywords_per_cluster": "7-15"  # range as a string; a bare 7-15 would evaluate to -8
    }
}

Output Contract

{
    "clusters": [
        {
            "id": "cluster_001",
            "title": "Dog Arthritis Relief Solutions",
            "type": "product_category",  # or condition_problem, feature, brand, informational, comparison
            "dimensions": {
                "primary": ["Pet Type: Dogs", "Health Condition: Arthritis"],
                "secondary": ["Target Audience: Pet Owners"]
            },
            "intersection_depth": 3,  # count of dimensional intersections
            "viability_score": 0.92,  # 0-1 based on coherence + demand assessment
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [
                "Senior Dog Arthritis: Causes & Prevention",
                "Dog Arthritis Medications: Complete Guide",
                "Physical Therapy Exercises for Dogs with Arthritis",
                "Diet Changes to Support Joint Health",
                "When to See a Vet About Dog Joint Pain"
            ],
            "keywords": [],  # populated in keyword generation phase
            "dimension_count": 3,
            "validation": {
                "is_real_topical_ecosystem": true,
                "has_search_demand": true,
                "can_support_content_plan": true,
                "sufficient_differentiation": true
            }
        },
        # ... more clusters
    ],
    "summary": {
        "total_clusters_formed": 12,
        "type_distribution": {
            "product_category": 6,
            "condition_problem": 4,
            "feature": 1,
            "brand": 0,
            "informational": 1,
            "comparison": 0
        },
        "avg_intersection_depth": 2.3,
        "clusters_below_viability_threshold": 0
    }
}
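
The summary block of the output contract can be derived mechanically from the cluster list. The following is a minimal sketch, assuming clusters are plain dicts shaped like the contract above; the function name `summarize_clusters` and the `viability_threshold` parameter are illustrative, with 0.70 taken from the viability check in the algorithm below.

```python
from collections import Counter

# Cluster types as listed in the output contract.
CLUSTER_TYPES = [
    "product_category", "condition_problem", "feature",
    "brand", "informational", "comparison",
]

def summarize_clusters(clusters, viability_threshold=0.70):
    """Build the `summary` block from a list of cluster dicts."""
    counts = Counter(c["type"] for c in clusters)
    depths = [c["intersection_depth"] for c in clusters]
    return {
        "total_clusters_formed": len(clusters),
        "type_distribution": {t: counts.get(t, 0) for t in CLUSTER_TYPES},
        "avg_intersection_depth": round(sum(depths) / len(depths), 1) if depths else 0.0,
        "clusters_below_viability_threshold": sum(
            1 for c in clusters if c["viability_score"] < viability_threshold
        ),
    }

sample_clusters = [
    {"type": "product_category", "intersection_depth": 3, "viability_score": 0.92},
    {"type": "condition_problem", "intersection_depth": 2, "viability_score": 0.81},
    {"type": "product_category", "intersection_depth": 2, "viability_score": 0.65},
]
summary = summarize_clusters(sample_clusters)
```

Every type appears in `type_distribution` even when its count is zero, which keeps the shape stable for downstream consumers.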

Algorithm (Pseudocode)

FUNCTION form_clusters(populated_attributes, sector_context):

    # STEP 1: Generate all 2-value intersections
    # (one value from each of two distinct attributes)
    all_intersections = []
    for each attribute_pair in COMBINATIONS(populated_attributes, size=2):
        for value1 in attribute_pair[0].values:
            for value2 in attribute_pair[1].values:
                intersection = {
                    "dimensions": [value1, value2],
                    "attribute_names": [attribute_pair[0].name, attribute_pair[1].name]
                }
                all_intersections.append(intersection)

    # Also generate 3-value intersections for strong coherence
    for each attribute_triplet in COMBINATIONS(populated_attributes, size=3):
        for value1 in attribute_triplet[0].values:
            for value2 in attribute_triplet[1].values:
                for value3 in attribute_triplet[2].values:
                    intersection = {
                        "dimensions": [value1, value2, value3],
                        "attribute_names": [attribute_triplet[0].name,
                                            attribute_triplet[1].name,
                                            attribute_triplet[2].name]
                    }
                    all_intersections.append(intersection)

    # STEP 2: AI evaluates each intersection
    valid_clusters = []
    for intersection in all_intersections:
        evaluation = AI_EVALUATE_INTERSECTION(intersection, sector_context):
            - Is this a real topical ecosystem?
            - Would users search for this combination?
            - Can we build a hub + 3-10 supporting articles?
            - Is there sufficient differentiation from other clusters?
            - Does the combination make semantic sense?

        if evaluation.is_valid:
            # STEP 3: Classify cluster type
            cluster_type = AI_CLASSIFY_TYPE(intersection)
                → product_category, condition_problem, feature, brand,
                  informational, comparison

            # STEP 4: Generate hub title + supporting content plan
            hub_title = AI_GENERATE_HUB_TITLE(intersection, sector_context)
            supporting_titles = AI_GENERATE_SUPPORTING_TITLES(
                hub_title,
                intersection,
                count=5 to 8
            )

            # Create cluster object
            cluster = {
                "dimensions": intersection.dimensions,
                "type": cluster_type,
                "viability_score": evaluation.confidence_score,
                "hub_title": hub_title,
                "supporting_content_plan": supporting_titles,
                "validation": evaluation
            }
            valid_clusters.append(cluster)

    # STEP 5: Apply constraints & filtering
    sorted_clusters = SORT_BY_VIABILITY_SCORE(valid_clusters)  # highest score first
    final_clusters = sorted_clusters[0:max_clusters]  # max_clusters comes from the input constraints

    # STEP 6: Validate distribution & completeness
    distribution = CALCULATE_TYPE_DISTRIBUTION(final_clusters)

    # Flag if any type is severely under-represented
    if distribution.imbalance > THRESHOLD:
        LOG_WARNING("Type distribution may be suboptimal")

    # STEP 7: Return with summary (keys match the Output Contract)
    return {
        "clusters": final_clusters,
        "summary": {
            "total_clusters_formed": len(final_clusters),
            "type_distribution": distribution,
            "avg_intersection_depth": AVERAGE(intersection_depth of final_clusters),
            "clusters_below_viability_threshold": COUNT(clusters with viability_score < 0.70)
        }
    }

END FUNCTION
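
Step 1 of the algorithm above can be sketched in Python with `itertools`. This is an illustration only: it assumes attributes arrive as the dicts shown in the Input Contract, and the function name `generate_intersections` and its `sizes` parameter are hypothetical. Dimensions are emitted in the "Attribute: Value" form used by the Output Contract.

```python
from itertools import combinations, product

def generate_intersections(populated_attributes, sizes=(2, 3)):
    """Enumerate value intersections across distinct attributes."""
    intersections = []
    for size in sizes:
        # Choose `size` distinct attributes, then one value from each of them.
        for attrs in combinations(populated_attributes, size):
            names = [a["name"] for a in attrs]
            for values in product(*(a["values"] for a in attrs)):
                intersections.append({
                    "dimensions": [f"{n}: {v}" for n, v in zip(names, values)],
                    "attribute_names": names,
                })
    return intersections

attributes = [
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]},
]
candidates = generate_intersections(attributes)
```

For the sample input this yields 16 two-value and 12 three-value candidates. The count grows multiplicatively with the number of attribute values, so every candidate costs one AI evaluation in Step 2.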

AI Evaluation Criteria

For each intersection, the AI must answer:

  1. Real Topical Ecosystem?

    • Do the dimensions naturally connect in user intent?
    • Is there an existing product/service/solution category?
    • Example: YES - "Dog Arthritis Relief" (real problem + real solutions)
    • Example: NO - "Vegetarian Chainsaw" (nonsensical combination)
  2. User Search Demand?

    • Would users actively search for this combination?
    • Check: keyword templates, search volume patterns, user forums
    • Target: ≥500 monthly searches for hub keyword
  3. Content Support?

    • Can we create 1 hub + 3-10 supporting articles?
    • Is there enough subtopic depth?
    • Example: YES - "Dog Arthritis" can have medication, exercise, diet, vet visits
    • Example: NO - "Red Dog Collar" (too niche, limited subtopics)
  4. Sufficient Differentiation?

    • Does this cluster stand apart from others?
    • Avoid near-duplicate clusters (e.g., "Dog Joint Health" vs "Dog Arthritis")
    • Decision: merge or reject the weaker one
  5. Dimensional Clarity

    • Do all dimensions contribute meaningfully?
    • Remove secondary dimensions that don't add coherence

Hard Constraints

  • Maximum Clusters: 50 per sector (enforce in sorting/filtering)
  • Minimum Keywords per Cluster: 5 (checked in keyword generation)
  • Maximum Keywords per Cluster: 20 (checked in keyword generation)
  • Optimal Range: 7-15 keywords per cluster
  • No Keyword Duplication: Each keyword in exactly one cluster (enforced in conflict resolution)
  • Type Distribution Target:
    • Product/Service Type: 40-50%
    • Condition/Problem: 20-30%
    • Feature: 10-15%
    • Brand: 5-10%
    • Life Stage/Audience: 5-10%

2.2 Keyword Auto-Generation AI Function

File: sag/ai_functions/keyword_generation.py
Register Key: 'generate_keywords'
Triggering Context: After cluster formation; before blueprint assembly

Input Contract

{
    "clusters": [  # output from cluster_formation
        {
            "id": "cluster_001",
            "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"],
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [...]
        }
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "site_intent": "sell|inform|book|download"
    },
    "keyword_templates": {  # loaded from SectorAttributeTemplate
        "template_001": "best {health_condition} treatment for {pet_type}",
        "template_002": "{pet_type} {health_condition} treatment",
        # ... more templates
    },
    "constraints": {
        "min_keywords_per_cluster": 10,
        "max_keywords_per_cluster": 25,
        "total_target": "300-500"
    }
}

Output Contract

{
    "keywords_per_cluster": {
        "cluster_001": {
            "keywords": [
                {
                    "keyword": "best arthritis treatment for dogs",
                    "search_volume": 1200,
                    "difficulty": "medium",
                    "intent": "informational",
                    "generated_from": "template_001",
                    "variant_type": "long_tail"
                },
                {
                    "keyword": "dog arthritis remedies",
                    "search_volume": 800,
                    "difficulty": "easy",
                    "intent": "informational",
                    "generated_from": "template_002",
                    "variant_type": "base"
                },
                # ... more keywords
            ],
            "keyword_count": 15,
            "primary_intent": "informational",
            "search_volume_total": 12500
        }
    },
    "deduplication": {
        "duplicates_removed": 8,
        "flagged_conflicts": 3  # keywords fitting multiple clusters
    },
    "summary": {
        "total_unique_keywords": 342,
        "per_cluster_avg": 14.25,
        "total_search_volume": 892000,
        "within_constraints": true
    }
}

Algorithm (Pseudocode)

FUNCTION generate_keywords(clusters, sector_context, keyword_templates):

    all_keywords = {}

    FOR EACH cluster IN clusters:

        # STEP 1: Extract attribute values from cluster dimensions
        attribute_values = EXTRACT_ATTRIBUTE_VALUES(cluster.dimensions)
        # Output: {"Pet Type": "Dogs", "Health Condition": "Arthritis", ...}

        cluster_keywords = []

        # STEP 2: Substitute values into templates
        FOR EACH template IN keyword_templates:

            # Check if template requires all attribute values present
            required_attrs = PARSE_TEMPLATE_VARIABLES(template)
            if ALL_ATTRS_AVAILABLE(required_attrs, attribute_values):

                # Substitute values
                base_keyword = SUBSTITUTE_VALUES(template, attribute_values)
                cluster_keywords.append({
                    "keyword": base_keyword,
                    "generated_from": template.id,
                    "variant_type": "base"
                })

        # STEP 3: Generate long-tail variants
        # Iterate over a snapshot of the base keywords so that appended
        # variants are not expanded again.
        base_keywords = COPY(cluster_keywords)

        FOR EACH base IN base_keywords:

            # e.g. base.keyword = "best arthritis treatment for dogs"
            variants = []

            # Variant: Add "best" (skip if the keyword already starts with it)
            if NOT STARTS_WITH(base.keyword, "best "):
                variants.append("best " + base.keyword)

            # Variant: Add "review"
            variants.append(base.keyword + " review")

            # Variant: Add "vs" (comparison)
            if cluster.type in [product_category, comparison]:
                variants.append(base.keyword + " vs alternatives")

            # Variant: Add "for" (audience)
            variants.append(base.keyword + " for seniors")

            # Variant: Add "how to" + verb, e.g. "how to manage dog arthritis"
            variants.append("how to " + PICK_VERB(cluster) + " " + base.keyword)

            # Variant: Add "cost" (ecommerce intent)
            if sector_context.site_intent == "sell":
                variants.append(base.keyword + " cost")

            FOR EACH variant IN variants:
                if NOT_DUPLICATE(variant, cluster_keywords):
                    cluster_keywords.append({
                        "keyword": variant,
                        "generated_from": base.generated_from,
                        "variant_type": "long_tail",
                        "parent": base.keyword
                    })

        # STEP 4: Enrich keywords with metadata
        enriched_keywords = []
        FOR EACH kw IN cluster_keywords:
            enriched = {
                "keyword": kw.keyword,
                "search_volume": ESTIMATE_SEARCH_VOLUME(kw.keyword, sector_context),
                "difficulty": ESTIMATE_DIFFICULTY(kw.keyword, sector_context),
                "intent": CLASSIFY_INTENT(kw.keyword),  # informational, transactional, navigational
                "generated_from": kw.generated_from,
                "variant_type": kw.variant_type
            }
            enriched_keywords.append(enriched)

        # STEP 5: Filter & sort
        filtered_keywords = SORT_BY_SEARCH_VOLUME(enriched_keywords)

        # Keep top 10-25 per cluster
        cluster_keywords_final = filtered_keywords[0:25]

        # Validate minimum: top up to 10 if the cluster falls short
        if LEN(cluster_keywords_final) < 10:
            ADD_SUPPLEMENTARY_KEYWORDS(cluster_keywords_final, 10 - LEN(cluster_keywords_final))

        all_keywords[cluster.id] = {
            "keywords": cluster_keywords_final,
            "keyword_count": len(cluster_keywords_final),
            "primary_intent": MODE(intent from all keywords),
            "search_volume_total": SUM(all search volumes)
        }

    # STEP 6: Global deduplication
    all_keywords_flat = FLATTEN(all_keywords)
    duplicates = FIND_DUPLICATES(all_keywords_flat)

    FOR EACH duplicate_set IN duplicates:
        primary_cluster = PRIMARY_CLUSTER(duplicate_set)  # best fit by dimensions
        REASSIGN_DUPLICATES_TO_PRIMARY(duplicate_set, primary_cluster)

    # Reassignment removes keywords from non-primary clusters,
    # so refresh each cluster's keyword_count before validating
    RECOMPUTE_KEYWORD_COUNTS(all_keywords)

    # STEP 7: Validate constraints
    total_keywords = SUM(keyword_count for each cluster)

    validation = {
        "within_min_per_cluster": all clusters >= 10,
        "within_max_per_cluster": all clusters <= 25,
        "total_within_target": total_keywords between 300-500,
        "no_duplicates": len(FIND_DUPLICATES(FLATTEN(all_keywords))) == 0  # re-checked after reassignment
    }

    if NOT validation.all_true:
        LOG_WARNING("Keyword generation constraints not fully met")

    # STEP 8: Return results
    return {
        "keywords_per_cluster": all_keywords,
        "deduplication": {
            "duplicates_removed": len(duplicates),
            "flagged_conflicts": identify_multi_cluster_fits()
        },
        "summary": {
            "total_unique_keywords": total_keywords,
            "per_cluster_avg": total_keywords / len(clusters),
            "total_search_volume": sum of all volumes,
            "within_constraints": validation.all_true
        }
    }

END FUNCTION
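
Step 6 (global deduplication) can be sketched as follows. The document does not define how PRIMARY_CLUSTER scores fit, so the sketch takes a caller-supplied `fit_score` function. All names here (`normalize`, `deduplicate_across_clusters`, `token_fit`, `DIMENSION_TOKENS`) are hypothetical, and ties go to the cluster seen first.

```python
def normalize(keyword):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(keyword.lower().split())

def deduplicate_across_clusters(keywords_by_cluster, fit_score):
    """Keep each keyword in exactly one cluster: the one with the highest fit."""
    owner = {}  # normalized keyword -> (cluster_id, score)
    for cluster_id, keywords in keywords_by_cluster.items():
        for keyword in keywords:
            key = normalize(keyword)
            score = fit_score(cluster_id, key)
            if key not in owner or score > owner[key][1]:
                owner[key] = (cluster_id, score)

    deduped = {cluster_id: [] for cluster_id in keywords_by_cluster}
    removed = 0
    for cluster_id, keywords in keywords_by_cluster.items():
        kept = set()
        for keyword in keywords:
            key = normalize(keyword)
            if owner[key][0] == cluster_id and key not in kept:
                deduped[cluster_id].append(keyword)
                kept.add(key)
            else:
                removed += 1
    return deduped, removed

# Illustrative fit function: count dimension tokens present in the keyword.
DIMENSION_TOKENS = {
    "cluster_dogs": {"dogs", "arthritis"},
    "cluster_cats": {"cats", "arthritis"},
}

def token_fit(cluster_id, keyword):
    return len(DIMENSION_TOKENS[cluster_id] & set(keyword.split()))

generated = {
    "cluster_dogs": ["arthritis treatment for dogs", "arthritis relief for pets"],
    "cluster_cats": ["arthritis treatment for cats", "Arthritis relief for pets",
                     "arthritis treatment for dogs"],
}
deduped, removed = deduplicate_across_clusters(generated, token_fit)
```

Keywords that tie on fit, such as "arthritis relief for pets" here, are the ones the conflict resolution rules in section 2.4 are meant to settle.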

Keyword Template Structure (from SectorAttributeTemplate, 01B)

# Example for Pet Health ecommerce site
keyword_templates = {
    "site_type": "ecommerce",
    "templates": [
        {
            "id": "template_001",
            "pattern": "best {health_condition} treatment for {pet_type}",
            "weight": 5,  # prioritize this template
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        {
            "id": "template_002",
            "pattern": "{pet_type} {health_condition} medication",
            "weight": 4,
            "min_required_attrs": ["pet_type", "health_condition"]
        },
        {
            "id": "template_003",
            "pattern": "affordable {health_condition} relief for {pet_type}",
            "weight": 3,
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        # ... more templates
    ]
}
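
Template substitution (Steps 1 and 2 of the keyword algorithm) can be sketched with the standard library's `string.Formatter`, which exposes the placeholder names in a pattern. This is a minimal illustration under stated assumptions: attribute names map to placeholders by lowercasing and replacing spaces with underscores, and the helper names `template_variables`, `attr_key` and `render_templates` are hypothetical.

```python
from string import Formatter

def template_variables(pattern):
    """Return the placeholder names used in a pattern, e.g. {'pet_type'}."""
    return {field for _, field, _, _ in Formatter().parse(pattern) if field}

def attr_key(name):
    """Map an attribute name to a placeholder key: 'Pet Type' -> 'pet_type'."""
    return name.strip().lower().replace(" ", "_")

def render_templates(templates, dimensions):
    """Render every template whose placeholders are all covered by the cluster."""
    values = {}
    for dimension in dimensions:  # e.g. "Pet Type: Dogs"
        name, _, value = dimension.partition(":")
        values[attr_key(name)] = value.strip().lower()

    keywords = []
    # Higher-weight templates are rendered first.
    for template in sorted(templates, key=lambda t: -t["weight"]):
        if template_variables(template["pattern"]).issubset(values):
            keywords.append({
                "keyword": template["pattern"].format(**values),
                "generated_from": template["id"],
                "variant_type": "base",
            })
    return keywords

templates = [
    {"id": "template_002", "pattern": "{pet_type} {health_condition} medication", "weight": 4},
    {"id": "template_001", "pattern": "best {health_condition} treatment for {pet_type}", "weight": 5},
    {"id": "template_x", "pattern": "{brand} for {pet_type}", "weight": 1},  # hypothetical
]
rendered = render_templates(templates, ["Pet Type: Dogs", "Health Condition: Arthritis"])
```

The substitution is literal, so "Dogs" renders as "dogs" in every position ("dogs arthritis medication"). Singular and plural handling would need an extra normalization step that this sketch leaves out.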

Long-tail Variant Rules

| Variant Type | Pattern | Use Case | Example |
|---|---|---|---|
| Base | {keyword} | All clusters | "dog arthritis relief" |
| Best/Top | best {keyword} | All clusters | "best dog arthritis relief" |
| Review | {keyword} review | Product clusters | "arthritis supplement for dogs review" |
| Comparison | {keyword} vs {alternative} | Comparison intent | "arthritis medication vs supplement for dogs" |
| Audience | {keyword} for {audience} | Audience-specific | "dog arthritis relief for senior dogs" |
| How-to | how to {verb} {keyword} | Problem-solution | "how to manage dog arthritis" |
| Cost/Price | {keyword} cost | Ecommerce intent | "arthritis treatment for dogs cost" |
| Quick | fast {keyword} | Urgency-driven | "fast arthritis relief for dogs" |
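
The simpler rows of the variant rules can be applied with plain string formatting. The sketch below is illustrative only: the function name `long_tail_variants` and its parameters are hypothetical, and the Comparison and How-to rows are left out because they need an extra term (an alternative or a verb) that has to come from elsewhere.

```python
def long_tail_variants(base_keyword, cluster_type, site_intent, audience=None):
    """Apply the Best, Review, Audience, Cost and Quick rules to one base keyword."""
    variants = []
    # Best/Top: skip when the base keyword already starts with "best".
    if not base_keyword.startswith("best "):
        variants.append(f"best {base_keyword}")
    # Review: product clusters only.
    if cluster_type == "product_category":
        variants.append(f"{base_keyword} review")
    # Audience: only when an audience value is supplied.
    if audience:
        variants.append(f"{base_keyword} for {audience}")
    # Cost/Price: ecommerce (sell) intent only.
    if site_intent == "sell":
        variants.append(f"{base_keyword} cost")
    # Quick: urgency-driven prefix.
    variants.append(f"fast {base_keyword}")
    return variants

product_variants = long_tail_variants(
    "dog arthritis relief", "product_category", "sell", audience="senior dogs"
)
info_variants = long_tail_variants(
    "best arthritis treatment for dogs", "informational", "inform"
)
```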

2.3 Blueprint Assembly Service

File: sag/services/blueprint_service.py
Primary Function: assemble_blueprint(site, attributes, clusters, keywords)
Triggering Context: After keyword generation; creates SAGBlueprint (status=draft)

Input Contract

assemble_blueprint(
    site: Website,  # from 01A
    attributes: List[Tuple[str, List[str]]],  # (name, values), user-populated
    clusters: List[Dict],  # from cluster_formation()
    keywords: Dict[str, List[Dict]]  # cluster_id -> keyword dicts, from generate_keywords()
)

Execution Steps

  1. Create SAGBlueprint Record

    blueprint = SAGBlueprint.objects.create(
        site=site,
        status='draft',
        phase='phase_1_foundation',
        sector_id=site.sector_id,
        created_by=current_user,
        metadata={
            'version': '1.0',
            'created_date': now(),
            'last_modified': now()
        }
    )
    
  2. Create SAGAttribute Records

    FOR EACH (attribute_name, values) IN attributes:
        attribute = SAGAttribute.objects.create(
            blueprint=blueprint,
            name=attribute_name,
            values=values,  # stored as JSON array
            is_primary=DETERMINE_PRIMACY(attribute_name, site.site_type),
            source='user_input'
        )
    
  3. Create SAGCluster Records from Formed Clusters

    FOR EACH cluster IN clusters:
        db_cluster = SAGCluster.objects.create(
            blueprint=blueprint,
            cluster_key=cluster['id'],
            title=cluster['hub_title'],
            description=GENERATE_CLUSTER_DESC(cluster),
            cluster_type=cluster['type'],
            dimensions=cluster['dimensions'],  # JSON
            intersection_depth=cluster['intersection_depth'],
            viability_score=cluster['viability_score'],
            hub_title=cluster['hub_title'],
            supporting_content_plan=cluster['supporting_content_plan'],  # JSON array
            status='draft',
            keyword_count=0  # updated in next step
        )
    
  4. Populate auto_generated_keywords on Each Cluster

    FOR EACH (cluster_id, keyword_list) IN keywords.items():
        # cluster_key is only unique within a blueprint, so scope the lookup
        cluster = SAGCluster.objects.get(blueprint=blueprint, cluster_key=cluster_id)
    
        keyword_records = []
        FOR EACH kw_data IN keyword_list:
            keyword = SAGKeyword.objects.create(
                cluster=cluster,
                keyword_text=kw_data['keyword'],
                search_volume=kw_data['search_volume'],
                difficulty=kw_data['difficulty'],
                intent=kw_data['intent'],
                generated_from=kw_data['generated_from'],
                variant_type=kw_data['variant_type'],
                source='auto_generated'
            )
            keyword_records.append(keyword)
    
        cluster.auto_generated_keywords.set(keyword_records)
        cluster.keyword_count = len(keyword_records)
        cluster.save()
    
  5. Generate Taxonomy Plan

    taxonomy_plan = {
        'wp_categories': [],
        'wp_tags': [],
        'hierarchy': {}
    }
    
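    # Assumption: tags are parented to the first primary attribute, if any
    primary_attr = blueprint.sagattribute_set.filter(is_primary=True).first()
    primary_attr_name = primary_attr.name if primary_attr else None
    
    FOR EACH attribute IN blueprint.sagattribute_set.all():
        if attribute.is_primary:
            category = {
                'name': attribute.name,
                'slug': slugify(attribute.name),
                'description': f"Posts about {attribute.name}"
            }
            taxonomy_plan['wp_categories'].append(category)
        else:
            # Build one tag per value inside the loop, where v is defined
            FOR EACH v IN attribute.values:
                tag = {
                    'name': v,
                    'slug': slugify(v),
                    'parent_category': primary_attr_name
                }
                taxonomy_plan['wp_tags'].append(tag)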
    
    blueprint.taxonomy_plan = taxonomy_plan  # JSON field
    
  6. Generate Execution Priority (Phased Approach)

    execution_priority = {
        'phase': 'phase_1_hubs',
        'content_sequence': []
    }
    
    # Phase 1: Hub pages (1 per cluster)
    hub_items = []
    FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
        hub_items.append({
            'type': 'hub_page',
            'cluster_id': cluster.id,
            'title': cluster.hub_title,
            'priority': 1,
            'estimated_effort': 'high',
            'SEO_impact': 'critical'
        })
    
    execution_priority['content_sequence'].extend(hub_items)
    
    # Phase 2: Supporting content (5-8 articles per cluster)
    supporting_items = []
    FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
        FOR EACH content_title IN cluster.supporting_content_plan:
            supporting_items.append({
                'type': 'supporting_article',
                'cluster_id': cluster.id,
                'parent_hub': cluster.hub_title,
                'title': content_title,
                'priority': 2,
                'estimated_effort': 'medium',
                'SEO_impact': 'supporting'
            })
    
    execution_priority['content_sequence'].extend(supporting_items)
    
    # Phase 3: Term/pillar pages (keywords + long-tail)
    term_items = []
    FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
        FOR EACH keyword IN cluster.auto_generated_keywords.all():
            term_items.append({
                'type': 'term_page',
                'cluster_id': cluster.id,
                'keyword': keyword.keyword_text,
                'priority': 3,
                'estimated_effort': 'low',
                'SEO_impact': 'supportive'
            })
    
    execution_priority['content_sequence'].extend(term_items)
    
    blueprint.execution_priority = execution_priority  # JSON field
    
  7. Populate Denormalized JSON Fields

    blueprint.attributes_json = {
        'total_attributes': blueprint.sagattribute_set.count(),
        'summary': [
            {
                'name': attr.name,
                'value_count': len(attr.values),
                'values': attr.values,
                'is_primary': attr.is_primary
            }
            FOR EACH attr IN blueprint.sagattribute_set.all()
        ]
    }
    
    blueprint.clusters_json = {
        'total_clusters': blueprint.sagcluster_set.count(),
        'summary': [
            {
                'id': cluster.cluster_key,
                'title': cluster.title,
                'type': cluster.cluster_type,
                'keyword_count': cluster.keyword_count,
                'viability_score': cluster.viability_score
            }
            FOR EACH cluster IN blueprint.sagcluster_set.all()
        ]
    }
    
    blueprint.save()
    
  8. Return Blueprint ID & Status

    return {
        'blueprint_id': blueprint.id,
        'status': 'draft',
        'created_at': blueprint.created_at,
        'summary': {
            'total_attributes': blueprint.sagattribute_set.count(),
            'total_clusters': blueprint.sagcluster_set.count(),
            'total_keywords': SAGKeyword.objects.filter(cluster__blueprint=blueprint).count(),
            'next_step': 'review blueprint in 01E (Pipeline Configuration)'
        }
    }
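
The phased ordering from step 6 can be exercised without the ORM. The following sketch assumes clusters are plain dicts and keywords are plain strings keyed by cluster ID; the function name `build_content_sequence` is hypothetical, and the effort and SEO impact fields are omitted for brevity.

```python
def build_content_sequence(clusters, keywords_by_cluster):
    """Order content as all hub pages, then supporting articles, then term pages."""
    sequence = []
    # Phase 1: one hub page per cluster.
    for cluster in clusters:
        sequence.append({
            "type": "hub_page", "cluster_id": cluster["id"],
            "title": cluster["hub_title"], "priority": 1,
        })
    # Phase 2: supporting articles from each cluster's content plan.
    for cluster in clusters:
        for title in cluster["supporting_content_plan"]:
            sequence.append({
                "type": "supporting_article", "cluster_id": cluster["id"],
                "parent_hub": cluster["hub_title"], "title": title, "priority": 2,
            })
    # Phase 3: one term page per keyword.
    for cluster in clusters:
        for keyword in keywords_by_cluster.get(cluster["id"], []):
            sequence.append({
                "type": "term_page", "cluster_id": cluster["id"],
                "keyword": keyword, "priority": 3,
            })
    return sequence

demo_clusters = [
    {"id": "cluster_001", "hub_title": "Best Arthritis Treatments for Dogs",
     "supporting_content_plan": ["Dog Arthritis Medications: Complete Guide",
                                 "Diet Changes to Support Joint Health"]},
    {"id": "cluster_002", "hub_title": "Cat Allergy Relief",  # illustrative title
     "supporting_content_plan": ["Common Cat Allergens"]},
]
demo_keywords = {"cluster_001": ["dog arthritis remedies", "best arthritis treatment for dogs"]}
content_sequence = build_content_sequence(demo_clusters, demo_keywords)
```

Because each phase loops over every cluster before the next phase starts, all hubs are scheduled before any supporting article, regardless of cluster order.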
    

2.4 Manual Keyword Supplementation (User Interface)

Feature: Add Keywords from Multiple Sources

  1. IGNY8 Library Integration

    • Users browse pre-curated keyword library per site_type
    • Select keywords → auto-map to clusters by attribute match
    • Unmatched keywords → flagged for review
  2. Manual Entry

    • Form field: paste or type keywords (comma-separated)
    • System deduplicates against existing
    • Prompts user to assign to cluster(s)
  3. CSV Import

    • Upload CSV with columns: keyword, search_volume (optional), difficulty (optional)
    • Preview & validate before import
    • Bulk assign to clusters or mark for review
  4. Keyword API Integration (optional in Phase 1)

    • Connect to SEMrush, Ahrefs, or similar
    • Fetch keyword suggestions for cluster dimensions
    • User approves additions
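
The CSV import path (source 3 above) can be sketched with the standard `csv` module. This assumes the column names given above (keyword, search_volume, difficulty); the function name `parse_keyword_csv` and the rejection reasons are hypothetical, and cluster assignment is left to the mapping logic that follows.

```python
import csv
import io

def parse_keyword_csv(text, existing=()):
    """Validate CSV rows; return (accepted, rejected) for a preview step."""
    seen = {k.strip().lower() for k in existing}
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        keyword = (row.get("keyword") or "").strip()
        volume = (row.get("search_volume") or "").strip()
        if not keyword:
            rejected.append((line_no, "missing keyword"))
        elif keyword.lower() in seen:
            rejected.append((line_no, "duplicate"))
        elif volume and not volume.isdigit():
            rejected.append((line_no, "invalid search_volume"))
        else:
            seen.add(keyword.lower())
            accepted.append({
                "keyword": keyword,
                "search_volume": int(volume) if volume else None,
                "difficulty": (row.get("difficulty") or "").strip(),
                "source": "csv_import",
            })
    return accepted, rejected

upload = (
    "keyword,search_volume,difficulty\n"
    "dog arthritis relief,800,easy\n"
    "cat allergy treatment,,\n"
    "Dog Arthritis Relief,900,hard\n"
    "joint supplement,abc,\n"
)
accepted, rejected = parse_keyword_csv(upload)
```

Returning rejected rows with their line numbers supports the "preview & validate before import" step without writing anything to the database.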

Keyword Mapping Logic

FUNCTION map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):

    matches = []

    FOR EACH cluster IN clusters:

        # Extract all attribute values from cluster dimensions
        cluster_attrs = EXTRACT_ATTRIBUTES(cluster.dimensions)

        # Calculate semantic similarity
        similarity = CALCULATE_SIMILARITY(new_keyword, cluster_attrs)

        if similarity > threshold:
            matches.append({
                'cluster_id': cluster.id,
                'cluster_title': cluster.title,
                'similarity_score': similarity
            })

    return matches  # May be 0, 1, or multiple matches

END FUNCTION
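
The document leaves CALCULATE_SIMILARITY undefined. As a stand-in, the sketch below uses token containment: the share of a cluster's attribute-value tokens that appear in the keyword. This is an assumption for illustration, not the intended semantic model, and the helper names `_tokens` and `attribute_overlap` are hypothetical. An embedding-based similarity could be swapped in behind the same interface.

```python
import re

def _tokens(text):
    """Lowercase word tokens with trailing 's' stripped (crude plural folding)."""
    return {word.rstrip("s") for word in re.findall(r"[a-z0-9]+", text.lower())}

def attribute_overlap(keyword, dimensions):
    """Share of the cluster's attribute-value tokens found in the keyword."""
    # "Pet Type: Dogs" -> "Dogs"; strings without a colon are used whole.
    values = [d.partition(":")[2] or d for d in dimensions]
    attr_tokens = set().union(*(_tokens(v) for v in values))
    if not attr_tokens:
        return 0.0
    return len(_tokens(keyword) & attr_tokens) / len(attr_tokens)

def map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):
    """Return clusters whose overlap with the keyword exceeds the threshold."""
    matches = []
    for cluster in clusters:
        similarity = attribute_overlap(new_keyword, cluster["dimensions"])
        if similarity > threshold:
            matches.append({
                "cluster_id": cluster["id"],
                "cluster_title": cluster["title"],
                "similarity_score": similarity,
            })
    return sorted(matches, key=lambda m: -m["similarity_score"])

pet_clusters = [
    {"id": "cluster_dogs", "title": "Dog Arthritis Relief Solutions",
     "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"]},
    {"id": "cluster_cats", "title": "Cat Arthritis Relief Solutions",  # illustrative
     "dimensions": ["Pet Type: Cats", "Health Condition: Arthritis"]},
]
specific = map_keyword_to_clusters("dog arthritis relief", pet_clusters)
generic = map_keyword_to_clusters("arthritis relief for pets", pet_clusters)
```

With this measure a generic keyword such as "arthritis relief for pets" scores 0.5 against both clusters and matches neither at the default threshold, so it would be flagged for review rather than silently assigned.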

Conflict Resolution: Multi-Cluster Keyword Assignment

Problem: A keyword fits multiple clusters (e.g., "arthritis relief for pets" fits both Dog Cluster and Cat Cluster)

Resolution Algorithm:

  1. Identify Multi-Fit Keywords

    potential_conflicts = []
    FOR EACH new_keyword IN keywords_to_add:
        matching_clusters = map_keyword_to_clusters(new_keyword, all_clusters)
        if len(matching_clusters) > 1:
            potential_conflicts.append({
                'keyword': new_keyword,
                'matching_clusters': matching_clusters
            })
    
  2. Apply Decision Criteria (in order)

    • Criterion 1: Dimensional Intersection Count

      • Assign to cluster with MOST dimensional intersections
      • Example: "arthritis relief for pets" → Dog cluster has 3 dimensions (pet type, condition, audience); Cat cluster has 2 → assign to Dog cluster
    • Criterion 2: Specificity

      • If tied on intersection count, assign to MORE SPECIFIC cluster
      • Example: "arthritis relief" (general) vs "dog arthritis relief" (specific) → assign to Dog cluster
    • Criterion 3: Primary User Intent Match

      • If still tied, assign to cluster whose hub_title best matches user intent
      • Example: Both Dog & Cat clusters have "arthritis relief" hub; Dog hub is "Best Arthritis Treatments for Dogs" → assign to Dog
    • Criterion 4: Last Resort - Create New Cluster

      • If keyword doesn't fit any cluster well, flag as "potential_new_cluster"
      • User reviews and decides: split existing cluster, merge, or create new
  3. Implementation

    FUNCTION resolve_keyword_conflict(keyword, matching_clusters):
    
        # Step 1: Compare intersection depth
        sorted_by_depth = SORT_BY(matching_clusters, 'intersection_depth', DESC)
        top_depth = sorted_by_depth[0].intersection_depth
    
        if top_depth > sorted_by_depth[1].intersection_depth:
            return sorted_by_depth[0]
    
        # Only clusters tied at the top depth remain candidates
        tied = [cluster for cluster in sorted_by_depth if cluster.intersection_depth == top_depth]
    
        # Step 2: Compare specificity (return only if there is a single best score)
        specificity_scores = [CALC_SPECIFICITY(cluster, keyword) for cluster in tied]
        if COUNT_OF_MAX(specificity_scores) == 1:
            return tied[ARGMAX(specificity_scores)]
        tied = KEEP_MAX(tied, specificity_scores)
    
        # Step 3: Compare intent match (return only if there is a single best score)
        intent_scores = [CALC_INTENT_MATCH(cluster.hub_title, keyword) for cluster in tied]
        if COUNT_OF_MAX(intent_scores) == 1:
            return tied[ARGMAX(intent_scores)]
    
        # Step 4: Flag for user review
        return {
            'status': 'flagged_for_review',
            'keyword': keyword,
            'candidates': matching_clusters,
            'reason': 'ambiguous_assignment'
        }
    
    END FUNCTION
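
The tiered tie-break can be written generically in Python by passing the criteria as an ordered list of scoring functions. This is a sketch under assumptions: the scoring functions `by_depth` and `by_title_overlap` are simple placeholders for the intersection-depth, specificity and intent measures, which the document does not define.

```python
def resolve_keyword_conflict(keyword, candidates, criteria):
    """Narrow candidates one criterion at a time; flag for review if still tied."""
    if not candidates:
        raise ValueError("at least one candidate cluster is required")
    remaining = list(candidates)
    for score in criteria:
        scores = [score(cluster, keyword) for cluster in remaining]
        best = max(scores)
        # Keep only the clusters that share the top score for this criterion.
        remaining = [c for c, s in zip(remaining, scores) if s == best]
        if len(remaining) == 1:
            return {"status": "assigned", "keyword": keyword, "cluster": remaining[0]}
    return {
        "status": "flagged_for_review",
        "keyword": keyword,
        "candidates": remaining,
        "reason": "ambiguous_assignment",
    }

# Placeholder criteria (assumptions, not the project's scoring functions).
def by_depth(cluster, keyword):
    return cluster["intersection_depth"]

def by_title_overlap(cluster, keyword):
    return len(set(keyword.lower().split()) & set(cluster["hub_title"].lower().split()))

dog = {"id": "dog", "intersection_depth": 3, "hub_title": "Best Arthritis Treatments for Dogs"}
cat = {"id": "cat", "intersection_depth": 2, "hub_title": "Best Arthritis Treatments for Cats"}
cat_deep = dict(cat, intersection_depth=3)

clear = resolve_keyword_conflict("arthritis relief for pets", [dog, cat], [by_depth, by_title_overlap])
tied = resolve_keyword_conflict("arthritis relief for pets", [dog, cat_deep], [by_depth, by_title_overlap])
by_title = resolve_keyword_conflict("arthritis treatments for dogs", [dog, cat_deep], [by_depth, by_title_overlap])
```

Filtering to the tied set before each later criterion means a cluster that lost on depth cannot win on specificity or intent.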
    

3. Data Models / APIs

3.1 Database Models (Django ORM)

SAGBlueprint (existing from 01A, extended)

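from django.db import models
# Website (01A) and User are expected to be imported from their own apps.

class SAGBlueprint(models.Model):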
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('cluster_formation_complete', 'Cluster Formation Complete'),
        ('keyword_generation_complete', 'Keyword Generation Complete'),
        ('keyword_supplemented', 'Keywords Supplemented'),
        ('ready_for_pipeline', 'Ready for Pipeline'),
        ('published', 'Published'),
    )

    site = models.ForeignKey(Website, on_delete=models.CASCADE)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    phase = models.CharField(max_length=50, default='phase_1_foundation')
    sector_id = models.CharField(max_length=100)

    # Denormalized JSON for fast access
    attributes_json = models.JSONField(default=dict, blank=True)
    clusters_json = models.JSONField(default=dict, blank=True)
    taxonomy_plan = models.JSONField(default=dict, blank=True)
    execution_priority = models.JSONField(default=dict, blank=True)

    created_by = models.ForeignKey(User, on_delete=models.SET_NULL, null=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_blueprint'
        ordering = ['-created_at']

SAGAttribute (existing from 01A, no changes required)

class SAGAttribute(models.Model):
    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    values = models.JSONField()  # array of strings
    is_primary = models.BooleanField(default=False)
    source = models.CharField(max_length=50)  # 'user_input', 'template', 'api'
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = 'sag_attribute'
        unique_together = ('blueprint', 'name')

SAGCluster (existing from 01A, extended)

class SAGCluster(models.Model):
    TYPE_CHOICES = (
        ('product_category', 'Product/Service Category'),
        ('condition_problem', 'Condition/Problem'),
        ('feature', 'Feature'),
        ('brand', 'Brand'),
        ('informational', 'Informational'),
        ('comparison', 'Comparison'),
        ('life_stage', 'Life Stage/Audience'),
    )

    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('validated', 'Validated'),
        ('keyword_assigned', 'Keywords Assigned'),
        ('content_created', 'Content Created'),
    )

    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    cluster_key = models.CharField(max_length=100)  # unique ID from cluster formation
    title = models.CharField(max_length=255)
    description = models.TextField(blank=True)

    cluster_type = models.CharField(max_length=50, choices=TYPE_CHOICES)
    dimensions = models.JSONField()  # ["dimension1", "dimension2", ...]
    intersection_depth = models.IntegerField()  # count of intersecting dimensions
    viability_score = models.FloatField()  # 0-1

    hub_title = models.CharField(max_length=255)
    supporting_content_plan = models.JSONField()  # array of content titles

    auto_generated_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_auto',
        blank=True
    )
    supplemented_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_supplemented',
        blank=True
    )

    keyword_count = models.IntegerField(default=0)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_cluster'
        unique_together = ('blueprint', 'cluster_key')
        ordering = ['-viability_score']

SAGKeyword (new)

class SAGKeyword(models.Model):
    INTENT_CHOICES = (
        ('informational', 'Informational'),
        ('transactional', 'Transactional'),
        ('navigational', 'Navigational'),
        ('commercial', 'Commercial Intent'),
    )

    VARIANT_TYPES = (
        ('base', 'Base Keyword'),
        ('long_tail', 'Long-tail Variant'),
        ('brand', 'Brand Variant'),
        ('comparison', 'Comparison'),
        ('review', 'Review'),
        ('how_to', 'How-to'),
    )

    SOURCE_CHOICES = (
        ('auto_generated', 'Auto-Generated'),
        ('manual_entry', 'Manual Entry'),
        ('csv_import', 'CSV Import'),
        ('api_fetch', 'API Fetch'),
        ('library', 'IGNY8 Library'),
    )

    cluster = models.ForeignKey(
        SAGCluster,
        on_delete=models.CASCADE,
        related_name='all_keywords'
    )
    keyword_text = models.CharField(max_length=255)
    search_volume = models.IntegerField(null=True, blank=True)
    difficulty = models.CharField(max_length=50, blank=True)  # 'easy', 'medium', 'hard'
    intent = models.CharField(max_length=50, choices=INTENT_CHOICES)

    generated_from = models.CharField(max_length=100, blank=True)  # template ID or source
    variant_type = models.CharField(max_length=50, choices=VARIANT_TYPES)
    source = models.CharField(max_length=50, choices=SOURCE_CHOICES)

    cpc = models.FloatField(null=True, blank=True)  # if available from API
    competition = models.CharField(max_length=50, blank=True)  # 'low', 'medium', 'high'

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_keyword'
        unique_together = ('cluster', 'keyword_text')
        ordering = ['-search_volume']

3.2 API Endpoints

POST /api/v1/blueprints/{blueprint_id}/clusters/form/

Purpose: Trigger cluster formation AI function
Authentication: Required (JWT)
Input:

{
    "populated_attributes": [
        {"name": "Pet Type", "values": ["Dogs", "Cats"]},
        {"name": "Health Condition", "values": ["Allergies", "Arthritis"]}
    ],
    "max_clusters": 50
}

Output:

{
    "clusters": [...],
    "summary": {
        "total_clusters_formed": 12,
        "type_distribution": {...}
    },
    "status": "success"
}

Error Cases:

  • 400: Invalid attributes structure
  • 403: Unauthorized (wrong blueprint owner)
  • 422: Insufficient attributes for cluster formation (< 2 dimensions)
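The 400 and 422 cases above can be told apart before the AI function is called. The following is a minimal sketch, not the project's validator: the function name, the messages, and the reading of "dimension" as "attribute with at least one value" are assumptions, and the 403 case is left to the authentication layer.

```python
from typing import Any

MAX_CLUSTERS_LIMIT = 50  # hard cap taken from the spec (CF-8)


def validate_form_clusters_payload(payload: dict[str, Any]) -> tuple[int, str]:
    """Return (http_status, message) for a clusters/form/ request body.

    Hypothetical helper: structural problems map to 400, a well-formed
    payload with fewer than 2 populated dimensions maps to 422.
    """
    attrs = payload.get("populated_attributes")
    if not isinstance(attrs, list):
        return 400, "populated_attributes must be a list"
    for attr in attrs:
        if not isinstance(attr, dict):
            return 400, "each attribute must be an object"
        name, values = attr.get("name"), attr.get("values")
        if not isinstance(name, str) or not name.strip():
            return 400, "attribute name must be a non-empty string"
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            return 400, "attribute values must be a list of strings"

    max_clusters = payload.get("max_clusters", MAX_CLUSTERS_LIMIT)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_clusters, bool) or not isinstance(max_clusters, int):
        return 400, "max_clusters must be an integer"
    if not 1 <= max_clusters <= MAX_CLUSTERS_LIMIT:
        return 400, "max_clusters must be between 1 and 50"

    # Assumption: a "dimension" is an attribute that has at least one value.
    dimensions = [a for a in attrs if a["values"]]
    if len(dimensions) < 2:
        return 422, "at least 2 populated dimensions are required"
    return 200, "ok"
```

Keeping the structural checks ahead of the dimension count means a malformed body never reaches the 422 branch.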

POST /api/v1/blueprints/{blueprint_id}/keywords/generate/

Purpose: Trigger keyword generation AI function
Authentication: Required
Input:

{
    "use_cluster_ids": ["cluster_001", "cluster_002"],
    "target_keywords_per_cluster": 15,
    "include_long_tail_variants": true
}

Output:

{
    "keywords_per_cluster": {...},
    "deduplication": {
        "duplicates_removed": 5
    },
    "summary": {
        "total_unique_keywords": 180,
        "within_constraints": true
    }
}

POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/

Purpose: Add manual, CSV, library, or API-sourced keywords
Authentication: Required
Input (Multiple Scenarios):

Scenario 1: Manual Entry

{
    "source": "manual_entry",
    "keywords": ["arthritis relief dogs", "joint pain dogs"],
    "cluster_id": "cluster_001"
}

Scenario 2: CSV Import

{
    "source": "csv_import",
    "csv_url": "https://example.com/keywords.csv",
    "auto_cluster": true
}

Scenario 3: Library Selection

{
    "source": "library",
    "library_keyword_ids": [123, 456, 789],
    "auto_cluster": true
}

Output:

{
    "added_keywords": 10,
    "auto_clustered": 9,
    "flagged_for_review": 1,
    "conflicts_resolved": {
        "reassigned": 2,
        "deferred": 1
    }
}

POST /api/v1/blueprints/{blueprint_id}/assemble/

Purpose: Trigger blueprint assembly (create final SAGBlueprint with all records)
Authentication: Required
Input:

{
    "finalize_keyword_review": true,
    "set_status": "ready_for_pipeline"
}

Output:

{
    "blueprint_id": 42,
    "status": "ready_for_pipeline",
    "summary": {
        "total_attributes": 4,
        "total_clusters": 12,
        "total_keywords": 180,
        "execution_priority_phases": 3
    }
}

GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft&type=product_category

Purpose: List clusters with filtering
Query Params:

  • status: draft, validated, keyword_assigned, content_created
  • type: product_category, condition_problem, feature, brand, informational, comparison, life_stage
  • min_viability: 0.70
  • limit: 50, offset: 0

Output:

{
    "results": [
        {
            "id": 1,
            "cluster_key": "cluster_001",
            "title": "Dog Arthritis Relief Solutions",
            "hub_title": "Best Arthritis Treatments for Dogs",
            "keyword_count": 15,
            "viability_score": 0.92,
            "type": "product_category"
        }
    ],
    "total_count": 12,
    "total_keywords": 180
}

GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=cluster_001&source=auto_generated

Purpose: List keywords for a cluster
Query Params:

  • cluster_id: filter by cluster
  • source: auto_generated, manual_entry, csv_import, api_fetch, library
  • intent: informational, transactional, navigational, commercial
  • min_search_volume: 100
  • order_by: search_volume (DESC), difficulty, intent

Output:

{
    "results": [
        {
            "id": 1,
            "keyword_text": "best arthritis treatment for dogs",
            "search_volume": 1200,
            "difficulty": "medium",
            "intent": "informational",
            "variant_type": "long_tail",
            "source": "auto_generated"
        }
    ],
    "total_count": 15
}
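The keyword filters and the default descending order by search volume can be illustrated on plain dictionaries. This is only a sketch under assumptions: the function name is hypothetical, each record is assumed to carry a `cluster_id` key, and keywords whose `search_volume` is null are assumed to sort last and to fail any `min_search_volume` filter.

```python
from typing import Any, Optional


def filter_keywords(
    keywords: list[dict[str, Any]],
    cluster_id: Optional[str] = None,
    source: Optional[str] = None,
    intent: Optional[str] = None,
    min_search_volume: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Apply the documented query params, then sort by search_volume DESC."""

    def keep(kw: dict[str, Any]) -> bool:
        if cluster_id is not None and kw.get("cluster_id") != cluster_id:
            return False
        if source is not None and kw.get("source") != source:
            return False
        if intent is not None and kw.get("intent") != intent:
            return False
        if min_search_volume is not None:
            # A missing volume is treated as 0 for filtering purposes.
            if (kw.get("search_volume") or 0) < min_search_volume:
                return False
        return True

    kept = [kw for kw in keywords if keep(kw)]
    # Known volumes first (largest to smallest), unknown volumes at the end.
    kept.sort(
        key=lambda kw: (kw.get("search_volume") is None, -(kw.get("search_volume") or 0))
    )
    return kept
```

In a Django view the same rules would normally be expressed as queryset filters; the dictionary version just makes the null-handling decision explicit.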

DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/

Purpose: Remove a keyword (before assembly)
Authentication: Required
Status: Only available if blueprint.status='draft' or 'keyword_generation_complete'


4. Implementation Steps

Phase 1: AI Functions Development (Week 1-2)

Step 1.1: Set up cluster_formation.py structure

  • Create sag/ai_functions/cluster_formation.py
  • Define input/output contracts
  • Implement intersection generation logic (2-value, 3-value)
  • Stub out AI evaluation function (ready for Claude integration)
  • Implement constraint filtering & sorting
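The intersection generation in Step 1.1 can be sketched with `itertools`: pick every group of `depth` distinct attributes, then take the Cartesian product of their values. The function name and the dict-per-intersection output shape are assumptions for illustration, not the contract of `cluster_formation.py`.

```python
from itertools import combinations, product


def generate_intersections(populated_attributes, depth):
    """Return every `depth`-value intersection as {attribute name: value}.

    Only distinct attributes are combined, so an attribute is never
    intersected with itself.
    """
    intersections = []
    for attr_group in combinations(populated_attributes, depth):
        names = [attr["name"] for attr in attr_group]
        for values in product(*(attr["values"] for attr in attr_group)):
            intersections.append(dict(zip(names, values)))
    return intersections


attributes = [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]},
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
]
two_value = generate_intersections(attributes, 2)    # 6 + 4 + 6 = 16 candidates
three_value = generate_intersections(attributes, 3)  # 2 * 3 * 2 = 12 candidates
```

Each candidate would then go to the AI evaluation step; nothing here judges viability.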

Step 1.2: Implement cluster formation AI logic

  • Integrate Claude AI API for cluster viability evaluation
    • Real topical ecosystem check
    • User search demand validation
    • Content support assessment
    • Differentiation evaluation
  • Implement cluster type classification (using embeddings or rule-based logic)
  • Implement hub title & supporting content plan generation
  • Add viability scoring (0-1 scale)
  • Implement distribution validation

Step 1.3: Unit tests for cluster formation

  • Test intersection generation (2-value, 3-value)
  • Test AI evaluation with mock responses
  • Test constraint filtering (max 50 clusters)
  • Test type distribution analysis
  • Test handling of edge cases (0 intersections, all rejected, etc.)

Step 1.4: Create keyword_generation.py structure

  • Create sag/ai_functions/keyword_generation.py
  • Define input/output contracts
  • Implement template substitution logic
  • Implement long-tail variant generation
  • Implement deduplication logic
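Template substitution for Step 1.4 might look like the sketch below. It assumes that a placeholder such as `{pet_type}` is the attribute name lowercased with spaces replaced by underscores, and that a template is skipped when any placeholder has no value (the behaviour KG-10 asks for). The function names are hypothetical, and long-tail variants are not covered.

```python
import string


def placeholder_for(attribute_name):
    """'Health Condition' -> 'health_condition' (assumed naming rule)."""
    return attribute_name.strip().lower().replace(" ", "_")


def fill_template(template, dimension_values):
    """Return the filled keyword, or None if a placeholder has no value."""
    mapping = {placeholder_for(k): v.lower() for k, v in dimension_values.items()}
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    fields = [field for _, field, _, _ in string.Formatter().parse(template) if field]
    if any(field not in mapping for field in fields):
        return None
    return template.format(**mapping)


def generate_base_keywords(templates, dimension_values):
    """Fill every usable template, dropping duplicates but keeping order."""
    seen, keywords = set(), []
    for template in templates:
        keyword = fill_template(template, dimension_values)
        if keyword is None:
            continue
        normalized = " ".join(keyword.split())
        if normalized not in seen:
            seen.add(normalized)
            keywords.append(normalized)
    return keywords


templates = [
    "best {health_condition} for {pet_type}",
    "{pet_type} {health_condition} treatment",
    "{brand} food for {pet_type}",  # skipped: no 'brand' value in this cluster
]
base = generate_base_keywords(templates, {"Pet Type": "Dogs", "Health Condition": "Arthritis"})
```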

Step 1.5: Implement keyword generation AI logic

  • Integrate template loading from SectorAttributeTemplate (01B)
  • Implement keyword enrichment (search volume, difficulty, intent)
  • Implement filtering & sorting by search volume
  • Implement constraint validation (10-25 per cluster, 300-500 total)
  • Implement global deduplication & conflict resolution
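The constraint validation in Step 1.5 can be sketched as a small report function. The name and the return shape are assumptions; because KG-7 words the site total as "300-500+", the sketch treats 300 as a minimum and does not fail totals above 500.

```python
def check_keyword_constraints(keywords_per_cluster, per_cluster=(10, 25), min_total=300):
    """Report clusters outside the per-cluster range and the site total.

    `keywords_per_cluster` maps a cluster key to its list of keywords.
    """
    low, high = per_cluster
    under = sorted(k for k, kws in keywords_per_cluster.items() if len(kws) < low)
    over = sorted(k for k, kws in keywords_per_cluster.items() if len(kws) > high)
    total = sum(len(kws) for kws in keywords_per_cluster.values())
    return {
        "under_target": under,
        "over_limit": over,
        "total_keywords": total,
        "within_constraints": not under and not over and total >= min_total,
    }
```

Returning the offending cluster keys, rather than a bare boolean, lets the caller decide whether to trim, supplement, or flag a cluster.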

Step 1.6: Unit tests for keyword generation

  • Test template substitution with various attribute combinations
  • Test long-tail variant generation
  • Test deduplication across clusters
  • Test constraint validation
  • Test conflict resolution (multi-cluster keywords)

Phase 2: Data Models & Service Layer (Week 2-3)

Step 2.1: Database migrations

  • Create SAGKeyword model
  • Add ManyToMany relations to SAGCluster (auto_generated_keywords, supplemented_keywords)
  • Extend SAGBlueprint with denormalized JSON fields (attributes_json, clusters_json, taxonomy_plan, execution_priority)
  • Extend SAGCluster with cluster_key, cluster_type, intersection_depth, viability_score, hub_title, supporting_content_plan
  • Run and test migrations on dev database

Step 2.2: Implement blueprint_service.py

  • Create sag/services/blueprint_service.py
  • Implement assemble_blueprint() function with 8 steps
  • Implement SAGBlueprint creation & status management
  • Implement SAGAttribute creation from user input
  • Implement SAGCluster creation from cluster formation results
  • Implement SAGKeyword creation & assignment
  • Implement taxonomy_plan generation
  • Implement execution_priority generation
  • Implement denormalized JSON population

Step 2.3: Unit tests for blueprint_service

  • Test blueprint creation & status transitions
  • Test attribute record creation
  • Test cluster record creation with all fields
  • Test keyword assignment to clusters
  • Test taxonomy plan generation
  • Test execution priority generation
  • Test denormalized JSON accuracy

Phase 3: API Endpoints & Integration (Week 3-4)

Step 3.1: Implement cluster formation API endpoint

  • Create POST /api/v1/blueprints/{blueprint_id}/clusters/form/
  • Validate input attributes
  • Call cluster_formation() AI function
  • Return results with summary
  • Error handling (400, 403, 422)

Step 3.2: Implement keyword generation API endpoint

  • Create POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
  • Validate input & cluster availability
  • Call keyword_generation() AI function
  • Return results with deduplication summary
  • Error handling

Step 3.3: Implement keyword supplementation API endpoint

  • Create POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
  • Support multiple input sources (manual, CSV, library, API)
  • Implement auto-clustering via map_keyword_to_clusters()
  • Implement conflict resolution via resolve_keyword_conflict()
  • Return summary of added, clustered, flagged keywords

Step 3.4: Implement blueprint assembly API endpoint

  • Create POST /api/v1/blueprints/{blueprint_id}/assemble/
  • Call blueprint_service.assemble_blueprint()
  • Manage status transitions
  • Return blueprint summary with next steps

Step 3.5: Implement read endpoints

  • Create GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft
  • Create GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=...
  • Implement filtering & pagination
  • Add ordering options

Step 3.6: Implement keyword removal endpoint

  • Create DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
  • Validate blueprint status (only 'draft' or 'keyword_generation_complete', per Section 3.2)
  • Cascade delete as needed

Phase 4: Integration with 01D & Testing (Week 4-5)

Step 4.1: Integrate with Setup Wizard (01D)

  • Call cluster_formation() after user populates attributes
  • Display clusters to user for review (optional: allow edits)
  • Call keyword_generation() if user confirms clusters
  • Display keywords for review
  • Allow manual supplementation before final assembly

Step 4.2: End-to-end testing

  • Test full flow: attributes → clusters → keywords → blueprint
  • Test with various sector/site_type combinations
  • Test constraint enforcement
  • Test conflict resolution with real scenarios
  • Performance test with large attribute sets (100+ values)

Step 4.3: Integration with 01E (Pipeline Configuration)

  • Verify blueprint is available to pipeline service
  • Test taxonomy plan usage in content generation
  • Test execution_priority ordering in pipeline

5. Acceptance Criteria

Cluster Formation AI Function (01C-CF)

  • CF-1: Generates all 2-value intersections from populated attributes
  • CF-2: Generates relevant 3-value intersections (at least 50% of possible combinations)
  • CF-3: AI evaluates each intersection on 5 decision criteria (ecosystem, demand, content support, differentiation, clarity)
  • CF-4: Classification assigns correct cluster type (product_category, condition_problem, feature, brand, informational, comparison, life_stage)
  • CF-5: Hub titles are specific, actionable, and 5-12 words long
  • CF-6: Supporting content plans contain 5-8 titles, semantically related to hub, covering different angles
  • CF-7: Viability scores accurately reflect cluster strength (0-1 scale, with clear rationale)
  • CF-8: Hard constraint enforced: max 50 clusters per sector, sorted by viability score
  • CF-9: Type distribution meets targets: Product/Service 40-50%, Condition/Problem 20-30%, Feature 10-15%, Brand 5-10%, Life Stage 5-10%
  • CF-10: Clusters have 3+ dimensional intersections for strong coherence
  • CF-11: No duplicative clusters (semantic coherence check prevents near-duplicates like "Dog Joint Health" + "Dog Arthritis")
  • CF-12: API response includes summary with cluster count, type distribution, avg intersection depth
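The CF-9 targets lend themselves to a simple share check. The sketch below is one possible reading, with a hypothetical function name and report shape; types without a stated target (informational, comparison) are counted in the total but not checked.

```python
from collections import Counter

# Target share ranges copied from CF-9.
TARGET_SHARES = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}


def type_distribution_report(cluster_types):
    """Compare the share of each cluster type with its CF-9 target range."""
    total = len(cluster_types)
    counts = Counter(cluster_types)
    report = {}
    for cluster_type, (low, high) in TARGET_SHARES.items():
        share = counts[cluster_type] / total if total else 0.0
        if share < low:
            status = "below"
        elif share > high:
            status = "above"
        else:
            status = "ok"
        report[cluster_type] = {"share": round(share, 3), "status": status}
    return report
```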

Keyword Generation AI Function (01C-KG)

  • KG-1: Loads keyword templates from SectorAttributeTemplate for correct site_type
  • KG-2: Substitutes attribute values into templates to generate base keywords
  • KG-3: Generates long-tail variants (best, review, vs, for, how to) for each base keyword
  • KG-4: Deduplicates keywords across all clusters (no keyword appears twice)
  • KG-5: Global deduplication identifies multi-cluster keywords and reassigns via conflict resolution
  • KG-6: Per-cluster keyword count: 10-25 keywords (soft target 15)
  • KG-7: Total keyword count: 300-500+ for site (configurable per sector)
  • KG-8: Keywords enriched with search volume, difficulty, intent classification
  • KG-9: API response includes per-cluster breakdown, deduplication summary, total keyword count
  • KG-10: Handles missing attribute values gracefully (skips template if required attrs not present)

Keyword Conflict Resolution (01C-CR)

  • CR-1: Identifies keywords matching multiple clusters (≥2 matches)
  • CR-2: Decision Criterion 1: assigns to cluster with most dimensional intersections
  • CR-3: Decision Criterion 2 (tiebreaker): assigns to more specific cluster
  • CR-4: Decision Criterion 3 (tiebreaker): assigns by primary user intent match
  • CR-5: Decision Criterion 4 (last resort): flags for user review with clear reasoning
  • CR-6: Reassignment logic preserves keyword integrity (no loss, duplication, or orphaning)
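The tiered decision in CR-2 through CR-5 can be sketched as successive filters: keep only the candidates that are best on the current criterion, stop as soon as one remains, and flag the keyword when all tiers tie. The spec names `resolve_keyword_conflict()` but not its signature, so the `Candidate` fields below, including the integer `specificity` measure, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cluster_key: str
    intersection_depth: int  # CR-2: number of dimensional intersections
    specificity: int         # CR-3: higher means more specific (assumed measure)
    intent_match: bool       # CR-4: matches the keyword's primary intent


def resolve_keyword_conflict(candidates):
    """Return (cluster_key, deciding_criterion) or (None, 'flagged_for_review')."""
    if not candidates:
        raise ValueError("at least one candidate cluster is required")
    remaining = list(candidates)
    tiers = [
        ("intersection_depth", lambda c: c.intersection_depth),
        ("specificity", lambda c: c.specificity),
        ("intent_match", lambda c: int(c.intent_match)),
    ]
    for criterion, score in tiers:
        best = max(score(c) for c in remaining)
        remaining = [c for c in remaining if score(c) == best]
        if len(remaining) == 1:
            return remaining[0].cluster_key, criterion
    # CR-5: every tier tied, so defer to the user.
    return None, "flagged_for_review"
```

Returning the deciding criterion alongside the cluster key gives the "clear reasoning" that CR-5 asks for when a keyword is surfaced for review.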

Blueprint Assembly Service (01C-BA)

  • BA-1: Creates SAGBlueprint record with status='draft'
  • BA-2: Creates SAGAttribute records from populated attributes (preserves name, values, is_primary flag)
  • BA-3: Creates SAGCluster records from cluster formation output (all fields populated)
  • BA-4: Creates SAGKeyword records from keyword generation output (all fields preserved)
  • BA-5: Associates keywords to clusters via ManyToMany relations
  • BA-6: Generates taxonomy_plan with WP categories (primary attributes) and tags (secondary)
  • BA-7: Generates execution_priority with 3 phases: hubs first, supporting articles, term pages
  • BA-8: Populates denormalized JSON fields (attributes_json, clusters_json) for fast queries
  • BA-9: Returns blueprint ID and summary (attribute count, cluster count, keyword count, next steps)
  • BA-10: Status transitions correctly: draft → ready_for_pipeline (or intermediate statuses as needed)
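BA-6 and BA-7 describe the two derived JSON fields only in outline. The sketch below shows one shape they could take; the key names (`categories`, `tags`, `phase_1_hubs`, and so on) and the choice to order hubs by viability score are assumptions, not the stored schema.

```python
def build_taxonomy_plan(attributes):
    """Primary attribute values become categories, the rest become tags (BA-6)."""
    plan = {"categories": [], "tags": []}
    for attr in attributes:
        bucket = "categories" if attr.get("is_primary") else "tags"
        for value in attr["values"]:
            if value not in plan[bucket]:
                plan[bucket].append(value)
    return plan


def build_execution_priority(clusters, taxonomy_plan):
    """Three phases from BA-7: hubs, then supporting articles, then term pages."""
    # Assumption: stronger clusters are scheduled first within each phase.
    ordered = sorted(clusters, key=lambda c: c["viability_score"], reverse=True)
    return {
        "phase_1_hubs": [c["hub_title"] for c in ordered],
        "phase_2_supporting": [
            title for c in ordered for title in c["supporting_content_plan"]
        ],
        "phase_3_term_pages": taxonomy_plan["categories"] + taxonomy_plan["tags"],
    }
```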

Manual Keyword Supplementation (01C-MKS)

  • MKS-1: Users can add keywords via: manual entry, CSV import, library selection, API fetch
  • MKS-2: Manual entry accepts comma-separated keywords, validates against duplicates
  • MKS-3: CSV import validates file structure (keyword, search_volume optional, difficulty optional)
  • MKS-4: Library integration allows browsing & selection per site_type
  • MKS-5: Auto-clustering maps new keywords to clusters via attribute similarity matching
  • MKS-6: Unmatched keywords flagged for user review: gap analysis, potential new cluster, or outlier
  • MKS-7: User can assign unmatched keywords to specific cluster or create new cluster
  • MKS-8: API returns summary: added count, auto-clustered count, flagged count, conflicts resolved
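For MKS-5 and MKS-6, the simplest form of attribute similarity matching is token overlap between the keyword and a cluster's dimension values. The spec names `map_keyword_to_clusters()` without a signature, so the parameters below (including `min_overlap`) are assumptions. The sketch does no stemming, which means "dog" will not match "Dogs"; a real matcher would likely normalize plurals or use embeddings.

```python
def map_keyword_to_clusters(keyword, clusters, min_overlap=2):
    """Return cluster keys whose dimension values share tokens with the keyword.

    Results are ordered by overlap (largest first). An empty list means the
    keyword is unmatched and should be flagged for review (MKS-6).
    """
    tokens = set(keyword.lower().split())
    matches = []
    for cluster in clusters:
        dimension_tokens = {
            token
            for dimension in cluster["dimensions"]
            for token in dimension.lower().split()
        }
        overlap = len(tokens & dimension_tokens)
        if overlap >= min_overlap:
            matches.append((overlap, cluster["cluster_key"]))
    matches.sort(key=lambda match: (-match[0], match[1]))
    return [cluster_key for _, cluster_key in matches]
```

When more than one key comes back, the keyword is a multi-cluster match and would go through conflict resolution.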

API Endpoints (01C-API)

  • API-1: POST /api/v1/blueprints/{blueprint_id}/clusters/form/ returns 200 + cluster results
  • API-2: POST /api/v1/blueprints/{blueprint_id}/keywords/generate/ returns 200 + keyword results
  • API-3: POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ returns 200 + supplementation summary
  • API-4: POST /api/v1/blueprints/{blueprint_id}/assemble/ returns 200 + blueprint summary
  • API-5: GET /api/v1/blueprints/{blueprint_id}/clusters/ supports status, type, min_viability filters
  • API-6: GET /api/v1/blueprints/{blueprint_id}/keywords/ supports cluster_id, source, intent, min_search_volume filters
  • API-7: DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/ only works while blueprint.status is 'draft' or 'keyword_generation_complete'
  • API-8: Error handling: 400 (bad input), 403 (unauthorized), 404 (not found), 422 (unprocessable)

Data Integrity (01C-DI)

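  • DI-1: No keyword appears in multiple clusters within a blueprint. The unique_together constraint on SAGKeyword covers ('cluster', 'keyword_text'), so it only prevents duplicates inside a single cluster; cross-cluster uniqueness must be enforced by the global deduplication and conflict resolution steps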
  • DI-2: Deleted clusters cascade-delete associated keywords (no orphaned keywords)
  • DI-3: Deleted blueprints cascade-delete all attributes, clusters, keywords
  • DI-4: Blueprint status transitions prevent invalid operations (e.g., can't supplement keywords on published blueprint)
  • DI-5: Denormalized JSON fields stay in sync with normalized records (updated on every change)

Performance (01C-PERF)

  • PERF-1: Cluster formation completes in <5 seconds for 100+ intersection combinations
  • PERF-2: Keyword generation completes in <10 seconds for 50 clusters
  • PERF-3: Blueprint assembly completes in <3 seconds (DB writes + JSON generation)
  • PERF-4: GET endpoints with filters return results in <2 seconds
  • PERF-5: CSV import (1000 keywords) completes in <15 seconds

6. Claude Code Instructions

6.1 Generating Cluster Formation Logic

Prompt Template for Claude:

Generate the cluster formation algorithm for an AI-powered content planning system.

Input:
- populated_attributes: List of attributes with values from user setup wizard
  Example: [
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Birds"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
  ]
- sector_context: Information about the sector (e.g., "pet health e-commerce")

Task:
1. Generate all meaningful 2-value intersections between different attributes (Pet Type × Health Condition, etc.)
2. For each intersection, use Claude's reasoning to evaluate:
   - Is this a real topical ecosystem? (do the dimensions naturally fit together?)
   - Would users search for this? (assess search demand)
   - Can we build 1 hub + 3-8 supporting articles?
   - Is it differentiated from other clusters?
3. Classify valid clusters by type: product_category, condition_problem, feature, brand, informational, comparison, life_stage
4. Generate a compelling hub title and 5-8 supporting content titles
5. Assign a viability score (0-1) based on coherence, search demand, content potential

Output:
- clusters: Array of cluster objects with all fields from the spec
- summary: Total clusters, type distribution, viability analysis

Constraints:
- Max 50 clusters per sector
- Minimum 3 dimensional intersections for strong clusters
- Quality over quantity: prefer 5 strong clusters over 15 weak ones

6.2 Generating Keyword Generation Logic

Prompt Template for Claude:

Generate keywords for content clusters using templates and AI-driven expansion.

Input:
- clusters: Array of clusters from cluster formation (with dimensions and hub title)
- keyword_templates: Pre-configured templates for site_type
  Example: [
    "best {health_condition} for {pet_type}",
    "{pet_type} {health_condition} treatment",
    "affordable {health_condition} relief for {pet_type}"
  ]
- sector_context: Site type (ecommerce, blog, saas, etc.)

Task:
1. Load keyword templates filtered by sector site_type
2. For each cluster:
   - Extract dimension values
   - Substitute values into matching templates
   - Generate long-tail variants: best, review, vs, for, how to
   - Enrich with search volume, difficulty, intent (informational, transactional, etc.)
3. Deduplicate globally across all clusters
4. Identify multi-cluster keywords and resolve conflicts via:
   - Highest dimensional intersection count
   - Most specific cluster (tiebreaker)
   - Primary user intent match (tiebreaker)
5. Validate constraints: 10-25 per cluster, 300-500 total

Output:
- keywords_per_cluster: Keywords organized by cluster ID
- deduplication: Count of duplicates removed, conflicts flagged
- summary: Total unique keywords, per-cluster average, search volume total

Constraints:
- Do NOT generate more than 25 keywords per cluster
- Do NOT allow duplicates
- Prioritize high search volume keywords
- Ensure diversity: mix of base keywords and long-tail variants

6.3 Integrating with Setup Wizard (01D)

Implementation Notes:

  1. After user completes attribute population in wizard:

    • Call POST /api/v1/blueprints/{blueprint_id}/clusters/form/
    • Display clusters to user (preview mode)
    • Allow user to: review, edit (rename hub titles, remove clusters), or confirm
  2. After user confirms clusters:

    • Call POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
    • Display keywords grouped by cluster (preview mode)
    • Allow user to: supplement keywords, remove outliers, or confirm
  3. Before finalizing blueprint:

    • Optionally allow manual keyword supplementation (CSV, library, manual entry)
    • Call POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ for each source
    • Resolve conflicts (auto or manual)
    • Call POST /api/v1/blueprints/{blueprint_id}/assemble/ to finalize

6.4 Testing with Sample Data

Test Case 1: Pet Health E-commerce Site

populated_attributes = [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Arthritis", "Allergies", "Obesity"]},
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]}
]

sector_context = {
    "sector_id": "pet_health",
    "site_type": "ecommerce",
    "sector_name": "Pet Health Products"
}

# Expected clusters:
# 1. Dog Arthritis Relief (product_category)
# 2. Cat Allergies Nutrition (product_category)
# 3. Senior Dog Joint Support (life_stage)
# ... etc.

Test Case 2: Local Service (Veterinary Clinic)

populated_attributes = [
    {"name": "Service Type", "values": ["Surgery", "Preventive Care", "Emergency"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Exotic"]},
    {"name": "Location", "values": ["Downtown", "Suburbs"]}
]

sector_context = {
    "sector_id": "vet_clinic",
    "site_type": "local_service",
    "sector_name": "Veterinary Clinic"
}

# Expected clusters:
# 1. Emergency Dog Surgery Downtown (local_service + product_category)
# 2. Preventive Cat Care Suburbs (informational + local_service)
# ... etc.

7. Cross-Document References

Upstream Dependencies

  • 01A (SAG Master Data Models): Provides SAGBlueprint, SAGAttribute, SAGCluster base models
  • 01B (Sector Attribute Templates): Provides attribute framework, keyword templates, site_type configurations

Downstream Consumers

  • 01D (Setup Wizard): Triggers cluster formation & keyword generation after attribute population
  • 01E (Blueprint-aware Pipeline): Uses clusters, keywords, taxonomy_plan, execution_priority for content generation
  • 01F (Existing Site Analysis): May feed competitor/existing keywords into supplementation process
  • 01G (Health Monitoring): Tracks cluster completeness, keyword coverage, content generation progress against blueprint

8. Appendix: Algorithm Complexity & Performance Estimates

Cluster Formation Complexity

  • Input: N attributes with M average values each
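  • Intersections Generated: O(N² × M²) for 2-value, O(N³ × M³) for 3-value (each pair or triple of attributes contributes M² or M³ value combinations)
  • AI Evaluations: one function call per generated intersection, so the same order of growth (largest cost)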
  • Time Estimate: ~1-2 seconds per 100 intersections (depending on Claude API latency)
  • Bottleneck: Claude API response time for viability evaluation
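The number of AI evaluations can be estimated before any intersections are generated, since it depends only on how many values each attribute has. A minimal sketch, with a hypothetical function name:

```python
from itertools import combinations
from math import prod


def count_intersections(value_counts, depth):
    """Number of `depth`-value intersections across distinct attributes.

    `value_counts` holds the number of values per attribute, e.g. [2, 3, 2].
    """
    return sum(prod(group) for group in combinations(value_counts, depth))


# Pet example: 2 pet types, 3 health conditions, 2 audiences.
pairs = count_intersections([2, 3, 2], 2)    # 2*3 + 2*2 + 3*2 = 16
triples = count_intersections([2, 3, 2], 3)  # 2*3*2 = 12

# Uniform case: N = 4 attributes with M = 5 values each.
uniform_pairs = count_intersections([5] * 4, 2)    # C(4,2) * 5**2 = 150
uniform_triples = count_intersections([5] * 4, 3)  # C(4,3) * 5**3 = 500
```

Checking this count against the time estimate above is a cheap way to decide whether a large attribute set needs batching or pre-filtering before the evaluation calls.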

Keyword Generation Complexity

  • Input: C clusters, T keyword templates per cluster
  • Base Keywords: O(C × T) (template substitution)
  • Long-tail Variants: O(C × T × V) where V ≈ 7 (base + 6 variants)
  • Deduplication: O(K log K) where K = total keywords (sort-based)
  • Time Estimate: ~3-5 seconds for 300+ keywords
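As an alternative to the sort-based approach, multi-cluster keywords can be found in a single pass with a dictionary keyed on normalized text, which is O(K) on average. This is a sketch under assumptions: the function name is hypothetical, and normalization is limited to lowercasing and collapsing whitespace.

```python
from collections import defaultdict


def find_multi_cluster_keywords(keywords_per_cluster):
    """Map each normalized keyword to the clusters that claim it.

    Only keywords claimed by two or more clusters are returned; these are
    the ones that need conflict resolution.
    """
    owners = defaultdict(set)
    for cluster_key, keywords in keywords_per_cluster.items():
        for keyword in keywords:
            normalized = " ".join(keyword.lower().split())
            owners[normalized].add(cluster_key)
    return {
        keyword: sorted(cluster_keys)
        for keyword, cluster_keys in owners.items()
        if len(cluster_keys) > 1
    }
```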

Blueprint Assembly Complexity

  • DB Writes: O(A + C + K) where A=attributes, C=clusters, K=keywords
  • JSON Generation: O(A + C + K) for denormalization
  • Time Estimate: <1 second for typical blueprints (< 10 MB JSON)

Document Complete
Status: Ready for Development
Next Step: Implement Phase 1 (AI Functions) per Section 4