
IGNY8 Phase 1: Cluster Formation & Keyword Engine (Doc 01C)

Document Version: 1.0
Date: 2026-03-23
Phase: Phase 1 - Foundation & Intelligence
Status: Build Ready

1. Current State

Existing Components

SAGBlueprint (01A): Data model with status tracking, blueprint lifecycle management
SAGAttribute & SAGCluster models (01A): Schema definitions for attributes and topic clusters
SectorAttributeTemplate (01B): Pre-configured attribute framework with keyword templates per site_type
Setup Wizard (01D): Collects sector, site_type, and populated attribute values from user
Blueprint Service (01G - earlier iteration): Basic blueprint assembly, denormalization

Current Limitations

No automated cluster formation from attribute intersection logic
No keyword generation from templates
No conflict resolution for multi-cluster keyword assignments
No cluster type classification (product, condition, feature, etc.)
No validation of cluster viability (size, coherence, user demand)
No hub title and supporting content plan generation

Dependencies Ready

✅ Sector attribute templates loaded with keyword templates
✅ Setup wizard populates attributes
✅ Data models support cluster and keyword storage
✅ Blueprint lifecycle framework exists

2. What to Build

2.1 Cluster Formation AI Function

File: sag/ai_functions/cluster_formation.py
Register Key: 'form_clusters'
Triggering Context: After user populates attributes in setup wizard; before keyword assignment

Input Contract

{
    "populated_attributes": [
        {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
        {"name": "Pet Type", "values": ["Dogs", "Cats"]},
        {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "sector_name": str
    },
    "constraints": {
        "max_clusters": 50,  # hard cap per sector
        "min_keywords_per_cluster": 5,
        "max_keywords_per_cluster": 20,
        "optimal_keywords_per_cluster": "7-15"  # range, given as a string like total_target
    }
}

Output Contract

{
    "clusters": [
        {
            "id": "cluster_001",
            "title": "Dog Arthritis Relief Solutions",
            "type": "product_category",  # or condition_problem, feature, brand, informational, comparison
            "dimensions": {
                "primary": ["Pet Type: Dogs", "Health Condition: Arthritis"],
                "secondary": ["Target Audience: Pet Owners"]
            },
            "intersection_depth": 3,  # count of dimensional intersections
            "viability_score": 0.92,  # 0-1 based on coherence + demand assessment
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [
                "Senior Dog Arthritis: Causes & Prevention",
                "Dog Arthritis Medications: Complete Guide",
                "Physical Therapy Exercises for Dogs with Arthritis",
                "Diet Changes to Support Joint Health",
                "When to See a Vet About Dog Joint Pain"
            ],
            "keywords": [],  # populated in keyword generation phase
            "dimension_count": 3,
            "validation": {
                "is_real_topical_ecosystem": true,
                "has_search_demand": true,
                "can_support_content_plan": true,
                "sufficient_differentiation": true
            }
        },
        // ... more clusters
    ],
    "summary": {
        "total_clusters_formed": 12,
        "type_distribution": {
            "product_category": 6,
            "condition_problem": 4,
            "feature": 1,
            "brand": 0,
            "informational": 1,
            "comparison": 0
        },
        "avg_intersection_depth": 2.3,
        "clusters_below_viability_threshold": 0
    }
}

Algorithm (Pseudocode)

FUNCTION form_clusters(populated_attributes, sector_context):

    # STEP 1: Generate all 2-value intersections
    all_intersections = []
    for each attribute_pair in populated_attributes:
        for value1 in attribute_pair[0].values:
            for value2 in attribute_pair[1].values:
                intersection = {
                    "dimensions": [value1, value2],
                    "attribute_names": [attribute_pair[0].name, attribute_pair[1].name]
                }
                all_intersections.append(intersection)

    # Also generate 3-value intersections for strong coherence
    for each attribute_triplet in populated_attributes:
        for value1 in attribute_triplet[0].values:
            for value2 in attribute_triplet[1].values:
                for value3 in attribute_triplet[2].values:
                    intersection = {
                        "dimensions": [value1, value2, value3],
                        "attribute_names": [attribute_triplet[0].name, attribute_triplet[1].name, attribute_triplet[2].name]
                    }
                    all_intersections.append(intersection)

    # STEP 2: AI evaluates each intersection
    valid_clusters = []
    for intersection in all_intersections:
        evaluation = AI_EVALUATE_INTERSECTION(intersection, sector_context):
            - Is this a real topical ecosystem?
            - Would users search for this combination?
            - Can we build a hub + 3-10 supporting articles?
            - Is there sufficient differentiation from other clusters?
            - Does the combination make semantic sense?

        if evaluation.is_valid:
            # STEP 3: Classify cluster type
            cluster_type = AI_CLASSIFY_TYPE(intersection)
                → product_category, condition_problem, feature, brand,
                  informational, comparison

            # STEP 4: Generate hub title + supporting content plan
            hub_title = AI_GENERATE_HUB_TITLE(intersection, sector_context)
            supporting_titles = AI_GENERATE_SUPPORTING_TITLES(
                hub_title,
                intersection,
                count=5-8
            )

            # Create cluster object
            cluster = {
                "dimensions": intersection.dimensions,
                "type": cluster_type,
                "viability_score": evaluation.confidence_score,
                "hub_title": hub_title,
                "supporting_content_plan": supporting_titles,
                "validation": evaluation
            }
            valid_clusters.append(cluster)

    # STEP 5: Apply constraints & filtering
    sorted_clusters = SORT_BY_VIABILITY_SCORE(valid_clusters)
    final_clusters = sorted_clusters[0:max_clusters]

    # STEP 6: Validate distribution & completeness
    distribution = CALCULATE_TYPE_DISTRIBUTION(final_clusters)

    # Flag if any type is severely under-represented
    if distribution.imbalance > THRESHOLD:
        LOG_WARNING("Type distribution may be suboptimal")

    # STEP 7: Return with summary
    return {
        "clusters": final_clusters,
        "summary": {
            "total_clusters": len(final_clusters),
            "type_distribution": distribution,
            "viability_threshold_met": all clusters have score >= 0.70
        }
    }

END FUNCTION
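Step 1 of the pseudocode above (enumerating 2-value and 3-value intersections) can be sketched in Python with `itertools`. This is a minimal illustration, not the actual `form_clusters` implementation: the function name `generate_intersections` and the `"Attribute: Value"` dimension format are assumptions taken from the output contract, and the AI evaluation steps are left out.

```python
from itertools import combinations, product


def generate_intersections(populated_attributes, sizes=(2, 3)):
    """Enumerate every cross-attribute value combination of the given sizes."""
    intersections = []
    for size in sizes:
        # Pick `size` distinct attributes, then one value from each of them.
        for attrs in combinations(populated_attributes, size):
            names = [attr["name"] for attr in attrs]
            for values in product(*(attr["values"] for attr in attrs)):
                intersections.append({
                    "attribute_names": names,
                    "dimensions": [f"{n}: {v}" for n, v in zip(names, values)],
                })
    return intersections


attributes = [
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]},
]
candidates = generate_intersections(attributes)
print(len(candidates))  # 16 pairs + 12 triples = 28
```

The candidate count grows multiplicatively with the number of values per attribute, so the number of AI evaluation calls in Step 2 may need to be budgeted before the 50-cluster cap is applied.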

AI Evaluation Criteria

For each intersection, the AI must answer:

Real Topical Ecosystem?
- Do the dimensions naturally connect in user intent?
- Is there an existing product/service/solution category?
- Example: YES - "Dog Arthritis Relief" (real problem + real solutions)
- Example: NO - "Vegetarian Chainsaw" (nonsensical combination)
User Search Demand?
- Would users actively search for this combination?
- Check: keyword templates, search volume patterns, user forums
- Target: ≥500 monthly searches for hub keyword
Content Support?
- Can we create 1 hub + 3-10 supporting articles?
- Is there enough subtopic depth?
- Example: YES - "Dog Arthritis" can have medication, exercise, diet, vet visits
- Example: NO - "Red Dog Collar" (too niche, limited subtopics)
Sufficient Differentiation?
- Does this cluster stand apart from others?
- Avoid near-duplicate clusters (e.g., "Dog Joint Health" vs "Dog Arthritis")
- Decision: merge or reject the weaker one
Dimensional Clarity
- Do all dimensions contribute meaningfully?
- Remove secondary dimensions that don't add coherence

Hard Constraints

Maximum Clusters: 50 per sector (enforce in sorting/filtering)
Minimum Keywords per Cluster: 5 (checked in keyword generation)
Maximum Keywords per Cluster: 20 (checked in keyword generation)
Optimal Range: 7-15 keywords per cluster
No Keyword Duplication: Each keyword in exactly one cluster (enforced in conflict resolution)
Type Distribution Target:
- Product/Service Type: 40-50%
- Condition/Problem: 20-30%
- Feature: 10-15%
- Brand: 5-10%
- Life Stage/Audience: 5-10%
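As a rough sketch of how the type distribution target could be checked, the helper below compares each type's share of clusters against the ranges listed above. The name `check_type_distribution` and the mapping of the listed labels onto the `cluster_type` keys (for example "Product/Service Type" to `product_category`) are assumptions made for illustration.

```python
# Target share of clusters per type as (min, max) fractions, from the list above.
# The key names are assumed to match the cluster "type" values.
TYPE_TARGETS = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}


def check_type_distribution(type_counts, targets=TYPE_TARGETS):
    """Return {cluster_type: 'under' | 'over'} for types outside their target range."""
    total = sum(type_counts.values())
    issues = {}
    if total == 0:
        return issues
    for cluster_type, (low, high) in targets.items():
        share = type_counts.get(cluster_type, 0) / total
        if share < low:
            issues[cluster_type] = "under"
        elif share > high:
            issues[cluster_type] = "over"
    return issues


counts = {
    "product_category": 6, "condition_problem": 4, "feature": 1,
    "brand": 0, "informational": 1, "comparison": 0,
}
issues = check_type_distribution(counts)
print(issues)
```

Types without a listed target (here `informational` and `comparison`) count toward the total but are never flagged.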

2.2 Keyword Auto-Generation AI Function

File: sag/ai_functions/keyword_generation.py
Register Key: 'generate_keywords'
Triggering Context: After cluster formation; before blueprint assembly

Input Contract

{
    "clusters": [  # output from cluster_formation
        {
            "id": "cluster_001",
            "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"],
            "hub_title": "Best Arthritis Treatments for Dogs",
            "supporting_content_plan": [...]
        }
    ],
    "sector_context": {
        "sector_id": str,
        "site_type": "ecommerce|saas|blog|local_service",
        "site_intent": "sell|inform|book|download"
    },
    "keyword_templates": {  # loaded from SectorAttributeTemplate
        "template_001": "best {health_condition} treatment for {pet_type}",
        "template_002": "{pet_type} {health_condition} treatment",
        # ... more templates
    },
    "constraints": {
        "min_keywords_per_cluster": 10,
        "max_keywords_per_cluster": 25,
        "total_target": "300-500"
    }
}

Output Contract

{
    "keywords_per_cluster": {
        "cluster_001": {
            "keywords": [
                {
                    "keyword": "best arthritis treatment for dogs",
                    "search_volume": 1200,
                    "difficulty": "medium",
                    "intent": "informational",
                    "generated_from": "template_001",
                    "variant_type": "long_tail"
                },
                {
                    "keyword": "dog arthritis remedies",
                    "search_volume": 800,
                    "difficulty": "easy",
                    "intent": "informational",
                    "generated_from": "template_002",
                    "variant_type": "base"
                },
                // ... 13-23 more keywords
            ],
            "keyword_count": 15,
            "primary_intent": "informational",
            "search_volume_total": 12500
        }
    },
    "deduplication": {
        "duplicates_removed": 8,
        "flagged_conflicts": 3  # keywords fitting multiple clusters
    },
    "summary": {
        "total_unique_keywords": 342,
        "per_cluster_avg": 14.25,
        "total_search_volume": 892000,
        "within_constraints": true
    }
}

Algorithm (Pseudocode)

FUNCTION generate_keywords(clusters, sector_context, keyword_templates):

    all_keywords = {}

    FOR EACH cluster IN clusters:

        # STEP 1: Extract attribute values from cluster dimensions
        attribute_values = EXTRACT_ATTRIBUTE_VALUES(cluster.dimensions)
        # Output: {"Pet Type": "Dogs", "Health Condition": "Arthritis", ...}

        cluster_keywords = []

        # STEP 2: Substitute values into templates
        FOR EACH template IN keyword_templates:

            # Check if template requires all attribute values present
            required_attrs = PARSE_TEMPLATE_VARIABLES(template)
            if ALL_ATTRS_AVAILABLE(required_attrs, attribute_values):

                # Substitute values
                base_keyword = SUBSTITUTE_VALUES(template, attribute_values)
                cluster_keywords.append({
                    "keyword": base_keyword,
                    "generated_from": template.id,
                    "variant_type": "base"
                })

        # STEP 3: Generate long-tail variants
        # Iterate over a snapshot so that appended variants are not expanded again
        base_keywords = COPY(cluster_keywords)

        FOR EACH base IN base_keywords:

            # e.g. base.keyword = "dog arthritis treatment"
            variants = []

            # Variant: Add "best" (skip if the base keyword already starts with it)
            if NOT STARTS_WITH(base.keyword, "best "):
                variants.append("best " + base.keyword)

            # Variant: Add "review"
            variants.append(base.keyword + " review")

            # Variant: Add "vs" (comparison)
            if cluster.type in [product_category, comparison]:
                variants.append(base.keyword + " vs alternatives")

            # Variant: Add "for" (audience)
            variants.append(base.keyword + " for seniors")

            # Variant: Add "how to"
            variants.append("how to " + base.keyword)

            # Variant: Add "cost" (ecommerce intent)
            if sector_context.site_intent == "sell":
                variants.append(base.keyword + " cost")

            FOR EACH variant IN variants:
                if NOT_DUPLICATE(variant, cluster_keywords):
                    cluster_keywords.append({
                        "keyword": variant,
                        "generated_from": base.generated_from,
                        "variant_type": "long_tail",
                        "parent": base.keyword
                    })

        # STEP 4: Enrich keywords with metadata
        enriched_keywords = []
        FOR EACH kw IN cluster_keywords:
            enriched = {
                "keyword": kw.keyword,
                "search_volume": ESTIMATE_SEARCH_VOLUME(kw.keyword, sector_context),
                "difficulty": ESTIMATE_DIFFICULTY(kw.keyword, sector_context),
                "intent": CLASSIFY_INTENT(kw.keyword),  # informational, transactional, navigational
                "generated_from": kw.generated_from,
                "variant_type": kw.variant_type
            }
            enriched_keywords.append(enriched)

        # STEP 5: Filter & sort
        filtered_keywords = SORT_BY_SEARCH_VOLUME(enriched_keywords)

        # Keep top 10-25 per cluster
        cluster_keywords_final = filtered_keywords[0:25]

        # Validate minimum
        if LEN(cluster_keywords_final) < 10:
            # Top up to the per-cluster minimum
            ADD_SUPPLEMENTARY_KEYWORDS(cluster_keywords_final, 10 - LEN(cluster_keywords_final))

        all_keywords[cluster.id] = {
            "keywords": cluster_keywords_final,
            "keyword_count": len(cluster_keywords_final),
            "primary_intent": MODE(intent from all keywords),
            "search_volume_total": SUM(all search volumes)
        }

    # STEP 6: Global deduplication
    all_keywords_flat = FLATTEN(all_keywords)
    duplicates = FIND_DUPLICATES(all_keywords_flat)

    FOR EACH duplicate_set IN duplicates:
        primary_cluster = PRIMARY_CLUSTER(duplicate_set)  # best fit by dimensions
        REASSIGN_DUPLICATES_TO_PRIMARY(duplicate_set, primary_cluster)

    # STEP 7: Validate constraints
    total_keywords = SUM(keyword_count for each cluster)

    validation = {
        "within_min_per_cluster": all clusters >= 10,
        "within_max_per_cluster": all clusters <= 25,
        "total_within_target": total_keywords between 300-500,
        "no_duplicates": FIND_DUPLICATES(FLATTEN(all_keywords)) is empty  # re-check after reassignment
    }

    if NOT validation.all_true:
        LOG_WARNING("Keyword generation constraints not fully met")

    # STEP 8: Return results
    return {
        "keywords_per_cluster": all_keywords,
        "deduplication": {
            "duplicates_removed": len(duplicates),
            "flagged_conflicts": identify_multi_cluster_fits()
        },
        "summary": {
            "total_unique_keywords": total_keywords,
            "per_cluster_avg": total_keywords / len(clusters),
            "total_search_volume": sum of all volumes,
            "within_constraints": validation.all_true
        }
    }

END FUNCTION
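Step 2 of the pseudocode above (template substitution) can be illustrated with the standard library's `string.Formatter`. This is a sketch under stated assumptions: the helper names `normalize_attr` and `fill_template` are hypothetical, and attribute names are assumed to map to placeholders by lowercasing and replacing spaces with underscores.

```python
from string import Formatter


def normalize_attr(name):
    """'Pet Type' -> 'pet_type', so attribute names match template placeholders."""
    return name.strip().lower().replace(" ", "_")


def fill_template(pattern, attribute_values):
    """Return the filled keyword, or None if any placeholder has no value."""
    values = {normalize_attr(k): v.lower() for k, v in attribute_values.items()}
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    required = [field for _, field, _, _ in Formatter().parse(pattern) if field]
    if any(field not in values for field in required):
        return None
    return pattern.format(**values)


keyword = fill_template(
    "best {health_condition} treatment for {pet_type}",
    {"Pet Type": "Dogs", "Health Condition": "Arthritis"},
)
print(keyword)  # best arthritis treatment for dogs
```

The sketch does not singularize values, so it yields "dogs arthritis treatment" for the second template rather than "dog arthritis treatment"; any such normalization would be a separate step.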

Keyword Template Structure (from SectorAttributeTemplate, 01B)

# Example for Pet Health ecommerce site
keyword_templates = {
    "site_type": "ecommerce",
    "templates": [
        {
            "id": "template_001",
            "pattern": "best {health_condition} treatment for {pet_type}",
            "weight": 5,  # prioritize this template
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        {
            "id": "template_002",
            "pattern": "{pet_type} {health_condition} medication",
            "weight": 4,
            "min_required_attrs": ["pet_type", "health_condition"]
        },
        {
            "id": "template_003",
            "pattern": "affordable {health_condition} relief for {pet_type}",
            "weight": 3,
            "min_required_attrs": ["health_condition", "pet_type"]
        },
        # ... more templates
    ]
}

Long-tail Variant Rules

Variant Type	Pattern	Use Case	Example
Base	{keyword}	All clusters	"dog arthritis relief"
Best/Top	best {keyword}	All clusters	"best dog arthritis relief"
Review	{keyword} review	Product clusters	"arthritis supplement for dogs review"
Comparison	{keyword} vs {alternative}	Comparison intent	"arthritis medication vs supplement for dogs"
Audience	{keyword} for {audience}	Audience-specific	"dog arthritis relief for senior dogs"
How-to	how to {verb} {keyword}	Problem-solution	"how to manage dog arthritis"
Cost/Price	{keyword} cost	Ecommerce intent	"arthritis treatment for dogs cost"
Quick	fast {keyword}	Urgency-driven	"fast arthritis relief for dogs"
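A few of the variant rules in the table can be sketched as a small generator. The function name `long_tail_variants` and its `audience` parameter are assumptions for illustration; only the Best, Review, Audience, and Cost rows are covered, and the Review rule is limited to product-style clusters as the table's "Use Case" column suggests.

```python
def long_tail_variants(base_keyword, cluster_type, site_intent, audience=None):
    """Apply a subset of the variant rules to one base keyword."""
    variants = []
    # Best/Top: skip when the base already starts with "best" to avoid "best best ...".
    if not base_keyword.startswith("best "):
        variants.append(f"best {base_keyword}")
    # Review: product-oriented clusters only.
    if cluster_type in ("product_category", "comparison"):
        variants.append(f"{base_keyword} review")
    # Audience: only when an audience value is supplied.
    if audience:
        variants.append(f"{base_keyword} for {audience}")
    # Cost/Price: ecommerce intent.
    if site_intent == "sell":
        variants.append(f"{base_keyword} cost")

    # Drop repeats while keeping the original order.
    seen = {base_keyword}
    unique = []
    for variant in variants:
        if variant not in seen:
            seen.add(variant)
            unique.append(variant)
    return unique


variants = long_tail_variants(
    "dog arthritis relief", "product_category", "sell", audience="senior dogs"
)
print(variants)
```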

2.3 Blueprint Assembly Service

File: sag/services/blueprint_service.py
Primary Function: assemble_blueprint(site, attributes, clusters, keywords)
Triggering Context: After keyword generation; creates SAGBlueprint (status=draft)

Input Contract

assemble_blueprint(
    site: Website,  # from 01A
    attributes: List[Tuple[name, values]],  # user-populated
    clusters: List[Dict],  # from cluster_formation()
    keywords: Dict[cluster_id, List[Dict]]  # from generate_keywords()
)

Execution Steps

Create SAGBlueprint Record

blueprint = SAGBlueprint.objects.create(
    site=site,
    status='draft',
    phase='phase_1_foundation',
    sector_id=site.sector_id,
    created_by=current_user,
    metadata={
        'version': '1.0',
        'created_date': now(),
        'last_modified': now()
    }
)

Create SAGAttribute Records

FOR EACH (attribute_name, values) IN attributes:
    attribute = SAGAttribute.objects.create(
        blueprint=blueprint,
        name=attribute_name,
        values=values,  # stored as JSON array
        is_primary=DETERMINE_PRIMACY(attribute_name, site.site_type),
        source='user_input'
    )

Create SAGCluster Records from Formed Clusters

FOR EACH cluster IN clusters:
    db_cluster = SAGCluster.objects.create(
        blueprint=blueprint,
        cluster_key=cluster['id'],
        title=cluster['hub_title'],
        description=GENERATE_CLUSTER_DESC(cluster),
        cluster_type=cluster['type'],
        dimensions=cluster['dimensions'],  # JSON
        intersection_depth=cluster['intersection_depth'],
        viability_score=cluster['viability_score'],
        hub_title=cluster['hub_title'],
        supporting_content_plan=cluster['supporting_content_plan'],  # JSON array
        status='draft',
        keyword_count=0  # updated in next step
    )

Populate auto_generated_keywords on Each Cluster

FOR EACH (cluster_id, keyword_list) IN keywords.items():
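    # cluster_key is only unique per blueprint, so scope the lookup to this blueprint
    cluster = SAGCluster.objects.get(blueprint=blueprint, cluster_key=cluster_id)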

    keyword_records = []
    FOR EACH kw_data IN keyword_list:
        keyword = SAGKeyword.objects.create(
            cluster=cluster,
            keyword_text=kw_data['keyword'],
            search_volume=kw_data['search_volume'],
            difficulty=kw_data['difficulty'],
            intent=kw_data['intent'],
            generated_from=kw_data['generated_from'],
            variant_type=kw_data['variant_type'],
            source='auto_generated'
        )
        keyword_records.append(keyword)

    cluster.auto_generated_keywords.set(keyword_records)
    cluster.keyword_count = len(keyword_records)
    cluster.save()

Generate Taxonomy Plan

taxonomy_plan = {
    'wp_categories': [],
    'wp_tags': [],
    'hierarchy': {}
}

FOR EACH attribute IN blueprint.sagattribute_set.all():
    if attribute.is_primary:
        category = {
            'name': attribute.name,
            'slug': slugify(attribute.name),
            'description': f"Posts about {attribute.name}"
        }
        taxonomy_plan['wp_categories'].append(category)
    else:
        FOR EACH v IN attribute.values:
            tag = {
                'name': v,
                'slug': slugify(v),
                'parent_category': primary_attr_name  # name of the primary attribute (category) this tag sits under
            }
            taxonomy_plan['wp_tags'].append(tag)

blueprint.taxonomy_plan = taxonomy_plan  # JSON field
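The taxonomy step can be sketched as a plain function over attribute dictionaries. Two things here are assumptions rather than part of the spec: the local `slugify` is a minimal ASCII stand-in for `django.utils.text.slugify`, and tags are attached to the first primary attribute, since the pseudocode does not say how `primary_attr_name` is resolved.

```python
import re


def slugify(text):
    """Minimal ASCII stand-in for django.utils.text.slugify."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_taxonomy_plan(attributes):
    """Primary attributes become categories; values of the others become tags."""
    plan = {"wp_categories": [], "wp_tags": [], "hierarchy": {}}
    primary_names = [a["name"] for a in attributes if a["is_primary"]]
    # Assumption: every tag is parented to the first primary attribute.
    parent = primary_names[0] if primary_names else None
    for attr in attributes:
        if attr["is_primary"]:
            plan["wp_categories"].append({
                "name": attr["name"],
                "slug": slugify(attr["name"]),
                "description": f"Posts about {attr['name']}",
            })
        else:
            for value in attr["values"]:
                plan["wp_tags"].append({
                    "name": value,
                    "slug": slugify(value),
                    "parent_category": parent,
                })
    return plan


plan = build_taxonomy_plan([
    {"name": "Pet Type", "values": ["Dogs", "Cats"], "is_primary": True},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis"], "is_primary": False},
])
print(plan["wp_tags"][0])
```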

Generate Execution Priority (Phased Approach)

execution_priority = {
    'phase': 'phase_1_hubs',
    'content_sequence': []
}

# Phase 1: Hub pages (1 per cluster)
hub_items = []
FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
    hub_items.append({
        'type': 'hub_page',
        'cluster_id': cluster.id,
        'title': cluster.hub_title,
        'priority': 1,
        'estimated_effort': 'high',
        'SEO_impact': 'critical'
    })

execution_priority['content_sequence'].extend(hub_items)

# Phase 2: Supporting content (5-8 articles per cluster)
supporting_items = []
FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
    FOR EACH content_title IN cluster.supporting_content_plan:
        supporting_items.append({
            'type': 'supporting_article',
            'cluster_id': cluster.id,
            'parent_hub': cluster.hub_title,
            'title': content_title,
            'priority': 2,
            'estimated_effort': 'medium',
            'SEO_impact': 'supporting'
        })

execution_priority['content_sequence'].extend(supporting_items)

# Phase 3: Term/pillar pages (keywords + long-tail)
term_items = []
FOR EACH cluster IN blueprint.sagcluster_set.filter(status='draft'):
    FOR EACH keyword IN cluster.auto_generated_keywords.all():
        term_items.append({
            'type': 'term_page',
            'cluster_id': cluster.id,
            'keyword': keyword.keyword_text,
            'priority': 3,
            'estimated_effort': 'low',
            'SEO_impact': 'supportive'
        })

execution_priority['content_sequence'].extend(term_items)

blueprint.execution_priority = execution_priority  # JSON field

Populate Denormalized JSON Fields

blueprint.attributes_json = {
    'total_attributes': blueprint.sagattribute_set.count(),
    'summary': [
        {
            'name': attr.name,
            'value_count': len(attr.values),
            'values': attr.values,
            'is_primary': attr.is_primary
        }
        FOR EACH attr IN blueprint.sagattribute_set.all()
    ]
}

blueprint.clusters_json = {
    'total_clusters': blueprint.sagcluster_set.count(),
    'summary': [
        {
            'id': cluster.cluster_key,
            'title': cluster.title,
            'type': cluster.cluster_type,
            'keyword_count': cluster.keyword_count,
            'viability_score': cluster.viability_score
        }
        FOR EACH cluster IN blueprint.sagcluster_set.all()
    ]
}

blueprint.save()

Return Blueprint ID & Status

return {
    'blueprint_id': blueprint.id,
    'status': 'draft',
    'created_at': blueprint.created_at,
    'summary': {
        'total_attributes': blueprint.sagattribute_set.count(),
        'total_clusters': blueprint.sagcluster_set.count(),
        'total_keywords': SAGKeyword.objects.filter(cluster__blueprint=blueprint).count(),
        'next_step': 'review blueprint in 01E (Pipeline Configuration)'
    }
}

2.4 Manual Keyword Supplementation (User Interface)

Feature: Add Keywords from Multiple Sources

IGNY8 Library Integration
- Users browse pre-curated keyword library per site_type
- Select keywords → auto-map to clusters by attribute match
- Unmatched keywords → flagged for review
Manual Entry
- Form field: paste or type keywords (comma-separated)
- System deduplicates against existing
- Prompts user to assign to cluster(s)
CSV Import
- Upload CSV with columns: keyword, search_volume (optional), difficulty (optional)
- Preview & validate before import
- Bulk assign to clusters or mark for review
Keyword API Integration (optional in Phase 1)
- Connect to SEMrush, Ahrefs, or similar
- Fetch keyword suggestions for cluster dimensions
- User approves additions

Keyword Mapping Logic

FUNCTION map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):

    matches = []

    FOR EACH cluster IN clusters:

        # Extract all attribute values from cluster dimensions
        cluster_attrs = EXTRACT_ATTRIBUTES(cluster.dimensions)

        # Calculate semantic similarity
        similarity = CALCULATE_SIMILARITY(new_keyword, cluster_attrs)

        if similarity > threshold:
            matches.append({
                'cluster_id': cluster.id,
                'cluster_title': cluster.title,
                'similarity_score': similarity
            })

    return matches  # May be 0, 1, or multiple matches

END FUNCTION
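The spec leaves `CALCULATE_SIMILARITY` undefined. As a placeholder for experimentation, the sketch below scores a keyword by the share of a cluster's dimension-value tokens that it contains; a production version would more plausibly use embeddings or another semantic measure. The token-overlap scoring and the crude plural stripping are assumptions, not the document's method.

```python
import re


def _tokens(text):
    """Lowercase word tokens with a crude plural strip ('dogs' -> 'dog')."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words}


def calculate_similarity(keyword, dimensions):
    """Share of the cluster's dimension-value tokens that appear in the keyword."""
    # "Pet Type: Dogs" -> " Dogs"; only the value part is compared.
    values = " ".join(d.split(":", 1)[-1] for d in dimensions)
    value_tokens = _tokens(values)
    if not value_tokens:
        return 0.0
    return len(value_tokens & _tokens(keyword)) / len(value_tokens)


def map_keyword_to_clusters(new_keyword, clusters, threshold=0.70):
    """Return every cluster whose similarity to the keyword exceeds the threshold."""
    matches = []
    for cluster in clusters:
        score = calculate_similarity(new_keyword, cluster["dimensions"])
        if score > threshold:
            matches.append({"cluster_id": cluster["id"], "similarity_score": score})
    return matches


clusters = [
    {"id": "cluster_001", "dimensions": ["Pet Type: Dogs", "Health Condition: Arthritis"]},
    {"id": "cluster_002", "dimensions": ["Pet Type: Cats", "Health Condition: Arthritis"]},
]
matches = map_keyword_to_clusters("dog arthritis relief", clusters)
print(matches)
```

With this scoring, a generic keyword such as "arthritis relief for pets" reaches only 0.5 against both clusters and falls below the default threshold, so it would surface as unmatched rather than as a conflict.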

Conflict Resolution: Multi-Cluster Keyword Assignment

Problem: A keyword fits multiple clusters (e.g., "arthritis relief for pets" fits both Dog Cluster and Cat Cluster)

Resolution Algorithm:

Identify Multi-Fit Keywords

potential_conflicts = []
FOR EACH new_keyword IN keywords_to_add:
    matching_clusters = map_keyword_to_clusters(new_keyword, all_clusters)
    if len(matching_clusters) > 1:
        potential_conflicts.append({
            'keyword': new_keyword,
            'matching_clusters': matching_clusters
        })

Apply Decision Criteria (in order)
- Criterion 1: Dimensional Intersection Count
  - Assign to cluster with MOST dimensional intersections
  - Example: "dog arthritis relief" → Dog cluster has 3 dimensions (pet type, condition, audience); Cat cluster has 2 → assign to Dog cluster
- Criterion 2: Specificity
  - If tied on intersection count, assign to MORE SPECIFIC cluster
  - Example: "arthritis relief" (general) vs "dog arthritis relief" (specific) → assign to Dog cluster
- Criterion 3: Primary User Intent Match
  - If still tied, assign to cluster whose hub_title best matches user intent
  - Example: Both Dog & Cat clusters have "arthritis relief" hub; Dog hub is "Best Arthritis Treatments for Dogs" → assign to Dog
- Criterion 4: Last Resort - Create New Cluster
  - If keyword doesn't fit any cluster well, flag as "potential_new_cluster"
  - User reviews and decides: split existing cluster, merge, or create new

Implementation

FUNCTION resolve_keyword_conflict(keyword, matching_clusters):

    # Step 1: Compare intersection depth
    top_depth = MAX(cluster.intersection_depth for cluster in matching_clusters)
    candidates = [cluster for cluster in matching_clusters if cluster.intersection_depth == top_depth]

    if len(candidates) == 1:
        return candidates[0]

    # Step 2: Compare specificity (only among clusters still tied)
    specificity_scores = [CALC_SPECIFICITY(cluster, keyword) for cluster in candidates]
    candidates = [cluster for cluster, score in ZIP(candidates, specificity_scores)
                  if score == MAX(specificity_scores)]

    if len(candidates) == 1:
        return candidates[0]

    # Step 3: Compare intent match (only among clusters still tied)
    intent_scores = [CALC_INTENT_MATCH(cluster.hub_title, keyword) for cluster in candidates]
    candidates = [cluster for cluster, score in ZIP(candidates, intent_scores)
                  if score == MAX(intent_scores)]

    if len(candidates) == 1:
        return candidates[0]

    # Step 4: Flag for user review
    return {
        'status': 'flagged_for_review',
        'keyword': keyword,
        'candidates': matching_clusters,
        'reason': 'ambiguous_assignment'
    }

END FUNCTION
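The tie-breaking cascade can be written compactly by narrowing the candidate list one criterion at a time. In this sketch the scoring functions are passed in as parameters because the spec does not define `CALC_SPECIFICITY` or `CALC_INTENT_MATCH`; `word_overlap` is a hypothetical stand-in used only for the example, and `matching_clusters` is assumed to be non-empty.

```python
def _keep_best(candidates, score):
    """Keep only the candidates that share the highest score."""
    scored = [(score(c), c) for c in candidates]
    best = max(s for s, _ in scored)
    return [c for s, c in scored if s == best]


def resolve_keyword_conflict(keyword, matching_clusters, specificity, intent_match):
    """Apply the criteria in order; return a cluster or a review flag."""
    criteria = (
        lambda c: c["intersection_depth"],               # Criterion 1
        lambda c: specificity(c, keyword),               # Criterion 2
        lambda c: intent_match(c["hub_title"], keyword), # Criterion 3
    )
    candidates = list(matching_clusters)
    for criterion in criteria:
        candidates = _keep_best(candidates, criterion)
        if len(candidates) == 1:
            return candidates[0]
    return {
        "status": "flagged_for_review",
        "keyword": keyword,
        "candidates": list(matching_clusters),
        "reason": "ambiguous_assignment",
    }


def word_overlap(text, keyword):
    """Hypothetical intent score: number of shared lowercase words."""
    return len(set(text.lower().split()) & set(keyword.lower().split()))


dog = {"id": "dog", "intersection_depth": 2, "hub_title": "Best Arthritis Treatments for Dogs"}
cat = {"id": "cat", "intersection_depth": 2, "hub_title": "Best Arthritis Treatments for Cats"}
result = resolve_keyword_conflict(
    "arthritis treatments for dogs", [cat, dog],
    specificity=lambda cluster, kw: 0,  # force a tie so the intent criterion decides
    intent_match=word_overlap,
)
print(result["id"])  # dog
```

Narrowing the list at each step means a later criterion only ever compares clusters that were tied on all earlier ones.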

3. Data Models / APIs

3.1 Database Models (Django ORM)

SAGBlueprint (existing from 01A, extended)

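from django.db import models
# Website (01A) and User are assumed to be imported from their existing modules.

class SAGBlueprint(models.Model):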
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('cluster_formation_complete', 'Cluster Formation Complete'),
        ('keyword_generation_complete', 'Keyword Generation Complete'),
        ('keyword_supplemented', 'Keywords Supplemented'),
        ('ready_for_pipeline', 'Ready for Pipeline'),
        ('published', 'Published'),
    )

    site = models.ForeignKey(Website, on_delete=models.CASCADE)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    phase = models.CharField(max_length=50, default='phase_1_foundation')
    sector_id = models.CharField(max_length=100)

    # Denormalized JSON for fast access
    attributes_json = models.JSONField(default=dict, blank=True)
    clusters_json = models.JSONField(default=dict, blank=True)
    taxonomy_plan = models.JSONField(default=dict, blank=True)
    execution_priority = models.JSONField(default=dict, blank=True)

    created_by = models.ForeignKey(User, on_delete=models.SET_NULL, null=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_blueprint'
        ordering = ['-created_at']

SAGAttribute (existing from 01A, no changes required)

class SAGAttribute(models.Model):
    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    values = models.JSONField()  # array of strings
    is_primary = models.BooleanField(default=False)
    source = models.CharField(max_length=50)  # 'user_input', 'template', 'api'
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = 'sag_attribute'
        unique_together = ('blueprint', 'name')

SAGCluster (existing from 01A, extended)

class SAGCluster(models.Model):
    TYPE_CHOICES = (
        ('product_category', 'Product/Service Category'),
        ('condition_problem', 'Condition/Problem'),
        ('feature', 'Feature'),
        ('brand', 'Brand'),
        ('informational', 'Informational'),
        ('comparison', 'Comparison'),
        ('life_stage', 'Life Stage/Audience'),
    )

    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('validated', 'Validated'),
        ('keyword_assigned', 'Keywords Assigned'),
        ('content_created', 'Content Created'),
    )

    blueprint = models.ForeignKey(SAGBlueprint, on_delete=models.CASCADE)
    cluster_key = models.CharField(max_length=100)  # unique ID from cluster formation
    title = models.CharField(max_length=255)
    description = models.TextField(blank=True)

    cluster_type = models.CharField(max_length=50, choices=TYPE_CHOICES)
    dimensions = models.JSONField()  # ["dimension1", "dimension2", ...]
    intersection_depth = models.IntegerField()  # count of intersecting dimensions
    viability_score = models.FloatField()  # 0-1

    hub_title = models.CharField(max_length=255)
    supporting_content_plan = models.JSONField()  # array of content titles

    auto_generated_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_auto',
        blank=True
    )
    supplemented_keywords = models.ManyToManyField(
        'SAGKeyword',
        related_name='clusters_supplemented',
        blank=True
    )

    keyword_count = models.IntegerField(default=0)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='draft')
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_cluster'
        unique_together = ('blueprint', 'cluster_key')
        ordering = ['-viability_score']

SAGKeyword (new)

class SAGKeyword(models.Model):
    INTENT_CHOICES = (
        ('informational', 'Informational'),
        ('transactional', 'Transactional'),
        ('navigational', 'Navigational'),
        ('commercial', 'Commercial Intent'),
    )

    VARIANT_TYPES = (
        ('base', 'Base Keyword'),
        ('long_tail', 'Long-tail Variant'),
        ('brand', 'Brand Variant'),
        ('comparison', 'Comparison'),
        ('review', 'Review'),
        ('how_to', 'How-to'),
    )

    SOURCE_CHOICES = (
        ('auto_generated', 'Auto-Generated'),
        ('manual_entry', 'Manual Entry'),
        ('csv_import', 'CSV Import'),
        ('api_fetch', 'API Fetch'),
        ('library', 'IGNY8 Library'),
    )

    cluster = models.ForeignKey(
        SAGCluster,
        on_delete=models.CASCADE,
        related_name='all_keywords'
    )
    keyword_text = models.CharField(max_length=255)
    search_volume = models.IntegerField(null=True, blank=True)
    difficulty = models.CharField(max_length=50, blank=True)  # 'easy', 'medium', 'hard'
    intent = models.CharField(max_length=50, choices=INTENT_CHOICES)

    generated_from = models.CharField(max_length=100, blank=True)  # template ID or source
    variant_type = models.CharField(max_length=50, choices=VARIANT_TYPES)
    source = models.CharField(max_length=50, choices=SOURCE_CHOICES)

    cpc = models.FloatField(null=True, blank=True)  # if available from API
    competition = models.CharField(max_length=50, blank=True)  # 'low', 'medium', 'high'

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = 'sag_keyword'
        unique_together = ('cluster', 'keyword_text')
        ordering = ['-search_volume']

3.2 API Endpoints

POST /api/v1/blueprints/{blueprint_id}/clusters/form/

Purpose: Trigger cluster formation AI function
Authentication: Required (JWT)
Input:

{
    "populated_attributes": [
        {"name": "Pet Type", "values": ["Dogs", "Cats"]},
        {"name": "Health Condition", "values": ["Allergies", "Arthritis"]}
    ],
    "max_clusters": 50
}

Output:

{
    "clusters": [...],
    "summary": {
        "total_clusters_formed": 12,
        "type_distribution": {...}
    },
    "status": "success"
}

Error Cases:

400: Invalid attributes structure
403: Unauthorized (wrong blueprint owner)
422: Insufficient attributes for cluster formation (< 2 dimensions)
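
The 400 and 422 cases above could be separated by a small validation step before the AI function is called. The following is a minimal sketch, not the project's actual validator: the function name `validate_form_request` and the rule that a dimension counts only when it has at least one value are assumptions. The 403 case is left to the authentication and ownership layer.

```python
from typing import Any

MAX_CLUSTERS_LIMIT = 50  # hard cap on clusters per sector stated in this spec


def validate_form_request(payload: dict[str, Any]) -> tuple[int, str]:
    """Return (http_status, message) for a cluster-formation request body.

    Hypothetical helper: the name and the messages are illustrative only.
    """
    attrs = payload.get("populated_attributes")
    if not isinstance(attrs, list):
        return 400, "populated_attributes must be a list"
    for attr in attrs:
        if not isinstance(attr, dict):
            return 400, "each attribute must be an object"
        name, values = attr.get("name"), attr.get("values")
        if not isinstance(name, str) or not name.strip():
            return 400, "each attribute needs a non-empty name"
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            return 400, f"values for {name!r} must be a list of strings"

    max_clusters = payload.get("max_clusters", MAX_CLUSTERS_LIMIT)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_clusters, bool) or not isinstance(max_clusters, int):
        return 400, "max_clusters must be an integer"
    if not 1 <= max_clusters <= MAX_CLUSTERS_LIMIT:
        return 400, f"max_clusters must be between 1 and {MAX_CLUSTERS_LIMIT}"

    # Assumption: a dimension is usable only if it has at least one value.
    usable = [a for a in attrs if a["values"]]
    if len(usable) < 2:
        return 422, "cluster formation needs at least 2 populated dimensions"
    return 200, "ok"
```

Structural problems map to 400, while a well-formed request that simply lacks enough dimensions maps to 422, which keeps the two error cases distinguishable for the wizard UI.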

POST /api/v1/blueprints/{blueprint_id}/keywords/generate/

Purpose: Trigger keyword generation AI function
Authentication: Required
Input:

{
    "use_cluster_ids": ["cluster_001", "cluster_002"],
    "target_keywords_per_cluster": 15,
    "include_long_tail_variants": true
}

Output:

{
    "keywords_per_cluster": {...},
    "deduplication": {
        "duplicates_removed": 5
    },
    "summary": {
        "total_unique_keywords": 180,
        "within_constraints": true
    }
}

POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/

Purpose: Add manual, CSV, library, or API-sourced keywords
Authentication: Required
Input (Multiple Scenarios):

Scenario 1: Manual Entry

{
    "source": "manual_entry",
    "keywords": ["arthritis relief dogs", "joint pain dogs"],
    "cluster_id": "cluster_001"
}

Scenario 2: CSV Import

{
    "source": "csv_import",
    "csv_url": "https://example.com/keywords.csv",
    "auto_cluster": true
}

Scenario 3: Library Selection

{
    "source": "library",
    "library_keyword_ids": [123, 456, 789],
    "auto_cluster": true
}

Output:

{
    "added_keywords": 10,
    "auto_clustered": 9,
    "flagged_for_review": 1,
    "conflicts_resolved": {
        "reassigned": 2,
        "deferred": 1
    }
}

POST /api/v1/blueprints/{blueprint_id}/assemble/

Purpose: Trigger blueprint assembly (create final SAGBlueprint with all records)
Authentication: Required
Input:

{
    "finalize_keyword_review": true,
    "set_status": "ready_for_pipeline"
}

Output:

{
    "blueprint_id": 42,
    "status": "ready_for_pipeline",
    "summary": {
        "total_attributes": 4,
        "total_clusters": 12,
        "total_keywords": 180,
        "execution_priority_phases": 3
    }
}

GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft&type=product_category

Purpose: List clusters with filtering
Query Params:

status: draft, validated, keyword_assigned, content_created
type: product_category, condition_problem, feature, brand, informational, comparison, life_stage
min_viability: 0.70
limit: 50, offset: 0

Output:

{
    "results": [
        {
            "id": 1,
            "cluster_key": "cluster_001",
            "title": "Dog Arthritis Relief Solutions",
            "hub_title": "Best Arthritis Treatments for Dogs",
            "keyword_count": 15,
            "viability_score": 0.92,
            "type": "product_category"
        }
    ],
    "total_count": 12,
    "total_keywords": 180
}

GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=cluster_001&source=auto_generated

Purpose: List keywords for a cluster
Query Params:

cluster_id: filter by cluster
source: auto_generated, manual_entry, csv_import, api_fetch, library
intent: informational, transactional, navigational, commercial
min_search_volume: 100
order_by: search_volume (DESC), difficulty, intent

Output:

{
    "results": [
        {
            "id": 1,
            "keyword_text": "best arthritis treatment for dogs",
            "search_volume": 1200,
            "difficulty": "medium",
            "intent": "informational",
            "variant_type": "long_tail",
            "source": "auto_generated"
        }
    ],
    "total_count": 15
}

DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/

Purpose: Remove a keyword (before assembly)
Authentication: Required
Status: Only available if blueprint.status='draft' or 'keyword_generation_complete'

4. Implementation Steps

Phase 1: AI Functions Development (Week 1-2)

Step 1.1: Set up cluster_formation.py structure

Create sag/ai_functions/cluster_formation.py
Define input/output contracts
Implement intersection generation logic (2-value, 3-value)
Stub out AI evaluation function (ready for Claude integration)
Implement constraint filtering & sorting
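
The intersection generation in Step 1.1 can be sketched with `itertools`. This is a minimal illustration under one assumption: an intersection takes exactly one value from each of several distinct attributes. The function name `generate_intersections` and the dict shape of each result are hypothetical.

```python
from itertools import combinations, product


def generate_intersections(populated_attributes, depth):
    """Return every combination that takes one value from each of `depth` distinct attributes.

    Each result is a dict mapping attribute name to the chosen value,
    e.g. {"Pet Type": "Dogs", "Health Condition": "Arthritis"}.
    """
    # Attributes without values cannot contribute to an intersection.
    usable = [a for a in populated_attributes if a["values"]]
    results = []
    for attr_group in combinations(usable, depth):
        names = [a["name"] for a in attr_group]
        for values in product(*(a["values"] for a in attr_group)):
            results.append(dict(zip(names, values)))
    return results
```

Calling it once with `depth=2` and once with `depth=3` yields the candidate list that the AI evaluation step would then score and filter.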

Step 1.2: Implement cluster formation AI logic

Integrate Claude AI API for cluster viability evaluation
- Real topical ecosystem check
- User search demand validation
- Content support assessment
- Differentiation evaluation
Implement cluster type classification (using embeddings or rule-based logic)
Implement hub title & supporting content plan generation
Add viability scoring (0-1 scale)
Implement distribution validation

Step 1.3: Unit tests for cluster formation

Test intersection generation (2-value, 3-value)
Test AI evaluation with mock responses
Test constraint filtering (max 50 clusters)
Test type distribution analysis
Test handling of edge cases (0 intersections, all rejected, etc.)

Step 1.4: Create keyword_generation.py structure

Create sag/ai_functions/keyword_generation.py
Define input/output contracts
Implement template substitution logic
Implement long-tail variant generation
Implement deduplication logic
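
Template substitution for Step 1.4 might look like the sketch below. It assumes placeholders are the lowercased attribute names with spaces replaced by underscores (for example `{pet_type}` for "Pet Type"), which this document's sample templates suggest but do not define. `render_templates` and `slugify_name` are hypothetical names, and templates with unmet placeholders are skipped rather than raising.

```python
from string import Formatter


def slugify_name(name):
    """Turn an attribute name such as "Pet Type" into a placeholder key such as "pet_type"."""
    return name.strip().lower().replace(" ", "_")


def render_templates(templates, dimension_values):
    """Substitute one cluster's dimension values into keyword templates.

    dimension_values maps attribute name to value, e.g. {"Pet Type": "Dogs"}.
    Templates that need a placeholder the cluster cannot supply are skipped.
    """
    context = {slugify_name(k): v.lower() for k, v in dimension_values.items()}
    keywords = []
    for template in templates:
        # Formatter().parse yields (literal, field_name, format_spec, conversion).
        fields = {field for _, field, _, _ in Formatter().parse(template) if field}
        if not fields.issubset(context):
            continue  # required attribute missing for this cluster
        keyword = template.format(**context)
        if keyword not in keywords:  # keep first occurrence, preserve order
            keywords.append(keyword)
    return keywords
```

Long-tail variant generation and enrichment would run on the returned base keywords; they are left out here because their rules are defined elsewhere in this document.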

Step 1.5: Implement keyword generation AI logic

Integrate template loading from SectorAttributeTemplate (01B)
Implement keyword enrichment (search volume, difficulty, intent)
Implement filtering & sorting by search volume
Implement constraint validation (10-25 per cluster, 300-500 total)
Implement global deduplication & conflict resolution

Step 1.6: Unit tests for keyword generation

Test template substitution with various attribute combinations
Test long-tail variant generation
Test deduplication across clusters
Test constraint validation
Test conflict resolution (multi-cluster keywords)

Phase 2: Data Models & Service Layer (Week 2-3)

Step 2.1: Database migrations

Create SAGKeyword model
Add ManyToMany relations to SAGCluster (auto_generated_keywords, supplemented_keywords)
Extend SAGBlueprint with denormalized JSON fields (attributes_json, clusters_json, taxonomy_plan, execution_priority)
Extend SAGCluster with cluster_key, type, intersection_depth, viability_score, hub_title, supporting_content_plan
Run and test migrations on dev database

Step 2.2: Implement blueprint_service.py

Create sag/services/blueprint_service.py
Implement assemble_blueprint() function with 8 steps
Implement SAGBlueprint creation & status management
Implement SAGAttribute creation from user input
Implement SAGCluster creation from cluster formation results
Implement SAGKeyword creation & assignment
Implement taxonomy_plan generation
Implement execution_priority generation
Implement denormalized JSON population

Step 2.3: Unit tests for blueprint_service

Test blueprint creation & status transitions
Test attribute record creation
Test cluster record creation with all fields
Test keyword assignment to clusters
Test taxonomy plan generation
Test execution priority generation
Test denormalized JSON accuracy

Phase 3: API Endpoints & Integration (Week 3-4)

Step 3.1: Implement cluster formation API endpoint

Create POST /api/v1/blueprints/{blueprint_id}/clusters/form/
Validate input attributes
Call cluster_formation() AI function
Return results with summary
Error handling (400, 403, 422)

Step 3.2: Implement keyword generation API endpoint

Create POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
Validate input & cluster availability
Call keyword_generation() AI function
Return results with deduplication summary
Error handling

Step 3.3: Implement keyword supplementation API endpoint

Create POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/
Support multiple input sources (manual, CSV, library, API)
Implement auto-clustering via map_keyword_to_clusters()
Implement conflict resolution via resolve_keyword_conflict()
Return summary of added, clustered, flagged keywords

Step 3.4: Implement blueprint assembly API endpoint

Create POST /api/v1/blueprints/{blueprint_id}/assemble/
Call blueprint_service.assemble_blueprint()
Manage status transitions
Return blueprint summary with next steps

Step 3.5: Implement read endpoints

Create GET /api/v1/blueprints/{blueprint_id}/clusters/?status=draft
Create GET /api/v1/blueprints/{blueprint_id}/keywords/?cluster_id=...
Implement filtering & pagination
Add ordering options

Step 3.6: Implement keyword removal endpoint

Create DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/
Validate blueprint status (only draft)
Cascade delete as needed

Phase 4: Integration with 01D & Testing (Week 4-5)

Step 4.1: Integrate with Setup Wizard (01D)

Call cluster_formation() after user populates attributes
Display clusters to user for review (optional: allow edits)
Call keyword_generation() if user confirms clusters
Display keywords for review
Allow manual supplementation before final assembly

Step 4.2: End-to-end testing

Test full flow: attributes → clusters → keywords → blueprint
Test with various sector/site_type combinations
Test constraint enforcement
Test conflict resolution with real scenarios
Performance test with large attribute sets (100+ values)

Step 4.3: Integration with 01E (Pipeline Configuration)

Verify blueprint is available to pipeline service
Test taxonomy plan usage in content generation
Test execution_priority ordering in pipeline

5. Acceptance Criteria

Cluster Formation AI Function (01C-CF)

CF-1: Generates all 2-value intersections from populated attributes
CF-2: Generates relevant 3-value intersections (at least 50% of possible combinations)
CF-3: AI evaluates each intersection on 5 decision criteria (ecosystem, demand, content support, differentiation, clarity)
CF-4: Classification assigns correct cluster type (product_category, condition_problem, feature, brand, informational, comparison, life_stage)
CF-5: Hub titles are specific, actionable, and 5-12 words long
CF-6: Supporting content plans contain 5-8 titles, semantically related to hub, covering different angles
CF-7: Viability scores accurately reflect cluster strength (0-1 scale, with clear rationale)
CF-8: Hard constraint enforced: max 50 clusters per sector, sorted by viability score
CF-9: Type distribution meets targets: Product/Service 40-50%, Condition/Problem 20-30%, Feature 10-15%, Brand 5-10%, Life Stage 5-10%
CF-10: Clusters have 3+ dimensional intersections for strong coherence
CF-11: No duplicative clusters (semantic coherence check prevents near-duplicates like "Dog Joint Health" + "Dog Arthritis")
CF-12: API response includes summary with cluster count, type distribution, avg intersection depth
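
CF-8 and CF-9 lend themselves to a mechanical check after the AI evaluation. The sketch below is illustrative only: the function names are hypothetical, and mapping the "Product/Service" target to the `product_category` type is an assumption.

```python
from collections import Counter

# CF-9 target shares as (low, high). Assumption: "Product/Service" maps to product_category.
TYPE_TARGETS = {
    "product_category": (0.40, 0.50),
    "condition_problem": (0.20, 0.30),
    "feature": (0.10, 0.15),
    "brand": (0.05, 0.10),
    "life_stage": (0.05, 0.10),
}


def select_top_clusters(clusters, max_clusters=50):
    """CF-8: keep at most `max_clusters`, highest viability score first."""
    ranked = sorted(clusters, key=lambda c: c["viability_score"], reverse=True)
    return ranked[:max_clusters]


def distribution_report(clusters):
    """CF-9: report each targeted type's share and whether it sits inside its range."""
    total = len(clusters)
    counts = Counter(c["cluster_type"] for c in clusters)
    report = {}
    for cluster_type, (low, high) in TYPE_TARGETS.items():
        share = counts[cluster_type] / total if total else 0.0
        report[cluster_type] = {"share": round(share, 3), "on_target": low <= share <= high}
    return report
```

Running the report after the cap has been applied matters, because trimming to the top 50 can shift the type mix.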

Keyword Generation AI Function (01C-KG)

KG-1: Loads keyword templates from SectorAttributeTemplate for correct site_type
KG-2: Substitutes attribute values into templates to generate base keywords
KG-3: Generates long-tail variants (best, review, vs, for, how to) for each base keyword
KG-4: Deduplicates keywords across all clusters (no keyword appears twice)
KG-5: Global deduplication identifies multi-cluster keywords and reassigns via conflict resolution
KG-6: Per-cluster keyword count: 10-25 keywords (soft target 15)
KG-7: Total keyword count: 300-500+ for site (configurable per sector)
KG-8: Keywords enriched with search volume, difficulty, intent classification
KG-9: API response includes per-cluster breakdown, deduplication summary, total keyword count
KG-10: Handles missing attribute values gracefully (skips template if required attrs not present)
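
The count limits in KG-6 and KG-7 can be checked with a few lines once keywords have been deduplicated. A minimal sketch, assuming the input maps cluster IDs to already-unique keyword lists; `check_keyword_constraints` is a hypothetical name, and only the lower total bound is enforced because KG-7 reads "300-500+".

```python
def check_keyword_constraints(keywords_per_cluster, per_cluster=(10, 25), min_total=300):
    """Check per-cluster and site-wide keyword counts.

    Assumes the keyword lists are already deduplicated across clusters.
    """
    low, high = per_cluster
    issues = []
    for cluster_key, keywords in keywords_per_cluster.items():
        count = len(keywords)
        if count < low:
            issues.append(f"{cluster_key}: {count} keywords (minimum {low})")
        elif count > high:
            issues.append(f"{cluster_key}: {count} keywords (maximum {high})")
    total = sum(len(keywords) for keywords in keywords_per_cluster.values())
    if total < min_total:
        issues.append(f"total: {total} keywords (minimum {min_total})")
    return {
        "total_unique_keywords": total,
        "within_constraints": not issues,
        "issues": issues,
    }
```

The returned keys mirror the `summary` object of the keyword generation endpoint, with an extra `issues` list that is not part of the documented response.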

Keyword Conflict Resolution (01C-CR)

CR-1: Identifies keywords matching multiple clusters (≥2 matches)
CR-2: Decision Criterion 1: assigns to cluster with most dimensional intersections
CR-3: Decision Criterion 2 (tiebreaker): assigns to more specific cluster
CR-4: Decision Criterion 3 (tiebreaker): assigns by primary user intent match
CR-5: Decision Criterion 4 (last resort): flags for user review with clear reasoning
CR-6: Reassignment logic preserves keyword integrity (no loss, duplication, or orphaning)
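
The four-step cascade in CR-2 through CR-5 can be expressed as successive filters over the candidate clusters. This is a sketch under stated assumptions: the signature of `resolve_keyword_conflict` is hypothetical (the document names the function but not its arguments), each candidate carries `intersection_depth`, `dimensions` (a list of values) and `primary_intent`, and "more specific" is approximated by how many dimension values appear in the keyword text.

```python
def resolve_keyword_conflict(keyword_text, keyword_intent, candidates):
    """Pick one cluster for a keyword that matched several, or flag it for review."""
    if not candidates:
        raise ValueError("at least one candidate cluster is required")
    text = keyword_text.lower()

    def specificity(cluster):
        # Assumption: specificity = number of cluster dimension values found in the keyword.
        return sum(1 for value in cluster["dimensions"] if value.lower() in text)

    steps = [
        ("intersection_depth", lambda c: c["intersection_depth"]),               # CR-2
        ("specificity", specificity),                                            # CR-3
        ("intent_match", lambda c: int(c.get("primary_intent") == keyword_intent)),  # CR-4
    ]
    remaining = list(candidates)
    for reason, score in steps:
        best = max(score(c) for c in remaining)
        remaining = [c for c in remaining if score(c) == best]
        if len(remaining) == 1:
            return {"cluster_key": remaining[0]["cluster_key"], "decided_by": reason}
    # CR-5: still tied after every criterion, so defer to the user.
    return {
        "cluster_key": None,
        "decided_by": "flag_for_review",
        "tied": [c["cluster_key"] for c in remaining],
    }
```

Returning the deciding criterion alongside the cluster gives the "clear reasoning" that CR-5 asks for without a second lookup.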

Blueprint Assembly Service (01C-BA)

BA-1: Creates SAGBlueprint record with status='draft'
BA-2: Creates SAGAttribute records from populated attributes (preserves name, values, is_primary flag)
BA-3: Creates SAGCluster records from cluster formation output (all fields populated)
BA-4: Creates SAGKeyword records from keyword generation output (all fields preserved)
BA-5: Associates keywords to clusters via ManyToMany relations
BA-6: Generates taxonomy_plan with WP categories (primary attributes) and tags (secondary)
BA-7: Generates execution_priority with 3 phases: hubs first, supporting articles, term pages
BA-8: Populates denormalized JSON fields (attributes_json, clusters_json) for fast queries
BA-9: Returns blueprint ID and summary (attribute count, cluster count, keyword count, next steps)
BA-10: Status transitions correctly: draft → ready_for_pipeline (or intermediate statuses as needed)
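
BA-7 describes a three-phase ordering that could be derived directly from the cluster records. The sketch below is only one possible shape: the real structure of `execution_priority` is defined by the blueprint service, so the dict keys, the `content_type` labels and the function name `build_execution_priority` are all assumptions.

```python
def build_execution_priority(clusters, taxonomy_terms):
    """Order content work as: hubs first, then supporting articles, then term pages.

    Within each phase, clusters with higher viability scores come first.
    """
    ranked = sorted(clusters, key=lambda c: c["viability_score"], reverse=True)
    return [
        {
            "phase": 1,
            "content_type": "hub",
            "items": [c["hub_title"] for c in ranked],
        },
        {
            "phase": 2,
            "content_type": "supporting_article",
            "items": [title for c in ranked for title in c["supporting_content_plan"]],
        },
        {
            "phase": 3,
            "content_type": "term_page",
            "items": list(taxonomy_terms),
        },
    ]
```

The length of the returned list corresponds to the `execution_priority_phases` value reported by the assembly endpoint.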

Manual Keyword Supplementation (01C-MKS)

MKS-1: Users can add keywords via: manual entry, CSV import, library selection, API fetch
MKS-2: Manual entry accepts comma-separated keywords, validates against duplicates
MKS-3: CSV import validates file structure (keyword, search_volume optional, difficulty optional)
MKS-4: Library integration allows browsing & selection per site_type
MKS-5: Auto-clustering maps new keywords to clusters via attribute similarity matching
MKS-6: Unmatched keywords flagged for user review: gap analysis, potential new cluster, or outlier
MKS-7: User can assign unmatched keywords to specific cluster or create new cluster
MKS-8: API returns summary: added count, auto-clustered count, flagged count, conflicts resolved
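
The CSV structure check in MKS-3 can be handled with the standard library. A minimal sketch, assuming a header row with a required `keyword` column and optional `search_volume` and `difficulty` columns; `parse_keyword_csv` is a hypothetical name, and lowercasing keywords before duplicate detection is an assumption rather than a stated rule.

```python
import csv
import io


def parse_keyword_csv(text):
    """Parse CSV text into keyword rows, dropping blanks and in-file duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames or "keyword" not in reader.fieldnames:
        raise ValueError("CSV must contain a 'keyword' column")
    rows, seen = [], set()
    for row in reader:
        keyword = (row.get("keyword") or "").strip().lower()
        if not keyword or keyword in seen:
            continue  # skip empty cells and repeated keywords
        seen.add(keyword)
        volume = (row.get("search_volume") or "").strip()
        rows.append({
            "keyword_text": keyword,
            # Non-numeric or missing volumes are stored as None, matching the nullable field.
            "search_volume": int(volume) if volume.isdigit() else None,
            "difficulty": (row.get("difficulty") or "").strip().lower(),
            "source": "csv_import",
        })
    return rows
```

The output keys follow the `SAGKeyword` field names so the rows can be passed to the auto-clustering step without renaming.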

API Endpoints (01C-API)

API-1: POST /api/v1/blueprints/{blueprint_id}/clusters/form/ returns 200 + cluster results
API-2: POST /api/v1/blueprints/{blueprint_id}/keywords/generate/ returns 200 + keyword results
API-3: POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ returns 200 + supplementation summary
API-4: POST /api/v1/blueprints/{blueprint_id}/assemble/ returns 200 + blueprint summary
API-5: GET /api/v1/blueprints/{blueprint_id}/clusters/ supports status, type, min_viability filters
API-6: GET /api/v1/blueprints/{blueprint_id}/keywords/ supports cluster_id, source, intent, min_search_volume filters
API-7: DELETE /api/v1/blueprints/{blueprint_id}/keywords/{keyword_id}/ only works on draft blueprints
API-8: Error handling: 400 (bad input), 403 (unauthorized), 404 (not found), 422 (unprocessable)

Data Integrity (01C-DI)

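DI-1: No keyword appears in multiple clusters within a blueprint. The unique_together = ('cluster', 'keyword_text') constraint on SAGKeyword only blocks duplicates inside a single cluster, so cross-cluster uniqueness must be enforced by the deduplication and conflict-resolution step (or by an additional blueprint-level check)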
DI-2: Deleted clusters cascade-delete associated keywords (no orphaned keywords)
DI-3: Deleted blueprints cascade-delete all attributes, clusters, keywords
DI-4: Blueprint status transitions prevent invalid operations (e.g., can't supplement keywords on published blueprint)
DI-5: Denormalized JSON fields stay in sync with normalized records (updated on every change)
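
Because the `unique_together` on `SAGKeyword` is scoped to `('cluster', 'keyword_text')`, a blueprint-wide check is one way to verify DI-1. The sketch below works on plain dictionaries rather than ORM objects; `find_cross_cluster_duplicates` is a hypothetical helper, and treating keywords as equal after lowercasing and whitespace collapsing is an assumption.

```python
from collections import defaultdict


def normalize_keyword(text):
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(text.lower().split())


def find_cross_cluster_duplicates(keywords_per_cluster):
    """Return {keyword: [cluster keys]} for every keyword owned by more than one cluster."""
    owners = defaultdict(set)
    for cluster_key, keywords in keywords_per_cluster.items():
        for keyword in keywords:
            owners[normalize_keyword(keyword)].add(cluster_key)
    return {kw: sorted(keys) for kw, keys in owners.items() if len(keys) > 1}
```

An empty result means DI-1 holds for the blueprint; any entries would be handed to conflict resolution before assembly.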

Performance (01C-PERF)

PERF-1: Cluster formation completes in <5 seconds for 100+ intersection combinations
PERF-2: Keyword generation completes in <10 seconds for 50 clusters
PERF-3: Blueprint assembly completes in <3 seconds (DB writes + JSON generation)
PERF-4: GET endpoints with filters return results in <2 seconds
PERF-5: CSV import (1000 keywords) completes in <15 seconds

6. Claude Code Instructions

6.1 Generating Cluster Formation Logic

Prompt Template for Claude:

Generate the cluster formation algorithm for an AI-powered content planning system.

Input:
- populated_attributes: List of attributes with values from user setup wizard
  Example: [
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Birds"]},
    {"name": "Health Condition", "values": ["Allergies", "Arthritis", "Obesity"]}
  ]
- sector_context: Information about the sector (e.g., "pet health e-commerce")

Task:
1. Generate all meaningful 2-value intersections (e.g., Pet Type × Health Condition; combine values from different attributes, not two values of the same attribute)
2. For each intersection, use Claude's reasoning to evaluate:
   - Is this a real topical ecosystem? (do the dimensions naturally fit together?)
   - Would users search for this? (assess search demand)
   - Can we build 1 hub + 3-8 supporting articles?
   - Is it differentiated from other clusters?
3. Classify valid clusters by type: product_category, condition_problem, feature, brand, informational, comparison, life_stage
4. Generate a compelling hub title and 5-8 supporting content titles
5. Assign a viability score (0-1) based on coherence, search demand, content potential

Output:
- clusters: Array of cluster objects with all fields from the spec
- summary: Total clusters, type distribution, viability analysis

Constraints:
- Max 50 clusters per sector
- Minimum 3 dimensional intersections for strong clusters
- Quality over quantity: prefer 5 strong clusters over 15 weak ones

6.2 Generating Keyword Generation Logic

Prompt Template for Claude:

Generate keywords for content clusters using templates and AI-driven expansion.

Input:
- clusters: Array of clusters from cluster formation (with dimensions and hub title)
- keyword_templates: Pre-configured templates for site_type
  Example: [
    "best {health_condition} for {pet_type}",
    "{pet_type} {health_condition} treatment",
    "affordable {health_condition} relief for {pet_type}"
  ]
- sector_context: Site type (ecommerce, blog, saas, etc.)

Task:
1. Load keyword templates filtered by sector site_type
2. For each cluster:
   - Extract dimension values
   - Substitute values into matching templates
   - Generate long-tail variants: best, review, vs, for, how to
   - Enrich with search volume, difficulty, intent (informational, transactional, etc.)
3. Deduplicate globally across all clusters
4. Identify multi-cluster keywords and resolve conflicts via:
   - Highest dimensional intersection count
   - Most specific cluster (tiebreaker)
   - Primary user intent match (tiebreaker)
5. Validate constraints: 10-25 per cluster, 300-500 total

Output:
- keywords_per_cluster: Keywords organized by cluster ID
- deduplication: Count of duplicates removed, conflicts flagged
- summary: Total unique keywords, per-cluster average, search volume total

Constraints:
- Do NOT generate more than 25 keywords per cluster
- Do NOT allow duplicates
- Prioritize high search volume keywords
- Ensure diversity: mix of base keywords and long-tail variants

6.3 Integrating with Setup Wizard (01D)

Implementation Notes:

After user completes attribute population in wizard:
- Call POST /api/v1/blueprints/{blueprint_id}/clusters/form/
- Display clusters to user (preview mode)
- Allow user to: review, edit (rename hub titles, remove clusters), or confirm
After user confirms clusters:
- Call POST /api/v1/blueprints/{blueprint_id}/keywords/generate/
- Display keywords grouped by cluster (preview mode)
- Allow user to: supplement keywords, remove outliers, or confirm
Before finalizing blueprint:
- Optionally allow manual keyword supplementation (CSV, library, manual entry)
- Call POST /api/v1/blueprints/{blueprint_id}/keywords/supplement/ for each source
- Resolve conflicts (auto or manual)
- Call POST /api/v1/blueprints/{blueprint_id}/assemble/ to finalize
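
The call sequence above can be sketched as a small orchestration function. It is illustrative only: `run_wizard_flow` is a hypothetical name, `post` stands in for whatever HTTP client the wizard uses, the two confirm callbacks represent the user review steps, and the assumption that each cluster in the formation response carries a `cluster_key` is not confirmed by this document. The optional supplementation step is omitted.

```python
def run_wizard_flow(post, blueprint_id, populated_attributes, confirm_clusters, confirm_keywords):
    """Drive clusters -> keywords -> assembly, stopping if the user rejects a preview.

    post(path, body) must return the decoded JSON response as a dict.
    """
    base = f"/api/v1/blueprints/{blueprint_id}"

    clusters = post(f"{base}/clusters/form/", {
        "populated_attributes": populated_attributes,
        "max_clusters": 50,
    })
    if not confirm_clusters(clusters):
        return None  # user wants to edit attributes or clusters first

    keywords = post(f"{base}/keywords/generate/", {
        # Assumption: formation results expose cluster_key for each cluster.
        "use_cluster_ids": [c["cluster_key"] for c in clusters["clusters"]],
        "target_keywords_per_cluster": 15,
        "include_long_tail_variants": True,
    })
    if not confirm_keywords(keywords):
        return None  # user wants to supplement or prune keywords first

    return post(f"{base}/assemble/", {
        "finalize_keyword_review": True,
        "set_status": "ready_for_pipeline",
    })
```

Passing the HTTP function in as an argument keeps the flow testable with a stub and independent of any particular client library.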

6.4 Testing with Sample Data

Test Case 1: Pet Health E-commerce Site

populated_attributes = [
    {"name": "Pet Type", "values": ["Dogs", "Cats"]},
    {"name": "Health Condition", "values": ["Arthritis", "Allergies", "Obesity"]},
    {"name": "Target Audience", "values": ["Pet Owners", "Veterinarians"]}
]

sector_context = {
    "sector_id": "pet_health",
    "site_type": "ecommerce",
    "sector_name": "Pet Health Products"
}

# Expected clusters:
# 1. Dog Arthritis Relief (product_category)
# 2. Cat Allergies Nutrition (product_category)
# 3. Senior Dog Joint Support (life_stage)
# ... etc.

Test Case 2: Local Service (Veterinary Clinic)

populated_attributes = [
    {"name": "Service Type", "values": ["Surgery", "Preventive Care", "Emergency"]},
    {"name": "Pet Type", "values": ["Dogs", "Cats", "Exotic"]},
    {"name": "Location", "values": ["Downtown", "Suburbs"]}
]

sector_context = {
    "sector_id": "vet_clinic",
    "site_type": "local_service",
    "sector_name": "Veterinary Clinic"
}

# Expected clusters:
# 1. Emergency Dog Surgery Downtown (local_service + product_category)
# 2. Preventive Cat Care Suburbs (informational + local_service)
# ... etc.

7. Cross-Document References

Upstream Dependencies

01A (SAG Master Data Models): Provides SAGBlueprint, SAGAttribute, SAGCluster base models
01B (Sector Attribute Templates): Provides attribute framework, keyword templates, site_type configurations

Downstream Consumers

01D (Setup Wizard): Triggers cluster formation & keyword generation after attribute population
01E (Blueprint-aware Pipeline): Uses clusters, keywords, taxonomy_plan, execution_priority for content generation
01F (Existing Site Analysis): May feed competitor/existing keywords into supplementation process
01G (Health Monitoring): Tracks cluster completeness, keyword coverage, content generation progress against blueprint

8. Appendix: Algorithm Complexity & Performance Estimates

Cluster Formation Complexity

Input: N attributes with M average values each
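Intersections Generated: O(N² × M²) for 2-value, O(N³ × M³) for 3-value (the number of attributes N matters as well as M)
AI Evaluations: one call per intersection, so O(N² × M²) to O(N³ × M³) function calls (largest cost)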
Time Estimate: ~1-2 seconds per 100 intersections (depending on Claude API latency)
Bottleneck: Claude API response time for viability evaluation
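
To make these estimates concrete, the number of candidate intersections can be counted before any AI call is made. The sketch assumes an intersection takes one value from each of `depth` distinct attributes; under that assumption N attributes with M values each give C(N, depth) × M^depth candidates. `count_intersections` is a hypothetical helper.

```python
from itertools import combinations
from math import prod


def count_intersections(value_counts, depth):
    """Count combinations that pick one value from each of `depth` distinct attributes.

    value_counts holds the number of values per attribute, e.g. [2, 3, 2].
    """
    return sum(prod(group) for group in combinations(value_counts, depth))
```

For the pet health test case in Section 6.4 (attributes with 2, 3 and 2 values) this gives 16 two-value and 12 three-value candidates, so 28 evaluations in total; four attributes with five values each already produce 150 and 500.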

Keyword Generation Complexity

Input: C clusters, T keyword templates per cluster
Base Keywords: O(C × T) (template substitution)
Long-tail Variants: O(C × T × V) where V ≈ 7 (base + 6 variants)
Deduplication: O(K log K) where K = total keywords (sort-based)
Time Estimate: ~3-5 seconds for 300+ keywords

Blueprint Assembly Complexity

DB Writes: O(A + C + K) where A=attributes, C=clusters, K=keywords
JSON Generation: O(A + C + K) for denormalization
Time Estimate: <1 second for typical blueprints (< 10 MB JSON)

Document Complete
Status: Ready for Development
Next Step: Implement Phase 1 (AI Functions) per Section 4
