igny8/docs/automation/auto-cluster-validation-fix-plan.md
IGNY8 VPS (Salman) 1fc7d3717d docs
2025-12-04 13:38:54 +00:00


Auto-Cluster Validation Fix Plan

Date: December 4, 2025
Status: Design Phase
Priority: MEDIUM


🎯 OBJECTIVE

Add validation that prevents auto-cluster from running with fewer than 5 keywords, and ensure that both manual auto-cluster and the automation pipeline use the same shared validation logic so their behavior stays consistent.


🔍 CURRENT STATE ANALYSIS

Current Behavior

Auto-Cluster Function:

  • Located in: backend/igny8_core/ai/functions/auto_cluster.py
  • No minimum keyword validation
  • Accepts any number of keywords (even 1)
  • May produce poor-quality clusters from insufficient data

Automation Pipeline:

  • Located in: backend/igny8_core/business/automation/services/automation_service.py
  • Uses auto-cluster in Stage 1
  • No pre-check for minimum keywords
  • May waste credits on insufficient data

Problems

  1. No Minimum Check: Auto-cluster runs with 1-4 keywords
  2. Poor Results: AI cannot create meaningful clusters with < 5 keywords
  3. Wasted Credits: Charges credits for insufficient analysis
  4. Inconsistent Validation: No shared validation between manual and automation
  5. User Confusion: Error occurs during processing, not at selection

PROPOSED SOLUTION

Validation Strategy

Single Source of Truth:

  • Create one validation function
  • Use it in both auto-cluster function AND automation pipeline
  • Consistent error messages
  • No code duplication

Error Behavior:

  • Manual Auto-Cluster: Return error before API call
  • Automation Pipeline: Skip Stage 1 with warning in logs
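
The split between one shared check and two caller-specific reactions can be sketched in plain Python. This is a minimal illustration, not the project's code: `check_minimum`, `manual_entry_point`, and `automation_entry_point` are hypothetical names, and the keyword count is passed in directly instead of being read from the database.

```python
from typing import Dict, List

MIN_REQUIRED = 5  # assumed threshold, matching the plan


def check_minimum(count: int, min_required: int = MIN_REQUIRED) -> Dict:
    """Shared check: the only place that knows the threshold and the message."""
    if count < min_required:
        return {
            'valid': False,
            'error': (
                f'Insufficient keywords for clustering. Need at least '
                f'{min_required} keywords, but only {count} available.'
            ),
            'count': count,
            'required': min_required,
        }
    return {'valid': True, 'count': count, 'required': min_required}


def manual_entry_point(count: int) -> Dict:
    """Manual path: return an error to the caller before any AI call."""
    result = check_minimum(count)
    if not result['valid']:
        return {'status': 400, 'error': result['error']}
    return {'status': 200, 'message': f"Auto-cluster started with {result['count']} keywords"}


def automation_entry_point(count: int, log: List[str]) -> Dict:
    """Automation path: skip the stage and record a warning instead of failing."""
    result = check_minimum(count)
    if not result['valid']:
        log.append(f"Stage 1 skipped: {result['error']}")
        return {'skipped': True, 'skip_reason': result['error'], 'credits_used': 0}
    return {'skipped': False}
```

Both callers receive the same error text from the shared check and differ only in how they react to it.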

📋 IMPLEMENTATION PLAN

Step 1: Create Shared Validation Module

New File: backend/igny8_core/ai/validators/cluster_validators.py

"""
Cluster-specific validators
Shared between auto-cluster function and automation pipeline
"""
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def validate_minimum_keywords(
    keyword_ids: List[int],
    account=None,
    min_required: int = 5
) -> Dict:
    """
    Validate that sufficient keywords are available for clustering
    
    Args:
        keyword_ids: List of keyword IDs to cluster
        account: Account object for filtering
        min_required: Minimum number of keywords required (default: 5)
    
    Returns:
        Dict with 'valid' (bool), 'count' (int) and 'required' (int),
        plus 'error' (str) when validation fails
    """
    from igny8_core.modules.planner.models import Keywords
    
    # Build queryset
    queryset = Keywords.objects.filter(id__in=keyword_ids, status='new')
    
    if account:
        queryset = queryset.filter(account=account)
    
    # Count available keywords
    count = queryset.count()
    
    # Validate minimum
    if count < min_required:
        return {
            'valid': False,
            'error': f'Insufficient keywords for clustering. Need at least {min_required} keywords, but only {count} available.',
            'count': count,
            'required': min_required
        }
    
    return {
        'valid': True,
        'count': count,
        'required': min_required
    }


def validate_keyword_selection(
    selected_ids: List[int],
    available_count: int,
    min_required: int = 5
) -> Dict:
    """
    Validate keyword selection (for frontend validation)
    
    Args:
        selected_ids: List of selected keyword IDs
        available_count: Total count of available keywords
        min_required: Minimum required
    
    Returns:
        Dict with validation result
    """
    selected_count = len(selected_ids)
    
    # Check if any keywords selected
    if selected_count == 0:
        return {
            'valid': False,
            'error': 'No keywords selected',
            'type': 'NO_SELECTION'
        }
    
    # Check if enough selected
    if selected_count < min_required:
        return {
            'valid': False,
            'error': f'Please select at least {min_required} keywords. Currently selected: {selected_count}',
            'type': 'INSUFFICIENT_SELECTION',
            'selected': selected_count,
            'required': min_required
        }
    
    # Check if enough available (even if not all selected)
    if available_count < min_required:
        return {
            'valid': False,
            'error': f'Not enough keywords available. Need at least {min_required} keywords, but only {available_count} exist.',
            'type': 'INSUFFICIENT_AVAILABLE',
            'available': available_count,
            'required': min_required
        }
    
    return {
        'valid': True,
        'selected': selected_count,
        'available': available_count,
        'required': min_required
    }
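
Both validators return plain dicts whose keys vary by outcome. If a typed shape is wanted, one option is a `TypedDict` with `valid` as the only required key. The sketch below is an assumption layered on top of the plan: `ValidationResult` and `is_failure` are hypothetical names that do not appear in the codebase.

```python
from typing import TypedDict


class _RequiredKeys(TypedDict):
    # Present on every result, valid or not
    valid: bool


class ValidationResult(_RequiredKeys, total=False):
    # Optional keys; which ones appear depends on the validator and outcome
    error: str
    type: str
    count: int
    required: int
    selected: int
    available: int


def is_failure(result: ValidationResult) -> bool:
    """True when the validator rejected the input."""
    return not result['valid']


# Example shaped like the NO_SELECTION branch of validate_keyword_selection()
no_selection: ValidationResult = {
    'valid': False,
    'error': 'No keywords selected',
    'type': 'NO_SELECTION',
}
```

At runtime a `TypedDict` is an ordinary dict, so existing callers that index `result['valid']` would keep working unchanged.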

Step 2: Update Auto-Cluster Function

File: backend/igny8_core/ai/functions/auto_cluster.py

Add import:

from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords

Update validate() method:

def validate(self, payload: dict, account=None) -> Dict:
    """Validate keyword IDs and minimum count"""
    result = super().validate(payload, account)
    if not result['valid']:
        return result
    
    keyword_ids = payload.get('keyword_ids', [])
    
    if not keyword_ids:
        return {'valid': False, 'error': 'No keyword IDs provided'}
    
    # NEW: Validate minimum keywords using shared validator
    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5  # Configurable constant
    )
    
    if not min_validation['valid']:
        # Log the validation failure
        logger.warning(
            f"[AutoCluster] Validation failed: {min_validation['error']}"
        )
        return min_validation
    
    # Log successful validation
    logger.info(
        f"[AutoCluster] Validation passed: {min_validation['count']} keywords available (min: {min_validation['required']})"
    )
    
    return {'valid': True}

Step 3: Update Automation Pipeline

File: backend/igny8_core/business/automation/services/automation_service.py

Add import:

from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords

Update run_stage_1() method:

def run_stage_1(self):
    """Stage 1: Keywords → Clusters (AI)"""
    stage_number = 1
    stage_name = "Keywords → Clusters (AI)"
    start_time = time.time()
    
    # Query pending keywords
    pending_keywords = Keywords.objects.filter(
        site=self.site,
        status='new'
    )
    
    total_count = pending_keywords.count()
    
    # NEW: Pre-stage validation for minimum keywords
    keyword_ids = list(pending_keywords.values_list('id', flat=True))
    
    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=self.account,
        min_required=5
    )
    
    if not min_validation['valid']:
        # Log validation failure
        self.logger.log_stage_start(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, stage_name, total_count
        )
        
        error_msg = min_validation['error']
        self.logger.log_stage_error(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, error_msg
        )
        
        # Skip stage with proper result
        self.run.stage_1_result = {
            'keywords_processed': 0,
            'clusters_created': 0,
            'skipped': True,
            'skip_reason': error_msg,
            'credits_used': 0
        }
        self.run.current_stage = 2
        self.run.save()
        
        logger.warning(f"[AutomationService] Stage 1 skipped: {error_msg}")
        return
    
    # Log stage start
    self.logger.log_stage_start(
        self.run.run_id, self.account.id, self.site.id,
        stage_number, stage_name, total_count
    )
    
    # ... rest of existing stage logic ...

Step 4: Update API Endpoint

File: backend/igny8_core/modules/planner/views.py (KeywordsViewSet)

Update auto_cluster action:

@action(detail=False, methods=['post'], url_path='auto_cluster', url_name='auto_cluster')
def auto_cluster(self, request):
    """Auto-cluster keywords using AI"""
    from igny8_core.ai.tasks import run_ai_task
    from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords
    
    account = getattr(request, 'account', None)
    keyword_ids = request.data.get('ids', [])
    
    if not keyword_ids:
        return error_response(
            error='No keyword IDs provided',
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request
        )
    
    # NEW: Validate minimum keywords BEFORE queuing task
    validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5
    )
    
    if not validation['valid']:
        return error_response(
            error=validation['error'],
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request,
            extra_data={
                'count': validation.get('count'),
                'required': validation.get('required')
            }
        )
    
    # Validation passed - proceed with clustering
    account_id = account.id if account else None
    
    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(
                data={'task_id': str(task.id)},
                message=f'Auto-cluster started with {validation["count"]} keywords',
                request=request
            )
        else:
            # Synchronous fallback
            result = run_ai_task(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(data=result, request=request)
            
    except Exception as e:
        logger.error(f"Failed to start auto-cluster: {e}", exc_info=True)
        return error_response(
            error=f'Failed to start clustering: {str(e)}',
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            request=request
        )

Step 5: Add Frontend Validation

File: frontend/src/pages/Planner/Keywords.tsx

Update handleAutoCluster function:

const handleAutoCluster = async () => {
  try {
    const selectedIds = selectedKeywords.map(k => k.id);
    
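    // Frontend validation (pre-check before API call)
    if (selectedIds.length === 0) {
      toast.error('No keywords selected. Please select keywords to cluster.', {
        duration: 5000
      });
      return;
    }

    if (selectedIds.length < 5) {
      toast.error(
        `Please select at least 5 keywords for auto-clustering. Currently selected: ${selectedIds.length}`,
        { duration: 5000 }
      );
      return;
    }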
    
    // Check total available
    const availableCount = keywords.filter(k => k.status === 'new').length;
    if (availableCount < 5) {
      toast.error(
        `Not enough keywords available. Need at least 5 keywords, but only ${availableCount} exist.`,
        { duration: 5000 }
      );
      return;
    }
    
    // Proceed with API call
    const result = await autoClusterKeywords(selectedIds);
    
    if (result.task_id) {
      toast.success(`Auto-cluster started with ${selectedIds.length} keywords`);
      setTaskId(result.task_id);
    } else {
      toast.error('Failed to start auto-cluster');
    }
    
  } catch (error: any) {
    // Backend validation error (in case frontend check was bypassed)
    const errorMsg = error.response?.data?.error || error.message;
    toast.error(errorMsg);
  }
};

🗂️ FILE STRUCTURE

New Files

backend/igny8_core/ai/validators/
├── __init__.py
└── cluster_validators.py  (NEW)

Modified Files

backend/igny8_core/ai/functions/auto_cluster.py
backend/igny8_core/business/automation/services/automation_service.py
backend/igny8_core/modules/planner/views.py
frontend/src/pages/Planner/Keywords.tsx

🧪 TESTING PLAN

Unit Tests

File: backend/igny8_core/ai/validators/tests/test_cluster_validators.py

from django.test import TestCase
from igny8_core.ai.validators.cluster_validators import (
    validate_minimum_keywords,
    validate_keyword_selection
)
from igny8_core.modules.planner.models import Keywords
from igny8_core.auth.models import Account, Site


class ClusterValidatorsTestCase(TestCase):
    def setUp(self):
        self.account = Account.objects.create(name='Test Account')
        self.site = Site.objects.create(name='Test Site', account=self.account)
    
    def test_validate_minimum_keywords_success(self):
        """Test with sufficient keywords (>= 5)"""
        # Create 10 keywords
        keyword_ids = []
        for i in range(10):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)
        
        result = validate_minimum_keywords(keyword_ids, self.account)
        
        assert result['valid'] is True
        assert result['count'] == 10
        assert result['required'] == 5
    
    def test_validate_minimum_keywords_failure(self):
        """Test with insufficient keywords (< 5)"""
        # Create only 3 keywords
        keyword_ids = []
        for i in range(3):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)
        
        result = validate_minimum_keywords(keyword_ids, self.account)
        
        assert result['valid'] is False
        assert 'Insufficient keywords' in result['error']
        assert result['count'] == 3
        assert result['required'] == 5
    
    def test_validate_minimum_keywords_edge_case_exactly_5(self):
        """Test with exactly 5 keywords (boundary)"""
        keyword_ids = []
        for i in range(5):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)
        
        result = validate_minimum_keywords(keyword_ids, self.account)
        
        assert result['valid'] is True
        assert result['count'] == 5
    
    def test_validate_keyword_selection_insufficient(self):
        """Test frontend selection validation"""
        result = validate_keyword_selection(
            selected_ids=[1, 2, 3],  # Only 3
            available_count=10,
            min_required=5
        )
        
        assert result['valid'] is False
        assert result['type'] == 'INSUFFICIENT_SELECTION'
        assert result['selected'] == 3
        assert result['required'] == 5

Integration Tests

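class AutoClusterIntegrationTestCase(TestCase):
    # NOTE: self.token, self.automation_service and self._create_keywords(n)
    # are assumed to come from setUp()/helper code that is not shown here.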
    def test_auto_cluster_with_insufficient_keywords(self):
        """Test auto-cluster endpoint rejects < 5 keywords"""
        # Create only 3 keywords
        keyword_ids = self._create_keywords(3)
        
        response = self.client.post(
            '/api/planner/keywords/auto_cluster/',
            data={'ids': keyword_ids},
            content_type='application/json',  # send the ID list as JSON, not form data
            HTTP_AUTHORIZATION=f'Bearer {self.token}'
        )
        
        assert response.status_code == 400
        assert 'Insufficient keywords' in response.json()['error']
    
    def test_automation_skips_stage_1_with_insufficient_keywords(self):
        """Test automation skips Stage 1 if < 5 keywords"""
        # Create only 2 keywords
        self._create_keywords(2)
        
        # Start automation
        run_id = self.automation_service.start_automation('manual')
        
        # Verify Stage 1 was skipped
        run = AutomationRun.objects.get(run_id=run_id)
        assert run.stage_1_result['skipped'] is True
        assert 'Insufficient keywords' in run.stage_1_result['skip_reason']
        assert run.current_stage == 2  # Moved to next stage

Manual Test Cases

  • Test 1: Try auto-cluster with 0 keywords selected

    • Expected: Error message "No keywords selected"
  • Test 2: Try auto-cluster with 3 keywords selected

    • Expected: Error message "Please select at least 5 keywords for auto-clustering. Currently selected: 3"
  • Test 3: Try auto-cluster with exactly 5 keywords

    • Expected: Success, clustering starts
  • Test 4: Run automation with 2 keywords in site

    • Expected: Stage 1 skipped with warning in logs
  • Test 5: Run automation with 10 keywords in site

    • Expected: Stage 1 runs normally

📊 ERROR MESSAGES

Frontend (User-Facing)

No Selection:

❌ No keywords selected
Please select keywords to cluster.

Insufficient Selection:

❌ Please select at least 5 keywords for auto-clustering
Currently selected: 3 keywords
You need at least 5 keywords to create meaningful clusters.

Insufficient Available:

❌ Not enough keywords available
Need at least 5 keywords, but only 2 exist.
Add more keywords before running auto-cluster.

Backend (Logs)

Validation Failed:

[AutoCluster] Validation failed: Insufficient keywords for clustering. Need at least 5 keywords, but only 3 available.

Validation Passed:

[AutoCluster] Validation passed: 15 keywords available (min: 5)

Automation Stage Skipped:

[AutomationService] Stage 1 skipped: Insufficient keywords for clustering. Need at least 5 keywords, but only 2 available.

🎯 CONFIGURATION

Constants File

File: backend/igny8_core/ai/constants.py (create it if it doesn't exist)

"""
AI Function Configuration Constants
"""

# Cluster Configuration
MIN_KEYWORDS_FOR_CLUSTERING = 5  # Minimum keywords needed for meaningful clusters
OPTIMAL_KEYWORDS_FOR_CLUSTERING = 20  # Recommended for best results

# Other AI limits...

Usage in validators:

from igny8_core.ai.constants import MIN_KEYWORDS_FOR_CLUSTERING

def validate_minimum_keywords(
    keyword_ids,
    account=None,
    min_required=MIN_KEYWORDS_FOR_CLUSTERING
):
    # ... validation logic (callers can then drop their hard-coded min_required=5)
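
One detail worth weighing when wiring the constant in: Python evaluates default argument values once, when the function is defined. If callers should still be able to override the threshold per call while everyone else picks up the constant, a `None` sentinel resolved at call time is one option. The sketch below assumes this design; `resolve_min_required` is a hypothetical helper, and the constant is redefined locally only to keep the example self-contained.

```python
from typing import Optional

# Stand-in for igny8_core.ai.constants.MIN_KEYWORDS_FOR_CLUSTERING
MIN_KEYWORDS_FOR_CLUSTERING = 5


def resolve_min_required(min_required: Optional[int] = None) -> int:
    """Return the explicit override if given, else the shared constant."""
    if min_required is None:
        return MIN_KEYWORDS_FOR_CLUSTERING
    return min_required
```

With this shape, the auto-cluster function, the automation stage, and the API view could all call the validator without passing a threshold, and tests could still pass an explicit value.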

🔄 SHARED VALIDATION PATTERN

Why This Approach Works

Single Source of Truth:

  • One function: validate_minimum_keywords()
  • Used by both auto-cluster function and automation
  • Update in one place applies everywhere

Consistent Behavior:

  • Same error messages
  • Same validation logic
  • Same minimum requirements

Easy to Maintain:

  • Want to change minimum from 5 to 10? Change one constant
  • Want to add new validation? Add to one function
  • Want to test? Test one module

No Code Duplication:

  • DRY principle followed
  • Reduces bugs from inconsistency
  • Easier code review

Pattern for Future Validators

# backend/igny8_core/ai/validators/content_validators.py

def validate_minimum_content_length(content_text: str, min_words: int = 100):
    """
    Shared validator for content minimum length
    Used by: GenerateContentFunction, Automation Stage 4, Content creation
    """
    word_count = len(content_text.split())
    
    if word_count < min_words:
        return {
            'valid': False,
            'error': f'Content too short. Minimum {min_words} words required, got {word_count}.'
        }
    
    return {'valid': True, 'word_count': word_count}
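
Because every validator in this pattern returns a dict with a `valid` key, several checks can be chained and stopped at the first failure. The following is a rough sketch of that idea under stated assumptions: `first_failure` is a hypothetical helper that is not part of the plan, and each validator is wrapped as a zero-argument callable.

```python
from typing import Callable, Dict, Iterable

Validator = Callable[[], Dict]


def first_failure(validators: Iterable[Validator]) -> Dict:
    """Run validators in order and return the first failing result.

    Returns {'valid': True} when every validator passes (or none are given).
    """
    for validate in validators:
        result = validate()
        if not result.get('valid', False):
            return result
    return {'valid': True}


# Usage with two stand-in checks
outcome = first_failure([
    lambda: {'valid': True, 'count': 8},
    lambda: {'valid': False, 'error': 'Content too short.'},
])
```

Later validators are not executed once one fails, so cheap checks can be listed first.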

🚀 IMPLEMENTATION STEPS

Phase 1: Create Validator (Day 1)

  • Create cluster_validators.py
  • Implement validate_minimum_keywords()
  • Implement validate_keyword_selection()
  • Write unit tests

Phase 2: Integrate Backend (Day 1)

  • Update AutoClusterFunction.validate()
  • Update AutomationService.run_stage_1()
  • Update KeywordsViewSet.auto_cluster()
  • Write integration tests

Phase 3: Frontend (Day 2)

  • Add frontend validation in Keywords page
  • Add user-friendly error messages
  • Test error scenarios

Phase 4: Testing & Deployment (Day 2)

  • Run all tests
  • Manual QA testing
  • Deploy to production
  • Monitor first few auto-cluster runs

🎯 SUCCESS CRITERIA

  • Auto-cluster returns error if < 5 keywords selected
  • Automation skips Stage 1 if < 5 keywords available
  • Both use same validation function (no duplication)
  • Clear error messages guide users
  • Frontend validation provides instant feedback
  • Backend validation catches edge cases
  • All tests pass
  • No regression in existing functionality


📈 FUTURE ENHANCEMENTS

V2 Features

  1. Configurable Minimum:

    • Allow admin to set minimum via settings
    • Default: 5, Range: 3-20
  2. Quality Scoring:

    • Show quality indicator based on keyword count
    • 5-10: "Fair", 11-20: "Good", 21+: "Excellent"
  3. Smart Recommendations:

    • "You have 4 keywords. Add 1 more to reach the minimum of 5"
    • "15 keywords selected. Good for clustering!"
  4. Batch Size Validation:

    • Warn if too many keywords selected (> 100)
    • Suggest splitting into multiple runs
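
The V2 ideas above reduce to a few small pure functions. The sketch below is illustrative only: the function names are hypothetical, the ranges and bands are taken from the list above, and the "Insufficient" label for counts below the minimum is an assumption that the list does not specify.

```python
DEFAULT_MIN = 5
MIN_ALLOWED, MAX_ALLOWED = 3, 20   # admin-configurable range from item 1
BATCH_WARN_THRESHOLD = 100         # warn above this many keywords (item 4)


def clamp_minimum(configured: int) -> int:
    """Keep an admin-configured minimum inside the allowed 3-20 range."""
    return max(MIN_ALLOWED, min(MAX_ALLOWED, configured))


def quality_label(count: int, min_required: int = DEFAULT_MIN) -> str:
    """Map a keyword count to the quality bands from item 2."""
    if count < min_required:
        return 'Insufficient'  # assumed label; not defined in the plan
    if count <= 10:
        return 'Fair'
    if count <= 20:
        return 'Good'
    return 'Excellent'


def should_warn_batch(count: int) -> bool:
    """True when the selection is large enough to suggest splitting."""
    return count > BATCH_WARN_THRESHOLD
```

Keeping these as pure functions would let them be unit-tested without the database, in the same way as `validate_keyword_selection()`.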

END OF PLAN

This plan ensures robust, consistent validation for auto-cluster across all entry points (manual and automation) using shared, well-tested validation logic.