# Auto-Cluster Validation Fix Plan

**Date:** December 4, 2025
**Status:** Design Phase
**Priority:** MEDIUM

---

## 🎯 OBJECTIVE

Add validation that prevents auto-cluster from running with fewer than 5 keywords, and ensure that manual auto-cluster and the automation pipeline share the same validation logic so their behavior stays consistent.

---

## 🔍 CURRENT STATE ANALYSIS

### Current Behavior

**Auto-Cluster Function:**
- Located in: `backend/igny8_core/ai/functions/auto_cluster.py`
- No minimum keyword validation
- Accepts any number of keywords (even 1)
- May produce poor-quality clusters with insufficient data

**Automation Pipeline:**
- Located in: `backend/igny8_core/business/automation/services/automation_service.py`
- Uses auto-cluster in Stage 1
- No pre-check for minimum keywords
- May waste credits on insufficient data

### Problems

1. ❌ **No Minimum Check:** Auto-cluster runs with 1-4 keywords
2. ❌ **Poor Results:** AI cannot create meaningful clusters with < 5 keywords
3. ❌ **Wasted Credits:** Charges credits for insufficient analysis
4. ❌ **Inconsistent Validation:** No shared validation between manual and automation
5. ❌ **User Confusion:** Error occurs during processing, not at selection

---

## ✅ PROPOSED SOLUTION

### Validation Strategy

**Single Source of Truth:**
- Create one validation function
- Use it in both the auto-cluster function AND the automation pipeline
- Consistent error messages
- No code duplication

**Error Behavior:**
- **Manual Auto-Cluster:** Return error before API call
- **Automation Pipeline:** Skip Stage 1 with warning in logs

---

## 📋 IMPLEMENTATION PLAN

### Step 1: Create Shared Validation Module

**New File:** `backend/igny8_core/ai/validators/cluster_validators.py`

```python
"""
Cluster-specific validators
Shared between auto-cluster function and automation pipeline
"""
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def validate_minimum_keywords(
    keyword_ids: List[int],
    account=None,
    min_required: int = 5
) -> Dict:
    """
    Validate that sufficient keywords are available for clustering

    Args:
        keyword_ids: List of keyword IDs to cluster
        account: Account object for filtering
        min_required: Minimum number of keywords required (default: 5)

    Returns:
        Dict with 'valid' (bool), 'count' (int) and 'required' (int),
        plus 'error' (str) when validation fails
    """
    from igny8_core.modules.planner.models import Keywords

    # Build queryset: only keywords that are still unclustered ('new')
    queryset = Keywords.objects.filter(id__in=keyword_ids, status='new')

    if account:
        queryset = queryset.filter(account=account)

    # Count available keywords
    count = queryset.count()

    # Validate minimum
    if count < min_required:
        return {
            'valid': False,
            'error': f'Insufficient keywords for clustering. Need at least {min_required} keywords, but only {count} available.',
            'count': count,
            'required': min_required
        }

    return {
        'valid': True,
        'count': count,
        'required': min_required
    }


def validate_keyword_selection(
    selected_ids: List[int],
    available_count: int,
    min_required: int = 5
) -> Dict:
    """
    Validate keyword selection (mirrors the frontend pre-check)

    Args:
        selected_ids: List of selected keyword IDs
        available_count: Total count of available keywords
        min_required: Minimum required

    Returns:
        Dict with validation result
    """
    selected_count = len(selected_ids)

    # Check if any keywords selected
    if selected_count == 0:
        return {
            'valid': False,
            'error': 'No keywords selected',
            'type': 'NO_SELECTION'
        }

    # Check if enough selected
    if selected_count < min_required:
        return {
            'valid': False,
            'error': f'Please select at least {min_required} keywords. Currently selected: {selected_count}',
            'type': 'INSUFFICIENT_SELECTION',
            'selected': selected_count,
            'required': min_required
        }

    # Check if enough available (even if not all selected)
    if available_count < min_required:
        return {
            'valid': False,
            'error': f'Not enough keywords available. Need at least {min_required} keywords, but only {available_count} exist.',
            'type': 'INSUFFICIENT_AVAILABLE',
            'available': available_count,
            'required': min_required
        }

    return {
        'valid': True,
        'selected': selected_count,
        'available': available_count,
        'required': min_required
    }
```

### Step 2: Update Auto-Cluster Function

**File:** `backend/igny8_core/ai/functions/auto_cluster.py`

**Add import:**

```python
from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords
```

**Update validate() method:**

```python
def validate(self, payload: dict, account=None) -> Dict:
    """Validate keyword IDs and minimum count"""
    result = super().validate(payload, account)
    if not result['valid']:
        return result

    keyword_ids = payload.get('keyword_ids', [])

    if not keyword_ids:
        return {'valid': False, 'error': 'No keyword IDs provided'}

    # NEW: Validate minimum keywords using shared validator
    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5  # Configurable constant
    )

    if not min_validation['valid']:
        # Log the validation failure
        logger.warning(
            f"[AutoCluster] Validation failed: {min_validation['error']}"
        )
        return min_validation

    # Log successful validation
    logger.info(
        f"[AutoCluster] Validation passed: {min_validation['count']} keywords available (min: {min_validation['required']})"
    )

    return {'valid': True}
```

### Step 3: Update Automation Pipeline

**File:** `backend/igny8_core/business/automation/services/automation_service.py`

**Add import:**

```python
from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords
```

**Update run_stage_1() method:**

```python
def run_stage_1(self):
    """Stage 1: Keywords → Clusters (AI)"""
    stage_number = 1
    stage_name = "Keywords → Clusters (AI)"
    start_time = time.time()

    # Query pending keywords
    pending_keywords = Keywords.objects.filter(
        site=self.site,
        status='new'
    )

    total_count = pending_keywords.count()

    # NEW: Pre-stage validation for minimum keywords
    keyword_ids = list(pending_keywords.values_list('id', flat=True))

    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=self.account,
        min_required=5
    )

    if not min_validation['valid']:
        # Log validation failure
        self.logger.log_stage_start(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, stage_name, total_count
        )

        error_msg = min_validation['error']
        self.logger.log_stage_error(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, error_msg
        )

        # Skip stage with proper result
        self.run.stage_1_result = {
            'keywords_processed': 0,
            'clusters_created': 0,
            'skipped': True,
            'skip_reason': error_msg,
            'credits_used': 0
        }
        self.run.current_stage = 2
        self.run.save()

        logger.warning(f"[AutomationService] Stage 1 skipped: {error_msg}")
        return

    # Log stage start
    self.logger.log_stage_start(
        self.run.run_id, self.account.id, self.site.id,
        stage_number, stage_name, total_count
    )

    # ... rest of existing stage logic ...
```

### Step 4: Update API Endpoint

**File:** `backend/igny8_core/modules/planner/views.py` (KeywordsViewSet)

**Update auto_cluster action:**

```python
@action(detail=False, methods=['post'], url_path='auto_cluster', url_name='auto_cluster')
def auto_cluster(self, request):
    """Auto-cluster keywords using AI"""
    from igny8_core.ai.tasks import run_ai_task
    from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords

    account = getattr(request, 'account', None)
    keyword_ids = request.data.get('ids', [])

    if not keyword_ids:
        return error_response(
            error='No keyword IDs provided',
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request
        )

    # NEW: Validate minimum keywords BEFORE queuing task
    validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5
    )

    if not validation['valid']:
        return error_response(
            error=validation['error'],
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request,
            extra_data={
                'count': validation.get('count'),
                'required': validation.get('required')
            }
        )

    # Validation passed - proceed with clustering
    account_id = account.id if account else None

    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(
                data={'task_id': str(task.id)},
                message=f'Auto-cluster started with {validation["count"]} keywords',
                request=request
            )
        else:
            # Synchronous fallback
            result = run_ai_task(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(data=result, request=request)

    except Exception as e:
        logger.error(f"Failed to start auto-cluster: {e}", exc_info=True)
        return error_response(
            error=f'Failed to start clustering: {str(e)}',
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            request=request
        )
```

### Step 5: Add Frontend Validation (Optional but Recommended)

**File:** `frontend/src/pages/Planner/Keywords.tsx`

**Update handleAutoCluster function:**

```typescript
const handleAutoCluster = async () => {
  try {
    const selectedIds = selectedKeywords.map(k => k.id);

    // Frontend validation (pre-check before API call)
    // Empty selection gets its own message (see "No Selection" under Error Messages)
    if (selectedIds.length === 0) {
      toast.error('No keywords selected', { duration: 5000 });
      return;
    }

    if (selectedIds.length < 5) {
      toast.error(
        `Please select at least 5 keywords for auto-clustering. Currently selected: ${selectedIds.length}`,
        { duration: 5000 }
      );
      return;
    }

    // Check total available
    const availableCount = keywords.filter(k => k.status === 'new').length;
    if (availableCount < 5) {
      toast.error(
        `Not enough keywords available. Need at least 5 keywords, but only ${availableCount} exist.`,
        { duration: 5000 }
      );
      return;
    }

    // Proceed with API call
    const result = await autoClusterKeywords(selectedIds);

    if (result.task_id) {
      toast.success(`Auto-cluster started with ${selectedIds.length} keywords`);
      setTaskId(result.task_id);
    } else {
      toast.error('Failed to start auto-cluster');
    }

  } catch (error: any) {
    // Backend validation error (in case frontend check was bypassed)
    const errorMsg = error.response?.data?.error || error.message;
    toast.error(errorMsg);
  }
};
```

---

## 🗂️ FILE STRUCTURE

### New Files

```
backend/igny8_core/ai/validators/
├── __init__.py
└── cluster_validators.py (NEW)
```

### Modified Files

```
backend/igny8_core/ai/functions/auto_cluster.py
backend/igny8_core/business/automation/services/automation_service.py
backend/igny8_core/modules/planner/views.py
frontend/src/pages/Planner/Keywords.tsx
```

---

## 🧪 TESTING PLAN

### Unit Tests

**File:** `backend/igny8_core/ai/validators/tests/test_cluster_validators.py`

```python
import pytest
from django.test import TestCase
from igny8_core.ai.validators.cluster_validators import (
    validate_minimum_keywords,
    validate_keyword_selection
)
from igny8_core.modules.planner.models import Keywords
from igny8_core.auth.models import Account, Site


class ClusterValidatorsTestCase(TestCase):
    def setUp(self):
        self.account = Account.objects.create(name='Test Account')
        self.site = Site.objects.create(name='Test Site', account=self.account)

    def test_validate_minimum_keywords_success(self):
        """Test with sufficient keywords (>= 5)"""
        # Create 10 keywords
        keyword_ids = []
        for i in range(10):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)

        assert result['valid'] is True
        assert result['count'] == 10
        assert result['required'] == 5

    def test_validate_minimum_keywords_failure(self):
        """Test with insufficient keywords (< 5)"""
        # Create only 3 keywords
        keyword_ids = []
        for i in range(3):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)

        assert result['valid'] is False
        assert 'Insufficient keywords' in result['error']
        assert result['count'] == 3
        assert result['required'] == 5

    def test_validate_minimum_keywords_edge_case_exactly_5(self):
        """Test with exactly 5 keywords (boundary)"""
        keyword_ids = []
        for i in range(5):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)

        assert result['valid'] is True
        assert result['count'] == 5

    def test_validate_keyword_selection_insufficient(self):
        """Test frontend selection validation"""
        result = validate_keyword_selection(
            selected_ids=[1, 2, 3],  # Only 3
            available_count=10,
            min_required=5
        )

        assert result['valid'] is False
        assert result['type'] == 'INSUFFICIENT_SELECTION'
        assert result['selected'] == 3
        assert result['required'] == 5
```

### Integration Tests

```python
# NOTE: self._create_keywords, self.token and self.automation_service are
# assumed test fixtures/helpers; their setup is not shown in this plan.
class AutoClusterIntegrationTestCase(TestCase):
    def test_auto_cluster_with_insufficient_keywords(self):
        """Test auto-cluster endpoint rejects < 5 keywords"""
        # Create only 3 keywords
        keyword_ids = self._create_keywords(3)

        response = self.client.post(
            '/api/planner/keywords/auto_cluster/',
            data={'ids': keyword_ids},
            HTTP_AUTHORIZATION=f'Bearer {self.token}'
        )

        assert response.status_code == 400
        assert 'Insufficient keywords' in response.json()['error']

    def test_automation_skips_stage_1_with_insufficient_keywords(self):
        """Test automation skips Stage 1 if < 5 keywords"""
        # Create only 2 keywords
        self._create_keywords(2)

        # Start automation
        run_id = self.automation_service.start_automation('manual')

        # Verify Stage 1 was skipped
        run = AutomationRun.objects.get(run_id=run_id)
        assert run.stage_1_result['skipped'] is True
        assert 'Insufficient keywords' in run.stage_1_result['skip_reason']
        assert run.current_stage == 2  # Moved to next stage
```

### Manual Test Cases

- [ ] **Test 1:** Try auto-cluster with 0 keywords selected
  - Expected: Error message "No keywords selected"
- [ ] **Test 2:** Try auto-cluster with 3 keywords selected
  - Expected: Error message "Please select at least 5 keywords for auto-clustering. Currently selected: 3"
- [ ] **Test 3:** Try auto-cluster with exactly 5 keywords
  - Expected: Success, clustering starts
- [ ] **Test 4:** Run automation with 2 keywords in site
  - Expected: Stage 1 skipped with warning in logs
- [ ] **Test 5:** Run automation with 10 keywords in site
  - Expected: Stage 1 runs normally

---

## 📊 ERROR MESSAGES

### Frontend (User-Facing)

**No Selection:**

```
❌ No keywords selected
Please select keywords to cluster.
```

**Insufficient Selection:**

```
❌ Please select at least 5 keywords for auto-clustering
Currently selected: 3 keywords
You need at least 5 keywords to create meaningful clusters.
```

**Insufficient Available:**

```
❌ Not enough keywords available
Need at least 5 keywords, but only 2 exist.
Add more keywords before running auto-cluster.
```

### Backend (Logs)

**Validation Failed:**

```
[AutoCluster] Validation failed: Insufficient keywords for clustering. Need at least 5 keywords, but only 3 available.
```

**Validation Passed:**

```
[AutoCluster] Validation passed: 15 keywords available (min: 5)
```

**Automation Stage Skipped:**

```
[AutomationService] Stage 1 skipped: Insufficient keywords for clustering. Need at least 5 keywords, but only 2 available.
```

---

## 🎯 CONFIGURATION

### Constants File

**File:** `backend/igny8_core/ai/constants.py` (create it if it doesn't exist)

```python
"""
AI Function Configuration Constants
"""

# Cluster Configuration
MIN_KEYWORDS_FOR_CLUSTERING = 5  # Minimum keywords needed for meaningful clusters
OPTIMAL_KEYWORDS_FOR_CLUSTERING = 20  # Recommended for best results

# Other AI limits...
```

**Usage in validators:**

```python
from igny8_core.ai.constants import MIN_KEYWORDS_FOR_CLUSTERING

# Keep the min_required parameter from Step 1 so existing callers that pass
# min_required=... keep working; only the default moves to the constant.
def validate_minimum_keywords(keyword_ids, account=None, min_required=MIN_KEYWORDS_FOR_CLUSTERING):
    # ... validation logic
```

---

## 🔄 SHARED VALIDATION PATTERN

### Why This Approach Works

**✅ Single Source of Truth:**
- One function: `validate_minimum_keywords()`
- Used by both the auto-cluster function and automation
- An update in one place applies everywhere

**✅ Consistent Behavior:**
- Same error messages
- Same validation logic
- Same minimum requirements

**✅ Easy to Maintain:**
- Want to change the minimum from 5 to 10? Change one constant
- Want to add new validation? Add to one function
- Want to test? Test one module

**✅ No Code Duplication:**
- DRY principle followed
- Reduces bugs from inconsistency
- Easier code review

### Pattern for Future Validators

```python
# backend/igny8_core/ai/validators/content_validators.py

def validate_minimum_content_length(content_text: str, min_words: int = 100):
    """
    Shared validator for content minimum length
    Used by: GenerateContentFunction, Automation Stage 4, Content creation
    """
    word_count = len(content_text.split())

    if word_count < min_words:
        return {
            'valid': False,
            'error': f'Content too short. Minimum {min_words} words required, got {word_count}.'
        }

    return {'valid': True, 'word_count': word_count}
```

---

## 🚀 IMPLEMENTATION STEPS

### Phase 1: Create Validator (Day 1)

- [ ] Create `cluster_validators.py`
- [ ] Implement `validate_minimum_keywords()`
- [ ] Implement `validate_keyword_selection()`
- [ ] Write unit tests

### Phase 2: Integrate Backend (Day 1)

- [ ] Update `AutoClusterFunction.validate()`
- [ ] Update `AutomationService.run_stage_1()`
- [ ] Update `KeywordsViewSet.auto_cluster()`
- [ ] Write integration tests

### Phase 3: Frontend (Day 2)

- [ ] Add frontend validation in Keywords page
- [ ] Add user-friendly error messages
- [ ] Test error scenarios

### Phase 4: Testing & Deployment (Day 2)

- [ ] Run all tests
- [ ] Manual QA testing
- [ ] Deploy to production
- [ ] Monitor first few auto-cluster runs

---

## 🎯 SUCCESS CRITERIA

- ✅ Auto-cluster returns an error if < 5 keywords are selected
- ✅ Automation skips Stage 1 if < 5 keywords are available
- ✅ Both use the same validation function (no duplication)
- ✅ Clear error messages guide users
- ✅ Frontend validation provides instant feedback
- ✅ Backend validation catches edge cases
- ✅ All tests pass
- ✅ No regression in existing functionality

---

## 📈 FUTURE ENHANCEMENTS

### V2 Features

1. **Configurable Minimum:**
   - Allow admin to set minimum via settings
   - Default: 5, Range: 3-20

2. **Quality Scoring:**
   - Show quality indicator based on keyword count
   - 5-10: "Fair", 11-20: "Good", 21+: "Excellent"

3. **Smart Recommendations:**
   - "You have 4 keywords. Add 1 more for best results"
   - "15 keywords selected. Good for clustering!"

4. **Batch Size Validation:**
   - Warn if too many keywords selected (> 100)
   - Suggest splitting into multiple runs

---

## END OF PLAN

This plan ensures robust, consistent validation for auto-cluster across all entry points (manual and automation) using shared, well-tested validation logic.
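As a supplementary note to the V2 features listed under Future Enhancements, the quality tiers and the batch-size warning could be prototyped as one small pure function before any UI work. The sketch below is only an illustration under stated assumptions: `describe_keyword_count`, `MAX_KEYWORDS_PER_RUN`, and the exact message wording are hypothetical and do not exist in the codebase; the thresholds (5, 10, 20, 100) are taken from the plan itself.

```python
from typing import Dict

MIN_KEYWORDS_FOR_CLUSTERING = 5  # minimum defined in this plan
MAX_KEYWORDS_PER_RUN = 100       # hypothetical name; threshold from "Batch Size Validation"


def describe_keyword_count(count: int) -> Dict:
    """Hypothetical helper: map a keyword count to a V2 quality tier."""
    # Below the minimum there is no tier, only a hint about how many to add
    if count < MIN_KEYWORDS_FOR_CLUSTERING:
        missing = MIN_KEYWORDS_FOR_CLUSTERING - count
        return {
            'valid': False,
            'quality': None,
            'message': f'You have {count} keywords. Add {missing} more to reach the minimum.',
        }

    # Tiers from "Quality Scoring": 5-10 Fair, 11-20 Good, 21+ Excellent
    if count <= 10:
        quality = 'Fair'
    elif count <= 20:
        quality = 'Good'
    else:
        quality = 'Excellent'

    result = {
        'valid': True,
        'quality': quality,
        'message': f'{count} keywords selected. {quality} for clustering.',
    }

    # Warn, without blocking, when the selection exceeds the batch threshold
    if count > MAX_KEYWORDS_PER_RUN:
        result['warning'] = (
            f'{count} keywords selected. Consider splitting into multiple runs '
            f'of at most {MAX_KEYWORDS_PER_RUN}.'
        )
    return result


if __name__ == '__main__':
    for n in (4, 5, 15, 150):
        print(n, describe_keyword_count(n))
```

Keeping this logic free of database access means it could sit next to the shared validators and be unit-tested without Django fixtures; whether the tier is computed on the backend or duplicated in the frontend is left open here.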