Auto-Cluster Validation Fix Plan
Date: December 4, 2025
Status: Design Phase
Priority: MEDIUM
🎯 OBJECTIVE
Add validation to prevent auto-cluster from running with fewer than 5 keywords, and ensure that both the manual auto-cluster action and the automation pipeline use the same shared validation logic so their behavior stays consistent.
🔍 CURRENT STATE ANALYSIS
Current Behavior
Auto-Cluster Function:
- Located in: backend/igny8_core/ai/functions/auto_cluster.py
- No minimum keyword validation
- Accepts any number of keywords (even 1)
- May produce poor quality clusters with insufficient data
Automation Pipeline:
- Located in: backend/igny8_core/business/automation/services/automation_service.py
- Uses auto-cluster in Stage 1
- No pre-check for minimum keywords
- May waste credits on insufficient data
Problems
- ❌ No Minimum Check: Auto-cluster runs with 1-4 keywords
- ❌ Poor Results: AI cannot create meaningful clusters with < 5 keywords
- ❌ Wasted Credits: Charges credits for insufficient analysis
- ❌ Inconsistent Validation: No shared validation between manual and automation
- ❌ User Confusion: Error occurs during processing, not at selection
✅ PROPOSED SOLUTION
Validation Strategy
Single Source of Truth:
- Create one validation function
- Use it in both auto-cluster function AND automation pipeline
- Consistent error messages
- No code duplication
Error Behavior:
- Manual Auto-Cluster: Return error before API call
- Automation Pipeline: Skip Stage 1 with warning in logs
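The two error behaviors can share one validation result and differ only in how the caller reacts to it. A minimal sketch, assuming a pure count-based check: `check_minimum`, `handle_manual_request`, and `handle_automation_stage` are hypothetical names used for illustration, and the real validator in Step 1 counts rows in the database instead of taking a count.

```python
from typing import Dict

MIN_KEYWORDS = 5  # assumed default, mirroring the plan's minimum


def check_minimum(count: int, min_required: int = MIN_KEYWORDS) -> Dict:
    """Hypothetical pure check; the real validator queries the database."""
    if count < min_required:
        return {
            "valid": False,
            "error": f"Need at least {min_required} keywords, but only {count} available.",
            "count": count,
        }
    return {"valid": True, "count": count}


def handle_manual_request(count: int) -> Dict:
    """Manual path: reject with an error before any AI call is made."""
    result = check_minimum(count)
    if not result["valid"]:
        return {"status": 400, "error": result["error"]}
    return {"status": 200, "queued": True}


def handle_automation_stage(count: int) -> Dict:
    """Automation path: skip the stage and record why, without failing the run."""
    result = check_minimum(count)
    if not result["valid"]:
        return {"skipped": True, "skip_reason": result["error"], "credits_used": 0}
    return {"skipped": False}
```

Both handlers call the same check, so changing the minimum or the error text affects the manual and automated paths together.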
📋 IMPLEMENTATION PLAN
Step 1: Create Shared Validation Module
New File: backend/igny8_core/ai/validators/cluster_validators.py
"""
Cluster-specific validators
Shared between auto-cluster function and automation pipeline
"""
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def validate_minimum_keywords(
    keyword_ids: List[int],
    account=None,
    min_required: int = 5
) -> Dict:
    """
    Validate that sufficient keywords are available for clustering

    Args:
        keyword_ids: List of keyword IDs to cluster
        account: Account object for filtering
        min_required: Minimum number of keywords required (default: 5)

    Returns:
        Dict with 'valid' (bool) and 'error' (str) or 'count' (int)
    """
    from igny8_core.modules.planner.models import Keywords

    # Build queryset
    queryset = Keywords.objects.filter(id__in=keyword_ids, status='new')
    if account:
        queryset = queryset.filter(account=account)

    # Count available keywords
    count = queryset.count()

    # Validate minimum
    if count < min_required:
        return {
            'valid': False,
            'error': f'Insufficient keywords for clustering. Need at least {min_required} keywords, but only {count} available.',
            'count': count,
            'required': min_required
        }

    return {
        'valid': True,
        'count': count,
        'required': min_required
    }


def validate_keyword_selection(
    selected_ids: List[int],
    available_count: int,
    min_required: int = 5
) -> Dict:
    """
    Validate keyword selection (for frontend validation)

    Args:
        selected_ids: List of selected keyword IDs
        available_count: Total count of available keywords
        min_required: Minimum required

    Returns:
        Dict with validation result
    """
    selected_count = len(selected_ids)

    # Check if any keywords selected
    if selected_count == 0:
        return {
            'valid': False,
            'error': 'No keywords selected',
            'type': 'NO_SELECTION'
        }

    # Check if enough selected
    if selected_count < min_required:
        return {
            'valid': False,
            'error': f'Please select at least {min_required} keywords. Currently selected: {selected_count}',
            'type': 'INSUFFICIENT_SELECTION',
            'selected': selected_count,
            'required': min_required
        }

    # Check if enough available (even if not all selected)
    if available_count < min_required:
        return {
            'valid': False,
            'error': f'Not enough keywords available. Need at least {min_required} keywords, but only {available_count} exist.',
            'type': 'INSUFFICIENT_AVAILABLE',
            'available': available_count,
            'required': min_required
        }

    return {
        'valid': True,
        'selected': selected_count,
        'available': available_count,
        'required': min_required
    }
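The order of the three checks in `validate_keyword_selection()` determines which error a user sees first. The sketch below isolates that ordering in a hypothetical helper, `classify_selection`, which returns only the failure type. It is an illustration of the precedence, not a replacement for the validator.

```python
from typing import Optional


def classify_selection(selected_count: int, available_count: int,
                       min_required: int = 5) -> Optional[str]:
    """Return the failure type in the order the validator checks, or None if valid."""
    if selected_count == 0:
        return "NO_SELECTION"
    if selected_count < min_required:
        return "INSUFFICIENT_SELECTION"
    if available_count < min_required:
        return "INSUFFICIENT_AVAILABLE"
    return None


# (selected, available) pairs covering each branch
for selected, available in [(0, 10), (3, 10), (6, 4), (5, 5)]:
    print(selected, available, classify_selection(selected, available))
```

Because the selection checks run first, `INSUFFICIENT_AVAILABLE` is only reachable when at least `min_required` keywords are selected but fewer are counted as available, for example when some selected keywords are no longer in the `new` status.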
Step 2: Update Auto-Cluster Function
File: backend/igny8_core/ai/functions/auto_cluster.py
Add import:
from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords
Update validate() method:
def validate(self, payload: dict, account=None) -> Dict:
    """Validate keyword IDs and minimum count"""
    result = super().validate(payload, account)
    if not result['valid']:
        return result

    keyword_ids = payload.get('keyword_ids', [])
    if not keyword_ids:
        return {'valid': False, 'error': 'No keyword IDs provided'}

    # NEW: Validate minimum keywords using shared validator
    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5  # Configurable constant
    )

    if not min_validation['valid']:
        # Log the validation failure
        logger.warning(
            f"[AutoCluster] Validation failed: {min_validation['error']}"
        )
        return min_validation

    # Log successful validation
    logger.info(
        f"[AutoCluster] Validation passed: {min_validation['count']} keywords available (min: {min_validation['required']})"
    )

    return {'valid': True}
Step 3: Update Automation Pipeline
File: backend/igny8_core/business/automation/services/automation_service.py
Add import:
from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords
Update run_stage_1() method:
def run_stage_1(self):
    """Stage 1: Keywords → Clusters (AI)"""
    stage_number = 1
    stage_name = "Keywords → Clusters (AI)"
    start_time = time.time()

    # Query pending keywords
    pending_keywords = Keywords.objects.filter(
        site=self.site,
        status='new'
    )
    total_count = pending_keywords.count()

    # NEW: Pre-stage validation for minimum keywords
    keyword_ids = list(pending_keywords.values_list('id', flat=True))
    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=self.account,
        min_required=5
    )

    if not min_validation['valid']:
        # Log validation failure
        self.logger.log_stage_start(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, stage_name, total_count
        )
        error_msg = min_validation['error']
        self.logger.log_stage_error(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, error_msg
        )

        # Skip stage with proper result
        self.run.stage_1_result = {
            'keywords_processed': 0,
            'clusters_created': 0,
            'skipped': True,
            'skip_reason': error_msg,
            'credits_used': 0
        }
        self.run.current_stage = 2
        self.run.save()

        logger.warning(f"[AutomationService] Stage 1 skipped: {error_msg}")
        return

    # Log stage start
    self.logger.log_stage_start(
        self.run.run_id, self.account.id, self.site.id,
        stage_number, stage_name, total_count
    )

    # ... rest of existing stage logic ...
Step 4: Update API Endpoint
File: backend/igny8_core/modules/planner/views.py (KeywordsViewSet)
Update auto_cluster action:
@action(detail=False, methods=['post'], url_path='auto_cluster', url_name='auto_cluster')
def auto_cluster(self, request):
    """Auto-cluster keywords using AI"""
    from igny8_core.ai.tasks import run_ai_task
    from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords

    account = getattr(request, 'account', None)
    keyword_ids = request.data.get('ids', [])

    if not keyword_ids:
        return error_response(
            error='No keyword IDs provided',
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request
        )

    # NEW: Validate minimum keywords BEFORE queuing task
    validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5
    )

    if not validation['valid']:
        return error_response(
            error=validation['error'],
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request,
            extra_data={
                'count': validation.get('count'),
                'required': validation.get('required')
            }
        )

    # Validation passed - proceed with clustering
    account_id = account.id if account else None

    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(
                data={'task_id': str(task.id)},
                message=f'Auto-cluster started with {validation["count"]} keywords',
                request=request
            )
        else:
            # Synchronous fallback
            result = run_ai_task(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(data=result, request=request)
    except Exception as e:
        logger.error(f"Failed to start auto-cluster: {e}", exc_info=True)
        return error_response(
            error=f'Failed to start clustering: {str(e)}',
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            request=request
        )
Step 5: Add Frontend Validation (Optional but Recommended)
File: frontend/src/pages/Planner/Keywords.tsx
Update handleAutoCluster function:
const handleAutoCluster = async () => {
  try {
    const selectedIds = selectedKeywords.map(k => k.id);

    // Frontend validation (pre-check before API call)
    if (selectedIds.length < 5) {
      toast.error(
        `Please select at least 5 keywords for auto-clustering. Currently selected: ${selectedIds.length}`,
        { duration: 5000 }
      );
      return;
    }

    // Check total available
    const availableCount = keywords.filter(k => k.status === 'new').length;
    if (availableCount < 5) {
      toast.error(
        `Not enough keywords available. Need at least 5 keywords, but only ${availableCount} exist.`,
        { duration: 5000 }
      );
      return;
    }

    // Proceed with API call
    const result = await autoClusterKeywords(selectedIds);
    if (result.task_id) {
      toast.success(`Auto-cluster started with ${selectedIds.length} keywords`);
      setTaskId(result.task_id);
    } else {
      toast.error('Failed to start auto-cluster');
    }
  } catch (error: any) {
    // Backend validation error (in case frontend check was bypassed)
    const errorMsg = error.response?.data?.error || error.message;
    toast.error(errorMsg);
  }
};
🗂️ FILE STRUCTURE
New Files
backend/igny8_core/ai/validators/
├── __init__.py
└── cluster_validators.py (NEW)
Modified Files
backend/igny8_core/ai/functions/auto_cluster.py
backend/igny8_core/business/automation/services/automation_service.py
backend/igny8_core/modules/planner/views.py
frontend/src/pages/Planner/Keywords.tsx
🧪 TESTING PLAN
Unit Tests
File: backend/igny8_core/ai/validators/tests/test_cluster_validators.py
import pytest
from django.test import TestCase

from igny8_core.ai.validators.cluster_validators import (
    validate_minimum_keywords,
    validate_keyword_selection
)
from igny8_core.modules.planner.models import Keywords
from igny8_core.auth.models import Account, Site


class ClusterValidatorsTestCase(TestCase):
    def setUp(self):
        self.account = Account.objects.create(name='Test Account')
        self.site = Site.objects.create(name='Test Site', account=self.account)

    def test_validate_minimum_keywords_success(self):
        """Test with sufficient keywords (>= 5)"""
        # Create 10 keywords
        keyword_ids = []
        for i in range(10):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)

        assert result['valid'] is True
        assert result['count'] == 10
        assert result['required'] == 5

    def test_validate_minimum_keywords_failure(self):
        """Test with insufficient keywords (< 5)"""
        # Create only 3 keywords
        keyword_ids = []
        for i in range(3):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)

        assert result['valid'] is False
        assert 'Insufficient keywords' in result['error']
        assert result['count'] == 3
        assert result['required'] == 5

    def test_validate_minimum_keywords_edge_case_exactly_5(self):
        """Test with exactly 5 keywords (boundary)"""
        keyword_ids = []
        for i in range(5):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)

        assert result['valid'] is True
        assert result['count'] == 5

    def test_validate_keyword_selection_insufficient(self):
        """Test frontend selection validation"""
        result = validate_keyword_selection(
            selected_ids=[1, 2, 3],  # Only 3
            available_count=10,
            min_required=5
        )

        assert result['valid'] is False
        assert result['type'] == 'INSUFFICIENT_SELECTION'
        assert result['selected'] == 3
        assert result['required'] == 5
Integration Tests
class AutoClusterIntegrationTestCase(TestCase):
    def test_auto_cluster_with_insufficient_keywords(self):
        """Test auto-cluster endpoint rejects < 5 keywords"""
        # Create only 3 keywords
        keyword_ids = self._create_keywords(3)

        response = self.client.post(
            '/api/planner/keywords/auto_cluster/',
            data={'ids': keyword_ids},
            HTTP_AUTHORIZATION=f'Bearer {self.token}'
        )

        assert response.status_code == 400
        assert 'Insufficient keywords' in response.json()['error']

    def test_automation_skips_stage_1_with_insufficient_keywords(self):
        """Test automation skips Stage 1 if < 5 keywords"""
        # Create only 2 keywords
        self._create_keywords(2)

        # Start automation
        run_id = self.automation_service.start_automation('manual')

        # Verify Stage 1 was skipped
        run = AutomationRun.objects.get(run_id=run_id)
        assert run.stage_1_result['skipped'] is True
        assert 'Insufficient keywords' in run.stage_1_result['skip_reason']
        assert run.current_stage == 2  # Moved to next stage
Manual Test Cases
- Test 1: Try auto-cluster with 0 keywords selected
  - Expected: Error message "No keywords selected"
- Test 2: Try auto-cluster with 3 keywords selected
  - Expected: Error message "Please select at least 5 keywords. Currently selected: 3"
- Test 3: Try auto-cluster with exactly 5 keywords
  - Expected: Success, clustering starts
- Test 4: Run automation with 2 keywords in site
  - Expected: Stage 1 skipped with warning in logs
- Test 5: Run automation with 10 keywords in site
  - Expected: Stage 1 runs normally
📊 ERROR MESSAGES
Frontend (User-Facing)
No Selection:
❌ No keywords selected
Please select keywords to cluster.
Insufficient Selection:
❌ Please select at least 5 keywords for auto-clustering
Currently selected: 3 keywords
You need at least 5 keywords to create meaningful clusters.
Insufficient Available:
❌ Not enough keywords available
Need at least 5 keywords, but only 2 exist.
Add more keywords before running auto-cluster.
Backend (Logs)
Validation Failed:
[AutoCluster] Validation failed: Insufficient keywords for clustering. Need at least 5 keywords, but only 3 available.
Validation Passed:
[AutoCluster] Validation passed: 15 keywords available (min: 5)
Automation Stage Skipped:
[AutomationService] Stage 1 skipped: Insufficient keywords for clustering. Need at least 5 keywords, but only 2 available.
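To keep the log lines and the API error text identical, the message can be built in one place and passed to the logger. A minimal sketch under stated assumptions: `insufficient_message` and `log_validation` are hypothetical helpers, and the logger name is illustrative. It uses lazy `%`-style arguments, which the standard `logging` module formats only when the record is actually emitted.

```python
import logging

logger = logging.getLogger("auto_cluster_example")  # hypothetical logger name


def insufficient_message(count: int, min_required: int = 5) -> str:
    """Build the shared error text used by both log lines and API responses."""
    return (
        "Insufficient keywords for clustering. "
        f"Need at least {min_required} keywords, but only {count} available."
    )


def log_validation(count: int, min_required: int = 5) -> bool:
    """Log the outcome in the formats listed above and return whether it passed."""
    if count < min_required:
        logger.warning("[AutoCluster] Validation failed: %s",
                       insufficient_message(count, min_required))
        return False
    logger.info("[AutoCluster] Validation passed: %d keywords available (min: %d)",
                count, min_required)
    return True
```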
🎯 CONFIGURATION
Constants File
File: backend/igny8_core/ai/constants.py (create it if it doesn't exist)
"""
AI Function Configuration Constants
"""
# Cluster Configuration
MIN_KEYWORDS_FOR_CLUSTERING = 5 # Minimum keywords needed for meaningful clusters
OPTIMAL_KEYWORDS_FOR_CLUSTERING = 20 # Recommended for best results
# Other AI limits...
Usage in validators:
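from igny8_core.ai.constants import MIN_KEYWORDS_FOR_CLUSTERING

# Keep min_required as a parameter so existing callers that pass it still work;
# the constant only supplies the default.
def validate_minimum_keywords(keyword_ids, account=None, min_required=MIN_KEYWORDS_FOR_CLUSTERING):
    # ... validation logic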
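Steps 2 to 4 pass `min_required=5` explicitly, so the validator has to keep accepting that argument while defaulting to the shared constant. A minimal sketch of one way to resolve the value: `resolve_min_required` is a hypothetical helper, and the constant is redefined locally only so the example runs on its own.

```python
from typing import Optional

# Redefined here for a self-contained example; the plan places it in ai/constants.py
MIN_KEYWORDS_FOR_CLUSTERING = 5


def resolve_min_required(min_required: Optional[int] = None) -> int:
    """Use the caller's override when given, otherwise the shared constant."""
    if min_required is None:
        return MIN_KEYWORDS_FOR_CLUSTERING
    if min_required < 1:
        raise ValueError("min_required must be at least 1")
    return min_required
```

Using `None` as the default means the constant is read at call time, so a test or settings override that patches the constant takes effect without re-importing the validator.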
🔄 SHARED VALIDATION PATTERN
Why This Approach Works
✅ Single Source of Truth:
- One function: validate_minimum_keywords()
- Used by both auto-cluster function and automation
- Update in one place applies everywhere
✅ Consistent Behavior:
- Same error messages
- Same validation logic
- Same minimum requirements
✅ Easy to Maintain:
- Want to change minimum from 5 to 10? Change one constant
- Want to add new validation? Add to one function
- Want to test? Test one module
✅ No Code Duplication:
- DRY principle followed
- Reduces bugs from inconsistency
- Easier code review
Pattern for Future Validators
# backend/igny8_core/ai/validators/content_validators.py
def validate_minimum_content_length(content_text: str, min_words: int = 100):
    """
    Shared validator for content minimum length
    Used by: GenerateContentFunction, Automation Stage 4, Content creation
    """
    word_count = len(content_text.split())

    if word_count < min_words:
        return {
            'valid': False,
            'error': f'Content too short. Minimum {min_words} words required, got {word_count}.'
        }

    return {'valid': True, 'word_count': word_count}
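If more validators follow this pattern, the result dicts could be built by two small helpers so every validator returns the same keys. This is a sketch only: `ok` and `fail` are hypothetical names that do not exist in the codebase, and the content validator is repeated here just to show them in use.

```python
from typing import Any, Dict


def ok(**details: Any) -> Dict[str, Any]:
    """Build a passing result with optional detail fields."""
    return {"valid": True, **details}


def fail(error: str, **details: Any) -> Dict[str, Any]:
    """Build a failing result that always carries an error message."""
    return {"valid": False, "error": error, **details}


def validate_minimum_content_length(content_text: str, min_words: int = 100) -> Dict[str, Any]:
    """Same check as the pattern above, expressed with the helpers."""
    word_count = len(content_text.split())
    if word_count < min_words:
        return fail(
            f"Content too short. Minimum {min_words} words required, got {word_count}.",
            word_count=word_count,
        )
    return ok(word_count=word_count)
```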
🚀 IMPLEMENTATION STEPS
Phase 1: Create Validator (Day 1)
- Create cluster_validators.py
- Implement validate_minimum_keywords()
- Implement validate_keyword_selection()
- Write unit tests
Phase 2: Integrate Backend (Day 1)
- Update AutoClusterFunction.validate()
- Update AutomationService.run_stage_1()
- Update KeywordsViewSet.auto_cluster()
- Write integration tests
Phase 3: Frontend (Day 2)
- Add frontend validation in Keywords page
- Add user-friendly error messages
- Test error scenarios
Phase 4: Testing & Deployment (Day 2)
- Run all tests
- Manual QA testing
- Deploy to production
- Monitor first few auto-cluster runs
🎯 SUCCESS CRITERIA
✅ Auto-cluster returns error if < 5 keywords selected
✅ Automation skips Stage 1 if < 5 keywords available
✅ Both use same validation function (no duplication)
✅ Clear error messages guide users
✅ Frontend validation provides instant feedback
✅ Backend validation catches edge cases
✅ All tests pass
✅ No regression in existing functionality
📈 FUTURE ENHANCEMENTS
V2 Features
- Configurable Minimum:
  - Allow admin to set minimum via settings
  - Default: 5, Range: 3-20
- Quality Scoring:
  - Show quality indicator based on keyword count
  - 5-10: "Fair", 11-20: "Good", 21+: "Excellent"
- Smart Recommendations:
  - "You have 4 keywords. Add 1 more for best results"
  - "15 keywords selected. Good for clustering!"
- Batch Size Validation:
  - Warn if too many keywords selected (> 100)
  - Suggest splitting into multiple runs
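The V2 rules above are simple enough to sketch as pure functions. All three names below (`clamp_minimum`, `quality_label`, `batch_warning`) are hypothetical, and the "Insufficient" label for counts below the minimum is an assumption, since the plan only names Fair, Good, and Excellent.

```python
from typing import Optional


def clamp_minimum(value: int, low: int = 3, high: int = 20) -> int:
    """Keep an admin-supplied minimum inside the allowed 3-20 range."""
    return max(low, min(high, value))


def quality_label(count: int, min_required: int = 5) -> str:
    """Map a keyword count to the proposed quality indicator."""
    if count < min_required:
        return "Insufficient"  # assumed label; not defined in the plan
    if count <= 10:
        return "Fair"
    if count <= 20:
        return "Good"
    return "Excellent"


def batch_warning(count: int, max_batch: int = 100) -> Optional[str]:
    """Return a warning when the selection exceeds the suggested batch size."""
    if count > max_batch:
        return f"{count} keywords selected. Consider splitting into multiple runs."
    return None
```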
END OF PLAN
This plan ensures robust, consistent validation for auto-cluster across all entry points (manual and automation) using shared, well-tested validation logic.