# Auto-Cluster Validation Fix Plan

**Date:** December 4, 2025
**Status:** Design Phase
**Priority:** MEDIUM

---
## 🎯 OBJECTIVE
Add validation that prevents auto-cluster from running with fewer than 5 keywords, and make the manual auto-cluster action and the automation pipeline share the same validation logic so their behavior stays consistent.

---
## 🔍 CURRENT STATE ANALYSIS
### Current Behavior
**Auto-Cluster Function:**
- Located in: `backend/igny8_core/ai/functions/auto_cluster.py`
- No minimum keyword validation
- Accepts any number of keywords (even 1)
- May produce poor quality clusters with insufficient data
**Automation Pipeline:**
- Located in: `backend/igny8_core/business/automation/services/automation_service.py`
- Uses auto-cluster in Stage 1
- No pre-check for minimum keywords
- May waste credits on insufficient data
### Problems
1. **No Minimum Check:** Auto-cluster runs with 1-4 keywords
2. **Poor Results:** AI cannot create meaningful clusters with < 5 keywords
3. **Wasted Credits:** Charges credits for insufficient analysis
4. **Inconsistent Validation:** No shared validation between manual and automation
5. **User Confusion:** Error occurs during processing, not at selection
---
## ✅ PROPOSED SOLUTION
### Validation Strategy
**Single Source of Truth:**
- Create one validation function
- Use it in both auto-cluster function AND automation pipeline
- Consistent error messages
- No code duplication
**Error Behavior:**
- **Manual Auto-Cluster:** Return an error before the clustering task is queued, so no AI call is made and no credits are charged
- **Automation Pipeline:** Skip Stage 1 with warning in logs
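
The strategy above can be illustrated without any Django code. The following is a minimal sketch, not the planned implementation: `check_keyword_count`, `manual_auto_cluster`, and `automation_stage_1` are hypothetical names, and the threshold of 5 mirrors the minimum stated in this plan. It shows one shared rule producing the message, with each caller deciding what a failure means for it.

```python
from typing import Dict

MIN_KEYWORDS = 5  # assumed threshold, mirroring the minimum in this plan


def check_keyword_count(count: int, min_required: int = MIN_KEYWORDS) -> Dict:
    """Shared rule: the only place that knows the threshold and the message."""
    if count < min_required:
        return {
            "valid": False,
            "error": f"Need at least {min_required} keywords, but only {count} available.",
            "count": count,
            "required": min_required,
        }
    return {"valid": True, "count": count, "required": min_required}


def manual_auto_cluster(count: int) -> Dict:
    """Manual path (hypothetical): surface the failure to the caller as an error."""
    result = check_keyword_count(count)
    if not result["valid"]:
        return {"status": "error", "message": result["error"]}
    return {"status": "started", "keywords": result["count"]}


def automation_stage_1(count: int) -> Dict:
    """Automation path (hypothetical): record a skip so the pipeline can continue."""
    result = check_keyword_count(count)
    if not result["valid"]:
        return {"skipped": True, "skip_reason": result["error"], "credits_used": 0}
    return {"skipped": False, "keywords_processed": result["count"]}
```

Both callers report the same text for the same input, which is the consistency property the plan is after; only the reaction (error versus skip) differs.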
---
## 📋 IMPLEMENTATION PLAN
### Step 1: Create Shared Validation Module
**New File:** `backend/igny8_core/ai/validators/cluster_validators.py`
```python
"""
Cluster-specific validators.

Shared between the auto-cluster function and the automation pipeline.
"""
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def validate_minimum_keywords(
    keyword_ids: List[int],
    account=None,
    min_required: int = 5
) -> Dict:
    """
    Validate that sufficient keywords are available for clustering.

    Args:
        keyword_ids: List of keyword IDs to cluster
        account: Account object for filtering
        min_required: Minimum number of keywords required (default: 5)

    Returns:
        Dict with 'valid' (bool), 'count' (int) and 'required' (int),
        plus 'error' (str) when validation fails
    """
    from igny8_core.modules.planner.models import Keywords

    # Build queryset: only keywords that are still unclustered ('new')
    queryset = Keywords.objects.filter(id__in=keyword_ids, status='new')
    if account:
        queryset = queryset.filter(account=account)

    # Count available keywords
    count = queryset.count()

    # Validate minimum
    if count < min_required:
        return {
            'valid': False,
            'error': f'Insufficient keywords for clustering. Need at least {min_required} keywords, but only {count} available.',
            'count': count,
            'required': min_required
        }

    return {
        'valid': True,
        'count': count,
        'required': min_required
    }


def validate_keyword_selection(
    selected_ids: List[int],
    available_count: int,
    min_required: int = 5
) -> Dict:
    """
    Validate keyword selection (for frontend validation).

    Args:
        selected_ids: List of selected keyword IDs
        available_count: Total count of available keywords
        min_required: Minimum required

    Returns:
        Dict with validation result
    """
    selected_count = len(selected_ids)

    # Check if any keywords selected
    if selected_count == 0:
        return {
            'valid': False,
            'error': 'No keywords selected',
            'type': 'NO_SELECTION'
        }

    # Check if enough selected
    if selected_count < min_required:
        return {
            'valid': False,
            'error': f'Please select at least {min_required} keywords. Currently selected: {selected_count}',
            'type': 'INSUFFICIENT_SELECTION',
            'selected': selected_count,
            'required': min_required
        }

    # Check if enough available (even if not all selected)
    if available_count < min_required:
        return {
            'valid': False,
            'error': f'Not enough keywords available. Need at least {min_required} keywords, but only {available_count} exist.',
            'type': 'INSUFFICIENT_AVAILABLE',
            'available': available_count,
            'required': min_required
        }

    return {
        'valid': True,
        'selected': selected_count,
        'available': available_count,
        'required': min_required
    }
```
### Step 2: Update Auto-Cluster Function
**File:** `backend/igny8_core/ai/functions/auto_cluster.py`
**Add import:**
```python
from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords
```
**Update validate() method:**
```python
def validate(self, payload: dict, account=None) -> Dict:
    """Validate keyword IDs and minimum count"""
    result = super().validate(payload, account)
    if not result['valid']:
        return result

    keyword_ids = payload.get('keyword_ids', [])
    if not keyword_ids:
        return {'valid': False, 'error': 'No keyword IDs provided'}

    # NEW: Validate minimum keywords using shared validator
    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5  # Configurable constant
    )
    if not min_validation['valid']:
        # Log the validation failure
        logger.warning(
            f"[AutoCluster] Validation failed: {min_validation['error']}"
        )
        return min_validation

    # Log successful validation
    logger.info(
        f"[AutoCluster] Validation passed: {min_validation['count']} keywords available (min: {min_validation['required']})"
    )
    return {'valid': True}
```
### Step 3: Update Automation Pipeline
**File:** `backend/igny8_core/business/automation/services/automation_service.py`
**Add import:**
```python
from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords
```
**Update run_stage_1() method:**
```python
def run_stage_1(self):
    """Stage 1: Keywords → Clusters (AI)"""
    stage_number = 1
    stage_name = "Keywords → Clusters (AI)"
    start_time = time.time()

    # Query pending keywords
    pending_keywords = Keywords.objects.filter(
        site=self.site,
        status='new'
    )
    total_count = pending_keywords.count()

    # NEW: Pre-stage validation for minimum keywords
    keyword_ids = list(pending_keywords.values_list('id', flat=True))
    min_validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=self.account,
        min_required=5
    )
    if not min_validation['valid']:
        # Log validation failure
        self.logger.log_stage_start(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, stage_name, total_count
        )
        error_msg = min_validation['error']
        self.logger.log_stage_error(
            self.run.run_id, self.account.id, self.site.id,
            stage_number, error_msg
        )

        # Skip stage with proper result
        self.run.stage_1_result = {
            'keywords_processed': 0,
            'clusters_created': 0,
            'skipped': True,
            'skip_reason': error_msg,
            'credits_used': 0
        }
        self.run.current_stage = 2
        self.run.save()
        logger.warning(f"[AutomationService] Stage 1 skipped: {error_msg}")
        return

    # Log stage start
    self.logger.log_stage_start(
        self.run.run_id, self.account.id, self.site.id,
        stage_number, stage_name, total_count
    )
    # ... rest of existing stage logic ...
```
### Step 4: Update API Endpoint
**File:** `backend/igny8_core/modules/planner/views.py` (KeywordsViewSet)
**Update auto_cluster action:**
```python
@action(detail=False, methods=['post'], url_path='auto_cluster', url_name='auto_cluster')
def auto_cluster(self, request):
    """Auto-cluster keywords using AI"""
    from igny8_core.ai.tasks import run_ai_task
    from igny8_core.ai.validators.cluster_validators import validate_minimum_keywords

    account = getattr(request, 'account', None)
    keyword_ids = request.data.get('ids', [])
    if not keyword_ids:
        return error_response(
            error='No keyword IDs provided',
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request
        )

    # NEW: Validate minimum keywords BEFORE queuing task
    validation = validate_minimum_keywords(
        keyword_ids=keyword_ids,
        account=account,
        min_required=5
    )
    if not validation['valid']:
        return error_response(
            error=validation['error'],
            status_code=status.HTTP_400_BAD_REQUEST,
            request=request,
            extra_data={
                'count': validation.get('count'),
                'required': validation.get('required')
            }
        )

    # Validation passed - proceed with clustering
    account_id = account.id if account else None
    try:
        if hasattr(run_ai_task, 'delay'):
            task = run_ai_task.delay(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(
                data={'task_id': str(task.id)},
                message=f'Auto-cluster started with {validation["count"]} keywords',
                request=request
            )
        else:
            # Synchronous fallback
            result = run_ai_task(
                function_name='auto_cluster',
                payload={'keyword_ids': keyword_ids},
                account_id=account_id
            )
            return success_response(data=result, request=request)
    except Exception as e:
        logger.error(f"Failed to start auto-cluster: {e}", exc_info=True)
        return error_response(
            error=f'Failed to start clustering: {str(e)}',
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            request=request
        )
```
### Step 5: Add Frontend Validation (Optional but Recommended)
**File:** `frontend/src/pages/Planner/Keywords.tsx`
**Update handleAutoCluster function:**
```typescript
const handleAutoCluster = async () => {
  try {
    const selectedIds = selectedKeywords.map(k => k.id);

    // Frontend validation (pre-check before API call)
    // Empty selection gets its own message, matching the "No Selection" error text
    if (selectedIds.length === 0) {
      toast.error('No keywords selected. Please select keywords to cluster.', { duration: 5000 });
      return;
    }
    if (selectedIds.length < 5) {
      toast.error(
        `Please select at least 5 keywords for auto-clustering. Currently selected: ${selectedIds.length}`,
        { duration: 5000 }
      );
      return;
    }

    // Check total available
    const availableCount = keywords.filter(k => k.status === 'new').length;
    if (availableCount < 5) {
      toast.error(
        `Not enough keywords available. Need at least 5 keywords, but only ${availableCount} exist.`,
        { duration: 5000 }
      );
      return;
    }

    // Proceed with API call
    const result = await autoClusterKeywords(selectedIds);
    if (result.task_id) {
      toast.success(`Auto-cluster started with ${selectedIds.length} keywords`);
      setTaskId(result.task_id);
    } else {
      toast.error('Failed to start auto-cluster');
    }
  } catch (error: any) {
    // Backend validation error (in case frontend check was bypassed)
    const errorMsg = error.response?.data?.error || error.message;
    toast.error(errorMsg);
  }
};
```
---
## 🗂️ FILE STRUCTURE
### New Files
```
backend/igny8_core/ai/validators/
├── __init__.py
└── cluster_validators.py (NEW)
```
### Modified Files
```
backend/igny8_core/ai/functions/auto_cluster.py
backend/igny8_core/business/automation/services/automation_service.py
backend/igny8_core/modules/planner/views.py
frontend/src/pages/Planner/Keywords.tsx
```
---
## 🧪 TESTING PLAN
### Unit Tests
**File:** `backend/igny8_core/ai/validators/tests/test_cluster_validators.py`
```python
import pytest
from django.test import TestCase
from igny8_core.ai.validators.cluster_validators import (
    validate_minimum_keywords,
    validate_keyword_selection
)
from igny8_core.modules.planner.models import Keywords
from igny8_core.auth.models import Account, Site


class ClusterValidatorsTestCase(TestCase):
    def setUp(self):
        self.account = Account.objects.create(name='Test Account')
        self.site = Site.objects.create(name='Test Site', account=self.account)

    def test_validate_minimum_keywords_success(self):
        """Test with sufficient keywords (>= 5)"""
        # Create 10 keywords
        keyword_ids = []
        for i in range(10):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)
        assert result['valid'] is True
        assert result['count'] == 10
        assert result['required'] == 5

    def test_validate_minimum_keywords_failure(self):
        """Test with insufficient keywords (< 5)"""
        # Create only 3 keywords
        keyword_ids = []
        for i in range(3):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)
        assert result['valid'] is False
        assert 'Insufficient keywords' in result['error']
        assert result['count'] == 3
        assert result['required'] == 5

    def test_validate_minimum_keywords_edge_case_exactly_5(self):
        """Test with exactly 5 keywords (boundary)"""
        keyword_ids = []
        for i in range(5):
            kw = Keywords.objects.create(
                keyword=f'keyword {i}',
                status='new',
                account=self.account,
                site=self.site
            )
            keyword_ids.append(kw.id)

        result = validate_minimum_keywords(keyword_ids, self.account)
        assert result['valid'] is True
        assert result['count'] == 5

    def test_validate_keyword_selection_insufficient(self):
        """Test frontend selection validation"""
        result = validate_keyword_selection(
            selected_ids=[1, 2, 3],  # Only 3
            available_count=10,
            min_required=5
        )
        assert result['valid'] is False
        assert result['type'] == 'INSUFFICIENT_SELECTION'
        assert result['selected'] == 3
        assert result['required'] == 5
```
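
The unit tests above exercise only one branch of `validate_keyword_selection()`. Because that function needs no database, its remaining branches could be covered with a small table-driven test. The sketch below is illustrative only: it uses a condensed stand-in for the Step 1 function (same checks in the same order, reduced return values) so that it runs on its own, and the class name `KeywordSelectionTable` is hypothetical.

```python
import unittest
from typing import Dict, List


def validate_keyword_selection(selected_ids: List[int], available_count: int,
                               min_required: int = 5) -> Dict:
    """Condensed stand-in for the Step 1 validator (same checks, same order)."""
    selected = len(selected_ids)
    if selected == 0:
        return {"valid": False, "type": "NO_SELECTION"}
    if selected < min_required:
        return {"valid": False, "type": "INSUFFICIENT_SELECTION"}
    if available_count < min_required:
        return {"valid": False, "type": "INSUFFICIENT_AVAILABLE"}
    return {"valid": True}


class KeywordSelectionTable(unittest.TestCase):
    CASES = [
        # (selected count, available count, expected valid, expected type)
        (0, 10, False, "NO_SELECTION"),
        (4, 10, False, "INSUFFICIENT_SELECTION"),
        (5, 4, False, "INSUFFICIENT_AVAILABLE"),
        (5, 5, True, None),
        (6, 10, True, None),
    ]

    def test_table(self):
        for selected, available, valid, kind in self.CASES:
            # subTest reports each failing row separately
            with self.subTest(selected=selected, available=available):
                result = validate_keyword_selection(list(range(selected)), available)
                self.assertEqual(result["valid"], valid)
                self.assertEqual(result.get("type"), kind)
```

Note that, given the check order, `INSUFFICIENT_AVAILABLE` is only reachable when at least 5 IDs are selected while fewer than 5 are reported as available, which the third row makes explicit.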
### Integration Tests
```python
class AutoClusterIntegrationTestCase(TestCase):
    # Fixture setup is omitted here: self.token, self.automation_service and
    # the _create_keywords() helper are assumed to be provided by setUp().

    def test_auto_cluster_with_insufficient_keywords(self):
        """Test auto-cluster endpoint rejects < 5 keywords"""
        # Create only 3 keywords
        keyword_ids = self._create_keywords(3)
        response = self.client.post(
            '/api/planner/keywords/auto_cluster/',
            data={'ids': keyword_ids},
            HTTP_AUTHORIZATION=f'Bearer {self.token}'
        )
        assert response.status_code == 400
        assert 'Insufficient keywords' in response.json()['error']

    def test_automation_skips_stage_1_with_insufficient_keywords(self):
        """Test automation skips Stage 1 if < 5 keywords"""
        # Create only 2 keywords
        self._create_keywords(2)

        # Start automation
        run_id = self.automation_service.start_automation('manual')

        # Verify Stage 1 was skipped
        run = AutomationRun.objects.get(run_id=run_id)
        assert run.stage_1_result['skipped'] is True
        assert 'Insufficient keywords' in run.stage_1_result['skip_reason']
        assert run.current_stage == 2  # Moved to next stage
```
### Manual Test Cases
- [ ] **Test 1:** Try auto-cluster with 0 keywords selected
  - Expected: Error message "No keywords selected"
- [ ] **Test 2:** Try auto-cluster with 3 keywords selected
  - Expected: Error message "Please select at least 5 keywords for auto-clustering. Currently selected: 3"
- [ ] **Test 3:** Try auto-cluster with exactly 5 keywords
  - Expected: Success, clustering starts
- [ ] **Test 4:** Run automation with 2 keywords in site
  - Expected: Stage 1 skipped with warning in logs
- [ ] **Test 5:** Run automation with 10 keywords in site
  - Expected: Stage 1 runs normally
---
## 📊 ERROR MESSAGES
### Frontend (User-Facing)
**No Selection:**
```
❌ No keywords selected
Please select keywords to cluster.
```
**Insufficient Selection:**
```
❌ Please select at least 5 keywords for auto-clustering
Currently selected: 3 keywords
You need at least 5 keywords to create meaningful clusters.
```
**Insufficient Available:**
```
❌ Not enough keywords available
Need at least 5 keywords, but only 2 exist.
Add more keywords before running auto-cluster.
```
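
One possible way to keep these user-facing texts in one place is to key them by the `type` codes that `validate_keyword_selection()` returns. This is a sketch under assumptions: `USER_MESSAGES` and `user_message` are hypothetical names, and the templates simply restate the messages listed above. Results without a known `type` fall back to the validator's own `error` string.

```python
from typing import Dict

# Hypothetical templates keyed by the `type` codes from validate_keyword_selection();
# the wording follows the user-facing messages listed in this section.
USER_MESSAGES = {
    "NO_SELECTION": "No keywords selected\nPlease select keywords to cluster.",
    "INSUFFICIENT_SELECTION": (
        "Please select at least {required} keywords for auto-clustering\n"
        "Currently selected: {selected} keywords\n"
        "You need at least {required} keywords to create meaningful clusters."
    ),
    "INSUFFICIENT_AVAILABLE": (
        "Not enough keywords available\n"
        "Need at least {required} keywords, but only {available} exist.\n"
        "Add more keywords before running auto-cluster."
    ),
}


def user_message(result: Dict) -> str:
    """Turn a validation result dict into display text ('' when valid)."""
    if result.get("valid"):
        return ""
    template = USER_MESSAGES.get(result.get("type"))
    if template is None:
        # Unknown or missing type: fall back to the validator's own message
        return result.get("error", "Validation failed")
    # Extra keys in the result (valid, error, type) are ignored by format()
    return template.format(**result)
```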
### Backend (Logs)
**Validation Failed:**
```
[AutoCluster] Validation failed: Insufficient keywords for clustering. Need at least 5 keywords, but only 3 available.
```
**Validation Passed:**
```
[AutoCluster] Validation passed: 15 keywords available (min: 5)
```
**Automation Stage Skipped:**
```
[AutomationService] Stage 1 skipped: Insufficient keywords for clustering. Need at least 5 keywords, but only 2 available.
```
---
## 🎯 CONFIGURATION
### Constants File
**File:** `backend/igny8_core/ai/constants.py` (create it if it doesn't exist)
```python
"""
AI Function Configuration Constants
"""
# Cluster Configuration
MIN_KEYWORDS_FOR_CLUSTERING = 5 # Minimum keywords needed for meaningful clusters
OPTIMAL_KEYWORDS_FOR_CLUSTERING = 20 # Recommended for best results
# Other AI limits...
```
**Usage in validators:**
```python
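from igny8_core.ai.constants import MIN_KEYWORDS_FOR_CLUSTERING


def validate_minimum_keywords(keyword_ids, account=None, min_required=None):
    # Keep the min_required parameter that callers in Steps 2-4 pass,
    # and fall back to the shared constant when it is not given
    if min_required is None:
        min_required = MIN_KEYWORDS_FOR_CLUSTERING
    # ... validation logic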
```
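
One Python subtlety is worth keeping in mind when wiring a constant into a validator: a default argument value is evaluated once, when the `def` statement runs, whereas a lookup inside the function body happens on every call. The sketch below uses a local stand-in for `MIN_KEYWORDS_FOR_CLUSTERING` and two hypothetical functions to show the difference, which matters if the constant is ever patched in tests or overridden at runtime.

```python
from typing import Optional

MIN_KEYWORDS_FOR_CLUSTERING = 5  # local stand-in for igny8_core.ai.constants


def required_bound_at_definition(min_required: int = MIN_KEYWORDS_FOR_CLUSTERING) -> int:
    # The default was evaluated once, when this `def` statement ran.
    return min_required


def required_resolved_at_call(min_required: Optional[int] = None) -> int:
    # The module-level constant is looked up on every call.
    return MIN_KEYWORDS_FOR_CLUSTERING if min_required is None else min_required


# Simulate a later change (for example, a test patching the constant).
MIN_KEYWORDS_FOR_CLUSTERING = 10

print(required_bound_at_definition())  # 5
print(required_resolved_at_call())     # 10
print(required_resolved_at_call(3))    # 3
```

Resolving the constant inside the body keeps "change one constant" true even when the value changes after import.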
---
## 🔄 SHARED VALIDATION PATTERN
### Why This Approach Works
**✅ Single Source of Truth:**
- One function: `validate_minimum_keywords()`
- Used by both auto-cluster function and automation
- Update in one place applies everywhere
**✅ Consistent Behavior:**
- Same error messages
- Same validation logic
- Same minimum requirements
**✅ Easy to Maintain:**
- Want to change minimum from 5 to 10? Change one constant
- Want to add new validation? Add to one function
- Want to test? Test one module
**✅ No Code Duplication:**
- DRY principle followed
- Reduces bugs from inconsistency
- Easier code review
### Pattern for Future Validators
```python
# backend/igny8_core/ai/validators/content_validators.py
def validate_minimum_content_length(content_text: str, min_words: int = 100):
    """
    Shared validator for content minimum length
    Used by: GenerateContentFunction, Automation Stage 4, Content creation
    """
    word_count = len(content_text.split())
    if word_count < min_words:
        return {
            'valid': False,
            'error': f'Content too short. Minimum {min_words} words required, got {word_count}.'
        }
    return {'valid': True, 'word_count': word_count}
```
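
As more validators of this shape accumulate, a caller may want to run several in order and stop at the first failure. The following is a minimal sketch of that idea, not part of the planned module: `first_failure`, `require_ids`, `require_minimum`, and `validate_payload` are hypothetical helpers that only rely on the `{'valid': ..., 'error': ...}` result convention used throughout this plan.

```python
from typing import Callable, Dict, Iterable, List

Validator = Callable[[], Dict]


def first_failure(validators: Iterable[Validator]) -> Dict:
    """Run validators in order and return the first failing result."""
    for validate in validators:
        result = validate()
        if not result.get("valid", False):
            return result
    return {"valid": True}


def require_ids(keyword_ids: List[int]) -> Dict:
    if not keyword_ids:
        return {"valid": False, "error": "No keyword IDs provided"}
    return {"valid": True}


def require_minimum(keyword_ids: List[int], min_required: int = 5) -> Dict:
    count = len(keyword_ids)
    if count < min_required:
        return {"valid": False, "error": f"Need at least {min_required} keywords, got {count}."}
    return {"valid": True, "count": count}


def validate_payload(keyword_ids: List[int]) -> Dict:
    # Lambdas defer each check so later ones do not run after a failure
    return first_failure([
        lambda: require_ids(keyword_ids),
        lambda: require_minimum(keyword_ids),
    ])
```

The ordering makes the cheapest, most specific check report first, so an empty payload yields "No keyword IDs provided" rather than a count message.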
---
## 🚀 IMPLEMENTATION STEPS
### Phase 1: Create Validator (Day 1)
- [ ] Create `cluster_validators.py`
- [ ] Implement `validate_minimum_keywords()`
- [ ] Implement `validate_keyword_selection()`
- [ ] Write unit tests
### Phase 2: Integrate Backend (Day 1)
- [ ] Update `AutoClusterFunction.validate()`
- [ ] Update `AutomationService.run_stage_1()`
- [ ] Update `KeywordsViewSet.auto_cluster()`
- [ ] Write integration tests
### Phase 3: Frontend (Day 2)
- [ ] Add frontend validation in Keywords page
- [ ] Add user-friendly error messages
- [ ] Test error scenarios
### Phase 4: Testing & Deployment (Day 2)
- [ ] Run all tests
- [ ] Manual QA testing
- [ ] Deploy to production
- [ ] Monitor first few auto-cluster runs
---
## 🎯 SUCCESS CRITERIA
- ✅ Auto-cluster returns error if < 5 keywords selected
- ✅ Automation skips Stage 1 if < 5 keywords available
- ✅ Both use same validation function (no duplication)
- ✅ Clear error messages guide users
- ✅ Frontend validation provides instant feedback
- ✅ Backend validation catches edge cases
- ✅ All tests pass
- ✅ No regression in existing functionality

---
## 📈 FUTURE ENHANCEMENTS
### V2 Features
1. **Configurable Minimum:**
   - Allow admin to set minimum via settings
   - Default: 5, Range: 3-20
2. **Quality Scoring:**
   - Show quality indicator based on keyword count
   - 5-10: "Fair", 11-20: "Good", 21+: "Excellent"
3. **Smart Recommendations:**
   - "You have 4 keywords. Add 1 more to reach the minimum of 5"
   - "15 keywords selected. Good for clustering!"
4. **Batch Size Validation:**
   - Warn if too many keywords selected (> 100)
   - Suggest splitting into multiple runs
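
The first two V2 ideas are simple enough to sketch. The code below is a rough illustration, not a committed design: `clamp_minimum` and `quality_label` are hypothetical helpers, the range 3-20 and the Fair/Good/Excellent bands come from the list above, and the "Insufficient" label for counts below the minimum is an assumption added for completeness.

```python
from typing import Optional

MIN_ALLOWED, MAX_ALLOWED, DEFAULT_MIN = 3, 20, 5  # range and default from the V2 list


def clamp_minimum(value: Optional[int]) -> int:
    """Keep an admin-configured minimum inside the allowed range."""
    if value is None:
        return DEFAULT_MIN
    return max(MIN_ALLOWED, min(MAX_ALLOWED, int(value)))


def quality_label(count: int, min_required: int = DEFAULT_MIN) -> str:
    """Map a keyword count to the quality bands proposed above."""
    if count < min_required:
        return "Insufficient"  # assumed label; not part of the original bands
    if count <= 10:
        return "Fair"
    if count <= 20:
        return "Good"
    return "Excellent"
```

For example, `quality_label(15)` falls in the 11-20 band and returns "Good", and `clamp_minimum(50)` is pulled back to 20.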
---
## END OF PLAN
This plan ensures robust, consistent validation for auto-cluster across all entry points (manual and automation) using shared, well-tested validation logic.