# Automation System Enhancement Plan

**Created:** January 17, 2026
**Updated:** January 17, 2026 (IMPLEMENTATION COMPLETE)
**Status:** ✅ ALL PHASES COMPLETE
**Priority:** 🔴 CRITICAL - Blocks Production Launch

---

## Implementation Progress

### ✅ PHASE 1: Bug Fixes (COMPLETE)

1. **Bug #1:** Cancel releases lock - [views.py](../../backend/igny8_core/business/automation/views.py)
2. **Bug #2:** Scheduled check includes 'paused' - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)
3. **Bug #3:** Resume reacquires lock - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)
4. **Bug #4:** Resume has pause/cancel checks - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)
5. **Bug #5:** Pause logs to files - [views.py](../../backend/igny8_core/business/automation/views.py)
6. **Bug #6:** Resume exception releases lock - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)

### ✅ PHASE 2: Per-Run Item Limits (COMPLETE)

- Added 8 new fields to `AutomationConfig` model:
  - `max_keywords_per_run`, `max_clusters_per_run`, `max_ideas_per_run`
  - `max_tasks_per_run`, `max_content_per_run`, `max_images_per_run`
  - `max_approvals_per_run`, `max_credits_per_run`
- Migration: [0014_automation_per_run_limits.py](../../backend/migrations/0014_automation_per_run_limits.py)
- Service: Updated `automation_service.py` with `_get_per_run_limit()`, `_apply_per_run_limit()`, `_check_credit_budget()`
- API: Updated config endpoints in views.py

### ✅ PHASE 3: Publishing Settings Overhaul (COMPLETE)

- Added scheduling modes: `time_slots`, `stagger`, `immediate`
- New fields: `scheduling_mode`, `stagger_start_time`, `stagger_end_time`, `stagger_interval_minutes`, `queue_limit`
- Migration: [0015_publishing_settings_overhaul.py](../../backend/migrations/0015_publishing_settings_overhaul.py)
- Scheduler: Updated `_calculate_available_slots()` with three mode handlers

### ✅ PHASE 4: Credit % Allocation per AI Function (COMPLETE)

- New model:
  `SiteAIBudgetAllocation` in billing/models.py
- Default allocations: 15% clustering, 10% ideas, 40% content, 5% prompts, 30% images
- Migration: [0016_site_ai_budget_allocation.py](../../backend/migrations/0016_site_ai_budget_allocation.py)
- API: New viewset at `/api/v1/billing/sites/{site_id}/ai-budget/`

### ✅ PHASE 5: UI Updates (COMPLETE)

- Updated `AutomationConfig` interface in `automationService.ts` with new per-run limit fields
- GlobalProgressBar already implements correct calculation using `initial_snapshot`

---

## Migrations To Run

```bash
cd /data/app/igny8/backend
python manage.py migrate
```

## Files Modified

### Backend

- `backend/igny8_core/business/automation/views.py` - Cancel releases lock, pause logs
- `backend/igny8_core/business/automation/tasks.py` - Resume fixes, scheduled check
- `backend/igny8_core/business/automation/models.py` - Per-run limit fields
- `backend/igny8_core/business/automation/services/automation_service.py` - Limit enforcement
- `backend/igny8_core/business/integration/models.py` - Publishing modes
- `backend/igny8_core/business/billing/models.py` - SiteAIBudgetAllocation
- `backend/igny8_core/modules/billing/views.py` - AI budget viewset
- `backend/igny8_core/modules/billing/urls.py` - AI budget route
- `backend/igny8_core/modules/integration/views.py` - Publishing serializer
- `backend/igny8_core/tasks/publishing_scheduler.py` - Scheduling modes

### Frontend

- `frontend/src/services/automationService.ts` - Config interface updated

### Migrations

- `backend/migrations/0014_automation_per_run_limits.py`
- `backend/migrations/0015_publishing_settings_overhaul.py`
- `backend/migrations/0016_site_ai_budget_allocation.py`

---

## Executive Summary

This plan fixes critical automation bugs and introduces three major enhancements. The work falls into four areas:

1. **Fix Critical Automation Bugs** - Lock management, scheduled runs, logging
2. **Credit Budget Allocation** - Configurable % per AI function
3.
   **Publishing Schedule Overhaul** - Robust, predictable scheduling
4. **Per-Run Item Limits** - Control throughput per automation run

---

## Part 1: Critical Bug Fixes ✅ COMPLETE

### 🔴 BUG #1: Cancel Action Doesn't Release Lock

**Location:** `backend/igny8_core/business/automation/views.py` line ~1614

**Current Code:**
```python
def cancel_automation(self, request):
    run.status = 'cancelled'
    run.cancelled_at = timezone.now()
    run.completed_at = timezone.now()
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])
    # ❌ MISSING: cache.delete(f'automation_lock_{run.site.id}')
```

**Fix:**
```python
def cancel_automation(self, request):
    run.status = 'cancelled'
    run.cancelled_at = timezone.now()
    run.completed_at = timezone.now()
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])

    # Release the lock so the user can start a new automation
    from django.core.cache import cache
    cache.delete(f'automation_lock_{run.site.id}')

    # Log the cancellation
    from igny8_core.business.automation.services.automation_logger import AutomationLogger
    logger = AutomationLogger()
    logger.log_stage_progress(
        run.run_id, run.account.id, run.site.id, run.current_stage,
        "Automation cancelled by user"
    )
```

**Impact:** Users can start a new automation immediately after cancelling.

---

### 🔴 BUG #2: Scheduled Automation Doesn't Check 'paused' Status

**Location:** `backend/igny8_core/business/automation/tasks.py` line ~52

**Current Code:**
```python
# Check if already running
if AutomationRun.objects.filter(site=config.site, status='running').exists():
    logger.info(f"[AutomationTask] Skipping site {config.site.id} - already running")
    continue
```

**Fix:**
```python
# Check if already running OR paused
if AutomationRun.objects.filter(site=config.site, status__in=['running', 'paused']).exists():
    logger.info(f"[AutomationTask] Skipping site {config.site.id} - automation in progress (running/paused)")
    continue
```

**Impact:** Prevents a duplicate run from starting while another run is paused.

---

### 🔴 BUG #3: Resume Doesn't Reacquire Lock

**Location:** `backend/igny8_core/business/automation/tasks.py` line ~164

**Current Code:**
```python
def resume_automation_task(self, run_id: str):
    service = AutomationService.from_run_id(run_id)
    # ❌ No lock check - could run unprotected after 6hr expiry
```

**Fix:**
```python
def resume_automation_task(self, run_id: str):
    """Resume paused automation run from current stage"""
    logger.info(f"[AutomationTask] Resuming automation run: {run_id}")

    try:
        run = AutomationRun.objects.get(run_id=run_id)

        # Verify run is actually in 'running' status (set by views.resume)
        if run.status != 'running':
            logger.warning(f"[AutomationTask] Run {run_id} status is {run.status}, not 'running'. Aborting resume.")
            return

        # Reacquire lock in case it expired during a long pause
        from django.core.cache import cache
        lock_key = f'automation_lock_{run.site.id}'

        # Try to acquire (21600 s = 6 h) - if this fails, another run may have started
        if not cache.add(lock_key, 'locked', timeout=21600):
            # Check if WE still own it (compare run_id if stored)
            existing = cache.get(lock_key)
            if existing and existing != 'locked':
                logger.warning(f"[AutomationTask] Lock held by different run. Aborting resume for {run_id}")
                run.status = 'failed'
                run.error_message = 'Lock acquired by another run during pause'
                run.save()
                return
            # Lock exists but may be ours - proceed cautiously

        service = AutomationService.from_run_id(run_id)
        # ...
        # rest of processing with pause/cancel checks between stages
```

---

### 🔴 BUG #4: Resume Missing Pause/Cancel Checks Between Stages

**Location:** `backend/igny8_core/business/automation/tasks.py` line ~183

**Current Code:**
```python
for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()
        # ❌ No pause/cancel check after each stage
```

**Fix:**
```python
for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()

        # Check for pause/cancel AFTER each stage (same as run_automation_task)
        service.run.refresh_from_db()
        if service.run.status in ['paused', 'cancelled']:
            logger.info(f"[AutomationTask] Resumed automation {service.run.status} after stage {stage + 1}")
            return
    else:
        logger.info(f"[AutomationTask] Stage {stage + 1} is disabled, skipping")
```

---

### 🟡 BUG #5: Pause Missing File Log Entry

**Location:** `backend/igny8_core/business/automation/views.py` pause action

**Fix:** Add a logging call:
```python
def pause(self, request):
    # ... existing code ...
    service.pause_automation()

    # Log to automation files
    service.logger.log_stage_progress(
        service.run.run_id, service.account.id, service.site.id,
        service.run.current_stage, "Automation paused by user"
    )

    return Response({'message': 'Automation paused'})
```

---

## Part 2: Credit Budget Allocation System

### Overview

Add a configurable credit % allocation per AI function. Users can:

- Use global defaults (configured by admin)
- Override with site-specific allocations

### Database Changes

**Extend `CreditCostConfig` model:**
```python
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class CreditCostConfig(models.Model):
    # ... existing fields ...
    # NEW: Budget allocation percentage
    budget_percentage = models.DecimalField(
        max_digits=5,
        decimal_places=2,
        default=0,
        validators=[MinValueValidator(0), MaxValueValidator(100)],
        help_text="Default % of credits allocated to this operation (0-100)"
    )
```

**New `SiteAIBudgetAllocation` model:**
```python
class SiteAIBudgetAllocation(AccountBaseModel):
    """Site-specific credit budget allocation overrides"""
    site = models.OneToOneField(
        'igny8_core_auth.Site',
        on_delete=models.CASCADE,
        related_name='ai_budget_allocation'
    )

    use_global_defaults = models.BooleanField(
        default=True,
        help_text="Use global CreditCostConfig percentages"
    )

    # Per-operation overrides (only used when use_global_defaults=False)
    clustering_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    idea_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    content_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=40)
    image_prompt_extraction_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=5)
    image_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=35)

    class Meta:
        db_table = 'igny8_site_ai_budget_allocations'
```

### Service Changes

**New `BudgetAllocationService`:**
```python
class BudgetAllocationService:
    @staticmethod
    def get_operation_budget(site, operation_type, total_credits):
        """
        Get credits allocated for an operation based on site settings.

        Args:
            site: Site instance
            operation_type: 'clustering', 'content_generation', etc.
            total_credits: Total credits available

        Returns:
            int: Credits allocated for this operation
        """
        allocation = SiteAIBudgetAllocation.objects.filter(site=site).first()

        if not allocation or allocation.use_global_defaults:
            # Use global CreditCostConfig percentages
            config = CreditCostConfig.objects.filter(
                operation_type=operation_type,
                is_active=True
            ).first()
            percentage = config.budget_percentage if config else 0
        else:
            # Use site-specific override
            field_map = {
                'clustering': 'clustering_percentage',
                'idea_generation': 'idea_generation_percentage',
                'content_generation': 'content_generation_percentage',
                'image_prompt_extraction': 'image_prompt_extraction_percentage',
                'image_generation': 'image_generation_percentage',
            }
            field = field_map.get(operation_type)
            percentage = getattr(allocation, field, 0) if field else 0

        # int() truncates, so fractional credits are rounded down
        return int(total_credits * (percentage / 100))
```

### Frontend Changes

**Site Settings > AI Settings Tab:**

- Add "Credit Budget Allocation" section
- Toggle: "Use Global Defaults" / "Custom Allocation"
- If custom: Show sliders for each operation (must sum to 100%)
- Visual pie chart showing allocation

---

## Part 3: Publishing Schedule Overhaul

### Current Issues

1. Limits are confusing - daily/weekly/monthly are treated as hard caps
2. Items not getting scheduled (30% missed in last run)
3. Time slot calculation doesn't account for stagger intervals
4.
   No visibility into WHY items weren't scheduled

### New Publishing Model

**Replace `PublishingSettings` with enhanced version:**
```python
from datetime import time

from django.db import models


# Callable defaults: Django's system checks warn when a JSONField default is a
# mutable literal, because the same list would be shared across instances.
def default_publish_days():
    return ['mon', 'tue', 'wed', 'thu', 'fri']


def default_publish_time_slots():
    return ['09:00', '14:00', '18:00']


class PublishingSettings(AccountBaseModel):
    site = models.OneToOneField('igny8_core_auth.Site', on_delete=models.CASCADE)

    # Auto-approval/publish toggles (keep existing)
    auto_approval_enabled = models.BooleanField(default=True)
    auto_publish_enabled = models.BooleanField(default=True)

    # NEW: Scheduling configuration (replaces hard limits)
    scheduling_mode = models.CharField(
        max_length=20,
        choices=[
            ('slots', 'Time Slots'),       # Publish at specific times
            ('stagger', 'Staggered'),      # Spread evenly throughout day
            ('immediate', 'Immediate'),    # Publish as soon as approved
        ],
        default='slots'
    )

    # Time slot configuration
    publish_days = models.JSONField(
        default=default_publish_days,
        help_text="Days allowed for publishing"
    )
    publish_time_slots = models.JSONField(
        default=default_publish_time_slots,
        help_text="Specific times for slot mode"
    )

    # Stagger mode configuration (time objects, so unsaved instances
    # also hold datetime.time values rather than strings)
    stagger_start_time = models.TimeField(default=time(9, 0))
    stagger_end_time = models.TimeField(default=time(18, 0))
    stagger_interval_minutes = models.IntegerField(
        default=15,
        help_text="Minutes between publications in stagger mode"
    )

    # Daily TARGET (soft limit - for estimation, not blocking)
    daily_publish_target = models.IntegerField(
        default=3,
        help_text="Target articles per day (for scheduling spread)"
    )

    # Weekly/Monthly targets (informational only)
    weekly_publish_target = models.IntegerField(default=15)
    monthly_publish_target = models.IntegerField(default=50)

    # NEW: Maximum queue depth (actual limit)
    max_scheduled_queue = models.IntegerField(
        default=100,
        help_text="Maximum items that can be in 'scheduled' status at once"
    )
```

### New Scheduling Algorithm

```python
from datetime import datetime, timedelta

from django.utils import timezone


def calculate_publishing_slots(settings, site, count_needed):
    """
    Calculate publishing slots with NO arbitrary limits.
    Returns:
        List of (datetime, slot_info) tuples
    """
    now = timezone.now()

    if settings.scheduling_mode == 'immediate':
        # Start at 'now' and space items one minute apart
        return [(now + timedelta(seconds=i * 60), {'mode': 'immediate'}) for i in range(count_needed)]

    elif settings.scheduling_mode == 'stagger':
        # Spread throughout each day
        return _calculate_stagger_slots(settings, site, count_needed, now)

    else:  # 'slots' mode
        return _calculate_time_slot_slots(settings, site, count_needed, now)


def _calculate_stagger_slots(settings, site, count_needed, now):
    """
    Stagger mode: Spread publications evenly throughout publish hours.
    """
    slots = []
    day_map = {'mon': 0, 'tue': 1, 'wed': 2, 'thu': 3, 'fri': 4, 'sat': 5, 'sun': 6}
    allowed_days = [day_map[d] for d in settings.publish_days if d in day_map]

    current_date = now.date()
    interval = timedelta(minutes=settings.stagger_interval_minutes)

    for day_offset in range(90):  # Look up to 90 days ahead
        check_date = current_date + timedelta(days=day_offset)

        if check_date.weekday() not in allowed_days:
            continue

        # Generate slots for this day
        day_start = timezone.make_aware(
            datetime.combine(check_date, settings.stagger_start_time)
        )
        day_end = timezone.make_aware(
            datetime.combine(check_date, settings.stagger_end_time)
        )

        # Get times already scheduled for this day
        existing = Content.objects.filter(
            site=site,
            site_status='scheduled',
            scheduled_publish_at__date=check_date
        ).values_list('scheduled_publish_at', flat=True)
        existing_times = set(existing)

        current_slot = day_start
        if check_date == current_date and now > day_start:
            # Start from the next interval boundary after now
            minutes_since_start = (now - day_start).total_seconds() / 60
            intervals_passed = int(minutes_since_start / settings.stagger_interval_minutes) + 1
            current_slot = day_start + timedelta(minutes=intervals_passed * settings.stagger_interval_minutes)

        while current_slot <= day_end and len(slots) < count_needed:
            if current_slot not in existing_times:
                slots.append((current_slot, {'mode': 'stagger', 'date':
                                             str(check_date)}))
            current_slot += interval

        if len(slots) >= count_needed:
            break

    return slots
```

### Frontend Changes

**Site Settings > Publishing Tab - Redesign:**

```
┌─────────────────────────────────────────────────────────────────┐
│  Publishing Schedule                                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Auto-Approval: [✓] Automatically approve content               │
│  Auto-Publish:  [✓] Automatically publish approved content      │
│                                                                 │
│  ─── Scheduling Mode ───                                        │
│  ○ Time Slots - Publish at specific times each day              │
│  ● Staggered - Spread evenly throughout publish hours           │
│  ○ Immediate - Publish as soon as approved                      │
│                                                                 │
│  ─── Stagger Settings ───                                       │
│  Start Time: [09:00]    End Time: [18:00]                       │
│  Interval: [15] minutes between publications                    │
│                                                                 │
│  ─── Publish Days ───                                           │
│  [✓] Mon  [✓] Tue  [✓] Wed  [✓] Thu  [✓] Fri  [ ] Sat  [ ] Sun  │
│                                                                 │
│  ─── Targets (for estimation) ───                               │
│  Daily: [3]    Weekly: [15]    Monthly: [50]                    │
│                                                                 │
│  ─── Current Queue ───                                          │
│  📊 23 items scheduled  │  Queue limit: 100                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Part 4: Per-Run Item Limits

### Overview

Allow users to limit how many items are processed per automation run. This enables:

- Balancing content production with publishing capacity
- Predictable credit usage per run
- Gradual pipeline processing

### Database Changes

**Extend `AutomationConfig`:**
```python
class AutomationConfig(models.Model):
    # ... existing fields ...
    # NEW: Per-run limits (0 = unlimited)
    max_keywords_per_run = models.IntegerField(
        default=0,
        help_text="Max keywords to cluster per run (0=unlimited)"
    )
    max_clusters_per_run = models.IntegerField(
        default=0,
        help_text="Max clusters to generate ideas for per run (0=unlimited)"
    )
    max_ideas_per_run = models.IntegerField(
        default=0,
        help_text="Max ideas to convert to tasks per run (0=unlimited)"
    )
    max_tasks_per_run = models.IntegerField(
        default=0,
        help_text="Max tasks to generate content for per run (0=unlimited)"
    )
    max_content_per_run = models.IntegerField(
        default=0,
        help_text="Max content to extract image prompts for per run (0=unlimited)"
    )
    max_images_per_run = models.IntegerField(
        default=0,
        help_text="Max images to generate per run (0=unlimited)"
    )
    max_approvals_per_run = models.IntegerField(
        default=0,
        help_text="Max content to auto-approve per run (0=unlimited)"
    )
```

### Service Changes

**Modify stage methods to respect limits:**
```python
def run_stage_1(self):
    """Stage 1: Keywords → Clusters"""
    # ... existing setup ...

    # Apply per-run limit (0 means unlimited)
    max_per_run = self.config.max_keywords_per_run
    if max_per_run > 0:
        pending_keywords = pending_keywords[:max_per_run]
        self.logger.log_stage_progress(
            self.run.run_id, self.account.id, self.site.id, 1,
            f"Per-run limit: Processing up to {max_per_run} keywords"
        )

    total_count = pending_keywords.count()
    # ... rest of processing ...
```

### Frontend Changes

**Automation Settings Panel - Enhanced:**

```
┌─────────────────────────────────────────────────────────────────┐
│  Per-Run Limits                                                 │
│  Control how much is processed in each automation run           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Stage 1: Keywords → Clusters                                   │
│  [ 50 ] keywords per run  │ Current pending: 150                │
│  ⚡ Will take ~3 runs to process all                            │
│                                                                 │
│  Stage 2: Clusters → Ideas                                      │
│  [ 10 ] clusters per run  │ Current pending: 25                 │
│                                                                 │
│  Stage 3: Ideas → Tasks                                         │
│  [ 0 ] (unlimited)        │ Current pending: 30                 │
│                                                                 │
│  Stage 4: Tasks → Content                                       │
│  [ 5 ] tasks per run      │ Current pending: 30                 │
│  💡 Tip: Match with daily publish target for balanced flow      │
│                                                                 │
│  Stage 5: Content → Image Prompts                               │
│  [ 5 ] content per run    │ Current pending: 10                 │
│                                                                 │
│  Stage 6: Image Prompts → Images                                │
│  [ 20 ] images per run    │ Current pending: 50                 │
│                                                                 │
│  Stage 7: Review → Approved                                     │
│  [ 5 ] approvals per run  │ Current in review: 15               │
│  ⚠️ Limited by publishing schedule capacity                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Part 5: UI/UX Fixes

### Automation Dashboard Issues

1. **Wrong metrics display** - Fix counts to show accurate pipeline state
2. **Confusing progress bars** - Use consistent calculation
3. **Missing explanations** - Add tooltips explaining each metric

### Run Detail Page Issues

1. **Stage results showing wrong data** - Fix JSON field mapping
2. **Missing "items remaining" after partial run** - Calculate from initial_snapshot
3.
   **No clear indication of WHY run stopped** - Show stopped_reason prominently

### Fixes

**GlobalProgressBar.tsx - Fix progress calculation:**
```typescript
// Use initial_snapshot as denominator, stage results as numerator
const calculateGlobalProgress = (run: AutomationRun): number => {
  if (!run.initial_snapshot) return 0;

  const total = run.initial_snapshot.total_initial_items || 0;
  if (total === 0) return 0;

  let processed = 0;
  processed += run.stage_1_result?.keywords_processed || 0;
  processed += run.stage_2_result?.clusters_processed || 0;
  processed += run.stage_3_result?.ideas_processed || 0;
  processed += run.stage_4_result?.tasks_processed || 0;
  processed += run.stage_5_result?.content_processed || 0;
  processed += run.stage_6_result?.images_processed || 0;
  processed += run.stage_7_result?.approved_count || 0;

  return Math.min(100, Math.round((processed / total) * 100));
};
```

---

## Implementation Order

### Phase 1: Critical Bug Fixes (Day 1)

1. ✅ Cancel releases lock
2. ✅ Scheduled check includes 'paused'
3. ✅ Resume reacquires lock
4. ✅ Resume has pause/cancel checks
5. ✅ Pause logs to files

### Phase 2: Per-Run Limits (Day 2)

1. Add model fields to AutomationConfig
2. Migration
3. Update automation_service.py stage methods
4. Frontend settings panel
5. Test with small limits

### Phase 3: Publishing Overhaul (Day 3)

1. Update PublishingSettings model
2. Migration
3. New scheduling algorithm
4. Frontend redesign
5. Test scheduling edge cases

### Phase 4: Credit Budget (Day 4)

1. Add model fields/new model
2. Migration
3. BudgetAllocationService
4. Frontend AI Settings section
5. Test budget calculations

### Phase 5: UI Fixes (Day 5)

1. Fix GlobalProgressBar
2. Fix AutomationPage metrics
3. Fix RunDetail display
4. Add helpful tooltips
5.
   End-to-end testing

---

## Testing Checklist

### Automation Flow

- [ ] Manual run starts, pauses, resumes, completes
- [ ] Manual run cancels, lock released, new run can start
- [ ] Scheduled run starts on time
- [ ] Scheduled run skips if manual run paused
- [ ] Resume after 7+ hour pause works
- [ ] Per-run limits respected
- [ ] Remaining items processed in next run

### Publishing

- [ ] Stagger mode spreads correctly
- [ ] Time slot mode uses exact times
- [ ] Immediate mode publishes right away
- [ ] No items missed due to limits
- [ ] Queue shows accurate count

### Credits

- [ ] Budget allocation calculates correctly
- [ ] Site override works
- [ ] Global defaults work
- [ ] Estimation uses budget

### UI

- [ ] Progress bar accurate during run
- [ ] Metrics match database counts
- [ ] Run detail shows correct stage results
- [ ] Stopped reason displayed clearly

---

## Rollback Plan

If issues arise:

1. All changes ship in separate migrations, so each can be rolled back individually
2. Feature flags gate the new behaviors (`use_new_scheduling`, `use_budget_allocation`)
3. Existing fields are kept alongside the new ones initially
4. Frontend changes are purely additive

---

## Success Criteria

1. **Zero lock issues** - Users are never stuck unable to start automation
2. **100% scheduling** - All approved content gets scheduled
3. **Predictable runs** - Per-run limits produce consistent results
4. **Clear visibility** - UI shows exactly what's happening and why
5. **No regressions** - All existing functionality continues working
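
---

## Illustrative Sketch: Credit Budget Split

The "Budget allocation calculates correctly" checklist item can be exercised without Django. The following is a minimal sketch, not the project's `BudgetAllocationService`: the names `DEFAULT_ALLOCATION`, `validate_allocation`, and `split_credits` are hypothetical, and the percentages are the defaults quoted in the Phase 4 summary. It assumes the "must sum to 100%" rule from the frontend notes and the round-down behavior of `int()` used in Part 2.

```python
from decimal import Decimal

# Default split quoted in the Phase 4 summary (percent per AI function).
# The dictionary name and key spelling are assumptions for illustration.
DEFAULT_ALLOCATION = {
    "clustering": Decimal("15"),
    "idea_generation": Decimal("10"),
    "content_generation": Decimal("40"),
    "image_prompt_extraction": Decimal("5"),
    "image_generation": Decimal("30"),
}


def validate_allocation(allocation):
    """Raise ValueError unless each share is within 0-100 and the total is 100."""
    for name, pct in allocation.items():
        if not Decimal("0") <= pct <= Decimal("100"):
            raise ValueError(f"{name}: {pct}% is outside 0-100")
    total = sum(allocation.values(), Decimal("0"))
    if total != Decimal("100"):
        raise ValueError(f"allocation sums to {total}%, expected 100%")


def split_credits(total_credits, allocation):
    """Return whole credits per operation, rounding each share down."""
    validate_allocation(allocation)
    return {
        name: int(Decimal(total_credits) * pct / Decimal("100"))
        for name, pct in allocation.items()
    }


budgets = split_credits(1000, DEFAULT_ALLOCATION)
```

Because each share is rounded down independently, the per-operation budgets can add up to slightly less than the total when the credit balance does not divide evenly. Whether the remainder should be assigned to a particular operation is a decision the plan does not state.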
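
---

## Illustrative Sketch: Stagger Slot Capacity

To reason about the "Stagger mode spreads correctly" checklist item, it helps to see how many slots one publish day offers. This is a simplified, Django-free sketch of the inner loop from Part 3, using naive datetimes and a hypothetical function name `stagger_slots_for_day`; the real scheduler uses timezone-aware values and reads already-scheduled times from the database. The guard against a non-positive interval is an addition that the plan's code does not include.

```python
from datetime import date, datetime, time, timedelta


def stagger_slots_for_day(day, start, end, interval_minutes, taken=frozenset()):
    """List free publish times on one day, stepping from start to end inclusive."""
    if interval_minutes <= 0:
        # Without this guard the loop below would never advance.
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    free = []
    while current <= last:
        if current not in taken:
            free.append(current)
        current += step
    return free


# Plan defaults: 09:00 to 18:00 with a 15-minute interval.
monday = date(2026, 1, 19)
slots = stagger_slots_for_day(monday, time(9, 0), time(18, 0), 15)
```

With the default settings this yields 37 slots per day (nine hours at four per hour, plus the closing 18:00 slot). If the production loop fills the earliest free slot first, as this sketch does, three queued items would land at 09:00, 09:15, and 09:30 rather than across the whole window; whether `daily_publish_target` should widen that spacing is left open by the plan.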