
Automation System Enhancement Plan

Created: January 17, 2026
Updated: January 17, 2026 (IMPLEMENTATION COMPLETE)
Status: ALL PHASES COMPLETE
Priority: 🔴 CRITICAL - Blocks Production Launch


Implementation Progress

PHASE 1: Bug Fixes (COMPLETE)

  1. Bug #1: Cancel releases lock - views.py
  2. Bug #2: Scheduled check includes 'paused' - tasks.py
  3. Bug #3: Resume reacquires lock - tasks.py
  4. Bug #4: Resume has pause/cancel checks - tasks.py
  5. Bug #5: Pause logs to files - views.py
  6. Bug #6: Resume exception releases lock - tasks.py

PHASE 2: Per-Run Item Limits (COMPLETE)

  • Added 8 new fields to AutomationConfig model:
    • max_keywords_per_run, max_clusters_per_run, max_ideas_per_run
    • max_tasks_per_run, max_content_per_run, max_images_per_run
    • max_approvals_per_run, max_credits_per_run
  • Migration: 0014_automation_per_run_limits.py
  • Service: Updated automation_service.py with _get_per_run_limit(), _apply_per_run_limit(), _check_credit_budget()
  • API: Updated config endpoints in views.py

PHASE 3: Publishing Settings Overhaul (COMPLETE)

  • Added scheduling modes: time_slots, stagger, immediate
  • New fields: scheduling_mode, stagger_start_time, stagger_end_time, stagger_interval_minutes, queue_limit
  • Migration: 0015_publishing_settings_overhaul.py
  • Scheduler: Updated _calculate_available_slots() with three mode handlers

PHASE 4: Credit % Allocation per AI Function (COMPLETE)

  • New model: SiteAIBudgetAllocation in billing/models.py
  • Default allocations: 15% clustering, 10% ideas, 40% content, 5% prompts, 30% images
  • Migration: 0016_site_ai_budget_allocation.py
  • API: New viewset at /api/v1/billing/sites/{site_id}/ai-budget/

PHASE 5: UI Updates (COMPLETE)

  • Updated AutomationConfig interface in automationService.ts with new per-run limit fields
  • GlobalProgressBar already implements correct calculation using initial_snapshot

Migrations To Run

cd /data/app/igny8/backend
python manage.py migrate

Files Modified

Backend

  • backend/igny8_core/business/automation/views.py - Cancel releases lock, pause logs
  • backend/igny8_core/business/automation/tasks.py - Resume fixes, scheduled check
  • backend/igny8_core/business/automation/models.py - Per-run limit fields
  • backend/igny8_core/business/automation/services/automation_service.py - Limit enforcement
  • backend/igny8_core/business/integration/models.py - Publishing modes
  • backend/igny8_core/business/billing/models.py - SiteAIBudgetAllocation
  • backend/igny8_core/modules/billing/views.py - AI budget viewset
  • backend/igny8_core/modules/billing/urls.py - AI budget route
  • backend/igny8_core/modules/integration/views.py - Publishing serializer
  • backend/igny8_core/tasks/publishing_scheduler.py - Scheduling modes

Frontend

  • frontend/src/services/automationService.ts - Config interface updated

Migrations

  • backend/migrations/0014_automation_per_run_limits.py
  • backend/migrations/0015_publishing_settings_overhaul.py
  • backend/migrations/0016_site_ai_budget_allocation.py

Executive Summary

This plan fixes critical automation bugs and introduces three major enhancements:

  1. Fix Critical Automation Bugs - Lock management, scheduled runs, logging
  2. Credit Budget Allocation - Configurable % per AI function
  3. Publishing Schedule Overhaul - Robust, predictable scheduling
  4. Per-Run Item Limits - Control throughput per automation run

Part 1: Critical Bug Fixes (COMPLETE)

🔴 BUG #1: Cancel Action Doesn't Release Lock

Location: backend/igny8_core/business/automation/views.py line ~1614

Current Code:

def cancel_automation(self, request):
    run.status = 'cancelled'
    run.cancelled_at = timezone.now()
    run.completed_at = timezone.now()
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])
    # ❌ MISSING: cache.delete(f'automation_lock_{run.site.id}')

Fix:

def cancel_automation(self, request):
    run.status = 'cancelled'
    run.cancelled_at = timezone.now()
    run.completed_at = timezone.now()
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])
    
    # Release the lock so user can start new automation
    from django.core.cache import cache
    cache.delete(f'automation_lock_{run.site.id}')
    
    # Log the cancellation
    from igny8_core.business.automation.services.automation_logger import AutomationLogger
    logger = AutomationLogger()
    logger.log_stage_progress(
        run.run_id, run.account.id, run.site.id, run.current_stage,
        "Automation cancelled by user"
    )

Impact: Users can start a new automation immediately after cancelling


🔴 BUG #2: Scheduled Automation Doesn't Check 'paused' Status

Location: backend/igny8_core/business/automation/tasks.py line ~52

Current Code:

# Check if already running
if AutomationRun.objects.filter(site=config.site, status='running').exists():
    logger.info(f"[AutomationTask] Skipping site {config.site.id} - already running")
    continue

Fix:

# Check if already running OR paused
if AutomationRun.objects.filter(site=config.site, status__in=['running', 'paused']).exists():
    logger.info(f"[AutomationTask] Skipping site {config.site.id} - automation in progress (running/paused)")
    continue

Impact: Prevents duplicate runs when one is paused


🔴 BUG #3: Resume Doesn't Reacquire Lock

Location: backend/igny8_core/business/automation/tasks.py line ~164

Current Code:

def resume_automation_task(self, run_id: str):
    service = AutomationService.from_run_id(run_id)
    # ❌ No lock check - could run unprotected after 6hr expiry

Fix:

def resume_automation_task(self, run_id: str):
    """Resume paused automation run from current stage"""
    logger.info(f"[AutomationTask] Resuming automation run: {run_id}")
    
    try:
        run = AutomationRun.objects.get(run_id=run_id)
        
        # Verify run is actually in 'running' status (set by views.resume)
        if run.status != 'running':
            logger.warning(f"[AutomationTask] Run {run_id} status is {run.status}, not 'running'. Aborting resume.")
            return
        
        # Reacquire lock in case it expired during long pause
        from django.core.cache import cache
        lock_key = f'automation_lock_{run.site.id}'
        
        # Try to acquire - if fails, another run may have started
        if not cache.add(lock_key, 'locked', timeout=21600):
            # Check if WE still own it (compare run_id if stored)
            existing = cache.get(lock_key)
            if existing and existing != 'locked':
                logger.warning(f"[AutomationTask] Lock held by different run. Aborting resume for {run_id}")
                run.status = 'failed'
                run.error_message = 'Lock acquired by another run during pause'
                run.save()
                return
            # Lock exists but may be ours - proceed cautiously
        
        service = AutomationService.from_run_id(run_id)
        # ... rest of processing with pause/cancel checks between stages
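
The "compare run_id if stored" comment in the fix only helps if the lock value identifies its owner. The following is a minimal sketch of that idea, assuming the lock stores the run ID instead of the constant 'locked'. `InMemoryCache`, `acquire_or_confirm_lock`, and `release_lock` are hypothetical names; the stand-in class mimics only the `add`/`get`/`delete` behaviour of Django's cache API (without timeout expiry) so the example runs without Django.

```python
class InMemoryCache:
    """Hypothetical stand-in for django.core.cache.cache (no expiry handling)."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout=None):
        # Like cache.add(): store only if the key is absent and report success.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        return self._data.pop(key, None) is not None


LOCK_TIMEOUT_SECONDS = 21600  # 6 hours, as in the fix above


def acquire_or_confirm_lock(cache, site_id, run_id):
    """Return True if run_id acquires the site lock or already owns it."""
    lock_key = f"automation_lock_{site_id}"
    if cache.add(lock_key, run_id, timeout=LOCK_TIMEOUT_SECONDS):
        return True  # the lock was free or had expired; this run owns it now
    return cache.get(lock_key) == run_id  # still ours only if the IDs match


def release_lock(cache, site_id, run_id):
    """Release the lock only when run_id is the current owner."""
    lock_key = f"automation_lock_{site_id}"
    if cache.get(lock_key) == run_id:
        cache.delete(lock_key)
        return True
    return False
```

With an owner stored in the lock, a resumed run can tell "my lock is still there" apart from "another run took over". Note that the get-then-delete in `release_lock` is not atomic, so this sketch narrows the race window rather than closing it.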

🔴 BUG #4: Resume Missing Pause/Cancel Checks Between Stages

Location: backend/igny8_core/business/automation/tasks.py line ~183

Current Code:

for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()
    # ❌ No pause/cancel check after each stage

Fix:

for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()
        
        # Check for pause/cancel AFTER each stage (same as run_automation_task)
        service.run.refresh_from_db()
        if service.run.status in ['paused', 'cancelled']:
            logger.info(f"[AutomationTask] Resumed automation {service.run.status} after stage {stage + 1}")
            return
    else:
        logger.info(f"[AutomationTask] Stage {stage + 1} is disabled, skipping")

🟡 BUG #5: Pause Missing File Log Entry

Location: backend/igny8_core/business/automation/views.py pause action

Fix: Add logging call:

def pause(self, request):
    # ... existing code ...
    service.pause_automation()
    
    # Log to automation files
    service.logger.log_stage_progress(
        service.run.run_id, service.account.id, service.site.id,
        service.run.current_stage, "Automation paused by user"
    )
    
    return Response({'message': 'Automation paused'})

Part 2: Credit Budget Allocation System

Overview

Add configurable credit % allocation per AI function. Users can:

  • Use global defaults (configured by admin)
  • Override with site-specific allocations

Database Changes

Extend CreditCostConfig model:

class CreditCostConfig(models.Model):
    # ... existing fields ...
    
    # NEW: Budget allocation percentage
    budget_percentage = models.DecimalField(
        max_digits=5,
        decimal_places=2,
        default=0,
        validators=[MinValueValidator(0), MaxValueValidator(100)],
        help_text="Default % of credits allocated to this operation (0-100)"
    )

New SiteAIBudgetAllocation model:

class SiteAIBudgetAllocation(AccountBaseModel):
    """Site-specific credit budget allocation overrides"""
    
    site = models.OneToOneField(
        'igny8_core_auth.Site',
        on_delete=models.CASCADE,
        related_name='ai_budget_allocation'
    )
    
    use_global_defaults = models.BooleanField(
        default=True,
        help_text="Use global CreditCostConfig percentages"
    )
    
    # Per-operation overrides (only used when use_global_defaults=False)
    clustering_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    idea_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    content_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=40)
    image_prompt_extraction_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=5)
    image_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=35)
    
    class Meta:
        db_table = 'igny8_site_ai_budget_allocations'

Service Changes

New BudgetAllocationService:

class BudgetAllocationService:
    @staticmethod
    def get_operation_budget(site, operation_type, total_credits):
        """
        Get credits allocated for an operation based on site settings.
        
        Args:
            site: Site instance
            operation_type: 'clustering', 'content_generation', etc.
            total_credits: Total credits available
            
        Returns:
            int: Credits allocated for this operation
        """
        allocation = SiteAIBudgetAllocation.objects.filter(site=site).first()
        
        if not allocation or allocation.use_global_defaults:
            # Use global CreditCostConfig percentages
            config = CreditCostConfig.objects.filter(
                operation_type=operation_type,
                is_active=True
            ).first()
            percentage = config.budget_percentage if config else 0
        else:
            # Use site-specific override
            field_map = {
                'clustering': 'clustering_percentage',
                'idea_generation': 'idea_generation_percentage',
                'content_generation': 'content_generation_percentage',
                'image_prompt_extraction': 'image_prompt_extraction_percentage',
                'image_generation': 'image_generation_percentage',
            }
            field = field_map.get(operation_type)
            percentage = getattr(allocation, field, 0) if field else 0
        
        return int(total_credits * (percentage / 100))
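
To make the arithmetic concrete, here is a small worked example that applies the same `int(total * (percentage / 100))` formula outside Django. The percentages are the Phase 4 defaults listed earlier; mapping them onto these `operation_type` keys is an assumption made for illustration, and `operation_budget` and `split_budget` are hypothetical helper names.

```python
from decimal import Decimal

# Phase 4 default percentages; the key names follow field_map above,
# and pairing them this way is an assumption for this example.
DEFAULT_ALLOCATION = {
    "clustering": Decimal("15"),
    "idea_generation": Decimal("10"),
    "content_generation": Decimal("40"),
    "image_prompt_extraction": Decimal("5"),
    "image_generation": Decimal("30"),
}


def operation_budget(total_credits: int, percentage: Decimal) -> int:
    """Same arithmetic as get_operation_budget(): truncate toward zero."""
    return int(total_credits * (percentage / 100))


def split_budget(total_credits: int) -> dict:
    """Budget per operation for a given credit total."""
    return {
        op: operation_budget(total_credits, pct)
        for op, pct in DEFAULT_ALLOCATION.items()
    }


even = split_budget(1000)   # divides cleanly: 150 / 100 / 400 / 50 / 300
uneven = split_budget(333)  # truncation leaves a few credits unallocated
unallocated = 333 - sum(uneven.values())
```

With 333 credits the five budgets add up to 330, so three credits stay unallocated because each result is truncated. This sketch does not decide where such a remainder should go.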

Frontend Changes

Site Settings > AI Settings Tab:

  • Add "Credit Budget Allocation" section
  • Toggle: "Use Global Defaults" / "Custom Allocation"
  • If custom: Show sliders for each operation (must sum to 100%)
  • Visual pie chart showing allocation
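
The "must sum to 100%" rule would likely need a server-side check as well as the slider constraint. A minimal sketch of such a check, assuming percentages arrive as `Decimal` values keyed by field name; `validate_allocation` is a hypothetical helper, not an existing function in the codebase.

```python
from decimal import Decimal


def validate_allocation(percentages: dict) -> list:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    for name, value in percentages.items():
        if not (Decimal("0") <= value <= Decimal("100")):
            errors.append(f"{name} must be between 0 and 100")
    total = sum(percentages.values(), Decimal("0"))
    if total != Decimal("100"):
        errors.append(f"percentages sum to {total}, expected 100")
    return errors


# Field defaults from the SiteAIBudgetAllocation model above.
model_defaults = {
    "clustering_percentage": Decimal("10"),
    "idea_generation_percentage": Decimal("10"),
    "content_generation_percentage": Decimal("40"),
    "image_prompt_extraction_percentage": Decimal("5"),
    "image_generation_percentage": Decimal("35"),
}
```

Calling `validate_allocation(model_defaults)` returns an empty list, while lowering `image_generation_percentage` to 30 without raising another field yields a single "sum to 95" error.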

Part 3: Publishing Schedule Overhaul

Current Issues

  1. Limits are confusing - daily/weekly/monthly are treated as hard caps
  2. Items not getting scheduled (30% missed in last run)
  3. Time slot calculation doesn't account for stagger intervals
  4. No visibility into WHY items weren't scheduled

New Publishing Model

Replace PublishingSettings with enhanced version:

def default_publish_days():
    # Callable default (illustrative name): Django expects JSONField defaults
    # to be callables so instances do not share one mutable list
    return ['mon', 'tue', 'wed', 'thu', 'fri']


def default_publish_time_slots():
    # Callable default (illustrative name), same reason as above
    return ['09:00', '14:00', '18:00']


class PublishingSettings(AccountBaseModel):
    site = models.OneToOneField('igny8_core_auth.Site', on_delete=models.CASCADE)
    
    # Auto-approval/publish toggles (keep existing)
    auto_approval_enabled = models.BooleanField(default=True)
    auto_publish_enabled = models.BooleanField(default=True)
    
    # NEW: Scheduling configuration (replaces hard limits)
    scheduling_mode = models.CharField(
        max_length=20,
        choices=[
            ('slots', 'Time Slots'),  # Publish at specific times
            ('stagger', 'Staggered'),  # Spread evenly throughout day
            ('immediate', 'Immediate'),  # Publish as soon as approved
        ],
        default='slots'
    )
    
    # Time slot configuration
    publish_days = models.JSONField(
        default=default_publish_days,
        help_text="Days allowed for publishing"
    )
    
    publish_time_slots = models.JSONField(
        default=default_publish_time_slots,
        help_text="Specific times for slot mode"
    )
    
    # Stagger mode configuration
    stagger_start_time = models.TimeField(default='09:00')
    stagger_end_time = models.TimeField(default='18:00')
    stagger_interval_minutes = models.IntegerField(
        default=15,
        help_text="Minutes between publications in stagger mode"
    )
    
    # Daily TARGET (soft limit - for estimation, not blocking)
    daily_publish_target = models.IntegerField(
        default=3,
        help_text="Target articles per day (for scheduling spread)"
    )
    
    # Weekly/Monthly targets (informational only)
    weekly_publish_target = models.IntegerField(default=15)
    monthly_publish_target = models.IntegerField(default=50)
    
    # NEW: Maximum queue depth (actual limit)
    max_scheduled_queue = models.IntegerField(
        default=100,
        help_text="Maximum items that can be in 'scheduled' status at once"
    )

New Scheduling Algorithm

from datetime import datetime, timedelta

from django.utils import timezone


def calculate_publishing_slots(settings, site, count_needed):
    """
    Calculate publishing slots with NO arbitrary limits.
    
    Returns:
        List of (datetime, slot_info) tuples
    """
    now = timezone.now()
    
    if settings.scheduling_mode == 'immediate':
        # Start at 'now' and space items one minute apart
        return [(now + timedelta(minutes=i), {'mode': 'immediate'}) for i in range(count_needed)]
    
    elif settings.scheduling_mode == 'stagger':
        # Spread throughout each day
        return _calculate_stagger_slots(settings, site, count_needed, now)
    
    else:  # 'slots' mode
        return _calculate_time_slot_slots(settings, site, count_needed, now)


def _calculate_stagger_slots(settings, site, count_needed, now):
    """
    Stagger mode: Spread publications evenly throughout publish hours.
    """
    slots = []
    day_map = {'mon': 0, 'tue': 1, 'wed': 2, 'thu': 3, 'fri': 4, 'sat': 5, 'sun': 6}
    allowed_days = [day_map[d] for d in settings.publish_days if d in day_map]
    
    # Use the date in the active time zone so it matches make_aware() below
    current_date = timezone.localdate(now)
    interval = timedelta(minutes=settings.stagger_interval_minutes)
    
    for day_offset in range(90):  # Look up to 90 days ahead
        check_date = current_date + timedelta(days=day_offset)
        
        if check_date.weekday() not in allowed_days:
            continue
        
        # Generate slots for this day
        day_start = timezone.make_aware(
            datetime.combine(check_date, settings.stagger_start_time)
        )
        day_end = timezone.make_aware(
            datetime.combine(check_date, settings.stagger_end_time)
        )
        
        # Get existing scheduled for this day
        existing = Content.objects.filter(
            site=site,
            site_status='scheduled',
            scheduled_publish_at__date=check_date
        ).values_list('scheduled_publish_at', flat=True)
        existing_times = set(existing)
        
        current_slot = day_start
        if check_date == current_date and now > day_start:
            # Start from next interval after now
            minutes_since_start = (now - day_start).total_seconds() / 60
            intervals_passed = int(minutes_since_start / settings.stagger_interval_minutes) + 1
            current_slot = day_start + timedelta(minutes=intervals_passed * settings.stagger_interval_minutes)
        
        while current_slot <= day_end and len(slots) < count_needed:
            if current_slot not in existing_times:
                slots.append((current_slot, {'mode': 'stagger', 'date': str(check_date)}))
            current_slot += interval
        
        if len(slots) >= count_needed:
            break
    
    return slots
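
Because the loop above uses `current_slot <= day_end`, both endpoints count as slots, so a day holds `(end - start) / interval + 1` candidates. The following standalone sketch reproduces only that per-day enumeration with naive datetimes, ignoring time zones, existing scheduled items, and allowed days; `stagger_times` is a hypothetical helper name.

```python
from datetime import date, datetime, time, timedelta


def stagger_times(day: date, start: time, end: time, interval_minutes: int) -> list:
    """All candidate slots for one day, with both endpoints inclusive."""
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    step = timedelta(minutes=interval_minutes)
    times = []
    while current <= last:
        times.append(current)
        current += step
    return times


# Model defaults: 09:00 to 18:00 with a 15-minute interval.
slots = stagger_times(date(2026, 1, 19), time(9, 0), time(18, 0), 15)
slots_per_day = len(slots)  # 540 minutes / 15 + 1 = 37
```

With the default settings a single publish day offers 37 slots, far more than the default daily target of 3, which is consistent with treating the targets as estimates and not as caps.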

Frontend Changes

Site Settings > Publishing Tab - Redesign:

┌─────────────────────────────────────────────────────────────────┐
│ Publishing Schedule                                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Auto-Approval: [✓] Automatically approve content                │
│  Auto-Publish:  [✓] Automatically publish approved content       │
│                                                                  │
│  ─── Scheduling Mode ───                                         │
│  ○ Time Slots - Publish at specific times each day               │
│  ● Staggered - Spread evenly throughout publish hours            │
│  ○ Immediate - Publish as soon as approved                       │
│                                                                  │
│  ─── Stagger Settings ───                                        │
│  Start Time: [09:00]  End Time: [18:00]                         │
│  Interval: [15] minutes between publications                     │
│                                                                  │
│  ─── Publish Days ───                                            │
│  [✓] Mon [✓] Tue [✓] Wed [✓] Thu [✓] Fri [ ] Sat [ ] Sun        │
│                                                                  │
│  ─── Targets (for estimation) ───                                │
│  Daily: [3]  Weekly: [15]  Monthly: [50]                        │
│                                                                  │
│  ─── Current Queue ───                                           │
│  📊 23 items scheduled  │  Queue limit: 100                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
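
The "Current Queue" line implies a remaining capacity that the scheduler can check before requesting slots. A minimal sketch of that calculation, assuming the queue limit is enforced by capping how many new items are scheduled at once; `schedulable_count` is a hypothetical helper and the enforcement point is an assumption.

```python
def schedulable_count(requested: int, currently_scheduled: int, max_scheduled_queue: int) -> int:
    """How many of the requested items fit into the scheduled queue right now."""
    remaining = max(0, max_scheduled_queue - currently_scheduled)
    return min(requested, remaining)


# Numbers from the mockup above: 23 items scheduled, queue limit 100.
capacity_for_large_batch = schedulable_count(90, 23, 100)
```

Here a batch of 90 approved items would be cut to 77, and the remaining 13 would wait for a later scheduling pass once the queue drains.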

Part 4: Per-Run Item Limits

Overview

Allow users to limit how many items are processed per automation run. This enables:

  • Balancing content production with publishing capacity
  • Predictable credit usage per run
  • Gradual pipeline processing

Database Changes

Extend AutomationConfig:

class AutomationConfig(models.Model):
    # ... existing fields ...
    
    # NEW: Per-run limits (0 = unlimited)
    max_keywords_per_run = models.IntegerField(
        default=0,
        help_text="Max keywords to cluster per run (0=unlimited)"
    )
    max_clusters_per_run = models.IntegerField(
        default=0,
        help_text="Max clusters to generate ideas for per run (0=unlimited)"
    )
    max_ideas_per_run = models.IntegerField(
        default=0,
        help_text="Max ideas to convert to tasks per run (0=unlimited)"
    )
    max_tasks_per_run = models.IntegerField(
        default=0,
        help_text="Max tasks to generate content for per run (0=unlimited)"
    )
    max_content_per_run = models.IntegerField(
        default=0,
        help_text="Max content to extract image prompts for per run (0=unlimited)"
    )
    max_images_per_run = models.IntegerField(
        default=0,
        help_text="Max images to generate per run (0=unlimited)"
    )
    max_approvals_per_run = models.IntegerField(
        default=0,
        help_text="Max content to auto-approve per run (0=unlimited)"
    )

Service Changes

Modify stage methods to respect limits:

def run_stage_1(self):
    """Stage 1: Keywords → Clusters"""
    # ... existing setup ...
    
    # Apply per-run limit
    max_per_run = self.config.max_keywords_per_run
    if max_per_run > 0:
        pending_keywords = pending_keywords[:max_per_run]
        self.logger.log_stage_progress(
            self.run.run_id, self.account.id, self.site.id,
            1, f"Per-run limit: Processing up to {max_per_run} keywords"
        )
    
    total_count = pending_keywords.count()
    # ... rest of processing ...

Frontend Changes

Automation Settings Panel - Enhanced:

┌─────────────────────────────────────────────────────────────────┐
│ Per-Run Limits                                                   │
│ Control how much is processed in each automation run            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Stage 1: Keywords → Clusters                                    │
│  [  50  ] keywords per run │ Current pending: 150               │
│  ⚡ Will take ~3 runs to process all                            │
│                                                                  │
│  Stage 2: Clusters → Ideas                                       │
│  [  10  ] clusters per run │ Current pending: 25                │
│                                                                  │
│  Stage 3: Ideas → Tasks                                          │
│  [   0  ] (unlimited)      │ Current pending: 30                │
│                                                                  │
│  Stage 4: Tasks → Content                                        │
│  [   5  ] tasks per run    │ Current pending: 30                │
│  💡 Tip: Match with daily publish target for balanced flow      │
│                                                                  │
│  Stage 5: Content → Image Prompts                                │
│  [   5  ] content per run  │ Current pending: 10                │
│                                                                  │
│  Stage 6: Image Prompts → Images                                 │
│  [  20  ] images per run   │ Current pending: 50                │
│                                                                  │
│  Stage 7: Review → Approved                                      │
│  [   5  ] approvals per run│ Current in review: 15              │
│  ⚠️ Limited by publishing schedule capacity                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
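
The "Will take ~3 runs" hint in the mockup is a ceiling division of pending items by the per-run limit. A minimal sketch of both the limit and the estimate, using plain lists instead of querysets; `apply_per_run_limit` and `runs_needed` are hypothetical helper names, and the estimate assumes no new items arrive between runs.

```python
import math


def apply_per_run_limit(items: list, max_per_run: int) -> list:
    """Return the items to process in this run; 0 means unlimited."""
    if max_per_run > 0:
        return items[:max_per_run]
    return items


def runs_needed(pending: int, max_per_run: int) -> int:
    """Estimate how many runs it takes to clear `pending` items."""
    if pending <= 0:
        return 0
    if max_per_run <= 0:
        return 1  # unlimited: everything fits in one run
    return math.ceil(pending / max_per_run)


# Stage 1 values from the mockup: 150 pending keywords, 50 per run.
stage_1_runs = runs_needed(150, 50)
```

For the other mockup rows the same function gives 3 runs for stage 2 (25 pending, 10 per run) and 6 runs for stage 4 (30 pending, 5 per run).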

Part 5: UI/UX Fixes

Automation Dashboard Issues

  1. Wrong metrics display - Fix counts to show accurate pipeline state
  2. Confusing progress bars - Use consistent calculation
  3. Missing explanations - Add tooltips explaining each metric

Run Detail Page Issues

  1. Stage results showing wrong data - Fix JSON field mapping
  2. Missing "items remaining" after partial run - Calculate from initial_snapshot
  3. No clear indication of WHY run stopped - Show stopped_reason prominently

Fixes

GlobalProgressBar.tsx - Fix progress calculation:

// Use initial_snapshot as denominator, stage results as numerator
const calculateGlobalProgress = (run: AutomationRun): number => {
  if (!run.initial_snapshot) return 0;
  
  const total = run.initial_snapshot.total_initial_items || 0;
  if (total === 0) return 0;
  
  let processed = 0;
  processed += run.stage_1_result?.keywords_processed || 0;
  processed += run.stage_2_result?.clusters_processed || 0;
  processed += run.stage_3_result?.ideas_processed || 0;
  processed += run.stage_4_result?.tasks_processed || 0;
  processed += run.stage_5_result?.content_processed || 0;
  processed += run.stage_6_result?.images_processed || 0;
  processed += run.stage_7_result?.approved_count || 0;
  
  return Math.min(100, Math.round((processed / total) * 100));
};

Implementation Order

Phase 1: Critical Bug Fixes (Day 1)

  1. Cancel releases lock
  2. Scheduled check includes 'paused'
  3. Resume reacquires lock
  4. Resume has pause/cancel checks
  5. Pause logs to files

Phase 2: Per-Run Limits (Day 2)

  1. Add model fields to AutomationConfig
  2. Migration
  3. Update automation_service.py stage methods
  4. Frontend settings panel
  5. Test with small limits

Phase 3: Publishing Overhaul (Day 3)

  1. Update PublishingSettings model
  2. Migration
  3. New scheduling algorithm
  4. Frontend redesign
  5. Test scheduling edge cases

Phase 4: Credit Budget (Day 4)

  1. Add model fields/new model
  2. Migration
  3. BudgetAllocationService
  4. Frontend AI Settings section
  5. Test budget calculations

Phase 5: UI Fixes (Day 5)

  1. Fix GlobalProgressBar
  2. Fix AutomationPage metrics
  3. Fix RunDetail display
  4. Add helpful tooltips
  5. End-to-end testing

Testing Checklist

Automation Flow

  • Manual run starts, pauses, resumes, completes
  • Manual run cancels, lock released, new run can start
  • Scheduled run starts on time
  • Scheduled run skips if manual run paused
  • Resume after 7+ hour pause works
  • Per-run limits respected
  • Remaining items processed in next run

Publishing

  • Stagger mode spreads correctly
  • Time slot mode uses exact times
  • Immediate mode publishes right away
  • No items missed due to limits
  • Queue shows accurate count

Credits

  • Budget allocation calculates correctly
  • Site override works
  • Global defaults work
  • Estimation uses budget

UI

  • Progress bar accurate during run
  • Metrics match database counts
  • Run detail shows correct stage results
  • Stopped reason displayed clearly

Rollback Plan

If issues arise:

  1. All changes are in separate migrations and can be rolled back individually
  2. Feature flags for new behaviors (use_new_scheduling, use_budget_allocation)
  3. Keep existing fields alongside new ones initially
  4. Frontend changes are purely additive

Success Criteria

  1. Zero lock issues - Users never stuck unable to start automation
  2. 100% scheduling - All approved content gets scheduled
  3. Predictable runs - Per-run limits produce consistent results
  4. Clear visibility - UI shows exactly what's happening and why
  5. No regressions - All existing functionality continues working