igny8/docs/plans/automation/AUTOMATION-ENHANCEMENT-PLAN.md

# Automation System Enhancement Plan

**Created:** January 17, 2026
**Updated:** January 17, 2026 (IMPLEMENTATION COMPLETE)
**Status:** ✅ ALL PHASES COMPLETE
**Priority:** 🔴 CRITICAL - Blocks Production Launch

---

## Implementation Progress

### ✅ PHASE 1: Bug Fixes (COMPLETE)
1. **Bug #1:** Cancel releases lock - [views.py](../../backend/igny8_core/business/automation/views.py)
2. **Bug #2:** Scheduled check includes 'paused' - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)
3. **Bug #3:** Resume reacquires lock - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)
4. **Bug #4:** Resume has pause/cancel checks - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)
5. **Bug #5:** Pause logs to files - [views.py](../../backend/igny8_core/business/automation/views.py)
6. **Bug #6:** Resume exception releases lock - [tasks.py](../../backend/igny8_core/business/automation/tasks.py)

### ✅ PHASE 2: Per-Run Item Limits (COMPLETE)
- Added 8 new fields to `AutomationConfig` model:
  - `max_keywords_per_run`, `max_clusters_per_run`, `max_ideas_per_run`
  - `max_tasks_per_run`, `max_content_per_run`, `max_images_per_run`
  - `max_approvals_per_run`, `max_credits_per_run`
- Migration: [0014_automation_per_run_limits.py](../../backend/migrations/0014_automation_per_run_limits.py)
- Service: Updated `automation_service.py` with `_get_per_run_limit()`, `_apply_per_run_limit()`, `_check_credit_budget()`
- API: Updated config endpoints in views.py

### ✅ PHASE 3: Publishing Settings Overhaul (COMPLETE)
- Added scheduling modes: `time_slots`, `stagger`, `immediate`
- New fields: `scheduling_mode`, `stagger_start_time`, `stagger_end_time`, `stagger_interval_minutes`, `queue_limit`
- Migration: [0015_publishing_settings_overhaul.py](../../backend/migrations/0015_publishing_settings_overhaul.py)
- Scheduler: Updated `_calculate_available_slots()` with three mode handlers

### ✅ PHASE 4: Credit % Allocation per AI Function (COMPLETE)
- New model: `SiteAIBudgetAllocation` in billing/models.py
- Default allocations: 15% clustering, 10% ideas, 40% content, 5% prompts, 30% images
- Migration: [0016_site_ai_budget_allocation.py](../../backend/migrations/0016_site_ai_budget_allocation.py)
- API: New viewset at `/api/v1/billing/sites/{site_id}/ai-budget/`

### ✅ PHASE 5: UI Updates (COMPLETE)
- Updated `AutomationConfig` interface in `automationService.ts` with new per-run limit fields
- GlobalProgressBar already implements correct calculation using `initial_snapshot`

---

## Migrations To Run

```bash
cd /data/app/igny8/backend
python manage.py migrate
```

## Files Modified

### Backend
- `backend/igny8_core/business/automation/views.py` - Cancel releases lock, pause logs
- `backend/igny8_core/business/automation/tasks.py` - Resume fixes, scheduled check
- `backend/igny8_core/business/automation/models.py` - Per-run limit fields
- `backend/igny8_core/business/automation/services/automation_service.py` - Limit enforcement
- `backend/igny8_core/business/integration/models.py` - Publishing modes
- `backend/igny8_core/business/billing/models.py` - SiteAIBudgetAllocation
- `backend/igny8_core/modules/billing/views.py` - AI budget viewset
- `backend/igny8_core/modules/billing/urls.py` - AI budget route
- `backend/igny8_core/modules/integration/views.py` - Publishing serializer
- `backend/igny8_core/tasks/publishing_scheduler.py` - Scheduling modes

### Frontend
- `frontend/src/services/automationService.ts` - Config interface updated

### Migrations
- `backend/migrations/0014_automation_per_run_limits.py`
- `backend/migrations/0015_publishing_settings_overhaul.py`
- `backend/migrations/0016_site_ai_budget_allocation.py`

---

## Executive Summary

This plan covers four workstreams: one set of critical bug fixes and three enhancements:
1. **Fix Critical Automation Bugs** - Lock management, scheduled runs, logging
2. **Credit Budget Allocation** - Configurable % per AI function
3. **Publishing Schedule Overhaul** - Robust, predictable scheduling
4. **Per-Run Item Limits** - Control throughput per automation run

---

## Part 1: Critical Bug Fixes ✅ COMPLETE

### 🔴 BUG #1: Cancel Action Doesn't Release Lock

**Location:** `backend/igny8_core/business/automation/views.py` line ~1614

**Current Code:**
```python
def cancel_automation(self, request):
    run.status = 'cancelled'
    run.cancelled_at = timezone.now()
    run.completed_at = timezone.now()
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])
    # ❌ MISSING: cache.delete(f'automation_lock_{run.site.id}')
```

**Fix:**
```python
def cancel_automation(self, request):
    run.status = 'cancelled'
    run.cancelled_at = timezone.now()
    run.completed_at = timezone.now()
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])

    # Release the lock so user can start new automation
    from django.core.cache import cache
    cache.delete(f'automation_lock_{run.site.id}')

    # Log the cancellation
    from igny8_core.business.automation.services.automation_logger import AutomationLogger
    logger = AutomationLogger()
    logger.log_stage_progress(
        run.run_id, run.account.id, run.site.id, run.current_stage,
        "Automation cancelled by user"
    )
```

**Impact:** Users can start a new automation run immediately after cancelling

---

### 🔴 BUG #2: Scheduled Automation Doesn't Check 'paused' Status

**Location:** `backend/igny8_core/business/automation/tasks.py` line ~52

**Current Code:**
```python
# Check if already running
if AutomationRun.objects.filter(site=config.site, status='running').exists():
    logger.info(f"[AutomationTask] Skipping site {config.site.id} - already running")
    continue
```

**Fix:**
```python
# Check if already running OR paused
if AutomationRun.objects.filter(site=config.site, status__in=['running', 'paused']).exists():
    logger.info(f"[AutomationTask] Skipping site {config.site.id} - automation in progress (running/paused)")
    continue
```

**Impact:** Prevents duplicate runs when one is paused

---

### 🔴 BUG #3: Resume Doesn't Reacquire Lock

**Location:** `backend/igny8_core/business/automation/tasks.py` line ~164

**Current Code:**
```python
def resume_automation_task(self, run_id: str):
    service = AutomationService.from_run_id(run_id)
    # ❌ No lock check - could run unprotected after 6hr expiry
```

**Fix:**
```python
def resume_automation_task(self, run_id: str):
    """Resume paused automation run from current stage"""
    logger.info(f"[AutomationTask] Resuming automation run: {run_id}")

    try:
        run = AutomationRun.objects.get(run_id=run_id)

        # Verify run is actually in 'running' status (set by views.resume)
        if run.status != 'running':
            logger.warning(f"[AutomationTask] Run {run_id} status is {run.status}, not 'running'. Aborting resume.")
            return

        # Reacquire lock in case it expired during long pause
        from django.core.cache import cache
        lock_key = f'automation_lock_{run.site.id}'

        # Try to acquire - if fails, another run may have started
        if not cache.add(lock_key, 'locked', timeout=21600):
            # Check if WE still own it (compare run_id if stored)
            existing = cache.get(lock_key)
            if existing and existing != 'locked':
                logger.warning(f"[AutomationTask] Lock held by different run. Aborting resume for {run_id}")
                run.status = 'failed'
                run.error_message = 'Lock acquired by another run during pause'
                run.save()
                return
            # Lock exists but may be ours - proceed cautiously

        service = AutomationService.from_run_id(run_id)
        # ... rest of processing with pause/cancel checks between stages
```
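
**Sketch (illustrative, not the shipped code):** As written in the snippet above, the lock value is always `'locked'`, so the `existing != 'locked'` comparison cannot tell one run from another unless some other code path stores a different value. One option is to store the run ID as the lock value. The sketch below assumes that change. `FakeCache` and `acquire_or_confirm_lock` are hypothetical names; `FakeCache` only mimics the `add`/`get`/`delete` subset of Django's cache API so the example runs without Django.

```python
class FakeCache:
    """In-memory stand-in for the add/get/delete subset of Django's cache API.

    Timeouts are accepted but ignored; this is only for illustration.
    """

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout=None):
        # Like cache.add(): store only if the key is absent, report success.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


LOCK_TIMEOUT_SECONDS = 21600  # 6 hours, matching the timeout used above


def acquire_or_confirm_lock(cache, site_id, run_id):
    """Return True if run_id holds the site lock after this call."""
    lock_key = f"automation_lock_{site_id}"
    if cache.add(lock_key, run_id, timeout=LOCK_TIMEOUT_SECONDS):
        return True  # lock had expired or was released; this run now owns it
    # Lock already exists: proceed only if the stored owner is this run.
    return cache.get(lock_key) == run_id


cache = FakeCache()
print(acquire_or_confirm_lock(cache, site_id=1, run_id="run-a"))  # True
print(acquire_or_confirm_lock(cache, site_id=1, run_id="run-a"))  # True
print(acquire_or_confirm_lock(cache, site_id=1, run_id="run-b"))  # False
```

With an owner token, a resume that finds another run's ID in the lock can abort with certainty instead of proceeding "cautiously".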

---

### 🔴 BUG #4: Resume Missing Pause/Cancel Checks Between Stages

**Location:** `backend/igny8_core/business/automation/tasks.py` line ~183

**Current Code:**
```python
for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()
    # ❌ No pause/cancel check after each stage
```

**Fix:**
```python
for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()

        # Check for pause/cancel AFTER each stage (same as run_automation_task)
        service.run.refresh_from_db()
        if service.run.status in ['paused', 'cancelled']:
            logger.info(f"[AutomationTask] Resumed automation {service.run.status} after stage {stage + 1}")
            return
    else:
        logger.info(f"[AutomationTask] Stage {stage + 1} is disabled, skipping")
```

---

### 🟡 BUG #5: Pause Missing File Log Entry

**Location:** `backend/igny8_core/business/automation/views.py` pause action

**Fix:** Add logging call:
```python
def pause(self, request):
    # ... existing code ...
    service.pause_automation()

    # Log to automation files
    service.logger.log_stage_progress(
        service.run.run_id, service.account.id, service.site.id,
        service.run.current_stage, "Automation paused by user"
    )

    return Response({'message': 'Automation paused'})
```
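
**Bug #6 sketch (illustrative):** Implementation Progress lists Bug #6 (resume exception releases lock), but this part has no snippet for it. The general pattern can be sketched as follows, assuming the resume task wraps stage processing in a `try`/`except`. `resume_with_lock_release`, `run_stages`, and `release_lock` are hypothetical names standing in for the real task body and for `cache.delete(lock_key)`.

```python
import logging

logger = logging.getLogger(__name__)


def resume_with_lock_release(run_stages, release_lock):
    """Run the resumed stages; release the site lock if they raise.

    Both arguments are hypothetical callables: run_stages stands in for
    the stage loop, release_lock for cache.delete(lock_key).
    """
    try:
        run_stages()
    except Exception:
        logger.exception("[AutomationTask] Resume failed; releasing lock")
        release_lock()
        raise  # re-raise so the task runner still records the failure
```

The lock is released only on failure here; a run that pauses or completes normally is assumed to handle its lock through the existing code paths.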

---

## Part 2: Credit Budget Allocation System

### Overview

Add configurable credit % allocation per AI function. Users can:
- Use global defaults (configured by admin)
- Override with site-specific allocations

### Database Changes

**Extend `CreditCostConfig` model:**
```python
class CreditCostConfig(models.Model):
    # ... existing fields ...

    # NEW: Budget allocation percentage
    budget_percentage = models.DecimalField(
        max_digits=5,
        decimal_places=2,
        default=0,
        validators=[MinValueValidator(0), MaxValueValidator(100)],
        help_text="Default % of credits allocated to this operation (0-100)"
    )
```

**New `SiteAIBudgetAllocation` model:**
```python
class SiteAIBudgetAllocation(AccountBaseModel):
    """Site-specific credit budget allocation overrides"""

    site = models.OneToOneField(
        'igny8_core_auth.Site',
        on_delete=models.CASCADE,
        related_name='ai_budget_allocation'
    )

    use_global_defaults = models.BooleanField(
        default=True,
        help_text="Use global CreditCostConfig percentages"
    )

    # Per-operation overrides (only used when use_global_defaults=False)
    clustering_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    idea_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    content_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=40)
    image_prompt_extraction_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=5)
    image_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=35)

    class Meta:
        db_table = 'igny8_site_ai_budget_allocations'
```

### Service Changes

**New `BudgetAllocationService`:**
```python
class BudgetAllocationService:
    @staticmethod
    def get_operation_budget(site, operation_type, total_credits):
        """
        Get credits allocated for an operation based on site settings.

        Args:
            site: Site instance
            operation_type: 'clustering', 'content_generation', etc.
            total_credits: Total credits available

        Returns:
            int: Credits allocated for this operation
        """
        allocation = SiteAIBudgetAllocation.objects.filter(site=site).first()

        if not allocation or allocation.use_global_defaults:
            # Use global CreditCostConfig percentages
            config = CreditCostConfig.objects.filter(
                operation_type=operation_type,
                is_active=True
            ).first()
            percentage = config.budget_percentage if config else 0
        else:
            # Use site-specific override
            field_map = {
                'clustering': 'clustering_percentage',
                'idea_generation': 'idea_generation_percentage',
                'content_generation': 'content_generation_percentage',
                'image_prompt_extraction': 'image_prompt_extraction_percentage',
                'image_generation': 'image_generation_percentage',
            }
            field = field_map.get(operation_type)
            percentage = getattr(allocation, field, 0) if field else 0

        return int(total_credits * (percentage / 100))
```
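
**Worked example (illustrative):** The arithmetic in `get_operation_budget` can be checked without the ORM. The sketch below assumes the default percentages listed under Phase 4 (15/10/40/5/30) and uses `Decimal` throughout, because `DecimalField` values are `Decimal` and multiplying a `Decimal` by a `float` raises `TypeError`. `DEFAULT_PERCENTAGES` and `operation_budget` are hypothetical names, not part of the codebase.

```python
from decimal import Decimal

# Assumed defaults, taken from the Phase 4 summary in this document.
DEFAULT_PERCENTAGES = {
    "clustering": Decimal("15"),
    "idea_generation": Decimal("10"),
    "content_generation": Decimal("40"),
    "image_prompt_extraction": Decimal("5"),
    "image_generation": Decimal("30"),
}


def operation_budget(operation_type, total_credits, percentages=DEFAULT_PERCENTAGES):
    """Credits for one operation, truncated to a whole number like int() above."""
    percentage = percentages.get(operation_type, Decimal("0"))
    # Convert via str so float inputs become exact Decimal values.
    return int(Decimal(str(total_credits)) * percentage / Decimal("100"))


for op in DEFAULT_PERCENTAGES:
    print(op, operation_budget(op, 1000))
```

Because each result is truncated, the per-operation budgets can sum to less than the total: for 999 credits the five budgets add up to 995.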

### Frontend Changes

**Site Settings > AI Settings Tab:**
- Add "Credit Budget Allocation" section
- Toggle: "Use Global Defaults" / "Custom Allocation"
- If custom: Show sliders for each operation (must sum to 100%)
- Visual pie chart showing allocation
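
**Validation sketch (illustrative):** The "must sum to 100%" rule is stated here only for the sliders. It could also be checked on the backend before a custom allocation is saved. A minimal sketch, assuming validation runs on a plain mapping of field name to percentage; `validate_allocation` is a hypothetical helper.

```python
from decimal import Decimal


def validate_allocation(percentages):
    """Raise ValueError unless every value is in 0..100 and they total 100."""
    total = Decimal("0")
    for name, value in percentages.items():
        value = Decimal(str(value))
        if not Decimal("0") <= value <= Decimal("100"):
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
        total += value
    if total != Decimal("100"):
        raise ValueError(f"Allocations must sum to 100, got {total}")


# The model defaults shown above (10/10/40/5/35) pass the check.
validate_allocation({
    "clustering_percentage": 10,
    "idea_generation_percentage": 10,
    "content_generation_percentage": 40,
    "image_prompt_extraction_percentage": 5,
    "image_generation_percentage": 35,
})
```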

---

## Part 3: Publishing Schedule Overhaul

### Current Issues

1. Limits are confusing - daily/weekly/monthly are treated as hard caps
2. Items are not being scheduled (30% were missed in the last run)
3. Time slot calculation doesn't account for stagger intervals
4. No visibility into WHY items weren't scheduled

### New Publishing Model

**Replace `PublishingSettings` with enhanced version:**
```python
from datetime import time

from django.db import models


def default_publish_days():
    # Callable default: each instance gets its own list
    return ['mon', 'tue', 'wed', 'thu', 'fri']


def default_publish_time_slots():
    # Callable default: each instance gets its own list
    return ['09:00', '14:00', '18:00']


class PublishingSettings(AccountBaseModel):
    site = models.OneToOneField('igny8_core_auth.Site', on_delete=models.CASCADE)

    # Auto-approval/publish toggles (keep existing)
    auto_approval_enabled = models.BooleanField(default=True)
    auto_publish_enabled = models.BooleanField(default=True)

    # NEW: Scheduling configuration (replaces hard limits)
    scheduling_mode = models.CharField(
        max_length=20,
        choices=[
            ('slots', 'Time Slots'),  # Publish at specific times
            ('stagger', 'Staggered'),  # Spread evenly throughout day
            ('immediate', 'Immediate'),  # Publish as soon as approved
        ],
        default='slots'
    )

    # Time slot configuration
    publish_days = models.JSONField(
        default=default_publish_days,
        help_text="Days allowed for publishing"
    )

    publish_time_slots = models.JSONField(
        default=default_publish_time_slots,
        help_text="Specific times for slot mode"
    )

    # Stagger mode configuration (time objects, so unsaved instances
    # work with datetime.combine)
    stagger_start_time = models.TimeField(default=time(9, 0))
    stagger_end_time = models.TimeField(default=time(18, 0))
    stagger_interval_minutes = models.IntegerField(
        default=15,
        help_text="Minutes between publications in stagger mode"
    )

    # Daily TARGET (soft limit - for estimation, not blocking)
    daily_publish_target = models.IntegerField(
        default=3,
        help_text="Target articles per day (for scheduling spread)"
    )

    # Weekly/Monthly targets (informational only)
    weekly_publish_target = models.IntegerField(default=15)
    monthly_publish_target = models.IntegerField(default=50)

    # NEW: Maximum queue depth (actual limit)
    max_scheduled_queue = models.IntegerField(
        default=100,
        help_text="Maximum items that can be in 'scheduled' status at once"
    )
```

### New Scheduling Algorithm

```python
from datetime import datetime, timedelta

from django.utils import timezone


def calculate_publishing_slots(settings, site, count_needed):
    """
    Calculate publishing slots with NO arbitrary limits.

    Returns:
        List of (datetime, slot_info) tuples
    """
    now = timezone.now()

    if settings.scheduling_mode == 'immediate':
        # Space items one minute apart, starting now
        return [(now + timedelta(minutes=i), {'mode': 'immediate'}) for i in range(count_needed)]

    elif settings.scheduling_mode == 'stagger':
        # Spread throughout each day
        return _calculate_stagger_slots(settings, site, count_needed, now)

    else:  # 'slots' mode
        return _calculate_time_slot_slots(settings, site, count_needed, now)


def _calculate_stagger_slots(settings, site, count_needed, now):
    """
    Stagger mode: Spread publications evenly throughout publish hours.
    """
    slots = []
    day_map = {'mon': 0, 'tue': 1, 'wed': 2, 'thu': 3, 'fri': 4, 'sat': 5, 'sun': 6}
    allowed_days = [day_map[d] for d in settings.publish_days if d in day_map]

    current_date = now.date()
    interval = timedelta(minutes=settings.stagger_interval_minutes)

    for day_offset in range(90):  # Look up to 90 days ahead
        check_date = current_date + timedelta(days=day_offset)

        if check_date.weekday() not in allowed_days:
            continue

        # Generate slots for this day
        day_start = timezone.make_aware(
            datetime.combine(check_date, settings.stagger_start_time)
        )
        day_end = timezone.make_aware(
            datetime.combine(check_date, settings.stagger_end_time)
        )

        # Get existing scheduled for this day
        existing = Content.objects.filter(
            site=site,
            site_status='scheduled',
            scheduled_publish_at__date=check_date
        ).values_list('scheduled_publish_at', flat=True)
        existing_times = set(existing)

        current_slot = day_start
        if check_date == current_date and now > day_start:
            # Start from next interval after now
            minutes_since_start = (now - day_start).total_seconds() / 60
            intervals_passed = int(minutes_since_start / settings.stagger_interval_minutes) + 1
            current_slot = day_start + timedelta(minutes=intervals_passed * settings.stagger_interval_minutes)

        while current_slot <= day_end and len(slots) < count_needed:
            if current_slot not in existing_times:
                slots.append((current_slot, {'mode': 'stagger', 'date': str(check_date)}))
            current_slot += interval

        if len(slots) >= count_needed:
            break

    return slots
```
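
**Worked example (illustrative):** The per-day loop in `_calculate_stagger_slots` can be exercised on its own. The sketch below is a simplified, Django-free version that uses naive datetimes and a plain set in place of the `Content` query. `stagger_slots_for_day` is a hypothetical helper, and the guard against a non-positive interval is an addition not present in the plan (without it, the `while` loop would never advance).

```python
from datetime import date, datetime, time, timedelta


def stagger_slots_for_day(day, start, end, interval_minutes, taken=frozenset()):
    """Return every free slot on `day` from start to end, inclusive."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    slots = []
    current = datetime.combine(day, start)
    day_end = datetime.combine(day, end)
    while current <= day_end:
        if current not in taken:  # skip times that are already scheduled
            slots.append(current)
        current += timedelta(minutes=interval_minutes)
    return slots


slots = stagger_slots_for_day(date(2026, 1, 19), time(9, 0), time(18, 0), 15)
print(len(slots), slots[0].time(), slots[-1].time())  # 37 09:00:00 18:00:00
```

With the default window (09:00 to 18:00) and a 15-minute interval, one day offers 37 slots. The algorithm above fills each allowed day before moving to the next and does not consult `daily_publish_target`, which is consistent with treating that field as a soft target.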

### Frontend Changes

**Site Settings > Publishing Tab - Redesign:**

```
┌─────────────────────────────────────────────────────────────────┐
│ Publishing Schedule                                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Auto-Approval: [✓] Automatically approve content                │
│  Auto-Publish:  [✓] Automatically publish approved content       │
│                                                                  │
│  ─── Scheduling Mode ───                                         │
│  ○ Time Slots - Publish at specific times each day               │
│  ● Staggered - Spread evenly throughout publish hours            │
│  ○ Immediate - Publish as soon as approved                       │
│                                                                  │
│  ─── Stagger Settings ───                                        │
│  Start Time: [09:00]  End Time: [18:00]                         │
│  Interval: [15] minutes between publications                     │
│                                                                  │
│  ─── Publish Days ───                                            │
│  [✓] Mon [✓] Tue [✓] Wed [✓] Thu [✓] Fri [ ] Sat [ ] Sun        │
│                                                                  │
│  ─── Targets (for estimation) ───                                │
│  Daily: [3]  Weekly: [15]  Monthly: [50]                        │
│                                                                  │
│  ─── Current Queue ───                                           │
│  📊 23 items scheduled  │  Queue limit: 100                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

---

## Part 4: Per-Run Item Limits

### Overview

Allow users to limit how many items are processed per automation run. This enables:
- Balancing content production with publishing capacity
- Predictable credit usage per run
- Gradual pipeline processing

### Database Changes

**Extend `AutomationConfig`:**
```python
class AutomationConfig(models.Model):
    # ... existing fields ...

    # NEW: Per-run limits (0 = unlimited)
    max_keywords_per_run = models.IntegerField(
        default=0,
        help_text="Max keywords to cluster per run (0=unlimited)"
    )
    max_clusters_per_run = models.IntegerField(
        default=0,
        help_text="Max clusters to generate ideas for per run (0=unlimited)"
    )
    max_ideas_per_run = models.IntegerField(
        default=0,
        help_text="Max ideas to convert to tasks per run (0=unlimited)"
    )
    max_tasks_per_run = models.IntegerField(
        default=0,
        help_text="Max tasks to generate content for per run (0=unlimited)"
    )
    max_content_per_run = models.IntegerField(
        default=0,
        help_text="Max content to extract image prompts for per run (0=unlimited)"
    )
    max_images_per_run = models.IntegerField(
        default=0,
        help_text="Max images to generate per run (0=unlimited)"
    )
    max_approvals_per_run = models.IntegerField(
        default=0,
        help_text="Max content to auto-approve per run (0=unlimited)"
    )
```

### Service Changes

**Modify stage methods to respect limits:**
```python
def run_stage_1(self):
    """Stage 1: Keywords → Clusters"""
    # ... existing setup ...

    # Apply per-run limit
    max_per_run = self.config.max_keywords_per_run
    if max_per_run > 0:
        pending_keywords = pending_keywords[:max_per_run]
        self.logger.log_stage_progress(
            self.run.run_id, self.account.id, self.site.id,
            1, f"Per-run limit: Processing up to {max_per_run} keywords"
        )

    total_count = pending_keywords.count()
    # ... rest of processing ...
```
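
**Helper sketch (illustrative):** The same "0 = unlimited" rule applies to all seven stages, so it could be factored into one helper. The sketch below works on any sliceable sequence; `apply_per_run_limit` here is a standalone function written for illustration and is not claimed to match the `_apply_per_run_limit()` method named under Phase 2.

```python
def apply_per_run_limit(items, max_per_run):
    """Return the items to process this run; a limit of 0 means unlimited."""
    if max_per_run < 0:
        raise ValueError("max_per_run must be 0 (unlimited) or positive")
    if max_per_run == 0:
        return items
    return items[:max_per_run]


pending = list(range(150))
print(len(apply_per_run_limit(pending, 50)))  # 50
print(len(apply_per_run_limit(pending, 0)))   # 150
```

When `items` is a Django queryset, note that a sliced queryset cannot be filtered or reordered further, so any `.filter()` or `.order_by()` calls need to happen before the limit is applied.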

### Frontend Changes

**Automation Settings Panel - Enhanced:**
```
┌─────────────────────────────────────────────────────────────────┐
│ Per-Run Limits                                                   │
│ Control how much is processed in each automation run            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Stage 1: Keywords → Clusters                                    │
│  [  50  ] keywords per run │ Current pending: 150               │
│  ⚡ Will take ~3 runs to process all                            │
│                                                                  │
│  Stage 2: Clusters → Ideas                                       │
│  [  10  ] clusters per run │ Current pending: 25                │
│                                                                  │
│  Stage 3: Ideas → Tasks                                          │
│  [   0  ] (unlimited)      │ Current pending: 30                │
│                                                                  │
│  Stage 4: Tasks → Content                                        │
│  [   5  ] tasks per run    │ Current pending: 30                │
│  💡 Tip: Match with daily publish target for balanced flow      │
│                                                                  │
│  Stage 5: Content → Image Prompts                                │
│  [   5  ] content per run  │ Current pending: 10                │
│                                                                  │
│  Stage 6: Image Prompts → Images                                 │
│  [  20  ] images per run   │ Current pending: 50                │
│                                                                  │
│  Stage 7: Review → Approved                                      │
│  [   5  ] approvals per run│ Current in review: 15              │
│  ⚠️ Limited by publishing schedule capacity                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
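
**Estimate sketch (illustrative):** The "Will take ~3 runs" hint in the mockup is a ceiling division of pending items by the per-run limit. A minimal sketch, assuming an unlimited stage (limit 0) clears its backlog in one run; `runs_needed` is a hypothetical helper.

```python
import math


def runs_needed(pending, max_per_run):
    """Estimate runs required to clear `pending` items."""
    if pending <= 0:
        return 0
    if max_per_run <= 0:
        return 1  # 0 = unlimited: everything is processed in one run
    return math.ceil(pending / max_per_run)


print(runs_needed(150, 50))  # 3, matching the Stage 1 row in the mockup
```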

---

## Part 5: UI/UX Fixes

### Automation Dashboard Issues

1. **Wrong metrics display** - Fix counts to show accurate pipeline state
2. **Confusing progress bars** - Use consistent calculation
3. **Missing explanations** - Add tooltips explaining each metric

### Run Detail Page Issues

1. **Stage results showing wrong data** - Fix JSON field mapping
2. **Missing "items remaining" after partial run** - Calculate from initial_snapshot
3. **No clear indication of WHY run stopped** - Show stopped_reason prominently

### Fixes

**GlobalProgressBar.tsx - Fix progress calculation:**
```typescript
// Use initial_snapshot as denominator, stage results as numerator
const calculateGlobalProgress = (run: AutomationRun): number => {
  if (!run.initial_snapshot) return 0;

  const total = run.initial_snapshot.total_initial_items || 0;
  if (total === 0) return 0;

  let processed = 0;
  processed += run.stage_1_result?.keywords_processed || 0;
  processed += run.stage_2_result?.clusters_processed || 0;
  processed += run.stage_3_result?.ideas_processed || 0;
  processed += run.stage_4_result?.tasks_processed || 0;
  processed += run.stage_5_result?.content_processed || 0;
  processed += run.stage_6_result?.images_processed || 0;
  processed += run.stage_7_result?.approved_count || 0;

  return Math.min(100, Math.round((processed / total) * 100));
};
```

---

## Implementation Order

### Phase 1: Critical Bug Fixes (Day 1)
1. ✅ Cancel releases lock
2. ✅ Scheduled check includes 'paused'
3. ✅ Resume reacquires lock
4. ✅ Resume has pause/cancel checks
5. ✅ Pause logs to files
6. ✅ Resume exception releases lock

### Phase 2: Per-Run Limits (Day 2)
1. Add model fields to AutomationConfig
2. Migration
3. Update automation_service.py stage methods
4. Frontend settings panel
5. Test with small limits

### Phase 3: Publishing Overhaul (Day 3)
1. Update PublishingSettings model
2. Migration
3. New scheduling algorithm
4. Frontend redesign
5. Test scheduling edge cases

### Phase 4: Credit Budget (Day 4)
1. Add model fields/new model
2. Migration
3. BudgetAllocationService
4. Frontend AI Settings section
5. Test budget calculations

### Phase 5: UI Fixes (Day 5)
1. Fix GlobalProgressBar
2. Fix AutomationPage metrics
3. Fix RunDetail display
4. Add helpful tooltips
5. End-to-end testing

---

## Testing Checklist

### Automation Flow
- [ ] Manual run starts, pauses, resumes, completes
- [ ] Manual run cancels, lock released, new run can start
- [ ] Scheduled run starts on time
- [ ] Scheduled run skips if manual run paused
- [ ] Resume after 7+ hour pause works
- [ ] Per-run limits respected
- [ ] Remaining items processed in next run

### Publishing
- [ ] Stagger mode spreads correctly
- [ ] Time slot mode uses exact times
- [ ] Immediate mode publishes right away
- [ ] No items missed due to limits
- [ ] Queue shows accurate count

### Credits
- [ ] Budget allocation calculates correctly
- [ ] Site override works
- [ ] Global defaults work
- [ ] Estimation uses budget

### UI
- [ ] Progress bar accurate during run
- [ ] Metrics match database counts
- [ ] Run detail shows correct stage results
- [ ] Stopped reason displayed clearly

---

## Rollback Plan

If issues arise:
1. All changes are in separate migrations, so each can be rolled back individually
2. Feature flags for new behaviors (use_new_scheduling, use_budget_allocation)
3. Keep existing fields alongside new ones initially
4. Frontend changes are purely additive

---

## Success Criteria

1. **Zero lock issues** - Users never stuck unable to start automation
2. **100% scheduling** - All approved content gets scheduled
3. **Predictable runs** - Per-run limits produce consistent results
4. **Clear visibility** - UI shows exactly what's happening and why
5. **No regressions** - All existing functionality continues working