Automation System Enhancement Plan
Created: January 17, 2026
Updated: January 17, 2026 (IMPLEMENTATION COMPLETE)
Status: ✅ ALL PHASES COMPLETE
Priority: 🔴 CRITICAL - Blocks Production Launch
Implementation Progress
✅ PHASE 1: Bug Fixes (COMPLETE)
- Bug #1: Cancel releases lock - views.py
- Bug #2: Scheduled check includes 'paused' - tasks.py
- Bug #3: Resume reacquires lock - tasks.py
- Bug #4: Resume has pause/cancel checks - tasks.py
- Bug #5: Pause logs to files - views.py
- Bug #6: Resume exception releases lock - tasks.py
✅ PHASE 2: Per-Run Item Limits (COMPLETE)
- Added 8 new fields to the AutomationConfig model: max_keywords_per_run, max_clusters_per_run, max_ideas_per_run, max_tasks_per_run, max_content_per_run, max_images_per_run, max_approvals_per_run, max_credits_per_run
- Migration: 0014_automation_per_run_limits.py
- Service: Updated automation_service.py with _get_per_run_limit(), _apply_per_run_limit(), _check_credit_budget()
- API: Updated config endpoints in views.py
✅ PHASE 3: Publishing Settings Overhaul (COMPLETE)
- Added scheduling modes: time_slots, stagger, immediate
- New fields: scheduling_mode, stagger_start_time, stagger_end_time, stagger_interval_minutes, queue_limit
- Migration: 0015_publishing_settings_overhaul.py
- Scheduler: Updated _calculate_available_slots() with three mode handlers
✅ PHASE 4: Credit % Allocation per AI Function (COMPLETE)
- New model: SiteAIBudgetAllocation in billing/models.py
- Default allocations: 15% clustering, 10% ideas, 40% content, 5% prompts, 30% images
- Migration: 0016_site_ai_budget_allocation.py
- API: New viewset at /api/v1/billing/sites/{site_id}/ai-budget/
✅ PHASE 5: UI Updates (COMPLETE)
- Updated the AutomationConfig interface in automationService.ts with the new per-run limit fields
- GlobalProgressBar already implements the correct calculation using initial_snapshot
Migrations To Run
cd /data/app/igny8/backend
python manage.py migrate
Files Modified
Backend
- backend/igny8_core/business/automation/views.py - Cancel releases lock, pause logs
- backend/igny8_core/business/automation/tasks.py - Resume fixes, scheduled check
- backend/igny8_core/business/automation/models.py - Per-run limit fields
- backend/igny8_core/business/automation/services/automation_service.py - Limit enforcement
- backend/igny8_core/business/integration/models.py - Publishing modes
- backend/igny8_core/business/billing/models.py - SiteAIBudgetAllocation
- backend/igny8_core/modules/billing/views.py - AI budget viewset
- backend/igny8_core/modules/billing/urls.py - AI budget route
- backend/igny8_core/modules/integration/views.py - Publishing serializer
- backend/igny8_core/tasks/publishing_scheduler.py - Scheduling modes
Frontend
- frontend/src/services/automationService.ts - Config interface updated
Migrations
- backend/migrations/0014_automation_per_run_limits.py
- backend/migrations/0015_publishing_settings_overhaul.py
- backend/migrations/0016_site_ai_budget_allocation.py
Executive Summary
This plan fixes critical automation bugs and introduces three major enhancements:
- Fix Critical Automation Bugs - Lock management, scheduled runs, logging
- Credit Budget Allocation - Configurable % per AI function
- Publishing Schedule Overhaul - Robust, predictable scheduling
- Per-Run Item Limits - Control throughput per automation run
Part 1: Critical Bug Fixes ✅ COMPLETE
🔴 BUG #1: Cancel Action Doesn't Release Lock
Location: backend/igny8_core/business/automation/views.py line ~1614
Current Code:
def cancel_automation(self, request):
    # ... existing code ...
    run.status = 'cancelled'
    run.cancelled_at = timezone.now()
    run.completed_at = timezone.now()
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])
    # ❌ MISSING: cache.delete(f'automation_lock_{run.site.id}')
Fix:
from django.core.cache import cache
from django.utils import timezone

from igny8_core.business.automation.services.automation_logger import AutomationLogger


def cancel_automation(self, request):
    # ... existing code ...
    now = timezone.now()
    run.status = 'cancelled'
    run.cancelled_at = now
    run.completed_at = now
    run.save(update_fields=['status', 'cancelled_at', 'completed_at'])

    # Release the lock so the user can start a new automation
    cache.delete(f'automation_lock_{run.site.id}')

    # Log the cancellation to the automation log files
    automation_logger = AutomationLogger()
    automation_logger.log_stage_progress(
        run.run_id, run.account.id, run.site.id, run.current_stage,
        "Automation cancelled by user",
    )
Impact: Users can start a new automation immediately after cancelling
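The lock behaviour this fix depends on can be checked without Django. In the sketch below, FakeCache is a hypothetical in-memory stand-in that models only add-if-absent and delete (no expiry), and start_run / cancel_run are illustrative names, not functions from the codebase. Django's real cache.add likewise returns False when the key already exists.

```python
class FakeCache:
    """Hypothetical stand-in for Django's cache: add/get/delete only, no expiry."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout=None):
        # Like cache.add: store only if the key is absent and report success
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def start_run(cache, site_id):
    """Return True if a new run may start (the site lock was acquired)."""
    return cache.add(f'automation_lock_{site_id}', 'locked', timeout=21600)


def cancel_run(cache, site_id, release_lock=True):
    """Cancel the active run; release_lock=True is the Bug #1 fix."""
    if release_lock:
        cache.delete(f'automation_lock_{site_id}')


cache = FakeCache()
assert start_run(cache, 7)
cancel_run(cache, 7, release_lock=False)  # old behaviour
print(start_run(cache, 7))  # False: blocked until the 6 h lock expires
cancel_run(cache, 7)  # fixed behaviour
print(start_run(cache, 7))  # True
```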
🔴 BUG #2: Scheduled Automation Doesn't Check 'paused' Status
Location: backend/igny8_core/business/automation/tasks.py line ~52
Current Code:
# Check if already running (inside the loop over scheduled configs)
if AutomationRun.objects.filter(site=config.site, status='running').exists():
    logger.info(f"[AutomationTask] Skipping site {config.site.id} - already running")
    continue
Fix:
# Check if already running OR paused (inside the loop over scheduled configs)
if AutomationRun.objects.filter(site=config.site, status__in=['running', 'paused']).exists():
    logger.info(
        f"[AutomationTask] Skipping site {config.site.id} - automation in progress (running/paused)"
    )
    continue
Impact: Prevents a scheduled run from starting while another run for the same site is paused
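A framework-free sketch of the corrected condition. should_skip_scheduled_run is a hypothetical helper that receives the statuses of a site's existing runs, standing in for the status__in query:

```python
# Statuses that mean a run for the site is still in progress
ACTIVE_STATUSES = frozenset({'running', 'paused'})


def should_skip_scheduled_run(existing_statuses):
    """Return True if any existing run for the site is running or paused.

    Before the fix only 'running' counted, so a paused run let the
    scheduler start a second run for the same site.
    """
    return any(status in ACTIVE_STATUSES for status in existing_statuses)


print(should_skip_scheduled_run(['completed', 'paused']))  # True (False before the fix)
print(should_skip_scheduled_run(['completed', 'cancelled']))  # False
```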
🔴 BUG #3: Resume Doesn't Reacquire Lock
Location: backend/igny8_core/business/automation/tasks.py line ~164
Current Code:
def resume_automation_task(self, run_id: str):
    service = AutomationService.from_run_id(run_id)
    # ❌ No lock check - could run unprotected after the 6 h lock expires
Fix:
from django.core.cache import cache

# AutomationRun, AutomationService and logger are module-level names in tasks.py


def resume_automation_task(self, run_id: str):
    """Resume paused automation run from current stage"""
    logger.info(f"[AutomationTask] Resuming automation run: {run_id}")
    try:
        run = AutomationRun.objects.get(run_id=run_id)

        # Verify run is actually in 'running' status (set by views.resume)
        if run.status != 'running':
            logger.warning(f"[AutomationTask] Run {run_id} status is {run.status}, not 'running'. Aborting resume.")
            return

        # Reacquire lock in case it expired during a long pause (timeout 21600 s = 6 h)
        lock_key = f'automation_lock_{run.site.id}'
        if not cache.add(lock_key, 'locked', timeout=21600):
            # Lock already exists. A different owner can only be detected if the
            # stored value identifies the run; the plain 'locked' value cannot.
            existing = cache.get(lock_key)
            if existing and existing != 'locked':
                logger.warning(f"[AutomationTask] Lock held by different run. Aborting resume for {run_id}")
                run.status = 'failed'
                run.error_message = 'Lock acquired by another run during pause'
                run.save(update_fields=['status', 'error_message'])
                return
            # Lock exists but may be ours - proceed cautiously

        service = AutomationService.from_run_id(run_id)
        # ... rest of processing with pause/cancel checks between stages ...
    except Exception:
        ...  # error handling elided; per Bug #6 it also releases the lock
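Because every run stores the same 'locked' value, the existing != 'locked' branch above cannot tell another run's lock apart from this run's. One option, assuming the start path were changed to store the owning run_id as the lock value, is sketched below. A plain dict stands in for the cache (no expiry), and try_reacquire is a hypothetical helper; with a real cache the absent-key branch must be a single atomic cache.add().

```python
def try_reacquire(store, site_id, run_id):
    """Reacquire the site lock for run_id.

    Returns 'acquired' if the lock had expired and was taken again,
    'owned' if it still belongs to run_id, or 'conflict' if another run holds it.
    Assumes the lock value is the owning run_id, not the 'locked' sentinel.
    """
    key = f'automation_lock_{site_id}'
    if key not in store:
        # With a real cache this check-and-set must be one atomic add()
        store[key] = run_id
        return 'acquired'
    if store[key] == run_id:
        return 'owned'
    return 'conflict'


locks = {}
print(try_reacquire(locks, 7, 'run-a'))  # acquired (lock had expired)
print(try_reacquire(locks, 7, 'run-a'))  # owned
print(try_reacquire(locks, 7, 'run-b'))  # conflict
```

A 'conflict' result would map onto the existing failure path (status 'failed' plus an error message).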
🔴 BUG #4: Resume Missing Pause/Cancel Checks Between Stages
Location: backend/igny8_core/business/automation/tasks.py line ~183
Current Code:
for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()
        # ❌ No pause/cancel check after each stage
Fix:
for stage in range(run.current_stage - 1, 7):
    if stage_enabled[stage]:
        stage_methods[stage]()
        # Check for pause/cancel AFTER each stage (same as run_automation_task)
        service.run.refresh_from_db()
        if service.run.status in ['paused', 'cancelled']:
            logger.info(f"[AutomationTask] Resumed automation {service.run.status} after stage {stage + 1}")
            return
    else:
        logger.info(f"[AutomationTask] Stage {stage + 1} is disabled, skipping")
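The control flow of the fixed loop, reduced to plain Python so it can be exercised in isolation. resume_stages is a hypothetical name, and get_status stands in for service.run.refresh_from_db() followed by reading the status:

```python
def resume_stages(current_stage, stage_enabled, run_stage, get_status):
    """Run stages current_stage..7 (1-based), stopping after a pause/cancel.

    stage_enabled: 7 booleans; run_stage(index) executes stage `index` (0-based);
    get_status() re-reads the run status after each executed stage.
    Returns the 1-based numbers of the stages that were executed.
    """
    executed = []
    for stage in range(current_stage - 1, 7):
        if not stage_enabled[stage]:
            continue  # disabled stages are skipped
        run_stage(stage)
        executed.append(stage + 1)
        # Re-check status after every stage, as in the fix above
        if get_status() in ('paused', 'cancelled'):
            break
    return executed


statuses = iter(['running', 'paused'])
print(resume_stages(3, [True] * 7, lambda i: None, lambda: next(statuses)))  # [3, 4]
```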
🟡 BUG #5: Pause Missing File Log Entry
Location: backend/igny8_core/business/automation/views.py pause action
Fix: Add a logging call:
from rest_framework.response import Response


def pause(self, request):
    # ... existing code ...
    service.pause_automation()

    # Log to automation files
    service.logger.log_stage_progress(
        service.run.run_id, service.account.id, service.site.id,
        service.run.current_stage, "Automation paused by user",
    )
    return Response({'message': 'Automation paused'})
Part 2: Credit Budget Allocation System
Overview
Add configurable credit % allocation per AI function. Users can:
- Use global defaults (configured by admin)
- Override with site-specific allocations
Database Changes
Extend CreditCostConfig model:
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class CreditCostConfig(models.Model):
    # ... existing fields ...

    # NEW: Budget allocation percentage
    budget_percentage = models.DecimalField(
        max_digits=5,
        decimal_places=2,
        default=0,
        validators=[MinValueValidator(0), MaxValueValidator(100)],
        help_text="Default % of credits allocated to this operation (0-100)",
    )
New SiteAIBudgetAllocation model:
class SiteAIBudgetAllocation(AccountBaseModel):
    """Site-specific credit budget allocation overrides"""

    site = models.OneToOneField(
        'igny8_core_auth.Site',
        on_delete=models.CASCADE,
        related_name='ai_budget_allocation',
    )
    use_global_defaults = models.BooleanField(
        default=True,
        help_text="Use global CreditCostConfig percentages",
    )

    # Per-operation overrides (only used when use_global_defaults=False)
    clustering_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    idea_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=10)
    content_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=40)
    image_prompt_extraction_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=5)
    image_generation_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=35)

    class Meta:
        db_table = 'igny8_site_ai_budget_allocations'
Service Changes
New BudgetAllocationService:
from decimal import Decimal


class BudgetAllocationService:
    @staticmethod
    def get_operation_budget(site, operation_type, total_credits):
        """
        Get credits allocated for an operation based on site settings.

        Args:
            site: Site instance
            operation_type: 'clustering', 'content_generation', etc.
            total_credits: Total credits available

        Returns:
            int: Credits allocated for this operation
        """
        allocation = SiteAIBudgetAllocation.objects.filter(site=site).first()

        if not allocation or allocation.use_global_defaults:
            # Use global CreditCostConfig percentages
            config = CreditCostConfig.objects.filter(
                operation_type=operation_type,
                is_active=True,
            ).first()
            percentage = config.budget_percentage if config else 0
        else:
            # Use site-specific override
            field_map = {
                'clustering': 'clustering_percentage',
                'idea_generation': 'idea_generation_percentage',
                'content_generation': 'content_generation_percentage',
                'image_prompt_extraction': 'image_prompt_extraction_percentage',
                'image_generation': 'image_generation_percentage',
            }
            field = field_map.get(operation_type)
            percentage = getattr(allocation, field, 0) if field else 0

        # Decimal arithmetic avoids mixing float credits with Decimal percentages
        return int(Decimal(total_credits) * Decimal(percentage) / 100)
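int() truncates, so each share is rounded down to a whole credit. The arithmetic in isolation, using Decimal and the per-site override defaults from the SiteAIBudgetAllocation model in this part (the implementation notes at the top list slightly different defaults); DEFAULT_SITE_ALLOCATION and operation_budget are hypothetical names:

```python
from decimal import ROUND_DOWN, Decimal

# Per-site override defaults from the SiteAIBudgetAllocation model in this plan
DEFAULT_SITE_ALLOCATION = {
    'clustering': Decimal('10'),
    'idea_generation': Decimal('10'),
    'content_generation': Decimal('40'),
    'image_prompt_extraction': Decimal('5'),
    'image_generation': Decimal('35'),
}


def operation_budget(total_credits, percentage):
    """Credits for one operation: total * pct / 100, rounded down to a whole credit."""
    amount = Decimal(total_credits) * Decimal(percentage) / Decimal(100)
    return int(amount.to_integral_value(rounding=ROUND_DOWN))


budgets = {op: operation_budget(1000, pct) for op, pct in DEFAULT_SITE_ALLOCATION.items()}
print(budgets)
# {'clustering': 100, 'idea_generation': 100, 'content_generation': 400,
#  'image_prompt_extraction': 50, 'image_generation': 350}
```

Because every share is rounded down, shares can sum to slightly less than the total: for 999 credits the same defaults give 99 + 99 + 399 + 49 + 349 = 995.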
Frontend Changes
Site Settings > AI Settings Tab:
- Add "Credit Budget Allocation" section
- Toggle: "Use Global Defaults" / "Custom Allocation"
- If custom: Show sliders for each operation (must sum to 100%)
- Visual pie chart showing allocation
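The sum-to-100% rule is probably worth enforcing server-side as well, so API clients cannot save an allocation that does not add up. A minimal sketch assuming the SiteAIBudgetAllocation field names above; allocation_errors is a hypothetical helper that could back a model clean() or a serializer validate():

```python
from decimal import Decimal

PERCENTAGE_FIELDS = (
    'clustering_percentage',
    'idea_generation_percentage',
    'content_generation_percentage',
    'image_prompt_extraction_percentage',
    'image_generation_percentage',
)


def allocation_errors(values):
    """Return a list of problems with a custom allocation (empty list = valid)."""
    errors = []
    for field in PERCENTAGE_FIELDS:
        pct = Decimal(str(values.get(field, 0)))
        if not Decimal(0) <= pct <= Decimal(100):
            errors.append(f'{field} must be between 0 and 100')
    total = sum(Decimal(str(values.get(f, 0))) for f in PERCENTAGE_FIELDS)
    if total != Decimal(100):
        errors.append(f'percentages must sum to 100 (got {total})')
    return errors


print(allocation_errors({
    'clustering_percentage': 15, 'idea_generation_percentage': 10,
    'content_generation_percentage': 40, 'image_prompt_extraction_percentage': 5,
    'image_generation_percentage': 35,
}))
# ['percentages must sum to 100 (got 105)']
```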
Part 3: Publishing Schedule Overhaul
Current Issues
- Limits are confusing - daily/weekly/monthly are treated as hard caps
- Items not getting scheduled (30% missed in last run)
- Time slot calculation doesn't account for stagger intervals
- No visibility into WHY items weren't scheduled
New Publishing Model
Replace PublishingSettings with an enhanced version:
import datetime

from django.db import models


# Callable defaults: Django warns when a JSONField default is a shared mutable list
def default_publish_days():
    return ['mon', 'tue', 'wed', 'thu', 'fri']


def default_publish_time_slots():
    return ['09:00', '14:00', '18:00']


class PublishingSettings(AccountBaseModel):
    site = models.OneToOneField('igny8_core_auth.Site', on_delete=models.CASCADE)

    # Auto-approval/publish toggles (keep existing)
    auto_approval_enabled = models.BooleanField(default=True)
    auto_publish_enabled = models.BooleanField(default=True)

    # NEW: Scheduling configuration (replaces hard limits)
    scheduling_mode = models.CharField(
        max_length=20,
        choices=[
            ('slots', 'Time Slots'),     # Publish at specific times
            ('stagger', 'Staggered'),    # Spread evenly throughout day
            ('immediate', 'Immediate'),  # Publish as soon as approved
        ],
        default='slots',
    )

    # Time slot configuration
    publish_days = models.JSONField(
        default=default_publish_days,
        help_text="Days allowed for publishing",
    )
    publish_time_slots = models.JSONField(
        default=default_publish_time_slots,
        help_text="Specific times for slot mode",
    )

    # Stagger mode configuration (time objects, so unsaved instances work with datetime.combine)
    stagger_start_time = models.TimeField(default=datetime.time(9, 0))
    stagger_end_time = models.TimeField(default=datetime.time(18, 0))
    stagger_interval_minutes = models.IntegerField(
        default=15,
        help_text="Minutes between publications in stagger mode",
    )

    # Daily TARGET (soft limit - for estimation, not blocking)
    daily_publish_target = models.IntegerField(
        default=3,
        help_text="Target articles per day (for scheduling spread)",
    )

    # Weekly/Monthly targets (informational only)
    weekly_publish_target = models.IntegerField(default=15)
    monthly_publish_target = models.IntegerField(default=50)

    # NEW: Maximum queue depth (actual limit)
    max_scheduled_queue = models.IntegerField(
        default=100,
        help_text="Maximum items that can be in 'scheduled' status at once",
    )
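With max_scheduled_queue as the only hard limit, the number of items that can be handed to the scheduler is simple arithmetic. A sketch with a hypothetical helper name; the figures (23 scheduled, limit 100) are taken from the Publishing tab mockup in this part:

```python
def schedulable_count(pending, currently_scheduled, max_scheduled_queue):
    """How many of `pending` items can be scheduled without exceeding the queue cap."""
    free = max(0, max_scheduled_queue - currently_scheduled)
    return min(pending, free)


print(schedulable_count(pending=40, currently_scheduled=23, max_scheduled_queue=100))  # 40
print(schedulable_count(pending=90, currently_scheduled=23, max_scheduled_queue=100))  # 77
```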
New Scheduling Algorithm
from datetime import datetime, timedelta

from django.utils import timezone

# Content is the project's content model; its import is omitted here.


def calculate_publishing_slots(settings, site, count_needed):
    """
    Calculate publishing slots with NO arbitrary limits.

    Returns:
        List of (datetime, slot_info) tuples
    """
    now = timezone.now()

    if settings.scheduling_mode == 'immediate':
        # Start now and space items one minute apart
        return [(now + timedelta(minutes=i), {'mode': 'immediate'}) for i in range(count_needed)]
    elif settings.scheduling_mode == 'stagger':
        # Spread throughout each allowed day
        return _calculate_stagger_slots(settings, site, count_needed, now)
    else:  # 'slots' mode
        return _calculate_time_slot_slots(settings, site, count_needed, now)


def _calculate_stagger_slots(settings, site, count_needed, now):
    """Stagger mode: spread publications evenly throughout publish hours."""
    slots = []
    day_map = {'mon': 0, 'tue': 1, 'wed': 2, 'thu': 3, 'fri': 4, 'sat': 5, 'sun': 6}
    allowed_days = {day_map[d] for d in settings.publish_days if d in day_map}
    # Guard against a 0 interval, which would loop forever
    interval_minutes = max(1, settings.stagger_interval_minutes)
    interval = timedelta(minutes=interval_minutes)
    # Local date: now.date() on an aware UTC datetime can be off by one day
    current_date = timezone.localtime(now).date()

    for day_offset in range(90):  # Look up to 90 days ahead
        check_date = current_date + timedelta(days=day_offset)
        if check_date.weekday() not in allowed_days:
            continue

        # Publish window for this day, in the current timezone
        day_start = timezone.make_aware(datetime.combine(check_date, settings.stagger_start_time))
        day_end = timezone.make_aware(datetime.combine(check_date, settings.stagger_end_time))

        # Times already taken by scheduled content on this day
        existing_times = set(
            Content.objects.filter(
                site=site,
                site_status='scheduled',
                scheduled_publish_at__date=check_date,
            ).values_list('scheduled_publish_at', flat=True)
        )

        current_slot = day_start
        if check_date == current_date and now > day_start:
            # Start from the next interval boundary after now
            minutes_since_start = (now - day_start).total_seconds() / 60
            intervals_passed = int(minutes_since_start // interval_minutes) + 1
            current_slot = day_start + intervals_passed * interval

        while current_slot <= day_end and len(slots) < count_needed:
            if current_slot not in existing_times:
                slots.append((current_slot, {'mode': 'stagger', 'date': str(check_date)}))
            current_slot += interval

        if len(slots) >= count_needed:
            break

    return slots
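_calculate_time_slot_slots is called above but not shown. A framework-free sketch of what slot mode could do, assuming the model's publish_days and publish_time_slots formats and that already-taken times are passed in (the Django version would query Content the way the stagger function does); time_slot_slots is a hypothetical name:

```python
from datetime import datetime, time, timedelta, timezone

DAY_MAP = {'mon': 0, 'tue': 1, 'wed': 2, 'thu': 3, 'fri': 4, 'sat': 5, 'sun': 6}


def time_slot_slots(now, publish_days, time_slots, existing_times, count_needed, days_ahead=90):
    """Return up to count_needed future datetimes at the configured HH:MM slots.

    `now` must be timezone-aware; slots are built in now's timezone.
    `existing_times` holds datetimes already taken by scheduled content.
    """
    if count_needed <= 0:
        return []
    allowed = {DAY_MAP[d] for d in publish_days if d in DAY_MAP}
    slot_times = sorted(time.fromisoformat(s) for s in time_slots)
    slots = []
    for offset in range(days_ahead):
        day = now.date() + timedelta(days=offset)
        if day.weekday() not in allowed:
            continue
        for t in slot_times:
            candidate = datetime.combine(day, t, tzinfo=now.tzinfo)
            if candidate <= now or candidate in existing_times:
                continue  # in the past or already taken
            slots.append(candidate)
            if len(slots) == count_needed:
                return slots
    return slots


now = datetime(2026, 1, 16, 15, 0, tzinfo=timezone.utc)  # a Friday afternoon
for slot in time_slot_slots(now, ['mon', 'tue', 'wed', 'thu', 'fri'], ['09:00', '14:00', '18:00'], set(), 3):
    print(slot.isoformat())
# 2026-01-16T18:00:00+00:00, then Monday 2026-01-19 at 09:00 and 14:00
```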
Frontend Changes
Site Settings > Publishing Tab - Redesign:
┌─────────────────────────────────────────────────────────────────┐
│ Publishing Schedule │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Auto-Approval: [✓] Automatically approve content │
│ Auto-Publish: [✓] Automatically publish approved content │
│ │
│ ─── Scheduling Mode ─── │
│ ○ Time Slots - Publish at specific times each day │
│ ● Staggered - Spread evenly throughout publish hours │
│ ○ Immediate - Publish as soon as approved │
│ │
│ ─── Stagger Settings ─── │
│ Start Time: [09:00] End Time: [18:00] │
│ Interval: [15] minutes between publications │
│ │
│ ─── Publish Days ─── │
│ [✓] Mon [✓] Tue [✓] Wed [✓] Thu [✓] Fri [ ] Sat [ ] Sun │
│ │
│ ─── Targets (for estimation) ─── │
│ Daily: [3] Weekly: [15] Monthly: [50] │
│ │
│ ─── Current Queue ─── │
│ 📊 23 items scheduled │ Queue limit: 100 │
│ │
└─────────────────────────────────────────────────────────────────┘
Part 4: Per-Run Item Limits
Overview
Allow users to limit how many items are processed per automation run. This enables:
- Balancing content production with publishing capacity
- Predictable credit usage per run
- Gradual pipeline processing
Database Changes
Extend AutomationConfig:
class AutomationConfig(models.Model):
    # ... existing fields ...

    # NEW: Per-run limits (0 = unlimited)
    max_keywords_per_run = models.IntegerField(
        default=0,
        help_text="Max keywords to cluster per run (0=unlimited)",
    )
    max_clusters_per_run = models.IntegerField(
        default=0,
        help_text="Max clusters to generate ideas for per run (0=unlimited)",
    )
    max_ideas_per_run = models.IntegerField(
        default=0,
        help_text="Max ideas to convert to tasks per run (0=unlimited)",
    )
    max_tasks_per_run = models.IntegerField(
        default=0,
        help_text="Max tasks to generate content for per run (0=unlimited)",
    )
    max_content_per_run = models.IntegerField(
        default=0,
        help_text="Max content to extract image prompts for per run (0=unlimited)",
    )
    max_images_per_run = models.IntegerField(
        default=0,
        help_text="Max images to generate per run (0=unlimited)",
    )
    max_approvals_per_run = models.IntegerField(
        default=0,
        help_text="Max content to auto-approve per run (0=unlimited)",
    )
Service Changes
Modify stage methods to respect limits:
def run_stage_1(self):
    """Stage 1: Keywords → Clusters"""
    # ... existing setup ...

    # Apply per-run limit (0 = unlimited)
    max_per_run = self.config.max_keywords_per_run
    if max_per_run > 0:
        pending_keywords = pending_keywords[:max_per_run]
        self.logger.log_stage_progress(
            self.run.run_id, self.account.id, self.site.id,
            1, f"Per-run limit: Processing up to {max_per_run} keywords",
        )

    total_count = pending_keywords.count()
    # ... rest of processing ...
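Every stage repeats the same 0-means-unlimited rule, so it can live in one helper. A sketch with a hypothetical name (the implemented _apply_per_run_limit() in automation_service.py may differ in signature and behaviour):

```python
def apply_per_run_limit(items, limit):
    """Return at most `limit` items; a limit of 0 (the model default) means unlimited.

    Works for lists and for Django querysets, where slicing adds a LIMIT.
    """
    if limit < 0:
        raise ValueError('per-run limit cannot be negative')
    return items if limit == 0 else items[:limit]


print(apply_per_run_limit(['kw1', 'kw2', 'kw3'], 2))  # ['kw1', 'kw2']
print(apply_per_run_limit(['kw1', 'kw2', 'kw3'], 0))  # ['kw1', 'kw2', 'kw3']
```

Slicing a Django queryset does not hit the database, but the sliced queryset can no longer be filtered, so any .filter() calls should come before the limit is applied.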
Frontend Changes
Automation Settings Panel - Enhanced:
┌─────────────────────────────────────────────────────────────────┐
│ Per-Run Limits │
│ Control how much is processed in each automation run │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Stage 1: Keywords → Clusters │
│ [ 50 ] keywords per run │ Current pending: 150 │
│ ⚡ Will take ~3 runs to process all │
│ │
│ Stage 2: Clusters → Ideas │
│ [ 10 ] clusters per run │ Current pending: 25 │
│ │
│ Stage 3: Ideas → Tasks │
│ [ 0 ] (unlimited) │ Current pending: 30 │
│ │
│ Stage 4: Tasks → Content │
│ [ 5 ] tasks per run │ Current pending: 30 │
│ 💡 Tip: Match with daily publish target for balanced flow │
│ │
│ Stage 5: Content → Image Prompts │
│ [ 5 ] content per run │ Current pending: 10 │
│ │
│ Stage 6: Image Prompts → Images │
│ [ 20 ] images per run │ Current pending: 50 │
│ │
│ Stage 7: Review → Approved │
│ [ 5 ] approvals per run│ Current in review: 15 │
│ ⚠️ Limited by publishing schedule capacity │
│ │
└─────────────────────────────────────────────────────────────────┘
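The "~3 runs" hint is a ceiling division of pending items by the per-run limit. A sketch with a hypothetical helper name, using the mockup's numbers:

```python
def runs_to_clear(pending, per_run_limit):
    """Runs needed to process `pending` items at `per_run_limit` items per run.

    A limit of 0 means unlimited, so one run suffices (none if nothing is pending).
    """
    if pending <= 0:
        return 0
    if per_run_limit == 0:
        return 1
    # Integer ceiling division, avoiding float rounding
    return (pending + per_run_limit - 1) // per_run_limit


print(runs_to_clear(150, 50))  # 3 (Stage 1 in the mockup)
print(runs_to_clear(30, 5))  # 6 (Stage 4 in the mockup)
```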
Part 5: UI/UX Fixes
Automation Dashboard Issues
- Wrong metrics display - Fix counts to show accurate pipeline state
- Confusing progress bars - Use consistent calculation
- Missing explanations - Add tooltips explaining each metric
Run Detail Page Issues
- Stage results showing wrong data - Fix JSON field mapping
- Missing "items remaining" after partial run - Calculate from initial_snapshot
- No clear indication of WHY run stopped - Show stopped_reason prominently
Fixes
GlobalProgressBar.tsx - Fix progress calculation:
// Use initial_snapshot as denominator, stage results as numerator
const calculateGlobalProgress = (run: AutomationRun): number => {
  if (!run.initial_snapshot) return 0;
  const total = run.initial_snapshot.total_initial_items || 0;
  if (total === 0) return 0;

  let processed = 0;
  processed += run.stage_1_result?.keywords_processed || 0;
  processed += run.stage_2_result?.clusters_processed || 0;
  processed += run.stage_3_result?.ideas_processed || 0;
  processed += run.stage_4_result?.tasks_processed || 0;
  processed += run.stage_5_result?.content_processed || 0;
  processed += run.stage_6_result?.images_processed || 0;
  processed += run.stage_7_result?.approved_count || 0;

  return Math.min(100, Math.round((processed / total) * 100));
};
Implementation Order
Phase 1: Critical Bug Fixes (Day 1)
- ✅ Cancel releases lock
- ✅ Scheduled check includes 'paused'
- ✅ Resume reacquires lock
- ✅ Resume has pause/cancel checks
- ✅ Pause logs to files
Phase 2: Per-Run Limits (Day 2)
- Add model fields to AutomationConfig
- Migration
- Update automation_service.py stage methods
- Frontend settings panel
- Test with small limits
Phase 3: Publishing Overhaul (Day 3)
- Update PublishingSettings model
- Migration
- New scheduling algorithm
- Frontend redesign
- Test scheduling edge cases
Phase 4: Credit Budget (Day 4)
- Add model fields/new model
- Migration
- BudgetAllocationService
- Frontend AI Settings section
- Test budget calculations
Phase 5: UI Fixes (Day 5)
- Fix GlobalProgressBar
- Fix AutomationPage metrics
- Fix RunDetail display
- Add helpful tooltips
- End-to-end testing
Testing Checklist
Automation Flow
- Manual run starts, pauses, resumes, completes
- Manual run cancels, lock released, new run can start
- Scheduled run starts on time
- Scheduled run skips if manual run paused
- Resume after 7+ hour pause works
- Per-run limits respected
- Remaining items processed in next run
Publishing
- Stagger mode spreads correctly
- Time slot mode uses exact times
- Immediate mode publishes right away
- No items missed due to limits
- Queue shows accurate count
Credits
- Budget allocation calculates correctly
- Site override works
- Global defaults work
- Estimation uses budget
UI
- Progress bar accurate during run
- Metrics match database counts
- Run detail shows correct stage results
- Stopped reason displayed clearly
Rollback Plan
If issues arise:
- All changes in separate migrations - can rollback individually
- Feature flags for new behaviors (use_new_scheduling, use_budget_allocation)
- Keep existing fields alongside new ones initially
- Frontend changes are purely additive
Success Criteria
- Zero lock issues - Users never stuck unable to start automation
- 100% scheduling - All approved content gets scheduled
- Predictable runs - Per-run limits produce consistent results
- Clear visibility - UI shows exactly what's happening and why
- No regressions - All existing functionality continues working