igny8/docs/10-BACKEND/automation/SCHEDULER.md

# Automation Scheduler

## Purpose
Describe how scheduled runs are detected, triggered, and resumed using Celery tasks and automation configs.

## Code Locations (exact paths)
- Celery tasks: `backend/igny8_core/business/automation/tasks.py`
- Models: `backend/igny8_core/business/automation/models.py`
- Service invoked: `backend/igny8_core/business/automation/services/automation_service.py`

## High-Level Responsibilities
- Periodically scan enabled automation configs to start scheduled runs.
- Prevent overlapping runs per site via cache locks and active run checks.
- Resume paused runs from their recorded stage.

## Detailed Behavior
- `check_scheduled_automations` (Celery, hourly):
  - Iterates `AutomationConfig` with `is_enabled=True`.
  - Frequency rules:
    - `daily`: run when current hour matches `scheduled_time.hour`.
    - `weekly`: run Mondays at the scheduled hour.
    - `monthly`: run on the 1st of the month at the scheduled hour.
  - Skips if `last_run_at` is within ~23 hours or if an `AutomationRun` with `status='running'` exists for the site.
  - On trigger: instantiates `AutomationService(account, site)`, calls `start_automation(trigger_type='scheduled')`, updates `last_run_at` and `next_run_at` (via `_calculate_next_run`), saves config, and enqueues `run_automation_task.delay(run_id)`.
  - Exceptions are logged per site; lock release is handled by the service on failure paths.
- `run_automation_task`:
  - Loads service via `from_run_id`, runs stages 1–7 sequentially.
  - On exception: marks run failed, records error/completed_at, and deletes site lock.
- `resume_automation_task` / alias `continue_automation_task`:
  - Loads service via `from_run_id`, uses `current_stage` to continue remaining stages.
  - On exception: marks run failed, records error/completed_at.
- `_calculate_next_run`:
  - Computes next run datetime based on frequency and `scheduled_time`, resetting seconds/microseconds; handles month rollover for monthly frequency.

## Data Structures / Models Involved (no code)
- `AutomationConfig`: contains schedule fields (`frequency`, `scheduled_time`, `last_run_at`, `next_run_at`, `is_enabled`).
- `AutomationRun`: records run status/stage used during resume/failure handling.

## Execution Flow
1) Celery beat (or cron) invokes `check_scheduled_automations` hourly.
2) Eligible configs spawn new runs via `AutomationService.start_automation` (includes lock + credit check).
3) `run_automation_task` executes the pipeline asynchronously.
4) Paused runs can be resumed by enqueueing `resume_automation_task`/`continue_automation_task`, which restart at `current_stage`.
5) Failures set run status to `failed` and release locks.

## Cross-Module Interactions
- Uses planner/writer data inside the pipeline (see pipeline doc); billing/credits enforced at start.
- Locking is done via Django cache, independent of other modules but prevents concurrent Celery runs per site.

## State Transitions
- Config timestamps (`last_run_at`, `next_run_at`) update on scheduled launch.
- Run status changes to `failed` on task exceptions; to `completed` at stage 7; to `paused/cancelled` via API.

## Error Handling
- Scheduled start is skipped with log messages if recently run or already running.
- Exceptions during run execution mark the run failed, record error message, set `completed_at`, and release the cache lock.

## Tenancy Rules
- Configs and runs are site- and account-scoped; scheduler uses stored account/site from the config; no cross-tenant scheduling.

## Billing Rules
- Start uses `AutomationService.start_automation`, which enforces credit sufficiency before scheduling the Celery execution.

## Background Tasks / Schedulers
- Hourly `check_scheduled_automations` plus the long-running `run_automation_task` and resume tasks run in Celery workers.

## Key Design Considerations
- Hourly scan with coarse matching keeps implementation simple while honoring per-site schedules.
- Cache lock and active-run checks prevent double-starts from overlapping schedules or manual triggers.
- Resume task reuses the same stage methods to keep behavior consistent between fresh and resumed runs.

## How Developers Should Work With This Module
- When adding new frequencies, extend `check_scheduled_automations` and `_calculate_next_run` consistently.
- Ensure Celery beat (or an equivalent scheduler) runs `check_scheduled_automations` hourly in production.
- Preserve lock acquisition and failure handling when modifying task flows to avoid orphaned locks.