# Automation Module (Code-Sourced, Dec 2025)

Single canonical reference for IGNY8 automation (backend, frontend, and runtime behavior). Replaces all prior automation docs in this folder.

---

## 1) What Automation Does
- Runs the 7-stage pipeline across Planner/Writer:
  1) Keywords → Clusters (AI)
  2) Clusters → Ideas (AI)
  3) Ideas → Tasks (Local)
  4) Tasks → Content (AI)
  5) Content → Image Prompts (AI)
  6) Image Prompts → Images (AI)
  7) Manual Review Gate (Manual)
- Per-site, per-account isolation. One run at a time per site; guarded by cache lock `automation_lock_{site_id}`.
- Scheduling via Celery beat (`automation.check_scheduled_automations`); execution via Celery tasks (`run_automation_task`, `resume_automation_task` / `continue_automation_task`).
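A minimal sketch of the one-run-per-site guard. It assumes a cache whose `add()` writes a key only when the key is absent, which is the semantics of Django's `cache.add`. `InMemoryCache`, `try_acquire_site_lock`, `release_site_lock`, and the one-hour timeout are hypothetical. Only the key format `automation_lock_{site_id}` comes from this doc, and this doc does not say whether AutomationService uses `add()` or some other cache primitive.

```python
import time
from typing import Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a shared cache; add() only sets a key if it is absent or expired."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[object, Optional[float]]] = {}

    def add(self, key: str, value: object, timeout: Optional[float] = None) -> bool:
        now = time.monotonic()
        existing = self._store.get(key)
        if existing is not None:
            _, expires_at = existing
            if expires_at is None or expires_at > now:
                return False  # key still held: refuse to overwrite it
        expires_at = None if timeout is None else now + timeout
        self._store[key] = (value, expires_at)
        return True

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def lock_key(site_id: int) -> str:
    # Key format documented above: automation_lock_{site_id}
    return f"automation_lock_{site_id}"


def try_acquire_site_lock(cache: InMemoryCache, site_id: int, run_id: str, timeout: float = 3600) -> bool:
    """Return True if this caller now owns the site's lock (timeout value is hypothetical)."""
    return cache.add(lock_key(site_id), run_id, timeout)


def release_site_lock(cache: InMemoryCache, site_id: int) -> None:
    cache.delete(lock_key(site_id))
```

Checking and claiming the lock in one add-if-absent call avoids the race that a separate get-then-set would allow. This in-memory stand-in is not atomic across processes. A real deployment relies on the shared cache backend for that guarantee.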

---

## 2) Backend API (behavior + payloads)
Base: `/api/v1/automation/` (auth required; site must belong to user’s account).

- `GET config?site_id=`: returns or creates config with enable flag, frequency (`daily|weekly|monthly`), scheduled_time, stage_1..6 batch sizes, delays (`within_stage_delay`, `between_stage_delay`), last_run_at, next_run_at.
- `PUT update_config?site_id=`: accepts the same fields as `config` and updates them in place.
- `POST run_now?site_id=`: starts a manual run by enqueuing `run_automation_task`. Fails if a run is already active or the site lock is held.
- `GET current_run?site_id=`: current running/paused run with status, current_stage, totals, and stage_1..7_result blobs (counts, credits, partial flags, skip reasons).
- `GET pipeline_overview?site_id=`: per-stage status counts and “pending” numbers for UI cards.
- `GET current_processing?site_id=&run_id=`: live processing snapshot for an active run; null if not running.
- `POST pause|resume|cancel?site_id=&run_id=`: pause after current item; resume from saved `current_stage`; cancel after current item and stamp cancelled_at/completed_at.
- `GET history?site_id=`: last 20 runs (id, status, trigger, timestamps, total_credits_used, current_stage).
- `GET logs?run_id=&lines=100`: tail of the per-run activity log written by AutomationLogger.
- `GET estimate?site_id=`: estimated_credits, current_balance, sufficient (balance >= 1.2x estimate).
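The endpoint list above can be turned into a small request helper. This is a client-side sketch, not IGNY8 code. `automation_request` and `METHODS` are hypothetical names. The paths follow this doc's notation literally (`config?site_id=` appended to the base), so confirm the router's trailing-slash convention before you rely on the exact strings.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

BASE_PATH = "/api/v1/automation/"

# HTTP method per endpoint, as listed in this section.
METHODS: Dict[str, str] = {
    "config": "GET", "update_config": "PUT", "run_now": "POST",
    "current_run": "GET", "pipeline_overview": "GET", "current_processing": "GET",
    "pause": "POST", "resume": "POST", "cancel": "POST",
    "history": "GET", "logs": "GET", "estimate": "GET",
}


def automation_request(action: str, **params: Optional[object]) -> Tuple[str, str]:
    """Return (method, path) for an automation endpoint; None-valued params are dropped."""
    if action not in METHODS:
        raise ValueError(f"unknown automation action: {action}")
    query = urlencode({k: v for k, v in params.items() if v is not None})
    path = BASE_PATH + action + (f"?{query}" if query else "")
    return METHODS[action], path
```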

Error behaviors:
- Missing site_id/run_id → 400.
- Site not in account → 404.
- Run not found → 404 on run-specific endpoints.
- Already running / lock held → 400 on run_now.
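A sketch of how the error rules above combine for `run_now`. The check order and the 200 success code are assumptions. The doc lists the error statuses but neither the precedence nor the success code. `run_now_status` is a hypothetical name.

```python
from typing import Optional


def run_now_status(site_id: Optional[int], site_in_account: bool, run_active: bool, lock_held: bool) -> int:
    """Map run_now preconditions to an HTTP status (order and success code are assumed)."""
    if site_id is None:
        return 400  # missing site_id
    if not site_in_account:
        return 404  # site does not belong to the caller's account
    if run_active or lock_held:
        return 400  # a run is already active or the site lock is held
    return 200  # hypothetical success code; the run is enqueued
```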

---

## 3) Data Model (runtime state)
- `AutomationConfig` (one per site): enable flag, schedule (frequency, time), batch sizes per stage (1–6), delays (within-stage, between-stage), last_run_at, next_run_at.
- `AutomationRun`: run_id, trigger_type (manual/scheduled), status (running/paused/cancelled/completed/failed), current_stage, timestamps (start/pause/resume/cancel/complete), total_credits_used, per-stage result JSON (stage_1_result … stage_7_result), error_message.
- Activity logs: one file per run via AutomationLogger; streamed through the `logs` endpoint.
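The runtime state can be pictured as two plain records. These dataclasses are hypothetical and are not the Django models. Field names such as `is_enabled` and `stage_results` are illustrative, timestamps other than last/next run are omitted, and the delay units are not stated in this doc. The enum values are the ones listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from enum import Enum
from typing import Any, Dict, Optional


class Frequency(str, Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


class RunStatus(str, Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELLED = "cancelled"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class AutomationConfigSketch:
    site_id: int
    is_enabled: bool
    frequency: Frequency
    scheduled_time: time
    batch_sizes: Dict[int, int]  # stage number (1..6) -> items per batch
    within_stage_delay: int  # units not stated in the doc
    between_stage_delay: int
    last_run_at: Optional[datetime] = None
    next_run_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Batch sizes exist only for stages 1-6; stage 7 is the manual gate.
        if set(self.batch_sizes) != set(range(1, 7)):
            raise ValueError("batch sizes must be given for stages 1-6")
        if any(size < 1 for size in self.batch_sizes.values()):
            raise ValueError("batch sizes must be >= 1")


@dataclass
class AutomationRunSketch:
    run_id: str
    trigger_type: str  # "manual" or "scheduled"
    status: RunStatus = RunStatus.RUNNING
    current_stage: int = 1
    total_credits_used: int = 0
    stage_results: Dict[int, Dict[str, Any]] = field(default_factory=dict)  # stage 1..7 -> result blob
    error_message: Optional[str] = None

    @property
    def is_active(self) -> bool:
        # current_run reports runs that are running or paused
        return self.status in (RunStatus.RUNNING, RunStatus.PAUSED)
```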

---

## 4) How Execution Works (AutomationService)
- Start: grabs cache lock `automation_lock_{site_id}`, estimates credits, enforces 1.2x balance check, creates AutomationRun and log file.
- AI functions used: Stage 1 `AutoClusterFunction`; Stage 2 `GenerateIdeasFunction`; Stage 4 `GenerateContentFunction`; Stage 5 `GenerateImagePromptsFunction`; Stage 6 uses `process_image_generation_queue` (not the partial `generate_images` AI function).
- Stage flow (per code):
  - Stage 1 Keywords → Clusters: require ≥5 keywords (validate_minimum_keywords); batch by config; AIEngine clustering; records keywords_processed, clusters_created, batches, credits, time; skips if insufficient keywords.
  - Stage 2 Clusters → Ideas: batch by config; AIEngine ideas; records ideas_created.
  - Stage 3 Ideas → Tasks: local conversion of queued ideas to tasks; batches by config; no AI.
  - Stage 4 Tasks → Content: batch by config; AIEngine content; records content count + word totals.
  - Stage 5 Content → Image Prompts: batch by config; AIEngine generates image prompts and writes them to Images records (featured + in-article).
  - Stage 6 Image Prompts → Images: uses `process_image_generation_queue` with provider/model from IntegrationSettings; updates Images status.
  - Stage 7 Manual Review Gate: records ready-for-review counts; no AI.
- Control: each stage checks `_check_should_stop` (paused/cancelled); saves partial progress (counts, credits) before returning; resume continues from `current_stage`.
- Credits: upfront estimate check (1.2x buffer) before starting; AIEngine per-call pre-checks and post-SAVE deductions; `total_credits_used` accumulates.
- Locks: acquired on start; cleared on completion or failure; also cleared on fatal errors in tasks.
- Errors: any unhandled exception marks run failed, sets error_message, logs error, clears lock; pipeline_overview/history reflect status.
- Stage result fields (persisted):
  - S1: keywords_processed, clusters_created, batches_run, credits_used, skipped/partial flags, time_elapsed.
  - S2: clusters_processed, ideas_created, batches_run, credits_used.
  - S3: ideas_processed, tasks_created, batches_run.
  - S4: tasks_processed, content_created, total_words, batches_run, credits_used.
  - S5: content_processed, prompts_created, batches_run, credits_used.
  - S6: images_processed, images_generated, batches_run.
  - S7: ready_for_review counts.
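The control flow above (stop checks between items, partial results saved before returning, resume from `current_stage`) can be sketched without Django or AI calls. Everything here is hypothetical: a stage is modeled as a consumable queue, and `make_queue_stage` and `run_pipeline` are illustrative names. The sketch assumes that re-running a stage on resume picks up only unprocessed items. In the real system that would follow from status-based selection, which this sketch imitates by popping finished items.

```python
from typing import Callable, Dict, List, Tuple

StopCheck = Callable[[], bool]
Stage = Callable[[StopCheck], Tuple[Dict, bool]]  # returns (result, stopped_early)


def make_queue_stage(queue: List[str], credits_per_item: int) -> Stage:
    """Hypothetical stage: process items one at a time, checking for stop before each."""
    def stage(should_stop: StopCheck) -> Tuple[Dict, bool]:
        done = 0
        while queue:
            if should_stop():
                # Paused/cancelled: save partial progress and return early.
                return {"processed": done, "credits_used": done * credits_per_item, "partial": True}, True
            queue.pop(0)  # the current item always finishes (no mid-item abort)
            done += 1
        return {"processed": done, "credits_used": done * credits_per_item, "partial": False}, False
    return stage


def run_pipeline(run: Dict, stages: List[Stage], should_stop: StopCheck) -> Dict:
    """Execute stages from run["current_stage"] onward, persisting each stage's result."""
    for number in range(run["current_stage"], len(stages) + 1):
        run["current_stage"] = number
        result, stopped = stages[number - 1](should_stop)
        run[f"stage_{number}_result"] = result
        run["total_credits_used"] += result.get("credits_used", 0)
        if stopped:
            return run  # a later resume re-enters this stage with the remaining items
    run["status"] = "completed"
    return run
```

Because `current_stage` is written before each stage starts, a resume call only needs the saved run record to know where to continue.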

Batching & delays:
- Configurable per site: stage_1..6 batch sizes set how many items each batch processes; `within_stage_delay` pauses between batches within a stage; `between_stage_delay` pauses between stages.
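A sketch of batching with the within-stage delay. It assumes the delay is in seconds and is applied only between batches, not after the last one. The doc does not state either detail. `split_batches` and `process_in_batches` are hypothetical names, and `sleep` is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_batches(items: Sequence[T], size: int) -> List[List[T]]:
    """Chunk items into lists of at most `size` elements."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def process_in_batches(
    items: Sequence[T],
    size: int,
    within_stage_delay: float,
    handle_batch: Callable[[List[T]], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process items batch by batch; return the number of batches run (batches_run)."""
    batches = split_batches(items, size)
    for index, batch in enumerate(batches):
        handle_batch(batch)
        if index < len(batches) - 1:
            sleep(within_stage_delay)  # pause only between batches
    return len(batches)
```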

Scheduling:
- `check_scheduled_automations` runs hourly; respects frequency/time and last_run_at (~23h guard); skips if a run is active; sets next_run_at; starts `run_automation_task`.
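A sketch of the per-site "is it due?" decision made on each hourly beat tick. It models only the enable flag, the scheduled hour, the ~23h guard, and the active-run skip. Weekly and monthly frequency rules are not described in this doc and are left out. Gating on the hour alone and checking the enable flag here are assumptions. `is_due` is a hypothetical name.

```python
from datetime import datetime, time, timedelta
from typing import Optional

MIN_GAP = timedelta(hours=23)  # the ~23h guard against double runs


def is_due(
    now: datetime,
    scheduled_time: time,
    last_run_at: Optional[datetime],
    run_active: bool,
    enabled: bool = True,
) -> bool:
    """Decide whether a site's automation should start on this hourly tick (frequency not modeled)."""
    if not enabled or run_active:
        return False  # disabled, or a run is already running/paused
    if now.hour != scheduled_time.hour:
        return False  # hourly beat: fire only during the scheduled hour
    if last_run_at is not None and now - last_run_at < MIN_GAP:
        return False  # ran too recently
    return True
```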

Celery execution:
- `run_automation_task` runs stages 1→7 sequentially for a run_id; failures mark run failed and clear lock.
- `resume_automation_task` / `continue_automation_task` continue from saved `current_stage`.
- Workers need access to cache (locks) and IntegrationSettings (models/providers).
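The failure contract of `run_automation_task` (mark failed, record the error, clear the lock) can be sketched as a plain wrapper. A real worker would register something like this as a Celery task (for example with `@shared_task`), but this sketch avoids Celery so it runs standalone. `run_automation_sketch`, `execute`, and `release_lock` are hypothetical. This doc does not say what happens to the lock on pause or cancel, so the sketch leaves it untouched in those cases.

```python
from typing import Callable, Dict


def run_automation_sketch(
    run: Dict,
    execute: Callable[[Dict], str],
    release_lock: Callable[[], None],
) -> Dict:
    """Run the pipeline; `execute` returns the final status ("completed", "paused", ...)."""
    try:
        final_status = execute(run)
    except Exception as exc:  # any unhandled error fails the run
        run["status"] = "failed"
        run["error_message"] = str(exc)
        release_lock()
        return run
    run["status"] = final_status
    if final_status == "completed":
        release_lock()  # lock is cleared on completion
    return run
```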

Image pipeline specifics:
- Stage 5 writes prompts to Images (featured + ordered in-article).
- Stage 6 generates images via queue helper; AI `generate_images` remains partial/broken and is not used by automation.
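A sketch of the hand-off between Stage 5 and Stage 6. Stage 5 produces one featured record plus ordered in-article records, all starting as `pending`, and Stage 6 picks up whatever is still `pending`. The record shape (`image_type`, `position`, `prompt`) and the `generated` status in the usage note are hypothetical. Only the featured/in-article split and the `pending` status come from this doc.

```python
from typing import Dict, List


def build_image_records(content_id: int, featured_prompt: str, in_article_prompts: List[str]) -> List[Dict]:
    """Hypothetical Stage 5 output: featured image first, then in-article images in order."""
    records = [{"content_id": content_id, "image_type": "featured", "position": 0,
                "prompt": featured_prompt, "status": "pending"}]
    for position, prompt in enumerate(in_article_prompts, start=1):
        records.append({"content_id": content_id, "image_type": "in_article", "position": position,
                        "prompt": prompt, "status": "pending"})
    return records


def pending_images(records: List[Dict]) -> List[Dict]:
    """Stage 6 input: image records still waiting for generation."""
    return [r for r in records if r["status"] == "pending"]
```

Once an image is generated and its status moves off `pending`, it drops out of the Stage 6 queue.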

---

## 5) Frontend Behavior (AutomationPage)
- Route: `/automation`.
- What the user can do: run now, pause, resume, cancel; edit config (enable/schedule, batch sizes, delays); view activity log; view history; watch live processing card and pipeline cards update.
- Polling: while a run is running or paused, refreshes current_run, pipeline_overview, metrics, and current_processing every ~5s; polls less often when idle.
- Metrics: fetched via low-level endpoints (keywords/clusters/ideas/tasks/content/images) for authoritative counts.
- States shown: running, paused, cancelled, failed, completed; processing card shown when a run exists; pipeline cards use “pending” counts from pipeline_overview.
- Activity log: pulled from `logs` endpoint; shown in UI for live tailing.

---

## 6) Configuration & Dependencies
- Needs IntegrationSettings for AI models and image providers (OpenAI/runware).
- Requires Celery beat and workers; cache backend required for locks.
- Tenant scoping everywhere: site + account filtering on all automation queries.

---

## 7) Known Limitations and Gaps
- `generate_images` AI function is partial/broken; automation uses queue helper instead.
- Pause/Cancel stop after the current item; no mid-item abort.
- Batch defaults are conservative (e.g., stage_2=1, stage_4=1); tune per site for throughput.
- Stage 7 is manual; no automated review step.
- No automated test suite observed for automation pipeline (stage transitions, pause/resume/cancel, scheduling guards, credit estimation/deduction).
- Enhancements to consider: fix or replace `generate_images`; add mid-item abort; surface lock status/owner; broaden batch defaults after validation; add operator-facing doc in app; add tests.

---

## 8) Field/Behavior Quick Tables

### Pipeline “pending” definitions (pipeline_overview)
- Stage 1: Keywords with status `new`, cluster is null, not disabled.
- Stage 2: Clusters status `new`, not disabled, with no ideas.
- Stage 3: ContentIdeas status `new`.
- Stage 4: Tasks status `queued`.
- Stage 5: Content status `draft` with zero images.
- Stage 6: Images status `pending`.
- Stage 7: Content status `review`.
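The "pending" rules above can be expressed as per-stage predicates over rows. These are in-memory stand-ins for the ORM filters. Row keys such as `cluster_id`, `disabled`, `idea_count`, and `image_count` are hypothetical, while the status values match the table.

```python
from typing import Callable, Dict, Iterable

Row = Dict[str, object]

# Stage number -> predicate deciding whether a row counts as pending for that stage.
PENDING_RULES: Dict[int, Callable[[Row], bool]] = {
    1: lambda kw: kw["status"] == "new" and kw.get("cluster_id") is None and not kw.get("disabled", False),
    2: lambda cl: cl["status"] == "new" and not cl.get("disabled", False) and cl.get("idea_count", 0) == 0,
    3: lambda idea: idea["status"] == "new",
    4: lambda task: task["status"] == "queued",
    5: lambda c: c["status"] == "draft" and c.get("image_count", 0) == 0,
    6: lambda img: img["status"] == "pending",
    7: lambda c: c["status"] == "review",
}


def pending_count(stage: int, rows: Iterable[Row]) -> int:
    """Count rows that the given stage would treat as pending."""
    rule = PENDING_RULES[stage]
    return sum(1 for row in rows if rule(row))
```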

### Stage result fields (stored on AutomationRun)
- S1: keywords_processed, clusters_created, batches_run, credits_used, skipped, partial, time_elapsed.
- S2: clusters_processed, ideas_created, batches_run, credits_used.
- S3: ideas_processed, tasks_created, batches_run.
- S4: tasks_processed, content_created, total_words, batches_run, credits_used.
- S5: content_processed, prompts_created, batches_run, credits_used.
- S6: images_processed, images_generated, batches_run.
- S7: ready_for_review.

### Credit handling
- Pre-run: estimate_credits * 1.2 vs account.credits (fails if insufficient).
- Per AI call: AIEngine pre-check credits; post-SAVE deduction with cost/tokens tracked; total_credits_used aggregates deductions.
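A sketch of the two credit checkpoints. The pre-run test is `balance >= 1.2 * estimate`, written here in integer arithmetic (`balance * 5 >= estimate * 6`) to avoid float rounding, on the assumption that credits are whole numbers. The per-call flow pre-checks an expected cost and deducts the actual cost only after the result is saved. `CreditLedger`, `run_ai_call`, and `InsufficientCredits` are hypothetical, not AIEngine's API.

```python
from typing import Callable, Tuple


class InsufficientCredits(Exception):
    pass


def has_sufficient_credits(estimate: int, balance: int) -> bool:
    """Pre-run check: balance >= 1.2 x estimate, computed as balance*5 >= estimate*6."""
    return balance * 5 >= estimate * 6


class CreditLedger:
    """Hypothetical account balance plus the run's total_credits_used."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.total_credits_used = 0


def run_ai_call(
    ledger: CreditLedger,
    expected_cost: int,
    call: Callable[[], Tuple[str, int]],
    save: Callable[[str], None],
) -> str:
    """Pre-check credits, run the call, save its output, then deduct the actual cost."""
    if ledger.balance < expected_cost:
        raise InsufficientCredits(f"need {expected_cost}, have {ledger.balance}")
    output, actual_cost = call()
    save(output)  # if saving fails, the exception propagates and nothing is deducted
    ledger.balance -= actual_cost
    ledger.total_credits_used += actual_cost
    return output
```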

### Logging
- Per-run log file via AutomationLogger; accessed with `GET logs?run_id=&lines=`; includes stage start/progress/errors and batch info.
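Serving `GET logs?run_id=&lines=` amounts to returning the last N lines of the run's file. A minimal sketch uses `collections.deque` with `maxlen`, which keeps only the trailing lines while streaming the file once. `tail_lines` and `tail_log_file` are hypothetical helpers, and this doc does not describe how AutomationLogger names or locates its files.

```python
from collections import deque
from typing import Iterable, List


def tail_lines(lines: Iterable[str], n: int = 100) -> List[str]:
    """Return the last n lines without trailing newlines (default mirrors lines=100)."""
    if n <= 0:
        return []
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def tail_log_file(path: str, n: int = 100) -> List[str]:
    """Read a per-run log file and return its tail."""
    with open(path, encoding="utf-8") as handle:
        return tail_lines(handle, n)
```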

### Polling (frontend)
- Active run: ~5s cadence for current_run, pipeline_overview, metrics, current_processing, logs tail.
- Idle: lighter polling (current_run/pipeline_overview) to show readiness and pending counts.
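The polling rules reduce to picking an interval and an endpoint set from the current run's status. The actual logic lives in the frontend AutomationPage. This Python sketch only mirrors the table above for consistency with the other examples. The 30-second idle interval is a hypothetical value, since the doc says only "lighter", and `metrics` stands for the low-level count endpoints rather than a single route.

```python
from typing import List, Optional, Tuple

ACTIVE_ENDPOINTS = ["current_run", "pipeline_overview", "metrics", "current_processing", "logs"]
IDLE_ENDPOINTS = ["current_run", "pipeline_overview"]
ACTIVE_INTERVAL_S = 5
IDLE_INTERVAL_S = 30  # hypothetical; the doc only says idle polling is lighter


def polling_plan(run_status: Optional[str]) -> Tuple[int, List[str]]:
    """Return (interval_seconds, endpoints) for the current run status."""
    if run_status in ("running", "paused"):
        return ACTIVE_INTERVAL_S, ACTIVE_ENDPOINTS
    return IDLE_INTERVAL_S, IDLE_ENDPOINTS
```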