docs re-org

2025-12-09 13:26:35 +00:00
parent 4d13a57068
commit 6a4f95c35a
231 changed files with 11353 additions and 31152 deletions
--- a/docs/10-BACKEND/automation/AUTOMATION-REFERENCE.md
+++ b/docs/10-BACKEND/automation/AUTOMATION-REFERENCE.md
@@ -0,0 +1,91 @@
+# Automation Module Reference
+
+## Purpose
+Document how the automation module orchestrates multi-stage AI pipelines, exposes API endpoints, enforces tenancy/credits, and manages runs, configs, and logging.
+
+## Code Locations (exact paths)
+- Models: `backend/igny8_core/business/automation/models.py`
+- Services: `backend/igny8_core/business/automation/services/automation_service.py`, `automation_logger.py`
+- Tasks (Celery): `backend/igny8_core/business/automation/tasks.py`
+- API views and routing: `backend/igny8_core/business/automation/views.py`, `urls.py`
+- Supporting AI functions: `backend/igny8_core/ai/functions/auto_cluster.py`, `generate_ideas.py`, `generate_content.py`, `generate_image_prompts.py`, image queue in `backend/igny8_core/ai/tasks.py`
+- Tenancy/auth context: `backend/igny8_core/auth/middleware.py`, `backend/igny8_core/api/base.py`
+
+## High-Level Responsibilities
+- Maintain per-site automation configs (batch sizes, delays, schedule, enable flag) and track run state with detailed per-stage results.
+- Provide APIs to configure, trigger, pause/resume/cancel, inspect, and log automation runs.
+- Execute seven sequential stages that transform planner/writer data via AI and local operations, with credit checks and pause/cancel handling.
+- Enforce tenant/site scoping on all automation resources and API operations.
+
+## Detailed Behavior
+- `AutomationConfig` stores enablement, frequency, scheduled time, batch sizes for stages 1–6, and within/between-stage delays. Config is created lazily per site.
+- `AutomationRun` captures run metadata: trigger type (manual/scheduled), status (`running/paused/cancelled/completed/failed`), current stage, pause/cancel timestamps, per-stage JSON results, total credits used, and error message.
+- `AutomationService` orchestrates the pipeline:
+  - Locks per site via cache (`automation_lock_{site.id}`) to prevent concurrent runs.
+  - Estimates credits before start and requires `Account.credits` to cover the estimate plus a 20% buffer (at least 1.2× the estimate).
+  - Creates `AutomationRun` with generated `run_id` and logs start via `AutomationLogger`.
+  - Executes stages in order; each stage logs start/progress/complete, applies within/between-stage delays from config, and writes stage result JSON (counts, credits, timestamps, partial flags).
+  - Pause/cancel checks occur inside loops; state is persisted so resumed runs continue from the recorded stage.
+  - Stage credit usage is derived from the difference in AI task log counts before and after the stage.
+- API layer (`AutomationViewSet`):
+  - `config`/`update_config` read/write `AutomationConfig` for a given `site_id` (scoped to the user’s account).
+  - `run_now` triggers `AutomationService.start_automation` and enqueues Celery `run_automation_task`.
+  - `current_run`, `history`, `logs`, `current_processing`, `estimate`, `pipeline_overview` expose run status, history, logs, credit estimates, and per-stage pending counts.
+  - `pause`, `resume`, `cancel` endpoints update run status and enqueue resume tasks when needed.
+- Celery tasks:
+  - `check_scheduled_automations` scans enabled configs hourly and triggers a run when the frequency and scheduled time match and no recent run exists.
+  - `run_automation_task` performs full pipeline execution.
+  - `resume_automation_task`/`continue_automation_task` continue a paused run from its recorded stage.
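
The per-site lock described above can be sketched roughly as follows. This is an illustration only, not the code in `automation_service.py`: `InMemoryCache`, `try_acquire_site_lock`, `release_site_lock`, and the six-hour timeout are hypothetical names and values chosen for the example. Only the key format `automation_lock_{site.id}` comes from this reference. The stand-in cache mimics an atomic "add if absent" operation, which has the same shape as Django's `cache.add(key, value, timeout)`.

```python
import threading
import time


class InMemoryCache:
    """Hypothetical stand-in for a cache backend with add/delete semantics."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value, timeout):
        """Store value only if key is absent or expired; return True if stored."""
        now = time.monotonic()
        with self._guard:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False
            self._data[key] = (value, now + timeout)
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


def lock_key(site_id):
    # Key format taken from this reference: automation_lock_{site.id}
    return f"automation_lock_{site_id}"


def try_acquire_site_lock(cache, site_id, timeout=6 * 60 * 60):
    # The 6-hour timeout is an assumption made for this sketch.
    return cache.add(lock_key(site_id), "locked", timeout)


def release_site_lock(cache, site_id):
    cache.delete(lock_key(site_id))
```

A second acquisition attempt for the same site returns `False` until the lock is released or expires, which is the behaviour that blocks concurrent runs.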
+
+## Data Structures / Models Involved (no code)
+- `AutomationConfig`, `AutomationRun` (automation state).
+- Planner models: `Keywords`, `Clusters`, `ContentIdeas`.
+- Writer models: `Tasks`, `Content`, `Images`.
+- AI task log (`AITaskLog`) for credit usage measurement.
+- Tenancy entities: `Account`, `Site` (scoping every query).
+
+## Execution Flow
+- API call → DRF auth → tenant/site resolved → viewset method → `AutomationService` operations → Celery task (for long-running execution).
+- Pipeline stages run in-process inside Celery workers, reading planner/writer data, invoking AI functions, updating models, logging progress, and writing stage results to `AutomationRun`.
+- Completion (or failure) updates run status and releases the site lock.
+
+## Cross-Module Interactions
+- Planner/writer models supply inputs and receive outputs (clusters, ideas, tasks, content, images).
+- AI engine executes clustering, idea generation, content generation, and image prompt generation; image rendering uses the AI image queue.
+- Billing credits are checked against `Account.credits`; credit usage is inferred from AI task logs (deduction logic handled in billing services when those AI calls occur).
+- Integration/publishing modules consume content/images produced downstream (outside automation).
+
+## State Transitions
+- Run status moves through `running` → (`paused`/`cancelled`/`failed`/`completed`); `current_stage` increments after each stage finishes; partial flags and timestamps mark mid-stage exits.
+- Config changes take effect on the next run; pause/resume toggles update run timestamps.
+
+## Error Handling
+- Start blocks if a run is already active for the site or cache lock is held.
+- Stage loops log and continue on per-batch/item errors; pause/cancel results are persisted mid-stage.
+- Failures in Celery run mark `AutomationRun` as failed, store error message, timestamp completion, and release the lock.
+- API endpoints return 400 for missing params or invalid state transitions, 404 for unknown runs, 500 on unexpected errors.
+
+## Tenancy Rules
+- All automation queries filter by `site` tied to the authenticated user’s `account`; config/run creation sets `account` and `site` explicitly.
+- API endpoints fetch `Site` with `account=request.user.account`; automation locks are per site.
+- No cross-tenant access; privileged role bypass is handled by DRF auth/permissions upstream.
+
+## Billing Rules
+- Start requires `Account.credits` ≥ 1.2× estimated credits; otherwise a 400 is returned.
+- Credits actually deducted by AI tasks are reflected via AI task logs and billing services (outside this module); automation aggregates usage per stage in `AutomationRun`.
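
As a worked example of the 1.2× rule, the check might look like the sketch below. The function name `has_sufficient_credits` is hypothetical, and the sketch assumes credits are whole numbers; the comparison is written with integers so that no floating-point rounding is involved.

```python
def has_sufficient_credits(account_credits: int, estimated_credits: int) -> bool:
    """Return True when credits cover the estimate plus a 20% buffer."""
    # credits >= 1.2 * estimate, rewritten with integers as 5 * credits >= 6 * estimate
    return 5 * account_credits >= 6 * estimated_credits
```

With an estimate of 100 credits, an account needs at least 120 credits to start a run; 119 would be rejected.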
+
+## Background Tasks / Schedulers
+- Hourly `check_scheduled_automations` respects config frequency/time and last run; skips if a run is already active.
+- Pipeline execution and resume steps run inside Celery tasks; within-stage sleeps apply delays from config.
+
+## Key Design Considerations
+- Single-run-per-site enforced via cache lock to prevent overlapping credit use or data contention.
+- Pause/resume/cancel is cooperative, checked inside stage loops, with partial results persisted.
+- Stage-by-stage logging and result JSON make pipeline progress observable and resumable.
+- Configurable batch sizes and delays balance throughput and API/credit usage.
+
+## How Developers Should Work With This Module
+- Use `AutomationService.start_automation` for new runs; never bypass the cache lock or credit check.
+- When extending stages, preserve pause/cancel checks, result recording, and credit delta calculation.
+- Add new API actions through `AutomationViewSet` if they manipulate automation state; keep site/account scoping.
+- For new schedulers, reuse the lock pattern and `AutomationConfig` fields, and update `next_run_at` appropriately.
--- a/docs/10-BACKEND/automation/PIPELINE-STAGES.md
+++ b/docs/10-BACKEND/automation/PIPELINE-STAGES.md
@@ -0,0 +1,102 @@
+# Automation Pipeline Stages
+
+## Purpose
+Detail the seven pipeline stages executed by `AutomationService`, including inputs, queries, validations, delays, credit handling, and state recording.
+
+## Code Locations (exact paths)
+- Orchestration: `backend/igny8_core/business/automation/services/automation_service.py`
+- Models: `backend/igny8_core/business/automation/models.py`
+- AI functions: `backend/igny8_core/ai/functions/auto_cluster.py`, `generate_ideas.py`, `generate_content.py`, `generate_image_prompts.py`
+- Image queue: `backend/igny8_core/ai/tasks.py` (`process_image_generation_queue`)
+- Stage entrypoints: `backend/igny8_core/business/automation/tasks.py` (Celery `run_automation_task`, `resume_automation_task`)
+
+## High-Level Responsibilities
+- Execute a fixed seven-stage sequence that moves data from planner keywords through content with images and into manual review readiness.
+- Enforce batch sizes/delays from `AutomationConfig`, support pause/cancel, and write per-stage results into `AutomationRun`.
+- Track credit deltas per stage using AI task log counts.
+
+## Detailed Behavior
+Across all stages:
+- Each stage logs start/progress/complete via `AutomationLogger`, respects `within_stage_delay` between batches/items, and `between_stage_delay` between stages.
+- Pause/cancel is checked inside loops; on pause/cancel, the stage records partial counts, credits, elapsed time, and reason, then exits.
+- Credits used per stage are computed from `AITaskLog` count delta relative to stage start.
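
The shared stage behaviour above (batching, cooperative pause/cancel, continue-on-error, credit delta from log counts) could be sketched as below. This is a simplified model, not the service's code: `run_stage` and its parameters are hypothetical, `stop_reason` stands in for the pause/cancel check, and `count_logs` stands in for an `AITaskLog` count query.

```python
import time


def run_stage(items, batch_size, process_batch, stop_reason, count_logs,
              within_stage_delay=0.0):
    """Process items in batches and return a stage result dict."""
    started = time.monotonic()
    logs_before = count_logs()  # baseline for the credit delta
    result = {"processed": 0, "batches": 0, "errors": 0, "partial": False}
    step = max(1, batch_size)
    for start in range(0, len(items), step):
        reason = stop_reason()  # None, "paused" or "cancelled"
        if reason:
            # Record a partial result and exit the stage early.
            result["partial"] = True
            result["stop_reason"] = reason
            break
        if start > 0 and within_stage_delay:
            time.sleep(within_stage_delay)  # delay between batches
        batch = items[start:start + step]
        try:
            process_batch(batch)
            result["processed"] += len(batch)
        except Exception:
            # Continue-on-error: count the failure and move to the next batch.
            result["errors"] += 1
        result["batches"] += 1
    result["credits_used"] = count_logs() - logs_before
    result["time_elapsed"] = time.monotonic() - started
    return result
```

Checking `stop_reason` at the top of each iteration means a pause takes effect between batches, never in the middle of one, which is what makes the partial counts reliable.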
+
+### Stage 1: Keywords → Clusters (AI)
+- Input query: `Keywords` where `site=current`, `status='new'`, `cluster__isnull=True`, `disabled=False`.
+- Validation: `validate_minimum_keywords` requires at least 5 keywords; if not valid, stage is skipped with result noting skip reason and `current_stage` advances to 2.
+- Processing: Batch size = `stage_1_batch_size` (capped to total). For each batch, calls `AIEngine.execute(AutoClusterFunction, payload={'ids': batch})`; waits on task ID; logs per-batch progress. Errors are logged and skipped; pipeline continues.
+- Result: counts keywords processed, clusters created since run start, batches, credits used, time elapsed; sets `current_stage=2`.
+
+### Stage 2: Clusters → Ideas (AI)
+- Pre-check: warns if any `Keywords` still pending from Stage 1.
+- Input query: `Clusters` where `site=current`, `status='new'`, `disabled=False`.
+- Processing: Iterates clusters one-by-one; for each, calls `AIEngine.execute(GenerateIdeasFunction, payload={'cluster_id': cluster.id})`; waits on task ID; logs progress. Errors are logged and skipped.
+- Result: counts clusters processed, ideas created since run start, credits used, time elapsed; sets `current_stage=3`.
+
+### Stage 3: Ideas → Tasks (Local)
+- Pre-check: warns if clusters remain without ideas.
+- Input query: `ContentIdeas` where `site=current`, `status='new'`.
+- Processing: Batched by `stage_3_batch_size`. For each idea, builds keyword string (M2M keywords or `target_keywords`) and creates `Tasks` with queued status, copying account/site/sector, cluster, content type/structure, and description. Idea status set to `queued`.
+- Result: ideas processed, tasks created, batches, time elapsed (credits 0 because local); sets `current_stage=4`.
+
+### Stage 4: Tasks → Content (AI)
+- Pre-check: warns if `ContentIdeas` remain `new`.
+- Input query: `Tasks` where `site=current`, `status='queued'`.
+- Processing: Batched by `stage_4_batch_size`. Uses `GenerateContentFunction` via `AIEngine` per batch (payload contains task IDs). Waits on task IDs, logs progress, continues on errors. Tracks total words by summing generated content word_count.
+- Result: tasks processed, content created count, total_words, credits used, time elapsed; sets `current_stage=5`.
+
+### Stage 5: Content → Image Prompts (AI)
+- Input query: `Content` where `site=current`, `status='draft'`, with zero images (annotated count=0).
+- Processing: Batched by `stage_5_batch_size`. For each batch, calls `GenerateImagePromptsFunction` via `AIEngine` (payload content IDs). Waits on task IDs, logs progress; continues on errors.
+- Result: content processed, prompts created (from AI task logs), credits used, time elapsed; sets `current_stage=6`.
+
+### Stage 6: Image Prompts → Images (AI image queue)
+- Input query: `Images` where `site=current`, `status='pending'`.
+- Processing: Iterates pending images; for each, enqueues `process_image_generation_queue.delay(image_ids=[id], account_id, content_id)` when Celery is available, or calls directly in sync fallback. Waits on task IDs with continue-on-error to avoid blocking the stage. Logs progress per image; applies within-stage delay between images.
+- Result: images processed, images generated (status `generated` since run start), content moved to `review`, credits used, time elapsed; sets `current_stage=7`.
+
+### Stage 7: Manual Review Gate (Count-only)
+- Input query: `Content` where `site=current`, `status='review'`.
+- Processing: Counts review-ready content, logs IDs (truncated), marks run `status='completed'`, sets `completed_at`, and releases the site lock.
+- Result: ready_for_review count and content IDs stored in `stage_7_result`.
+
+## Execution Flow
+- Celery task `run_automation_task` instantiates `AutomationService.from_run_id` and calls stages 1→7 sequentially.
+- Stage transitions update `AutomationRun.current_stage`; between-stage delays applied via `between_stage_delay`.
+- Resume path (`resume_automation_task`) starts from the recorded `current_stage` and continues through remaining stages.
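
The fresh-run and resume paths can be pictured as one loop that starts at the recorded stage. The sketch below is an assumption-laden simplification: the run is a plain dict rather than an `AutomationRun` row, `run_from_recorded_stage` and `handlers` are hypothetical names, and the driver (rather than each stage method) advances `current_stage` and marks completion.

```python
LAST_STAGE = 7


def run_from_recorded_stage(run, handlers):
    """Run stages from run['current_stage'] through stage 7.

    run: dict with 'current_stage' (1-7) and 'status'.
    handlers: dict mapping stage number to a callable taking the run dict.
    Returns the list of stage numbers that were entered.
    """
    executed = []
    for number in range(run["current_stage"], LAST_STAGE + 1):
        if run["status"] != "running":
            break
        handlers[number](run)
        executed.append(number)
        if run["status"] != "running":
            # Paused or cancelled mid-stage: keep current_stage so a
            # later resume re-enters the same stage.
            break
        if number < LAST_STAGE:
            run["current_stage"] = number + 1
        else:
            run["status"] = "completed"
    return executed
```

A fresh run is then just the case where `current_stage` is 1, so both paths share the same stage handlers.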
+
+## Cross-Module Interactions
+- Planner: Stage 1/2 use `Keywords`/`Clusters`; Stage 3 converts `ContentIdeas` into `Tasks`.
+- Writer: Stages 4–6 create `Content` and `Images` and move content toward review.
+- AI engine and functions are invoked in Stages 1, 2, 4, 5; Stage 6 uses the AI image queue.
+- Billing: Credits are consumed by AI calls; automation records deltas per stage from AI task logs.
+
+## State Transitions
+- `AutomationRun.status` moves to `completed` at Stage 7; can be set to `failed` on exceptions or `cancelled` via API; `paused` can be set mid-run and resumed.
+- `current_stage` increments after each successful stage; partial stage results include a `partial` flag and stop reason.
+- Domain models change status along the pipeline (`Keywords` → clusters, `Clusters` → ideas, `ContentIdeas` → queued/tasks, `Tasks` → completed/content, `Content` → draft/review, `Images` → generated).
+
+## Error Handling
+- Each stage logs errors and continues to next batch/item; pause/cancel checks short-circuit with partial results saved.
+- Task wait helper tolerates Celery backend errors; can continue on error when flagged.
+- Stage start may be skipped with explicit skip reason (e.g., insufficient keywords).
+
+## Tenancy Rules
+- All queries filter by `site` (and implicit account via tenancy bases); account/site set on created `Tasks` and inherited on `Images` and other records through model save hooks.
+- Locks and runs are per site; API scoping requires the authenticated user’s account to own the site.
+
+## Billing Rules
+- Start requires sufficient credits (1.2× estimate). Credits used are inferred from AI task log counts per stage; actual deductions occur in AI/billing services invoked by the AI functions.
+
+## Background Tasks / Schedulers
+- Entire stage chain runs inside Celery workers; within-stage sleeps respect config delays; between-stage sleeps applied after each stage.
+
+## Key Design Considerations
+- Idempotent, resume-capable progression with partial state persisted in `AutomationRun`.
+- Configurable batch sizes/delays mitigate rate limits and manage credit burn.
+- Continue-on-error semantics prevent single failures from stopping the pipeline while still recording issues.
+
+## How Developers Should Work With This Module
+- When modifying stages, keep pause/cancel checks, stage result recording, and credit delta calculation.
+- Add new AI stages by wiring through `AIEngine.execute` and the task wait helper; ensure queries are site-scoped and statuses updated.
+- For new item types, add pending queries and status transitions consistent with existing patterns.

--- a/docs/10-BACKEND/automation/SCHEDULER.md
+++ b/docs/10-BACKEND/automation/SCHEDULER.md
@@ -0,0 +1,75 @@
+# Automation Scheduler
+
+## Purpose
+Describe how scheduled runs are detected, triggered, and resumed using Celery tasks and automation configs.
+
+## Code Locations (exact paths)
+- Celery tasks: `backend/igny8_core/business/automation/tasks.py`
+- Models: `backend/igny8_core/business/automation/models.py`
+- Service invoked: `backend/igny8_core/business/automation/services/automation_service.py`
+
+## High-Level Responsibilities
+- Periodically scan enabled automation configs to start scheduled runs.
+- Prevent overlapping runs per site via cache locks and active run checks.
+- Resume paused runs from their recorded stage.
+
+## Detailed Behavior
+- `check_scheduled_automations` (Celery, hourly):
+  - Iterates `AutomationConfig` with `is_enabled=True`.
+  - Frequency rules:
+    - `daily`: run when current hour matches `scheduled_time.hour`.
+    - `weekly`: run Mondays at the scheduled hour.
+    - `monthly`: run on the 1st of the month at the scheduled hour.
+  - Skips if `last_run_at` is within ~23 hours or if an `AutomationRun` with `status='running'` exists for the site.
+  - On trigger: instantiates `AutomationService(account, site)`, calls `start_automation(trigger_type='scheduled')`, updates `last_run_at` and `next_run_at` (via `_calculate_next_run`), saves config, and enqueues `run_automation_task.delay(run_id)`.
+  - Exceptions are logged per site; lock release is handled by the service on failure paths.
+- `run_automation_task`:
+  - Loads service via `from_run_id`, runs stages 1–7 sequentially.
+  - On exception: marks run failed, records error/completed_at, and deletes site lock.
+- `resume_automation_task` / alias `continue_automation_task`:
+  - Loads service via `from_run_id`, uses `current_stage` to continue remaining stages.
+  - On exception: marks run failed, records error/completed_at.
+- `_calculate_next_run`:
+  - Computes next run datetime based on frequency and `scheduled_time`, resetting seconds/microseconds; handles month rollover for monthly frequency.
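
One way to read the `_calculate_next_run` description is sketched below. It is not the function in `tasks.py`: `calculate_next_run` is a hypothetical standalone version that uses naive datetimes, keeps the minute from `scheduled_time`, and assumes weekly runs fall on Monday and monthly runs on the 1st, matching the frequency rules listed above.

```python
from datetime import datetime, time, timedelta


def calculate_next_run(frequency, scheduled_time, now):
    """Return the next run datetime strictly after `now`."""
    # Reset seconds/microseconds and move to the scheduled time of day.
    candidate = now.replace(hour=scheduled_time.hour,
                            minute=scheduled_time.minute,
                            second=0, microsecond=0)
    if frequency == "daily":
        return candidate if candidate > now else candidate + timedelta(days=1)
    if frequency == "weekly":
        days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
        candidate += timedelta(days=days_ahead)
        return candidate if candidate > now else candidate + timedelta(days=7)
    if frequency == "monthly":
        candidate = candidate.replace(day=1)
        if candidate > now:
            return candidate
        # Month rollover: December wraps to January of the next year.
        if now.month == 12:
            return candidate.replace(year=now.year + 1, month=1)
        return candidate.replace(month=now.month + 1)
    raise ValueError(f"unknown frequency: {frequency}")
```

Because monthly runs always land on day 1, the rollover never has to deal with months of different lengths.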
+
+## Data Structures / Models Involved (no code)
+- `AutomationConfig`: contains schedule fields (`frequency`, `scheduled_time`, `last_run_at`, `next_run_at`, `is_enabled`).
+- `AutomationRun`: records run status/stage used during resume/failure handling.
+
+## Execution Flow
+1) Celery beat (or cron) invokes `check_scheduled_automations` hourly.  
+2) Eligible configs spawn new runs via `AutomationService.start_automation` (includes lock + credit check).  
+3) `run_automation_task` executes the pipeline asynchronously.  
+4) Paused runs can be resumed by enqueueing `resume_automation_task`/`continue_automation_task`, which restart at `current_stage`.  
+5) Failures set run status to `failed` and release locks.
+
+## Cross-Module Interactions
+- Uses planner/writer data inside the pipeline (see pipeline doc); billing/credits enforced at start.
+- Locking is done via Django cache, independent of other modules but prevents concurrent Celery runs per site.
+
+## State Transitions
+- Config timestamps (`last_run_at`, `next_run_at`) update on scheduled launch.
+- Run status changes to `failed` on task exceptions; to `completed` at stage 7; to `paused/cancelled` via API.
+
+## Error Handling
+- Scheduled start is skipped with log messages if recently run or already running.
+- Exceptions during run execution mark the run failed, record error message, set `completed_at`, and release the cache lock.
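
The skip and match rules for a scheduled start can be condensed into a single predicate, sketched here under stated assumptions: `is_due` is a hypothetical helper, the "~23 hours" guard is taken as exactly 23 hours, and `has_running_run` stands in for the query that looks for an `AutomationRun` with `status='running'`.

```python
from datetime import datetime, time, timedelta


def is_due(frequency, scheduled_time, now, last_run_at=None,
           has_running_run=False):
    """Return True when a scheduled run should be triggered at `now`."""
    if has_running_run:
        return False  # a run is already active for the site
    if last_run_at is not None and now - last_run_at < timedelta(hours=23):
        return False  # ran too recently
    if now.hour != scheduled_time.hour:
        return False  # hourly scan only matches on the hour
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return now.weekday() == 0  # Monday
    if frequency == "monthly":
        return now.day == 1
    return False
```

Matching on the hour alone is what lets an hourly scan work without knowing the exact minute at which it fires.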
+
+## Tenancy Rules
+- Configs and runs are site- and account-scoped; scheduler uses stored account/site from the config; no cross-tenant scheduling.
+
+## Billing Rules
+- Start uses `AutomationService.start_automation`, which enforces credit sufficiency before scheduling the Celery execution.
+
+## Background Tasks / Schedulers
+- Hourly `check_scheduled_automations` plus the long-running `run_automation_task` and resume tasks run in Celery workers.
+
+## Key Design Considerations
+- Hourly scan with coarse matching keeps implementation simple while honoring per-site schedules.
+- Cache lock and active-run checks prevent double-starts from overlapping schedules or manual triggers.
+- Resume task reuses the same stage methods to keep behavior consistent between fresh and resumed runs.
+
+## How Developers Should Work With This Module
+- When adding new frequencies, extend `check_scheduled_automations` and `_calculate_next_run` consistently.
+- Ensure Celery beat (or an equivalent scheduler) runs `check_scheduled_automations` hourly in production.
+- Preserve lock acquisition and failure handling when modifying task flows to avoid orphaned locks.