# AI Master Architecture Document
## Clustering, Idea Generation, and Content Generation

**Version:** 1.0
**Date:** 2025-01-XX
**Scope:** Complete architecture for 3 verified AI functions (clustering, idea generation, content generation)

---

## Table of Contents

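1. [Common Architecture](#1-common-architecture)
2. [Auto Cluster Keywords](#2-auto-cluster-keywords)
3. [Generate Ideas](#3-generate-ideas)
4. [Generate Content](#4-generate-content)
5. [Change Guide](#5-change-guide)
6. [Dependencies](#6-dependencies)
7. [Key Relationships](#7-key-relationships)
8. [Notes](#8-notes)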

---

## 1. Common Architecture

### 1.1 Core Framework Files

#### Entry Point
- **File:** `backend/igny8_core/ai/tasks.py`
- **Function:** `run_ai_task`
- **Purpose:** Unified Celery task entrypoint for all AI functions
- **Parameters:** `function_name` (str), `payload` (dict), `account_id` (int)
- **Flow:** Loads function from registry → Creates AIEngine → Executes function
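
The flow above can be sketched in plain Python without Celery. This is a minimal illustration, not the project's code: `REGISTRY`, `EchoFunction`, and the one-method `AIEngine` are hypothetical stand-ins, and the result shape is an assumption.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for the function registry (name -> factory).
REGISTRY: Dict[str, Callable[[], Any]] = {}


class EchoFunction:
    """Toy AI function used only to exercise the flow."""

    def get_name(self) -> str:
        return "echo"


class AIEngine:
    """Minimal stand-in; the documented engine runs six phases."""

    def __init__(self, account_id: int) -> None:
        self.account_id = account_id

    def execute(self, fn: Any, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {"success": True, "function": fn.get_name(), "ids": payload.get("ids", [])}


def run_ai_task(function_name: str, payload: Dict[str, Any], account_id: int) -> Dict[str, Any]:
    """Load the function from the registry, create the engine, execute."""
    factory = REGISTRY.get(function_name)
    if factory is None:
        return {"success": False, "error": f"Unknown function: {function_name}"}
    engine = AIEngine(account_id)
    return engine.execute(factory(), payload)


REGISTRY["echo"] = EchoFunction
result = run_ai_task("echo", {"ids": [1, 2]}, account_id=7)
```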

#### Engine Orchestrator
- **File:** `backend/igny8_core/ai/engine.py`
- **Class:** `AIEngine`
- **Purpose:** Central orchestrator managing lifecycle, progress, logging, cost tracking
- **Methods:**
  - `execute` - Main execution pipeline (6 phases: INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
  - `_handle_error` - Centralized error handling
  - `_log_to_database` - Logs to AITaskLog model
  - Helper methods: `_get_input_description`, `_build_validation_message`, `_get_prep_message`, `_get_ai_call_message`, `_get_parse_message`, `_get_parse_message_with_count`, `_get_save_message`, `_calculate_credits_for_clustering`

#### Base Function Class
- **File:** `backend/igny8_core/ai/base.py`
- **Class:** `BaseAIFunction`
- **Purpose:** Abstract base class defining interface for all AI functions
- **Abstract Methods:**
  - `get_name` - Returns function name (e.g., 'auto_cluster')
  - `prepare` - Loads and prepares data
  - `build_prompt` - Builds AI prompt
  - `parse_response` - Parses AI response
  - `save_output` - Saves results to database
- **Optional Methods:**
  - `get_metadata` - Returns display name, description, phases
  - `get_max_items` - Returns max items limit (or None)
  - `validate` - Validates input payload (default: checks for 'ids')
  - `get_model` - Returns model override (default: None, uses account default)
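
As a rough sketch, the interface could be expressed with Python's `abc` module. The method signatures are simplified assumptions (tracker arguments are omitted), the error strings are invented, and `ShoutFunction` is a hypothetical subclass that exists only to show the minimum a function has to implement.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BaseAIFunction(ABC):
    """Sketch of the interface; signatures are simplified assumptions."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def prepare(self, payload: Dict[str, Any], account: Any) -> Any: ...

    @abstractmethod
    def build_prompt(self, data: Any, account: Any) -> str: ...

    @abstractmethod
    def parse_response(self, response: str) -> Any: ...

    @abstractmethod
    def save_output(self, parsed: Any, original_data: Any, account: Any) -> Dict[str, Any]: ...

    def get_max_items(self) -> Optional[int]:
        return None  # no limit by default

    def get_model(self, account: Any) -> Optional[str]:
        return None  # fall back to the account default

    def validate(self, payload: Dict[str, Any], account: Any = None) -> Dict[str, Any]:
        """Default check: 'ids' must be present and within get_max_items()."""
        ids = payload.get("ids")
        if not ids:
            return {"valid": False, "error": "No IDs provided"}
        limit = self.get_max_items()
        if limit is not None and len(ids) > limit:
            return {"valid": False, "error": f"Maximum {limit} items allowed"}
        return {"valid": True}


class ShoutFunction(BaseAIFunction):
    """Hypothetical subclass implementing every abstract method."""

    def get_name(self) -> str:
        return "shout"

    def get_max_items(self) -> Optional[int]:
        return 2

    def prepare(self, payload, account):
        return payload["ids"]

    def build_prompt(self, data, account):
        return f"Shout about items {data}"

    def parse_response(self, response):
        return response.upper()

    def save_output(self, parsed, original_data, account):
        return {"count": len(original_data)}
```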

#### Function Registry
- **File:** `backend/igny8_core/ai/registry.py`
- **Functions:**
  - `register_function` - Registers function class
  - `register_lazy_function` - Registers lazy loader
  - `get_function` - Gets function class by name (lazy loads if needed)
  - `get_function_instance` - Gets function instance by name
  - `list_functions` - Lists all registered functions
- **Lazy Loaders:**
  - `_load_auto_cluster` - Loads AutoClusterFunction
  - `_load_generate_ideas` - Loads GenerateIdeasFunction
  - `_load_generate_content` - Loads GenerateContentFunction
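
Lazy registration presumably defers importing a function module until it is first requested. A minimal sketch of that pattern follows, reusing the function names listed above. The module-level dicts, `DemoFunction`, and `_load_demo` are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

_FUNCTIONS: Dict[str, type] = {}
_LAZY_LOADERS: Dict[str, Callable[[], type]] = {}


def register_function(name: str, cls: type) -> None:
    _FUNCTIONS[name] = cls


def register_lazy_function(name: str, loader: Callable[[], type]) -> None:
    _LAZY_LOADERS[name] = loader


def get_function(name: str) -> Optional[type]:
    """Return the class, running its lazy loader on first access."""
    if name not in _FUNCTIONS and name in _LAZY_LOADERS:
        _FUNCTIONS[name] = _LAZY_LOADERS[name]()
    return _FUNCTIONS.get(name)


def get_function_instance(name: str):
    cls = get_function(name)
    return cls() if cls else None


def list_functions() -> List[str]:
    return sorted(set(_FUNCTIONS) | set(_LAZY_LOADERS))


# Hypothetical loader; a real one would import the module inside the function body.
calls: List[str] = []


class DemoFunction:
    pass


def _load_demo() -> type:
    calls.append("loaded")
    return DemoFunction


register_lazy_function("demo", _load_demo)
```

Nothing is loaded at registration time; the loader runs once, on the first `get_function("demo")` call.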

#### AI Core Handler
- **File:** `backend/igny8_core/ai/ai_core.py`
- **Class:** `AICore`
- **Purpose:** Centralized AI request handler for all text generation
- **Methods:**
  - `run_ai_request` - Makes API call to OpenAI/Runware
  - `extract_json` - Extracts JSON from response (handles markdown code blocks)
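
One plausible way to extract JSON from a response that may be wrapped in a markdown code block is shown below. This is a sketch under assumptions, not the actual `AICore.extract_json` implementation: the fallback order and the brace-scanning last resort are illustrative choices.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # markdown code fence, built at runtime to keep this example readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str) -> Optional[Any]:
    """Return parsed JSON from raw or fenced text, or None if nothing parses."""
    text = text.strip()
    try:
        return json.loads(text)  # 1. the response is already pure JSON
    except json.JSONDecodeError:
        pass
    match = _FENCED.search(text)
    if match:
        try:
            return json.loads(match.group(1).strip())  # 2. JSON inside a fenced block
        except json.JSONDecodeError:
            pass
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])  # 3. outermost braces
        except json.JSONDecodeError:
            return None
    return None
```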

#### Prompt Registry
- **File:** `backend/igny8_core/ai/prompts.py`
- **Class:** `PromptRegistry`
- **Purpose:** Centralized prompt management with hierarchical resolution
- **Method:** `get_prompt` - Gets prompt with resolution order:
  1. Task-level prompt_override (if exists)
  2. DB prompt for (account, function)
  3. Default fallback from DEFAULT_PROMPTS registry
- **Prompt Types:**
  - `clustering` - For auto_cluster function
  - `ideas` - For generate_ideas function
  - `content_generation` - For generate_content function
- **Context Placeholders:**
  - `[IGNY8_KEYWORDS]` - Replaced with the keyword list (clustering) or the keywords string (content generation)
  - `[IGNY8_CLUSTERS]` - Replaced with cluster list
  - `[IGNY8_CLUSTER_KEYWORDS]` - Replaced with cluster keywords
  - `[IGNY8_IDEA]` - Replaced with idea data
  - `[IGNY8_CLUSTER]` - Replaced with cluster data
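
The resolution order and placeholder substitution can be illustrated with a small sketch. It assumes an in-memory `DB_PROMPTS` dict in place of the database lookup and assumes each context key `X` maps to the placeholder `[IGNY8_X]`. The prompt text is invented for the example.

```python
from typing import Dict, Optional, Tuple

DEFAULT_PROMPTS: Dict[str, str] = {
    "clustering": "Cluster these keywords:\n[IGNY8_KEYWORDS]",  # invented prompt text
}
# Hypothetical stand-in for DB prompts, keyed by (account_id, prompt_type).
DB_PROMPTS: Dict[Tuple[int, str], str] = {}


def get_prompt(
    prompt_type: str,
    account_id: Optional[int] = None,
    context: Optional[Dict[str, str]] = None,
    prompt_override: Optional[str] = None,
) -> str:
    """Resolve override -> DB prompt -> default, then fill placeholders."""
    template = (
        prompt_override
        or DB_PROMPTS.get((account_id, prompt_type))
        or DEFAULT_PROMPTS.get(prompt_type, "")
    )
    for key, value in (context or {}).items():
        template = template.replace(f"[IGNY8_{key}]", str(value))
    return template
```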

#### Model Settings
- **File:** `backend/igny8_core/ai/settings.py`
- **Constants:**
  - `MODEL_CONFIG` - Model configurations per function (model, max_tokens, temperature, response_format)
  - `FUNCTION_ALIASES` - Legacy function name mappings
- **Functions:**
  - `get_model_config` - Gets model config for function (reads from IntegrationSettings if account provided)
  - `get_model` - Gets model name for function
  - `get_max_tokens` - Gets max tokens for function
  - `get_temperature` - Gets temperature for function
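
A minimal sketch of how per-function defaults, legacy aliases, and account-level overrides might combine. The config values, the `cluster_keywords` alias, and the `account_settings` dict (standing in for `IntegrationSettings`) are all placeholders, not values from the codebase.

```python
from typing import Any, Dict, Optional

MODEL_CONFIG: Dict[str, Dict[str, Any]] = {
    # Placeholder values for illustration only.
    "auto_cluster": {"model": "example-model", "max_tokens": 3000, "temperature": 0.7},
}
FUNCTION_ALIASES: Dict[str, str] = {"cluster_keywords": "auto_cluster"}  # hypothetical alias


def get_model_config(function_name: str, account_settings: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Resolve aliases, copy the defaults, then apply non-empty account overrides."""
    name = FUNCTION_ALIASES.get(function_name, function_name)
    config = dict(MODEL_CONFIG.get(name, {}))
    if account_settings:
        config.update({k: v for k, v in account_settings.items() if v is not None})
    return config


def get_model(function_name: str, account_settings: Optional[Dict[str, Any]] = None) -> Optional[str]:
    return get_model_config(function_name, account_settings).get("model")
```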

#### Validators
- **File:** `backend/igny8_core/ai/validators.py`
- **Functions:**
  - `validate_ids` - Validates 'ids' array in payload
  - `validate_keywords_exist` - Validates keywords exist in database
  - `validate_cluster_exists` - Validates cluster exists
  - `validate_tasks_exist` - Validates tasks exist
  - `validate_cluster_limits` - Validates plan limits (currently disabled - always returns valid)
  - `validate_api_key` - Validates API key is configured
  - `validate_model` - Validates model is in supported list
  - `validate_image_size` - Validates image size for model

#### Progress Tracking
- **File:** `backend/igny8_core/ai/tracker.py`
- **Classes:**
  - `StepTracker` - Tracks request/response steps
  - `ProgressTracker` - Tracks Celery progress updates
  - `CostTracker` - Tracks API costs and tokens
  - `ConsoleStepTracker` - Console-based step logging

#### Database Logging
- **File:** `backend/igny8_core/ai/models.py`
- **Model:** `AITaskLog`
- **Fields:** `task_id`, `function_name`, `account`, `phase`, `message`, `status`, `duration`, `cost`, `tokens`, `request_steps`, `response_steps`, `error`, `payload`, `result`

### 1.2 Execution Flow (All Functions)

```
1. API Endpoint (views.py)
   ↓
2. run_ai_task (tasks.py)
   - Gets account from account_id
   - Gets function instance from registry
   - Creates AIEngine
   ↓
3. AIEngine.execute (engine.py)
   Phase 1: INIT (0-10%)
   - Calls function.validate()
   - Updates progress tracker
   ↓
   Phase 2: PREP (10-25%)
   - Calls function.prepare()
   - Calls function.build_prompt()
   - Updates progress tracker
   ↓
   Phase 3: AI_CALL (25-70%)
   - Gets model config from settings
   - Calls AICore.run_ai_request()
   - Tracks cost and tokens
   - Updates progress tracker
   ↓
   Phase 4: PARSE (70-85%)
   - Calls function.parse_response()
   - Updates progress tracker
   ↓
   Phase 5: SAVE (85-98%)
   - Calls function.save_output()
   - Logs credit usage
   - Updates progress tracker
   ↓
   Phase 6: DONE (98-100%)
   - Logs to AITaskLog
   - Returns result
```
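
The six-phase pipeline can be sketched as a single method that calls the function's hooks in order and records progress at the upper bound of each phase. This is a simplified illustration: cost tracking, credit logging, and `AITaskLog` writes are left out, the hook signatures are shortened, and `DemoFunction` and the `ai_call` lambda are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


class AIEngine:
    """Simplified engine: runs the phases and records (phase, percent) updates."""

    def __init__(self, ai_call: Callable[[str], str]) -> None:
        self.ai_call = ai_call
        self.progress: List[Tuple[str, int]] = []

    def _update(self, phase: str, percent: int) -> None:
        self.progress.append((phase, percent))

    def _handle_error(self, message: str) -> Dict[str, Any]:
        return {"success": False, "error": message}

    def execute(self, fn: Any, payload: Dict[str, Any]) -> Dict[str, Any]:
        try:
            check = fn.validate(payload)
            if not check["valid"]:
                return self._handle_error(check["error"])
            self._update("INIT", 10)
            data = fn.prepare(payload)
            prompt = fn.build_prompt(data)
            self._update("PREP", 25)
            raw = self.ai_call(prompt)
            self._update("AI_CALL", 70)
            parsed = fn.parse_response(raw)
            self._update("PARSE", 85)
            result = fn.save_output(parsed, data)
            self._update("SAVE", 98)
            self._update("DONE", 100)
            return {"success": True, **result}
        except Exception as exc:  # centralized error handling
            return self._handle_error(str(exc))


class DemoFunction:
    """Hypothetical function: the fake 'AI' echoes back a comma-separated list."""

    def validate(self, payload):
        if payload.get("ids"):
            return {"valid": True}
        return {"valid": False, "error": "No IDs provided"}

    def prepare(self, payload):
        return payload["ids"]

    def build_prompt(self, data):
        return "Join: " + ",".join(map(str, data))

    def parse_response(self, response):
        return response.split(",")

    def save_output(self, parsed, original_data):
        return {"count": len(parsed)}


engine = AIEngine(ai_call=lambda prompt: prompt.split(": ", 1)[1])
result = engine.execute(DemoFunction(), {"ids": [1, 2, 3]})
```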

---

## 2. Auto Cluster Keywords

### 2.1 Function Implementation

- **File:** `backend/igny8_core/ai/functions/auto_cluster.py`
- **Class:** `AutoClusterFunction`
- **Inherits:** `BaseAIFunction`

### 2.2 API Endpoint

- **File:** `backend/igny8_core/modules/planner/views.py`
- **ViewSet:** `KeywordViewSet`
- **Action:** `auto_cluster`
- **Method:** POST
- **URL Path:** `/v1/planner/keywords/auto_cluster/`
- **Payload:**
  - `ids` (list[int]) - Keyword IDs to cluster
  - `sector_id` (int, optional) - Sector ID for filtering
- **Response:**
  - `success` (bool)
  - `task_id` (str) - Celery task ID if async
  - `clusters_created` (int) - Number of clusters created
  - `keywords_updated` (int) - Number of keywords updated
  - `message` (str)

### 2.3 Function Methods

#### `get_name()`
- **Returns:** `'auto_cluster'`

#### `get_metadata()`
- **Returns:** Dict with `display_name`, `description`, `phases` (INIT, PREP, AI_CALL, PARSE, SAVE, DONE)

#### `get_max_items()`
- **Returns:** `None` (no limit)

#### `validate(payload, account)`
- **Validates:**
  - Calls `validate_ids` to check for 'ids' array
  - Calls `validate_keywords_exist` to verify keywords exist
- **Returns:** Dict with `valid` (bool) and optional `error` (str)
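
The two checks chain naturally: the second runs only if the first passes. In the sketch below a set of known IDs stands in for the database query. The error messages and the rule that any missing ID fails validation are assumptions.

```python
from typing import Any, Dict, List, Set


def validate_ids(payload: Dict[str, Any]) -> Dict[str, Any]:
    ids = payload.get("ids")
    if not isinstance(ids, list) or not ids:
        return {"valid": False, "error": "No IDs provided"}
    return {"valid": True}


def validate_keywords_exist(ids: List[int], existing_ids: Set[int]) -> Dict[str, Any]:
    # existing_ids stands in for a database lookup scoped to the account.
    missing = [i for i in ids if i not in existing_ids]
    if missing:
        return {"valid": False, "error": f"Keywords not found: {missing}"}
    return {"valid": True}


def validate(payload: Dict[str, Any], existing_ids: Set[int]) -> Dict[str, Any]:
    result = validate_ids(payload)
    if not result["valid"]:
        return result
    return validate_keywords_exist(payload["ids"], existing_ids)
```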

#### `prepare(payload, account)`
- **Loads:**
  - Keywords from database (filters by `ids`, `account`, optional `sector_id`)
  - Uses `select_related` for: `account`, `site`, `site__account`, `sector`, `sector__site`
- **Returns:** Dict with:
  - `keywords` (list[Keyword objects])
  - `keyword_data` (list[dict]) - Formatted data with: `id`, `keyword`, `volume`, `difficulty`, `intent`
  - `sector_id` (int, optional)

#### `build_prompt(data, account)`
- **Gets Prompt:**
  - Calls `PromptRegistry.get_prompt()` with `function_name='auto_cluster'`, `account`, and `context`
  - Context includes: `KEYWORDS` (formatted keyword list), optional `SECTOR` (sector name)
- **Formatting:**
  - Formats keywords as: `"- {keyword} (Volume: {volume}, Difficulty: {difficulty}, Intent: {intent})"`
  - Replaces `[IGNY8_KEYWORDS]` placeholder
  - Adds JSON mode instruction if not present
- **Returns:** Prompt string
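
The keyword formatting and placeholder replacement can be sketched as follows. The line format comes from the description above. The template text, the wording of the JSON instruction, and the "contains the word json" check are assumptions.

```python
from typing import Any, Dict, List


def format_keywords(keyword_data: List[Dict[str, Any]]) -> str:
    """One line per keyword, in the format described above."""
    return "\n".join(
        f"- {k['keyword']} (Volume: {k['volume']}, Difficulty: {k['difficulty']}, Intent: {k['intent']})"
        for k in keyword_data
    )


def build_prompt(template: str, keyword_data: List[Dict[str, Any]]) -> str:
    prompt = template.replace("[IGNY8_KEYWORDS]", format_keywords(keyword_data))
    # Assumed heuristic: append a JSON instruction only if the prompt lacks one.
    if "json" not in prompt.lower():
        prompt += "\n\nIMPORTANT: Return the result as a JSON object."
    return prompt
```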

#### `parse_response(response, step_tracker)`
- **Parsing:**
  - Tries direct JSON parse first
  - Falls back to `AICore.extract_json()` if needed (handles markdown code blocks)
- **Extraction:**
  - Extracts `clusters` array from JSON
  - Handles both dict with 'clusters' key and direct array
- **Returns:** List[Dict] with cluster data:
  - `name` (str) - Cluster name
  - `description` (str) - Cluster description
  - `keywords` (list[str]) - List of keyword strings
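
A sketch of the two-step parse and the dict-or-array handling. The `extract_json` argument is a stand-in for `AICore.extract_json()`, passed in so the example stays self-contained. Returning an empty list on failure is an assumption.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def parse_clusters(
    response: str,
    extract_json: Optional[Callable[[str], Any]] = None,
) -> List[Dict[str, Any]]:
    """Try a direct JSON parse, then the fallback extractor, then normalize."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        data = extract_json(response) if extract_json else None

    if isinstance(data, dict):
        clusters = data.get("clusters", [])  # {"clusters": [...]}
    elif isinstance(data, list):
        clusters = data  # bare array of clusters
    else:
        clusters = []
    return [c for c in clusters if isinstance(c, dict)]
```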

#### `save_output(parsed, original_data, account, progress_tracker, step_tracker)`
- **Input:**
  - `parsed` - List of cluster dicts from parse_response
  - `original_data` - Dict from prepare() with `keywords` and `sector_id`
- **Process:**
  - Gets account, site, sector from first keyword
  - For each cluster in parsed:
    - Gets or creates `Clusters` record:
      - Fields: `name`, `description`, `account`, `site`, `sector`, `status='active'`
      - Uses `get_or_create` with name + account + site + sector
    - Matches keywords (case-insensitive):
      - Normalizes cluster keywords and available keywords to lowercase
      - Updates matched `Keywords` records:
        - Sets `cluster` foreign key
        - Sets `status='mapped'`
  - Recalculates cluster metrics:
    - `keywords_count` - Count of keywords in cluster
    - `volume` - Sum of keyword volumes (uses `volume_override` if available, else `seed_keyword__volume`)
- **Returns:** Dict with:
  - `count` (int) - Clusters created
  - `clusters_created` (int) - Clusters created
  - `keywords_updated` (int) - Keywords updated
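
The case-insensitive matching step can be shown in isolation. In this sketch a dict of keyword ID to text stands in for the loaded `Keywords` records. Trimming whitespace before comparing is an added assumption.

```python
from typing import Dict, List


def match_keywords(cluster_keywords: List[str], available: Dict[int, str]) -> List[int]:
    """Return IDs of available keywords whose text appears in the cluster (ignoring case)."""
    wanted = {k.strip().lower() for k in cluster_keywords}
    return [kid for kid, text in available.items() if text.strip().lower() in wanted]


available = {1: "running shoes", 2: "Trail Shoes", 3: "sandals"}
matched = match_keywords(["Running Shoes", "TRAIL shoes"], available)
```

The matched IDs are the records that would then receive the cluster foreign key and `status='mapped'`.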

### 2.4 Database Models

#### Keywords Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `Keywords`
- **Fields Used:**
  - `id` - Keyword ID
  - `seed_keyword` (ForeignKey) - Reference to SeedKeyword
  - `keyword` (property) - Gets keyword text from seed_keyword
  - `volume` (property) - Gets volume from volume_override or seed_keyword
  - `difficulty` (property) - Gets difficulty from difficulty_override or seed_keyword
  - `intent` (property) - Gets intent from seed_keyword
  - `cluster` (ForeignKey) - Assigned cluster
  - `status` - Status ('active', 'pending', 'mapped', 'archived')
  - `account`, `site`, `sector` - From SiteSectorBaseModel

#### Clusters Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `Clusters`
- **Fields Used:**
  - `name` - Cluster name (unique)
  - `description` - Cluster description
  - `keywords_count` - Count of keywords (recalculated)
  - `volume` - Sum of keyword volumes (recalculated)
  - `status` - Status ('active')
  - `account`, `site`, `sector` - From SiteSectorBaseModel

### 2.5 AI Response Format

**Expected JSON:**
```json
{
  "clusters": [
    {
      "name": "Cluster Name",
      "description": "Cluster description",
      "keywords": ["keyword1", "keyword2", "keyword3"]
    }
  ]
}
```

### 2.6 Progress Messages

- **INIT:** "Validating {keyword1}, {keyword2}, {keyword3} and {X} more keywords" (shows first 3, then count)
- **PREP:** "Loading {count} keyword(s)"
- **AI_CALL:** "Generating clusters with Igny8 Semantic SEO Model"
- **PARSE:** "{count} cluster(s) created"
- **SAVE:** "Saving {count} cluster(s)"

---

## 3. Generate Ideas

### 3.1 Function Implementation

- **File:** `backend/igny8_core/ai/functions/generate_ideas.py`
- **Class:** `GenerateIdeasFunction`
- **Inherits:** `BaseAIFunction`

### 3.2 API Endpoint

- **File:** `backend/igny8_core/modules/planner/views.py`
- **ViewSet:** `ClusterViewSet`
- **Action:** `auto_generate_ideas`
- **Method:** POST
- **URL Path:** `/v1/planner/clusters/auto_generate_ideas/`
- **Payload:**
  - `ids` (list[int]) - Cluster IDs (max 10)
- **Response:**
  - `success` (bool)
  - `task_id` (str) - Celery task ID if async
  - `ideas_created` (int) - Number of ideas created
  - `message` (str)

### 3.3 Function Methods

#### `get_name()`
- **Returns:** `'generate_ideas'`

#### `get_metadata()`
- **Returns:** Dict with `display_name`, `description`, `phases` (INIT, PREP, AI_CALL, PARSE, SAVE, DONE)

#### `get_max_items()`
- **Returns:** `10` (max clusters per generation)

#### `validate(payload, account)`
- **Validates:**
  - Calls `super().validate()` to check for 'ids' array and max_items limit
  - Calls `validate_cluster_exists` for first cluster ID
  - Calls `validate_cluster_limits` for plan limits (currently disabled)
- **Returns:** Dict with `valid` (bool) and optional `error` (str)

#### `prepare(payload, account)`
- **Loads:**
  - Clusters from database (filters by `ids`, `account`)
  - Uses `select_related` for: `sector`, `account`, `site`, `sector__site`
  - Uses `prefetch_related` for: `keywords`
- **Gets Keywords:**
  - For each cluster, loads `Keywords` with `select_related('seed_keyword')`
  - Extracts keyword text from `seed_keyword.keyword`
- **Returns:** Dict with:
  - `clusters` (list[Cluster objects])
  - `cluster_data` (list[dict]) - Formatted data with: `id`, `name`, `description`, `keywords` (list[str])
  - `account` (Account object)

#### `build_prompt(data, account)`
- **Gets Prompt:**
  - Calls `PromptRegistry.get_prompt()` with `function_name='generate_ideas'`, `account`, and `context`
  - Context includes:
    - `CLUSTERS` - Formatted cluster list: `"Cluster ID: {id} | Name: {name} | Description: {description}"`
    - `CLUSTER_KEYWORDS` - Formatted cluster keywords: `"Cluster ID: {id} | Name: {name} | Keywords: {keyword1}, {keyword2}"`
- **Replaces Placeholders:**
  - `[IGNY8_CLUSTERS]` → clusters_text
  - `[IGNY8_CLUSTER_KEYWORDS]` → cluster_keywords_text
- **Returns:** Prompt string
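
The two formatted strings can be built from `cluster_data` as sketched below. The line formats follow the description above. Joining clusters with newlines and the sample template are assumptions.

```python
from typing import Any, Dict, List


def format_clusters(cluster_data: List[Dict[str, Any]]) -> str:
    return "\n".join(
        f"Cluster ID: {c['id']} | Name: {c['name']} | Description: {c['description']}"
        for c in cluster_data
    )


def format_cluster_keywords(cluster_data: List[Dict[str, Any]]) -> str:
    return "\n".join(
        f"Cluster ID: {c['id']} | Name: {c['name']} | Keywords: {', '.join(c['keywords'])}"
        for c in cluster_data
    )


def build_prompt(template: str, cluster_data: List[Dict[str, Any]]) -> str:
    return (
        template
        .replace("[IGNY8_CLUSTERS]", format_clusters(cluster_data))
        .replace("[IGNY8_CLUSTER_KEYWORDS]", format_cluster_keywords(cluster_data))
    )
```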

#### `parse_response(response, step_tracker)`
- **Parsing:**
  - Calls `AICore.extract_json()` to extract JSON from response
  - Validates 'ideas' key exists in JSON
- **Returns:** List[Dict] with idea data:
  - `title` (str) - Idea title
  - `description` (str or dict) - Idea description (can be JSON string)
  - `content_type` (str) - Content type ('blog_post', 'article', etc.)
  - `content_structure` (str) - Content structure ('cluster_hub', 'supporting_page', etc.)
  - `cluster_id` (int, optional) - Cluster ID reference
  - `cluster_name` (str, optional) - Cluster name reference
  - `estimated_word_count` (int) - Estimated word count
  - `covered_keywords` or `target_keywords` (str) - Target keywords

#### `save_output(parsed, original_data, account, progress_tracker, step_tracker)`
- **Input:**
  - `parsed` - List of idea dicts from parse_response
  - `original_data` - Dict from prepare() with `clusters` and `cluster_data`
- **Process:**
  - For each idea in parsed:
    - Matches cluster:
      - First tries by `cluster_id` from AI response
      - Falls back to `cluster_name` matching
      - Last resort: position-based matching (first idea → first cluster)
    - Gets site from cluster (or cluster.sector.site)
    - Handles description:
      - If dict, converts to JSON string
      - If not string, converts to string
    - Creates `ContentIdeas` record:
      - Fields:
        - `idea_title` - From `title`
        - `description` - Processed description
        - `content_type` - From `content_type` (default: 'blog_post')
        - `content_structure` - From `content_structure` (default: 'supporting_page')
        - `target_keywords` - From `covered_keywords` or `target_keywords`
        - `keyword_cluster` - Matched cluster
        - `estimated_word_count` - From `estimated_word_count` (default: 1500)
        - `status` - 'new'
        - `account`, `site`, `sector` - From cluster
- **Returns:** Dict with:
  - `count` (int) - Ideas created
  - `ideas_created` (int) - Ideas created
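
The cluster-matching fallback chain and the description handling can be sketched with plain dicts in place of model instances. Exact-name matching and returning `None` when the position is out of range are assumptions.

```python
import json
from typing import Any, Dict, List, Optional


def match_cluster(idea: Dict[str, Any], clusters: List[Dict[str, Any]], index: int) -> Optional[Dict[str, Any]]:
    """Match by cluster_id, then cluster_name, then by position in the list."""
    cluster_id = idea.get("cluster_id")
    if cluster_id is not None:
        for cluster in clusters:
            if cluster["id"] == cluster_id:
                return cluster
    name = idea.get("cluster_name")
    if name:
        for cluster in clusters:
            if cluster["name"] == name:
                return cluster
    if index < len(clusters):
        return clusters[index]  # last resort: Nth idea -> Nth cluster
    return None


def normalize_description(description: Any) -> str:
    """Dicts become JSON strings; other non-strings are converted with str()."""
    if isinstance(description, dict):
        return json.dumps(description)
    if not isinstance(description, str):
        return str(description)
    return description
```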

### 3.4 Database Models

#### Clusters Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `Clusters`
- **Fields Used:**
  - `id` - Cluster ID
  - `name` - Cluster name
  - `description` - Cluster description
  - `keywords` (related_name) - Related Keywords
  - `account`, `site`, `sector` - From SiteSectorBaseModel

#### ContentIdeas Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `ContentIdeas`
- **Fields Used:**
  - `idea_title` - Idea title
  - `description` - Idea description (can be JSON string)
  - `content_type` - Content type ('blog_post', 'article', 'guide', 'tutorial')
  - `content_structure` - Content structure ('cluster_hub', 'landing_page', 'pillar_page', 'supporting_page')
  - `target_keywords` - Target keywords string
  - `keyword_cluster` (ForeignKey) - Related cluster
  - `estimated_word_count` - Estimated word count
  - `status` - Status ('new', 'scheduled', 'published')
  - `account`, `site`, `sector` - From SiteSectorBaseModel

### 3.5 AI Response Format

**Expected JSON:**
```json
{
  "ideas": [
    {
      "title": "Idea Title",
      "description": "Idea description or JSON structure",
      "content_type": "blog_post",
      "content_structure": "supporting_page",
      "cluster_id": 1,
      "cluster_name": "Cluster Name",
      "estimated_word_count": 1500,
      "covered_keywords": "keyword1, keyword2"
    }
  ]
}
```

### 3.6 Progress Messages

- **INIT:** "Verifying cluster integrity"
- **PREP:** "Loading cluster keywords"
- **AI_CALL:** "Generating ideas with Igny8 Semantic AI"
- **PARSE:** "{count} high-opportunity idea(s) generated"
- **SAVE:** "Content Outline for Ideas generated"

---

## 4. Generate Content

### 4.1 Function Implementation

- **File:** `backend/igny8_core/ai/functions/generate_content.py`
- **Class:** `GenerateContentFunction`
- **Inherits:** `BaseAIFunction`

### 4.2 API Endpoint

- **File:** `backend/igny8_core/modules/writer/views.py`
- **ViewSet:** `TasksViewSet`
- **Action:** `auto_generate_content`
- **Method:** POST
- **URL Path:** `/v1/writer/tasks/auto_generate_content/`
- **Payload:**
  - `ids` (list[int]) - Task IDs (max 10)
- **Response:**
  - `success` (bool)
  - `task_id` (str) - Celery task ID if async
  - `tasks_updated` (int) - Number of tasks updated
  - `message` (str)

### 4.3 Function Methods

#### `get_name()`
- **Returns:** `'generate_content'`

#### `get_metadata()`
- **Returns:** Dict with `display_name`, `description`, `phases` (INIT, PREP, AI_CALL, PARSE, SAVE, DONE)

#### `get_max_items()`
- **Returns:** `50` (max tasks per batch)

#### `validate(payload, account)`
- **Validates:**
  - Calls `super().validate()` to check for 'ids' array and max_items limit
  - Calls `validate_tasks_exist` to verify tasks exist
- **Returns:** Dict with `valid` (bool) and optional `error` (str)

#### `prepare(payload, account)`
- **Loads:**
  - Tasks from database (filters by `ids`, `account`)
  - Uses `select_related` for: `account`, `site`, `sector`, `cluster`, `idea`
- **Returns:** List[Task objects]

#### `build_prompt(data, account)`
- **Input:** Can be single Task or list[Task] (handles first task if list)
- **Builds Idea Data:**
  - `title` - From task.title
  - `description` - From task.description
  - `outline` - From task.idea.description (handles JSON structure):
    - If JSON, formats as: `"## {H2 heading}\n### {H3 subheading}\nContent Type: {type}\nDetails: {details}"`
    - If plain text, uses as-is
  - `structure` - From task.idea.content_structure or task.content_structure
  - `type` - From task.idea.content_type or task.content_type
  - `estimated_word_count` - From task.idea.estimated_word_count
- **Builds Cluster Data:**
  - `cluster_name` - From task.cluster.name
  - `description` - From task.cluster.description
  - `status` - From task.cluster.status
- **Builds Keywords Data:**
  - From task.keywords (legacy) or task.idea.target_keywords
- **Gets Prompt:**
  - Calls `PromptRegistry.get_prompt()` with `function_name='generate_content'`, `account`, `task`, and `context`
  - Context includes:
    - `IDEA` - Formatted idea data string
    - `CLUSTER` - Formatted cluster data string
    - `KEYWORDS` - Keywords string
- **Returns:** Prompt string

#### `parse_response(response, step_tracker)`
- **Parsing:**
  - First tries JSON parse:
    - If successful and dict, returns dict
  - Falls back to plain text:
    - Calls `normalize_content()` from `content_normalizer` to convert to HTML
    - Returns dict with `content` field
- **Returns:** Dict with:
  - **If JSON:**
    - `content` (str) - HTML content
    - `title` (str, optional) - Content title
    - `meta_title` (str, optional) - Meta title
    - `meta_description` (str, optional) - Meta description
    - `word_count` (int, optional) - Word count
    - `primary_keyword` (str, optional) - Primary keyword
    - `secondary_keywords` (list, optional) - Secondary keywords
    - `tags` (list, optional) - Tags
    - `categories` (list, optional) - Categories
  - **If Plain Text:**
    - `content` (str) - Normalized HTML content
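
A sketch of the JSON-first, plain-text-fallback behavior. The `normalize_content` function here is a hypothetical stand-in that wraps paragraphs in `<p>` tags. The real `content_normalizer` may behave differently.

```python
import html
import json
from typing import Any, Dict


def normalize_content(text: str) -> str:
    """Hypothetical stand-in: wrap blank-line-separated paragraphs in <p> tags."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def parse_content_response(response: str) -> Dict[str, Any]:
    try:
        data = json.loads(response)
        if isinstance(data, dict):
            return data  # structured response with content and optional metadata
    except json.JSONDecodeError:
        pass
    # Anything else is treated as plain text and normalized to HTML.
    return {"content": normalize_content(response)}
```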

#### `save_output(parsed, original_data, account, progress_tracker, step_tracker)`
- **Input:**
  - `parsed` - Dict from parse_response
  - `original_data` - Task object or list[Task] (handles first task if list)
- **Process:**
  - Extracts content fields from parsed dict:
    - `content_html` - From `content` field
    - `title` - From `title` or task.title
    - `meta_title` - From `meta_title` or task.meta_title or task.title
    - `meta_description` - From `meta_description` or task.meta_description or task.description
    - `word_count` - From `word_count` or calculated from content
    - `primary_keyword` - From `primary_keyword`
    - `secondary_keywords` - From `secondary_keywords` (converts to list if needed)
    - `tags` - From `tags` (converts to list if needed)
    - `categories` - From `categories` (converts to list if needed)
  - Calculates word count if not provided:
    - Strips HTML tags and counts words
  - Gets or creates `Content` record:
    - Uses `get_or_create` with `task` (OneToOne relationship)
    - Defaults: `html_content`, `word_count`, `status='draft'`, `account`, `site`, `sector`
  - Updates `Content` fields:
    - `html_content` - Content HTML
    - `word_count` - Word count
    - `title` - Content title
    - `meta_title` - Meta title
    - `meta_description` - Meta description
    - `primary_keyword` - Primary keyword
    - `secondary_keywords` - Secondary keywords (JSONField)
    - `tags` - Tags (JSONField)
    - `categories` - Categories (JSONField)
    - `status` - Always 'draft' for newly generated content
    - `metadata` - Extra fields from parsed dict (excludes standard fields)
    - `account`, `site`, `sector`, `task` - Aligned from task
  - Updates `Tasks` record:
    - Sets `status='completed'`
    - Updates `updated_at`
- **Returns:** Dict with:
  - `count` (int) - Tasks updated (always 1 per task)
  - `tasks_updated` (int) - Tasks updated
  - `word_count` (int) - Word count
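
The field extraction can be sketched without the ORM. The tag-stripping regex, the comma-splitting of string values, and the `extract_fields` helper itself are illustrative assumptions. Only a subset of the fields is shown.

```python
import re
from typing import Any, Dict, List

STANDARD_FIELDS = {
    "content", "title", "meta_title", "meta_description", "word_count",
    "primary_keyword", "secondary_keywords", "tags", "categories",
}


def count_words(html_content: str) -> int:
    """Strip HTML tags, then count whitespace-separated words."""
    text = re.sub(r"<[^>]+>", " ", html_content)
    return len(text.split())


def as_list(value: Any) -> List[Any]:
    """Coerce a value to a list (assumes comma-separated strings)."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        return [v.strip() for v in value.split(",") if v.strip()]
    return [value]


def extract_fields(parsed: Dict[str, Any], task_title: str) -> Dict[str, Any]:
    content = parsed.get("content", "")
    return {
        "html_content": content,
        "title": parsed.get("title") or task_title,
        "word_count": parsed.get("word_count") or count_words(content),
        "tags": as_list(parsed.get("tags")),
        # Anything outside the standard fields is kept as extra metadata.
        "metadata": {k: v for k, v in parsed.items() if k not in STANDARD_FIELDS},
    }
```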

### 4.4 Database Models

#### Tasks Model
- **File:** `backend/igny8_core/modules/writer/models.py`
- **Model:** `Tasks`
- **Fields Used:**
  - `id` - Task ID
  - `title` - Task title
  - `description` - Task description
  - `keywords` - Keywords string (legacy)
  - `cluster` (ForeignKey) - Related cluster
  - `idea` (ForeignKey) - Related ContentIdeas
  - `content_structure` - Content structure
  - `content_type` - Content type
  - `status` - Status ('queued', 'completed')
  - `meta_title` - Meta title
  - `meta_description` - Meta description
  - `account`, `site`, `sector` - From SiteSectorBaseModel

#### Content Model
- **File:** `backend/igny8_core/modules/writer/models.py`
- **Model:** `Content`
- **Fields Used:**
  - `task` (OneToOneField) - Related task
  - `html_content` - HTML content
  - `word_count` - Word count
  - `title` - Content title
  - `meta_title` - Meta title
  - `meta_description` - Meta description
  - `primary_keyword` - Primary keyword
  - `secondary_keywords` (JSONField) - Secondary keywords list
  - `tags` (JSONField) - Tags list
  - `categories` (JSONField) - Categories list
  - `status` - Status ('draft', 'review', 'published')
  - `metadata` (JSONField) - Additional metadata
  - `account`, `site`, `sector` - From SiteSectorBaseModel (auto-set from task)

### 4.5 AI Response Format

**Expected JSON:**
```json
{
  "content": "<html>Content HTML</html>",
  "title": "Content Title",
  "meta_title": "Meta Title",
  "meta_description": "Meta description",
  "word_count": 1500,
  "primary_keyword": "primary keyword",
  "secondary_keywords": ["keyword1", "keyword2"],
  "tags": ["tag1", "tag2"],
  "categories": ["category1"]
}
```

**Or Plain Text:**
```
Plain text content that will be normalized to HTML
```

### 4.6 Progress Messages

- **INIT:** "Validating task"
- **PREP:** "Preparing content idea"
- **AI_CALL:** "Writing article with Igny8 Semantic AI"
- **PARSE:** "{count} article(s) created"
- **SAVE:** "Saving article"

---

## 5. Change Guide

### 5.1 Where to Change Validation Logic

- **File:** `backend/igny8_core/ai/validators.py`
- **Functions:** `validate_ids`, `validate_keywords_exist`, `validate_cluster_exists`, `validate_tasks_exist`
- **Or:** Override `validate()` method in function class

### 5.2 Where to Change Data Loading

- **File:** Function-specific file (e.g., `auto_cluster.py`)
- **Method:** `prepare()`
- **Change:** Modify queryset filters, select_related, prefetch_related

### 5.3 Where to Change Prompts

- **File:** `backend/igny8_core/ai/prompts.py`
- **Method:** `PromptRegistry.get_prompt()`
- **Change:** Modify `DEFAULT_PROMPTS` dict or update database prompts

### 5.4 Where to Change Model Configuration

- **File:** `backend/igny8_core/ai/settings.py`
- **Constant:** `MODEL_CONFIG`
- **Change:** Update model, max_tokens, temperature, response_format per function

### 5.5 Where to Change Response Parsing

- **File:** Function-specific file (e.g., `generate_content.py`)
- **Method:** `parse_response()`
- **Change:** Modify JSON extraction or plain text handling

### 5.6 Where to Change Database Saving

- **File:** Function-specific file (e.g., `auto_cluster.py`)
- **Method:** `save_output()`
- **Change:** Modify model creation/update logic, field mappings

### 5.7 Where to Change Progress Messages

- **File:** `backend/igny8_core/ai/engine.py`
- **Methods:** `_get_prep_message()`, `_get_ai_call_message()`, `_get_parse_message()`, `_get_save_message()`
- **Or:** Override the `phases` returned by `get_metadata()` in the function class

### 5.8 Where to Change Error Handling

- **File:** `backend/igny8_core/ai/engine.py`
- **Method:** `_handle_error()`
- **Change:** Modify error logging, error response format

---

## 6. Dependencies

### 6.1 Function Dependencies

- All functions depend on: `BaseAIFunction`, `AICore`, `PromptRegistry`, `get_model_config`
- Clustering depends on: `Keywords`, `Clusters` models
- Ideas depends on: `Clusters`, `ContentIdeas`, `Keywords` models
- Content depends on: `Tasks`, `Content`, `ContentIdeas`, `Clusters` models

### 6.2 External Dependencies

- **Celery:** For async task execution (`run_ai_task`)
- **OpenAI API:** For AI text generation (via `AICore.run_ai_request`)
- **Django ORM:** For database operations
- **IntegrationSettings:** For account-specific model configuration

---

## 7. Key Relationships

### 7.1 Clustering Flow
```
Keywords → Clusters (many-to-one)
- Keywords.cluster (ForeignKey)
- Clusters.keywords (related_name)
```

### 7.2 Ideas Flow
```
Clusters → ContentIdeas (one-to-many)
- ContentIdeas.keyword_cluster (ForeignKey)
- Clusters.ideas (related_name, if exists)
```

### 7.3 Content Flow
```
Tasks → Content (one-to-one)
- Content.task (OneToOneField)
- Tasks.content_record (related_name)

Tasks → ContentIdeas (many-to-one)
- Tasks.idea (ForeignKey)
- ContentIdeas.tasks (related_name)

Tasks → Clusters (many-to-one)
- Tasks.cluster (ForeignKey)
- Clusters.tasks (related_name)
```

---

## 8. Notes

- All functions use the same execution pipeline through `AIEngine.execute()`
- Progress tracking is handled automatically by `AIEngine`
- Cost tracking is handled automatically by `CostTracker`
- Database logging is handled automatically by `AIEngine._log_to_database()`, which writes to the `AITaskLog` model
- Model configuration can be overridden per account via `IntegrationSettings`
- Prompts can be overridden per account via database prompts
- All functions support both async (Celery) and sync execution
- Error handling is centralized in `AIEngine._handle_error()`