igny8/docs/ai-docs/AI_MASTER_ARCHITECTURE.md
2025-11-11 21:16:37 +05:00

# AI Master Architecture Document
## Clustering, Idea Generation, and Content Generation
**Version:** 1.0
**Date:** 2025-01-XX
**Scope:** Complete architecture for 3 verified AI functions (clustering, idea generation, content generation)
---
## Table of Contents
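1. [Common Architecture](#1-common-architecture)
2. [Auto Cluster Keywords](#2-auto-cluster-keywords)
3. [Generate Ideas](#3-generate-ideas)
4. [Generate Content](#4-generate-content)
5. [Change Guide](#5-change-guide)
6. [Dependencies](#6-dependencies)
7. [Key Relationships](#7-key-relationships)
8. [Notes](#8-notes)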
---
## 1. Common Architecture
### 1.1 Core Framework Files
#### Entry Point
- **File:** `backend/igny8_core/ai/tasks.py`
- **Function:** `run_ai_task`
- **Purpose:** Unified Celery task entrypoint for all AI functions
- **Parameters:** `function_name` (str), `payload` (dict), `account_id` (int)
- **Flow:** Loads function from registry → Creates AIEngine → Executes function
#### Engine Orchestrator
- **File:** `backend/igny8_core/ai/engine.py`
- **Class:** `AIEngine`
- **Purpose:** Central orchestrator managing lifecycle, progress, logging, cost tracking
- **Methods:**
- `execute` - Main execution pipeline (6 phases: INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
- `_handle_error` - Centralized error handling
- `_log_to_database` - Logs to AITaskLog model
- Helper methods: `_get_input_description`, `_build_validation_message`, `_get_prep_message`, `_get_ai_call_message`, `_get_parse_message`, `_get_parse_message_with_count`, `_get_save_message`, `_calculate_credits_for_clustering`
#### Base Function Class
- **File:** `backend/igny8_core/ai/base.py`
- **Class:** `BaseAIFunction`
- **Purpose:** Abstract base class defining interface for all AI functions
- **Abstract Methods:**
- `get_name` - Returns function name (e.g., 'auto_cluster')
- `prepare` - Loads and prepares data
- `build_prompt` - Builds AI prompt
- `parse_response` - Parses AI response
- `save_output` - Saves results to database
- **Optional Methods:**
- `get_metadata` - Returns display name, description, phases
- `get_max_items` - Returns max items limit (or None)
- `validate` - Validates input payload (default: checks for 'ids')
- `get_model` - Returns model override (default: None, uses account default)
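
The interface above can be sketched with Python's `abc` module. This is a minimal illustration, not the contents of `base.py`: the method names and the `valid`/`error` result shape come from this document, while the exact signatures, the error messages, and the `EchoFunction` subclass are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseAIFunction(ABC):
    """Illustrative interface only; the real class lives in ai/base.py."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def prepare(self, payload: dict, account: Any = None) -> Any: ...

    @abstractmethod
    def build_prompt(self, data: Any, account: Any = None) -> str: ...

    @abstractmethod
    def parse_response(self, response: str, step_tracker: Any = None) -> Any: ...

    @abstractmethod
    def save_output(self, parsed: Any, original_data: Any, account: Any = None,
                    progress_tracker: Any = None, step_tracker: Any = None) -> dict: ...

    def get_max_items(self) -> Optional[int]:
        return None  # no limit unless a subclass overrides this

    def get_model(self, account: Any = None) -> Optional[str]:
        return None  # None means "use the account default"

    def validate(self, payload: dict, account: Any = None) -> dict:
        # Default check: 'ids' must be present, and within get_max_items() if set.
        ids = payload.get("ids")
        if not ids:
            return {"valid": False, "error": "No IDs provided"}  # assumed message
        limit = self.get_max_items()
        if limit is not None and len(ids) > limit:
            return {"valid": False, "error": f"Maximum {limit} items allowed"}
        return {"valid": True}


class EchoFunction(BaseAIFunction):
    """Hypothetical subclass used only to exercise the interface."""

    def get_name(self):
        return "echo"

    def get_max_items(self):
        return 2

    def prepare(self, payload, account=None):
        return payload["ids"]

    def build_prompt(self, data, account=None):
        return f"Echo {data}"

    def parse_response(self, response, step_tracker=None):
        return response

    def save_output(self, parsed, original_data, account=None,
                    progress_tracker=None, step_tracker=None):
        return {"count": len(original_data)}
```

Because the five core methods are abstract, a subclass that omits any of them cannot be instantiated, which surfaces an incomplete function at import or registration time rather than mid-task.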
#### Function Registry
- **File:** `backend/igny8_core/ai/registry.py`
- **Functions:**
- `register_function` - Registers function class
- `register_lazy_function` - Registers lazy loader
- `get_function` - Gets function class by name (lazy loads if needed)
- `get_function_instance` - Gets function instance by name
- `list_functions` - Lists all registered functions
- **Lazy Loaders:**
- `_load_auto_cluster` - Loads AutoClusterFunction
- `_load_generate_ideas` - Loads GenerateIdeasFunction
- `_load_generate_content` - Loads GenerateContentFunction
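
A lazy registry of this kind can be sketched in a few lines. The function names match the list above; the module-level dictionaries, the behavior on unknown names, and `DemoFunction` are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

_FUNCTIONS: Dict[str, type] = {}
_LAZY_LOADERS: Dict[str, Callable[[], type]] = {}


def register_function(name: str, cls: type) -> None:
    _FUNCTIONS[name] = cls


def register_lazy_function(name: str, loader: Callable[[], type]) -> None:
    _LAZY_LOADERS[name] = loader


def get_function(name: str) -> Optional[type]:
    """Return the class for `name`, running its lazy loader on first use."""
    if name not in _FUNCTIONS and name in _LAZY_LOADERS:
        _FUNCTIONS[name] = _LAZY_LOADERS[name]()
    return _FUNCTIONS.get(name)  # None for unknown names (assumed behavior)


def get_function_instance(name: str):
    cls = get_function(name)
    return cls() if cls is not None else None


def list_functions() -> List[str]:
    return sorted(set(_FUNCTIONS) | set(_LAZY_LOADERS))


class DemoFunction:
    """Hypothetical stand-in for a class such as AutoClusterFunction."""


def _load_demo() -> type:
    # A real loader would import the function module here, deferring the import cost.
    return DemoFunction


register_lazy_function("demo", _load_demo)
```

Deferring the import inside the loader is the usual reason for this pattern: the registry can be imported early without pulling in every function module and its model dependencies.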
#### AI Core Handler
- **File:** `backend/igny8_core/ai/ai_core.py`
- **Class:** `AICore`
- **Purpose:** Centralized AI request handler for all text generation
- **Methods:**
- `run_ai_request` - Makes API call to OpenAI/Runware
- `extract_json` - Extracts JSON from response (handles markdown code blocks)
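
As an illustration of what extracting JSON from a response with markdown code blocks can involve, here is a standalone sketch. It is not the `AICore.extract_json` implementation; the regular expression and the return-`None`-on-failure convention are assumptions.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown
_FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(text: str) -> Optional[Any]:
    """Best-effort JSON extraction; returns None when nothing parses."""
    candidates = [text.strip()]          # 1) the whole response
    match = _FENCED_BLOCK.search(text)
    if match:
        candidates.append(match.group(1))  # 2) the first fenced block
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```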
#### Prompt Registry
- **File:** `backend/igny8_core/ai/prompts.py`
- **Class:** `PromptRegistry`
- **Purpose:** Centralized prompt management with hierarchical resolution
- **Method:** `get_prompt` - Gets prompt with resolution order:
  1. Task-level prompt_override (if exists)
  2. DB prompt for (account, function)
  3. Default fallback from DEFAULT_PROMPTS registry
- **Prompt Types:**
- `clustering` - For auto_cluster function
- `ideas` - For generate_ideas function
- `content_generation` - For generate_content function
- **Context Placeholders:**
- `[IGNY8_KEYWORDS]` - Replaced with the keyword list (clustering) or the keywords string (content generation)
- `[IGNY8_CLUSTERS]` - Replaced with cluster list
- `[IGNY8_CLUSTER_KEYWORDS]` - Replaced with cluster keywords
- `[IGNY8_IDEA]` - Replaced with idea data
- `[IGNY8_CLUSTER]` - Replaced with cluster data
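
The resolution order and placeholder substitution can be sketched as follows. The prompt text, the `DB_PROMPTS` dictionary standing in for the database lookup, and the function signature are all assumptions; only the three-step fallback and the `[IGNY8_*]` placeholder convention come from this document.

```python
from typing import Dict, Optional, Tuple

# Placeholder prompt text, not the real default prompt.
DEFAULT_PROMPTS: Dict[str, str] = {
    "clustering": "Group these keywords:\n[IGNY8_KEYWORDS]",
}
# Hypothetical stand-in for the database lookup, keyed by (account id, prompt type).
DB_PROMPTS: Dict[Tuple[int, str], str] = {}


def get_prompt(prompt_type: str, account_id: Optional[int] = None,
               context: Optional[dict] = None,
               prompt_override: Optional[str] = None) -> str:
    # Resolution order: task-level override, then account prompt, then default.
    template = (
        prompt_override
        or DB_PROMPTS.get((account_id, prompt_type))
        or DEFAULT_PROMPTS[prompt_type]
    )
    # Each context key NAME fills the placeholder [IGNY8_NAME].
    for key, value in (context or {}).items():
        template = template.replace(f"[IGNY8_{key}]", str(value))
    return template
```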
#### Model Settings
- **File:** `backend/igny8_core/ai/settings.py`
- **Constants:**
- `MODEL_CONFIG` - Model configurations per function (model, max_tokens, temperature, response_format)
- `FUNCTION_ALIASES` - Legacy function name mappings
- **Functions:**
- `get_model_config` - Gets model config for function (reads from IntegrationSettings if account provided)
- `get_model` - Gets model name for function
- `get_max_tokens` - Gets max tokens for function
- `get_temperature` - Gets temperature for function
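
A per-function configuration lookup with alias resolution and account overrides might look like the sketch below. The model name, numeric values, and the legacy alias are placeholders, and passing overrides as a plain dict is an assumption standing in for the `IntegrationSettings` lookup.

```python
from typing import Optional

MODEL_CONFIG = {
    # Placeholder values, not the project's real configuration.
    "auto_cluster": {"model": "example-model", "max_tokens": 3000, "temperature": 0.7},
}
FUNCTION_ALIASES = {"cluster_keywords": "auto_cluster"}  # hypothetical legacy name


def get_model_config(function_name: str, account_overrides: Optional[dict] = None) -> dict:
    name = FUNCTION_ALIASES.get(function_name, function_name)
    config = dict(MODEL_CONFIG[name])        # copy so the defaults are never mutated
    config.update(account_overrides or {})   # account settings win over defaults
    return config


def get_model(function_name: str) -> str:
    return get_model_config(function_name)["model"]
```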
#### Validators
- **File:** `backend/igny8_core/ai/validators.py`
- **Functions:**
- `validate_ids` - Validates 'ids' array in payload
- `validate_keywords_exist` - Validates keywords exist in database
- `validate_cluster_exists` - Validates cluster exists
- `validate_tasks_exist` - Validates tasks exist
- `validate_cluster_limits` - Validates plan limits (currently disabled - always returns valid)
- `validate_api_key` - Validates API key is configured
- `validate_model` - Validates model is in supported list
- `validate_image_size` - Validates image size for model
#### Progress Tracking
- **File:** `backend/igny8_core/ai/tracker.py`
- **Classes:**
- `StepTracker` - Tracks request/response steps
- `ProgressTracker` - Tracks Celery progress updates
- `CostTracker` - Tracks API costs and tokens
- `ConsoleStepTracker` - Console-based step logging
#### Database Logging
- **File:** `backend/igny8_core/ai/models.py`
- **Model:** `AITaskLog`
- **Fields:** `task_id`, `function_name`, `account`, `phase`, `message`, `status`, `duration`, `cost`, `tokens`, `request_steps`, `response_steps`, `error`, `payload`, `result`
### 1.2 Execution Flow (All Functions)
```
1. API Endpoint (views.py)
2. run_ai_task (tasks.py)
   - Gets account from account_id
   - Gets function instance from registry
   - Creates AIEngine
3. AIEngine.execute (engine.py)
   Phase 1: INIT (0-10%)
     - Calls function.validate()
     - Updates progress tracker
   Phase 2: PREP (10-25%)
     - Calls function.prepare()
     - Calls function.build_prompt()
     - Updates progress tracker
   Phase 3: AI_CALL (25-70%)
     - Gets model config from settings
     - Calls AICore.run_ai_request()
     - Tracks cost and tokens
     - Updates progress tracker
   Phase 4: PARSE (70-85%)
     - Calls function.parse_response()
     - Updates progress tracker
   Phase 5: SAVE (85-98%)
     - Calls function.save_output()
     - Logs credit usage
     - Updates progress tracker
   Phase 6: DONE (98-100%)
     - Logs to AITaskLog
     - Returns result
```
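
The six phases can be condensed into a small, framework-free sketch. It assumes simplified hook signatures (no `account` or tracker arguments), reports each phase at its upper percentage bound, and uses a hypothetical `UppercaseFunction` plus a `call_model` callable in place of `AICore.run_ai_request`; cost tracking and database logging are omitted.

```python
class UppercaseFunction:
    """Hypothetical toy function exposing the five pipeline hooks."""

    saved = None

    def validate(self, payload):
        if not payload.get("ids"):
            return {"valid": False, "error": "No IDs provided"}  # assumed message
        return {"valid": True}

    def prepare(self, payload):
        return {"ids": payload["ids"]}

    def build_prompt(self, data):
        return f"Process items {data['ids']}"

    def parse_response(self, response):
        return [response.upper()]

    def save_output(self, parsed, original_data):
        self.saved = parsed
        return {"count": len(parsed)}


def execute(fn, payload, call_model, on_progress):
    """Run `fn` through INIT, PREP, AI_CALL, PARSE, SAVE, DONE in order."""
    check = fn.validate(payload)
    if not check["valid"]:
        return {"success": False, "error": check["error"]}
    on_progress("INIT", 10)
    data = fn.prepare(payload)
    prompt = fn.build_prompt(data)
    on_progress("PREP", 25)
    raw = call_model(prompt)            # stands in for AICore.run_ai_request
    on_progress("AI_CALL", 70)
    parsed = fn.parse_response(raw)
    on_progress("PARSE", 85)
    result = fn.save_output(parsed, data)
    on_progress("SAVE", 98)
    on_progress("DONE", 100)
    return {"success": True, **result}


events = []
result = execute(
    UppercaseFunction(),
    {"ids": [1, 2]},
    call_model=lambda prompt: "ok",
    on_progress=lambda phase, percent: events.append((phase, percent)),
)
```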
---
## 2. Auto Cluster Keywords
### 2.1 Function Implementation
- **File:** `backend/igny8_core/ai/functions/auto_cluster.py`
- **Class:** `AutoClusterFunction`
- **Inherits:** `BaseAIFunction`
### 2.2 API Endpoint
- **File:** `backend/igny8_core/modules/planner/views.py`
- **ViewSet:** `KeywordViewSet`
- **Action:** `auto_cluster`
- **Method:** POST
- **URL Path:** `/v1/planner/keywords/auto_cluster/`
- **Payload:**
- `ids` (list[int]) - Keyword IDs to cluster
- `sector_id` (int, optional) - Sector ID for filtering
- **Response:**
- `success` (bool)
- `task_id` (str) - Celery task ID if async
- `clusters_created` (int) - Number of clusters created
- `keywords_updated` (int) - Number of keywords updated
- `message` (str)
### 2.3 Function Methods
#### `get_name()`
- **Returns:** `'auto_cluster'`
#### `get_metadata()`
- **Returns:** Dict with `display_name`, `description`, `phases` (INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
#### `get_max_items()`
- **Returns:** `None` (no limit)
#### `validate(payload, account)`
- **Validates:**
- Calls `validate_ids` to check for 'ids' array
- Calls `validate_keywords_exist` to verify keywords exist
- **Returns:** Dict with `valid` (bool) and optional `error` (str)
#### `prepare(payload, account)`
- **Loads:**
- Keywords from database (filters by `ids`, `account`, optional `sector_id`)
- Uses `select_related` for: `account`, `site`, `site__account`, `sector`, `sector__site`
- **Returns:** Dict with:
- `keywords` (list[Keyword objects])
- `keyword_data` (list[dict]) - Formatted data with: `id`, `keyword`, `volume`, `difficulty`, `intent`
- `sector_id` (int, optional)
#### `build_prompt(data, account)`
- **Gets Prompt:**
- Calls `PromptRegistry.get_prompt(function_name='auto_cluster', account=account, context=context)`
- Context includes: `KEYWORDS` (formatted keyword list), optional `SECTOR` (sector name)
- **Formatting:**
- Formats keywords as: `"- {keyword} (Volume: {volume}, Difficulty: {difficulty}, Intent: {intent})"`
- Replaces `[IGNY8_KEYWORDS]` placeholder
- Adds JSON mode instruction if not present
- **Returns:** Prompt string
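
The formatting step can be sketched as below. The line format and the `[IGNY8_KEYWORDS]` placeholder are taken from this section; the way the missing JSON instruction is detected (a case-insensitive search for "json") and the wording of the appended instruction are assumptions.

```python
def format_keywords(keyword_data):
    """Render one bullet line per keyword dict produced by prepare()."""
    return "\n".join(
        f"- {k['keyword']} (Volume: {k['volume']}, "
        f"Difficulty: {k['difficulty']}, Intent: {k['intent']})"
        for k in keyword_data
    )


def build_prompt(template, keyword_data):
    prompt = template.replace("[IGNY8_KEYWORDS]", format_keywords(keyword_data))
    # Assumed heuristic: only add a JSON instruction if the prompt lacks one.
    if "json" not in prompt.lower():
        prompt += "\n\nRespond with a JSON object only."
    return prompt
```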
#### `parse_response(response, step_tracker)`
- **Parsing:**
- Tries direct JSON parse first
- Falls back to `AICore.extract_json()` if needed (handles markdown code blocks)
- **Extraction:**
- Extracts `clusters` array from JSON
- Handles both dict with 'clusters' key and direct array
- **Returns:** List[Dict] with cluster data:
- `name` (str) - Cluster name
- `description` (str) - Cluster description
- `keywords` (list[str]) - List of keyword strings
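
A minimal sketch of the two accepted response shapes follows. The fallback extractor is injected as a parameter here so the example stays self-contained; in the real function that role is played by `AICore.extract_json()`. Returning an empty list when nothing parses is an assumption.

```python
import json


def parse_clusters(response, extract_json=None):
    """Return the list of cluster dicts from either accepted JSON shape."""
    try:
        data = json.loads(response)                  # direct parse first
    except json.JSONDecodeError:
        data = extract_json(response) if extract_json else None
    if isinstance(data, dict):
        return data.get("clusters", [])              # {"clusters": [...]}
    if isinstance(data, list):
        return data                                  # bare array of clusters
    return []
```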
#### `save_output(parsed, original_data, account, progress_tracker, step_tracker)`
- **Input:**
  - `parsed` - List of cluster dicts from parse_response
  - `original_data` - Dict from prepare() with `keywords` and `sector_id`
- **Process:**
  - Gets account, site, sector from first keyword
  - For each cluster in parsed:
    - Gets or creates `Clusters` record:
      - Fields: `name`, `description`, `account`, `site`, `sector`, `status='active'`
      - Uses `get_or_create` with name + account + site + sector
    - Matches keywords (case-insensitive):
      - Normalizes cluster keywords and available keywords to lowercase
    - Updates matched `Keywords` records:
      - Sets `cluster` foreign key
      - Sets `status='mapped'`
  - Recalculates cluster metrics:
    - `keywords_count` - Count of keywords in cluster
    - `volume` - Sum of keyword volumes (uses `volume_override` if available, else `seed_keyword__volume`)
- **Returns:** Dict with:
  - `count` (int) - Clusters created
  - `clusters_created` (int) - Clusters created
  - `keywords_updated` (int) - Keywords updated
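
The case-insensitive matching and the volume recalculation can be illustrated without the ORM. Plain tuples and dicts stand in for `Keywords` rows, and treating `None` as "no override" is an assumption.

```python
def match_keywords(cluster_keywords, available):
    """Return IDs of available keywords named by the AI, ignoring case.

    `available` is a list of (keyword_id, keyword_text) pairs.
    """
    wanted = {keyword.lower() for keyword in cluster_keywords}
    return [kid for kid, text in available if text.lower() in wanted]


def cluster_volume(keywords):
    """Sum volumes, preferring volume_override when it is set (not None)."""
    return sum(
        k["volume_override"] if k["volume_override"] is not None else k["seed_volume"]
        for k in keywords
    )
```

Lowercasing both sides matters because the model may return keywords with different capitalization than the stored text; anything that still fails to match is simply left unassigned in this sketch.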
### 2.4 Database Models
#### Keywords Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `Keywords`
- **Fields Used:**
- `id` - Keyword ID
- `seed_keyword` (ForeignKey) - Reference to SeedKeyword
- `keyword` (property) - Gets keyword text from seed_keyword
- `volume` (property) - Gets volume from volume_override or seed_keyword
- `difficulty` (property) - Gets difficulty from difficulty_override or seed_keyword
- `intent` (property) - Gets intent from seed_keyword
- `cluster` (ForeignKey) - Assigned cluster
- `status` - Status ('active', 'pending', 'mapped', 'archived')
- `account`, `site`, `sector` - From SiteSectorBaseModel
#### Clusters Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `Clusters`
- **Fields Used:**
- `name` - Cluster name (unique)
- `description` - Cluster description
- `keywords_count` - Count of keywords (recalculated)
- `volume` - Sum of keyword volumes (recalculated)
- `status` - Status ('active')
- `account`, `site`, `sector` - From SiteSectorBaseModel
### 2.5 AI Response Format
**Expected JSON:**
```json
{
  "clusters": [
    {
      "name": "Cluster Name",
      "description": "Cluster description",
      "keywords": ["keyword1", "keyword2", "keyword3"]
    }
  ]
}
```
### 2.6 Progress Messages
- **INIT:** "Validating {keyword1}, {keyword2}, {keyword3} and {X} more keywords" (shows first 3, then count)
- **PREP:** "Loading {count} keyword(s)"
- **AI_CALL:** "Generating clusters with Igny8 Semantic SEO Model"
- **PARSE:** "{count} cluster(s) created"
- **SAVE:** "Saving {count} cluster(s)"
---
## 3. Generate Ideas
### 3.1 Function Implementation
- **File:** `backend/igny8_core/ai/functions/generate_ideas.py`
- **Class:** `GenerateIdeasFunction`
- **Inherits:** `BaseAIFunction`
### 3.2 API Endpoint
- **File:** `backend/igny8_core/modules/planner/views.py`
- **ViewSet:** `ClusterViewSet`
- **Action:** `auto_generate_ideas`
- **Method:** POST
- **URL Path:** `/v1/planner/clusters/auto_generate_ideas/`
- **Payload:**
- `ids` (list[int]) - Cluster IDs (max 10)
- **Response:**
- `success` (bool)
- `task_id` (str) - Celery task ID if async
- `ideas_created` (int) - Number of ideas created
- `message` (str)
### 3.3 Function Methods
#### `get_name()`
- **Returns:** `'generate_ideas'`
#### `get_metadata()`
- **Returns:** Dict with `display_name`, `description`, `phases` (INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
#### `get_max_items()`
- **Returns:** `10` (max clusters per generation)
#### `validate(payload, account)`
- **Validates:**
- Calls `super().validate()` to check for 'ids' array and max_items limit
- Calls `validate_cluster_exists` for first cluster ID
- Calls `validate_cluster_limits` for plan limits (currently disabled)
- **Returns:** Dict with `valid` (bool) and optional `error` (str)
#### `prepare(payload, account)`
- **Loads:**
- Clusters from database (filters by `ids`, `account`)
- Uses `select_related` for: `sector`, `account`, `site`, `sector__site`
- Uses `prefetch_related` for: `keywords`
- **Gets Keywords:**
- For each cluster, loads `Keywords` with `select_related('seed_keyword')`
- Extracts keyword text from `seed_keyword.keyword`
- **Returns:** Dict with:
- `clusters` (list[Cluster objects])
- `cluster_data` (list[dict]) - Formatted data with: `id`, `name`, `description`, `keywords` (list[str])
- `account` (Account object)
#### `build_prompt(data, account)`
- **Gets Prompt:**
- Calls `PromptRegistry.get_prompt(function_name='generate_ideas', account=account, context=context)`
- Context includes:
- `CLUSTERS` - Formatted cluster list: `"Cluster ID: {id} | Name: {name} | Description: {description}"`
- `CLUSTER_KEYWORDS` - Formatted cluster keywords: `"Cluster ID: {id} | Name: {name} | Keywords: {keyword1}, {keyword2}"`
- **Replaces Placeholders:**
- `[IGNY8_CLUSTERS]` → clusters_text
- `[IGNY8_CLUSTER_KEYWORDS]` → cluster_keywords_text
- **Returns:** Prompt string
#### `parse_response(response, step_tracker)`
- **Parsing:**
- Calls `AICore.extract_json()` to extract JSON from response
- Validates 'ideas' key exists in JSON
- **Returns:** List[Dict] with idea data:
- `title` (str) - Idea title
- `description` (str or dict) - Idea description (can be JSON string)
- `content_type` (str) - Content type ('blog_post', 'article', etc.)
- `content_structure` (str) - Content structure ('cluster_hub', 'supporting_page', etc.)
- `cluster_id` (int, optional) - Cluster ID reference
- `cluster_name` (str, optional) - Cluster name reference
- `estimated_word_count` (int) - Estimated word count
- `covered_keywords` or `target_keywords` (str) - Target keywords
#### `save_output(parsed, original_data, account, progress_tracker, step_tracker)`
- **Input:**
  - `parsed` - List of idea dicts from parse_response
  - `original_data` - Dict from prepare() with `clusters` and `cluster_data`
- **Process:**
  - For each idea in parsed:
    - Matches cluster:
      - First tries by `cluster_id` from AI response
      - Falls back to `cluster_name` matching
      - Last resort: position-based matching (first idea → first cluster)
    - Gets site from cluster (or cluster.sector.site)
    - Handles description:
      - If dict, converts to JSON string
      - If not string, converts to string
    - Creates `ContentIdeas` record:
      - Fields:
        - `idea_title` - From `title`
        - `description` - Processed description
        - `content_type` - From `content_type` (default: 'blog_post')
        - `content_structure` - From `content_structure` (default: 'supporting_page')
        - `target_keywords` - From `covered_keywords` or `target_keywords`
        - `keyword_cluster` - Matched cluster
        - `estimated_word_count` - From `estimated_word_count` (default: 1500)
        - `status` - 'new'
        - `account`, `site`, `sector` - From cluster
- **Returns:** Dict with:
  - `count` (int) - Ideas created
  - `ideas_created` (int) - Ideas created
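
The three-step cluster matching and the description handling can be sketched with plain dicts in place of model instances. Returning `None` when the position falls outside the cluster list is an assumption; the document does not say what happens in that case.

```python
import json


def match_cluster(idea, clusters, position):
    """Pick a cluster for `idea`: by ID, then by name, then by list position.

    `clusters` is a list of dicts with 'id' and 'name'; `position` is the
    index of the idea within the parsed response.
    """
    by_id = {c["id"]: c for c in clusters}
    by_name = {c["name"]: c for c in clusters}
    if idea.get("cluster_id") in by_id:
        return by_id[idea["cluster_id"]]
    if idea.get("cluster_name") in by_name:
        return by_name[idea["cluster_name"]]
    if position < len(clusters):
        return clusters[position]
    return None  # assumed: no match available


def normalize_description(description):
    """Store dicts as JSON strings and coerce any other non-string to str."""
    if isinstance(description, dict):
        return json.dumps(description)
    if not isinstance(description, str):
        return str(description)
    return description
```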
### 3.4 Database Models
#### Clusters Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `Clusters`
- **Fields Used:**
- `id` - Cluster ID
- `name` - Cluster name
- `description` - Cluster description
- `keywords` (related_name) - Related Keywords
- `account`, `site`, `sector` - From SiteSectorBaseModel
#### ContentIdeas Model
- **File:** `backend/igny8_core/modules/planner/models.py`
- **Model:** `ContentIdeas`
- **Fields Used:**
- `idea_title` - Idea title
- `description` - Idea description (can be JSON string)
- `content_type` - Content type ('blog_post', 'article', 'guide', 'tutorial')
- `content_structure` - Content structure ('cluster_hub', 'landing_page', 'pillar_page', 'supporting_page')
- `target_keywords` - Target keywords string
- `keyword_cluster` (ForeignKey) - Related cluster
- `estimated_word_count` - Estimated word count
- `status` - Status ('new', 'scheduled', 'published')
- `account`, `site`, `sector` - From SiteSectorBaseModel
### 3.5 AI Response Format
**Expected JSON:**
```json
{
  "ideas": [
    {
      "title": "Idea Title",
      "description": "Idea description or JSON structure",
      "content_type": "blog_post",
      "content_structure": "supporting_page",
      "cluster_id": 1,
      "cluster_name": "Cluster Name",
      "estimated_word_count": 1500,
      "covered_keywords": "keyword1, keyword2"
    }
  ]
}
```
### 3.6 Progress Messages
- **INIT:** "Verifying cluster integrity"
- **PREP:** "Loading cluster keywords"
- **AI_CALL:** "Generating ideas with Igny8 Semantic AI"
- **PARSE:** "{count} high-opportunity idea(s) generated"
- **SAVE:** "Content Outline for Ideas generated"
---
## 4. Generate Content
### 4.1 Function Implementation
- **File:** `backend/igny8_core/ai/functions/generate_content.py`
- **Class:** `GenerateContentFunction`
- **Inherits:** `BaseAIFunction`
### 4.2 API Endpoint
- **File:** `backend/igny8_core/modules/writer/views.py`
- **ViewSet:** `TasksViewSet`
- **Action:** `auto_generate_content`
- **Method:** POST
- **URL Path:** `/v1/writer/tasks/auto_generate_content/`
- **Payload:**
- `ids` (list[int]) - Task IDs (max 10)
- **Response:**
- `success` (bool)
- `task_id` (str) - Celery task ID if async
- `tasks_updated` (int) - Number of tasks updated
- `message` (str)
### 4.3 Function Methods
#### `get_name()`
- **Returns:** `'generate_content'`
#### `get_metadata()`
- **Returns:** Dict with `display_name`, `description`, `phases` (INIT, PREP, AI_CALL, PARSE, SAVE, DONE)
#### `get_max_items()`
- **Returns:** `50` (max tasks per batch)
#### `validate(payload, account)`
- **Validates:**
- Calls `super().validate()` to check for 'ids' array and max_items limit
- Calls `validate_tasks_exist` to verify tasks exist
- **Returns:** Dict with `valid` (bool) and optional `error` (str)
#### `prepare(payload, account)`
- **Loads:**
- Tasks from database (filters by `ids`, `account`)
- Uses `select_related` for: `account`, `site`, `sector`, `cluster`, `idea`
- **Returns:** List[Task objects]
#### `build_prompt(data, account)`
- **Input:** Can be single Task or list[Task] (handles first task if list)
- **Builds Idea Data:**
  - `title` - From task.title
  - `description` - From task.description
  - `outline` - From task.idea.description (handles JSON structure):
    - If JSON, formats as: `"## {H2 heading}\n### {H3 subheading}\nContent Type: {type}\nDetails: {details}"`
    - If plain text, uses as-is
  - `structure` - From task.idea.content_structure or task.content_structure
  - `type` - From task.idea.content_type or task.content_type
  - `estimated_word_count` - From task.idea.estimated_word_count
- **Builds Cluster Data:**
  - `cluster_name` - From task.cluster.name
  - `description` - From task.cluster.description
  - `status` - From task.cluster.status
- **Builds Keywords Data:**
  - From task.keywords (legacy) or task.idea.target_keywords
- **Gets Prompt:**
  - Calls `PromptRegistry.get_prompt(function_name='generate_content', account=account, task=task, context=context)`
  - Context includes:
    - `IDEA` - Formatted idea data string
    - `CLUSTER` - Formatted cluster data string
    - `KEYWORDS` - Keywords string
- **Returns:** Prompt string
#### `parse_response(response, step_tracker)`
- **Parsing:**
  - First tries JSON parse:
    - If successful and dict, returns dict
  - Falls back to plain text:
    - Calls `normalize_content()` from `content_normalizer` to convert to HTML
    - Returns dict with `content` field
- **Returns:** Dict with:
  - **If JSON:**
    - `content` (str) - HTML content
    - `title` (str, optional) - Content title
    - `meta_title` (str, optional) - Meta title
    - `meta_description` (str, optional) - Meta description
    - `word_count` (int, optional) - Word count
    - `primary_keyword` (str, optional) - Primary keyword
    - `secondary_keywords` (list, optional) - Secondary keywords
    - `tags` (list, optional) - Tags
    - `categories` (list, optional) - Categories
  - **If Plain Text:**
    - `content` (str) - Normalized HTML content
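
The JSON-first, plain-text-fallback behavior can be sketched as follows. The `normalize_content` shown here is a hypothetical stand-in that wraps blank-line-separated paragraphs in `<p>` tags; the real `content_normalizer` may behave differently.

```python
import html
import json


def normalize_content(text):
    """Hypothetical normalizer: one <p> per blank-line-separated paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def parse_content(response):
    """Return the parsed dict for JSON responses, else wrap plain text as HTML."""
    try:
        data = json.loads(response)
        if isinstance(data, dict):
            return data
    except json.JSONDecodeError:
        pass
    # Non-JSON (or JSON that is not an object) is treated as plain text.
    return {"content": normalize_content(response)}
```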
#### `save_output(parsed, original_data, account, progress_tracker, step_tracker)`
- **Input:**
  - `parsed` - Dict from parse_response
  - `original_data` - Task object or list[Task] (handles first task if list)
- **Process:**
  - Extracts content fields from parsed dict:
    - `content_html` - From `content` field
    - `title` - From `title` or task.title
    - `meta_title` - From `meta_title` or task.meta_title or task.title
    - `meta_description` - From `meta_description` or task.meta_description or task.description
    - `word_count` - From `word_count` or calculated from content
    - `primary_keyword` - From `primary_keyword`
    - `secondary_keywords` - From `secondary_keywords` (converts to list if needed)
    - `tags` - From `tags` (converts to list if needed)
    - `categories` - From `categories` (converts to list if needed)
  - Calculates word count if not provided:
    - Strips HTML tags and counts words
  - Gets or creates `Content` record:
    - Uses `get_or_create` with `task` (OneToOne relationship)
    - Defaults: `html_content`, `word_count`, `status='draft'`, `account`, `site`, `sector`
  - Updates `Content` fields:
    - `html_content` - Content HTML
    - `word_count` - Word count
    - `title` - Content title
    - `meta_title` - Meta title
    - `meta_description` - Meta description
    - `primary_keyword` - Primary keyword
    - `secondary_keywords` - Secondary keywords (JSONField)
    - `tags` - Tags (JSONField)
    - `categories` - Categories (JSONField)
    - `status` - Always 'draft' for newly generated content
    - `metadata` - Extra fields from parsed dict (excludes standard fields)
    - `account`, `site`, `sector`, `task` - Aligned from task
  - Updates `Tasks` record:
    - Sets `status='completed'`
    - Updates `updated_at`
- **Returns:** Dict with:
  - `count` (int) - Tasks updated (always 1 per task)
  - `tasks_updated` (int) - Tasks updated
  - `word_count` (int) - Word count
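
The field fallbacks and the word-count calculation can be sketched without the ORM. A plain dict stands in for the `Tasks` row, and the tag-stripping regular expression is an assumption about how "strips HTML tags" is done.

```python
import re


def count_words(html_content):
    """Strip tags with a simple regex, then count whitespace-separated words."""
    text = re.sub(r"<[^>]+>", " ", html_content)
    return len(text.split())


def resolve_fields(parsed, task):
    """Apply the documented fallbacks from the AI response to the task."""
    content = parsed.get("content", "")
    return {
        "html_content": content,
        "title": parsed.get("title") or task["title"],
        "meta_title": parsed.get("meta_title") or task.get("meta_title") or task["title"],
        "meta_description": (parsed.get("meta_description")
                             or task.get("meta_description")
                             or task.get("description")),
        "word_count": parsed.get("word_count") or count_words(content),
    }
```

Replacing tags with a space rather than an empty string keeps words in adjacent elements (for example a heading followed directly by a paragraph) from being merged into one.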
### 4.4 Database Models
#### Tasks Model
- **File:** `backend/igny8_core/modules/writer/models.py`
- **Model:** `Tasks`
- **Fields Used:**
- `id` - Task ID
- `title` - Task title
- `description` - Task description
- `keywords` - Keywords string (legacy)
- `cluster` (ForeignKey) - Related cluster
- `idea` (ForeignKey) - Related ContentIdeas
- `content_structure` - Content structure
- `content_type` - Content type
- `status` - Status ('queued', 'completed')
- `meta_title` - Meta title
- `meta_description` - Meta description
- `account`, `site`, `sector` - From SiteSectorBaseModel
#### Content Model
- **File:** `backend/igny8_core/modules/writer/models.py`
- **Model:** `Content`
- **Fields Used:**
- `task` (OneToOneField) - Related task
- `html_content` - HTML content
- `word_count` - Word count
- `title` - Content title
- `meta_title` - Meta title
- `meta_description` - Meta description
- `primary_keyword` - Primary keyword
- `secondary_keywords` (JSONField) - Secondary keywords list
- `tags` (JSONField) - Tags list
- `categories` (JSONField) - Categories list
- `status` - Status ('draft', 'review', 'published')
- `metadata` (JSONField) - Additional metadata
- `account`, `site`, `sector` - From SiteSectorBaseModel (auto-set from task)
### 4.5 AI Response Format
**Expected JSON:**
```json
{
  "content": "<html>Content HTML</html>",
  "title": "Content Title",
  "meta_title": "Meta Title",
  "meta_description": "Meta description",
  "word_count": 1500,
  "primary_keyword": "primary keyword",
  "secondary_keywords": ["keyword1", "keyword2"],
  "tags": ["tag1", "tag2"],
  "categories": ["category1"]
}
```
**Or Plain Text:**
```
Plain text content that will be normalized to HTML
```
### 4.6 Progress Messages
- **INIT:** "Validating task"
- **PREP:** "Preparing content idea"
- **AI_CALL:** "Writing article with Igny8 Semantic AI"
- **PARSE:** "{count} article(s) created"
- **SAVE:** "Saving article"
---
## 5. Change Guide
### 5.1 Where to Change Validation Logic
- **File:** `backend/igny8_core/ai/validators.py`
- **Functions:** `validate_ids`, `validate_keywords_exist`, `validate_cluster_exists`, `validate_tasks_exist`
- **Or:** Override `validate()` method in function class
### 5.2 Where to Change Data Loading
- **File:** Function-specific file (e.g., `auto_cluster.py`)
- **Method:** `prepare()`
- **Change:** Modify queryset filters, select_related, prefetch_related
### 5.3 Where to Change Prompts
- **File:** `backend/igny8_core/ai/prompts.py`
- **Method:** `PromptRegistry.get_prompt()`
- **Change:** Modify `DEFAULT_PROMPTS` dict or update database prompts
### 5.4 Where to Change Model Configuration
- **File:** `backend/igny8_core/ai/settings.py`
- **Constant:** `MODEL_CONFIG`
- **Change:** Update model, max_tokens, temperature, response_format per function
### 5.5 Where to Change Response Parsing
- **File:** Function-specific file (e.g., `generate_content.py`)
- **Method:** `parse_response()`
- **Change:** Modify JSON extraction or plain text handling
### 5.6 Where to Change Database Saving
- **File:** Function-specific file (e.g., `auto_cluster.py`)
- **Method:** `save_output()`
- **Change:** Modify model creation/update logic, field mappings
### 5.7 Where to Change Progress Messages
- **File:** `backend/igny8_core/ai/engine.py`
- **Methods:** `_get_prep_message()`, `_get_ai_call_message()`, `_get_parse_message()`, `_get_save_message()`
- **Or:** Override in function class `get_metadata()` phases
### 5.8 Where to Change Error Handling
- **File:** `backend/igny8_core/ai/engine.py`
- **Method:** `_handle_error()`
- **Change:** Modify error logging, error response format
---
## 6. Dependencies
### 6.1 Function Dependencies
- All functions depend on: `BaseAIFunction`, `AICore`, `PromptRegistry`, `get_model_config`
- Clustering depends on: `Keywords`, `Clusters` models
- Idea generation depends on: `Clusters`, `ContentIdeas`, `Keywords` models
- Content generation depends on: `Tasks`, `Content`, `ContentIdeas`, `Clusters` models
### 6.2 External Dependencies
- **Celery:** For async task execution (`run_ai_task`)
- **OpenAI API:** For AI text generation (via `AICore.run_ai_request`)
- **Django ORM:** For database operations
- **IntegrationSettings:** For account-specific model configuration
---
## 7. Key Relationships
### 7.1 Clustering Flow
```
Keywords → Clusters (many-to-one)
  - Keywords.cluster (ForeignKey)
  - Clusters.keywords (related_name)
```
### 7.2 Ideas Flow
```
Clusters → ContentIdeas (one-to-many)
  - ContentIdeas.keyword_cluster (ForeignKey)
  - Clusters.ideas (related_name, if exists)
```
### 7.3 Content Flow
```
Tasks → Content (one-to-one)
  - Content.task (OneToOneField)
  - Tasks.content_record (related_name)
Tasks → ContentIdeas (many-to-one)
  - Tasks.idea (ForeignKey)
  - ContentIdeas.tasks (related_name)
Tasks → Clusters (many-to-one)
  - Tasks.cluster (ForeignKey)
  - Clusters.tasks (related_name)
```
---
## 8. Notes
- All functions use the same execution pipeline through `AIEngine.execute()`
- Progress tracking is handled automatically by `AIEngine`
- Cost tracking is handled automatically by `CostTracker`
- Database logging is handled automatically by `AIEngine._log_to_database()`, which writes to the `AITaskLog` model
- Model configuration can be overridden per account via `IntegrationSettings`
- Prompts can be overridden per account via database prompts
- All functions support both async (Celery) and sync execution
- Error handling is centralized in `AIEngine._handle_error()`