Commit `1d4825ad77` by Salman (IGNY8 VPS): "refactor: Fix AI billing system - revert to commit #10 + fixes"
- Reverted to commit #10 (98e68f6) for stable AI function base
- Fixed database migrations: removed 0018-0019 that broke schema
- Fixed CreditCostConfig schema: restored credits_cost, unit fields
- Fixed historical table schema for django-simple-history
- Added debug system (staged for future use)

Changes:
- CreditCostConfig: Updated OPERATION_TYPE_CHOICES (10 ops, no duplicates)
- CreditUsageLog: Updated choices with legacy aliases marked
- Migration 0018_update_operation_choices: Applied successfully
- All AI operations working (clustering, ideas, content, optimization, etc.)

Test Results:
✓ CreditCostConfig save/load working
✓ Credit check passing for all operations
✓ AICore initialization successful
✓ AIEngine operation mapping functional
✓ Admin panel accessible without 500 errors

Future: AI-MODEL-COST-REFACTOR-PLAN.md created for token-based system
2025-12-23 05:21:52 +00:00

# AI Model & Cost Configuration System - Refactor Plan
**Version:** 2.0
**Date:** December 23, 2025
**Current State:** Commit #10 (98e68f6) - Credit-based system with operation configs
**Target:** Token-based system with centralized AI model cost configuration
---
## Executive Summary
### Current System (Commit #10)
- **CreditCostConfig**: Operation-level credit costs (clustering = 1 credit, ideas = 15 credits)
- **Units**: `per_request`, `per_100_words`, `per_200_words`, `per_item`, `per_image`
- **No token tracking**: Credits are fixed per operation, not based on actual AI usage
- **No model awareness**: All models cost the same regardless of GPT-3.5 vs GPT-4
- **No accurate analytics**: Cannot track real costs or token consumption
### Previous Attempt (Commits 8-9 - Reverted)
- ✅ Token-based calculation: `credits = total_tokens / tokens_per_credit`
- ✅ BillingConfiguration: Global `default_tokens_per_credit = 100`
- ✅ Per-operation token ratios in CreditCostConfig
- ❌ **Too complex**: Each operation had separate `tokens_per_credit`, `min_credits`, `price_per_credit_usd`
- ❌ **Not model-aware**: Still didn't account for different AI model costs
### Proposed Solution (This Plan)
1. **Add token-based units**: `per_100_tokens`, `per_1000_tokens` to existing unit choices
2. **Create AIModelConfig**: Centralized model pricing (GPT-4: $10/1M input, $30/1M output)
3. **Link everything**: Integration settings → Model → Cost calculation → Credit deduction
4. **Accurate tracking**: Real-time token usage, model costs, and credit analytics
---
## Problem Analysis
### What Commits 8-9 Tried to Achieve
**Goal:** Move from fixed-credit-per-operation to dynamic token-based billing
**Implementation:**
```
OLD (Commit #10):
- Clustering = 10 credits (always, regardless of token usage)
- Content Generation = 1 credit per 100 words
NEW (Commits 8-9):
- Clustering = X tokens used / 150 tokens_per_credit = Y credits
- Content Generation = X tokens used / 100 tokens_per_credit = Y credits
```
**Why It Failed:**
1. **Complexity overload**: Every operation needed its own token ratio configuration
2. **Duplicate configs**: `tokens_per_credit` at both global and operation level
3. **No model differentiation**: GPT-3.5 turbo (cheap) vs GPT-4 (expensive) cost the same
4. **Migration issues**: Database schema changes broke backward compatibility
### Root Cause
**Missing piece:** No centralized AI model cost configuration. Each operation was configured in isolation without understanding which AI model was being used and its actual cost.
---
## Proposed Architecture
### 1. New Model: `AIModelConfig`
**Purpose:** Single source of truth for AI model pricing
**Fields:**
```
- model_name: CharField (e.g., "gpt-4-turbo", "gpt-3.5-turbo", "claude-3-sonnet")
- provider: CharField (openai, anthropic, runware)
- model_type: CharField (text, image)
- cost_per_1k_input_tokens: DecimalField (e.g., $0.01)
- cost_per_1k_output_tokens: DecimalField (e.g., $0.03)
- tokens_per_credit: IntegerField (e.g., 100) - How many tokens = 1 credit
- is_active: BooleanField
- display_name: CharField (e.g., "GPT-4 Turbo (Recommended)")
- description: TextField
- created_at, updated_at
```
**Example Data:**
| Model | Provider | Input $/1K | Output $/1K | Tokens/Credit | Display Name |
|-------|----------|------------|-------------|---------------|--------------|
| gpt-4-turbo | openai | $0.010 | $0.030 | 50 | GPT-4 Turbo (Premium) |
| gpt-3.5-turbo | openai | $0.0005 | $0.0015 | 200 | GPT-3.5 Turbo (Fast) |
| claude-3-sonnet | anthropic | $0.003 | $0.015 | 100 | Claude 3 Sonnet |
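To make the conversion concrete, the pricing fields above can be sketched as a plain Python dataclass (an illustrative stand-in, not the actual Django model; names mirror the proposed `AIModelConfig` fields):

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class ModelPricing:
    """Illustrative stand-in for the proposed AIModelConfig fields."""
    model_name: str
    cost_per_1k_input_tokens: Decimal
    cost_per_1k_output_tokens: Decimal
    tokens_per_credit: int

    def usd_cost(self, tokens_input: int, tokens_output: int) -> Decimal:
        # Actual API cost: tokens / 1000 x per-1K price, input and output priced separately
        return (Decimal(tokens_input) / 1000 * self.cost_per_1k_input_tokens
                + Decimal(tokens_output) / 1000 * self.cost_per_1k_output_tokens)

GPT4_TURBO = ModelPricing("gpt-4-turbo", Decimal("0.010"), Decimal("0.030"), 50)

# 2500 input + 1500 output tokens:
#   input:  2500/1000 x $0.010 = $0.025
#   output: 1500/1000 x $0.030 = $0.045
print(GPT4_TURBO.usd_cost(2500, 1500))  # prints 0.0700
```

Decimal arithmetic keeps the per-call USD cost exact, which is the same reason the plan specifies `DecimalField` for the pricing columns.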
### 2. Updated Model: `CreditCostConfig`
**Changes:**
- Keep existing fields: `operation_type`, `credits_cost`, `unit`, `display_name`, `is_active`
- **ADD** `default_model`: ForeignKey to AIModelConfig (nullable)
- **UPDATE** `unit` choices: Add `per_100_tokens`, `per_1000_tokens`
**New Unit Choices:**
```python
UNIT_CHOICES = [
    ('per_request', 'Per Request'),          # Fixed cost (clustering)
    ('per_100_words', 'Per 100 Words'),      # Word-based (content)
    ('per_200_words', 'Per 200 Words'),      # Word-based (optimization)
    ('per_item', 'Per Item'),                # Item-based (ideas per cluster)
    ('per_image', 'Per Image'),              # Image-based
    ('per_100_tokens', 'Per 100 Tokens'),    # NEW: Token-based
    ('per_1000_tokens', 'Per 1000 Tokens'),  # NEW: Token-based
]
```
**How It Works:**
```
Example 1: Content Generation with GPT-4 Turbo
- Operation: content_generation
- Unit: per_1000_tokens
- Default Model: gpt-4-turbo (50 tokens/credit)
- Actual usage: 2500 input + 1500 output = 4000 total tokens
- Credits = 4000 / 50 = 80 credits
Example 2: Content Generation with GPT-3.5 Turbo (user selected)
- Operation: content_generation
- Unit: per_1000_tokens
- Model used: gpt-3.5-turbo (200 tokens/credit)
- Actual usage: 2500 input + 1500 output = 4000 total tokens
- Credits = 4000 / 200 = 20 credits (4x cheaper!)
```
### 3. Updated Model: `IntegrationSettings`
**Changes:**
- **ADD** `default_text_model`: ForeignKey to AIModelConfig
- **ADD** `default_image_model`: ForeignKey to AIModelConfig
- Keep existing: `openai_api_key`, `anthropic_api_key`, `runware_api_key`
**Purpose:** Account-level model selection
```
Account "AWS Admin" Settings:
- OpenAI API Key: sk-...
- Default Text Model: GPT-3.5 Turbo (cost-effective)
- Default Image Model: DALL-E 3
Account "Premium Client" Settings:
- OpenAI API Key: sk-...
- Default Text Model: GPT-4 Turbo (best quality)
- Default Image Model: DALL-E 3
```
### 4. Updated: `CreditUsageLog`
**Changes:**
- Keep existing: `operation_type`, `credits_used`, `tokens_input`, `tokens_output`
- **UPDATE** `model_used`: CharField → ForeignKey to AIModelConfig
- **ADD** `cost_usd_input`: DecimalField (actual input cost)
- **ADD** `cost_usd_output`: DecimalField (actual output cost)
- **ADD** `cost_usd_total`: DecimalField (total API cost)
**Purpose:** Accurate cost tracking and analytics
---
## Implementation Timeline
### Phase 1: Foundation (Week 1)
#### Step 1.1: Create AIModelConfig Model
- [ ] Create model in `backend/igny8_core/business/billing/models.py`
- [ ] Create admin interface in `backend/igny8_core/business/billing/admin.py`
- [ ] Create migration
- [ ] Seed initial data (GPT-4, GPT-3.5, Claude, DALL-E models)
#### Step 1.2: Update CreditCostConfig
- [ ] Add `default_model` ForeignKey field
- [ ] Update `UNIT_CHOICES` to include `per_100_tokens`, `per_1000_tokens`
- [ ] Create migration
- [ ] Update admin interface to show model selector
#### Step 1.3: Update IntegrationSettings
- [ ] Add `default_text_model` ForeignKey
- [ ] Add `default_image_model` ForeignKey
- [ ] Create migration
- [ ] Update admin interface with model selectors
### Phase 2: Credit Calculation (Week 2)
#### Step 2.1: Update CreditService
- [ ] Add method: `calculate_credits_from_tokens(operation_type, tokens_input, tokens_output, model_used)`
- [ ] Logic:
```
1. Get CreditCostConfig for operation
2. Get model's tokens_per_credit ratio
3. Calculate: credits = total_tokens / tokens_per_credit
4. Apply rounding (up/down/nearest)
5. Apply minimum credits if configured
```
- [ ] Keep legacy methods for backward compatibility
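The five-step logic above could look roughly like this (a sketch, not the final implementation: ceil rounding and a default minimum of 1 credit are assumptions here; the plan leaves the rounding mode and minimum configurable):

```python
import math

def calculate_credits_from_tokens(tokens_input, tokens_output,
                                  tokens_per_credit, min_credits=1):
    """Sketch of the proposed CreditService method (assumed signature).

    Steps 1-3: total tokens divided by the model's tokens_per_credit ratio.
    Step 4:    rounding -- ceil assumed; the real mode is configured globally.
    Step 5:    floor the result at a configured minimum.
    """
    total_tokens = tokens_input + tokens_output
    credits = math.ceil(total_tokens / tokens_per_credit)
    return max(credits, min_credits)

# GPT-4 Turbo (50 tokens/credit): 2500 + 1500 tokens -> 80 credits
print(calculate_credits_from_tokens(2500, 1500, 50))  # prints 80
# A tiny request still charges the minimum
print(calculate_credits_from_tokens(20, 10, 200))     # prints 1
```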
#### Step 2.2: Update AIEngine
- [ ] Extract `model_used` from AI response
- [ ] Pass model to credit calculation
- [ ] Handle model selection priority:
```
1. Task-level override (if specified)
2. Account's default model (from IntegrationSettings)
3. System default model (fallback)
```
#### Step 2.3: Update AI Services
- [ ] Update clustering_service.py
- [ ] Update ideas_service.py
- [ ] Update content_service.py
- [ ] Update image_service.py
- [ ] Update optimizer_service.py
- [ ] Update linker_service.py
### Phase 3: Logging & Analytics (Week 3)
#### Step 3.1: Update CreditUsageLog
- [ ] Change `model_used` from CharField to ForeignKey
- [ ] Add cost fields: `cost_usd_input`, `cost_usd_output`, `cost_usd_total`
- [ ] Create migration with data preservation
- [ ] Update logging logic to capture costs
#### Step 3.2: Create Analytics Views
- [ ] Token Usage Report (by model, by operation, by account)
- [ ] Cost Analysis Report (actual $ spent vs credits charged)
- [ ] Model Performance Report (tokens/sec, success rate by model)
- [ ] Account Efficiency Report (credit consumption patterns)
#### Step 3.3: Update Admin Reports
- [ ] Enhance existing reports with model data
- [ ] Add model cost comparison charts
- [ ] Add token consumption trends
### Phase 4: Testing & Migration (Week 4)
#### Step 4.1: Data Migration
- [ ] Backfill existing CreditUsageLog with default models
- [ ] Link existing IntegrationSettings to default models
- [ ] Update existing CreditCostConfig with default models
#### Step 4.2: Testing
- [ ] Unit tests for credit calculation with different models
- [ ] Integration tests for full AI execution flow
- [ ] Load tests for analytics queries
- [ ] Admin interface testing
#### Step 4.3: Documentation
- [ ] Update API documentation
- [ ] Create admin user guide
- [ ] Create developer guide
- [ ] Update pricing page
---
## Functional Flow
### User Perspective
#### Scenario 1: Content Generation (Default Model)
```
1. User clicks "Generate Content" for 5 blog posts
2. System checks account's default model: GPT-3.5 Turbo
3. Content generated using GPT-3.5 Turbo
4. Token usage: 12,500 input + 8,500 output = 21,000 tokens
5. Model ratio: 200 tokens/credit
6. Credits deducted: 21,000 / 200 = 105 credits
7. User sees: "✓ Generated 5 posts (105 credits, GPT-3.5)"
```
#### Scenario 2: Content Generation (Premium Model)
```
1. User selects "Use GPT-4 Turbo" from model dropdown
2. System validates: account has GPT-4 enabled
3. Content generated using GPT-4 Turbo
4. Token usage: 12,500 input + 8,500 output = 21,000 tokens
5. Model ratio: 50 tokens/credit
6. Credits deducted: 21,000 / 50 = 420 credits (4x more expensive!)
7. User sees: "✓ Generated 5 posts (420 credits, GPT-4 Turbo)"
8. System shows warning: "GPT-4 used 4x more credits than GPT-3.5"
```
#### Scenario 3: Image Generation
```
1. User generates 10 images
2. System uses account's default image model: DALL-E 3
3. No token tracking for images (fixed cost per image)
4. Credits: 10 images × 5 credits/image = 50 credits
5. User sees: "✓ Generated 10 images (50 credits, DALL-E 3)"
```
### Backend Operational Context
#### Credit Calculation Flow
```
User Request
    ↓
AIEngine.execute()
    ↓
Determine Model:
    - Task.model_override (highest priority)
    - Account.default_text_model (from IntegrationSettings)
    - CreditCostConfig.default_model (fallback)
    ↓
Call AI API (OpenAI, Anthropic, etc.)
    ↓
Response: {
    input_tokens: 2500,
    output_tokens: 1500,
    model: "gpt-4-turbo",
    cost_usd: 0.070
}
    ↓
CreditService.calculate_credits_from_tokens(
    operation_type="content_generation",
    tokens_input=2500,
    tokens_output=1500,
    model_used=gpt-4-turbo
)
    ↓
Logic:
    1. Get CreditCostConfig for "content_generation"
    2. Check unit: per_1000_tokens
    3. Get model's tokens_per_credit: 50
    4. Calculate: (2500 + 1500) / 50 = 80 credits
    5. Apply rounding: ceil(80) = 80 credits
    ↓
CreditService.deduct_credits(
    account=user.account,
    amount=80,
    operation_type="content_generation",
    description="Generated blog post",
    tokens_input=2500,
    tokens_output=1500,
    model_used=gpt-4-turbo,
    cost_usd=0.070
)
    ↓
CreditUsageLog created:
    - operation_type: content_generation
    - credits_used: 80
    - tokens_input: 2500
    - tokens_output: 1500
    - model_used: gpt-4-turbo (FK)
    - cost_usd_input: 0.025  (2500/1000 × $0.010)
    - cost_usd_output: 0.045 (1500/1000 × $0.030)
    - cost_usd_total: 0.070
    ↓
Account.credits updated: 1000 → 920
```
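The deduction step at the end of the flow can be exercised with plain Python (the account dict and log list are stand-ins for illustration; the real code would go through Django models inside a transaction):

```python
def deduct_credits(account, amount, **log_fields):
    """Stand-in for CreditService.deduct_credits: check, deduct, log."""
    if account["credits"] < amount:
        raise ValueError("Insufficient credits")
    account["credits"] -= amount
    # Log entry captures everything needed for later analytics
    log_entry = {"credits_used": amount, **log_fields}
    account["usage_log"].append(log_entry)
    return log_entry

account = {"credits": 1000, "usage_log": []}
deduct_credits(account, 80,
               operation_type="content_generation",
               tokens_input=2500, tokens_output=1500,
               model_used="gpt-4-turbo", cost_usd_total=0.070)
print(account["credits"])  # prints 920
```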
#### Analytics & Reporting
**Token Usage Report:**
```sql
SELECT
    m.display_name,
    l.operation_type,
    COUNT(*) AS total_calls,
    SUM(l.tokens_input) AS total_input_tokens,
    SUM(l.tokens_output) AS total_output_tokens,
    SUM(l.credits_used) AS total_credits,
    SUM(l.cost_usd_total) AS total_cost_usd
FROM credit_usage_log AS l
JOIN ai_model_config AS m ON l.model_used_id = m.id
WHERE l.account_id = ?
  AND l.created_at >= ?
GROUP BY m.id, m.display_name, l.operation_type
ORDER BY total_cost_usd DESC
```
**Output:**
| Model | Operation | Calls | Input Tokens | Output Tokens | Credits | Cost USD |
|-------|-----------|-------|--------------|---------------|---------|----------|
| GPT-4 Turbo | content_generation | 150 | 375K | 225K | 12,000 | $10.50 |
| GPT-3.5 Turbo | clustering | 50 | 25K | 10K | 175 | $0.03 |
| Claude 3 Sonnet | idea_generation | 80 | 40K | 60K | 1,000 | $1.02 |
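The report query can be tried end-to-end with Python's built-in `sqlite3` and a few hypothetical rows (schema and values here are invented for illustration; in production the tables are Django-managed and the `?` placeholders are bound by the reporting view):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ai_model_config (id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE credit_usage_log (
    account_id INTEGER, operation_type TEXT, model_used_id INTEGER,
    tokens_input INTEGER, tokens_output INTEGER,
    credits_used INTEGER, cost_usd_total REAL, created_at TEXT
);
INSERT INTO ai_model_config VALUES (1, 'GPT-4 Turbo'), (2, 'GPT-3.5 Turbo');
INSERT INTO credit_usage_log VALUES
    (7, 'content_generation', 1, 2500, 1500,  80, 0.070, '2025-12-23'),
    (7, 'content_generation', 1, 3000, 2000, 100, 0.090, '2025-12-23'),
    (7, 'clustering',         2,  500,  200,   4, 0.001, '2025-12-23');
""")

rows = conn.execute("""
    SELECT m.display_name, l.operation_type,
           COUNT(*), SUM(l.tokens_input), SUM(l.tokens_output),
           SUM(l.credits_used), SUM(l.cost_usd_total)
    FROM credit_usage_log AS l
    JOIN ai_model_config AS m ON l.model_used_id = m.id
    WHERE l.account_id = ? AND l.created_at >= ?
    GROUP BY m.id, l.operation_type
    ORDER BY SUM(l.cost_usd_total) DESC
""", (7, '2025-12-01')).fetchall()

for row in rows:
    print(row)  # first row aggregates the two GPT-4 Turbo calls
```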
**Cost Efficiency Analysis:**
```
Account: Premium Client
Period: Last 30 days
Credits Purchased: 50,000 credits × $0.01 = $500.00 (revenue)
Actual AI Costs: $247.83 (OpenAI + Anthropic API costs)
Gross Margin: $252.17 (50.4% margin)
Model Usage:
- GPT-4 Turbo: 65% of costs, 45% of credits
- GPT-3.5 Turbo: 20% of costs, 40% of credits
- Claude 3: 15% of costs, 15% of credits
Recommendation:
- GPT-3.5 most profitable (high credits, low cost)
- GPT-4 acceptable margin (high value, high cost)
```
---
## Benefits
### For Users
1. **Transparent Pricing**: See exact model and token usage per operation
2. **Cost Control**: Choose cheaper models when quality difference is minimal
3. **Model Selection**: Pick GPT-4 for important content, GPT-3.5 for bulk work
4. **Usage Analytics**: Understand token consumption patterns
### For Backend Operations
1. **Accurate Cost Tracking**: Know exactly how much each account costs
2. **Revenue Optimization**: Set credit prices based on actual model costs
3. **Model Performance**: Track which models are most efficient
4. **Billing Transparency**: Can show users actual API costs vs credits charged
### For Business
1. **Margin Visibility**: Track profitability per account, per model
2. **Pricing Flexibility**: Easily adjust credit costs when AI prices change
3. **Model Migration**: Seamlessly switch between providers (OpenAI → Anthropic)
4. **Scalability**: Support new models without code changes
---
## Migration Strategy
### Backward Compatibility
**Phase 1: Dual Mode**
- Keep old credit calculation as fallback
- New token-based calculation opt-in per operation
- Both systems run in parallel
**Phase 2: Gradual Migration**
- Week 1: Migrate non-critical operations (clustering, ideas)
- Week 2: Migrate content generation
- Week 3: Migrate optimization and linking
- Week 4: Full cutover
**Phase 3: Cleanup**
- Remove legacy calculation code
- Archive old credit cost configs
- Update all documentation
### Data Preservation
- All existing CreditUsageLog entries preserved
- Backfill `model_used` with "legacy-unknown" placeholder model
- Historical data remains queryable
- Analytics show "before/after" comparison
---
## Risk Mitigation
### Technical Risks
1. **Migration complexity**: Use feature flags, gradual rollout
2. **Performance impact**: Index all FK relationships, cache model configs
3. **API changes**: Handle token extraction failures gracefully
### Business Risks
1. **Cost increase**: Monitor margin changes, adjust credit pricing if needed
2. **User confusion**: Clear UI messaging about model selection
3. **Revenue impact**: Set credit prices with 50%+ margin buffer
---
## Success Metrics
### Phase 1 (Foundation)
- ✅ AIModelConfig admin accessible
- ✅ 5+ models configured (GPT-4, GPT-3.5, Claude, etc.)
- ✅ All integration settings linked to models
### Phase 2 (Calculation)
- ✅ 100% of operations use token-based calculation
- ✅ Credit deductions accurate within 1% margin
- ✅ Model selection working (default, override, fallback)
### Phase 3 (Analytics)
- ✅ Token usage report showing accurate data
- ✅ Cost analysis report shows margin per account
- ✅ Model performance metrics visible
### Phase 4 (Production)
- ✅ 30+ days production data collected
- ✅ Margin maintained at 45%+ across all accounts
- ✅ Zero billing disputes related to credits
- ✅ User satisfaction: 90%+ understand pricing
---
## Appendix: Code Examples (Conceptual)
### Credit Calculation Logic
```python
# Simplified conceptual flow (not actual code)
def calculate_credits_from_tokens(operation_type, tokens_input, tokens_output, model_used):
    """Calculate credits based on actual token usage and model cost."""
    # Get operation config
    config = CreditCostConfig.objects.get(operation_type=operation_type)

    # Determine unit type
    if config.unit == 'per_1000_tokens':
        total_tokens = tokens_input + tokens_output
        tokens_per_credit = model_used.tokens_per_credit

        # Calculate credits
        credits_float = total_tokens / tokens_per_credit

        # Apply rounding (configured globally)
        credits = apply_rounding(credits_float)

        # Apply minimum
        credits = max(credits, config.credits_cost)
        return credits

    elif config.unit == 'per_request':
        # Fixed cost, ignore tokens
        return config.credits_cost

    # ... other unit types
```
### Model Selection Priority
```python
# Simplified conceptual flow (not actual code)
def get_model_for_operation(account, operation_type, task_override=None):
    """
    Determine which AI model to use.
    Priority: Task Override > Account Default > System Default
    """
    # 1. Task-level override (highest priority)
    if task_override and task_override.model_id:
        return task_override.model

    # 2. Account default model
    integration = IntegrationSettings.objects.get(account=account)
    operation_config = CreditCostConfig.objects.get(operation_type=operation_type)

    if operation_config.model_type == 'text':
        if integration.default_text_model:
            return integration.default_text_model
    elif operation_config.model_type == 'image':
        if integration.default_image_model:
            return integration.default_image_model

    # 3. System default (fallback)
    if operation_config.default_model:
        return operation_config.default_model

    # 4. Hard-coded fallback
    return AIModelConfig.objects.get(model_name='gpt-3.5-turbo')
```
---
## Comparison: Old vs New
### Current System (Commit #10)
```
Operation: content_generation
Cost: 1 credit per 100 words
Usage: Generated 1000-word article
Result: 10 credits deducted
Problem:
- Doesn't track actual tokens used
- All models cost the same
- No cost transparency
```
### Previous Attempt (Commits 8-9)
```
Operation: content_generation
Config: 100 tokens per credit
Usage: 2500 input + 1500 output = 4000 tokens
Result: 4000 / 100 = 40 credits deducted
Problem:
- Still no model differentiation
- Over-engineered (too many config options)
- Complex migrations
```
### Proposed System
```
Operation: content_generation
Model: GPT-4 Turbo (50 tokens/credit)
Usage: 2500 input + 1500 output = 4000 tokens
Cost: $0.070 (actual API cost)
Result: 4000 / 50 = 80 credits deducted
Benefits:
✓ Accurate token tracking
✓ Model-aware pricing
✓ Cost transparency
✓ Margin visibility
✓ User can choose cheaper model
Alternative with GPT-3.5:
Model: GPT-3.5 Turbo (200 tokens/credit)
Same 4000 tokens
Cost: $0.0035 (20x cheaper API cost)
Result: 4000 / 200 = 20 credits (4x fewer credits)
```
---
## Conclusion
This refactor transforms IGNY8's billing system from a simple fixed-cost model to a sophisticated token-based system that:
1. **Tracks actual usage** with token-level precision
2. **Differentiates AI models** so users pay appropriately
3. **Provides transparency** showing exact costs and models used
4. **Enables cost control** through model selection
5. **Improves margins** through accurate cost tracking
The phased approach ensures backward compatibility while gradually migrating to the new system. By Week 4, IGNY8 will have complete visibility into AI costs, user consumption patterns, and revenue margins—all while giving users more control and transparency.