igny8/tmp-md-files/AI-MODEL-COST-REFACTOR-PLAN.md
2025-12-24 02:03:10 +05:00

AI Model & Cost Configuration System - Refactor Plan

Version: 2.0
Date: December 23, 2025
Current State: Commit #10 (98e68f6) - Credit-based system with operation configs
Target: Token-based system with centralized AI model cost configuration


Executive Summary

Current System (Commit #10)

  • CreditCostConfig: Operation-level credit costs (clustering=1 credit, ideas=15 credits)
  • Units: per_request, per_100_words, per_200_words, per_item, per_image
  • No token tracking: Credits are fixed per operation, not based on actual AI usage
  • No model awareness: All models cost the same regardless of GPT-3.5 vs GPT-4
  • No accurate analytics: Cannot track real costs or token consumption

Previous Attempt (Commits 8-9 - Reverted)

  • Token-based calculation: credits = total_tokens / tokens_per_credit
  • BillingConfiguration: Global default_tokens_per_credit = 100
  • Per-operation token ratios in CreditCostConfig
  • Too complex: Each operation had separate tokens_per_credit, min_credits, price_per_credit_usd
  • Not model-aware: Still didn't account for different AI model costs

Proposed Solution (This Plan)

  1. Add token-based units: per_100_tokens, per_1000_tokens to existing unit choices
  2. Create AIModelConfig: Centralized model pricing (GPT-4: $10/1M input, $30/1M output)
  3. Link everything: Integration settings → Model → Cost calculation → Credit deduction
  4. Accurate tracking: Real-time token usage, model costs, and credit analytics

Problem Analysis

What Commits 8-9 Tried to Achieve

Goal: Move from fixed-credit-per-operation to dynamic token-based billing

Implementation:

OLD (Commit #10):
- Clustering = 10 credits (always, regardless of token usage)
- Content Generation = 1 credit per 100 words

NEW (Commits 8-9):
- Clustering = X tokens used / 150 tokens_per_credit = Y credits
- Content Generation = X tokens used / 100 tokens_per_credit = Y credits

Why It Failed:

  1. Complexity overload: Every operation needed its own token ratio configuration
  2. Duplicate configs: tokens_per_credit at both global and operation level
  3. No model differentiation: GPT-3.5 Turbo (cheap) vs GPT-4 (expensive) cost the same
  4. Migration issues: Database schema changes broke backward compatibility

Root Cause

Missing piece: No centralized AI model cost configuration. Each operation was configured in isolation without understanding which AI model was being used and its actual cost.


Proposed Architecture

1. New Model: AIModelConfig

Purpose: Single source of truth for AI model pricing

Fields:

- model_name: CharField (e.g., "gpt-4-turbo", "gpt-3.5-turbo", "claude-3-sonnet")
- provider: CharField (openai, anthropic, runware)
- model_type: CharField (text, image)
- cost_per_1k_input_tokens: DecimalField (e.g., $0.01)
- cost_per_1k_output_tokens: DecimalField (e.g., $0.03)
- tokens_per_credit: IntegerField (e.g., 100) - How many tokens = 1 credit
- is_active: BooleanField
- display_name: CharField (e.g., "GPT-4 Turbo (Recommended)")
- description: TextField
- created_at, updated_at

Example Data:

| Model | Provider | Input $/1K | Output $/1K | Tokens/Credit | Display Name |
|---|---|---|---|---|---|
| gpt-4-turbo | openai | $0.010 | $0.030 | 50 | GPT-4 Turbo (Premium) |
| gpt-3.5-turbo | openai | $0.0005 | $0.0015 | 200 | GPT-3.5 Turbo (Fast) |
| claude-3-sonnet | anthropic | $0.003 | $0.015 | 100 | Claude 3 Sonnet |
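To make the pricing fields concrete, here is a minimal, ORM-free sketch of the model as a Python dataclass. Field names follow the plan; the `api_cost_usd` helper is an illustrative addition, not part of the proposed schema:

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class AIModelConfig:
    """Plain-Python stand-in for the proposed Django model."""
    model_name: str
    provider: str                       # openai, anthropic, runware
    model_type: str                     # text, image
    cost_per_1k_input_tokens: Decimal
    cost_per_1k_output_tokens: Decimal
    tokens_per_credit: int
    display_name: str = ""
    is_active: bool = True

    def api_cost_usd(self, tokens_in: int, tokens_out: int) -> Decimal:
        """Actual provider cost for one call, derived from the per-1K rates."""
        return (Decimal(tokens_in) / 1000 * self.cost_per_1k_input_tokens
                + Decimal(tokens_out) / 1000 * self.cost_per_1k_output_tokens)

gpt4 = AIModelConfig("gpt-4-turbo", "openai", "text",
                     Decimal("0.010"), Decimal("0.030"), 50,
                     display_name="GPT-4 Turbo (Premium)")
print(gpt4.api_cost_usd(2500, 1500))  # → 0.0700
```

Using `Decimal` (rather than floats) for money fields mirrors Django's `DecimalField` and avoids rounding drift in cost reports.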

2. Updated Model: CreditCostConfig

Changes:

  • Keep existing fields: operation_type, credits_cost, unit, display_name, is_active
  • ADD default_model: ForeignKey to AIModelConfig (nullable)
  • UPDATE unit choices: Add per_100_tokens, per_1000_tokens

New Unit Choices:

UNIT_CHOICES = [
    ('per_request', 'Per Request'),           # Fixed cost (clustering)
    ('per_100_words', 'Per 100 Words'),       # Word-based (content)
    ('per_200_words', 'Per 200 Words'),       # Word-based (optimization)
    ('per_item', 'Per Item'),                 # Item-based (ideas per cluster)
    ('per_image', 'Per Image'),               # Image-based
    ('per_100_tokens', 'Per 100 Tokens'),     # NEW: Token-based
    ('per_1000_tokens', 'Per 1000 Tokens'),   # NEW: Token-based
]

How It Works:

Example 1: Content Generation with GPT-4 Turbo
- Operation: content_generation
- Unit: per_1000_tokens
- Default Model: gpt-4-turbo (50 tokens/credit)
- Actual usage: 2500 input + 1500 output = 4000 total tokens
- Credits = 4000 / 50 = 80 credits

Example 2: Content Generation with GPT-3.5 Turbo (user selected)
- Operation: content_generation
- Unit: per_1000_tokens
- Model used: gpt-3.5-turbo (200 tokens/credit)
- Actual usage: 2500 input + 1500 output = 4000 total tokens
- Credits = 4000 / 200 = 20 credits (4x cheaper!)
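Both examples above reduce to a single division; a tiny helper (hypothetical name) demonstrates the model-aware credit math:

```python
def credits_used(tokens_input: int, tokens_output: int, tokens_per_credit: int) -> int:
    """Model-aware charge: total tokens divided by the model's tokens/credit ratio.
    Division is exact in these examples; real code would apply rounding rules."""
    return (tokens_input + tokens_output) // tokens_per_credit

print(credits_used(2500, 1500, 50))   # GPT-4 Turbo → 80
print(credits_used(2500, 1500, 200))  # GPT-3.5 Turbo → 20
```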

3. Updated Model: IntegrationSettings

Changes:

  • ADD default_text_model: ForeignKey to AIModelConfig
  • ADD default_image_model: ForeignKey to AIModelConfig
  • Keep existing: openai_api_key, anthropic_api_key, runware_api_key

Purpose: Account-level model selection

Account "AWS Admin" Settings:
- OpenAI API Key: sk-...
- Default Text Model: GPT-3.5 Turbo (cost-effective)
- Default Image Model: DALL-E 3

Account "Premium Client" Settings:
- OpenAI API Key: sk-...
- Default Text Model: GPT-4 Turbo (best quality)
- Default Image Model: DALL-E 3

4. Updated: CreditUsageLog

Changes:

  • Keep existing: operation_type, credits_used, tokens_input, tokens_output
  • UPDATE model_used: CharField → ForeignKey to AIModelConfig
  • ADD cost_usd_input: DecimalField (actual input cost)
  • ADD cost_usd_output: DecimalField (actual output cost)
  • ADD cost_usd_total: DecimalField (total API cost)

Purpose: Accurate cost tracking and analytics


Implementation Timeline

Phase 1: Foundation (Week 1)

Step 1.1: Create AIModelConfig Model

  • Create model in backend/igny8_core/business/billing/models.py
  • Create admin interface in backend/igny8_core/business/billing/admin.py
  • Create migration
  • Seed initial data (GPT-4, GPT-3.5, Claude, DALL-E models)

Step 1.2: Update CreditCostConfig

  • Add default_model ForeignKey field
  • Update UNIT_CHOICES to include per_100_tokens, per_1000_tokens
  • Create migration
  • Update admin interface to show model selector

Step 1.3: Update IntegrationSettings

  • Add default_text_model ForeignKey
  • Add default_image_model ForeignKey
  • Create migration
  • Update admin interface with model selectors

Phase 2: Credit Calculation (Week 2)

Step 2.1: Update CreditService

  • Add method: calculate_credits_from_tokens(operation_type, tokens_input, tokens_output, model_used)
  • Logic:
    1. Get CreditCostConfig for operation
    2. Get model's tokens_per_credit ratio
    3. Calculate: credits = total_tokens / tokens_per_credit
    4. Apply rounding (up/down/nearest)
    5. Apply minimum credits if configured
    
  • Keep legacy methods for backward compatibility
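Steps 3–5 of the logic above can be sketched as follows. `apply_rounding` and the `min_credits` floor are assumptions about how the global rounding configuration might look, not confirmed API:

```python
import math

def apply_rounding(credits_float: float, mode: str = "up") -> int:
    """Hypothetical global rounding rule: up / down / nearest."""
    if mode == "up":
        return math.ceil(credits_float)
    if mode == "down":
        return math.floor(credits_float)
    return round(credits_float)

def charge(total_tokens: int, tokens_per_credit: int,
           min_credits: int = 1, mode: str = "up") -> int:
    """Divide by the model ratio, round, then enforce the configured minimum."""
    return max(apply_rounding(total_tokens / tokens_per_credit, mode), min_credits)

print(charge(4000, 50))  # 80
print(charge(30, 200))   # a tiny call still costs the 1-credit minimum
```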

Step 2.2: Update AIEngine

  • Extract model_used from AI response
  • Pass model to credit calculation
  • Handle model selection priority:
    1. Task-level override (if specified)
    2. Account's default model (from IntegrationSettings)
    3. System default model (fallback)
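The priority chain above is a first-non-null lookup; a minimal sketch with illustrative names:

```python
def select_model(task_override=None, account_default=None, system_default=None):
    """Return the first configured model in priority order."""
    for candidate in (task_override, account_default, system_default):
        if candidate is not None:
            return candidate
    raise LookupError("no AI model configured for this operation")

print(select_model(None, "gpt-3.5-turbo", "gpt-4-turbo"))  # gpt-3.5-turbo
print(select_model("claude-3-sonnet", "gpt-3.5-turbo"))    # claude-3-sonnet
```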
    

Step 2.3: Update AI Services

  • Update clustering_service.py
  • Update ideas_service.py
  • Update content_service.py
  • Update image_service.py
  • Update optimizer_service.py
  • Update linker_service.py

Phase 3: Logging & Analytics (Week 3)

Step 3.1: Update CreditUsageLog

  • Change model_used from CharField to ForeignKey
  • Add cost fields: cost_usd_input, cost_usd_output, cost_usd_total
  • Create migration with data preservation
  • Update logging logic to capture costs

Step 3.2: Create Analytics Views

  • Token Usage Report (by model, by operation, by account)
  • Cost Analysis Report (actual $ spent vs credits charged)
  • Model Performance Report (tokens/sec, success rate by model)
  • Account Efficiency Report (credit consumption patterns)

Step 3.3: Update Admin Reports

  • Enhance existing reports with model data
  • Add model cost comparison charts
  • Add token consumption trends

Phase 4: Testing & Migration (Week 4)

Step 4.1: Data Migration

  • Backfill existing CreditUsageLog with default models
  • Link existing IntegrationSettings to default models
  • Update existing CreditCostConfig with default models

Step 4.2: Testing

  • Unit tests for credit calculation with different models
  • Integration tests for full AI execution flow
  • Load tests for analytics queries
  • Admin interface testing

Step 4.3: Documentation

  • Update API documentation
  • Create admin user guide
  • Create developer guide
  • Update pricing page

Functional Flow

User Perspective

Scenario 1: Content Generation (Default Model)

1. User clicks "Generate Content" for 5 blog posts
2. System checks account's default model: GPT-3.5 Turbo
3. Content generated using GPT-3.5 Turbo
4. Token usage: 12,500 input + 8,500 output = 21,000 tokens
5. Model ratio: 200 tokens/credit
6. Credits deducted: 21,000 / 200 = 105 credits
7. User sees: "✓ Generated 5 posts (105 credits, GPT-3.5)"

Scenario 2: Content Generation (Premium Model)

1. User selects "Use GPT-4 Turbo" from model dropdown
2. System validates: account has GPT-4 enabled
3. Content generated using GPT-4 Turbo
4. Token usage: 12,500 input + 8,500 output = 21,000 tokens
5. Model ratio: 50 tokens/credit
6. Credits deducted: 21,000 / 50 = 420 credits (4x more expensive!)
7. User sees: "✓ Generated 5 posts (420 credits, GPT-4 Turbo)"
8. System shows warning: "GPT-4 used 4x more credits than GPT-3.5"

Scenario 3: Image Generation

1. User generates 10 images
2. System uses account's default image model: DALL-E 3
3. No token tracking for images (fixed cost per image)
4. Credits: 10 images × 5 credits/image = 50 credits
5. User sees: "✓ Generated 10 images (50 credits, DALL-E 3)"

Backend Operational Context

Credit Calculation Flow

User Request
    ↓
AIEngine.execute()
    ↓
Determine Model:
  - Task.model_override (highest priority)
  - Account.default_text_model (from IntegrationSettings)
  - CreditCostConfig.default_model (fallback)
    ↓
Call AI API (OpenAI, Anthropic, etc.)
    ↓
Response: {
  input_tokens: 2500,
  output_tokens: 1500,
  model: "gpt-4-turbo",
  cost_usd: 0.070
}
    ↓
CreditService.calculate_credits_from_tokens(
  operation_type="content_generation",
  tokens_input=2500,
  tokens_output=1500,
  model_used=gpt-4-turbo
)
    ↓
Logic:
  1. Get CreditCostConfig for "content_generation"
  2. Check unit: per_1000_tokens
  3. Get model's tokens_per_credit: 50
  4. Calculate: (2500 + 1500) / 50 = 80 credits
  5. Apply rounding: ceil(80) = 80 credits
    ↓
CreditService.deduct_credits(
  account=user.account,
  amount=80,
  operation_type="content_generation",
  description="Generated blog post",
  tokens_input=2500,
  tokens_output=1500,
  model_used=gpt-4-turbo,
  cost_usd=0.070
)
    ↓
CreditUsageLog created:
  - operation_type: content_generation
  - credits_used: 80
  - tokens_input: 2500
  - tokens_output: 1500
  - model_used: gpt-4-turbo (FK)
  - cost_usd_input: 0.025   (2,500 / 1,000 × $0.010)
  - cost_usd_output: 0.045  (1,500 / 1,000 × $0.030)
  - cost_usd_total: 0.070
    ↓
Account.credits updated: 1000 → 920

Analytics & Reporting

Token Usage Report:

SELECT 
    model.display_name,
    log.operation_type,
    COUNT(*) AS total_calls,
    SUM(log.tokens_input) AS total_input_tokens,
    SUM(log.tokens_output) AS total_output_tokens,
    SUM(log.credits_used) AS total_credits,
    SUM(log.cost_usd_total) AS total_cost_usd
FROM credit_usage_log AS log
JOIN ai_model_config AS model ON log.model_used_id = model.id
WHERE log.account_id = ?
  AND log.created_at >= ?
GROUP BY model.id, log.operation_type
ORDER BY total_cost_usd DESC

Output:

| Model | Operation | Calls | Input Tokens | Output Tokens | Credits | Cost USD |
|---|---|---|---|---|---|---|
| GPT-4 Turbo | content_generation | 150 | 375K | 225K | 12,000 | $10.50 |
| GPT-3.5 Turbo | clustering | 50 | 25K | 10K | 175 | $0.03 |
| Claude 3 Sonnet | idea_generation | 80 | 40K | 60K | 1,000 | $1.02 |
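The report query can be exercised end to end against an in-memory SQLite database; the schema and rows below are illustrative stand-ins for the Django tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ai_model_config (id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE credit_usage_log (
    id INTEGER PRIMARY KEY, account_id INTEGER, operation_type TEXT,
    tokens_input INTEGER, tokens_output INTEGER, credits_used INTEGER,
    cost_usd_total REAL, model_used_id INTEGER REFERENCES ai_model_config(id),
    created_at TEXT
);
INSERT INTO ai_model_config VALUES (1, 'GPT-4 Turbo'), (2, 'GPT-3.5 Turbo');
INSERT INTO credit_usage_log VALUES
  (1, 7, 'content_generation', 2500, 1500, 80, 0.070, 1, '2025-12-01'),
  (2, 7, 'content_generation', 3000, 2000, 100, 0.090, 1, '2025-12-02'),
  (3, 7, 'clustering', 500, 200, 4, 0.001, 2, '2025-12-02');
""")
rows = conn.execute("""
    SELECT model.display_name, log.operation_type, COUNT(*),
           SUM(log.tokens_input), SUM(log.tokens_output),
           SUM(log.credits_used), SUM(log.cost_usd_total)
    FROM credit_usage_log AS log
    JOIN ai_model_config AS model ON log.model_used_id = model.id
    WHERE log.account_id = 7 AND log.created_at >= '2025-12-01'
    GROUP BY model.id, log.operation_type
    ORDER BY SUM(log.cost_usd_total) DESC
""").fetchall()
print(rows)  # one aggregated row per (model, operation) pair
```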

Cost Efficiency Analysis:

Account: Premium Client
Period: Last 30 days

Credits Purchased: 50,000 credits × $0.01 = $500.00 (revenue)
Actual AI Costs: $247.83 (OpenAI + Anthropic API costs)
Gross Margin: $252.17 (50.4% margin)

Model Usage:
- GPT-4 Turbo: 65% of costs, 45% of credits
- GPT-3.5 Turbo: 20% of costs, 40% of credits
- Claude 3: 15% of costs, 15% of credits

Recommendation: 
- GPT-3.5 most profitable (high credits, low cost)
- GPT-4 acceptable margin (high value, high cost)
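The margin figures above can be reproduced directly:

```python
from decimal import Decimal

credits_purchased = 50_000
price_per_credit = Decimal("0.01")
revenue = credits_purchased * price_per_credit   # $500.00
api_costs = Decimal("247.83")                    # OpenAI + Anthropic spend
gross_margin = revenue - api_costs               # $252.17
margin_pct = gross_margin / revenue * 100        # ~50.4%
print(revenue, gross_margin, round(margin_pct, 1))
```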

Benefits

For Users

  1. Transparent Pricing: See exact model and token usage per operation
  2. Cost Control: Choose cheaper models when quality difference is minimal
  3. Model Selection: Pick GPT-4 for important content, GPT-3.5 for bulk work
  4. Usage Analytics: Understand token consumption patterns

For Backend Operations

  1. Accurate Cost Tracking: Know exactly how much each account costs
  2. Revenue Optimization: Set credit prices based on actual model costs
  3. Model Performance: Track which models are most efficient
  4. Billing Transparency: Can show users actual API costs vs credits charged

For Business

  1. Margin Visibility: Track profitability per account, per model
  2. Pricing Flexibility: Easily adjust credit costs when AI prices change
  3. Model Migration: Seamlessly switch between providers (OpenAI → Anthropic)
  4. Scalability: Support new models without code changes

Migration Strategy

Backward Compatibility

Phase 1: Dual Mode

  • Keep old credit calculation as fallback
  • New token-based calculation opt-in per operation
  • Both systems run in parallel

Phase 2: Gradual Migration

  • Week 1: Migrate non-critical operations (clustering, ideas)
  • Week 2: Migrate content generation
  • Week 3: Migrate optimization and linking
  • Week 4: Full cutover

Phase 3: Cleanup

  • Remove legacy calculation code
  • Archive old credit cost configs
  • Update all documentation

Data Preservation

  • All existing CreditUsageLog entries preserved
  • Backfill model_used with "legacy-unknown" placeholder model
  • Historical data remains queryable
  • Analytics show "before/after" comparison
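The backfill step can be sketched as a lookup with a placeholder fallback. The `legacy-unknown` key mirrors the plan; the function name and data shapes are illustrative:

```python
def backfill_model_fk(logged_name, models_by_name, placeholder="legacy-unknown"):
    """Map a historical CharField model name to an AIModelConfig key,
    falling back to the placeholder model when the name is unrecognized."""
    return models_by_name.get(logged_name, models_by_name[placeholder])

models = {"gpt-4-turbo": 1, "gpt-3.5-turbo": 2, "legacy-unknown": 99}
print(backfill_model_fk("gpt-4-turbo", models))  # 1
print(backfill_model_fk("gpt-4-0314", models))   # 99 (unknown historical name)
```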

Risk Mitigation

Technical Risks

  1. Migration complexity: Use feature flags, gradual rollout
  2. Performance impact: Index all FK relationships, cache model configs
  3. API changes: Handle token extraction failures gracefully

Business Risks

  1. Cost increase: Monitor margin changes, adjust credit pricing if needed
  2. User confusion: Clear UI messaging about model selection
  3. Revenue impact: Set credit prices with 50%+ margin buffer

Success Metrics

Phase 1 (Foundation)

  • AIModelConfig admin accessible
  • 5+ models configured (GPT-4, GPT-3.5, Claude, etc.)
  • All integration settings linked to models

Phase 2 (Calculation)

  • 100% of operations use token-based calculation
  • Credit deductions accurate within 1% margin
  • Model selection working (default, override, fallback)

Phase 3 (Analytics)

  • Token usage report showing accurate data
  • Cost analysis report shows margin per account
  • Model performance metrics visible

Phase 4 (Production)

  • 30+ days production data collected
  • Margin maintained at 45%+ across all accounts
  • Zero billing disputes related to credits
  • User satisfaction: 90%+ understand pricing

Appendix: Code Examples (Conceptual)

Credit Calculation Logic

# Simplified conceptual flow (not actual code)

def calculate_credits_from_tokens(operation_type, tokens_input, tokens_output, model_used):
    """
    Calculate credits based on actual token usage and model cost
    """
    # Get operation config
    config = CreditCostConfig.objects.get(operation_type=operation_type)
    
    # Determine unit type
    if config.unit == 'per_1000_tokens':
        total_tokens = tokens_input + tokens_output
        tokens_per_credit = model_used.tokens_per_credit
        
        # Calculate credits
        credits_float = total_tokens / tokens_per_credit
        
        # Apply rounding (configured globally)
        credits = apply_rounding(credits_float)
        
        # Apply minimum
        credits = max(credits, config.credits_cost)
        
        return credits
    
    elif config.unit == 'per_request':
        # Fixed cost, ignore tokens
        return config.credits_cost
    
    # ... other unit types
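For completeness, here is an ORM-free, runnable distillation of the same flow, with `SimpleNamespace` objects standing in for the Django rows (rounding up and treating `credits_cost` as the floor are assumptions carried over from the sketch above):

```python
import math
from types import SimpleNamespace

def calculate_credits_from_tokens(config, tokens_input, tokens_output, model_used):
    """Distilled version of the conceptual flow above; stand-in objects, not ORM."""
    if config.unit in ("per_100_tokens", "per_1000_tokens"):
        total_tokens = tokens_input + tokens_output
        credits = math.ceil(total_tokens / model_used.tokens_per_credit)
        return max(credits, config.credits_cost)  # credits_cost acts as the floor
    if config.unit == "per_request":
        return config.credits_cost                # fixed cost, tokens ignored
    raise NotImplementedError(config.unit)

cfg = SimpleNamespace(unit="per_1000_tokens", credits_cost=1)
gpt4 = SimpleNamespace(tokens_per_credit=50)
print(calculate_credits_from_tokens(cfg, 2500, 1500, gpt4))  # 80
```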

Model Selection Priority

# Simplified conceptual flow (not actual code)

def get_model_for_operation(account, operation_type, task_override=None):
    """
    Determine which AI model to use
    Priority: Task Override > Account Default > System Default
    """
    # 1. Task-level override (highest priority)
    if task_override and task_override.model_id:
        return task_override.model
    
    # 2. Account default model
    integration = IntegrationSettings.objects.get(account=account)
    operation_config = CreditCostConfig.objects.get(operation_type=operation_type)
    
    if operation_config.model_type == 'text':
        if integration.default_text_model:
            return integration.default_text_model
    elif operation_config.model_type == 'image':
        if integration.default_image_model:
            return integration.default_image_model
    
    # 3. System default (fallback)
    if operation_config.default_model:
        return operation_config.default_model
    
    # 4. Hard-coded fallback
    return AIModelConfig.objects.get(model_name='gpt-3.5-turbo')

Comparison: Old vs New

Current System (Commit #10)

Operation: content_generation
Cost: 1 credit per 100 words
Usage: Generated 1000-word article
Result: 10 credits deducted

Problem: 
- Doesn't track actual tokens used
- All models cost the same
- No cost transparency

Previous Attempt (Commits 8-9)

Operation: content_generation
Config: 100 tokens per credit
Usage: 2500 input + 1500 output = 4000 tokens
Result: 4000 / 100 = 40 credits deducted

Problem:
- Still no model differentiation
- Over-engineered (too many config options)
- Complex migrations

Proposed System

Operation: content_generation
Model: GPT-4 Turbo (50 tokens/credit)
Usage: 2500 input + 1500 output = 4000 tokens
Cost: $0.070 (actual API cost)
Result: 4000 / 50 = 80 credits deducted

Benefits:
✓ Accurate token tracking
✓ Model-aware pricing
✓ Cost transparency
✓ Margin visibility
✓ User can choose cheaper model

Alternative with GPT-3.5:
Model: GPT-3.5 Turbo (200 tokens/credit)
Same 4000 tokens
Cost: $0.0035 (20x cheaper API cost)
Result: 4000 / 200 = 20 credits (4x fewer credits)

Conclusion

This refactor transforms IGNY8's billing system from a simple fixed-cost model to a sophisticated token-based system that:

  1. Tracks actual usage with token-level precision
  2. Differentiates AI models so users pay appropriately
  3. Provides transparency showing exact costs and models used
  4. Enables cost control through model selection
  5. Improves margins through accurate cost tracking

The phased approach ensures backward compatibility while gradually migrating to the new system. By Week 4, IGNY8 will have complete visibility into AI costs, user consumption patterns, and revenue margins—all while giving users more control and transparency.