AI Model & Cost Configuration System - Refactor Plan
Version: 2.0
Date: December 23, 2025
Current State: Commit #10 (98e68f6) - Credit-based system with operation configs
Target: Token-based system with centralized AI model cost configuration
Executive Summary
Current System (Commit #10)
- ✅ CreditCostConfig: Operation-level credit costs (clustering=1 credit, ideas=15 credits)
- ✅ Units: per_request, per_100_words, per_200_words, per_item, per_image
- ❌ No token tracking: Credits are fixed per operation, not based on actual AI usage
- ❌ No model awareness: All models cost the same regardless of GPT-3.5 vs GPT-4
- ❌ No accurate analytics: Cannot track real costs or token consumption
Previous Attempt (Commits 8-9 - Reverted)
- ✅ Token-based calculation: credits = total_tokens / tokens_per_credit
- ✅ BillingConfiguration: Global default_tokens_per_credit = 100
- ✅ Per-operation token ratios in CreditCostConfig
- ❌ Too complex: Each operation had separate tokens_per_credit, min_credits, price_per_credit_usd
- ❌ Not model-aware: Still didn't account for different AI model costs
Proposed Solution (This Plan)
- Add token-based units: per_100_tokens, per_1000_tokens added to existing unit choices
- Create AIModelConfig: Centralized model pricing (GPT-4: $10/1M input, $30/1M output)
- Link everything: Integration settings → Model → Cost calculation → Credit deduction
- Accurate tracking: Real-time token usage, model costs, and credit analytics
Problem Analysis
What Commits 8-9 Tried to Achieve
Goal: Move from fixed-credit-per-operation to dynamic token-based billing
Implementation:
OLD (Commit #10):
- Clustering = 10 credits (always, regardless of token usage)
- Content Generation = 1 credit per 100 words
NEW (Commits 8-9):
- Clustering = X tokens used / 150 tokens_per_credit = Y credits
- Content Generation = X tokens used / 100 tokens_per_credit = Y credits
Why It Failed:
- Complexity overload: Every operation needed its own token ratio configuration
- Duplicate configs: tokens_per_credit at both global and operation level
- No model differentiation: GPT-3.5 Turbo (cheap) vs GPT-4 (expensive) cost the same
- Migration issues: Database schema changes broke backward compatibility
Root Cause
Missing piece: No centralized AI model cost configuration. Each operation was configured in isolation without understanding which AI model was being used and its actual cost.
Proposed Architecture
1. New Model: AIModelConfig
Purpose: Single source of truth for AI model pricing
Fields:
- model_name: CharField (e.g., "gpt-4-turbo", "gpt-3.5-turbo", "claude-3-sonnet")
- provider: CharField (openai, anthropic, runware)
- model_type: CharField (text, image)
- cost_per_1k_input_tokens: DecimalField (e.g., $0.01)
- cost_per_1k_output_tokens: DecimalField (e.g., $0.03)
- tokens_per_credit: IntegerField (e.g., 100) - How many tokens = 1 credit
- is_active: BooleanField
- display_name: CharField (e.g., "GPT-4 Turbo (Recommended)")
- description: TextField
- created_at, updated_at
Example Data:
| Model | Provider | Input $/1K | Output $/1K | Tokens/Credit | Display Name |
|---|---|---|---|---|---|
| gpt-4-turbo | openai | $0.010 | $0.030 | 50 | GPT-4 Turbo (Premium) |
| gpt-3.5-turbo | openai | $0.0005 | $0.0015 | 200 | GPT-3.5 Turbo (Fast) |
| claude-3-sonnet | anthropic | $0.003 | $0.015 | 100 | Claude 3 Sonnet |
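To show how the pricing fields combine, here is a minimal sketch of the raw API cost calculation, with rates copied from the table above (the RATES dict and helper name are illustrative, not the actual model code):

```python
# Illustrative rate table mirroring the example data above ($ per 1K tokens).
RATES = {
    "gpt-4-turbo":     {"input": 0.010,  "output": 0.030},
    "gpt-3.5-turbo":   {"input": 0.0005, "output": 0.0015},
    "claude-3-sonnet": {"input": 0.003,  "output": 0.015},
}

def api_cost_usd(model_name, tokens_input, tokens_output):
    """Compute raw provider API cost in USD from per-1K-token rates."""
    rate = RATES[model_name]
    return (tokens_input / 1000) * rate["input"] + (tokens_output / 1000) * rate["output"]
```

For the recurring 2500-input / 1500-output example, gpt-4-turbo works out to $0.070 and gpt-3.5-turbo to $0.0035.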
2. Updated Model: CreditCostConfig
Changes:
- Keep existing fields: operation_type, credits_cost, unit, display_name, is_active
- ADD default_model: ForeignKey to AIModelConfig (nullable)
- UPDATE unit choices: Add per_100_tokens, per_1000_tokens
New Unit Choices:
UNIT_CHOICES = [
('per_request', 'Per Request'), # Fixed cost (clustering)
('per_100_words', 'Per 100 Words'), # Word-based (content)
('per_200_words', 'Per 200 Words'), # Word-based (optimization)
('per_item', 'Per Item'), # Item-based (ideas per cluster)
('per_image', 'Per Image'), # Image-based
('per_100_tokens', 'Per 100 Tokens'), # NEW: Token-based
('per_1000_tokens', 'Per 1000 Tokens'), # NEW: Token-based
]
How It Works:
Example 1: Content Generation with GPT-4 Turbo
- Operation: content_generation
- Unit: per_1000_tokens
- Default Model: gpt-4-turbo (50 tokens/credit)
- Actual usage: 2500 input + 1500 output = 4000 total tokens
- Credits = 4000 / 50 = 80 credits
Example 2: Content Generation with GPT-3.5 Turbo (user selected)
- Operation: content_generation
- Unit: per_1000_tokens
- Model used: gpt-3.5-turbo (200 tokens/credit)
- Actual usage: 2500 input + 1500 output = 4000 total tokens
- Credits = 4000 / 200 = 20 credits (4x cheaper!)
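Both examples reduce to one formula. A minimal sketch, assuming ceiling rounding and a minimum of 1 credit (the plan leaves the rounding mode and minimum configurable):

```python
import math

def credits_for(total_tokens, tokens_per_credit, min_credits=1):
    """Token-based credit charge: total tokens over the model's ratio, rounded up."""
    return max(math.ceil(total_tokens / tokens_per_credit), min_credits)
```

With 4000 tokens, GPT-4 Turbo (50 tokens/credit) charges 80 credits and GPT-3.5 Turbo (200 tokens/credit) charges 20.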
3. Updated Model: IntegrationSettings
Changes:
- ADD default_text_model: ForeignKey to AIModelConfig
- ADD default_image_model: ForeignKey to AIModelConfig
- Keep existing: openai_api_key, anthropic_api_key, runware_api_key
Purpose: Account-level model selection
Account "AWS Admin" Settings:
- OpenAI API Key: sk-...
- Default Text Model: GPT-3.5 Turbo (cost-effective)
- Default Image Model: DALL-E 3
Account "Premium Client" Settings:
- OpenAI API Key: sk-...
- Default Text Model: GPT-4 Turbo (best quality)
- Default Image Model: DALL-E 3
4. Updated: CreditUsageLog
Changes:
- Keep existing: operation_type, credits_used, tokens_input, tokens_output
- UPDATE model_used: CharField → ForeignKey to AIModelConfig
- ADD cost_usd_input: DecimalField (actual input cost)
- ADD cost_usd_output: DecimalField (actual output cost)
- ADD cost_usd_total: DecimalField (total API cost)
Purpose: Accurate cost tracking and analytics
Implementation Timeline
Phase 1: Foundation (Week 1)
Step 1.1: Create AIModelConfig Model
- Create model in backend/igny8_core/business/billing/models.py
- Create admin interface in backend/igny8_core/business/billing/admin.py
- Create migration
- Seed initial data (GPT-4, GPT-3.5, Claude, DALL-E models)
Step 1.2: Update CreditCostConfig
- Add default_model ForeignKey field
- Update UNIT_CHOICES to include per_100_tokens, per_1000_tokens
- Create migration
- Update admin interface to show model selector
Step 1.3: Update IntegrationSettings
- Add default_text_model ForeignKey
- Add default_image_model ForeignKey
- Create migration
- Update admin interface with model selectors
Phase 2: Credit Calculation (Week 2)
Step 2.1: Update CreditService
- Add method: calculate_credits_from_tokens(operation_type, tokens_input, tokens_output, model_used)
- Logic:
  1. Get CreditCostConfig for operation
  2. Get model's tokens_per_credit ratio
  3. Calculate: credits = total_tokens / tokens_per_credit
  4. Apply rounding (up/down/nearest)
  5. Apply minimum credits if configured
- Keep legacy methods for backward compatibility
Step 2.2: Update AIEngine
- Extract model_used from AI response
- Pass model to credit calculation
- Handle model selection priority:
  1. Task-level override (if specified)
  2. Account's default model (from IntegrationSettings)
  3. System default model (fallback)
Step 2.3: Update AI Services
- Update clustering_service.py
- Update ideas_service.py
- Update content_service.py
- Update image_service.py
- Update optimizer_service.py
- Update linker_service.py
Phase 3: Logging & Analytics (Week 3)
Step 3.1: Update CreditUsageLog
- Change model_used from CharField to ForeignKey
- Add cost fields: cost_usd_input, cost_usd_output, cost_usd_total
- Create migration with data preservation
- Update logging logic to capture costs
Step 3.2: Create Analytics Views
- Token Usage Report (by model, by operation, by account)
- Cost Analysis Report (actual $ spent vs credits charged)
- Model Performance Report (tokens/sec, success rate by model)
- Account Efficiency Report (credit consumption patterns)
Step 3.3: Update Admin Reports
- Enhance existing reports with model data
- Add model cost comparison charts
- Add token consumption trends
Phase 4: Testing & Migration (Week 4)
Step 4.1: Data Migration
- Backfill existing CreditUsageLog with default models
- Link existing IntegrationSettings to default models
- Update existing CreditCostConfig with default models
Step 4.2: Testing
- Unit tests for credit calculation with different models
- Integration tests for full AI execution flow
- Load tests for analytics queries
- Admin interface testing
Step 4.3: Documentation
- Update API documentation
- Create admin user guide
- Create developer guide
- Update pricing page
Functional Flow
User Perspective
Scenario 1: Content Generation (Default Model)
1. User clicks "Generate Content" for 5 blog posts
2. System checks account's default model: GPT-3.5 Turbo
3. Content generated using GPT-3.5 Turbo
4. Token usage: 12,500 input + 8,500 output = 21,000 tokens
5. Model ratio: 200 tokens/credit
6. Credits deducted: 21,000 / 200 = 105 credits
7. User sees: "✓ Generated 5 posts (105 credits, GPT-3.5)"
Scenario 2: Content Generation (Premium Model)
1. User selects "Use GPT-4 Turbo" from model dropdown
2. System validates: account has GPT-4 enabled
3. Content generated using GPT-4 Turbo
4. Token usage: 12,500 input + 8,500 output = 21,000 tokens
5. Model ratio: 50 tokens/credit
6. Credits deducted: 21,000 / 50 = 420 credits (4x more expensive!)
7. User sees: "✓ Generated 5 posts (420 credits, GPT-4 Turbo)"
8. System shows warning: "GPT-4 used 4x more credits than GPT-3.5"
Scenario 3: Image Generation
1. User generates 10 images
2. System uses account's default image model: DALL-E 3
3. No token tracking for images (fixed cost per image)
4. Credits: 10 images × 5 credits/image = 50 credits
5. User sees: "✓ Generated 10 images (50 credits, DALL-E 3)"
Backend Operational Context
Credit Calculation Flow
User Request
↓
AIEngine.execute()
↓
Determine Model:
- Task.model_override (highest priority)
- Account.default_text_model (from IntegrationSettings)
- CreditCostConfig.default_model (fallback)
↓
Call AI API (OpenAI, Anthropic, etc.)
↓
Response: {
input_tokens: 2500,
output_tokens: 1500,
model: "gpt-4-turbo",
cost_usd: 0.070
}
↓
CreditService.calculate_credits_from_tokens(
operation_type="content_generation",
tokens_input=2500,
tokens_output=1500,
model_used=gpt-4-turbo
)
↓
Logic:
1. Get CreditCostConfig for "content_generation"
2. Check unit: per_1000_tokens
3. Get model's tokens_per_credit: 50
4. Calculate: (2500 + 1500) / 50 = 80 credits
5. Apply rounding: ceil(80) = 80 credits
↓
CreditService.deduct_credits(
account=user.account,
amount=80,
operation_type="content_generation",
description="Generated blog post",
tokens_input=2500,
tokens_output=1500,
model_used=gpt-4-turbo,
cost_usd=0.070
)
↓
CreditUsageLog created:
- operation_type: content_generation
- credits_used: 80
- tokens_input: 2500
- tokens_output: 1500
- model_used: gpt-4-turbo (FK)
- cost_usd_input: 0.025
- cost_usd_output: 0.045
- cost_usd_total: 0.070
↓
Account.credits updated: 1000 → 920
Analytics & Reporting
Token Usage Report:
SELECT
    m.display_name,
    l.operation_type,
    COUNT(*) AS total_calls,
    SUM(l.tokens_input) AS total_input_tokens,
    SUM(l.tokens_output) AS total_output_tokens,
    SUM(l.credits_used) AS total_credits,
    SUM(l.cost_usd_total) AS total_cost_usd
FROM credit_usage_log l
JOIN ai_model_config m ON l.model_used_id = m.id
WHERE l.account_id = ?
  AND l.created_at >= ?
GROUP BY m.id, m.display_name, l.operation_type
ORDER BY total_cost_usd DESC
Output:
| Model | Operation | Calls | Input Tokens | Output Tokens | Credits | Cost USD |
|---|---|---|---|---|---|---|
| GPT-4 Turbo | content_generation | 150 | 375K | 225K | 12,000 | $10.50 |
| GPT-3.5 Turbo | clustering | 50 | 25K | 10K | 175 | $0.03 |
| Claude 3 Sonnet | idea_generation | 80 | 40K | 60K | 1,000 | $1.02 |
Cost Efficiency Analysis:
Account: Premium Client
Period: Last 30 days
Credits Purchased: 50,000 credits × $0.01 = $500.00 (revenue)
Actual AI Costs: $247.83 (OpenAI + Anthropic API costs)
Gross Margin: $252.17 (50.4% margin)
Model Usage:
- GPT-4 Turbo: 65% of costs, 45% of credits
- GPT-3.5 Turbo: 20% of costs, 40% of credits
- Claude 3: 15% of costs, 15% of credits
Recommendation:
- GPT-3.5 most profitable (high credits, low cost)
- GPT-4 acceptable margin (high value, high cost)
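The margin math above can be reproduced with a small helper (the figures are the example's; the function name is illustrative):

```python
def gross_margin(credits_sold, price_per_credit_usd, api_costs_usd):
    """Return (revenue, margin in USD, margin as a percentage) for a billing period."""
    revenue = credits_sold * price_per_credit_usd
    margin = revenue - api_costs_usd
    return revenue, margin, margin / revenue * 100
```

For the Premium Client example: 50,000 credits at $0.01 yields $500.00 revenue, and $247.83 of API costs leaves $252.17 gross margin, about 50.4%.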
Benefits
For Users
- Transparent Pricing: See exact model and token usage per operation
- Cost Control: Choose cheaper models when quality difference is minimal
- Model Selection: Pick GPT-4 for important content, GPT-3.5 for bulk work
- Usage Analytics: Understand token consumption patterns
For Backend Operations
- Accurate Cost Tracking: Know exactly how much each account costs
- Revenue Optimization: Set credit prices based on actual model costs
- Model Performance: Track which models are most efficient
- Billing Transparency: Can show users actual API costs vs credits charged
For Business
- Margin Visibility: Track profitability per account, per model
- Pricing Flexibility: Easily adjust credit costs when AI prices change
- Model Migration: Seamlessly switch between providers (OpenAI → Anthropic)
- Scalability: Support new models without code changes
Migration Strategy
Backward Compatibility
Phase 1: Dual Mode
- Keep old credit calculation as fallback
- New token-based calculation opt-in per operation
- Both systems run in parallel
Phase 2: Gradual Migration
- Week 1: Migrate non-critical operations (clustering, ideas)
- Week 2: Migrate content generation
- Week 3: Migrate optimization and linking
- Week 4: Full cutover
Phase 3: Cleanup
- Remove legacy calculation code
- Archive old credit cost configs
- Update all documentation
Data Preservation
- All existing CreditUsageLog entries preserved
- Backfill model_used with "legacy-unknown" placeholder model
- Historical data remains queryable
- Analytics show "before/after" comparison
Risk Mitigation
Technical Risks
- Migration complexity: Use feature flags, gradual rollout
- Performance impact: Index all FK relationships, cache model configs
- API changes: Handle token extraction failures gracefully
Business Risks
- Cost increase: Monitor margin changes, adjust credit pricing if needed
- User confusion: Clear UI messaging about model selection
- Revenue impact: Set credit prices with 50%+ margin buffer
Success Metrics
Phase 1 (Foundation)
- ✅ AIModelConfig admin accessible
- ✅ 5+ models configured (GPT-4, GPT-3.5, Claude, etc.)
- ✅ All integration settings linked to models
Phase 2 (Calculation)
- ✅ 100% of operations use token-based calculation
- ✅ Credit deductions accurate within 1% margin
- ✅ Model selection working (default, override, fallback)
Phase 3 (Analytics)
- ✅ Token usage report showing accurate data
- ✅ Cost analysis report shows margin per account
- ✅ Model performance metrics visible
Phase 4 (Production)
- ✅ 30+ days production data collected
- ✅ Margin maintained at 45%+ across all accounts
- ✅ Zero billing disputes related to credits
- ✅ User satisfaction: 90%+ understand pricing
Appendix: Code Examples (Conceptual)
Credit Calculation Logic
# Simplified conceptual flow (not actual code)
def calculate_credits_from_tokens(operation_type, tokens_input, tokens_output, model_used):
    """
    Calculate credits based on actual token usage and model cost
    """
    # Get operation config
    config = CreditCostConfig.objects.get(operation_type=operation_type)

    # Determine unit type
    if config.unit == 'per_1000_tokens':
        total_tokens = tokens_input + tokens_output
        tokens_per_credit = model_used.tokens_per_credit

        # Calculate credits
        credits_float = total_tokens / tokens_per_credit

        # Apply rounding (configured globally)
        credits = apply_rounding(credits_float)

        # Apply minimum
        credits = max(credits, config.credits_cost)
        return credits

    elif config.unit == 'per_request':
        # Fixed cost, ignore tokens
        return config.credits_cost

    # ... other unit types
Model Selection Priority
# Simplified conceptual flow (not actual code)
def get_model_for_operation(account, operation_type, task_override=None):
    """
    Determine which AI model to use.
    Priority: Task Override > Account Default > System Default
    """
    # 1. Task-level override (highest priority)
    if task_override and task_override.model_id:
        return task_override.model

    # 2. Account default model
    integration = IntegrationSettings.objects.get(account=account)
    operation_config = CreditCostConfig.objects.get(operation_type=operation_type)

    if operation_config.model_type == 'text':
        if integration.default_text_model:
            return integration.default_text_model
    elif operation_config.model_type == 'image':
        if integration.default_image_model:
            return integration.default_image_model

    # 3. System default (fallback)
    if operation_config.default_model:
        return operation_config.default_model

    # 4. Hard-coded fallback
    return AIModelConfig.objects.get(model_name='gpt-3.5-turbo')
Comparison: Old vs New
Current System (Commit #10)
Operation: content_generation
Cost: 1 credit per 100 words
Usage: Generated 1000-word article
Result: 10 credits deducted
Problem:
- Doesn't track actual tokens used
- All models cost the same
- No cost transparency
Previous Attempt (Commits 8-9)
Operation: content_generation
Config: 100 tokens per credit
Usage: 2500 input + 1500 output = 4000 tokens
Result: 4000 / 100 = 40 credits deducted
Problem:
- Still no model differentiation
- Over-engineered (too many config options)
- Complex migrations
Proposed System
Operation: content_generation
Model: GPT-4 Turbo (50 tokens/credit)
Usage: 2500 input + 1500 output = 4000 tokens
Cost: $0.070 (actual API cost)
Result: 4000 / 50 = 80 credits deducted
Benefits:
✓ Accurate token tracking
✓ Model-aware pricing
✓ Cost transparency
✓ Margin visibility
✓ User can choose cheaper model
Alternative with GPT-3.5:
Model: GPT-3.5 Turbo (200 tokens/credit)
Same 4000 tokens
Cost: $0.0035 (20x cheaper API cost)
Result: 4000 / 200 = 20 credits (4x fewer credits)
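The three systems compared side by side, as a hedged sketch (rates and ratios taken from the examples above; function names are illustrative):

```python
import math

def credits_old(words):
    """Commit #10: fixed 1 credit per 100 words, no token tracking."""
    return math.ceil(words / 100)

def credits_commits_8_9(total_tokens):
    """Commits 8-9: flat 100 tokens per credit, no model awareness."""
    return math.ceil(total_tokens / 100)

def credits_proposed(total_tokens, tokens_per_credit):
    """This plan: the model's own tokens_per_credit ratio drives the charge."""
    return math.ceil(total_tokens / tokens_per_credit)
```

For a 1000-word article consuming 4000 tokens: the old system charges 10 credits regardless of model, commits 8-9 charge 40, and the proposed system charges 80 on GPT-4 Turbo or 20 on GPT-3.5 Turbo.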
Conclusion
This refactor transforms IGNY8's billing system from a simple fixed-cost model to a sophisticated token-based system that:
- Tracks actual usage with token-level precision
- Differentiates AI models so users pay appropriately
- Provides transparency showing exact costs and models used
- Enables cost control through model selection
- Improves margins through accurate cost tracking
The phased approach ensures backward compatibility while gradually migrating to the new system. By Week 4, IGNY8 will have complete visibility into AI costs, user consumption patterns, and revenue margins—all while giving users more control and transparency.