# IGNY8 Phase 0: Self-Hosted AI Infrastructure (00F)

**Status:** Ready for Implementation  
**Version:** 1.1  
**Priority:** High (cost savings critical for unit economics)  
**Duration:** 5-7 days  
**Dependencies:** 00B (VPS provisioning) must be complete first  
**Source of Truth:** Codebase at `/data/app/igny8/`  
**Cost:** ~$200/month GPU rental + $0 software (open source)

---

## 1. Current State

### Existing AI Integration

- **External providers (verified from `IntegrationProvider` model):** OpenAI (GPT-4, GPT-3.5), Anthropic (Claude), Runware (image generation)
- **Storage:** API keys live in the `IntegrationProvider` model (table: `igny8_integration_providers`), with per-account overrides in `IntegrationSettings` (table: `igny8_integration_settings`) and global defaults in `GlobalIntegrationSettings`
- **Provider types in codebase:** `ai`, `payment`, `email`, `storage` (from `PROVIDER_TYPE_CHOICES`)
- **Existing provider_ids:** `openai`, `runware`, `stripe`, `paypal`, `resend`
- **Architecture:** multi-provider AI engine with model-selection capability
- **Current AI functions:** `auto_cluster`, `generate_ideas`, `generate_content`, `generate_images`, `generate_image_prompts`, `optimize_content`, `generate_site_structure`
- **Async handling:** Celery workers process long-running AI tasks
- **Cost impact:** external APIs account for 15-30% of monthly operational costs

### Problem

- External API costs scale linearly with subscriber growth
- No cost leverage at scale (pay-as-you-go pricing)
- API rate limits require careful orchestration
- Privacy concerns from offloading content generation to third parties

---

## 2. What to Build

### Infrastructure Stack

```
┌─────────────────────────────────────────────────────────────┐
│ IGNY8 Backend (on VPS)                                      │
│ - Sends requests to LiteLLM proxy (localhost:8000)          │
│ - Falls back to OpenAI/Anthropic if self-hosted unavailable │
└──────────────┬──────────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────────┐
│ LiteLLM Proxy (on VPS, port 8000)                            │
│ - OpenAI-compatible API gateway                              │
│ - Routes requests to remote Ollama and ComfyUI (via tunnel)  │
│ - Load balancing & model selection                           │
│ - Fallback configuration for external APIs                   │
└──────────────┬───────────────────────────────────────────────┘
               │
      ┌────────┴──────────────────┐
      │                           │
      ▼                           ▼
┌──────────────────┐    ┌──────────────────────┐
│ SSH Tunnel       │    │ ComfyUI Tunnel       │
│ (autossh)        │    │ (autossh)            │
│ Ports 11434-11435│    │ Port 8188            │
│                  │    │ (image generation)   │
└────────┬─────────┘    └──────────┬───────────┘
         │                         │
         ▼                         ▼
┌────────────────────────────────────────────────────────┐
│ Vast.ai GPU Server (2x RTX 3090, 48GB VRAM)            │
│ ┌──────────────────────────────────────────────────┐   │
│ │ Ollama Container                                 │   │
│ │ - Qwen3-32B (reasoning)                          │   │
│ │ - Qwen3-30B-A3B (MoE, efficient)                 │   │
│ │ - Qwen3-14B (general purpose)                    │   │
│ │ - Qwen3-8B (fast inference)                      │   │
│ │ Listening on 0.0.0.0:11434                       │   │
│ └──────────────────────────────────────────────────┘   │
│ ┌──────────────────────────────────────────────────┐   │
│ │ ComfyUI Container                                │   │
│ │ - FLUX.1 (image gen)                             │   │
│ │ - Stable Diffusion 3.5 (image gen)               │   │
│ │ - SDXL-Lightning (fast generation)               │   │
│ │ Listening on 0.0.0.0:8188                        │   │
│ └──────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────┘
```

### Components to Deploy

1. **Vast.ai GPU Rental**
   - Machine: 2x NVIDIA RTX 3090 (48GB total VRAM)
   - Estimated cost: $180-220/month
   - Auto-bid setup for cost optimization
   - Persistence: restore from snapshot between rentals

2. **Ollama (Text LLM Server)**
   - Container-based deployment on the GPU server
   - Models: Qwen3 series (32B, 30B-A3B, 14B, 8B)
   - API: OpenAI-compatible `/v1/chat/completions`
   - Port: 11434 (tunneled via SSH)

3. **ComfyUI (Image Generation)**
   - Container-based deployment on the GPU server
   - Models: FLUX.1, Stable Diffusion 3.5, SDXL-Lightning
   - API: REST endpoints for image generation
   - Port: 8188 (tunneled via SSH)

4. **SSH Tunnel (autossh)**
   - Persistent connection from VPS to GPU server
   - Systemd service with auto-restart
   - Ports: 11434/11435 (Ollama), 8188 (ComfyUI)
   - Handles network interruptions automatically

5. **LiteLLM Proxy**
   - Runs on the IGNY8 VPS
   - Acts as an OpenAI-compatible API gateway
   - Configurable routing based on model/task type
   - Fallback to OpenAI/Anthropic if self-hosted is unavailable
   - Port: 8000 (local access only)

6. **IGNY8 Backend Integration**
   - Add self-hosted LiteLLM as a new `IntegrationProvider`
   - Update AI request logic to check availability
   - Implement fallback chain: self-hosted → OpenAI → Anthropic
   - Cost tracking per provider

---

## 3. Data Models / APIs

### Database Models (Minimal Schema Changes)

Use the existing `IntegrationProvider` model: add a new row with `provider_type='ai'`:

```python
# New IntegrationProvider row (NO new provider_type needed)
# provider_type='ai' already exists in PROVIDER_TYPE_CHOICES

# Create via admin or migration:
IntegrationProvider.objects.create(
    provider_id='self_hosted_ai',
    display_name='Self-Hosted AI (LiteLLM)',
    provider_type='ai',
    api_key='',  # LiteLLM doesn't require auth (internal only)
    api_endpoint='http://localhost:8000',
    is_active=True,
    is_sandbox=False,
    config={
        "priority": 10,  # Try self-hosted first
        "models": {
            "text_generation": "qwen3:32b",
            "text_generation_fast": "qwen3:8b",
            "image_generation": "flux.1-dev",
            "image_generation_fast": "sdxl-lightning"
        },
        "timeout": 300,  # 5-minute timeout for slow models
        "fallback_to": "openai"  # Fallback provider if self-hosted fails
    }
)
```

### LiteLLM API Endpoints

**Text Generation (compatible with the OpenAI API)**

```bash
# Request
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/qwen3:32b",
    "messages": [{"role": "user", "content": "Write an article about..."}],
    "temperature": 0.7,
    "max_tokens": 2000
  }'

# Response (same shape as OpenAI)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "ollama/qwen3:32b",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Article text..."},
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 500,
    "total_tokens": 550
  }
}
```

**Image Generation (ComfyUI via LiteLLM)**

```bash
# Request
curl -X POST http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "comfyui/flux.1-dev",
    "prompt": "A professional product photo of...",
    "size": "1024x1024",
    "n": 1,
    "quality": "hd"
  }'

# Response
{
  "created": 1234567890,
  "data": [{
    "url": "data:image/png;base64,...",
    "revised_prompt": "A professional product photo of..."
  }]
}
```

### Model Routing Configuration

**LiteLLM config (see section 4.4)**

- Routes `gpt-4` requests → `ollama/qwen3:32b`
- Routes `gpt-3.5-turbo` requests → `ollama/qwen3:14b`
- Routes DALL-E requests → `comfyui/flux.1-dev`
- Includes fallback to OpenAI for unavailable models
- Respects timeout and retry limits

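Because the proxy speaks the OpenAI wire format, the backend only needs a plain HTTP POST against it. A minimal stdlib sketch of building that request (the function name and defaults are illustrative, not part of the codebase):

```python
import json
import urllib.request

def chat_request(model: str, prompt: str,
                 base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the LiteLLM proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("gpt-4", "Say hello")
# urllib.request.urlopen(req) would send it once the proxy is running
```

Sending the same request with `model="gpt-3.5-turbo"` exercises the second route without any client-side change; that is the point of keeping the gateway OpenAI-compatible.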
---

## 4. Implementation Steps

### Phase 1: GPU Infrastructure Setup (Days 1-2)

#### 4.1 Vast.ai Account & GPU Rental

**Step 1: Create Vast.ai Account**

```bash
# Navigate to https://www.vast.ai
# Sign up with email
# Verify account via email
# Add payment method (credit card or crypto)
```

**Step 2: Rent GPU Instance**

Requirements:

- 2x NVIDIA RTX 3090 (24GB each) or 2x RTX 4090 = 48GB+ total VRAM
- Ubuntu 24.04 LTS base image (preferred) or later
- Minimum bandwidth: 100 Mbps
- SSH port 22 open

Setup via the Vast.ai dashboard:

1. Go to "Browse" → filter by:
   - GPU: 2x RTX 3090 or 2x RTX 4090
   - Min VRAM: 48GB total
   - OS: Ubuntu 24.04 LTS (or later)
   - Price: sort by lowest $/hr
2. Click "Rent" on the selected instance
3. Choose:
   - Disk size: 500GB (includes models)
   - Secure Cloud: No (to allow direct SSH on port 22)
4. Wait for the machine to start (2-5 minutes)
5. Record SSH credentials from the dashboard

**Step 3: Test SSH Access**

```bash
# From your local machine
ssh root@<vast_ai_ip> -i ~/.ssh/vast_key

# Update system
apt update && apt upgrade -y
```

**Step 4: Set Up Snapshot for Persistence**

```bash
# After first-time setup, create a snapshot in the Vast.ai dashboard
# Future rentals: select the snapshot to restore the previous state
```

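The $180-220/month estimate follows directly from hourly GPU pricing; a quick sanity check (the $/hr figure below is an assumed example, not a Vast.ai quote):

```python
def monthly_cost(rate_per_gpu_hr: float, gpus: int = 2, hours: int = 24 * 30) -> float:
    """Projected monthly rental cost for an always-on instance."""
    return round(rate_per_gpu_hr * gpus * hours, 2)

# At an assumed $0.14/hr per RTX 3090, two GPUs around the clock:
cost = monthly_cost(0.14)  # ~$201.60/month, inside the $180-220 band
```

Pausing the instance overnight (the snapshot workflow above makes this cheap) scales the `hours` term down proportionally.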

---

#### 4.2 Vast.ai: Docker & Base Containers

**Step 1: Install Docker**

```bash
# SSH into the Vast.ai machine
ssh root@<vast_ai_ip>

# Install Docker
curl -fsSL https://get.docker.com | sh
systemctl enable docker
systemctl start docker

# Verify
docker --version
```

**Step 2: Set Up Storage**

```bash
# Create persistent directories for models
mkdir -p /mnt/models /mnt/ollama-cache /mnt/comfyui-models
chmod 777 /mnt/*

# Create a docker network for inter-container communication
docker network create ai-network
```

**Step 3: Deploy Ollama Container**

```bash
# Model storage persists via the /root/.ollama volume
docker run -d \
  --name ollama \
  --network ai-network \
  --gpus all \
  -v /mnt/ollama-cache:/root/.ollama \
  -p 0.0.0.0:11434:11434 \
  ollama/ollama:latest
```


**Step 4: Pull Qwen3 Models**

```bash
# Wait for ollama to be ready
sleep 10

# Pull models (30-60 minutes depending on bandwidth)
# Ordered by priority (largest first)
docker exec ollama ollama pull qwen3:32b      # ~20GB
docker exec ollama ollama pull qwen3:30b-a3b  # ~18GB
docker exec ollama ollama pull qwen3:14b      # ~9GB
docker exec ollama ollama pull qwen3:8b       # ~5GB

# Verify models are loaded
docker exec ollama ollama list
# Output should list all four models with their sizes
```

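Rather than eyeballing `ollama list`, a provisioning script can verify the pulls against Ollama's `/api/tags` endpoint. A minimal stdlib sketch that checks a tags response against the four pulls above (the helper name is illustrative):

```python
import json

REQUIRED_MODELS = ["qwen3:32b", "qwen3:30b-a3b", "qwen3:14b", "qwen3:8b"]

def missing_models(tags_response: str, required=REQUIRED_MODELS):
    """Given the JSON body of GET /api/tags, return required models not yet pulled."""
    present = {m["name"] for m in json.loads(tags_response).get("models", [])}
    return [m for m in required if m not in present]

# Example: only one model pulled so far
body = '{"models": [{"name": "qwen3:8b", "size": 5000000000}]}'
print(missing_models(body))  # ['qwen3:32b', 'qwen3:30b-a3b', 'qwen3:14b']
```

Wiring this to `urllib.request.urlopen("http://localhost:11434/api/tags")` in a retry loop gives a clean readiness gate for the snapshot step.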

**Step 5: Deploy ComfyUI Container**

```bash
# Clone the ComfyUI repository
cd /opt
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Option A: use a prebuilt ComfyUI image with CUDA support
# (the image name below is a placeholder; substitute an image you build or trust)
docker run -d \
  --name comfyui \
  --network ai-network \
  --gpus all \
  -e CUDA_VISIBLE_DEVICES=0,1 \
  -v /mnt/comfyui-models:/ComfyUI/models \
  -v /mnt/comfyui-output:/ComfyUI/output \
  -p 0.0.0.0:8188:8188 \
  comfyui-docker:latest

# Option B: run from source inside a CUDA container
docker run -d \
  --name comfyui \
  --network ai-network \
  --gpus all \
  -v /opt/ComfyUI:/ComfyUI \
  -v /mnt/comfyui-models:/ComfyUI/models \
  -v /mnt/comfyui-output:/ComfyUI/output \
  -p 0.0.0.0:8188:8188 \
  -w /ComfyUI \
  nvidia/cuda:11.8.0-runtime-ubuntu22.04 \
  bash -c "apt-get update && apt-get install -y python3 python3-pip && \
           pip3 install -r requirements.txt && \
           python3 main.py --listen 0.0.0.0 --port 8188"
```


**Step 6: Download Image Generation Models**

```bash
# Download checkpoints into ComfyUI's model directory.
# NOTE: verify the exact filenames on each Hugging Face repo before running;
# FLUX.1-dev and SD 3.5 are gated repos that require accepting the license
# and passing a token (add: --header="Authorization: Bearer $HF_TOKEN" to wget).

# FLUX.1 (recommended for quality)
cd /mnt/comfyui-models/checkpoints
wget -O flux1-dev.safetensors \
  "https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors"

# Stable Diffusion 3.5 (alternative)
wget -O sd3.5-large.safetensors \
  "https://huggingface.co/stabilityai/stable-diffusion-3.5-large/resolve/main/sd3.5_large.safetensors"

# SDXL-Lightning (fast; lower quality but acceptable)
wget -O sdxl-lightning.safetensors \
  "https://huggingface.co/ByteDance/SDXL-Lightning/resolve/main/sdxl_lightning_4step.safetensors"

# VAE (for FLUX)
cd /mnt/comfyui-models/vae
wget -O ae.safetensors \
  "https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors"
```


**Step 7: Verify Services**

```bash
# Check Ollama API
curl http://localhost:11434/api/tags
# Should return: {"models": [{"name": "qwen3:32b", "size": ...}, ...]}

# Check ComfyUI
curl http://localhost:8188/system_stats
# Should return GPU/memory stats
```


---

### Phase 2: VPS Tunnel & LiteLLM Setup (Days 2-3)

#### 4.3 IGNY8 VPS: SSH Tunnel Configuration

**Prerequisites:** VPS must be provisioned (see 00B)

**VPS environment:**

- Ubuntu 24.04 LTS
- Docker 29.x
- The GPU server is deployed separately on Vast.ai (not on the VPS)
- The VPS maintains SSH tunnels to the Vast.ai GPU server for access to Ollama and ComfyUI

**DNS note:** During initial setup, before the DNS flip, the IGNY8 backend reaches the LiteLLM proxy at `localhost:8000` inside the VPS. This uses the internal Docker network and local port forwarding, so external DNS configuration does not affect the connection. DNS only matters for external client connections to the IGNY8 API.

**Step 1: Generate SSH Key Pair**

```bash
# On the VPS
ssh-keygen -t rsa -b 4096 -f /root/.ssh/vast_ai -N ""

# Copy the public key to the Vast.ai machine (run from the VPS;
# alternatively paste the key into the instance via the Vast.ai dashboard)
ssh-copy-id -i /root/.ssh/vast_ai.pub root@<vast_ai_ip>
```


**Step 2: Install & Configure autossh**

```bash
# On the VPS
apt install autossh -y

# Create a dedicated user for the tunnel
useradd -m -s /bin/bash tunnel-user
mkdir -p /home/tunnel-user/.ssh
cp /root/.ssh/vast_ai* /home/tunnel-user/.ssh/
chown -R tunnel-user:tunnel-user /home/tunnel-user/.ssh
chmod 600 /home/tunnel-user/.ssh/vast_ai
```


**Step 3: Create autossh Systemd Service**

File: `/etc/systemd/system/tunnel-vast-ai.service`

```ini
[Unit]
Description=SSH Tunnel to Vast.ai GPU Server
After=network.target
Wants=network-online.target

[Service]
Type=simple
User=tunnel-user
ExecStart=/usr/bin/autossh \
  -M 20000 \
  -N \
  -o "ServerAliveInterval=30" \
  -o "ServerAliveCountMax=3" \
  -o "ExitOnForwardFailure=yes" \
  -o "ConnectTimeout=10" \
  -o "StrictHostKeyChecking=accept-new" \
  -i /home/tunnel-user/.ssh/vast_ai \
  -L 11434:localhost:11434 \
  -L 11435:localhost:11435 \
  -L 8188:localhost:8188 \
  root@<vast_ai_ip>
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

Note: `ExitOnForwardFailure=yes` makes ssh exit when a port forward cannot be established, so autossh (and systemd) restart the tunnel instead of keeping a half-working connection alive.


**Step 4: Start Tunnel Service**

```bash
# Reload systemd
systemctl daemon-reload

# Start the service
systemctl start tunnel-vast-ai

# Enable on boot
systemctl enable tunnel-vast-ai

# Verify the tunnel is up
systemctl status tunnel-vast-ai

# Check logs
journalctl -u tunnel-vast-ai -f
```


**Step 5: Test Tunnel Connectivity**

```bash
# On the VPS, verify the ports are open
netstat -tlnp | grep -E '(11434|8188)'
# Should show: 127.0.0.1:11434 LISTEN
#              127.0.0.1:8188  LISTEN

# Test Ollama through the tunnel
curl http://localhost:11434/api/tags
# Should return the model list from the remote Vast.ai machine

# Test ComfyUI through the tunnel
curl http://localhost:8188/system_stats
# Should return GPU stats
```

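The same connectivity check can run from application code, for example before routing a request to the self-hosted provider. A minimal stdlib sketch (the helper name is illustrative):

```python
import socket

def tunnel_up(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """True if something is accepting connections on the forwarded port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe both tunnel endpoints
status = {port: tunnel_up(port) for port in (11434, 8188)}
```

A TCP connect only proves the forward is bound on the VPS side; the `curl` checks above remain the end-to-end test that the remote services actually answer.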

---

#### 4.4 LiteLLM Installation & Configuration

**Step 1: Install LiteLLM**

```bash
# On the VPS (the [proxy] extra pulls in the proxy server dependencies,
# including FastAPI and uvicorn)
pip install 'litellm[proxy]' python-dotenv requests

# Verify installation
python -c "import litellm; print(litellm.__version__)"
```


**Step 2: Create LiteLLM Configuration**

File: `/opt/litellm/config.yaml`

```yaml
# LiteLLM configuration for IGNY8
model_list:
  # Text generation models (Ollama via SSH tunnel)
  - model_name: gpt-4
    litellm_params:
      model: ollama/qwen3:32b
      api_base: http://localhost:11434
      timeout: 300
      max_tokens: 8000

  - model_name: gpt-4-turbo
    litellm_params:
      model: ollama/qwen3:30b-a3b
      api_base: http://localhost:11434
      timeout: 300
      max_tokens: 8000

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/qwen3:14b
      api_base: http://localhost:11434
      timeout: 180
      max_tokens: 4000

  - model_name: gpt-3.5-turbo-fast
    litellm_params:
      model: ollama/qwen3:8b
      api_base: http://localhost:11434
      timeout: 120
      max_tokens: 2048

  # Fallback to OpenAI for redundancy
  - model_name: gpt-4-fallback
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY  # LiteLLM env-var reference syntax
      timeout: 60

  - model_name: gpt-3.5-turbo-fallback
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
      timeout: 60

  # Image generation (ComfyUI via tunnel)
  - model_name: dall-e-3
    litellm_params:
      model: comfyui/flux.1-dev
      api_base: http://localhost:8188
      timeout: 120

  - model_name: dall-e-2
    litellm_params:
      model: comfyui/sdxl-lightning
      api_base: http://localhost:8188
      timeout: 60

# Router configuration: load balancing plus per-model fallback chains
router_settings:
  routing_strategy: "simple-shuffle"  # Load balancing
  fallbacks:
    - gpt-4: ["gpt-4-fallback", "gpt-3.5-turbo"]
    - gpt-3.5-turbo: ["gpt-3.5-turbo-fallback"]
    - dall-e-3: ["dall-e-2"]

# Logging and caching
litellm_settings:
  set_verbose: true
  cache: true  # cache identical requests (keyed on model/messages/params)
```

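The alias table above can be mirrored in application code wherever logging or cost tracking needs to know which physical model served a logical name. A small sketch (the dict restates `config.yaml` by hand; it is not read from LiteLLM):

```python
# Logical (OpenAI-style) name -> self-hosted backend model, mirroring config.yaml
MODEL_ROUTES = {
    "gpt-4": "ollama/qwen3:32b",
    "gpt-4-turbo": "ollama/qwen3:30b-a3b",
    "gpt-3.5-turbo": "ollama/qwen3:14b",
    "gpt-3.5-turbo-fast": "ollama/qwen3:8b",
    "dall-e-3": "comfyui/flux.1-dev",
    "dall-e-2": "comfyui/sdxl-lightning",
}

def resolve_backend(public_model: str) -> str:
    """Map a logical model name to its self-hosted backend; raise for unknown names."""
    try:
        return MODEL_ROUTES[public_model]
    except KeyError:
        raise ValueError(f"no route configured for {public_model!r}") from None
```

Keeping this table in one place (or generating it from the YAML) avoids drift between what LiteLLM routes and what the backend records in its cost-tracking fields.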

**Step 3: Create Environment File**

File: `/opt/litellm/.env`

```bash
# OpenAI (for fallback)
OPENAI_API_KEY=sk-your-key-here

# Anthropic (optional fallback)
ANTHROPIC_API_KEY=sk-ant-your-key-here

# LiteLLM settings
LITELLM_LOG_LEVEL=INFO
LITELLM_CACHE=true

# Service settings
PORT=8000
HOST=127.0.0.1
```


**Step 4: Create LiteLLM Startup Script**

File: `/opt/litellm/start.sh`

```bash
#!/bin/bash
set -e

cd /opt/litellm

# Load and export environment variables
set -a
source .env
set +a

# Start the LiteLLM proxy server
litellm \
  --config config.yaml \
  --host 127.0.0.1 \
  --port 8000 \
  --num_workers 4
```

```bash
chmod +x /opt/litellm/start.sh
```


**Step 5: Create Systemd Service for LiteLLM**

File: `/etc/systemd/system/litellm.service`

```ini
[Unit]
Description=LiteLLM AI Proxy Gateway
After=network.target tunnel-vast-ai.service
Wants=tunnel-vast-ai.service

[Service]
Type=simple
User=root
WorkingDirectory=/opt/litellm
ExecStart=/opt/litellm/start.sh
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/python/bin"

[Install]
WantedBy=multi-user.target
```


**Step 6: Start LiteLLM Service**

```bash
systemctl daemon-reload
systemctl start litellm
systemctl enable litellm

# Verify the service
systemctl status litellm

# Check logs
journalctl -u litellm -f
```


**Step 7: Test LiteLLM API**

```bash
# Test text generation with a self-hosted model
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test-key" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Should return a response generated by qwen3:32b

# Test fallback (stop the tunnel service first to exercise fallback logic)
# The request should fall back to OpenAI after the self-hosted timeout
```

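The curl response unpacks the same way in Python. A small helper whose shape follows the OpenAI-style response shown in section 3 (the function name is illustrative):

```python
def parse_completion(resp: dict) -> tuple:
    """Extract assistant text and total token usage from an OpenAI-style response."""
    content = resp["choices"][0]["message"]["content"]
    total_tokens = resp.get("usage", {}).get("total_tokens", 0)
    return content, total_tokens

example = {
    "choices": [{"index": 0,
                 "message": {"role": "assistant", "content": "Hello!"},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}
print(parse_completion(example))  # ('Hello!', 15)
```

Because the self-hosted and fallback providers return the same shape through LiteLLM, one parser covers both paths.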

---

### Phase 3: IGNY8 Backend Integration (Days 3-4)

#### 4.5 Add Self-Hosted Provider to IGNY8

**Step 1: Update GlobalIntegrationSettings Model**

File: `backend/models/integration.py`

```python
from django.db import models

# Add to the IntegrationProvider enum
class IntegrationProvider(models.TextChoices):
    OPENAI = "openai", "OpenAI"
    ANTHROPIC = "anthropic", "Anthropic"
    RUNWARE = "runware", "Runware"
    BRIA = "bria", "Bria"
    SELF_HOSTED = "self_hosted_ai", "Self-Hosted AI (LiteLLM)"  # NEW

# Example settings structure
SELF_HOSTED_SETTINGS = {
    "provider": "self_hosted_ai",
    "name": "Self-Hosted AI (LiteLLM)",
    "base_url": "http://localhost:8000",
    "api_key": "not_required",
    "enabled": True,
    "priority": 10,  # Try first
    "models": {
        "text_generation": "gpt-4",               # Maps to qwen3:32b
        "text_generation_fast": "gpt-3.5-turbo",  # Maps to qwen3:14b
        "image_generation": "dall-e-3",           # Maps to flux.1-dev
        "image_generation_fast": "dall-e-2"       # Maps to sdxl-lightning
    },
    "timeout": 300,
    "fallback_to": "openai"
}
```


**Step 2: Add Self-Hosted Settings to Database**

File: `backend/management/commands/init_integrations.py`

```python
from backend.models.integration import GlobalIntegrationSettings, IntegrationProvider


def add_self_hosted_integration():
    """Initialize the self-hosted AI integration"""
    self_hosted_config = {
        "name": "Self-Hosted AI (LiteLLM)",
        "base_url": "http://localhost:8000",
        "api_key": "",  # Not required for the local proxy
        "enabled": True,
        "priority": 10,  # Higher priority = tried first
        "models": {
            "text_generation": "gpt-4",
            "text_generation_fast": "gpt-3.5-turbo",
            "image_generation": "dall-e-3",
            "image_generation_fast": "dall-e-2"
        },
        "timeout": 300,
        "max_retries": 2,
        "fallback_provider": IntegrationProvider.OPENAI
    }

    # provider is the lookup key, so it is not repeated in defaults
    integration, created = GlobalIntegrationSettings.objects.update_or_create(
        provider=IntegrationProvider.SELF_HOSTED,
        defaults=self_hosted_config
    )

    if created:
        print(f"✓ Created {IntegrationProvider.SELF_HOSTED} integration")
    else:
        print(f"✓ Updated {IntegrationProvider.SELF_HOSTED} integration")

# Call from the management command's handle()
```


**Step 3: Update AI Request Router**

File: `backend/services/ai_engine.py`

```python
import logging
from typing import Optional, List, Dict, Any

import requests

from backend.models.integration import GlobalIntegrationSettings, IntegrationProvider

logger = logging.getLogger(__name__)


class AIEngineRouter:
    """Routes AI requests to the appropriate provider with a fallback chain"""

    PROVIDER_PRIORITY = {
        IntegrationProvider.SELF_HOSTED: 10,  # Try first
        IntegrationProvider.OPENAI: 5,
        IntegrationProvider.ANTHROPIC: 4,
    }

    def __init__(self):
        self.providers = self._load_providers()

    def _load_providers(self) -> List[Dict[str, Any]]:
        """Load enabled providers from the database, highest priority first"""
        configs = GlobalIntegrationSettings.objects.filter(enabled=True).values()
        return sorted(
            configs,
            key=lambda c: self.PROVIDER_PRIORITY.get(c['provider'], 0),
            reverse=True
        )

    def generate_text(
        self,
        prompt: str,
        model: str = "gpt-4",
        max_tokens: int = 2000,
        temperature: float = 0.7,
        timeout: Optional[int] = None
    ) -> Dict[str, Any]:
        """Generate text using the first available provider, falling back on failure"""
        for provider_config in self.providers:
            try:
                result = self._call_provider(
                    provider_config,
                    "text",
                    prompt=prompt,
                    model=model,
                    max_tokens=max_tokens,
                    temperature=temperature,
                    timeout=timeout or provider_config.get('timeout', 300)
                )
                return {
                    "success": True,
                    "provider": provider_config['provider'],
                    "text": result['content'],
                    "usage": result.get('usage', {}),
                    "model": result.get('model', model)
                }
            except Exception as e:
                logger.warning(
                    f"Provider {provider_config['provider']} failed: {e}"
                )
                continue

        # All providers failed
        raise Exception("All AI providers exhausted. No response available.")

    def generate_image(
        self,
        prompt: str,
        model: str = "dall-e-3",
        size: str = "1024x1024",
        quality: str = "hd",
        timeout: Optional[int] = None
    ) -> Dict[str, Any]:
        """Generate an image using the first available provider, falling back on failure"""
        for provider_config in self.providers:
            try:
                result = self._call_provider(
                    provider_config,
                    "image",
                    prompt=prompt,
                    model=model,
                    size=size,
                    quality=quality,
                    timeout=timeout or provider_config.get('timeout', 120)
                )
                return {
                    "success": True,
                    "provider": provider_config['provider'],
                    "image_url": result['url'],
                    "revised_prompt": result.get('revised_prompt', prompt),
                    "model": result.get('model', model)
                }
            except Exception as e:
                logger.warning(
                    f"Provider {provider_config['provider']} failed: {e}"
                )
                continue

        # All providers failed
        raise Exception("All image generation providers exhausted.")

    def _call_provider(
        self,
        provider_config: Dict[str, Any],
        task_type: str,  # "text" or "image"
        **kwargs
    ) -> Dict[str, Any]:
        """Dispatch to the provider-specific implementation"""
        provider = provider_config['provider']

        if provider == IntegrationProvider.SELF_HOSTED:
            return self._call_litellm(provider_config, task_type, **kwargs)
        elif provider == IntegrationProvider.OPENAI:
            return self._call_openai(provider_config, task_type, **kwargs)
        elif provider == IntegrationProvider.ANTHROPIC:
            return self._call_anthropic(provider_config, task_type, **kwargs)
        else:
            raise ValueError(f"Unknown provider: {provider}")

    def _call_litellm(
        self,
        provider_config: Dict[str, Any],
        task_type: str,
        **kwargs
    ) -> Dict[str, Any]:
        """Call the LiteLLM proxy on localhost"""
        base_url = provider_config['base_url']
        timeout = kwargs.pop('timeout', 300)

        if task_type == "text":
            # Chat completion endpoint
            endpoint = f"{base_url}/v1/chat/completions"
            payload = {
                "model": kwargs.get('model', 'gpt-4'),
                "messages": [
                    {"role": "user", "content": kwargs['prompt']}
                ],
                "temperature": kwargs.get('temperature', 0.7),
                "max_tokens": kwargs.get('max_tokens', 2000)
            }
        elif task_type == "image":
            # Image generation endpoint
            endpoint = f"{base_url}/v1/images/generations"
            payload = {
                "model": kwargs.get('model', 'dall-e-3'),
                "prompt": kwargs['prompt'],
                "size": kwargs.get('size', '1024x1024'),
                "n": 1,
                "quality": kwargs.get('quality', 'hd')
            }
        else:
            raise ValueError(f"Unknown task type: {task_type}")

        try:
            response = requests.post(
                endpoint,
                json=payload,
                timeout=timeout,
                headers={"Authorization": "Bearer test"}  # proxy runs without real auth
            )
            response.raise_for_status()
            data = response.json()

            if task_type == "text":
                return {
                    "content": data['choices'][0]['message']['content'],
                    "usage": data.get('usage', {}),
                    "model": data.get('model', kwargs.get('model'))
                }
            else:  # image
                return {
                    "url": data['data'][0]['url'],
                    "revised_prompt": data['data'][0].get('revised_prompt'),
                    "model": kwargs.get('model')
                }

        except requests.exceptions.Timeout:
            logger.error(f"LiteLLM timeout after {timeout}s")
            raise
        except requests.exceptions.ConnectionError:
            logger.error("Cannot connect to LiteLLM proxy - tunnel may be down")
            raise
        except Exception as e:
            logger.error(f"LiteLLM request failed: {e}")
            raise

    def _call_openai(self, provider_config, task_type, **kwargs):
        """Existing OpenAI implementation"""
        # Use existing OpenAI integration code
        pass

    def _call_anthropic(self, provider_config, task_type, **kwargs):
        """Existing Anthropic implementation"""
        # Use existing Anthropic integration code
        pass


# Initialize global instance
ai_router = AIEngineRouter()
```

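The fallback chain in `generate_text`/`generate_image` reduces to "return the result of the first provider that succeeds". Isolated from Django for testing, the pattern looks like this (all names are illustrative, not part of the codebase):

```python
def first_success(providers):
    """Try (name, zero-arg callable) pairs in priority order.

    Returns (name, result) from the first callable that does not raise;
    raises RuntimeError if every provider fails.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:
            errors.append(f"{name}: {exc}")  # record and fall through to the next
    raise RuntimeError("All providers exhausted: " + "; ".join(errors))

def broken():
    raise ConnectionError("tunnel down")

# Self-hosted fails, so the chain continues to the external provider
name, result = first_success([
    ("self_hosted_ai", broken),
    ("openai", lambda: "generated text"),
])
print(name, result)  # openai generated text
```

Keeping the loop this dumb is deliberate: health state, priorities, and model aliasing all live in configuration, so the control flow itself never needs to change.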

**Step 4: Update Content Generation Celery Tasks**

File: `backend/tasks/content_generation.py`

```python
import logging

from celery import shared_task

from backend.models.content import Article, Product
from backend.services.ai_engine import ai_router

logger = logging.getLogger(__name__)


@shared_task
def generate_article_content(user_id: int, article_id: int):
    """Generate article content via the AI router (tries self-hosted first)"""
    try:
        # Get the article from the database
        article = Article.objects.get(id=article_id, user_id=user_id)

        # Generate content
        result = ai_router.generate_text(
            prompt=f"Write a detailed article about: {article.topic}",
            model="gpt-4",
            max_tokens=3000,
            temperature=0.7
        )

        # Save the result
        article.content = result['text']
        article.ai_provider = result['provider']
        article.save()

        logger.info(
            f"Generated article {article_id} using {result['provider']}"
        )

        return {
            "success": True,
            "article_id": article_id,
            "provider": result['provider']
        }

    except Exception as e:
        logger.error(f"Article generation failed: {e}")
        raise


@shared_task
def generate_product_images(user_id: int, product_id: int):
    """Generate product images via the AI router"""
    try:
        product = Product.objects.get(id=product_id, user_id=user_id)

        # Self-hosted is tried first (cheaper); external providers are automatic fallbacks
        result = ai_router.generate_image(
            prompt=f"Professional product photo of: {product.description}",
            model="dall-e-3",
            size="1024x1024",
            quality="hd"
        )

        product.image_url = result['image_url']
        product.ai_provider = result['provider']
        product.save()

        logger.info(f"Generated image for product {product_id} using {result['provider']}")

        return {
            "success": True,
            "product_id": product_id,
            "provider": result['provider'],
            "image_url": result['image_url']
        }

    except Exception as e:
        logger.error(f"Image generation failed: {e}")
        raise
```


**Step 5: Add AI Provider Tracking**

File: `backend/models/content.py`

```python
from django.db import models

from backend.models.integration import IntegrationProvider


class Article(models.Model):
    # ... existing fields ...

    # Track which AI provider generated the content
    ai_provider = models.CharField(
        max_length=50,
        choices=IntegrationProvider.choices,
        default=IntegrationProvider.OPENAI,
        help_text="Which AI provider generated this content"
    )
    ai_cost = models.DecimalField(
        max_digits=10,
        decimal_places=6,
        default=0,
        help_text="Cost to generate via AI provider"
    )
    ai_generation_time = models.DurationField(
        null=True,
        blank=True,
        help_text="Time taken to generate content"
    )


class Product(models.Model):
    # ... existing fields ...

    ai_provider = models.CharField(
        max_length=50,
        choices=IntegrationProvider.choices,
        default=IntegrationProvider.OPENAI,
        help_text="Which AI provider generated the image"
    )
    ai_image_cost = models.DecimalField(
        max_digits=10,
        decimal_places=6,
        default=0,
        help_text="Cost to generate image"
    )
```


---

### Phase 4: Monitoring & Fallback (Days 4-5)

#### 4.6 Health Check & Failover System

**Step 1: Create Health Check Service**

File: `backend/services/ai_health_check.py`

```python
import requests
import logging
from typing import Dict, Any
from datetime import datetime

logger = logging.getLogger(__name__)


class AIHealthMonitor:
    """Monitor health of self-hosted AI infrastructure"""

    OLLAMA_ENDPOINT = "http://localhost:11434/api/tags"
    COMFYUI_ENDPOINT = "http://localhost:8188/system_stats"
    LITELLM_ENDPOINT = "http://localhost:8000/health"

    HEALTH_CHECK_INTERVAL = 60  # seconds
    FAILURE_THRESHOLD = 3  # Mark unhealthy after 3 consecutive failures

    def __init__(self):
        self.last_check = None
        self.endpoints = {
            'ollama': self.OLLAMA_ENDPOINT,
            'comfyui': self.COMFYUI_ENDPOINT,
            'litellm': self.LITELLM_ENDPOINT,
        }
        self.failure_count = {name: 0 for name in self.endpoints}
        self.is_healthy = {name: True for name in self.endpoints}

    def check_all(self) -> Dict[str, Any]:
        """Run all health checks and log any status transitions"""
        results = {
            'timestamp': datetime.now().isoformat(),
            'overall_healthy': True,
            'services': {}
        }

        for name, endpoint in self.endpoints.items():
            healthy = self._check_service(name, endpoint)
            results['services'][name] = {
                'healthy': healthy,
                'endpoint': endpoint
            }
            if not healthy:
                results['overall_healthy'] = False

            # Log only when a service changes state
            if self.is_healthy[name] != healthy:
                level = logging.INFO if healthy else logging.WARNING
                logger.log(level, f"{name} service {'recovered' if healthy else 'down'}")

            self.is_healthy[name] = healthy

        self.last_check = results
        return results

    def _check_service(self, name: str, endpoint: str) -> bool:
        """Probe one endpoint; tolerate up to FAILURE_THRESHOLD consecutive failures"""
        try:
            response = requests.get(endpoint, timeout=5)
            if response.status_code == 200:
                self.failure_count[name] = 0
                return True
        except Exception as e:
            logger.debug(f"{name} health check failed: {str(e)}")

        self.failure_count[name] += 1
        return self.failure_count[name] < self.FAILURE_THRESHOLD

    def is_self_hosted_available(self) -> bool:
        """Check if self-hosted AI is fully available"""
        return all(self.is_healthy.values())


# Create global instance
health_monitor = AIHealthMonitor()
```

**Step 2: Create Health Check Celery Task**

File: `backend/tasks/health_checks.py`

```python
from celery import shared_task
from backend.services.ai_health_check import health_monitor
from backend.models.monitoring import ServiceHealthLog
import logging

logger = logging.getLogger(__name__)


@shared_task
def check_ai_health():
    """Run AI infrastructure health checks every minute"""
    results = health_monitor.check_all()

    # Log to database
    ServiceHealthLog.objects.create(
        service='self_hosted_ai',
        is_healthy=results['overall_healthy'],
        details=results
    )

    # Alert if services are down
    if not results['overall_healthy']:
        down_services = [
            service for service, status in results['services'].items()
            if not status['healthy']
        ]
        logger.error(
            f"AI services down: {', '.join(down_services)}. "
            f"Falling back to external APIs."
        )

    return results


# Add to CELERY_BEAT_SCHEDULE in Django settings
CELERY_BEAT_SCHEDULE = {
    'check-ai-health': {
        'task': 'backend.tasks.health_checks.check_ai_health',
        'schedule': 60.0,  # Every 60 seconds
    },
}
```

**Step 3: Create Monitoring Model**

File: `backend/models/monitoring.py`

```python
from django.conf import settings
from django.db import models


class ServiceHealthLog(models.Model):
    """Log of service health checks"""

    SERVICE_CHOICES = [
        ('self_hosted_ai', 'Self-Hosted AI'),
        ('tunnel', 'SSH Tunnel'),
        ('litellm', 'LiteLLM Proxy'),
    ]

    service = models.CharField(max_length=50, choices=SERVICE_CHOICES)
    is_healthy = models.BooleanField()
    details = models.JSONField(default=dict)
    checked_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ['-checked_at']
        indexes = [
            models.Index(fields=['-checked_at']),
            models.Index(fields=['service', '-checked_at']),
        ]

    def __str__(self):
        status = "✓ Healthy" if self.is_healthy else "✗ Down"
        return f"{self.service} {status} @ {self.checked_at}"


class AIUsageLog(models.Model):
    """Track AI provider usage and costs"""

    PROVIDER_CHOICES = [
        ('self_hosted_ai', 'Self-Hosted AI'),
        ('openai', 'OpenAI'),
        ('anthropic', 'Anthropic'),
    ]

    TASK_TYPE_CHOICES = [
        ('text_generation', 'Text Generation'),
        ('image_generation', 'Image Generation'),
        ('keyword_research', 'Keyword Research'),
    ]

    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    provider = models.CharField(max_length=50, choices=PROVIDER_CHOICES)
    task_type = models.CharField(max_length=50, choices=TASK_TYPE_CHOICES)
    model_used = models.CharField(max_length=100)

    input_tokens = models.IntegerField(default=0)
    output_tokens = models.IntegerField(default=0)

    cost = models.DecimalField(max_digits=10, decimal_places=6, default=0)
    duration_ms = models.IntegerField()  # Milliseconds

    success = models.BooleanField(default=True)
    error_message = models.TextField(blank=True)

    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ['-created_at']
        indexes = [
            models.Index(fields=['user', '-created_at']),
            models.Index(fields=['provider', '-created_at']),
        ]

    def __str__(self):
        return f"{self.provider} - {self.task_type} - ${self.cost:.4f}"
```
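
These log rows can feed a simple uptime figure on the dashboard. A pure-Python sketch of the aggregation (the production version would query `ServiceHealthLog` via the ORM; `uptime_pct` is an illustrative helper, not existing code):

```python
def uptime_pct(checks: list) -> float:
    """Percentage of health checks that passed (list of booleans, newest last)."""
    if not checks:
        return 0.0
    return 100.0 * sum(1 for ok in checks if ok) / len(checks)


# e.g. the last ten checks, with a single failure
print(uptime_pct([True] * 9 + [False]))  # 90.0
```

With one check per minute, a 30-day window holds ~43,200 rows per service, so the real query should aggregate in the database rather than in Python.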

---

### Phase 5: Cost Tracking & Optimization (Days 5-6)

#### 4.7 Cost Calculation & Dashboard

**Step 1: Create Cost Calculator**

File: `backend/services/cost_calculator.py`

```python
from decimal import Decimal
from typing import Dict, Any


class AICostCalculator:
    """Calculate AI generation costs by provider"""

    # Self-hosted cost (Vast.ai GPU rental amortized):
    # $200/month ÷ 30 days ÷ 24 hours ≈ $0.28 per wall-clock hour.
    # Concurrent requests share the GPU, so the effective cost per
    # request-hour is estimated at $0.20 (conservative).
    SELF_HOSTED_COST_PER_HOUR = Decimal('0.20')

    # OpenAI pricing, per token (verify against the current price list)
    OPENAI_PRICING = {
        'gpt-4': {
            'input': Decimal('0.00003'),   # $30 per 1M tokens
            'output': Decimal('0.00006'),  # $60 per 1M tokens
        },
        'gpt-3.5-turbo': {
            'input': Decimal('0.0000005'),   # $0.50 per 1M tokens
            'output': Decimal('0.0000015'),  # $1.50 per 1M tokens
        },
        'dall-e-3': Decimal('0.04'),  # per image
    }

    # Anthropic pricing, per token
    ANTHROPIC_PRICING = {
        'claude-3-opus': {
            'input': Decimal('0.000015'),
            'output': Decimal('0.000075'),
        },
        'claude-3-sonnet': {
            'input': Decimal('0.000003'),
            'output': Decimal('0.000015'),
        },
    }

    @classmethod
    def calculate_text_generation_cost(
        cls,
        provider: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
        duration_ms: int = 0
    ) -> Decimal:
        """Calculate cost for text generation"""

        if provider == 'self_hosted_ai':
            # Cost based on compute time (rough estimate)
            duration_hours = Decimal(duration_ms) / Decimal(3_600_000)
            return cls.SELF_HOSTED_COST_PER_HOUR * duration_hours

        if provider == 'openai':
            pricing = cls.OPENAI_PRICING.get(model, {})
        elif provider == 'anthropic':
            pricing = cls.ANTHROPIC_PRICING.get(model, {})
        else:
            return Decimal(0)

        input_cost = Decimal(input_tokens) * pricing.get('input', Decimal(0))
        output_cost = Decimal(output_tokens) * pricing.get('output', Decimal(0))
        return input_cost + output_cost

    @classmethod
    def calculate_image_generation_cost(
        cls,
        provider: str,
        model: str,
        duration_ms: int = 0
    ) -> Decimal:
        """Calculate cost for image generation"""

        if provider == 'self_hosted_ai':
            # Cost based on compute time
            duration_hours = Decimal(duration_ms) / Decimal(3_600_000)
            return cls.SELF_HOSTED_COST_PER_HOUR * duration_hours

        if provider == 'openai' and 'dall-e' in model:
            return cls.OPENAI_PRICING.get('dall-e-3', Decimal('0.04'))

        return Decimal(0)

    @classmethod
    def monthly_cost_analysis(cls) -> Dict[str, Any]:
        """Analyze the last 30 days of usage and the savings vs. OpenAI"""

        from backend.models.monitoring import AIUsageLog
        from django.utils import timezone
        from datetime import timedelta

        thirty_days_ago = timezone.now() - timedelta(days=30)
        usage_logs = AIUsageLog.objects.filter(created_at__gte=thirty_days_ago)

        cost_by_provider = {}
        total_cost = Decimal(0)

        for log in usage_logs:
            entry = cost_by_provider.setdefault(log.provider, {
                'count': 0,
                'total_cost': Decimal(0),
            })
            entry['count'] += 1
            entry['total_cost'] += log.cost
            total_cost += log.cost

        # What OpenAI would have charged for the self-hosted workload
        openai_equivalent_cost = Decimal(0)
        for log in usage_logs.filter(provider='self_hosted_ai'):
            if log.task_type == 'text_generation':
                openai_equivalent_cost += cls.calculate_text_generation_cost(
                    'openai', 'gpt-4', log.input_tokens, log.output_tokens
                )
            else:
                openai_equivalent_cost += cls.calculate_image_generation_cost(
                    'openai', 'dall-e-3'
                )

        self_hosted_cost = cost_by_provider.get(
            'self_hosted_ai', {}
        ).get('total_cost', Decimal(0))

        return {
            'cost_by_provider': cost_by_provider,
            'total_cost': total_cost,
            'savings_vs_openai': openai_equivalent_cost - self_hosted_cost,
            'roi_vs_gpu_cost': openai_equivalent_cost - Decimal(200),  # $200 = 1 month GPU
        }
```
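
As a sanity check on the per-token arithmetic, the sketch below mirrors the calculator's constants for a typical 1,500-word article; the token counts and the 90-second GPU time are illustrative assumptions, not measurements:

```python
from decimal import Decimal

# Mirrors AICostCalculator's constants
GPT4_INPUT = Decimal('0.00003')    # $30 per 1M input tokens
GPT4_OUTPUT = Decimal('0.00006')   # $60 per 1M output tokens
SELF_HOSTED_PER_HOUR = Decimal('0.20')


def gpt4_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Per-request GPT-4 cost from the pricing table."""
    return Decimal(input_tokens) * GPT4_INPUT + Decimal(output_tokens) * GPT4_OUTPUT


def self_hosted_cost(duration_ms: int) -> Decimal:
    """Amortized GPU cost for a request of the given duration."""
    return SELF_HOSTED_PER_HOUR * Decimal(duration_ms) / Decimal(3_600_000)


# ~800 prompt tokens in, ~2,000 tokens out; 90 s of GPU time
print(gpt4_cost(800, 2_000))     # 0.14400 -> $0.144 per article on GPT-4
print(self_hosted_cost(90_000))  # 0.005   -> ~$0.005 per article self-hosted
```

At these assumptions a self-hosted article costs roughly 30x less than GPT-4, which is where the savings projections later in this document come from.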

---

## 5. Acceptance Criteria

### Infrastructure Ready
- [ ] Vast.ai GPU instance rented and running (2x RTX 3090 or better)
- [ ] SSH access confirmed from IGNY8 VPS
- [ ] Ollama container running with all Qwen3 models downloaded
- [ ] ComfyUI container running with FLUX.1 and Stable Diffusion 3.5 models
- [ ] Models tested via direct API calls (curl tests all pass)

### Network Tunnel Operational
- [ ] autossh service running on IGNY8 VPS
- [ ] SSH tunnel persists through network interruptions
- [ ] Ports 11434, 11435, 8188 accessible on localhost from VPS
- [ ] Tunnel auto-reconnects within 60 seconds of a disconnect
- [ ] Systemd service enabled on boot

### LiteLLM Proxy Functional
- [ ] LiteLLM service running on VPS port 8000
- [ ] OpenAI-compatible API endpoints working
- [ ] Text generation requests route to Ollama
- [ ] Image generation requests route to ComfyUI
- [ ] Fallback to OpenAI works when self-hosted is unavailable
- [ ] Config includes all model variants
- [ ] Timeout values appropriate for each model

### IGNY8 Backend Integration Complete
- [ ] Self-hosted provider added to GlobalIntegrationSettings
- [ ] AIEngineRouter tries self-hosted before external APIs
- [ ] Celery tasks log which provider was used
- [ ] Content includes ai_provider tracking field
- [ ] Fallback chain works (self-hosted → OpenAI → Anthropic)
- [ ] Unit tests pass for all provider calls

### Health Check System Operational
- [ ] Health check task runs every 60 seconds
- [ ] ServiceHealthLog table populated
- [ ] Alerts generated when services are down
- [ ] System continues working with degraded services
- [ ] Dashboard shows service status

### Cost Tracking Implemented
- [ ] AIUsageLog records all AI requests
- [ ] Cost calculation accurate per provider
- [ ] Monthly cost analysis working
- [ ] Cost comparison shows self-hosted savings
- [ ] Dashboard displays cost breakdown

### Documentation & Runbooks
- [ ] This build document complete and accurate
- [ ] Troubleshooting guide for common issues
- [ ] Runbook for GPU rental renewal
- [ ] Cost monitoring dashboard updated
- [ ] Team trained on fallback procedures

---

## 6. Claude Code Instructions

### Prerequisites
```bash
# Ensure VPS provisioned (see 00B)
# Have Vast.ai account created
# Have IGNY8 codebase cloned locally
```

### Build Execution

**Step 1: GPU Infrastructure (Operator)**
```bash
# Manual: Set up Vast.ai account, rent GPU, note IP
# This requires manual interaction with the Vast.ai dashboard
# Once the IP is obtained, proceed to Step 2
```

**Step 2: Vast.ai Setup (Automated)**
```bash
# Run on Vast.ai GPU server
VAST_AI_IP="<your-gpu-ip>"

ssh -i ~/.ssh/vast_key root@$VAST_AI_IP << 'EOF'

# Update system
apt update && apt upgrade -y

# Install Docker
curl -sSfL https://get.docker.com | sh
systemctl enable docker && systemctl start docker

# Create storage directories
mkdir -p /mnt/{models,ollama-cache,comfyui-models,comfyui-output}
chmod 777 /mnt/*

# Create docker network
docker network create ai-network

# Deploy Ollama
docker run -d \
  --name ollama \
  --network ai-network \
  --gpus all \
  -e OLLAMA_MODELS=/mnt/ollama-cache \
  -v /mnt/ollama-cache:/root/.ollama \
  -p 0.0.0.0:11434:11434 \
  ollama/ollama:latest

sleep 30

# Pull models (takes 1-2 hours)
docker exec ollama ollama pull qwen3:32b
docker exec ollama ollama pull qwen3:30b-a3b
docker exec ollama ollama pull qwen3:14b
docker exec ollama ollama pull qwen3:8b

# Deploy ComfyUI (substitute your own ComfyUI image; there is no official one)
docker run -d \
  --name comfyui \
  --network ai-network \
  --gpus all \
  -v /mnt/comfyui-models:/ComfyUI/models \
  -v /mnt/comfyui-output:/ComfyUI/output \
  -p 0.0.0.0:8188:8188 \
  comfyui-docker:latest

# Download image models. Both repos are gated on Hugging Face, so an
# HF access token is required; verify exact filenames on the model pages.
mkdir -p /mnt/comfyui-models/checkpoints
cd /mnt/comfyui-models/checkpoints
wget https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors -O flux1-dev.safetensors
wget https://huggingface.co/stabilityai/stable-diffusion-3.5-large/resolve/main/sd3.5_large.safetensors -O sd3.5-large.safetensors

echo "✓ Vast.ai setup complete"
EOF
```

**Step 3: VPS Tunnel Setup (Automated)**
```bash
# Run on IGNY8 VPS
VAST_AI_IP="<your-gpu-ip>"

# Install autossh
apt install autossh -y

# Create tunnel user
useradd -m -s /bin/bash tunnel-user
mkdir -p /home/tunnel-user/.ssh

# Copy SSH key (paste private key content)
cat > /home/tunnel-user/.ssh/vast_ai << 'KEY'
-----BEGIN RSA PRIVATE KEY-----
<paste-private-key-here>
-----END RSA PRIVATE KEY-----
KEY

chmod 600 /home/tunnel-user/.ssh/vast_ai
chown -R tunnel-user:tunnel-user /home/tunnel-user/.ssh

# Create systemd service
# ExitOnForwardFailure=yes makes autossh exit (and systemd restart it)
# when a port forward dies, instead of leaving a dead tunnel running.
cat > /etc/systemd/system/tunnel-vast-ai.service << 'SERVICE'
[Unit]
Description=SSH Tunnel to Vast.ai GPU Server
After=network.target
Wants=network-online.target

[Service]
Type=simple
User=tunnel-user
ExecStart=/usr/bin/autossh \
  -M 20000 \
  -N \
  -o "ServerAliveInterval=30" \
  -o "ServerAliveCountMax=3" \
  -o "ExitOnForwardFailure=yes" \
  -o "StrictHostKeyChecking=accept-new" \
  -i /home/tunnel-user/.ssh/vast_ai \
  -L 11434:localhost:11434 \
  -L 11435:localhost:11435 \
  -L 8188:localhost:8188 \
  root@VAST_AI_IP

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
SERVICE

# Update IP in service file
sed -i "s/VAST_AI_IP/$VAST_AI_IP/g" /etc/systemd/system/tunnel-vast-ai.service

# Start tunnel
systemctl daemon-reload
systemctl start tunnel-vast-ai
systemctl enable tunnel-vast-ai

# Wait and verify
sleep 5
netstat -tlnp | grep -E '(11434|8188)'

echo "✓ SSH tunnel operational"
```
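
Besides netstat, the forwarded ports can be confirmed from code, which is also essentially what the health monitor relies on. A small stdlib sketch (`port_open` is an illustrative helper; the port numbers are the tunnel forwards above):

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The three tunnel-forwarded ports on the VPS
    for port in (11434, 11435, 8188):
        print(port, "open" if port_open("localhost", port) else "closed")
```

A closed port fails fast (connection refused), so this check is cheap enough to run from a cron job or a pre-flight step in deploy scripts.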

**Step 4: LiteLLM Installation (Automated)**
```bash
# Run on IGNY8 VPS

# Install LiteLLM with the proxy extra
pip install 'litellm[proxy]' python-dotenv requests

# Create directories
mkdir -p /opt/litellm

# Create config file
cat > /opt/litellm/config.yaml << 'CONFIG'
model_list:
  - model_name: gpt-4
    litellm_params:
      model: ollama/qwen3:32b
      api_base: http://localhost:11434
      timeout: 300
      max_tokens: 8000

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/qwen3:8b
      api_base: http://localhost:11434
      timeout: 120
      max_tokens: 2048

  # ComfyUI is not a built-in LiteLLM provider; this route assumes an
  # OpenAI-compatible shim (or custom handler) in front of ComfyUI on 8188
  - model_name: dall-e-3
    litellm_params:
      model: comfyui/flux.1-dev
      api_base: http://localhost:8188
      timeout: 120

litellm_settings:
  verbose: true
  log_level: INFO
  cache_responses: true
CONFIG

# Create .env file
cat > /opt/litellm/.env << 'ENV'
OPENAI_API_KEY=your-openai-key
PORT=8000
HOST=127.0.0.1
ENV

# Create start script (set -a exports the .env vars to the proxy process)
cat > /opt/litellm/start.sh << 'SCRIPT'
#!/bin/bash
cd /opt/litellm
set -a; source .env; set +a
litellm --config config.yaml --host 127.0.0.1 --port 8000 --num_workers 4
SCRIPT

chmod +x /opt/litellm/start.sh

# Create systemd service
cat > /etc/systemd/system/litellm.service << 'SERVICE'
[Unit]
Description=LiteLLM AI Proxy Gateway
After=network.target tunnel-vast-ai.service
Wants=tunnel-vast-ai.service

[Service]
Type=simple
User=root
WorkingDirectory=/opt/litellm
ExecStart=/opt/litellm/start.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
SERVICE

# Start LiteLLM
systemctl daemon-reload
systemctl start litellm
systemctl enable litellm

# Verify
sleep 5
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'

echo "✓ LiteLLM operational"
```
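
Because the proxy is OpenAI-compatible, the backend only needs to point standard chat-completions calls at `localhost:8000`. A hedged stdlib sketch, where the `build_payload`/`generate_text` names and their split are illustrative, not existing code:

```python
import json
import urllib.request


def build_payload(prompt: str, model: str = "gpt-4", max_tokens: int = 2000) -> dict:
    """OpenAI-style chat payload; 'gpt-4' is an alias LiteLLM routes to ollama/qwen3:32b."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def generate_text(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send a chat-completions request through the LiteLLM proxy."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same function works unchanged whether the request lands on Qwen3 over the tunnel or on the OpenAI fallback, which is the point of keeping the model aliases OpenAI-shaped.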

**Step 5: IGNY8 Backend Integration (Developer)**
```bash
# In IGNY8 codebase

# 1. Add the self-hosted provider to IntegrationProvider (backend/models/integration.py)
# 2. Update the management command to initialize self-hosted settings
# 3. Implement AIEngineRouter with fallback logic
# 4. Update Celery tasks to use the router
# 5. Add database fields for provider tracking
# 6. Run migrations
# 7. Create health check monitoring

python manage.py makemigrations
python manage.py migrate

# Initialize self-hosted integration
python manage.py init_integrations
```

**Step 6: Verification (Automated)**
```bash
# Test full chain
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a 100-word article about clouds"}],
    "max_tokens": 200
  }'

# Expected response: article text from the Qwen3 32B model

# Test fallback by stopping the tunnel
systemctl stop tunnel-vast-ai
# Wait 10 seconds, then retry the request - it should now use OpenAI instead
```

---

## Timeline & Resource Allocation

| Phase | Days | Task | Owner | Status |
|-------|------|------|-------|--------|
| 1.1 | 1 | Vast.ai account & GPU rental | Operator | Ready |
| 1.2 | 1 | Docker & Ollama setup | DevOps | Ready |
| 1.3 | 1 | Model pulling & ComfyUI | DevOps | Ready |
| 2.1 | 0.5 | VPS tunnel infrastructure | DevOps | Ready |
| 2.2 | 0.5 | autossh systemd service | DevOps | Ready |
| 2.3 | 1 | LiteLLM installation & config | DevOps | Ready |
| 3.1 | 1 | Backend integration scaffolding | Developer | Ready |
| 3.2 | 1 | AI router & fallback logic | Developer | Ready |
| 3.3 | 1 | Celery task updates | Developer | Ready |
| 4.1 | 1 | Health check system | DevOps | Ready |
| 5.1 | 1 | Cost tracking & dashboard | Developer | Ready |
| **Total** | **~7** | 10 task-days across parallel DevOps/Developer tracks | | |

---

## Cost Analysis

### Monthly GPU Rental
- **Vast.ai 2x RTX 3090:** $180-220/month (auto-bid recommended)
- **Fixed cost:** $200/month (conservative)

### Monthly API Costs (Current)
Estimated current external API costs (before optimization):
- **OpenAI (GPT-4/3.5):** $800-1,200/month
- **Anthropic (Claude):** $200-400/month
- **Image generation (Runware/Bria):** $300-500/month
- **Total:** $1,300-2,100/month

### Monthly API Costs (After)
With self-hosted carrying most of the load and external APIs as fallback:
- **Self-hosted cost:** $200/month (amortized GPU)
- **External APIs (fallback only):** $200-300/month
- **Total:** $400-500/month

### Savings & ROI
- **Monthly savings:** $800-1,700
- **Break-even:** ~4-8 days of savings cover one month's GPU rental
- **Annual savings:** $9,600-20,400

### Cost Per Subscriber
- **Before:** $26-42 in AI costs per subscriber/month (on the $49/month tier)
- **After:** $8-10/subscriber/month
- **Improvement:** 65-76% cost reduction

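The break-even figure is simple arithmetic over the ranges above; a minimal sketch, using this section's $200/month GPU estimate and savings range as inputs:

```python
GPU_MONTHLY = 200  # $/month, fixed GPU rental (estimate from this section)


def breakeven_days(monthly_savings: float) -> float:
    """Days of API savings needed to cover one month of GPU rental."""
    return GPU_MONTHLY / (monthly_savings / 30)


print(round(breakeven_days(800), 1))    # 7.5  (low-savings case)
print(round(breakeven_days(1_700), 1))  # 3.5  (high-savings case)
```

In other words, even at the pessimistic end of the savings range the GPU pays for itself in about a week.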
---

## Troubleshooting Guide

### SSH Tunnel Not Connecting
```bash
# Check service status
systemctl status tunnel-vast-ai

# View detailed logs
journalctl -u tunnel-vast-ai -n 100 -f

# Test SSH manually
ssh -v -i /home/tunnel-user/.ssh/vast_ai root@<vast_ai_ip>

# Ensure the Vast.ai machine is still running and has bandwidth
```

### Ollama Not Responding
```bash
# Check container
docker ps | grep ollama

# View logs
docker logs -f ollama

# Test directly
docker exec ollama curl http://localhost:11434/api/tags

# Restart if needed
docker restart ollama
```

### ComfyUI Port Not Accessible
```bash
# Check container
docker ps | grep comfyui

# Test through tunnel
curl http://localhost:8188/system_stats

# Restart if needed
docker restart comfyui
```

### LiteLLM Timeouts
```bash
# Check LiteLLM logs
journalctl -u litellm -n 100

# Increase the timeout in config.yaml if requests are slow but succeeding

# Test a simple request
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 10}'
```

### Fallback to External APIs Not Working
```bash
# Verify the OpenAI API key in /opt/litellm/.env
# Test the fallback path by disabling the tunnel
systemctl stop tunnel-vast-ai
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo-fallback", "messages": [{"role": "user", "content": "Hi"}]}'
```

---

## Cross-References

**Dependency:** [00B VPS Provisioning & Infrastructure](./00B-vps-provisioning.md)
**Related:** [00A Project Planning](./00A-project-planning.md)
**Related:** [00C Database & Schema](./00C-database-schema.md)
**Related:** [00D Authentication & Security](./00D-auth-security.md)

---

## Document Version

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-03-23 | Initial comprehensive build document |

---

**Status:** Ready for implementation
**Last Updated:** 2026-03-23
**Next Step:** Execute Phase 1 GPU infrastructure setup after 00B completion