Hostinger VPS + Vast.ai GPU — Zero API Costs Architecture
• All LLM inference runs on your own GPU — no OpenAI/Anthropic API charges
• Image generation uses open-source FLUX.1/SD 3.5 — no Midjourney/DALL-E costs
• Video generation uses Wan 2.1/LTX-Video — no Runway/Sora costs
• Only fixed costs: Hostinger VPS (~$12-25/mo) + Vast.ai GPU (~$200/mo)
• Total: ~$215-225/month for unlimited AI generation
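As a sanity check, the fixed total follows directly from the hourly GPU rate (a sketch assuming 24/7 rental at roughly 730 billable hours per month):

```python
# Sanity-check the fixed monthly cost from the advertised rates.
# Assumes 24/7 GPU rental at the listed $0.277/hr.
HOURS_PER_MONTH = 730  # 365 * 24 / 12

gpu_monthly = 0.277 * HOURS_PER_MONTH   # Vast.ai 2x RTX 3090
vps_low, vps_high = 12, 25              # Hostinger VPS plan range

total_low = vps_low + gpu_monthly
total_high = vps_high + gpu_monthly
print(f"GPU: ${gpu_monthly:.0f}/mo, total: ${total_low:.0f}-{total_high:.0f}/mo")
```

This lands at roughly $202/mo for the GPU and $214-227/mo all-in, matching the ~$215-225 figure above.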
KVM 4 or higher recommended
4 vCPU, 16GB RAM, 200GB NVMe — handles Plesk + all CPU services comfortably
What runs on Hostinger (CPU-only):
```yaml
# docker-compose.yml on Hostinger VPS
version: '3.8'
services:
  flowise:
    image: flowiseai/flowise
    ports:
      - "3000:3000"
    volumes:
      - flowise_data:/root/.flowise
    environment:
      - FLOWISE_USERNAME=admin
      - FLOWISE_PASSWORD=your_secure_password
    restart: always

  chromadb:
    image: chromadb/chroma
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    restart: always

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    restart: always

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3001:8080"
    environment:
      # Route chat through the LiteLLM gateway (OpenAI-compatible API),
      # which in turn forwards to Ollama on the Vast.ai box
      - OPENAI_API_BASE_URL=http://litellm:4000/v1
      - OPENAI_API_KEY=sk-your-master-key-here
    volumes:
      - openwebui_data:/app/backend/data
    restart: always

volumes:
  flowise_data:
  chroma_data:
  openwebui_data:
```
```yaml
# litellm_config.yaml
# Routes all AI requests to your Vast.ai GPU server
model_list:
  - model_name: qwen3-32b
    litellm_params:
      model: ollama/qwen3:32b
      api_base: http://VAST_AI_IP:11434
  - model_name: qwen3-30b-moe
    litellm_params:
      model: ollama/qwen3:30b-a3b
      api_base: http://VAST_AI_IP:11434
  - model_name: qwen3-14b
    litellm_params:
      model: ollama/qwen3:14b
      api_base: http://VAST_AI_IP:11434
  - model_name: qwen3-8b
    litellm_params:
      model: ollama/qwen3:8b
      api_base: http://VAST_AI_IP:11434

general_settings:
  master_key: sk-your-master-key-here
```
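Once LiteLLM is up, anything that speaks the OpenAI API can use the proxy. A minimal sketch using only Python's standard library; `api.yourdomain.com` and the key are the placeholders from the configs above:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for the LiteLLM proxy."""
    payload = {
        "model": model,  # one of the model_name entries in litellm_config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # the master_key from general_settings
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("https://api.yourdomain.com", "sk-your-master-key-here",
                         "qwen3-32b", "Write a 50-word product blurb.")
# Requires the full stack (LiteLLM + tunnel + Ollama) to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because LiteLLM is OpenAI-compatible, the official `openai` client works too by pointing `base_url` at the proxy.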
```nginx
# /etc/nginx/sites-available/ai-services

# Flowise — IGNY8 AI workflows
server {
    listen 443 ssl;
    server_name flowise.yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/...;
    ssl_certificate_key /etc/letsencrypt/live/...;

    location / {
        proxy_pass http://localhost:3000;
        # HTTP/1.1 is required for WebSocket upgrades
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}

# Open WebUI — Client chat
server {
    listen 443 ssl;
    server_name chat.yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/...;
    ssl_certificate_key /etc/letsencrypt/live/...;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}

# LiteLLM — API endpoint
server {
    listen 443 ssl;
    server_name api.yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/...;
    ssl_certificate_key /etc/letsencrypt/live/...;

    location / {
        proxy_pass http://localhost:4000;
        proxy_set_header Host $host;
    }
}
```
Vast.ai GPU server specs:
• GPU: 2x RTX 3090 (48GB total)
• CPU: AMD EPYC 7543 (13.5 of 128 cores allocated)
• RAM: 54GB allocated of 516GB
• Storage: 442GB NVMe
• Location: Bulgaria
• Cost: $0.277/hr (~$200/mo)
Text generation (Ollama): IGNY8 content writing, SEO content, chat responses
• Qwen3 32B (dense) — ~20GB VRAM, best quality
• Qwen3 30B-A3B (MoE) — ~19GB VRAM, fastest
• Qwen3 14B — ~9GB VRAM, for concurrent tasks
• Load models on demand via Ollama
Image generation (ComfyUI): blog images, social media graphics, product visuals for IGNY8
• FLUX.1 [dev] — best quality, ~12GB VRAM
• Stable Diffusion 3.5 — wide ecosystem, ~8GB VRAM
• SDXL-Lightning — fast generation, ~6GB VRAM
• All via ComfyUI API (port 8188)
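ComfyUI jobs are submitted as a JSON workflow graph to its `POST /prompt` endpoint. A minimal sketch, assuming the SSH tunnel from the networking section exposes ComfyUI on `localhost:8188`; the tiny graph here is a stand-in, not a working workflow:

```python
import json
import urllib.request

def build_comfy_prompt(workflow: dict, client_id: str = "igny8") -> urllib.request.Request:
    """Wrap an exported ComfyUI workflow (API format) for POST /prompt."""
    body = {"prompt": workflow, "client_id": client_id}
    return urllib.request.Request(
        "http://localhost:8188/prompt",  # ComfyUI via the SSH tunnel
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Trivial stand-in graph; in practice, export a real FLUX.1 or SD 3.5
# workflow from the ComfyUI editor with "Save (API Format)".
workflow = {"3": {"class_type": "KSampler", "inputs": {"seed": 42}}}
req = build_comfy_prompt(workflow)
# Requires ComfyUI to be reachable:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["prompt_id"])
```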
Video generation (ComfyUI): short videos for social media, product demos
• Wan 2.1 (14B) — best quality, uses full 48GB
• Wan 2.1 (1.3B) — fast, only ~8GB VRAM
• LTX-Video — fastest, ~12GB VRAM, 768x512
• ⚠️ Unload LLM before running video gen (shared VRAM)
48GB total — you can't run everything simultaneously
Mode 1: Content Writing (Default)
Qwen3 32B (~20GB) + FLUX.1 image gen (~12GB) = ~32GB
✓ Fits comfortably, both can run simultaneously
Mode 2: Fast Throughput
Qwen3 30B-A3B MoE (~19GB) + FLUX.1 (~12GB) = ~31GB
✓ Blazing fast text + images together
Mode 3: Video Generation
Unload LLM → Wan 2.1 14B uses full 48GB
⚠️ Schedule video jobs during off-peak, unload text model first
Ollama auto-unloads models after 5min idle. Use OLLAMA_KEEP_ALIVE to control.
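The idle window can also be overridden per request: sending `keep_alive: 0` unloads the model immediately, which is handy right before kicking off a video job. A sketch against the Ollama API, assuming it is reachable on `localhost:11434` via the tunnel:

```python
import json
import urllib.request

def build_unload_request(model: str) -> urllib.request.Request:
    """keep_alive: 0 tells Ollama to unload the model as soon as this
    (prompt-less) request completes — frees VRAM for video generation."""
    payload = {"model": model, "keep_alive": 0}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama via the SSH tunnel
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_unload_request("qwen3:32b")
# urllib.request.urlopen(req)  # requires Ollama to be reachable
```

To change the default idle window instead, set `OLLAMA_KEEP_ALIVE` (e.g. `30m`, or `-1` to pin a model) in the Ollama server's environment.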
```bash
# Run on Hostinger VPS — creates a persistent tunnel to Vast.ai
# This makes the Vast.ai ports available on localhost

# Install autossh for persistent tunnels
apt install autossh

# Create the tunnel (run as a systemd service)
# -L 11434 forwards Ollama, -L 8188 forwards ComfyUI
autossh -M 0 -N \
  -L 11434:localhost:11434 \
  -L 8188:localhost:8188 \
  -o "ServerAliveInterval=30" \
  -o "ServerAliveCountMax=3" \
  -i /root/.ssh/vastai_key \
  root@VAST_AI_IP -p VAST_SSH_PORT

# Now on Hostinger:
#   localhost:11434 → Vast.ai Ollama
#   localhost:8188  → Vast.ai ComfyUI
```
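The "run as a systemd service" note can be implemented with a unit along these lines (a sketch; the unit name, key path, and host placeholders are assumptions to fill in):

```ini
# /etc/systemd/system/vastai-tunnel.service
[Unit]
Description=Persistent SSH tunnel to Vast.ai GPU server
After=network-online.target
Wants=network-online.target

[Service]
# -M 0 disables autossh's monitor port; the ServerAlive* options
# handle liveness, and ExitOnForwardFailure makes failed forwards
# trigger a restart instead of a half-working tunnel.
ExecStart=/usr/bin/autossh -M 0 -N \
  -L 11434:localhost:11434 \
  -L 8188:localhost:8188 \
  -o "ServerAliveInterval=30" -o "ServerAliveCountMax=3" \
  -o "ExitOnForwardFailure=yes" \
  -i /root/.ssh/vastai_key root@VAST_AI_IP -p VAST_SSH_PORT
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now vastai-tunnel`.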
| Model | VRAM | Speed | Best For | License |
|---|---|---|---|---|
| Qwen3 32B ⭐ | ~20GB | ~15-25 tok/s | Long-form SEO content, articles | Apache 2.0 |
| Qwen3 30B-A3B ⭐ | ~19GB | ~40-60 tok/s | Fast chat, SEO meta, bulk content | Apache 2.0 |
| Qwen3 14B | ~9GB | ~30-50 tok/s | Concurrent light tasks | Apache 2.0 |
| Qwen3 8B | ~5GB | ~50-80 tok/s | Classification, tagging, simple tasks | Apache 2.0 |
| Model | VRAM | Speed | Quality | License |
|---|---|---|---|---|
| FLUX.1 [dev] ⭐ | ~12GB | ~8-15s/img | Excellent — rivals Midjourney | Non-commercial |
| Stable Diffusion 3.5 ⭐ | ~8GB | ~5-10s/img | Very good + huge ecosystem | Stability Community |
| SDXL-Lightning | ~6GB | ~1-2s/img | Good for fast iterations | OpenRAIL++ |
| Z-Image-Turbo | ~16GB | <1s/img | Excellent + text rendering | |
| Model | VRAM | Resolution | Duration | License |
|---|---|---|---|---|
| Wan 2.1 (1.3B) ⭐ | ~8GB | 480p | 5-8s clips | Apache 2.0 |
| LTX-Video ⭐ | ~12GB | 768x512 | 5-7s clips | |
| Wan 2.1 (14B) | ~40GB+ | 720p | 5-10s clips | Apache 2.0 |
| Wan2GP (optimized) | ~12GB | 720p | 8-15s clips | Apache 2.0 |
Cost comparison: unlimited AI text + image + video generation
If Using APIs Instead:
• OpenAI GPT-4o: ~$0.005/1K tokens → 1M tokens/day = $150/mo just for text
• Midjourney: $30/mo (limited) or $0.01-0.04/image
• Runway Gen-3: $0.05/sec video → 100 videos/mo = $250/mo
• DALL-E 3: $0.04/image → 500 images/mo = $20/mo
Conservative estimate: $400-600+/month with scale
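The quoted per-service figures can be cross-checked (a sketch using the workload assumptions in the bullets above):

```python
# Cross-check the "conservative estimate" from the per-service figures above.
gpt4o  = 1_000_000 / 1000 * 0.005 * 30  # 1M tokens/day at $0.005/1K, 30 days
runway = 250.0                           # 100 videos/mo as quoted
dalle  = 500 * 0.04                      # 500 images/mo at $0.04/image
midj   = 30.0                            # Midjourney subscription floor

total = gpt4o + runway + dalle + midj
print(f"${total:.0f}/mo")
```

This sums to $450/mo, squarely inside the $400-600+ range.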
Your Self-Hosted Setup:
• Unlimited text generation — $0 marginal cost
• Unlimited image generation — $0 marginal cost
• Unlimited video generation — $0 marginal cost
• Fixed cost regardless of usage volume
Fixed: ~$225/month — flat regardless of volume
Break-even point: Once IGNY8 processes ~50+ articles/month with images, self-hosting is cheaper than APIs.
At scale: If IGNY8 generates 500+ articles + 1000+ images + 100+ videos per month, you're saving $300-500+/month vs APIs.
Added bonus: No rate limits, no API key management, no vendor lock-in, full data privacy.
Revenue potential: Sell the same infrastructure as a service to Alorig clients — chat.yourdomain.com as a white-label AI assistant.
Architecture Summary
Hostinger VPS handles web hosting (Plesk), AI orchestration (Flowise, LiteLLM, Open WebUI), and vector storage (Chroma). All AI inference routes through an SSH tunnel to Vast.ai's 2x RTX 3090 GPU server running Ollama (text) and ComfyUI (images and video). Zero API costs — a fixed monthly spend of ~$225 for unlimited generation.