Hostinger VPS + Vast.ai GPU — Zero API Costs Architecture
• All LLM inference runs on your own GPU — no OpenAI/Anthropic API charges
• Image generation uses open-source FLUX.1/SD 3.5 — no Midjourney/DALL-E costs
• Video generation uses Wan 2.1/LTX-Video — no Runway/Sora costs
• Only fixed costs: Hostinger VPS (~$12-25/mo) + Vast.ai GPU (~$200/mo)
• Total: ~$215-225/month for unlimited AI generation
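As a sanity check, the fixed total follows directly from the hourly GPU rate (a sketch assuming 24/7 rental at roughly 730 billable hours per month):

```python
# Sanity-check the fixed monthly cost from the advertised rates.
# Assumes 24/7 GPU rental at the listed $0.277/hr.
HOURS_PER_MONTH = 730  # 365 * 24 / 12

gpu_monthly = 0.277 * HOURS_PER_MONTH   # Vast.ai 2x RTX 3090
vps_low, vps_high = 12, 25              # Hostinger VPS plan range

total_low = vps_low + gpu_monthly
total_high = vps_high + gpu_monthly
print(f"GPU: ${gpu_monthly:.0f}/mo, total: ${total_low:.0f}-{total_high:.0f}/mo")
```

This lands at roughly $202/mo for the GPU and $214-227/mo all-in, matching the ~$215-225 figure above.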
KVM 4 or higher recommended
4 vCPU, 16GB RAM, 200GB NVMe — handles Plesk + all CPU services comfortably
What runs on Hostinger (CPU-only):
```yaml
# docker-compose.yml on Hostinger VPS
version: '3.8'
services:
  flowise:
    image: flowiseai/flowise
    ports:
      - "3000:3000"
    volumes:
      - flowise_data:/root/.flowise
    environment:
      - FLOWISE_USERNAME=admin
      - FLOWISE_PASSWORD=your_secure_password
    restart: always

  chromadb:
    image: chromadb/chroma
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    restart: always

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    restart: always

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3001:8080"
    environment:
      # Route chat through the LiteLLM gateway (OpenAI-compatible API),
      # which in turn forwards to Ollama on the Vast.ai box
      - OPENAI_API_BASE_URL=http://litellm:4000/v1
      - OPENAI_API_KEY=sk-your-master-key-here
    volumes:
      - openwebui_data:/app/backend/data
    restart: always

volumes:
  flowise_data:
  chroma_data:
  openwebui_data:
```
```yaml
# litellm_config.yaml
# Routes all AI requests to your Vast.ai GPU server
model_list:
  - model_name: qwen3-32b
    litellm_params:
      model: ollama/qwen3:32b
      api_base: http://VAST_AI_IP:11434
  - model_name: qwen3-30b-moe
    litellm_params:
      model: ollama/qwen3:30b-a3b
      api_base: http://VAST_AI_IP:11434
  - model_name: qwen3-14b
    litellm_params:
      model: ollama/qwen3:14b
      api_base: http://VAST_AI_IP:11434
  - model_name: qwen3-8b
    litellm_params:
      model: ollama/qwen3:8b
      api_base: http://VAST_AI_IP:11434

general_settings:
  master_key: sk-your-master-key-here
```
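Once LiteLLM is up, anything that speaks the OpenAI API can use the proxy. A minimal sketch using only Python's standard library; `api.yourdomain.com` and the key are the placeholders from the configs above:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for the LiteLLM proxy."""
    payload = {
        "model": model,  # one of the model_name entries in litellm_config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # the master_key from general_settings
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("https://api.yourdomain.com", "sk-your-master-key-here",
                         "qwen3-32b", "Write a 50-word product blurb.")
# Requires the full stack (LiteLLM + tunnel + Ollama) to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because LiteLLM is OpenAI-compatible, the official `openai` client works too by pointing `base_url` at the proxy.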
```nginx
# /etc/nginx/sites-available/ai-services

# Flowise — IGNY8 AI workflows
server {
    listen 443 ssl;
    server_name flowise.yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/...;
    ssl_certificate_key /etc/letsencrypt/live/...;

    location / {
        proxy_pass http://localhost:3000;
        # HTTP/1.1 is required for WebSocket upgrades
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}

# Open WebUI — Client chat
server {
    listen 443 ssl;
    server_name chat.yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/...;
    ssl_certificate_key /etc/letsencrypt/live/...;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}

# LiteLLM — API endpoint
server {
    listen 443 ssl;
    server_name api.yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/...;
    ssl_certificate_key /etc/letsencrypt/live/...;

    location / {
        proxy_pass http://localhost:4000;
        proxy_set_header Host $host;
    }
}
```
Vast.ai GPU server specs:
• GPU: 2x RTX 3090 (48GB total)
• CPU: AMD EPYC 7543 (13.5 of 128 cores allocated)
• RAM: 54GB allocated of 516GB
• Storage: 442GB NVMe
• Location: Bulgaria
• Cost: $0.277/hr (~$200/mo)
Text generation (Ollama): IGNY8 content writing, SEO content, chat responses
• Qwen3 32B (dense) — ~20GB VRAM, best quality
• Qwen3 30B-A3B (MoE) — ~19GB VRAM, fastest
• Qwen3 14B — ~9GB VRAM, for concurrent tasks
• Load models on demand via Ollama
Image generation (ComfyUI): blog images, social media graphics, product visuals for IGNY8
• FLUX.1 [dev] — best quality, ~12GB VRAM
• Stable Diffusion 3.5 — wide ecosystem, ~8GB VRAM
• SDXL-Lightning — fast generation, ~6GB VRAM
• All via ComfyUI API (port 8188)
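ComfyUI jobs are submitted as a JSON workflow graph to its `POST /prompt` endpoint. A minimal sketch, assuming the SSH tunnel from the networking section exposes ComfyUI on `localhost:8188`; the tiny graph here is a stand-in, not a working workflow:

```python
import json
import urllib.request

def build_comfy_prompt(workflow: dict, client_id: str = "igny8") -> urllib.request.Request:
    """Wrap an exported ComfyUI workflow (API format) for POST /prompt."""
    body = {"prompt": workflow, "client_id": client_id}
    return urllib.request.Request(
        "http://localhost:8188/prompt",  # ComfyUI via the SSH tunnel
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Trivial stand-in graph; in practice, export a real FLUX.1 or SD 3.5
# workflow from the ComfyUI editor with "Save (API Format)".
workflow = {"3": {"class_type": "KSampler", "inputs": {"seed": 42}}}
req = build_comfy_prompt(workflow)
# Requires ComfyUI to be reachable:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["prompt_id"])
```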
Video generation (ComfyUI): short videos for social media, product demos
• Wan 2.1 (14B) — best quality, uses full 48GB
• Wan 2.1 (1.3B) — fast, only ~8GB VRAM
• LTX-Video — fastest, ~12GB VRAM, 768x512
• ⚠️ Unload LLM before running video gen (shared VRAM)
48GB total — you can't run everything simultaneously
Mode 1: Content Writing (Default)
Qwen3 32B (~20GB) + FLUX.1 image gen (~12GB) = ~32GB
✓ Fits comfortably, both can run simultaneously
Mode 2: Fast Throughput
Qwen3 30B-A3B MoE (~19GB) + FLUX.1 (~12GB) = ~31GB
✓ Blazing fast text + images together
Mode 3: Video Generation
Unload LLM → Wan 2.1 14B uses full 48GB
⚠️ Schedule video jobs during off-peak, unload text model first
Ollama auto-unloads models after 5min idle. Use OLLAMA_KEEP_ALIVE to control.
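The idle window can also be overridden per request: sending `keep_alive: 0` unloads the model immediately, which is handy right before kicking off a video job. A sketch against the Ollama API, assuming it is reachable on `localhost:11434` via the tunnel:

```python
import json
import urllib.request

def build_unload_request(model: str) -> urllib.request.Request:
    """keep_alive: 0 tells Ollama to unload the model as soon as this
    (prompt-less) request completes — frees VRAM for video generation."""
    payload = {"model": model, "keep_alive": 0}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama via the SSH tunnel
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_unload_request("qwen3:32b")
# urllib.request.urlopen(req)  # requires Ollama to be reachable
```

To change the default idle window instead, set `OLLAMA_KEEP_ALIVE` (e.g. `30m`, or `-1` to pin a model) in the Ollama server's environment.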
```bash
# Run on Hostinger VPS — creates a persistent tunnel to Vast.ai
# This makes the Vast.ai ports available on localhost

# Install autossh for persistent tunnels
apt install autossh

# Create the tunnel (run as a systemd service)
# -L 11434 forwards Ollama, -L 8188 forwards ComfyUI
autossh -M 0 -N \
  -L 11434:localhost:11434 \
  -L 8188:localhost:8188 \
  -o "ServerAliveInterval=30" \
  -o "ServerAliveCountMax=3" \
  -i /root/.ssh/vastai_key \
  root@VAST_AI_IP -p VAST_SSH_PORT

# Now on Hostinger:
#   localhost:11434 → Vast.ai Ollama
#   localhost:8188  → Vast.ai ComfyUI
```
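The "run as a systemd service" note can be implemented with a unit along these lines (a sketch; the unit name, key path, and host placeholders are assumptions to fill in):

```ini
# /etc/systemd/system/vastai-tunnel.service
[Unit]
Description=Persistent SSH tunnel to Vast.ai GPU server
After=network-online.target
Wants=network-online.target

[Service]
# -M 0 disables autossh's monitor port; the ServerAlive* options
# handle liveness, and ExitOnForwardFailure makes failed forwards
# trigger a restart instead of a half-working tunnel.
ExecStart=/usr/bin/autossh -M 0 -N \
  -L 11434:localhost:11434 \
  -L 8188:localhost:8188 \
  -o "ServerAliveInterval=30" -o "ServerAliveCountMax=3" \
  -o "ExitOnForwardFailure=yes" \
  -i /root/.ssh/vastai_key root@VAST_AI_IP -p VAST_SSH_PORT
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now vastai-tunnel`.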
| Model | VRAM | Speed | Best For | License |
|---|---|---|---|---|
| Qwen3 32B ⭐ | ~20GB | ~15-25 tok/s | Long-form SEO content, articles | Apache 2.0 |
| Qwen3 30B-A3B ⭐ | ~19GB | ~40-60 tok/s | Fast chat, SEO meta, bulk content | Apache 2.0 |
| Qwen3 14B | ~9GB | ~30-50 tok/s | Concurrent light tasks | Apache 2.0 |
| Qwen3 8B | ~5GB | ~50-80 tok/s | Classification, tagging, simple tasks | Apache 2.0 |
| Model | VRAM | Speed | Quality | License |
|---|---|---|---|---|
| FLUX.1 [dev] ⭐ | ~12GB | ~8-15s/img | Excellent — rivals Midjourney | Non-commercial |
| Stable Diffusion 3.5 ⭐ | ~8GB | ~5-10s/img | Very good + huge ecosystem | Stability Community |
| SDXL-Lightning | ~6GB | ~1-2s/img | Good for fast iterations | OpenRAIL++ |
| Z-Image-Turbo | ~16GB | <1s/img | Excellent + text rendering | |
| Model | VRAM | Resolution | Duration | License |
|---|---|---|---|---|
| Wan 2.1 (1.3B) ⭐ | ~8GB | 480p | 5-8s clips | Apache 2.0 |
| LTX-Video ⭐ | ~12GB | 768x512 | 5-7s clips | |
| Wan 2.1 (14B) | ~40GB+ | 720p | 5-10s clips | Apache 2.0 |
| Wan2GP (optimized) | ~12GB | 720p | 8-15s clips | Apache 2.0 |
Cost comparison: unlimited AI text + image + video generation
If Using APIs Instead:
• OpenAI GPT-4o: ~$0.005/1K tokens → 1M tokens/day = $150/mo just for text
• Midjourney: $30/mo (limited) or $0.01-0.04/image
• Runway Gen-3: $0.05/sec video → 100 videos/mo = $250/mo
• DALL-E 3: $0.04/image → 500 images/mo = $20/mo
Conservative estimate: $400-600+/month with scale
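The quoted per-service figures can be cross-checked (a sketch using the workload assumptions in the bullets above):

```python
# Cross-check the "conservative estimate" from the per-service figures above.
gpt4o  = 1_000_000 / 1000 * 0.005 * 30  # 1M tokens/day at $0.005/1K, 30 days
runway = 250.0                           # 100 videos/mo as quoted
dalle  = 500 * 0.04                      # 500 images/mo at $0.04/image
midj   = 30.0                            # Midjourney subscription floor

total = gpt4o + runway + dalle + midj
print(f"${total:.0f}/mo")
```

This sums to $450/mo, squarely inside the $400-600+ range.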
Your Self-Hosted Setup:
• Unlimited text generation — $0 marginal cost
• Unlimited image generation — $0 marginal cost
• Unlimited video generation — $0 marginal cost
• Fixed cost regardless of usage volume
Fixed: ~$225/month — flat regardless of volume
Break-even point: Once IGNY8 processes ~50+ articles/month with images, self-hosting is cheaper than APIs.
At scale: If IGNY8 generates 500+ articles + 1000+ images + 100+ videos per month, you're saving $300-500+/month vs APIs.
Added bonus: No rate limits, no API key management, no vendor lock-in, full data privacy.
Revenue potential: Sell the same infrastructure as a service to Alorig clients — chat.yourdomain.com as a white-label AI assistant.
Architecture Summary
Hostinger VPS handles web hosting (Plesk), AI orchestration (Flowise, LiteLLM, Open WebUI), and vector storage (Chroma). All AI inference routes through an SSH tunnel to Vast.ai's 2x RTX 3090 GPU server running Ollama (text) and ComfyUI (images and video). Zero API costs — a fixed monthly spend of ~$225 for unlimited generation.