igny8/docs/50-DEPLOYMENT/DEVOPS-OPERATIONS-GUIDE.md
IGNY8 VPS (Salman) 4a200822bb Dev ops prep
2026-01-21 17:53:42 +00:00

DevOps Operations Guide

Purpose: Complete operational procedures for managing IGNY8 in production
Version: 1.0
Last Updated: January 20, 2026


📋 Executive Summary

This document provides a complete structure for:

  1. Automated Backups - Regular database + config backups
  2. Environment Management - Dev vs Staging vs Production
  3. Health Monitoring - Automated health checks & alerts
  4. Disaster Recovery - Quick recovery procedures
  5. Change Management - Safe deployment workflow

🗂️ Directory Structure (To Be Implemented)

/data/
├── app/
│   └── igny8/                          # Application code
│       ├── docker-compose.app.yml      # Production compose ✅
│       ├── docker-compose.staging.yml  # Staging compose ⚠️ TO CREATE
│       ├── .env                        # Production env
│       ├── .env.staging                # Staging env ⚠️ TO CREATE
│       └── scripts/
│           └── ops/                    # ⚠️ TO CREATE
│               ├── backup-db.sh        # Database backup
│               ├── backup-full.sh      # Full backup (db + code + config)
│               ├── restore-db.sh       # Database restore
│               ├── deploy-staging.sh   # Deploy to staging
│               ├── deploy-production.sh # Deploy to production
│               ├── rollback.sh         # Rollback deployment
│               ├── health-check.sh     # System health check
│               ├── sync-prod-to-staging.sh  # Sync data
│               └── log-rotate.sh       # Log rotation
│
├── backups/                            # Backup storage
│   ├── daily/                          # Daily automated backups
│   │   └── YYYYMMDD/
│   │       ├── db_igny8_YYYYMMDD_HHMMSS.sql.gz
│   │       └── config_YYYYMMDD.tar.gz
│   ├── weekly/                         # Weekly backups (kept 4 weeks)
│   ├── monthly/                        # Monthly backups (kept 12 months)
│   └── pre-deploy/                     # Pre-deployment snapshots
│       └── YYYYMMDD_HHMMSS/
│
├── logs/                               # Centralized logs
│   ├── production/
│   │   ├── backend.log
│   │   ├── celery-worker.log
│   │   ├── celery-beat.log
│   │   └── access.log
│   ├── staging/
│   └── caddy/
│
└── stack/                              # Infrastructure stack
    └── igny8-stack/                    # (Future - not yet separated)
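
The layout above can be created in one pass. The following is a minimal sketch, not the project's actual setup script: `scaffold_layout` is a hypothetical helper, the `chmod 700` on the backup directory is an added assumption (backups hold database dumps and configuration), and the future `stack/` tree is left out.

```shell
#!/usr/bin/env bash
# Sketch: create the directory layout described above.
# scaffold_layout is a hypothetical helper name, not an existing IGNY8 script.

# scaffold_layout [ROOT] -> creates the ops, backup and log directories under ROOT.
scaffold_layout() {
  local root=${1:-/data}
  # mkdir -p is idempotent, so re-running on an existing tree is safe.
  mkdir -p \
    "$root/app/igny8/scripts/ops" \
    "$root/backups/daily" \
    "$root/backups/weekly" \
    "$root/backups/monthly" \
    "$root/backups/pre-deploy" \
    "$root/logs/production" \
    "$root/logs/staging" \
    "$root/logs/caddy" || return 1
  # ASSUMPTION: restrict backups to the owning user; adjust to local policy.
  chmod 700 "$root/backups"
}
```

Calling `scaffold_layout` with no argument targets `/data`; passing a scratch directory first is a cheap way to preview the result.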

🔄 Automated Backup System

Backup Strategy

| Type | Frequency | Retention | Content |
|------|-----------|-----------|---------|
| Daily | 1:00 AM | 7 days | Database + configs |
| Weekly | Sunday 2:00 AM | 4 weeks | Full backup |
| Monthly | 1st of month, 3:00 AM | 12 months | Full backup |
| Pre-Deploy | Before each deploy | 5 most recent | Database snapshot |
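
Retention can be enforced by count. The sketch below assumes each tier stores one timestamp-named subdirectory per run (as `daily/YYYYMMDD/` and `pre-deploy/YYYYMMDD_HHMMSS/` do), so "7 days", "4 weeks", "12 months" and "5 most recent" become keep-counts of 7, 4, 12 and 5. `prune_keep_newest` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count-based retention for timestamp-named backup folders.
# ASSUMPTION: folder names sort chronologically (YYYYMMDD or YYYYMMDD_HHMMSS).

# prune_keep_newest DIR KEEP -> deletes all but the KEEP newest subdirectories of DIR.
prune_keep_newest() {
  local dir=$1 keep=$2 old
  # Sort names newest-first, skip the first KEEP entries, remove the remainder.
  find "$dir" -mindepth 1 -maxdepth 1 -type d | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
      rm -rf -- "$old"
    done
}
```

For example, `prune_keep_newest /data/backups/pre-deploy 5` would match the pre-deploy row of the table, under the naming assumption stated above.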

Cron Schedule

# /etc/cron.d/igny8-backup

# Daily database backup at 1:00 AM
0 1 * * * root /data/app/igny8/scripts/ops/backup-db.sh daily >> /data/logs/backup.log 2>&1

# Weekly full backup on Sunday at 2:00 AM
0 2 * * 0 root /data/app/igny8/scripts/ops/backup-full.sh weekly >> /data/logs/backup.log 2>&1

# Monthly full backup on 1st at 3:00 AM
0 3 1 * * root /data/app/igny8/scripts/ops/backup-full.sh monthly >> /data/logs/backup.log 2>&1

# Health check every 5 minutes
*/5 * * * * root /data/app/igny8/scripts/ops/health-check.sh >> /data/logs/health.log 2>&1

# Log rotation daily at midnight
0 0 * * * root /data/app/igny8/scripts/ops/log-rotate.sh >> /data/logs/maintenance.log 2>&1
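
The cron entries call `backup-db.sh` with a backup type. Since that script does not exist yet, here is a hedged sketch of its core. The path follows the `db_igny8_YYYYMMDD_HHMMSS.sql.gz` naming from the directory structure; applying the dated subfolder to every backup type is an assumption. `DUMP_CMD` is a placeholder for whatever command writes a SQL dump to stdout, because the actual database container and credentials are not specified in this guide.

```shell
#!/usr/bin/env bash
# Sketch of backup-db.sh internals. Function names and DUMP_CMD are hypothetical.

BACKUP_ROOT="${BACKUP_ROOT:-/data/backups}"

# backup_path TYPE TIMESTAMP -> full path of the dump file.
# TIMESTAMP is YYYYMMDD_HHMMSS; its date part becomes the folder name.
backup_path() {
  local type=$1 ts=$2
  printf '%s/%s/%s/db_igny8_%s.sql.gz\n' "$BACKUP_ROOT" "$type" "${ts%%_*}" "$ts"
}

# run_backup [TYPE] -> runs DUMP_CMD, compresses its output, returns 1 on failure.
run_backup() {
  local type=${1:-daily} target
  if [ -z "${DUMP_CMD:-}" ]; then
    echo "DUMP_CMD is not set" >&2
    return 1
  fi
  target=$(backup_path "$type" "$(date +%Y%m%d_%H%M%S)")
  mkdir -p "$(dirname "$target")" || return 1
  # Write to a .partial file first so a failed dump never looks like a valid backup.
  # pipefail (scoped to the subshell) makes a failing dump fail the whole pipeline.
  if ( set -o pipefail; $DUMP_CMD | gzip > "$target.partial" ); then
    mv "$target.partial" "$target"
    echo "backup written: $target"
  else
    rm -f "$target.partial"
    echo "backup FAILED: $target" >&2
    return 1
  fi
}
```

A real script would end with `run_backup "$1"`, so that a non-zero exit status lands in `/data/logs/backup.log` through the cron redirection.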

🌍 Environment Management

Environment Comparison

| Aspect | Development | Staging | Production |
|--------|-------------|---------|------------|
| Domain | localhost:5173 | staging.igny8.com | app.igny8.com |
| API | localhost:8010 | staging-api.igny8.com | api.igny8.com |
| Database | igny8_dev_db | igny8_staging_db | igny8_db |
| Redis DB | 2 | 1 | 0 |
| Debug | True | False | False |
| AI Keys | Test/Limited | Test/Limited | Production |
| Payments | Sandbox | Sandbox | Live |
| Compose File | docker-compose.dev.yml | docker-compose.staging.yml | docker-compose.app.yml |
| Project Name | igny8-dev | igny8-staging | igny8-app |
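
Because each environment pairs a compose file with a project name, a small wrapper can keep the two from being mixed up. This is a sketch only: the function names are hypothetical, and it assumes all three compose files live in `/data/app/igny8` (the guide confirms this only for production and staging).

```shell
#!/usr/bin/env bash
# Sketch: resolve compose file and project name from an environment name.
# All function names are hypothetical; values come from the comparison table.

APP_DIR="${APP_DIR:-/data/app/igny8}"

compose_file_for() {
  case "$1" in
    dev)        echo "docker-compose.dev.yml" ;;
    staging)    echo "docker-compose.staging.yml" ;;
    production) echo "docker-compose.app.yml" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

project_name_for() {
  case "$1" in
    dev)        echo "igny8-dev" ;;
    staging)    echo "igny8-staging" ;;
    production) echo "igny8-app" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# compose ENV ARGS... -> runs docker compose against the matching file and project.
compose() {
  local env=$1 file project
  shift
  file=$(compose_file_for "$env") || return 1
  project=$(project_name_for "$env") || return 1
  docker compose -f "$APP_DIR/$file" -p "$project" "$@"
}
```

With this in place, `compose staging ps` expands to the staging file and the `igny8-staging` project, and an unknown environment name fails before Docker is invoked.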

Port Allocation

| Service | Dev | Staging | Production |
|---------|-----|---------|------------|
| Backend | 8010 | 8012 | 8011 |
| Frontend | 5173 | 8024 | 8021 |
| Marketing | 5174 | 8026 | 8023 |
| Flower | - | 5556 | 5555 |

🚀 Deployment Workflow

Safe Deployment Checklist

┌─────────────────────────────────────────────────────────────┐
│                    DEPLOYMENT CHECKLIST                      │
├─────────────────────────────────────────────────────────────┤
│ PRE-DEPLOYMENT                                               │
│ □ All tests passing on staging?                              │
│ □ Database migrations reviewed?                              │
│ □ Backup created?                                            │
│ □ Rollback plan ready?                                       │
│ □ Team notified?                                             │
├─────────────────────────────────────────────────────────────┤
│ DEPLOYMENT                                                   │
│ □ Create pre-deploy backup                                   │
│ □ Tag current images for rollback                            │
│ □ Pull latest code                                           │
│ □ Build new images                                           │
│ □ Apply migrations                                           │
│ □ Restart containers                                         │
│ □ Verify health check                                        │
├─────────────────────────────────────────────────────────────┤
│ POST-DEPLOYMENT                                              │
│ □ Monitor logs for 10 minutes                                │
│ □ Test critical paths (login, API, AI functions)             │
│ □ Check error rates                                          │
│ □ If issues → ROLLBACK                                       │
│ □ Update changelog                                           │
└─────────────────────────────────────────────────────────────┘
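
The DEPLOYMENT and POST-DEPLOYMENT parts of the checklist amount to "run the steps in order, stop at the first failure, then roll back". A minimal sketch of that control flow follows. `run_steps`, `deploy` and `ROLLBACK_CMD` are hypothetical names; the step functions themselves (backup, tag, pull, build, migrate, restart, verify) are left to the real deploy scripts, since their exact commands are not defined in this guide.

```shell
#!/usr/bin/env bash
# Sketch: ordered deployment steps with rollback on the first failure.
# ASSUMPTION: each step is a shell function or command that returns non-zero on failure.

# run_steps STEP... -> runs each step in order, stops at the first failure.
run_steps() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "step failed: $step" >&2
      return 1
    fi
  done
}

# deploy STEP... -> runs the steps; on failure invokes the rollback command.
deploy() {
  if ! run_steps "$@"; then
    echo "deployment failed, rolling back" >&2
    "${ROLLBACK_CMD:-/data/app/igny8/scripts/ops/rollback.sh}"
    return 1
  fi
  echo "deployment finished"
}
```

A deploy script could then read `deploy create_backup tag_images pull_code build_images apply_migrations restart_containers verify_health`, with each of those names being a function the script defines. Note that rolling back containers does not undo an applied migration; that case still needs the pre-deploy database snapshot.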

Git Branch Strategy

                    ┌──────────┐
                    │   main   │  ← Production deployments
                    └────▲─────┘
                         │ merge (after staging approval)
                    ┌────┴─────┐
                    │ staging  │  ← Staging deployments
                    └────▲─────┘
                         │ merge
        ┌────────────────┼────────────────┐
        │                │                │
┌───────┴───────┐ ┌──────┴──────┐ ┌───────┴───────┐
│feature/xyz    │ │feature/abc  │ │hotfix/urgent  │
└───────────────┘ └─────────────┘ └───────────────┘
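
The diagram can be encoded as two lookups, which is handy if a deploy script should refuse to run from the wrong branch. This is an illustrative sketch; both function names are hypothetical and the rules are read directly off the diagram.

```shell
#!/usr/bin/env bash
# Sketch: branch rules from the diagram above. Function names are hypothetical.

# deploy_target_for_branch BRANCH -> environment deployed from that branch, or "none".
deploy_target_for_branch() {
  case "$1" in
    main)    echo "production" ;;
    staging) echo "staging" ;;
    *)       echo "none" ;;
  esac
}

# merge_target_for_branch BRANCH -> branch it is merged into, or "none".
merge_target_for_branch() {
  case "$1" in
    feature/*|hotfix/*) echo "staging" ;;
    staging)            echo "main" ;;
    *)                  echo "none" ;;
  esac
}
```

A production deploy script might compare `deploy_target_for_branch "$(git rev-parse --abbrev-ref HEAD)"` with `production` and abort on a mismatch.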

🏥 Health Monitoring

Health Check Endpoints

| Endpoint | Purpose | Expected Response |
|----------|---------|-------------------|
| `/api/v1/system/status/` | Overall system status | `{"status": "healthy"}` |
| `/api/v1/system/health/` | Detailed component health | JSON with all components |
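
A health check script could poll the status endpoint roughly as follows. This is a sketch under assumptions: it presumes the backend is reachable on `localhost` at the port from the Port Allocation table, and the function names are hypothetical. The body check uses a regular expression rather than a JSON parser to avoid extra dependencies.

```shell
#!/usr/bin/env bash
# Sketch: poll /api/v1/system/status/ and check for {"status": "healthy"}.
# ASSUMPTION: the backend listens on localhost at the port listed for each environment.

backend_port_for() {
  case "$1" in
    dev)        echo 8010 ;;
    staging)    echo 8012 ;;
    production) echo 8011 ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# is_healthy_body BODY -> succeeds if BODY contains "status": "healthy" (any spacing).
is_healthy_body() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# check_status [ENV] -> 0 healthy, 1 unreachable or unhealthy, 2 unknown environment.
check_status() {
  local env=${1:-production} port body
  port=$(backend_port_for "$env") || return 2
  # -f fails on HTTP errors, -sS stays quiet but still prints errors, 5 s timeout.
  body=$(curl -fsS --max-time 5 "http://localhost:${port}/api/v1/system/status/") || return 1
  is_healthy_body "$body"
}
```

The exit code of `check_status production` can drive the cron-based health log or a later alerting hook.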

Monitoring Targets

  1. Backend API - Response time < 500ms
  2. Database - Connection pool healthy
  3. Redis - Connection alive
  4. Celery Workers - Queue length < 100
  5. Celery Beat - Scheduler running
  6. Disk Space - > 20% free
  7. Memory - < 80% used
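
The disk and memory targets can be measured with standard tools. The sketch below assumes a Linux host (it reads `/proc/meminfo`) and uses hypothetical function names; "more than 20% free" corresponds to a used percentage below 80.

```shell
#!/usr/bin/env bash
# Sketch: measure disk and memory usage as whole percentages.
# ASSUMPTION: Linux host with /proc/meminfo; function names are hypothetical.

# disk_used_pct [PATH] -> used percentage of the filesystem holding PATH.
disk_used_pct() {
  # df -P guarantees one line per filesystem; column 5 is the capacity, e.g. "42%".
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# mem_used_pct [MEMINFO_FILE] -> used memory percentage based on MemAvailable.
mem_used_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total > 0) printf "%d\n", (total - avail) * 100 / total
      else exit 1
    }
  ' "${1:-/proc/meminfo}"
}
```

A check such as `[ "$(disk_used_pct /data)" -lt 80 ]` then expresses the disk target directly.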

Alert Thresholds

| Metric | Warning | Critical |
|--------|---------|----------|
| API Response Time | > 1s | > 5s |
| Error Rate | > 1% | > 5% |
| CPU Usage | > 70% | > 90% |
| Memory Usage | > 70% | > 90% |
| Disk Usage | > 70% | > 90% |
| Celery Queue | > 50 | > 200 |
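
Every row of the threshold table follows the same rule: above the critical value is critical, above the warning value is a warning, anything else is fine. A minimal sketch of that rule, using a hypothetical `classify` helper and `awk` so that fractional values such as response times compare correctly:

```shell
#!/usr/bin/env bash
# Sketch: map a measured value to ok / warning / critical.
# classify is a hypothetical helper; thresholds are exclusive, as in the table ("> 70%").

# classify VALUE WARN CRIT -> prints ok, warning or critical.
classify() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v > c)      print "critical"
    else if (v > w) print "warning"
    else            print "ok"
  }'
}
```

For instance, `classify 75 70 90` yields `warning` for a disk at 75%, and `classify 0.8 1 5` yields `ok` for an API response time of 0.8 s.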

🔧 Common Operations

Daily Operations

# Check system health
/data/app/igny8/scripts/ops/health-check.sh

# View logs
tail -f /data/logs/production/backend.log

# Check container status
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app ps

Weekly Operations

# Review backup status
ls -la /data/backups/daily/
du -sh /data/backups/*

# Check disk space
df -h

# Review error logs
grep -i error /data/logs/production/backend.log | tail -50

Emergency Procedures

# Immediate rollback
/data/app/igny8/scripts/ops/rollback.sh

# Emergency restart
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app restart

# Emergency database restore
/data/app/igny8/scripts/ops/restore-db.sh /data/backups/latest.sql.gz
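
The restore command needs a concrete dump file. As a hedged sketch, the newest dump can be located by file name, assuming every dump (including pre-deploy snapshots) follows the `db_igny8_YYYYMMDD_HHMMSS.sql.gz` pattern from the directory structure. `latest_backup` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: find the most recent database dump under the backup root.
# ASSUMPTION: all dumps are named db_igny8_YYYYMMDD_HHMMSS.sql.gz.

# latest_backup [ROOT] -> prints the path of the newest dump, returns 1 if none exists.
latest_backup() {
  local root=${1:-/data/backups} newest
  # Prefix each path with its file name, sort on that prefix (the embedded
  # timestamp sorts chronologically), keep the last entry, drop the prefix.
  newest=$(find "$root" -type f -name 'db_igny8_*.sql.gz' 2>/dev/null |
    awk -F/ '{ print $NF "\t" $0 }' |
    sort |
    tail -n 1 |
    cut -f2-)
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```

An emergency restore could then be invoked as `/data/app/igny8/scripts/ops/restore-db.sh "$(latest_backup)"`, after confirming that the printed file is the one intended.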

📊 What's Missing (Action Items)

Priority 1 - Critical (Before Go-Live)

| Item | Status | Action |
|------|--------|--------|
| docker-compose.staging.yml | Missing | Create from documentation |
| .env.staging | Missing | Create from example |
| Deployment scripts | Missing | Create all ops scripts |
| Automated backup cron | Missing | Set up cron jobs |
| Pre-deploy backup | Missing | Add to deploy script |

Priority 2 - Important (First Week)

| Item | Status | Action |
|------|--------|--------|
| Health check automation | Missing | Create monitoring |
| Log rotation | Missing | Set up logrotate |
| Staging DNS | Unknown | Configure if needed |
| Caddyfile staging routes | Unknown | Add staging domains |

Priority 3 - Nice to Have (First Month)

| Item | Status | Action |
|------|--------|--------|
| CI/CD pipeline | Not set | Optional automation |
| External monitoring | Not set | UptimeRobot/Datadog |
| Alerting system | Not set | Email/Slack alerts |

Next Steps

  1. Create ops scripts directory: /data/app/igny8/scripts/ops/
  2. Create all deployment scripts (see STAGING-SETUP-GUIDE.md)
  3. Create staging compose file (copy from documentation)
  4. Set up automated backups
  5. Test complete deployment cycle on staging
  6. Go live with confidence