# DevOps Operations Guide

**Purpose:** Complete operational procedures for managing IGNY8 in production  
**Version:** 1.0  
**Last Updated:** January 20, 2026

---

## 📋 Executive Summary

This document provides a complete structure for:

1. **Automated Backups** - Regular database + config backups
2. **Environment Management** - Dev vs Staging vs Production
3. **Health Monitoring** - Automated health checks & alerts
4. **Disaster Recovery** - Quick recovery procedures
5. **Change Management** - Safe deployment workflow

---

## 🗂️ Directory Structure (To Be Implemented)

```
/data/
├── app/
│   └── igny8/                              # Application code
│       ├── docker-compose.app.yml          # Production compose ✅
│       ├── docker-compose.staging.yml      # Staging compose ⚠️ TO CREATE
│       ├── .env                            # Production env
│       ├── .env.staging                    # Staging env ⚠️ TO CREATE
│       └── scripts/
│           └── ops/                        # ⚠️ TO CREATE
│               ├── backup-db.sh            # Database backup
│               ├── backup-full.sh          # Full backup (db + code + config)
│               ├── restore-db.sh           # Database restore
│               ├── deploy-staging.sh       # Deploy to staging
│               ├── deploy-production.sh    # Deploy to production
│               ├── rollback.sh             # Rollback deployment
│               ├── health-check.sh         # System health check
│               ├── sync-prod-to-staging.sh # Sync data
│               └── log-rotate.sh           # Log rotation
│
├── backups/                                # Backup storage
│   ├── daily/                              # Daily automated backups
│   │   └── YYYYMMDD/
│   │       ├── db_igny8_YYYYMMDD_HHMMSS.sql.gz
│   │       └── config_YYYYMMDD.tar.gz
│   ├── weekly/                             # Weekly backups (kept 4 weeks)
│   ├── monthly/                            # Monthly backups (kept 12 months)
│   └── pre-deploy/                         # Pre-deployment snapshots
│       └── YYYYMMDD_HHMMSS/
│
├── logs/                                   # Centralized logs
│   ├── production/
│   │   ├── backend.log
│   │   ├── celery-worker.log
│   │   ├── celery-beat.log
│   │   └── access.log
│   ├── staging/
│   └── caddy/
│
└── stack/                                  # Infrastructure stack
    └── igny8-stack/                        # (Future - not yet separated)
```

---

## 🔄 Automated Backup System

### Backup Strategy

| Type | Frequency | Retention | Content |
|------|-----------|-----------|---------|
| **Daily** | 1:00 AM | 7 days | Database + configs |
| **Weekly** | Sunday 2:00 AM | 4 weeks | Full backup |
| **Monthly** | 1st of month | 12 months | Full backup |
| **Pre-Deploy** | Before each deploy | 5 most recent | Database snapshot |

### Cron Schedule

```bash
# /etc/cron.d/igny8-backup

# Daily database backup at 1:00 AM
0 1 * * * root /data/app/igny8/scripts/ops/backup-db.sh daily >> /data/logs/backup.log 2>&1

# Weekly full backup on Sunday at 2:00 AM
0 2 * * 0 root /data/app/igny8/scripts/ops/backup-full.sh weekly >> /data/logs/backup.log 2>&1

# Monthly full backup on 1st at 3:00 AM
0 3 1 * * root /data/app/igny8/scripts/ops/backup-full.sh monthly >> /data/logs/backup.log 2>&1

# Health check every 5 minutes
*/5 * * * * root /data/app/igny8/scripts/ops/health-check.sh >> /data/logs/health.log 2>&1

# Log rotation daily at midnight
0 0 * * * root /data/app/igny8/scripts/ops/log-rotate.sh >> /data/logs/maintenance.log 2>&1
```

---

## 🌍 Environment Management

### Environment Comparison

| Aspect | Development | Staging | Production |
|--------|-------------|---------|------------|
| **Domain** | localhost:5173 | staging.igny8.com | app.igny8.com |
| **API** | localhost:8010 | staging-api.igny8.com | api.igny8.com |
| **Database** | igny8_dev_db | igny8_staging_db | igny8_db |
| **Redis DB** | 2 | 1 | 0 |
| **Debug** | True | False | False |
| **AI Keys** | Test/Limited | Test/Limited | Production |
| **Payments** | Sandbox | Sandbox | Live |
| **Compose File** | docker-compose.dev.yml | docker-compose.staging.yml | docker-compose.app.yml |
| **Project Name** | igny8-dev | igny8-staging | igny8-app |

### Port Allocation

| Service | Dev | Staging | Production |
|---------|-----|---------|------------|
| Backend | 8010 | 8012 | 8011 |
| Frontend | 5173 | 8024 | 8021 |
| Marketing | 5174 | 8026 | 8023 |
| Flower | - | 5556 | 5555 |

---

## 🚀 Deployment Workflow

### Safe Deployment Checklist

```
┌────────────────────────────────────────────────────────────┐
│                    DEPLOYMENT CHECKLIST                    │
├────────────────────────────────────────────────────────────┤
│  PRE-DEPLOYMENT                                            │
│  ░ All tests passing on staging?                           │
│  ░ Database migrations reviewed?                           │
│  ░ Backup created?                                         │
│  ░ Rollback plan ready?                                    │
│  ░ Team notified?                                          │
├────────────────────────────────────────────────────────────┤
│  DEPLOYMENT                                                │
│  ░ Create pre-deploy backup                                │
│  ░ Tag current images for rollback                         │
│  ░ Pull latest code                                        │
│  ░ Build new images                                        │
│  ░ Apply migrations                                        │
│  ░ Restart containers                                      │
│  ░ Verify health check                                     │
├────────────────────────────────────────────────────────────┤
│  POST-DEPLOYMENT                                           │
│  ░ Monitor logs for 10 minutes                             │
│  ░ Test critical paths (login, API, AI functions)          │
│  ░ Check error rates                                       │
│  ░ If issues → ROLLBACK                                    │
│  ░ Update changelog                                        │
└────────────────────────────────────────────────────────────┘
```

### Git Branch Strategy

```
                    ┌──────────┐
                    │   main   │ ← Production deployments
                    └────▲─────┘
                         │ merge (after staging approval)
                    ┌────┴─────┐
                    │ staging  │ ← Staging deployments
                    └────▲─────┘
                         │ merge
        ┌────────────────┼────────────────┐
        │                │                │
┌───────┴───────┐ ┌──────┴──────┐ ┌───────┴───────┐
│feature/xyz    │ │feature/abc  │ │hotfix/urgent  │
└───────────────┘ └─────────────┘ └───────────────┘
```

---

## 🏥 Health Monitoring

### Health Check Endpoints

| Endpoint | Purpose | Expected Response |
|----------|---------|-------------------|
| `/api/v1/system/status/` | Overall system status | `{"status": "healthy"}` |
| `/api/v1/system/health/` | Detailed component health | JSON with all components |

### Monitoring Targets

1. **Backend API** - Response time < 500ms
2. **Database** - Connection pool healthy
3. **Redis** - Connection alive
4. **Celery Workers** - Queue length < 100
5. **Celery Beat** - Scheduler running
6. **Disk Space** - > 20% free
7. **Memory** - < 80% used

### Alert Thresholds

| Metric | Warning | Critical |
|--------|---------|----------|
| API Response Time | > 1s | > 5s |
| Error Rate | > 1% | > 5% |
| CPU Usage | > 70% | > 90% |
| Memory Usage | > 70% | > 90% |
| Disk Usage | > 70% | > 90% |
| Celery Queue | > 50 | > 200 |

---

## 🔧 Common Operations

### Daily Operations

```bash
# Check system health
/data/app/igny8/scripts/ops/health-check.sh

# View logs
tail -f /data/logs/production/backend.log

# Check container status
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app ps
```

### Weekly Operations

```bash
# Review backup status
ls -la /data/backups/daily/
du -sh /data/backups/*

# Check disk space
df -h

# Review error logs
grep -i error /data/logs/production/backend.log | tail -50
```

### Emergency Procedures

```bash
# Immediate rollback
/data/app/igny8/scripts/ops/rollback.sh

# Emergency restart
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app restart

# Emergency database restore
/data/app/igny8/scripts/ops/restore-db.sh /data/backups/latest.sql.gz
```

---

## 📊 What's Missing (Action Items)

### Priority 1 - Critical (Before Go-Live)

| Item | Status | Action |
|------|--------|--------|
| `docker-compose.staging.yml` | ❌ Missing | Create from documentation |
| `.env.staging` | ❌ Missing | Create from example |
| Deployment scripts | ❌ Missing | Create all ops scripts |
| Automated backup cron | ❌ Missing | Set up cron jobs |
| Pre-deploy backup | ❌ Missing | Add to deploy script |

### Priority 2 - Important (First Week)

| Item | Status | Action |
|------|--------|--------|
| Health check automation | ❌ Missing | Create monitoring |
| Log rotation | ❌ Missing | Set up logrotate |
| Staging DNS | ❌ Unknown | Configure if needed |
| Caddyfile staging routes | ❌ Unknown | Add staging domains |

### Priority 3 - Nice to Have (First Month)

| Item | Status | Action |
|------|--------|--------|
| CI/CD pipeline | ❌ Not set | Optional automation |
| External monitoring | ❌ Not set | UptimeRobot/Datadog |
| Alerting system | ❌ Not set | Email/Slack alerts |

---

## Next Steps

1. **Create ops scripts directory**: `/data/app/igny8/scripts/ops/`
2. **Create all deployment scripts** (see STAGING-SETUP-GUIDE.md)
3. **Create staging compose file** (copy from documentation)
4. **Set up automated backups**
5. **Test complete deployment cycle** on staging
6. **Go live with confidence**

---

## Related Documentation

- [STAGING-SETUP-GUIDE.md](final-clean-best-deployment-plan/STAGING-SETUP-GUIDE.md) - Detailed staging setup
- [TWO-REPO-ARCHITECTURE.md](final-clean-best-deployment-plan/TWO-REPO-ARCHITECTURE.md) - Architecture overview
- [INFRASTRUCTURE-STACK.md](final-clean-best-deployment-plan/INFRASTRUCTURE-STACK.md) - Stack details
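
---

## Appendix: Alert Threshold Sketch

The Alert Thresholds table in the Health Monitoring section could be applied inside `health-check.sh` roughly as follows. This is a minimal sketch, not the actual script (which is still marked as to be created). The function names `classify` and `disk_usage_percent` are hypothetical, and the sketch assumes integer metric values and a strict "greater than" comparison, matching the `> N` wording in the table.

```bash
#!/bin/sh
# Hypothetical helpers for health-check.sh (names are illustrative only).

# classify VALUE WARN CRIT
# Prints CRITICAL if VALUE > CRIT, WARNING if VALUE > WARN, otherwise OK.
# Integer values only; fractional metrics would need scaling first.
classify() {
    if [ "$1" -gt "$3" ]; then
        echo "CRITICAL"
    elif [ "$1" -gt "$2" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# disk_usage_percent PATH
# Prints the used percentage (without the % sign) of the filesystem holding PATH.
disk_usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Celery Queue thresholds from the table: warning > 50, critical > 200.
echo "queue: $(classify 120 50 200)"   # prints "queue: WARNING"

# Disk Usage thresholds from the table: warning > 70, critical > 90.
# Against a live system this might look like:
#   classify "$(disk_usage_percent /data)" 70 90
echo "disk:  $(classify 95 70 90)"     # prints "disk:  CRITICAL"
```

A value exactly at a threshold (for example 70% disk usage) is reported as `OK` under this reading of the table. If the team prefers to alert at the boundary, `-gt` could be swapped for `-ge`.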