DevOps Operations Guide
Purpose: Complete operational procedures for managing IGNY8 in production
Version: 1.0
Last Updated: January 20, 2026
📋 Executive Summary
This document provides a complete structure for:
- Automated Backups - Regular database + config backups
- Environment Management - Dev vs Staging vs Production
- Health Monitoring - Automated health checks & alerts
- Disaster Recovery - Quick recovery procedures
- Change Management - Safe deployment workflow
🗂️ Directory Structure (To Be Implemented)
/data/
├── app/
│   └── igny8/                              # Application code
│       ├── docker-compose.app.yml          # Production compose ✅
│       ├── docker-compose.staging.yml      # Staging compose ⚠️ TO CREATE
│       ├── .env                            # Production env
│       ├── .env.staging                    # Staging env ⚠️ TO CREATE
│       └── scripts/
│           └── ops/                        # ⚠️ TO CREATE
│               ├── backup-db.sh            # Database backup
│               ├── backup-full.sh          # Full backup (db + code + config)
│               ├── restore-db.sh           # Database restore
│               ├── deploy-staging.sh       # Deploy to staging
│               ├── deploy-production.sh    # Deploy to production
│               ├── rollback.sh             # Rollback deployment
│               ├── health-check.sh         # System health check
│               ├── sync-prod-to-staging.sh # Sync data
│               └── log-rotate.sh           # Log rotation
│
├── backups/                                # Backup storage
│   ├── daily/                              # Daily automated backups
│   │   └── YYYYMMDD/
│   │       ├── db_igny8_YYYYMMDD_HHMMSS.sql.gz
│   │       └── config_YYYYMMDD.tar.gz
│   ├── weekly/                             # Weekly backups (kept 4 weeks)
│   ├── monthly/                            # Monthly backups (kept 12 months)
│   └── pre-deploy/                         # Pre-deployment snapshots
│       └── YYYYMMDD_HHMMSS/
│
├── logs/                                   # Centralized logs
│   ├── production/
│   │   ├── backend.log
│   │   ├── celery-worker.log
│   │   ├── celery-beat.log
│   │   └── access.log
│   ├── staging/
│   └── caddy/
│
└── stack/                                  # Infrastructure stack
    └── igny8-stack/                        # (Future - not yet separated)
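The naming scheme in the tree can be captured in a small helper so that every ops script builds backup paths the same way. This is a minimal sketch: the function names and the `BACKUP_ROOT` variable are hypothetical, and only the path patterns come from the layout above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the path patterns come from the directory layout.
BACKUP_ROOT="${BACKUP_ROOT:-/data/backups}"

# daily_backup_dir YYYYMMDD -> $BACKUP_ROOT/daily/YYYYMMDD
daily_backup_dir() {
  local day="$1"
  printf '%s/daily/%s\n' "$BACKUP_ROOT" "$day"
}

# db_backup_file YYYYMMDD HHMMSS -> path of the compressed database dump
db_backup_file() {
  local day="$1" time="$2"
  printf '%s/db_igny8_%s_%s.sql.gz\n' "$(daily_backup_dir "$day")" "$day" "$time"
}

# config_backup_file YYYYMMDD -> path of the config archive
config_backup_file() {
  local day="$1"
  printf '%s/config_%s.tar.gz\n' "$(daily_backup_dir "$day")" "$day"
}
```

A backup script could then call `db_backup_file "$(date +%Y%m%d)" "$(date +%H%M%S)"` to get the target file for the current run.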
🔄 Automated Backup System
Backup Strategy
| Type | Frequency | Retention | Content |
|---|---|---|---|
| Daily | Every day at 1:00 AM | 7 days | Database + configs |
| Weekly | Sunday at 2:00 AM | 4 weeks | Full backup |
| Monthly | 1st of month at 3:00 AM | 12 months | Full backup |
| Pre-Deploy | Before each deploy | 5 most recent | Database snapshot |
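Retention rules such as "5 most recent" can be enforced by pruning the oldest timestamped directories after each backup. The sketch below is one possible approach, not the project's actual script: `prune_keep_newest` is a hypothetical name, and it assumes directory names are timestamps (`YYYYMMDD` or `YYYYMMDD_HHMMSS`) so that sorting by name is the same as sorting by age.

```bash
#!/usr/bin/env bash
# prune_keep_newest DIR KEEP
# Deletes the oldest subdirectories of DIR until at most KEEP remain.
# Assumes timestamped names, so lexical order equals chronological order.
prune_keep_newest() {
  local dir="$1" keep="$2" total excess old
  total=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
  excess=$(( total - keep ))
  # Nothing to do when the directory holds KEEP entries or fewer.
  [ "$excess" -gt 0 ] || return 0
  find "$dir" -mindepth 1 -maxdepth 1 -type d | sort | head -n "$excess" |
    while IFS= read -r old; do
      rm -rf -- "$old"
    done
}
```

For example, `prune_keep_newest /data/backups/pre-deploy 5` would match the Pre-Deploy row, and `prune_keep_newest /data/backups/daily 7` approximates the 7-day rule when exactly one daily directory is created per day.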
Cron Schedule
# /etc/cron.d/igny8-backup
# Daily database backup at 1:00 AM
0 1 * * * root /data/app/igny8/scripts/ops/backup-db.sh daily >> /data/logs/backup.log 2>&1
# Weekly full backup on Sunday at 2:00 AM
0 2 * * 0 root /data/app/igny8/scripts/ops/backup-full.sh weekly >> /data/logs/backup.log 2>&1
# Monthly full backup on 1st at 3:00 AM
0 3 1 * * root /data/app/igny8/scripts/ops/backup-full.sh monthly >> /data/logs/backup.log 2>&1
# Health check every 5 minutes
*/5 * * * * root /data/app/igny8/scripts/ops/health-check.sh >> /data/logs/health.log 2>&1
# Log rotation daily at midnight
0 0 * * * root /data/app/igny8/scripts/ops/log-rotate.sh >> /data/logs/maintenance.log 2>&1
🌍 Environment Management
Environment Comparison
| Aspect | Development | Staging | Production |
|---|---|---|---|
| Domain | localhost:5173 | staging.igny8.com | app.igny8.com |
| API | localhost:8010 | staging-api.igny8.com | api.igny8.com |
| Database | igny8_dev_db | igny8_staging_db | igny8_db |
| Redis DB | 2 | 1 | 0 |
| Debug | True | False | False |
| AI Keys | Test/Limited | Test/Limited | Production |
| Payments | Sandbox | Sandbox | Live |
| Compose File | docker-compose.dev.yml | docker-compose.staging.yml | docker-compose.app.yml |
| Project Name | igny8-dev | igny8-staging | igny8-app |
Port Allocation
| Service | Dev | Staging | Production |
|---|---|---|---|
| Backend | 8010 | 8012 | 8011 |
| Frontend | 5173 | 8024 | 8021 |
| Marketing | 5174 | 8026 | 8023 |
| Flower | - | 5556 | 5555 |
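Because each environment pairs a compose file with a project name, a thin wrapper can keep the two from being mixed up on the command line. The following is a sketch under assumptions: the function names and `APP_DIR` are hypothetical, and it assumes all three compose files live in `/data/app/igny8/` (the tables above only confirm this for production and staging).

```bash
#!/usr/bin/env bash
# Hypothetical wrapper; file and project names come from the comparison table.
APP_DIR="${APP_DIR:-/data/app/igny8}"

# env_compose_file ENV -> compose file path for dev, staging, or production
env_compose_file() {
  case "$1" in
    dev)        echo "$APP_DIR/docker-compose.dev.yml" ;;
    staging)    echo "$APP_DIR/docker-compose.staging.yml" ;;
    production) echo "$APP_DIR/docker-compose.app.yml" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# env_project_name ENV -> compose project name
env_project_name() {
  case "$1" in
    dev)        echo "igny8-dev" ;;
    staging)    echo "igny8-staging" ;;
    production) echo "igny8-app" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# compose_env ENV ARGS... -> runs docker compose against that environment
compose_env() {
  local env="$1" file project
  shift
  file="$(env_compose_file "$env")" || return 1
  project="$(env_project_name "$env")" || return 1
  docker compose -f "$file" -p "$project" "$@"
}
```

With this in place, `compose_env staging ps` and `compose_env production logs` always use the matching file and project pair.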
🚀 Deployment Workflow
Safe Deployment Checklist
┌─────────────────────────────────────────────────────────────┐
│                    DEPLOYMENT CHECKLIST                     │
├─────────────────────────────────────────────────────────────┤
│  PRE-DEPLOYMENT                                             │
│  □ All tests passing on staging?                            │
│  □ Database migrations reviewed?                            │
│  □ Backup created?                                          │
│  □ Rollback plan ready?                                     │
│  □ Team notified?                                           │
├─────────────────────────────────────────────────────────────┤
│  DEPLOYMENT                                                 │
│  □ Create pre-deploy backup                                 │
│  □ Tag current images for rollback                          │
│  □ Pull latest code                                         │
│  □ Build new images                                         │
│  □ Apply migrations                                         │
│  □ Restart containers                                       │
│  □ Verify health check                                      │
├─────────────────────────────────────────────────────────────┤
│  POST-DEPLOYMENT                                            │
│  □ Monitor logs for 10 minutes                              │
│  □ Test critical paths (login, API, AI functions)           │
│  □ Check error rates                                        │
│  □ If issues → ROLLBACK                                     │
│  □ Update changelog                                         │
└─────────────────────────────────────────────────────────────┘
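The DEPLOYMENT block of the checklist is an ordered sequence in which any failure should stop the run and trigger a rollback. One way to express that in a deploy script is a step runner like the sketch below. The names `run_steps` and `on_failure` are hypothetical, and each step is assumed to be a shell function that returns non-zero on failure.

```bash
#!/usr/bin/env bash
# Default failure hook (hypothetical). A real script might call rollback.sh here.
on_failure() {
  echo "rollback required after failed step: $1" >&2
}

# run_steps STEP...
# Runs each named function in order and stops at the first failure,
# calling on_failure with the name of the step that failed.
run_steps() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "step failed: $step" >&2
      on_failure "$step"
      return 1
    fi
  done
}
```

A production deploy could then read as `run_steps create_backup tag_images pull_code build_images apply_migrations restart_containers verify_health`, where each of those names is a function the script defines itself.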
Git Branch Strategy
                      ┌──────────┐
                      │   main   │ ← Production deployments
                      └────▲─────┘
                           │ merge (after staging approval)
                      ┌────┴─────┐
                      │ staging  │ ← Staging deployments
                      └────▲─────┘
                           │ merge
          ┌────────────────┼────────────────┐
          │                │                │
  ┌───────┴───────┐ ┌──────┴──────┐ ┌───────┴───────┐
  │  feature/xyz  │ │ feature/abc │ │ hotfix/urgent │
  └───────────────┘ └─────────────┘ └───────────────┘
🏥 Health Monitoring
Health Check Endpoints
| Endpoint | Purpose | Expected Response |
|---|---|---|
| /api/v1/system/status/ | Overall system status | {"status": "healthy"} |
| /api/v1/system/health/ | Detailed component health | JSON with all components |
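A health check script could poll the status endpoint and treat anything other than the expected body as a failure. The sketch below assumes `curl` is installed and that the response contains the `"status": "healthy"` pair shown in the table. The function names are hypothetical, and the base URL is passed in by the caller.

```bash
#!/usr/bin/env bash
# is_healthy_body BODY
# Succeeds when BODY contains "status": "healthy" (whitespace-tolerant).
is_healthy_body() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# check_status BASE_URL
# Fetches /api/v1/system/status/ and validates the response body.
# --fail makes curl return non-zero on HTTP errors; --max-time caps the wait.
check_status() {
  local base_url="$1" body
  body="$(curl --silent --fail --max-time 5 "$base_url/api/v1/system/status/")" || return 1
  is_healthy_body "$body"
}
```

For instance, `check_status https://api.igny8.com || echo "production API unhealthy"` would fit a cron-driven check. The `https` scheme is an assumption here.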
Monitoring Targets
- Backend API - Response time < 500ms
- Database - Connection pool healthy
- Redis - Connection alive
- Celery Workers - Queue length < 100
- Celery Beat - Scheduler running
- Disk Space - > 20% free
- Memory - < 80% used
Alert Thresholds
| Metric | Warning | Critical |
|---|---|---|
| API Response Time | > 1s | > 5s |
| Error Rate | > 1% | > 5% |
| CPU Usage | > 70% | > 90% |
| Memory Usage | > 70% | > 90% |
| Disk Usage | > 70% | > 90% |
| Celery Queue | > 50 | > 200 |
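The thresholds above map naturally onto a three-level classification. As a rough sketch, the hypothetical helper below compares an integer reading against the warning and critical limits. It assumes values are whole numbers, so response times would need to be expressed in milliseconds. The `disk_usage_percent` function relies on the POSIX output format of `df -P`.

```bash
#!/usr/bin/env bash
# classify VALUE WARN CRIT
# Prints "critical" if VALUE > CRIT, "warning" if VALUE > WARN, else "ok".
# Integer comparison only; convert seconds to milliseconds beforehand.
classify() {
  local value="$1" warn="$2" crit="$3"
  if [ "$value" -gt "$crit" ]; then
    echo "critical"
  elif [ "$value" -gt "$warn" ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# disk_usage_percent PATH
# Prints the used percentage (without the % sign) of PATH's filesystem.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}
```

Used together, `classify "$(disk_usage_percent /data)" 70 90` would report the Disk Usage row, and `classify "$response_ms" 1000 5000` the API Response Time row.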
🔧 Common Operations
Daily Operations
# Check system health
/data/app/igny8/scripts/ops/health-check.sh
# View logs
tail -f /data/logs/production/backend.log
# Check container status
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app ps
Weekly Operations
# Review backup status
ls -la /data/backups/daily/
du -sh /data/backups/*
# Check disk space
df -h
# Review error logs
grep -i error /data/logs/production/backend.log | tail -50
Emergency Procedures
# Immediate rollback
/data/app/igny8/scripts/ops/rollback.sh
# Emergency restart
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app restart
# Emergency database restore
/data/app/igny8/scripts/ops/restore-db.sh /data/backups/latest.sql.gz
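The restore command above points at `/data/backups/latest.sql.gz`, a path that does not appear in the directory layout. If no such file or symlink is maintained, the newest dump can be located by its timestamped file name instead. This is a sketch only: `latest_db_backup` is a hypothetical helper, and it assumes every dump follows the `db_igny8_YYYYMMDD_HHMMSS.sql.gz` pattern and that paths contain no tabs or newlines.

```bash
#!/usr/bin/env bash
# latest_db_backup [ROOT]
# Prints the path of the newest db_igny8_*.sql.gz under ROOT.
# Sorting uses the file name, not the directory, so dumps stored in
# daily/, weekly/, or pre-deploy/ are compared by their timestamps.
latest_db_backup() {
  local root="${1:-/data/backups}" latest
  latest="$(find "$root" -type f -name 'db_igny8_*.sql.gz' |
    awk -F/ '{ print $NF "\t" $0 }' | sort | tail -n 1 | cut -f2-)"
  # Fail when no dump was found.
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

An emergency restore could then run `/data/app/igny8/scripts/ops/restore-db.sh "$(latest_db_backup)"`, after confirming that the printed file is the one intended.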
📊 What's Missing (Action Items)
Priority 1 - Critical (Before Go-Live)
| Item | Status | Action |
|---|---|---|
| docker-compose.staging.yml | ❌ Missing | Create from documentation |
| .env.staging | ❌ Missing | Create from example |
| Deployment scripts | ❌ Missing | Create all ops scripts |
| Automated backup cron | ❌ Missing | Set up cron jobs |
| Pre-deploy backup | ❌ Missing | Add to deploy script |
Priority 2 - Important (First Week)
| Item | Status | Action |
|---|---|---|
| Health check automation | ❌ Missing | Create monitoring |
| Log rotation | ❌ Missing | Set up logrotate |
| Staging DNS | ❌ Unknown | Configure if needed |
| Caddyfile staging routes | ❌ Unknown | Add staging domains |
Priority 3 - Nice to Have (First Month)
| Item | Status | Action |
|---|---|---|
| CI/CD pipeline | ❌ Not set | Optional automation |
| External monitoring | ❌ Not set | UptimeRobot/Datadog |
| Alerting system | ❌ Not set | Email/Slack alerts |
Next Steps
- Create ops scripts directory: /data/app/igny8/scripts/ops/
- Create all deployment scripts (see STAGING-SETUP-GUIDE.md)
- Create staging compose file (copy from documentation)
- Set up automated backups
- Test complete deployment cycle on staging
- Go live with confidence
Related Documentation
- STAGING-SETUP-GUIDE.md - Detailed staging setup
- TWO-REPO-ARCHITECTURE.md - Architecture overview
- INFRASTRUCTURE-STACK.md - Stack details