Dev ops prep

2026-01-21 17:53:42 +00:00
parent 3398130bff
commit 4a200822bb
14 changed files with 2027 additions and 0 deletions
--- a/docs/50-DEPLOYMENT/DEVOPS-OPERATIONS-GUIDE.md
+++ b/docs/50-DEPLOYMENT/DEVOPS-OPERATIONS-GUIDE.md
@@ -0,0 +1,303 @@
+# DevOps Operations Guide
+
+**Purpose:** Complete operational procedures for managing IGNY8 in production  
+**Version:** 1.0  
+**Last Updated:** January 20, 2026
+
+---
+
+## 📋 Executive Summary
+
+This document provides a complete structure for:
+1. **Automated Backups** - Regular database + config backups
+2. **Environment Management** - Dev vs Staging vs Production
+3. **Health Monitoring** - Automated health checks & alerts
+4. **Disaster Recovery** - Quick recovery procedures
+5. **Change Management** - Safe deployment workflow
+
+---
+
+## 🗂️ Directory Structure (To Be Implemented)
+
+```
+/data/
+├── app/
+│   └── igny8/                          # Application code
+│       ├── docker-compose.app.yml      # Production compose ✅
+│       ├── docker-compose.staging.yml  # Staging compose ⚠️ TO CREATE
+│       ├── .env                        # Production env
+│       ├── .env.staging                # Staging env ⚠️ TO CREATE
+│       └── scripts/
+│           └── ops/                    # ⚠️ TO CREATE
+│               ├── backup-db.sh        # Database backup
+│               ├── backup-full.sh      # Full backup (db + code + config)
+│               ├── restore-db.sh       # Database restore
+│               ├── deploy-staging.sh   # Deploy to staging
+│               ├── deploy-production.sh  # Deploy to production
+│               ├── rollback.sh         # Rollback deployment
+│               ├── health-check.sh     # System health check
+│               ├── sync-prod-to-staging.sh  # Sync production data to staging
+│               └── log-rotate.sh       # Log rotation
+│
+├── backups/                            # Backup storage
+│   ├── daily/                          # Daily automated backups
+│   │   └── YYYYMMDD/
+│   │       ├── db_igny8_YYYYMMDD_HHMMSS.sql.gz
+│   │       └── config_YYYYMMDD.tar.gz
+│   ├── weekly/                         # Weekly backups (kept 4 weeks)
+│   ├── monthly/                        # Monthly backups (kept 12 months)
+│   └── pre-deploy/                     # Pre-deployment snapshots
+│       └── YYYYMMDD_HHMMSS/
+│
+├── logs/                               # Centralized logs
+│   ├── production/
+│   │   ├── backend.log
+│   │   ├── celery-worker.log
+│   │   ├── celery-beat.log
+│   │   └── access.log
+│   ├── staging/
+│   └── caddy/
+│
+└── stack/                              # Infrastructure stack
+    └── igny8-stack/                    # (Future - not yet separated)
+```
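+
+The following is a minimal sketch for creating this skeleton on the host. It assumes bash and enough privileges to write under `/data`. The `create_layout` function and its optional root argument are hypothetical. The argument exists so the layout can be tried in a scratch directory first.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: create the /data skeleton shown above (safe to re-run).
+set -euo pipefail
+
+create_layout() {
+  local root="${1:-/data}"   # pass a scratch directory for a dry run
+  mkdir -p \
+    "$root/app/igny8/scripts/ops" \
+    "$root/backups/daily" \
+    "$root/backups/weekly" \
+    "$root/backups/monthly" \
+    "$root/backups/pre-deploy" \
+    "$root/logs/production" \
+    "$root/logs/staging" \
+    "$root/logs/caddy"
+  # Backups hold full database dumps, so restrict them to the owner.
+  chmod 700 "$root/backups"
+}
+```
+
+Because `mkdir -p` is idempotent, calling `create_layout` on a host where some of these directories already exist only creates the missing ones.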
+
+---
+
+## 🔄 Automated Backup System
+
+### Backup Strategy
+
+| Type | Frequency | Retention | Content |
+|------|-----------|-----------|---------|
+| **Daily** | 1:00 AM | 7 days | Database + configs |
+| **Weekly** | Sunday 2:00 AM | 4 weeks | Full backup |
+| **Monthly** | 1st of month, 3:00 AM | 12 months | Full backup |
+| **Pre-Deploy** | Before each deploy | 5 most recent | Database snapshot |
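+
+The retention column can be enforced with count-based pruning. This sketch assumes one directory per backup run, named `YYYYMMDD` or `YYYYMMDD_HHMMSS` as in the directory layout. With that naming, name order is age order, and keeping the 7 newest daily directories approximates 7 days of retention. `prune_keep_newest` and `prune_all` are hypothetical names.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical retention helper. Assumes entry names sort chronologically
+# (YYYYMMDD or YYYYMMDD_HHMMSS), so the oldest entries sort first.
+set -euo pipefail
+
+BACKUP_ROOT="${BACKUP_ROOT:-/data/backups}"
+
+# Delete everything directly under DIR except the KEEP newest entries.
+prune_keep_newest() {
+  local dir="$1" keep="$2" total
+  [ -d "$dir" ] || return 0
+  total=$(( $(find "$dir" -mindepth 1 -maxdepth 1 | wc -l) ))
+  [ "$total" -gt "$keep" ] || return 0
+  # awk reads all input (no early exit), avoiding SIGPIPE under pipefail.
+  find "$dir" -mindepth 1 -maxdepth 1 | LC_ALL=C sort |
+    awk -v n="$((total - keep))" 'NR <= n' |
+    while IFS= read -r old; do
+      rm -rf -- "$old"
+    done
+}
+
+# Retention counts from the Backup Strategy table.
+prune_all() {
+  prune_keep_newest "$BACKUP_ROOT/daily" 7
+  prune_keep_newest "$BACKUP_ROOT/weekly" 4
+  prune_keep_newest "$BACKUP_ROOT/monthly" 12
+  prune_keep_newest "$BACKUP_ROOT/pre-deploy" 5
+}
+```
+
+Calling `prune_all` only after a backup has succeeded ensures that a failed run never reduces the number of good copies.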
+
+### Cron Schedule
+
+```bash
+# /etc/cron.d/igny8-backup
+
+# Daily database backup at 1:00 AM
+0 1 * * * root /data/app/igny8/scripts/ops/backup-db.sh daily >> /data/logs/backup.log 2>&1
+
+# Weekly full backup on Sunday at 2:00 AM
+0 2 * * 0 root /data/app/igny8/scripts/ops/backup-full.sh weekly >> /data/logs/backup.log 2>&1
+
+# Monthly full backup on 1st at 3:00 AM
+0 3 1 * * root /data/app/igny8/scripts/ops/backup-full.sh monthly >> /data/logs/backup.log 2>&1
+
+# Health check every 5 minutes
+*/5 * * * * root /data/app/igny8/scripts/ops/health-check.sh >> /data/logs/health.log 2>&1
+
+# Log rotation daily at midnight
+0 0 * * * root /data/app/igny8/scripts/ops/log-rotate.sh >> /data/logs/maintenance.log 2>&1
+```
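+
+The cron entries call `backup-db.sh`, which the directory layout marks as still to be created. The following is a minimal sketch of its core. The guide names the database (`igny8_db`) but not the engine, so `dump_db` assumes PostgreSQL in a compose service named `postgres` with user `igny8`. Adjust it to match the real setup. Paths follow the directory layout above. The weekly and monthly sub-directory scheme is an assumption because the layout does not show it.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical core of scripts/ops/backup-db.sh (usage: backup-db.sh <tier>).
+set -euo pipefail
+
+BACKUP_ROOT="${BACKUP_ROOT:-/data/backups}"
+
+# Path for a dump taken at STAMP (YYYYMMDD_HHMMSS). pre-deploy gets one
+# directory per run; other tiers get one per day (assumed for weekly/monthly).
+backup_path() {
+  local tier="$1" stamp="$2" dir
+  if [ "$tier" = pre-deploy ]; then
+    dir="$BACKUP_ROOT/$tier/$stamp"
+  else
+    dir="$BACKUP_ROOT/$tier/${stamp%%_*}"
+  fi
+  printf '%s/db_igny8_%s.sql.gz\n' "$dir" "$stamp"
+}
+
+# ASSUMPTION: PostgreSQL in compose service "postgres", database user "igny8".
+dump_db() {
+  docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app \
+    exec -T postgres pg_dump -U igny8 igny8_db
+}
+
+# Dump, compress, and print the final path. Writing to .tmp first means a
+# failed dump never leaves a truncated file that looks like a valid backup.
+backup_db() {
+  local tier="$1" out
+  out="$(backup_path "$tier" "$(date +%Y%m%d_%H%M%S)")"
+  mkdir -p "$(dirname "$out")"
+  if ! dump_db | gzip > "$out.tmp"; then
+    rm -f "$out.tmp"
+    echo "backup failed: $out" >&2
+    return 1
+  fi
+  mv "$out.tmp" "$out"
+  printf '%s\n' "$out"
+}
+```
+
+In the real script, `backup_db "$1"` would be the final call. The cron line `backup-db.sh daily` would then log the new dump's path to `/data/logs/backup.log`.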
+
+---
+
+## 🌍 Environment Management
+
+### Environment Comparison
+
+| Aspect | Development | Staging | Production |
+|--------|-------------|---------|------------|
+| **Domain** | localhost:5173 | staging.igny8.com | app.igny8.com |
+| **API** | localhost:8010 | staging-api.igny8.com | api.igny8.com |
+| **Database** | igny8_dev_db | igny8_staging_db | igny8_db |
+| **Redis DB** | 2 | 1 | 0 |
+| **Debug** | True | False | False |
+| **AI Keys** | Test/Limited | Test/Limited | Production |
+| **Payments** | Sandbox | Sandbox | Live |
+| **Compose File** | docker-compose.dev.yml | docker-compose.staging.yml | docker-compose.app.yml |
+| **Project Name** | igny8-dev | igny8-staging | igny8-app |
+
+### Port Allocation
+
+| Service | Dev | Staging | Production |
+|---------|-----|---------|------------|
+| Backend | 8010 | 8012 | 8011 |
+| Frontend | 5173 | 8024 | 8021 |
+| Marketing | 5174 | 8026 | 8023 |
+| Flower | - | 5556 | 5555 |
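+
+Scripts that target more than one environment can read these values from a single lookup instead of repeating them. The following is a sketch using hypothetical function names. It assumes all three compose files live in `/data/app/igny8`, although the layout above lists only the production and staging files.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical lookup of per-environment values from the two tables above.
+set -euo pipefail
+
+APP_DIR="${APP_DIR:-/data/app/igny8}"
+
+# docker compose file and project arguments for an environment.
+compose_args() {
+  case "$1" in
+    dev)        printf '%s\n' "-f $APP_DIR/docker-compose.dev.yml -p igny8-dev" ;;
+    staging)    printf '%s\n' "-f $APP_DIR/docker-compose.staging.yml -p igny8-staging" ;;
+    production) printf '%s\n' "-f $APP_DIR/docker-compose.app.yml -p igny8-app" ;;
+    *) echo "unknown environment: $1" >&2; return 1 ;;
+  esac
+}
+
+# Host port of the backend service for an environment.
+backend_port() {
+  case "$1" in
+    dev)        echo 8010 ;;
+    staging)    echo 8012 ;;
+    production) echo 8011 ;;
+    *) echo "unknown environment: $1" >&2; return 1 ;;
+  esac
+}
+```
+
+For example, `docker compose $(compose_args staging) ps` lists the staging containers. The substitution is deliberately unquoted so it splits into separate arguments, which assumes `APP_DIR` contains no spaces.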
+
+---
+
+## 🚀 Deployment Workflow
+
+### Safe Deployment Checklist
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    DEPLOYMENT CHECKLIST                      │
+├─────────────────────────────────────────────────────────────┤
+│ PRE-DEPLOYMENT                                               │
+│ □ All tests passing on staging?                              │
+│ □ Database migrations reviewed?                              │
+│ □ Backup created?                                            │
+│ □ Rollback plan ready?                                       │
+│ □ Team notified?                                             │
+├─────────────────────────────────────────────────────────────┤
+│ DEPLOYMENT                                                   │
+│ □ Create pre-deploy backup                                   │
+│ □ Tag current images for rollback                            │
+│ □ Pull latest code                                           │
+│ □ Build new images                                           │
+│ □ Apply migrations                                           │
+│ □ Restart containers                                         │
+│ □ Verify health check                                        │
+├─────────────────────────────────────────────────────────────┤
+│ POST-DEPLOYMENT                                              │
+│ □ Monitor logs for 10 minutes                                │
+│ □ Test critical paths (login, API, AI functions)             │
+│ □ Check error rates                                          │
+│ □ If issues → ROLLBACK                                       │
+│ □ Update changelog                                           │
+└─────────────────────────────────────────────────────────────┘
+```
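+
+The DEPLOYMENT steps map onto a short script. The sketch below keeps the checklist order and adds a `DRY_RUN=1` mode that prints each command instead of running it. It relies on several assumptions that the guide does not state:
+
+- Django migrations run through `manage.py` in a compose service named `backend`.
+- The backend image is tagged `igny8-backend:latest`.
+- Production deploys from `main`.
+- `backup-db.sh` accepts a `pre-deploy` tier.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical sketch of scripts/ops/deploy-production.sh, in checklist order.
+# ASSUMPTIONS: compose service "backend" runs Django's manage.py, the backend
+# image is igny8-backend:latest, and main is the production branch.
+set -euo pipefail
+
+APP_DIR="${APP_DIR:-/data/app/igny8}"
+COMPOSE=(docker compose -f "$APP_DIR/docker-compose.app.yml" -p igny8-app)
+
+# Run a command, or only print it when DRY_RUN=1.
+run() {
+  if [ "${DRY_RUN:-0}" = 1 ]; then
+    printf '+ %s\n' "$*"
+  else
+    "$@"
+  fi
+}
+
+deploy_production() {
+  run "$APP_DIR/scripts/ops/backup-db.sh" pre-deploy                 # pre-deploy backup
+  run docker image tag igny8-backend:latest igny8-backend:rollback   # tag for rollback
+  run git -C "$APP_DIR" pull --ff-only origin main                   # pull latest code
+  run "${COMPOSE[@]}" build                                          # build new images
+  run "${COMPOSE[@]}" run --rm backend python manage.py migrate      # apply migrations
+  run "${COMPOSE[@]}" up -d                                          # restart containers
+  run curl -fsS --retry 5 --retry-delay 5 --retry-connrefused \
+    https://api.igny8.com/api/v1/system/status/                      # verify health
+}
+```
+
+A real script would end by calling `deploy_production`. Running `DRY_RUN=1 deploy_production` first prints the seven commands for review. Because of `set -e`, a failing step such as a migration stops the script before `up -d`, so the previously running containers keep serving traffic.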
+
+### Git Branch Strategy
+
+```
+                    ┌──────────┐
+                    │   main   │  ← Production deployments
+                    └────▲─────┘
+                         │ merge (after staging approval)
+                    ┌────┴─────┐
+                    │ staging  │  ← Staging deployments
+                    └────▲─────┘
+                         │ merge
+        ┌────────────────┼────────────────┐
+        │                │                │
+┌───────┴───────┐ ┌──────┴──────┐ ┌───────┴───────┐
+│feature/xyz    │ │feature/abc  │ │hotfix/urgent  │
+└───────────────┘ └─────────────┘ └───────────────┘
+```
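+
+In git terms, each upward arrow in the diagram is a merge into the next branch. The following sketch of that promotion step uses a hypothetical `promote` helper. It refuses to run with uncommitted changes and records an explicit merge commit (`--no-ff`), so every staging or production promotion is visible in history. Pushing and the staging approval gate are left out.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: merge branch SRC into DST with an explicit merge commit.
+set -euo pipefail
+
+promote() {
+  local src="$1" dst="$2"
+  if [ -n "$(git status --porcelain)" ]; then
+    echo "working tree has uncommitted changes; aborting" >&2
+    return 1
+  fi
+  git checkout -q "$dst"
+  git merge -q --no-ff --no-edit -m "Merge $src into $dst" "$src"
+}
+```
+
+`promote feature/xyz staging` brings a feature into staging. After staging approval, `promote staging main` moves it to production. Per the diagram, `hotfix/urgent` also goes through staging.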
+
+---
+
+## 🏥 Health Monitoring
+
+### Health Check Endpoints
+
+| Endpoint | Purpose | Expected Response |
+|----------|---------|-------------------|
+| `/api/v1/system/status/` | Overall system status | `{"status": "healthy"}` |
+| `/api/v1/system/health/` | Detailed component health | JSON with all components |
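+
+The following is a sketch of the basic check `health-check.sh` could start with. It assumes the production API base URL from the environment table. It matches the documented `{"status": "healthy"}` body with a whitespace-tolerant pattern rather than a full JSON parser. `API_BASE`, `is_healthy_body`, and `check_status` are hypothetical names. The 5-second timeout mirrors the 5 s critical response-time threshold listed under Alert Thresholds.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical core of scripts/ops/health-check.sh.
+set -euo pipefail
+
+API_BASE="${API_BASE:-https://api.igny8.com}"
+
+# True if a response body reports "status": "healthy" (pattern match, not a JSON parser).
+is_healthy_body() {
+  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"' <<<"$1"
+}
+
+# Fail on HTTP errors, timeouts over 5 s, or a body that is not healthy.
+check_status() {
+  local body
+  if ! body="$(curl -fsS --max-time 5 "$API_BASE/api/v1/system/status/")"; then
+    echo "$(date -u +%FT%TZ) CRITICAL status endpoint unreachable" >&2
+    return 1
+  fi
+  if ! is_healthy_body "$body"; then
+    echo "$(date -u +%FT%TZ) CRITICAL unexpected status: $body" >&2
+    return 1
+  fi
+  echo "$(date -u +%FT%TZ) OK"
+}
+```
+
+With the cron entry above (`>> /data/logs/health.log 2>&1`), both OK and CRITICAL lines land in `health.log` with UTC timestamps.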
+
+### Monitoring Targets
+
+1. **Backend API** - Response time < 500ms
+2. **Database** - Connection pool healthy
+3. **Redis** - Connection alive
+4. **Celery Workers** - Queue length < 100
+5. **Celery Beat** - Scheduler running
+6. **Disk Space** - > 20% free
+7. **Memory** - < 80% used
+
+### Alert Thresholds
+
+| Metric | Warning | Critical |
+|--------|---------|----------|
+| API Response Time | > 1s | > 5s |
+| Error Rate | > 1% | > 5% |
+| CPU Usage | > 70% | > 90% |
+| Memory Usage | > 70% | > 90% |
+| Disk Usage | > 70% | > 90% |
+| Celery Queue | > 50 | > 200 |
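+
+The thresholds read as strict "greater than" comparisons, so a reading exactly at a threshold stays at the lower level. The following sketch shows that classification, plus one way to read disk usage with POSIX `df -P`. The function names are hypothetical. Memory, CPU, and queue-length readings would need their own collectors.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical threshold helpers for health-check.sh.
+set -euo pipefail
+
+# Classify an integer reading: above CRIT is CRITICAL, above WARN is WARNING, else OK.
+level_for() {
+  local value="$1" warn="$2" crit="$3"
+  if [ "$value" -gt "$crit" ]; then
+    echo CRITICAL
+  elif [ "$value" -gt "$warn" ]; then
+    echo WARNING
+  else
+    echo OK
+  fi
+}
+
+# Percent used on the filesystem holding PATH (default /data), without the % sign.
+disk_used_pct() {
+  df -P "${1:-/data}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
+}
+```
+
+For example, `level_for "$(disk_used_pct /data)" 70 90` applies the Disk Usage row.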
+
+---
+
+## 🔧 Common Operations
+
+### Daily Operations
+
+```bash
+# Check system health
+/data/app/igny8/scripts/ops/health-check.sh
+
+# View logs
+tail -f /data/logs/production/backend.log
+
+# Check container status
+docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app ps
+```
+
+### Weekly Operations
+
+```bash
+# Review backup status
+ls -la /data/backups/daily/
+du -sh /data/backups/*
+
+# Check disk space
+df -h
+
+# Review error logs
+grep -i error /data/logs/production/backend.log | tail -50
+```
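+
+Listing the backup directory shows what exists but not whether last night's job ran. The following is a sketch of a freshness check, assuming the `db_igny8_*.sql.gz` naming from the directory layout. `recent_backup_exists` is a hypothetical helper. The 26-hour default is an illustrative window that tolerates a slightly late 1:00 AM run.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical check: succeed if DIR holds a database dump modified within MINUTES.
+set -euo pipefail
+
+recent_backup_exists() {
+  local dir="${1:-/data/backups/daily}" minutes="${2:-1560}"   # 1560 min = 26 h
+  [ -n "$(find "$dir" -type f -name 'db_igny8_*.sql.gz' -mmin "-$minutes" 2>/dev/null | head -n 1)" ]
+}
+```
+
+For example: `recent_backup_exists || echo "WARNING: no database backup in the last 26 hours"`.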
+
+### Emergency Procedures
+
+```bash
+# Immediate rollback
+/data/app/igny8/scripts/ops/rollback.sh
+
+# Emergency restart
+docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app restart
+
+# Emergency database restore (replace YYYYMMDD/HHMMSS with the dump to restore)
+/data/app/igny8/scripts/ops/restore-db.sh /data/backups/daily/YYYYMMDD/db_igny8_YYYYMMDD_HHMMSS.sql.gz
+```
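+
+`restore-db.sh` needs a concrete file path. The following is a sketch for finding the newest dump across all backup tiers. It assumes the `db_igny8_YYYYMMDD_HHMMSS.sql.gz` naming, so sorting by file name is sorting by time. `latest_backup` is a hypothetical helper.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: print the newest database dump under a backup root.
+set -euo pipefail
+
+latest_backup() {
+  local root="${1:-/data/backups}"
+  # Prefix each path with its file name, sort on that, keep the last (newest).
+  find "$root" -type f -name 'db_igny8_*.sql.gz' |
+    awk -F/ '{ print $NF "\t" $0 }' | LC_ALL=C sort | tail -n 1 | cut -f2
+}
+```
+
+For example: `f="$(latest_backup)"; gzip -t "$f" && /data/app/igny8/scripts/ops/restore-db.sh "$f"`. The `gzip -t` integrity check rejects a truncated archive before anything is restored.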
+
+---
+
+## 📊 What's Missing (Action Items)
+
+### Priority 1 - Critical (Before Go-Live)
+
+| Item | Status | Action |
+|------|--------|--------|
+| `docker-compose.staging.yml` | ❌ Missing | Create from documentation |
+| `.env.staging` | ❌ Missing | Create from example |
+| Deployment scripts | ❌ Missing | Create all ops scripts |
+| Automated backup cron | ❌ Missing | Set up cron jobs |
+| Pre-deploy backup | ❌ Missing | Add to deploy script |
+
+### Priority 2 - Important (First Week)
+
+| Item | Status | Action |
+|------|--------|--------|
+| Health check automation | ❌ Missing | Create monitoring |
+| Log rotation | ❌ Missing | Set up logrotate |
+| Staging DNS | ❌ Unknown | Configure if needed |
+| Caddyfile staging routes | ❌ Unknown | Add staging domains |
+
+### Priority 3 - Nice to Have (First Month)
+
+| Item | Status | Action |
+|------|--------|--------|
+| CI/CD pipeline | ❌ Not set | Optional automation |
+| External monitoring | ❌ Not set | UptimeRobot/Datadog |
+| Alerting system | ❌ Not set | Email/Slack alerts |
+
+---
+
+## Next Steps
+
+1. **Create ops scripts directory**: `/data/app/igny8/scripts/ops/`
+2. **Create all deployment scripts** (see STAGING-SETUP-GUIDE.md)
+3. **Create staging compose file** (copy from documentation)
+4. **Set up automated backups**
+5. **Test complete deployment cycle** on staging
+6. **Go live with confidence**
+
+---
+
+## Related Documentation
+
+- [STAGING-SETUP-GUIDE.md](final-clean-best-deployment-plan/STAGING-SETUP-GUIDE.md) - Detailed staging setup
+- [TWO-REPO-ARCHITECTURE.md](final-clean-best-deployment-plan/TWO-REPO-ARCHITECTURE.md) - Architecture overview
+- [INFRASTRUCTURE-STACK.md](final-clean-best-deployment-plan/INFRASTRUCTURE-STACK.md) - Stack details