# DevOps Operations Guide

**Purpose:** Complete operational procedures for managing IGNY8 in production
**Version:** 1.0
**Last Updated:** January 20, 2026

---

## 📋 Executive Summary

This document provides a complete structure for:
1. **Automated Backups** - Regular database + config backups
2. **Environment Management** - Dev vs Staging vs Production
3. **Health Monitoring** - Automated health checks & alerts
4. **Disaster Recovery** - Quick recovery procedures
5. **Change Management** - Safe deployment workflow

---

## 🗂️ Directory Structure (To Be Implemented)

```
/data/
├── app/
│   └── igny8/                              # Application code
│       ├── docker-compose.app.yml          # Production compose ✅
│       ├── docker-compose.staging.yml      # Staging compose ⚠️ TO CREATE
│       ├── .env                            # Production env
│       ├── .env.staging                    # Staging env ⚠️ TO CREATE
│       └── scripts/
│           └── ops/                        # ⚠️ TO CREATE
│               ├── backup-db.sh            # Database backup
│               ├── backup-full.sh          # Full backup (db + code + config)
│               ├── restore-db.sh           # Database restore
│               ├── deploy-staging.sh       # Deploy to staging
│               ├── deploy-production.sh    # Deploy to production
│               ├── rollback.sh             # Rollback deployment
│               ├── health-check.sh         # System health check
│               ├── sync-prod-to-staging.sh # Sync data
│               └── log-rotate.sh           # Log rotation
│
├── backups/                                # Backup storage
│   ├── daily/                              # Daily automated backups
│   │   └── YYYYMMDD/
│   │       ├── db_igny8_YYYYMMDD_HHMMSS.sql.gz
│   │       └── config_YYYYMMDD.tar.gz
│   ├── weekly/                             # Weekly backups (kept 4 weeks)
│   ├── monthly/                            # Monthly backups (kept 12 months)
│   └── pre-deploy/                         # Pre-deployment snapshots
│       └── YYYYMMDD_HHMMSS/
│
├── logs/                                   # Centralized logs
│   ├── production/
│   │   ├── backend.log
│   │   ├── celery-worker.log
│   │   ├── celery-beat.log
│   │   └── access.log
│   ├── staging/
│   └── caddy/
│
└── stack/                                  # Infrastructure stack
    └── igny8-stack/                        # (Future - not yet separated)
```

---

## 🔄 Automated Backup System

### Backup Strategy

| Type | Frequency | Retention | Content |
|------|-----------|-----------|---------|
| **Daily** | Daily at 1:00 AM | 7 days | Database + configs |
| **Weekly** | Sunday at 2:00 AM | 4 weeks | Full backup |
| **Monthly** | 1st of month at 3:00 AM | 12 months | Full backup |
| **Pre-Deploy** | Before each deploy | 5 most recent | Database snapshot |

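
The retention column can be enforced by a small pruning step at the end of each backup run. The following is a minimal sketch, not the project's actual script: the function names `prune_older_than` and `prune_keep_newest` are hypothetical, and it assumes every tier stores one directory per backup (the directory tree shows this only for `daily/` and `pre-deploy/`).

```bash
#!/usr/bin/env bash
# Hypothetical retention helpers (names are illustrative, not from the ops scripts).

# Remove backup directories directly under DIR whose mtime is more than DAYS days old.
prune_older_than() {
    local dir="$1" days="$2"
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime "+${days}" -exec rm -rf -- {} +
}

# Keep only the KEEP newest directories under DIR. Relies on YYYYMMDD_HHMMSS
# names sorting chronologically, so a reverse sort lists the newest first.
prune_keep_newest() {
    local dir="$1" keep="$2"
    find "$dir" -mindepth 1 -maxdepth 1 -type d | sort -r | tail -n "+$((keep + 1))" |
        while IFS= read -r path; do
            rm -rf -- "$path"
        done
}
```

With the table's values this would be called as `prune_older_than /data/backups/daily 7`, `prune_older_than /data/backups/weekly 28`, `prune_older_than /data/backups/monthly 365`, and `prune_keep_newest /data/backups/pre-deploy 5`. Note that `find -mtime +7` only matches entries at least eight full days old, so it errs on the side of keeping a backup slightly longer.
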
### Cron Schedule

```bash
# /etc/cron.d/igny8-backup

# Daily database backup at 1:00 AM
0 1 * * * root /data/app/igny8/scripts/ops/backup-db.sh daily >> /data/logs/backup.log 2>&1

# Weekly full backup on Sunday at 2:00 AM
0 2 * * 0 root /data/app/igny8/scripts/ops/backup-full.sh weekly >> /data/logs/backup.log 2>&1

# Monthly full backup on 1st at 3:00 AM
0 3 1 * * root /data/app/igny8/scripts/ops/backup-full.sh monthly >> /data/logs/backup.log 2>&1

# Health check every 5 minutes
*/5 * * * * root /data/app/igny8/scripts/ops/health-check.sh >> /data/logs/health.log 2>&1

# Log rotation daily at midnight
0 0 * * * root /data/app/igny8/scripts/ops/log-rotate.sh >> /data/logs/maintenance.log 2>&1
```
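
Since `backup-db.sh` is still to be created, the following sketch shows one way its core could look when the cron entry calls it with `daily`. It is an illustration under stated assumptions, not the project's script: `BACKUP_ROOT` and `DUMP_CMD` are hypothetical overrides, the `pg_dump igny8_db` default assumes PostgreSQL with connection settings taken from the environment (the guide does not name the database engine), and a dated subdirectory is assumed for every tier. A caller would run `backup_db daily` and log the printed path.

```bash
#!/usr/bin/env bash
# Sketch of backup-db.sh internals. BACKUP_ROOT and DUMP_CMD are hypothetical
# overrides; the pg_dump default ASSUMES PostgreSQL, which the guide does not state.

# Print the dump path for a tier, e.g.
# /data/backups/daily/20260120/db_igny8_20260120_010000.sql.gz
backup_path() {
    local tier="$1" stamp="${2:-$(date +%Y%m%d_%H%M%S)}"
    printf '%s/%s/%s/db_igny8_%s.sql.gz\n' \
        "${BACKUP_ROOT:-/data/backups}" "$tier" "${stamp%%_*}" "$stamp"
}

# Dump the database into the tier directory and print the resulting file path.
backup_db() {
    local tier="${1:-daily}" target
    target="$(backup_path "$tier")"
    mkdir -p "$(dirname "$target")" || return 1
    # pipefail makes a failed dump fail the pipeline even though gzip succeeds.
    if ( set -o pipefail; ${DUMP_CMD:-pg_dump igny8_db} | gzip > "$target" ); then
        printf '%s\n' "$target"
    else
        rm -f -- "$target"   # never leave a truncated archive behind
        return 1
    fi
}
```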

---

## 🌍 Environment Management

### Environment Comparison

| Aspect | Development | Staging | Production |
|--------|-------------|---------|------------|
| **Domain** | localhost:5173 | staging.igny8.com | app.igny8.com |
| **API** | localhost:8010 | staging-api.igny8.com | api.igny8.com |
| **Database** | igny8_dev_db | igny8_staging_db | igny8_db |
| **Redis DB** | 2 | 1 | 0 |
| **Debug** | True | False | False |
| **AI Keys** | Test/Limited | Test/Limited | Production |
| **Payments** | Sandbox | Sandbox | Live |
| **Compose File** | docker-compose.dev.yml | docker-compose.staging.yml | docker-compose.app.yml |
| **Project Name** | igny8-dev | igny8-staging | igny8-app |

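
Because each environment pairs a compose file with a project name, a small lookup helper can keep the two from being mixed up on the command line. This is a sketch only: `compose_args` and the `APP_DIR` override are hypothetical names, and it assumes `docker-compose.dev.yml` sits in the same directory as the other two compose files, which the guide does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an environment name to its compose file and project
# name, using the values from the comparison table. APP_DIR is an assumed
# override; the dev compose file is ASSUMED to live next to the other two.
compose_args() {
    local app_dir="${APP_DIR:-/data/app/igny8}"
    case "$1" in
        dev)        printf '%s\n' "-f ${app_dir}/docker-compose.dev.yml -p igny8-dev" ;;
        staging)    printf '%s\n' "-f ${app_dir}/docker-compose.staging.yml -p igny8-staging" ;;
        production) printf '%s\n' "-f ${app_dir}/docker-compose.app.yml -p igny8-app" ;;
        *)          echo "unknown environment: $1" >&2; return 1 ;;
    esac
}
```

A typical call would be `docker compose $(compose_args staging) ps`. The substitution is left unquoted on purpose so that it splits into four arguments, which is only safe while the path contains no spaces.
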
### Port Allocation

| Service | Dev | Staging | Production |
|---------|-----|---------|------------|
| Backend | 8010 | 8012 | 8011 |
| Frontend | 5173 | 8024 | 8021 |
| Marketing | 5174 | 8026 | 8023 |
| Flower | - | 5556 | 5555 |

---

## 🚀 Deployment Workflow

### Safe Deployment Checklist

```
┌─────────────────────────────────────────────────────────────┐
│                    DEPLOYMENT CHECKLIST                     │
├─────────────────────────────────────────────────────────────┤
│  PRE-DEPLOYMENT                                             │
│  □ All tests passing on staging?                            │
│  □ Database migrations reviewed?                            │
│  □ Backup created?                                          │
│  □ Rollback plan ready?                                     │
│  □ Team notified?                                           │
├─────────────────────────────────────────────────────────────┤
│  DEPLOYMENT                                                 │
│  □ Create pre-deploy backup                                 │
│  □ Tag current images for rollback                          │
│  □ Pull latest code                                         │
│  □ Build new images                                         │
│  □ Apply migrations                                         │
│  □ Restart containers                                       │
│  □ Verify health check                                      │
├─────────────────────────────────────────────────────────────┤
│  POST-DEPLOYMENT                                            │
│  □ Monitor logs for 10 minutes                              │
│  □ Test critical paths (login, API, AI functions)           │
│  □ Check error rates                                        │
│  □ If issues → ROLLBACK                                     │
│  □ Update changelog                                         │
└─────────────────────────────────────────────────────────────┘
```

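
The DEPLOYMENT portion of the checklist maps naturally onto a script. The sketch below shows one possible shape for `deploy-production.sh`, with a dry-run switch so the sequence can be reviewed before anything runs. Several details are assumptions that do not come from this guide: the image name `igny8-backend`, the compose service `backend`, a Django-style `manage.py migrate`, the `origin main` remote and branch, and `backup-db.sh` accepting a `pre-deploy` argument. The `run`, `compose`, and `deploy_production` names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of deploy-production.sh following the DEPLOYMENT steps of the checklist.
# ASSUMPTIONS (not in the guide): the image name "igny8-backend", the compose
# service "backend", a Django-style "manage.py migrate", the git remote/branch
# "origin main", and backup-db.sh accepting a "pre-deploy" argument.

APP_DIR="/data/app/igny8"
OPS_DIR="${APP_DIR}/scripts/ops"

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Shorthand for the production compose invocation used throughout the guide.
compose() {
    run docker compose -f "${APP_DIR}/docker-compose.app.yml" -p igny8-app "$@"
}

deploy_production() {
    run "${OPS_DIR}/backup-db.sh" pre-deploy                   || return 1  # pre-deploy backup
    run docker tag igny8-backend:latest igny8-backend:rollback || return 1  # tag for rollback
    run git -C "$APP_DIR" pull --ff-only origin main           || return 1  # pull latest code
    compose build                                              || return 1  # build new images
    compose run --rm backend python manage.py migrate          || return 1  # apply migrations
    compose up -d                                              || return 1  # restart containers
    if ! run "${OPS_DIR}/health-check.sh"; then                             # verify health
        run "${OPS_DIR}/rollback.sh"
        return 1
    fi
}
```

Running `DRY_RUN=1 deploy_production` prints the seven commands in order without executing them. Each step aborts the deployment on failure, and a failed health check triggers the rollback script before returning an error.
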
### Git Branch Strategy

```
                    ┌──────────┐
                    │   main   │ ← Production deployments
                    └────▲─────┘
                         │ merge (after staging approval)
                    ┌────┴─────┐
                    │ staging  │ ← Staging deployments
                    └────▲─────┘
                         │ merge
        ┌────────────────┼────────────────┐
        │                │                │
┌───────┴───────┐ ┌──────┴──────┐ ┌───────┴───────┐
│  feature/xyz  │ │ feature/abc │ │ hotfix/urgent │
└───────────────┘ └─────────────┘ └───────────────┘
```

---

## 🏥 Health Monitoring

### Health Check Endpoints

| Endpoint | Purpose | Expected Response |
|----------|---------|-------------------|
| `/api/v1/system/status/` | Overall system status | `{"status": "healthy"}` |
| `/api/v1/system/health/` | Detailed component health | JSON with all components |

### Monitoring Targets

1. **Backend API** - Response time < 500ms
2. **Database** - Connection pool healthy
3. **Redis** - Connection alive
4. **Celery Workers** - Queue length < 100
5. **Celery Beat** - Scheduler running
6. **Disk Space** - > 20% free
7. **Memory** - < 80% used

### Alert Thresholds

| Metric | Warning | Critical |
|--------|---------|----------|
| API Response Time | > 1s | > 5s |
| Error Rate | > 1% | > 5% |
| CPU Usage | > 70% | > 90% |
| Memory Usage | > 70% | > 90% |
| Disk Usage | > 70% | > 90% |
| Celery Queue | > 50 | > 200 |
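
The thresholds above reduce to a two-level comparison that `health-check.sh` could apply to each metric. The sketch below is one possible shape, not the actual script: `classify` and `disk_used_pct` are hypothetical names, values are assumed to be integers (for example milliseconds or whole percentages), and how the other metrics are collected is left open.

```bash
#!/usr/bin/env bash
# Sketch of the threshold logic health-check.sh could apply. Function names are
# hypothetical; thresholds come from the Alert Thresholds table.

# classify VALUE WARN CRIT: print OK, WARNING or CRITICAL for integer inputs,
# using the table's strict "greater than" comparisons.
classify() {
    local value="$1" warn="$2" crit="$3"
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Print the used-space percentage (digits only) of the filesystem holding PATH,
# read from the capacity column of POSIX-format df output.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example for the Disk Usage row: classify "$(disk_used_pct /data)" 70 90
```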

---

## 🔧 Common Operations

### Daily Operations

```bash
# Check system health
/data/app/igny8/scripts/ops/health-check.sh

# View logs
tail -f /data/logs/production/backend.log

# Check container status
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app ps
```

### Weekly Operations

```bash
# Review backup status
ls -la /data/backups/daily/
du -sh /data/backups/*

# Check disk space
df -h

# Review error logs
grep -i error /data/logs/production/backend.log | tail -n 50
```

### Emergency Procedures

```bash
# Immediate rollback
/data/app/igny8/scripts/ops/rollback.sh

# Emergency restart
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app restart

# Emergency database restore
/data/app/igny8/scripts/ops/restore-db.sh /data/backups/latest.sql.gz
```
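
The directory layout does not show a `latest.sql.gz` file, so the restore command above presumably relies on a symlink or copy maintained elsewhere. If none exists, the newest dump can be located by name, since the `YYYYMMDD_HHMMSS` stamp sorts chronologically. The helper below is a sketch with a hypothetical name, `latest_db_backup`, and it assumes pre-deploy snapshots use the same `db_igny8_*.sql.gz` naming as daily backups. It would be used as `restore-db.sh "$(latest_db_backup)"`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest database dump under ROOT, or fail if
# none exists. ASSUMES all tiers name dumps db_igny8_YYYYMMDD_HHMMSS.sql.gz.
latest_db_backup() {
    local root="${1:-/data/backups}" newest
    # Prefix each path with its basename so the sort orders by timestamp
    # regardless of which tier directory the file lives in.
    newest="$(find "$root" -type f -name 'db_igny8_*.sql.gz' |
        awk -F/ '{ print $NF "\t" $0 }' | sort | tail -n 1 | cut -f 2-)"
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```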

---

## 📊 What's Missing (Action Items)

### Priority 1 - Critical (Before Go-Live)

| Item | Status | Action |
|------|--------|--------|
| `docker-compose.staging.yml` | ❌ Missing | Create from documentation |
| `.env.staging` | ❌ Missing | Create from example |
| Deployment scripts | ❌ Missing | Create all ops scripts |
| Automated backup cron | ❌ Missing | Set up cron jobs |
| Pre-deploy backup | ❌ Missing | Add to deploy script |

### Priority 2 - Important (First Week)

| Item | Status | Action |
|------|--------|--------|
| Health check automation | ❌ Missing | Create monitoring |
| Log rotation | ❌ Missing | Set up logrotate |
| Staging DNS | ❌ Unknown | Configure if needed |
| Caddyfile staging routes | ❌ Unknown | Add staging domains |

### Priority 3 - Nice to Have (First Month)

| Item | Status | Action |
|------|--------|--------|
| CI/CD pipeline | ❌ Not set | Optional automation |
| External monitoring | ❌ Not set | UptimeRobot/Datadog |
| Alerting system | ❌ Not set | Email/Slack alerts |

---

## Next Steps

1. **Create ops scripts directory**: `/data/app/igny8/scripts/ops/`
2. **Create all deployment scripts** (see STAGING-SETUP-GUIDE.md)
3. **Create staging compose file** (copy from documentation)
4. **Set up automated backups**
5. **Test complete deployment cycle** on staging
6. **Go live with confidence**

---

## Related Documentation

- [STAGING-SETUP-GUIDE.md](final-clean-best-deployment-plan/STAGING-SETUP-GUIDE.md) - Detailed staging setup
- [TWO-REPO-ARCHITECTURE.md](final-clean-best-deployment-plan/TWO-REPO-ARCHITECTURE.md) - Architecture overview
- [INFRASTRUCTURE-STACK.md](final-clean-best-deployment-plan/INFRASTRUCTURE-STACK.md) - Stack details