Dev ops prep

IGNY8 VPS (Salman)
2026-01-21 17:53:42 +00:00
parent 3398130bff
commit 4a200822bb
14 changed files with 2027 additions and 0 deletions

@@ -0,0 +1,303 @@
# DevOps Operations Guide

**Purpose:** Complete operational procedures for managing IGNY8 in production

**Version:** 1.0

**Last Updated:** January 20, 2026

---
## 📋 Executive Summary
This document provides a complete structure for:
1. **Automated Backups** - Regular database + config backups
2. **Environment Management** - Dev vs Staging vs Production
3. **Health Monitoring** - Automated health checks & alerts
4. **Disaster Recovery** - Quick recovery procedures
5. **Change Management** - Safe deployment workflow
---
## 🗂️ Directory Structure (To Be Implemented)
```
/data/
├── app/
│   └── igny8/                              # Application code
│       ├── docker-compose.app.yml          # Production compose ✅
│       ├── docker-compose.staging.yml      # Staging compose ⚠️ TO CREATE
│       ├── .env                            # Production env
│       ├── .env.staging                    # Staging env ⚠️ TO CREATE
│       └── scripts/
│           └── ops/                        # ⚠️ TO CREATE
│               ├── backup-db.sh            # Database backup
│               ├── backup-full.sh          # Full backup (db + code + config)
│               ├── restore-db.sh           # Database restore
│               ├── deploy-staging.sh       # Deploy to staging
│               ├── deploy-production.sh    # Deploy to production
│               ├── rollback.sh             # Rollback deployment
│               ├── health-check.sh         # System health check
│               ├── sync-prod-to-staging.sh # Sync data
│               └── log-rotate.sh           # Log rotation
├── backups/                                # Backup storage
│   ├── daily/                              # Daily automated backups
│   │   └── YYYYMMDD/
│   │       ├── db_igny8_YYYYMMDD_HHMMSS.sql.gz
│   │       └── config_YYYYMMDD.tar.gz
│   ├── weekly/                             # Weekly backups (kept 4 weeks)
│   ├── monthly/                            # Monthly backups (kept 12 months)
│   └── pre-deploy/                         # Pre-deployment snapshots
│       └── YYYYMMDD_HHMMSS/
├── logs/                                   # Centralized logs
│   ├── production/
│   │   ├── backend.log
│   │   ├── celery-worker.log
│   │   ├── celery-beat.log
│   │   └── access.log
│   ├── staging/
│   └── caddy/
└── stack/                                  # Infrastructure stack
    └── igny8-stack/                        # (Future - not yet separated)
```
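The parts of this layout that do not exist yet could be bootstrapped with a few lines of shell. The following is a minimal sketch, not an existing script: `create_layout` and its base-directory argument are hypothetical names, and the `stack/` subtree is left out because it is marked as future work.

```bash
#!/bin/sh
# Sketch: create the directory skeleton from the tree above.
# create_layout is a hypothetical helper, not part of the existing tooling.

create_layout() {
    base="${1:-/data}"   # root of the tree; /data in the layout above
    for dir in \
        app/igny8/scripts/ops \
        backups/daily \
        backups/weekly \
        backups/monthly \
        backups/pre-deploy \
        logs/production \
        logs/staging \
        logs/caddy
    do
        # -p creates missing parents and is a no-op for existing directories
        mkdir -p "$base/$dir" || return 1
    done
}

# Example (run as a user with write access to /data):
#   create_layout /data
```

Because `mkdir -p` is idempotent, the function can be re-run safely after partial setup.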
---
## 🔄 Automated Backup System
### Backup Strategy
| Type | Schedule | Retention | Content |
|------|----------|-----------|---------|
| **Daily** | Every day, 1:00 AM | 7 days | Database + configs |
| **Weekly** | Sunday, 2:00 AM | 4 weeks | Full backup |
| **Monthly** | 1st of month, 3:00 AM | 12 months | Full backup |
| **Pre-Deploy** | Before each deploy | 5 most recent | Database snapshot |
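
The retention column amounts to "keep the N newest backup directories and delete the rest". The sketch below illustrates one way to do that, assuming each backup lives in a directory named `YYYYMMDD` or `YYYYMMDD_HHMMSS` so that lexical order matches chronological order. `prune_backups` is a hypothetical helper name, not an existing script.

```bash
#!/bin/sh
# Sketch: keep the N newest dated sub-directories of a backup folder.
# Assumes names like YYYYMMDD or YYYYMMDD_HHMMSS, which sort chronologically.
# prune_backups is a hypothetical helper, not part of the existing tooling.

prune_backups() {
    dir="$1"    # e.g. /data/backups/daily
    keep="$2"   # e.g. 7 for daily, 5 for pre-deploy
    [ -d "$dir" ] || return 1

    # List dated directories newest first, skip the first $keep, delete the rest.
    # Entries that do not look like a date stamp are never touched.
    ls -1 "$dir" | grep -E '^[0-9]{8}(_[0-9]{6})?$' | sort -r |
        tail -n "+$((keep + 1))" |
        while IFS= read -r name; do
            rm -rf "${dir:?}/$name"
        done
}

# Example:
#   prune_backups /data/backups/daily 7
#   prune_backups /data/backups/pre-deploy 5
```

Filtering on the date pattern before deleting is a deliberate safety choice: a stray file or directory in the backup folder is ignored rather than removed.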
### Cron Schedule
```bash
# /etc/cron.d/igny8-backup
# Daily database backup at 1:00 AM
0 1 * * * root /data/app/igny8/scripts/ops/backup-db.sh daily >> /data/logs/backup.log 2>&1
# Weekly full backup on Sunday at 2:00 AM
0 2 * * 0 root /data/app/igny8/scripts/ops/backup-full.sh weekly >> /data/logs/backup.log 2>&1
# Monthly full backup on 1st at 3:00 AM
0 3 1 * * root /data/app/igny8/scripts/ops/backup-full.sh monthly >> /data/logs/backup.log 2>&1
# Health check every 5 minutes
*/5 * * * * root /data/app/igny8/scripts/ops/health-check.sh >> /data/logs/health.log 2>&1
# Log rotation daily at midnight
0 0 * * * root /data/app/igny8/scripts/ops/log-rotate.sh >> /data/logs/maintenance.log 2>&1
```
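The cron entries pass the backup type (`daily`, `weekly`, `monthly`) as the first argument. As a rough sketch of what the core of `backup-db.sh` might look like, the function below writes a compressed dump using the file naming from the directory structure. `BACKUP_ROOT` and `backup_db` are hypothetical names, and the dump command is supplied by the caller because this guide does not specify the database engine or container name.

```bash
#!/bin/sh
# Sketch of a backup-db.sh core. BACKUP_ROOT and backup_db are hypothetical.
# The dump command is passed in by the caller: the database engine and
# container names are not specified in this guide.

BACKUP_ROOT="${BACKUP_ROOT:-/data/backups}"

backup_db() {
    type="$1"; shift                  # daily | weekly | monthly
    stamp=$(date +%Y%m%d_%H%M%S)
    day=${stamp%%_*}                  # YYYYMMDD part of the stamp
    dest="$BACKUP_ROOT/$type/$day"
    out="$dest/db_igny8_$stamp.sql.gz"

    mkdir -p "$dest" || return 1

    # Dump to a temp file first so a failed dump never leaves behind an
    # archive that looks like a valid backup.
    if ! "$@" > "$out.tmp"; then
        rm -f "$out.tmp"
        return 1
    fi
    if ! gzip -c "$out.tmp" > "$out"; then
        rm -f "$out.tmp" "$out"
        return 1
    fi
    rm -f "$out.tmp"
    echo "$out"                       # print the archive path for logging
}

# Example (container name and user are placeholders, assuming PostgreSQL):
#   backup_db daily docker exec <db-container> pg_dump -U <db-user> igny8_db
```

The temp-file step costs extra disk space during the dump. It is used here because plain `sh` has no `pipefail`, so a direct `dump | gzip` pipeline would hide a failed dump.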
---
## 🌍 Environment Management
### Environment Comparison
| Aspect | Development | Staging | Production |
|--------|-------------|---------|------------|
| **Domain** | localhost:5173 | staging.igny8.com | app.igny8.com |
| **API** | localhost:8010 | staging-api.igny8.com | api.igny8.com |
| **Database** | igny8_dev_db | igny8_staging_db | igny8_db |
| **Redis DB** | 2 | 1 | 0 |
| **Debug** | True | False | False |
| **AI Keys** | Test/Limited | Test/Limited | Production |
| **Payments** | Sandbox | Sandbox | Live |
| **Compose File** | docker-compose.dev.yml | docker-compose.staging.yml | docker-compose.app.yml |
| **Project Name** | igny8-dev | igny8-staging | igny8-app |
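
Since every server-side environment is identified by a compose file plus a project name, a small wrapper can reduce the risk of running a command against the wrong one. The following is a minimal sketch covering staging and production only. `APP_DIR`, `compose_args`, and `dc` are hypothetical names, and the staging compose file is assumed to live next to the production one.

```bash
#!/bin/sh
# Sketch: map an environment name to its compose file and project name,
# using the values from the table above. APP_DIR, compose_args and dc are
# hypothetical names, not part of the existing tooling.

APP_DIR="${APP_DIR:-/data/app/igny8}"

compose_args() {
    case "$1" in
        staging)    echo "-f $APP_DIR/docker-compose.staging.yml -p igny8-staging" ;;
        production) echo "-f $APP_DIR/docker-compose.app.yml -p igny8-app" ;;
        *)          echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# dc <env> <compose subcommand...>
dc() {
    env_name="$1"; shift
    args=$(compose_args "$env_name") || return 1
    # Intentionally unquoted: $args holds several words, so APP_DIR must not
    # contain spaces.
    docker compose $args "$@"
}

# Example:
#   dc staging ps
#   dc production logs --tail 100
```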
### Port Allocation
| Service | Dev | Staging | Production |
|---------|-----|---------|------------|
| Backend | 8010 | 8012 | 8011 |
| Frontend | 5173 | 8024 | 8021 |
| Marketing | 5174 | 8026 | 8023 |
| Flower | - | 5556 | 5555 |
---
## 🚀 Deployment Workflow
### Safe Deployment Checklist
```
┌─────────────────────────────────────────────────────────────┐
│                    DEPLOYMENT CHECKLIST                     │
├─────────────────────────────────────────────────────────────┤
│  PRE-DEPLOYMENT                                             │
│  □ All tests passing on staging?                            │
│  □ Database migrations reviewed?                            │
│  □ Backup created?                                          │
│  □ Rollback plan ready?                                     │
│  □ Team notified?                                           │
├─────────────────────────────────────────────────────────────┤
│  DEPLOYMENT                                                 │
│  □ Create pre-deploy backup                                 │
│  □ Tag current images for rollback                          │
│  □ Pull latest code                                         │
│  □ Build new images                                         │
│  □ Apply migrations                                         │
│  □ Restart containers                                       │
│  □ Verify health check                                      │
├─────────────────────────────────────────────────────────────┤
│  POST-DEPLOYMENT                                            │
│  □ Monitor logs for 10 minutes                              │
│  □ Test critical paths (login, API, AI functions)           │
│  □ Check error rates                                        │
│  □ If issues → ROLLBACK                                     │
│  □ Update changelog                                         │
└─────────────────────────────────────────────────────────────┘
```
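The DEPLOYMENT block of the checklist is an ordered sequence in which any failed step should stop the rollout before later steps run. One way to enforce that in a deploy script is sketched below. `run_steps` is a hypothetical helper, and the commented step list is illustrative: it assumes `backup-db.sh` accepts a `pre-deploy` type, and it leaves out image tagging, the code pull, and migrations because their exact commands are not given in this guide.

```bash
#!/bin/sh
# Sketch: run deployment steps in order and stop at the first failure.
# run_steps is a hypothetical helper; it reads one shell command per line.

run_steps() {
    n=0
    while IFS= read -r step; do
        # Skip blank lines and comments in the step list.
        case "$step" in ''|'#'*) continue ;; esac
        n=$((n + 1))
        echo "[step $n] $step"
        # </dev/null keeps a step from consuming the remaining step list.
        if ! sh -c "$step" </dev/null; then
            echo "[step $n] FAILED, aborting" >&2
            return 1
        fi
    done
    echo "all $n steps completed"
}

# Example step list (illustrative, see assumptions in the text above):
#   run_steps <<'EOF'
#   /data/app/igny8/scripts/ops/backup-db.sh pre-deploy
#   docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app build
#   docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app up -d
#   /data/app/igny8/scripts/ops/health-check.sh
#   EOF
```

A non-zero return from `run_steps` is the natural trigger for `rollback.sh`.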
### Git Branch Strategy
```
                    ┌──────────┐
                    │   main   │ ← Production deployments
                    └────▲─────┘
                         │ merge (after staging approval)
                    ┌────┴─────┐
                    │ staging  │ ← Staging deployments
                    └────▲─────┘
                         │ merge
        ┌────────────────┼────────────────┐
        │                │                │
┌───────┴───────┐ ┌──────┴──────┐ ┌───────┴───────┐
│  feature/xyz  │ │ feature/abc │ │ hotfix/urgent │
└───────────────┘ └─────────────┘ └───────────────┘
```
---
## 🏥 Health Monitoring
### Health Check Endpoints
| Endpoint | Purpose | Expected Response |
|----------|---------|-------------------|
| `/api/v1/system/status/` | Overall system status | `{"status": "healthy"}` |
| `/api/v1/system/health/` | Detailed component health | JSON with all components |
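
A probe for the status endpoint could look roughly like the sketch below. `API_BASE`, `is_healthy`, and `check_status` are hypothetical names, and the default base URL assumes HTTPS on the production API domain from the environment table. The match is a plain text pattern, so a JSON-aware tool such as `jq` would be more robust if the response ever nests a `status` field inside other objects.

```bash
#!/bin/sh
# Sketch: probe the status endpoint and check for {"status": "healthy"}.
# API_BASE, is_healthy and check_status are hypothetical names.

API_BASE="${API_BASE:-https://api.igny8.com}"

# Reads a response body on stdin; succeeds only if it reports "healthy".
is_healthy() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Returns 0 if healthy, 1 if reachable but not healthy, 2 if the request failed.
check_status() {
    # -f: fail on HTTP errors, -sS: quiet but still print errors,
    # --max-time: give up after 5 seconds in total
    body=$(curl -fsS --max-time 5 "$API_BASE/api/v1/system/status/") || return 2
    printf '%s' "$body" | is_healthy
}

# Example:
#   check_status || echo "status endpoint unhealthy or unreachable"
```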
### Monitoring Targets
1. **Backend API** - Response time < 500ms
2. **Database** - Connection pool healthy
3. **Redis** - Connection alive
4. **Celery Workers** - Queue length < 100
5. **Celery Beat** - Scheduler running
6. **Disk Space** - > 20% free
7. **Memory** - < 80% used
### Alert Thresholds
| Metric | Warning | Critical |
|--------|---------|----------|
| API Response Time | > 1s | > 5s |
| Error Rate | > 1% | > 5% |
| CPU Usage | > 70% | > 90% |
| Memory Usage | > 70% | > 90% |
| Disk Usage | > 70% | > 90% |
| Celery Queue | > 50 | > 200 |
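
Each row of the threshold table follows the same rule: above the critical value is critical, above the warning value is a warning, anything else is fine. A minimal sketch of that rule for integer metrics follows. `classify` and `disk_used_pct` are hypothetical helpers, and how each metric is collected is left to `health-check.sh`.

```bash
#!/bin/sh
# Sketch: classify an integer metric against warning/critical thresholds.
# classify and disk_used_pct are hypothetical helpers.

# classify <value> <warning> <critical>
classify() {
    value="$1"; warn="$2"; crit="$3"
    if [ "$value" -gt "$crit" ]; then
        echo CRITICAL
    elif [ "$value" -gt "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}

# Percentage of space used on the filesystem holding $1 (default /).
# df -P prints a header row, then one row whose 5th column is e.g. "42%".
disk_used_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example, using the Disk Usage row (warning > 70, critical > 90):
#   classify "$(disk_used_pct /data)" 70 90
```

The comparisons are strict, matching the `>` in the table: a value exactly at a threshold stays in the lower band.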
---
## 🔧 Common Operations
### Daily Operations
```bash
# Check system health
/data/app/igny8/scripts/ops/health-check.sh
# View logs
tail -f /data/logs/production/backend.log
# Check container status
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app ps
```
### Weekly Operations
```bash
# Review backup status
ls -la /data/backups/daily/
du -sh /data/backups/*
# Check disk space
df -h
# Review error logs
grep -i error /data/logs/production/backend.log | tail -50
```
### Emergency Procedures
```bash
# Immediate rollback
/data/app/igny8/scripts/ops/rollback.sh
# Emergency restart
docker compose -f /data/app/igny8/docker-compose.app.yml -p igny8-app restart
# Emergency database restore (pass the backup archive to restore)
/data/app/igny8/scripts/ops/restore-db.sh /data/backups/daily/YYYYMMDD/db_igny8_YYYYMMDD_HHMMSS.sql.gz
```
---
## 📊 What's Missing (Action Items)
### Priority 1 - Critical (Before Go-Live)
| Item | Status | Action |
|------|--------|--------|
| `docker-compose.staging.yml` | ❌ Missing | Create from documentation |
| `.env.staging` | ❌ Missing | Create from example |
| Deployment scripts | ❌ Missing | Create all ops scripts |
| Automated backup cron | ❌ Missing | Set up cron jobs |
| Pre-deploy backup | ❌ Missing | Add to deploy script |
### Priority 2 - Important (First Week)
| Item | Status | Action |
|------|--------|--------|
| Health check automation | ❌ Missing | Create monitoring |
| Log rotation | ❌ Missing | Set up logrotate |
| Staging DNS | ❌ Unknown | Configure if needed |
| Caddyfile staging routes | ❌ Unknown | Add staging domains |
### Priority 3 - Nice to Have (First Month)
| Item | Status | Action |
|------|--------|--------|
| CI/CD pipeline | ❌ Not set | Optional automation |
| External monitoring | ❌ Not set | UptimeRobot/Datadog |
| Alerting system | ❌ Not set | Email/Slack alerts |
---
## Next Steps
1. **Create ops scripts directory**: `/data/app/igny8/scripts/ops/`
2. **Create all deployment scripts** (see STAGING-SETUP-GUIDE.md)
3. **Create staging compose file** (copy from documentation)
4. **Set up automated backups**
5. **Test complete deployment cycle** on staging
6. **Go live** once the staging deployment cycle passes
---
## Related Documentation
- [STAGING-SETUP-GUIDE.md](final-clean-best-deployment-plan/STAGING-SETUP-GUIDE.md) - Detailed staging setup
- [TWO-REPO-ARCHITECTURE.md](final-clean-best-deployment-plan/TWO-REPO-ARCHITECTURE.md) - Architecture overview
- [INFRASTRUCTURE-STACK.md](final-clean-best-deployment-plan/INFRASTRUCTURE-STACK.md) - Stack details