# Data Segregation: System vs User Data
## Purpose
This document categorizes all models in the Django admin sidebar to identify:
- **SYSTEM DATA**: Configuration, templates, and settings that must be preserved (pre-configured, production-ready data)
- **USER DATA**: Account-specific, tenant-specific, or test data that can be cleaned up during the testing phase
---
## 1. Accounts & Tenancy
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| Account | USER DATA | Customer accounts (test accounts during development) | ✅ CLEAN - Remove test accounts |
| User | USER DATA | User profiles linked to accounts | ✅ CLEAN - Remove test users |
| Site | USER DATA | Sites/domains owned by accounts | ✅ CLEAN - Remove test sites |
| Sector | USER DATA | Sectors within sites (account-specific) | ✅ CLEAN - Remove test sectors |
| SiteUserAccess | USER DATA | User permissions per site | ✅ CLEAN - Remove test access records |

**Summary**: All models are USER DATA - Safe to clean for fresh production start

---
## 2. Global Resources
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| Industry | SYSTEM DATA | Global industry taxonomy (e.g., Healthcare, Finance, Technology) | ⚠️ KEEP - Pre-configured industries |
| IndustrySector | SYSTEM DATA | Sub-categories within industries (e.g., Cardiology, Investment Banking) | ⚠️ KEEP - Pre-configured sectors |
| SeedKeyword | MIXED DATA | Seed keywords for industries - can be seeded or user-generated | ⚠️ REVIEW - Keep system seeds, remove test seeds |

**Summary**:
- **KEEP**: Industry and IndustrySector (global taxonomy)
- **REVIEW**: SeedKeyword - separate system defaults from test data
---
## 3. Plans and Billing
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| Plan | SYSTEM DATA | Subscription plans (Free, Pro, Enterprise, etc.) | ⚠️ KEEP - Production pricing tiers |
| Subscription | USER DATA | Active subscriptions per account | ✅ CLEAN - Remove test subscriptions |
| Invoice | USER DATA | Generated invoices for accounts | ✅ CLEAN - Remove test invoices |
| Payment | USER DATA | Payment records | ✅ CLEAN - Remove test payments |
| CreditPackage | SYSTEM DATA | Available credit packages for purchase | ⚠️ KEEP - Production credit offerings |
| PaymentMethodConfig | SYSTEM DATA | Supported payment methods (Stripe, PayPal) | ⚠️ KEEP - Production payment configs |
| AccountPaymentMethod | USER DATA | Saved payment methods per account | ✅ CLEAN - Remove test payment methods |

**Summary**:
- **KEEP**: Plan, CreditPackage, PaymentMethodConfig (system pricing/config)
- **CLEAN**: Subscription, Invoice, Payment, AccountPaymentMethod (user transactions)
---
## 4. Credits
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| CreditTransaction | USER DATA | Credit add/subtract transactions | ✅ CLEAN - Remove test transactions |
| CreditUsageLog | USER DATA | Log of credit usage per operation | ✅ CLEAN - Remove test usage logs |
| CreditCostConfig | SYSTEM DATA | Cost configuration per operation type | ⚠️ KEEP - Production cost structure |
| PlanLimitUsage | USER DATA | Usage tracking per account/plan limits | ✅ CLEAN - Remove test usage data |

**Summary**:
- **KEEP**: CreditCostConfig (system cost rules)
- **CLEAN**: All transaction and usage logs (user activity)
---
## 5. Content Planning
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| Keywords | USER DATA | Keywords researched per site/sector | ✅ CLEAN - Remove test keywords |
| Clusters | USER DATA | Content clusters created per site | ✅ CLEAN - Remove test clusters |
| ContentIdeas | USER DATA | Content ideas generated for accounts | ✅ CLEAN - Remove test ideas |

**Summary**: All models are USER DATA - Safe to clean completely

---
## 6. Content Generation
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| Tasks | USER DATA | Content writing tasks assigned to users | ✅ CLEAN - Remove test tasks |
| Content | USER DATA | Generated content/articles | ✅ CLEAN - Remove test content |
| Images | USER DATA | Generated or uploaded images | ✅ CLEAN - Remove test images |

**Summary**: All models are USER DATA - Safe to clean completely

---
## 7. Taxonomy & Organization
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| ContentTaxonomy | USER DATA | Custom taxonomies (categories/tags) per site | ✅ CLEAN - Remove test taxonomies |
| ContentTaxonomyRelation | USER DATA | Relationships between content and taxonomies | ✅ CLEAN - Remove test relations |
| ContentClusterMap | USER DATA | Mapping of content to clusters | ✅ CLEAN - Remove test mappings |
| ContentAttribute | USER DATA | Custom attributes for content | ✅ CLEAN - Remove test attributes |

**Summary**: All models are USER DATA - Safe to clean completely

---
## 8. Publishing & Integration
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| SiteIntegration | USER DATA | WordPress/platform integrations per site | ✅ CLEAN - Remove test integrations |
| SyncEvent | USER DATA | Sync events between IGNY8 and external platforms | ✅ CLEAN - Remove test sync logs |
| PublishingRecord | USER DATA | Records of published content | ✅ CLEAN - Remove test publish records |
| PublishingChannel | SYSTEM DATA | Available publishing channels (WordPress, Ghost, etc.) | ⚠️ KEEP - Production channel configs |
| DeploymentRecord | USER DATA | Deployment history per account | ✅ CLEAN - Remove test deployments |

**Summary**:
- **KEEP**: PublishingChannel (system-wide channel definitions)
- **CLEAN**: All user-specific integration and sync data
---
## 9. AI & Automation
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| IntegrationSettings | MIXED DATA | API keys/settings for OpenAI, etc. | ⚠️ REVIEW - Keep system defaults, remove test configs |
| AIPrompt | SYSTEM DATA | AI prompt templates for content generation | ⚠️ KEEP - Production prompt library |
| Strategy | SYSTEM DATA | Content strategy templates | ⚠️ KEEP - Production strategy templates |
| AuthorProfile | SYSTEM DATA | Author persona templates | ⚠️ KEEP - Production author profiles |
| APIKey | USER DATA | User-generated API keys for platform access | ✅ CLEAN - Remove test API keys |
| WebhookConfig | USER DATA | Webhook configurations per account | ✅ CLEAN - Remove test webhooks |
| AutomationConfig | USER DATA | Automation rules per account/site | ✅ CLEAN - Remove test automations |
| AutomationRun | USER DATA | Execution history of automations | ✅ CLEAN - Remove test run logs |

**Summary**:
- **KEEP**: AIPrompt, Strategy, AuthorProfile (system templates)
- **REVIEW**: IntegrationSettings (separate system vs user API keys)
- **CLEAN**: APIKey, WebhookConfig, AutomationConfig, AutomationRun (user configs)
---
## 10. System Settings
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| ContentType | SYSTEM DATA | Django ContentTypes (auto-managed) | ⚠️ KEEP - Django core system table |
| ContentTemplate | SYSTEM DATA | Content templates for generation | ⚠️ KEEP - Production templates |
| TaxonomyConfig | SYSTEM DATA | Taxonomy configuration rules | ⚠️ KEEP - Production taxonomy rules |
| SystemSetting | SYSTEM DATA | Global system settings | ⚠️ KEEP - Production system config |
| ContentTypeConfig | SYSTEM DATA | Content type definitions (blog post, landing page, etc.) | ⚠️ KEEP - Production content types |
| NotificationConfig | SYSTEM DATA | Notification templates and rules | ⚠️ KEEP - Production notification configs |

**Summary**: All models are SYSTEM DATA - Must be kept and properly seeded for production

---
## 11. Django Admin
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| Group | SYSTEM DATA | Permission groups (Admin, Editor, Viewer, etc.) | ⚠️ KEEP - Production role definitions |
| Permission | SYSTEM DATA | Django permissions (auto-managed) | ⚠️ KEEP - Django core system table |
| PasswordResetToken | USER DATA | Password reset tokens (temporary) | ✅ CLEAN - Remove expired tokens |
| Session | USER DATA | User session data | ✅ CLEAN - Remove old sessions |

**Summary**:
- **KEEP**: Group, Permission (system access control)
- **CLEAN**: PasswordResetToken, Session (temporary user data)
---
## 12. Tasks & Logging
| Model | Type | Description | Clean/Keep |
|-------|------|-------------|------------|
| AITaskLog | USER DATA | Logs of AI operations per account | ✅ CLEAN - Remove test logs |
| AuditLog | USER DATA | Audit trail of user actions | ✅ CLEAN - Remove test audit logs |
| LogEntry | USER DATA | Django admin action logs | ✅ CLEAN - Remove test admin logs |
| TaskResult | USER DATA | Celery task execution results | ✅ CLEAN - Remove test task results |
| GroupResult | USER DATA | Celery group task results | ✅ CLEAN - Remove test group results |

**Summary**: All models are USER DATA - Safe to clean completely (logs/audit trails)

---
## Summary Table: Data Segregation by Category
| Category | System Data Models | User Data Models | Mixed/Review |
|----------|-------------------|------------------|--------------|
| **Accounts & Tenancy** | 0 | 5 | 0 |
| **Global Resources** | 2 | 0 | 1 |
| **Plans and Billing** | 3 | 4 | 0 |
| **Credits** | 1 | 3 | 0 |
| **Content Planning** | 0 | 3 | 0 |
| **Content Generation** | 0 | 3 | 0 |
| **Taxonomy & Organization** | 0 | 4 | 0 |
| **Publishing & Integration** | 1 | 4 | 0 |
| **AI & Automation** | 3 | 4 | 1 |
| **System Settings** | 6 | 0 | 0 |
| **Django Admin** | 2 | 2 | 0 |
| **Tasks & Logging** | 0 | 5 | 0 |
| **TOTAL** | **18** | **37** | **2** |
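
The totals in the table can be cross-checked against the per-section tables. The sketch below is a minimal illustration: the model names and types are copied from the tables in this document, while the dictionary layout and the `totals` helper are assumptions made for the example, not existing project code. Run as written, it prints `18 37 2`, matching the TOTAL row.

```python
from collections import Counter

# Model classification copied from the per-section tables in this document.
# The dictionary layout and the helper name are illustrative assumptions.
SYSTEM, USER, MIXED = "SYSTEM DATA", "USER DATA", "MIXED DATA"

CLASSIFICATION = {
    "Accounts & Tenancy": {
        USER: ["Account", "User", "Site", "Sector", "SiteUserAccess"],
    },
    "Global Resources": {
        SYSTEM: ["Industry", "IndustrySector"],
        MIXED: ["SeedKeyword"],
    },
    "Plans and Billing": {
        SYSTEM: ["Plan", "CreditPackage", "PaymentMethodConfig"],
        USER: ["Subscription", "Invoice", "Payment", "AccountPaymentMethod"],
    },
    "Credits": {
        SYSTEM: ["CreditCostConfig"],
        USER: ["CreditTransaction", "CreditUsageLog", "PlanLimitUsage"],
    },
    "Content Planning": {USER: ["Keywords", "Clusters", "ContentIdeas"]},
    "Content Generation": {USER: ["Tasks", "Content", "Images"]},
    "Taxonomy & Organization": {
        USER: ["ContentTaxonomy", "ContentTaxonomyRelation",
               "ContentClusterMap", "ContentAttribute"],
    },
    "Publishing & Integration": {
        SYSTEM: ["PublishingChannel"],
        USER: ["SiteIntegration", "SyncEvent", "PublishingRecord",
               "DeploymentRecord"],
    },
    "AI & Automation": {
        SYSTEM: ["AIPrompt", "Strategy", "AuthorProfile"],
        USER: ["APIKey", "WebhookConfig", "AutomationConfig", "AutomationRun"],
        MIXED: ["IntegrationSettings"],
    },
    "System Settings": {
        SYSTEM: ["ContentType", "ContentTemplate", "TaxonomyConfig",
                 "SystemSetting", "ContentTypeConfig", "NotificationConfig"],
    },
    "Django Admin": {
        SYSTEM: ["Group", "Permission"],
        USER: ["PasswordResetToken", "Session"],
    },
    "Tasks & Logging": {
        USER: ["AITaskLog", "AuditLog", "LogEntry", "TaskResult", "GroupResult"],
    },
}


def totals(classification):
    """Count models per data type across all categories."""
    counts = Counter()
    for by_type in classification.values():
        for data_type, models in by_type.items():
            counts[data_type] += len(models)
    return counts


summary = totals(CLASSIFICATION)
print(summary[SYSTEM], summary[USER], summary[MIXED])
```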
---
## Action Plan: Production Data Preparation
### Phase 1: Preserve System Data ⚠️
**Models to Keep & Seed Properly:**
1. **Global Taxonomy**
   - Industry (pre-populate 10-15 major industries)
   - IndustrySector (pre-populate 100+ sub-sectors)
   - SeedKeyword (system-level seed keywords per industry)
2. **Pricing & Plans**
   - Plan (Free, Starter, Pro, Enterprise tiers)
   - CreditPackage (credit bundles for purchase)
   - PaymentMethodConfig (Stripe, PayPal configs)
   - CreditCostConfig (cost per operation type)
3. **Publishing Channels**
   - PublishingChannel (WordPress, Ghost, Medium, etc.)
4. **AI & Content Templates**
   - AIPrompt (100+ production-ready prompts)
   - Strategy (content strategy templates)
   - AuthorProfile (author persona library)
   - ContentTemplate (article templates)
   - ContentTypeConfig (blog post, landing page, etc.)
5. **System Configuration**
   - SystemSetting (global platform settings)
   - TaxonomyConfig (taxonomy rules)
   - NotificationConfig (email/webhook templates)
6. **Access Control**
   - Group (Admin, Editor, Viewer, Owner roles)
   - Permission (Django-managed)
   - ContentType (Django-managed)
### Phase 2: Clean User/Test Data ✅
**Models to Truncate/Delete:**
1. **Account Data**: Account, User, Site, Sector, SiteUserAccess
2. **Billing Transactions**: Subscription, Invoice, Payment, AccountPaymentMethod, CreditTransaction
3. **Content Data**: Keywords, Clusters, ContentIdeas, Tasks, Content, Images
4. **Taxonomy Relations**: ContentTaxonomy, ContentTaxonomyRelation, ContentClusterMap, ContentAttribute
5. **Integration Data**: SiteIntegration, SyncEvent, PublishingRecord, DeploymentRecord
6. **User Configs**: APIKey, WebhookConfig, AutomationConfig, AutomationRun
7. **Logs, Usage & Temporary Data**: AITaskLog, AuditLog, LogEntry, TaskResult, GroupResult, CreditUsageLog, PlanLimitUsage, PasswordResetToken, Session
### Phase 3: Review Mixed Data ⚠️
**Models Requiring Manual Review:**
1. **SeedKeyword**: Separate system seeds from test data
2. **IntegrationSettings**: Keep system-level API configs, remove test account keys
---
## Database Cleanup Commands (Use with Caution)
### Safe Cleanup (Logs & Sessions)
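```python
from datetime import timedelta
from django.contrib.admin.models import LogEntry
from django.contrib.sessions.models import Session
from django.utils import timezone
from django_celery_results.models import TaskResult  # assumes django-celery-results
# AITaskLog, CreditUsageLog, PasswordResetToken: import from the project's own apps

now = timezone.now()
# Remove old logs (>90 days)
log_cutoff = now - timedelta(days=90)
AITaskLog.objects.filter(created_at__lt=log_cutoff).delete()
CreditUsageLog.objects.filter(created_at__lt=log_cutoff).delete()
LogEntry.objects.filter(action_time__lt=log_cutoff).delete()
# Remove expired sessions and tokens
Session.objects.filter(expire_date__lt=now).delete()
PasswordResetToken.objects.filter(expires_at__lt=now).delete()
# Remove old task results (>30 days)
TaskResult.objects.filter(date_done__lt=now - timedelta(days=30)).delete()
```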
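
The retention windows used above (90 days for logs, 30 days for task results) could be kept in one place rather than repeated per query. The following is a minimal sketch using only the standard library; the `RETENTION_DAYS` mapping and the `retention_cutoffs` helper are hypothetical names, and `timezone` here is `datetime.timezone`, not `django.utils.timezone`.

```python
from datetime import datetime, timedelta, timezone

# Retention windows taken from the cleanup snippet in this section; the dict
# itself is an illustrative assumption, not existing project configuration.
RETENTION_DAYS = {
    "AITaskLog": 90,
    "CreditUsageLog": 90,
    "LogEntry": 90,
    "TaskResult": 30,
}


def retention_cutoffs(now, retention_days=RETENTION_DAYS):
    """Return, per model name, the timestamp before which rows may be deleted."""
    return {
        model: now - timedelta(days=days)
        for model, days in retention_days.items()
    }


# Fixed reference time so the example is reproducible.
now = datetime(2025, 12, 20, tzinfo=timezone.utc)
cutoffs = retention_cutoffs(now)
for model, cutoff in cutoffs.items():
    print(f"{model}: delete rows older than {cutoff.date()}")
```

Each cutoff can then be passed to the matching `filter(...__lt=cutoff)` call, so changing a retention window means editing one number.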
### Full Test Data Cleanup (Development/Staging Only)
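```python
# WARNING: Only run in development/staging environments
# This will delete ALL user-generated data
from django.contrib.admin.models import LogEntry
from django_celery_results.models import GroupResult, TaskResult  # assumes django-celery-results
# Account, User, and the other project models: import from the project's own apps

# User data
# Deleting accounts removes related rows only where the foreign key uses
# on_delete=CASCADE; verify this per model before relying on it.
# If User has a CASCADE foreign key to Account, superusers attached to an
# account are deleted by this line as well.
Account.objects.all().delete()
User.objects.filter(is_superuser=False).delete()
# Remaining user data
SiteIntegration.objects.all().delete()
AutomationConfig.objects.all().delete()
APIKey.objects.all().delete()
WebhookConfig.objects.all().delete()
# Logs and history
AITaskLog.objects.all().delete()
AuditLog.objects.all().delete()
LogEntry.objects.all().delete()
TaskResult.objects.all().delete()
GroupResult.objects.all().delete()
```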
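
The development/staging-only warning could be enforced in code instead of relying on a comment. This is a minimal sketch under stated assumptions: the environment variable name `IGNY8_ENV`, the allow-list, and the `require_non_production` helper are all hypothetical and would need to match however the project actually identifies its environment.

```python
import os

# Hypothetical allow-list; the variable name IGNY8_ENV is an assumption.
ALLOWED_ENVIRONMENTS = {"development", "staging"}


class UnsafeEnvironmentError(RuntimeError):
    """Raised when destructive cleanup is attempted outside dev/staging."""


def require_non_production(env=None):
    """Return the normalized environment name, or raise if cleanup must not run."""
    if env is None:
        env = os.environ.get("IGNY8_ENV", "")
    env = env.strip().lower()
    if env not in ALLOWED_ENVIRONMENTS:
        raise UnsafeEnvironmentError(
            f"Refusing to delete user data: environment is {env!r}, "
            f"expected one of {sorted(ALLOWED_ENVIRONMENTS)}"
        )
    return env
```

Calling `require_non_production()` at the top of the cleanup script fails closed: an unset or unexpected value stops the run before any `delete()` executes.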
### Verify System Data Exists
```python
from django.contrib.auth.models import Group
# Industry, Plan, AIPrompt, etc.: import from the project's own apps

# Check system data is properly seeded
print(f"Industries: {Industry.objects.count()}")
print(f"Plans: {Plan.objects.count()}")
print(f"AI Prompts: {AIPrompt.objects.count()}")
print(f"Strategies: {Strategy.objects.count()}")
print(f"Content Templates: {ContentTemplate.objects.count()}")
print(f"Publishing Channels: {PublishingChannel.objects.count()}")
print(f"Groups: {Group.objects.count()}")
```
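
Printing counts still leaves the pass/fail decision to the reader. One possible refinement is to compare the counts against the seeding targets stated in Phase 1 (10-15 industries, 100+ sub-sectors, 100+ prompts). The sketch below works on a plain dictionary of counts, such as one built from `Model.objects.count()` calls; the `find_underseeded` helper, the default minimum of one row, and the example counts are all illustrative assumptions.

```python
# Minimum row counts taken from the Phase 1 targets in this document.
# Models without a stated target only need to be non-empty.
# The helper name and the default minimum are hypothetical.
MINIMUM_ROWS = {"Industry": 10, "IndustrySector": 100, "AIPrompt": 100}
DEFAULT_MINIMUM = 1


def find_underseeded(counts, minimums=MINIMUM_ROWS):
    """Return {model: (actual, required)} for models below their minimum."""
    problems = {}
    for model, actual in counts.items():
        required = minimums.get(model, DEFAULT_MINIMUM)
        if actual < required:
            problems[model] = (actual, required)
    return problems


# Made-up counts for illustration only.
example_counts = {"Industry": 12, "IndustrySector": 40, "AIPrompt": 120, "Plan": 0}
underseeded = find_underseeded(example_counts)
for model, (actual, required) in underseeded.items():
    print(f"{model}: {actual} rows, expected at least {required}")
```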
---
## Recommendations
### Before Production Launch:
1. **Export System Data**: Export all SYSTEM DATA models to fixtures for reproducibility
   ```bash
   python manage.py dumpdata igny8_core_auth.Industry > fixtures/industries.json
   python manage.py dumpdata igny8_core_auth.Plan > fixtures/plans.json
   python manage.py dumpdata system.AIPrompt > fixtures/prompts.json
   # ... repeat for all system models
   ```
2. **Create Seed Script**: Create a management command that populates a fresh database with system data
   ```bash
   python manage.py seed_system_data
   ```
3. **Database Snapshot**: Take a snapshot after system data is seeded and before any user data is created
4. **Separate Databases**: Consider keeping a separate staging database with full test data, while production starts clean
5. **Data Migration Plan**:
   - If migrating from old system: Only migrate Account, User, Content, and critical user data
   - Leave test data behind in old system
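
The export commands in step 1 could also be generated from a single mapping, so that adding a system model means adding one line. This is a minimal sketch: the three model labels and file paths are the ones shown in step 1, the `dumpdata_command` helper is a hypothetical name, and `--indent` and `--output` are standard `dumpdata` options used here in place of shell redirection. The remaining system models are deliberately not listed because their app labels are not given in this document.

```python
import shlex

# Model labels and fixture paths copied from the dumpdata commands in step 1.
# The helper name is hypothetical.
FIXTURES = {
    "igny8_core_auth.Industry": "fixtures/industries.json",
    "igny8_core_auth.Plan": "fixtures/plans.json",
    "system.AIPrompt": "fixtures/prompts.json",
}


def dumpdata_command(label, path):
    """Build the argv for exporting one model to a fixture file."""
    return [
        "python", "manage.py", "dumpdata", label,
        "--indent", "2",
        "--output", path,
    ]


# Print shell-safe command lines; pass the argv lists to subprocess.run
# instead if the commands should be executed directly.
for label, path in FIXTURES.items():
    print(shlex.join(dumpdata_command(label, path)))
```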
---
## Next Steps
1. ✅ Review this document and confirm data segregation logic
2. ⚠️ Create fixtures/seeds for all 18 SYSTEM DATA models (ContentType and Permission are recreated by `migrate`, so they normally need no fixture)
3. ⚠️ Review 2 MIXED DATA models (SeedKeyword, IntegrationSettings)
4. ✅ Create cleanup script for 37 USER DATA models
5. ✅ Test cleanup script in staging environment
6. ✅ Execute cleanup before production launch
---
*Generated: December 20, 2025*
*Purpose: Production data preparation and test data cleanup*