# IGNY8 Seed Keywords Import Scripts

This folder contains scripts for importing seed keywords from the KW_DB folder structure into the IGNY8 global keywords database.

## 📁 Folder Structure

```
/data/app/igny8/KW_DB/
  {Industry}/          # e.g., HealthCare_Medical
    {Sector}/          # e.g., Physiotherapy_Rehabilitation
      *.csv            # Keyword CSV files
```

## 🔧 Available Scripts

### 1. `import_seed_keywords_single.py`

Import keywords from a **single CSV file** (for testing).

**Usage:**
```bash
# Dry run (preview only)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation" \
  --dry-run --verbose

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation"
```

**Options:**
- `--csv` - Path to CSV file (required)
- `--industry` - Industry name (required)
- `--sector` - Sector name (required)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword

---

### 2. `import_all_seed_keywords.py`

Import keywords from **all CSV files** in the KW_DB folder structure.

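How the bulk script walks the folders is not shown in this README. The following is only a minimal sketch, assuming the `{Industry}/{Sector}/*.csv` layout from the Folder Structure section and the naming rules listed in the Notes (underscores become spaces, slugs are derived from names). `CsvTarget`, `folder_to_name`, `name_to_slug`, and `discover_csv_files` are hypothetical names for illustration, and the slug rule shown is an assumption rather than the script's actual implementation.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple


class CsvTarget(NamedTuple):
    """One CSV file plus the display names derived from its folders."""
    industry: str   # e.g. "HealthCare Medical"
    sector: str     # e.g. "Physiotherapy Rehabilitation"
    csv_path: Path


def folder_to_name(folder: str) -> str:
    """Convert a folder name to a display name (underscores -> spaces)."""
    return folder.replace("_", " ").strip()


def name_to_slug(name: str) -> str:
    """Assumed slug rule: lowercase, runs of other characters -> one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def discover_csv_files(base_path: Path) -> Iterator[CsvTarget]:
    """Yield every CSV found exactly at {base}/{Industry}/{Sector}/*.csv."""
    for csv_path in sorted(base_path.glob("*/*/*.csv")):
        sector_dir = csv_path.parent
        industry_dir = sector_dir.parent
        yield CsvTarget(
            industry=folder_to_name(industry_dir.name),
            sector=folder_to_name(sector_dir.name),
            csv_path=csv_path,
        )
```

With this sketch, a file under `HealthCare_Medical/Physiotherapy_Rehabilitation/` yields the industry "HealthCare Medical" and the sector "Physiotherapy Rehabilitation". CSV files placed directly in the base folder or one level too shallow are not matched by the glob pattern.
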
**Usage:**
```bash
# Dry run (preview all imports)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB \
  --dry-run

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB
```

**Options:**
- `--base-path` - Base path to KW_DB folder (default: /data/app/igny8/KW_DB)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword

---

## 📊 CSV File Format

Expected CSV columns:
- **Keyword** (required) - The keyword text
- **Country** (optional) - Country code (default: US)
- **Volume** (optional) - Search volume (default: 0)
- **Difficulty** (optional) - Keyword difficulty 0-100 (default: 0)
- **CPC** (ignored) - Not imported
- **Parent Keyword** (ignored) - Not imported

Example:
```csv
Keyword,Country,Volume,Difficulty,CPC,Parent Keyword
physical therapy,us,12000,45,3.20,
tens unit,us,5000,32,2.50,physical therapy
```

---

## 🔍 Duplicate Handling

**Duplicate Check:** `keyword + country` (case-insensitive) within the same industry + sector

- If a keyword with the same country already exists in the same industry + sector, the row is **skipped**
- Example: "physical therapy [US]" in "HealthCare Medical > Physiotherapy Rehabilitation" is skipped if it already exists

---

## 🗄️ Database Models

### Industry
- `name` - Industry name (e.g., "HealthCare Medical")
- `slug` - URL-friendly slug (e.g., "healthcare-medical")
- `is_active` - Active status (default: True)

### IndustrySector
- `name` - Sector name (e.g., "Physiotherapy Rehabilitation")
- `slug` - URL-friendly slug (e.g., "physiotherapy-rehabilitation")
- `industry` - Foreign key to Industry
- `is_active` - Active status (default: True)

### SeedKeyword
- `keyword` - Keyword text
- `industry` - Foreign key to Industry
- `sector` - Foreign key to IndustrySector
- `country` - Country code (e.g., "US")
- `volume` - Search volume
- `difficulty` - Keyword difficulty (0-100)
- `is_active` - Active status (default: True)

**Unique Constraint:** `keyword + industry + sector` (at model level)

**Script Duplicate Check:** `keyword + country + industry + sector` (also compares `country`, which the model constraint does not)

---

## ✅ Verification

After import, verify the data:

```bash
# Check counts in Django shell
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell
>>> from igny8_core.auth.models import Industry, IndustrySector, SeedKeyword
>>> Industry.objects.count()
>>> IndustrySector.objects.count()
>>> SeedKeyword.objects.count()
>>> SeedKeyword.objects.filter(industry__name="HealthCare Medical").count()
```

Or check in Django admin:
- Industries: `/admin/auth/industry/`
- Sectors: `/admin/auth/industrysector/`
- Keywords: `/admin/auth/seedkeyword/`

---

## 🐛 Troubleshooting

### Issue: "ModuleNotFoundError: No module named 'igny8_core'"
**Solution:** The script must run inside the Docker container:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 /app/scripts/...
```

### Issue: "CSV file not found"
**Solution:** Use the full path as seen inside the container: `/data/app/igny8/KW_DB/...`

### Issue: Keywords not importing (showing as duplicates)
**Solution:** Check whether the keywords already exist:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell
>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.filter(keyword__iexact="physical therapy", country="US")
```

### Issue: Want to re-import after cleaning database
**Solution:** Delete existing keywords first. Run only the one delete statement that matches your intent:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell
>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.all().delete()  # Option A: delete all keywords
>>> SeedKeyword.objects.filter(industry__slug="healthcare-medical").delete()  # Option B: delete one industry only
```

---

## 📈 Import Statistics

The scripts report the following statistics:
- Total rows processed
- Keywords imported successfully
- Duplicates skipped (keyword + country)
- Invalid rows skipped (empty keywords, bad data)
- Errors encountered
- Industries/Sectors created

Example output:
```
======================================================================
IMPORT SUMMARY
======================================================================
Total rows processed: 4,523
✓ Imported: 4,201
⊘ Skipped (duplicate): 280
⊘ Skipped (invalid): 37
✗ Errors: 5
======================================================================
```

---

## 🚀 Quick Start Guide

1. **Test with a single file first (dry run):**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_seed_keywords_single.py \
     --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
     --industry "HealthCare Medical" \
     --sector "Physiotherapy Rehabilitation" \
     --dry-run
   ```

2. **If successful, remove `--dry-run` and run the actual import:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_seed_keywords_single.py \
     --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
     --industry "HealthCare Medical" \
     --sector "Physiotherapy Rehabilitation"
   ```

3. **Verify in Django admin:** `/admin/auth/seedkeyword/`

4. **Preview the import of all files (dry run):**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_all_seed_keywords.py \
     --base-path /data/app/igny8/KW_DB \
     --dry-run
   ```

5. **If successful, run the actual bulk import:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_all_seed_keywords.py \
     --base-path /data/app/igny8/KW_DB
   ```

---

## 📝 Notes

- Scripts automatically create Industries and Sectors if they don't exist
- Folder names are converted to display names (underscores → spaces)
- Slugs are auto-generated from names
- All imports happen within transactions for data integrity
- Dry-run mode uses transaction rollback (no database changes)
- Empty or invalid CSV rows are skipped with warnings

---

## 🔗 Related Documentation

- [Database Models](/docs/90-REFERENCE/MODELS.md)
- [Django Admin Access](/docs/90-REFERENCE/DJANGO-ADMIN-ACCESS-GUIDE.md)
- [System Architecture](/docs/00-SYSTEM/ARCHITECTURE.md)

---

**Author:** IGNY8 Team
**Created:** January 13, 2026
**Last Updated:** January 13, 2026