KW_DB & Management of KW

IGNY8 VPS (Salman)
2026-01-13 12:00:16 +00:00
parent d2b733640c
commit 95e316cde2
37 changed files with 9224 additions and 0 deletions

backend/scripts/README.md Normal file

@@ -0,0 +1,265 @@
# IGNY8 Seed Keywords Import Scripts
This folder contains scripts for importing seed keywords from the KW_DB folder structure into the IGNY8 global keywords database.
## 📁 Folder Structure
```
/data/app/igny8/KW_DB/
  {Industry}/          # e.g., HealthCare_Medical
    {Sector}/          # e.g., Physiotherapy_Rehabilitation
      *.csv            # Keyword CSV files
```
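As a rough illustration of how this layout can be traversed, the sketch below yields one `(industry folder, sector folder, CSV path)` tuple per file. The function name `discover_csv_files` is hypothetical and is not taken from the actual scripts; it assumes CSV files sit exactly two levels below the base path.

```python
from pathlib import Path
from typing import Iterator, Tuple


def discover_csv_files(base_path: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (industry_folder, sector_folder, csv_path) for each CSV file.

    Hypothetical helper: only files matching {Industry}/{Sector}/*.csv
    directly under base_path are returned; anything shallower or deeper
    is ignored.
    """
    base = Path(base_path)
    for csv_path in sorted(base.glob("*/*/*.csv")):
        sector_dir = csv_path.parent
        industry_dir = sector_dir.parent
        yield industry_dir.name, sector_dir.name, csv_path
```

Calling `list(discover_csv_files("/data/app/igny8/KW_DB"))` would return the raw folder names, which still contain underscores at this point.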
## 🔧 Available Scripts
### 1. `import_seed_keywords_single.py`
Import keywords from a **single CSV file** (for testing).
**Usage:**
```bash
# Dry run (preview only)
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_seed_keywords_single.py \
--csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
--industry "HealthCare Medical" \
--sector "Physiotherapy Rehabilitation" \
--dry-run --verbose
# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_seed_keywords_single.py \
--csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
--industry "HealthCare Medical" \
--sector "Physiotherapy Rehabilitation"
```
**Options:**
- `--csv` - Path to CSV file (required)
- `--industry` - Industry name (required)
- `--sector` - Sector name (required)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword
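For orientation, the options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual source; `build_parser` is a hypothetical name, and only the flag names and their required/optional status come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Import seed keywords from a single CSV file (sketch)."
    )
    parser.add_argument("--csv", required=True, help="Path to CSV file")
    parser.add_argument("--industry", required=True, help="Industry name")
    parser.add_argument("--sector", required=True, help="Sector name")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview without saving to database")
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed progress for each keyword")
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
# "keywords.csv" is a placeholder path used only for illustration.
args = build_parser().parse_args([
    "--csv", "keywords.csv",
    "--industry", "HealthCare Medical",
    "--sector", "Physiotherapy Rehabilitation",
    "--dry-run",
])
```

Note that `argparse` exposes `--dry-run` as `args.dry_run`.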
---
### 2. `import_all_seed_keywords.py`
Import keywords from **all CSV files** in the KW_DB folder structure.
**Usage:**
```bash
# Dry run (preview all imports)
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_all_seed_keywords.py \
--base-path /data/app/igny8/KW_DB \
--dry-run
# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_all_seed_keywords.py \
--base-path /data/app/igny8/KW_DB
```
**Options:**
- `--base-path` - Base path to KW_DB folder (default: /data/app/igny8/KW_DB)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword
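One common way to implement a preview that does not save anything is to perform the inserts inside a transaction and roll it back at the end. The sketch below shows that idea with the standard-library `sqlite3` module as a stand-in for the project's real database; the table name `seed_keyword` and the function `import_keywords` are assumptions made for illustration only.

```python
import sqlite3
from typing import List


def import_keywords(conn: sqlite3.Connection, keywords: List[str], dry_run: bool) -> int:
    """Insert keywords in one transaction; roll back instead of committing on dry run."""
    inserted = 0
    try:
        for keyword in keywords:
            conn.execute("INSERT INTO seed_keyword (keyword) VALUES (?)", (keyword,))
            inserted += 1
    except Exception:
        conn.rollback()  # Undo partial work if any insert fails
        raise
    if dry_run:
        conn.rollback()  # Preview only: discard everything inserted above
    else:
        conn.commit()
    return inserted


# In-memory database with a hypothetical, simplified table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seed_keyword (keyword TEXT NOT NULL)")
conn.commit()

previewed = import_keywords(conn, ["physical therapy", "tens unit"], dry_run=True)
rows_after_preview = conn.execute("SELECT COUNT(*) FROM seed_keyword").fetchone()[0]

imported = import_keywords(conn, ["physical therapy", "tens unit"], dry_run=False)
rows_after_import = conn.execute("SELECT COUNT(*) FROM seed_keyword").fetchone()[0]
```

The dry run reports how many rows would be inserted, yet the table stays empty until the second call commits.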
---
## 📊 CSV File Format
Expected CSV columns:
- **Keyword** (required) - The keyword text
- **Country** (optional) - Country code (default: US)
- **Volume** (optional) - Search volume (default: 0)
- **Difficulty** (optional) - Keyword difficulty 0-100 (default: 0)
- **CPC** (ignored) - Not imported
- **Parent Keyword** (ignored) - Not imported
Example:
```csv
Keyword,Country,Volume,Difficulty,CPC,Parent Keyword
physical therapy,us,12000,45,3.20,
tens unit,us,5000,32,2.50,physical therapy
```
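To make the column rules concrete, here is a minimal parsing sketch that applies the documented defaults and ignores `CPC` and `Parent Keyword`. The helper names `to_int` and `parse_rows` are hypothetical, and uppercasing the country code is an assumption based on the example values; the real scripts may normalize differently.

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO


def to_int(value: Optional[str], default: int = 0) -> int:
    """Convert a CSV cell to int, falling back to the default on bad data."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default


def parse_rows(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per valid row; rows with an empty Keyword are skipped."""
    for row in csv.DictReader(handle):
        keyword = (row.get("Keyword") or "").strip()
        if not keyword:
            continue  # Keyword is the only required column
        yield {
            "keyword": keyword,
            # Assumption: country codes are stored uppercase, default "US"
            "country": (row.get("Country") or "US").strip().upper() or "US",
            "volume": to_int(row.get("Volume")),
            "difficulty": to_int(row.get("Difficulty")),
        }


sample = io.StringIO(
    "Keyword,Country,Volume,Difficulty,CPC,Parent Keyword\n"
    "physical therapy,us,12000,45,3.20,\n"
    "tens unit,,,,2.50,physical therapy\n"
    ",us,100,10,1.00,\n"
)
rows = list(parse_rows(sample))
```

In this sample the second row falls back to the defaults for country, volume, and difficulty, and the third row is dropped because its keyword is empty.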
---
## 🔍 Duplicate Handling
**Duplicate Check:** `keyword + country` (case-insensitive) within the same industry + sector
- If a keyword with the same country already exists in the same industry + sector → **SKIPS import**
- Example: "physical therapy [US]" in "HealthCare Medical > Physiotherapy Rehabilitation" is skipped if it already exists
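The check described above can be sketched with an in-memory set of normalized keys. This is only an illustration: the real scripts presumably query the database, and the names `duplicate_key` and `split_new_and_duplicates` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, str, str, str]  # (keyword, country, industry, sector)


def duplicate_key(keyword: str, country: str, industry: str, sector: str) -> Row:
    """Build a comparison key; keyword and country are compared case-insensitively."""
    return (keyword.strip().casefold(), country.strip().casefold(), industry, sector)


def split_new_and_duplicates(rows: Iterable[Row], existing: Set[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows to import from rows that would be skipped as duplicates."""
    seen = set(existing)
    new_rows: List[Row] = []
    duplicates: List[Row] = []
    for row in rows:
        key = duplicate_key(*row)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)  # Also catches repeats within the same batch
            new_rows.append(row)
    return new_rows, duplicates


industry, sector = "HealthCare Medical", "Physiotherapy Rehabilitation"
existing = {duplicate_key("physical therapy", "US", industry, sector)}
incoming = [
    ("Physical Therapy", "us", industry, sector),  # already in the database
    ("tens unit", "us", industry, sector),         # new
    ("tens unit", "US", industry, sector),         # repeat within this batch
]
new_rows, duplicates = split_new_and_duplicates(incoming, existing)
```

Here only the first "tens unit" row would be imported; the other two rows count as duplicates.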
---
## 🗄️ Database Models
### Industry
- `name` - Industry name (e.g., "HealthCare Medical")
- `slug` - URL-friendly slug (e.g., "healthcare-medical")
- `is_active` - Active status (default: True)
### IndustrySector
- `name` - Sector name (e.g., "Physiotherapy Rehabilitation")
- `slug` - URL-friendly slug (e.g., "physiotherapy-rehabilitation")
- `industry` - Foreign key to Industry
- `is_active` - Active status (default: True)
### SeedKeyword
- `keyword` - Keyword text
- `industry` - Foreign key to Industry
- `sector` - Foreign key to IndustrySector
- `country` - Country code (e.g., "US")
- `volume` - Search volume
- `difficulty` - Keyword difficulty (0-100)
- `is_active` - Active status (default: True)
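**Unique Constraint:** `keyword + industry + sector` (at model level)
**Script Duplicate Check:** `keyword + country + industry + sector` (a more specific key than the model constraint, not a stricter one: the same keyword with a different country passes the script check but would still conflict with the model-level constraint)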
---
## ✅ Verification
After import, verify the data:
```bash
# Check counts in Django shell
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell
>>> from igny8_core.auth.models import Industry, IndustrySector, SeedKeyword
>>> Industry.objects.count()
>>> IndustrySector.objects.count()
>>> SeedKeyword.objects.count()
>>> SeedKeyword.objects.filter(industry__name="HealthCare Medical").count()
```
Or check in Django admin:
- Industries: `/admin/auth/industry/`
- Sectors: `/admin/auth/industrysector/`
- Keywords: `/admin/auth/seedkeyword/`
---
## 🐛 Troubleshooting
### Issue: "ModuleNotFoundError: No module named 'igny8_core'"
**Solution:** The script must run inside the Docker container:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 /app/scripts/...
```
### Issue: "CSV file not found"
**Solution:** Use the full path as seen from inside the container: `/data/app/igny8/KW_DB/...`
### Issue: Keywords not importing (showing as duplicates)
**Solution:** Check if keywords already exist:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell
>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.filter(keyword__iexact="physical therapy", country="US")
```
### Issue: Want to re-import after cleaning database
**Solution:** Delete existing keywords first:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell
>>> from igny8_core.auth.models import SeedKeyword
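>>> # Run ONE of the following, not both.
>>> # Option A: delete ALL seed keywords (irreversible)
>>> SeedKeyword.objects.all().delete()
>>> # Option B: delete only the keywords of a specific industry
>>> SeedKeyword.objects.filter(industry__slug="healthcare-medical").delete()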
```
---
## 📈 Import Statistics
The scripts provide detailed statistics:
- Total rows processed
- Keywords imported successfully
- Duplicates skipped (keyword + country)
- Invalid rows skipped (empty keywords, bad data)
- Errors encountered
- Industries/Sectors created
Example output:
```
======================================================================
IMPORT SUMMARY
======================================================================
Total rows processed: 4,523
✓ Imported: 4,201
⊘ Skipped (duplicate): 280
⊘ Skipped (invalid): 37
✗ Errors: 5
======================================================================
```
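A summary like the one above can be produced from a handful of counters. The sketch below is an assumption about how such bookkeeping might look, not the scripts' actual code; `ImportStats` is a hypothetical name, and it assumes every processed row lands in exactly one category, which holds for the example figures (4,201 + 280 + 37 + 5 = 4,523).

```python
from dataclasses import dataclass


@dataclass
class ImportStats:
    """Hypothetical counters mirroring the categories in the summary."""
    imported: int = 0
    skipped_duplicate: int = 0
    skipped_invalid: int = 0
    errors: int = 0

    @property
    def total(self) -> int:
        # Assumption: each processed row is counted in exactly one category.
        return self.imported + self.skipped_duplicate + self.skipped_invalid + self.errors

    def summary(self) -> str:
        """Render the counters as a text report with thousands separators."""
        rule = "=" * 70
        return "\n".join([
            rule,
            "IMPORT SUMMARY",
            rule,
            f"Total rows processed: {self.total:,}",
            f"✓ Imported: {self.imported:,}",
            f"⊘ Skipped (duplicate): {self.skipped_duplicate:,}",
            f"⊘ Skipped (invalid): {self.skipped_invalid:,}",
            f"✗ Errors: {self.errors:,}",
            rule,
        ])


stats = ImportStats(imported=4201, skipped_duplicate=280, skipped_invalid=37, errors=5)
report = stats.summary()
```

Passing `report` to `print()` would display the block in the terminal.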
---
## 🚀 Quick Start Guide
1. **Test with single file first:**
```bash
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_seed_keywords_single.py \
--csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
--industry "HealthCare Medical" \
--sector "Physiotherapy Rehabilitation" \
--dry-run
```
2. **If successful, remove `--dry-run` and run the actual import:**
```bash
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_seed_keywords_single.py \
--csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
--industry "HealthCare Medical" \
--sector "Physiotherapy Rehabilitation"
```
3. **Verify in Django admin:** `/admin/auth/seedkeyword/`
4. **Preview the bulk import of all files (dry run):**
```bash
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_all_seed_keywords.py \
--base-path /data/app/igny8/KW_DB \
--dry-run
```
5. **If successful, run the actual bulk import:**
```bash
docker compose -f docker-compose.app.yml exec igny8_backend \
python3 /app/scripts/import_all_seed_keywords.py \
--base-path /data/app/igny8/KW_DB
```
---
## 📝 Notes
- Scripts automatically create Industries and Sectors if they don't exist
- Folder names are converted to display names (underscores → spaces)
- Slugs are auto-generated from names
- All imports happen within transactions for data integrity
- Dry-run mode uses transaction rollback (no database changes)
- Empty or invalid CSV rows are skipped with warnings
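The folder-name and slug conversions noted above can be approximated in a few lines. This is a sketch under assumptions: the function names are hypothetical, and `simple_slug` is a plain regex approximation rather than whatever slug helper the scripts actually use.

```python
import re


def folder_to_display_name(folder_name: str) -> str:
    """Convert a folder name to a display name (underscores become spaces)."""
    return folder_name.replace("_", " ").strip()


def simple_slug(name: str) -> str:
    """Approximate a URL-friendly slug: lowercase, hyphens between word runs."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


display_name = folder_to_display_name("HealthCare_Medical")
slug = simple_slug(display_name)
```

With the folder `HealthCare_Medical`, this gives the display name "HealthCare Medical" and the slug "healthcare-medical", matching the examples in the Database Models section.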
---
## 🔗 Related Documentation
- [Database Models](/docs/90-REFERENCE/MODELS.md)
- [Django Admin Access](/docs/90-REFERENCE/DJANGO-ADMIN-ACCESS-GUIDE.md)
- [System Architecture](/docs/00-SYSTEM/ARCHITECTURE.md)
---
**Author:** IGNY8 Team
**Created:** January 13, 2026
**Last Updated:** January 13, 2026