KW_DB & Management of KW
265
backend/scripts/README.md
Normal file
@@ -0,0 +1,265 @@
# IGNY8 Seed Keywords Import Scripts

This folder contains scripts for importing seed keywords from the KW_DB folder structure into the IGNY8 global keywords database.

## 📁 Folder Structure

```
/data/app/igny8/KW_DB/
  {Industry}/          # e.g., HealthCare_Medical
    {Sector}/          # e.g., Physiotherapy_Rehabilitation
      *.csv            # Keyword CSV files
```
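
To make the layout concrete, here is a minimal sketch of how a bulk importer could walk this tree and derive display names from folder names. It is not taken from the actual scripts: `CsvJob`, `folder_to_name`, and `discover_csv_files` are hypothetical names, and the sketch assumes CSV files sit exactly two levels below the base path.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class CsvJob(NamedTuple):
    industry: str   # display name derived from the {Industry} folder
    sector: str     # display name derived from the {Sector} folder
    csv_path: Path  # full path to the CSV file


def folder_to_name(folder: str) -> str:
    """Turn a folder name such as 'HealthCare_Medical' into 'HealthCare Medical'."""
    return folder.replace("_", " ").strip()


def discover_csv_files(base_path: str) -> Iterator[CsvJob]:
    """Yield one CsvJob per *.csv found exactly two levels below base_path."""
    for csv_path in sorted(Path(base_path).glob("*/*/*.csv")):
        sector_dir = csv_path.parent
        industry_dir = sector_dir.parent
        yield CsvJob(
            folder_to_name(industry_dir.name),
            folder_to_name(sector_dir.name),
            csv_path,
        )
```

Calling `list(discover_csv_files("/data/app/igny8/KW_DB"))` would return one entry per keyword file; CSV files placed directly in the base folder or in an industry folder are ignored by this sketch.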

## 🔧 Available Scripts

### 1. `import_seed_keywords_single.py`
Import keywords from a **single CSV file** (for testing).

**Usage:**
```bash
# Dry run (preview only)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation" \
  --dry-run --verbose

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation"
```

**Options:**
- `--csv` - Path to CSV file (required)
- `--industry` - Industry name (required)
- `--sector` - Sector name (required)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword

---

### 2. `import_all_seed_keywords.py`
Import keywords from **all CSV files** in the KW_DB folder structure.

**Usage:**
```bash
# Dry run (preview all imports)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB \
  --dry-run

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB
```

**Options:**
- `--base-path` - Base path to KW_DB folder (default: `/data/app/igny8/KW_DB`)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword

---

## 📊 CSV File Format

Expected CSV columns:
- **Keyword** (required) - The keyword text
- **Country** (optional) - Country code (default: US)
- **Volume** (optional) - Search volume (default: 0)
- **Difficulty** (optional) - Keyword difficulty 0-100 (default: 0)
- **CPC** (ignored) - Not imported
- **Parent Keyword** (ignored) - Not imported

Example:
```csv
Keyword,Country,Volume,Difficulty,CPC,Parent Keyword
physical therapy,us,12000,45,3.20,
tens unit,us,5000,32,2.50,physical therapy
```
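
The column rules above can be sketched as a small row parser. This is an illustration under stated assumptions, not the scripts' actual code: `to_int` and `parse_row` are hypothetical helpers, and upper-casing the country code and clamping difficulty to 0-100 are assumptions that the README does not confirm.

```python
import csv
import io
from typing import Optional


def to_int(value: Optional[str], default: int = 0) -> int:
    """Parse an integer column, falling back to the default on blank or bad input."""
    try:
        return int(float((value or "").strip()))
    except (ValueError, OverflowError):
        return default


def parse_row(row: dict) -> Optional[dict]:
    """Map one CSV row to the imported fields; return None if the keyword is empty."""
    keyword = (row.get("Keyword") or "").strip()
    if not keyword:
        return None  # would be counted as an invalid row
    return {
        "keyword": keyword,
        # Assumption: country codes are normalised to upper case, default "US".
        "country": (row.get("Country") or "").strip().upper() or "US",
        "volume": max(to_int(row.get("Volume")), 0),
        # Assumption: difficulty is clamped to the documented 0-100 range.
        "difficulty": min(max(to_int(row.get("Difficulty")), 0), 100),
        # CPC and Parent Keyword are deliberately not read.
    }


sample = io.StringIO(
    "Keyword,Country,Volume,Difficulty,CPC,Parent Keyword\n"
    "physical therapy,us,12000,45,3.20,\n"
    "tens unit,,,,2.50,physical therapy\n"
    ",us,10,5,1.00,\n"
)
rows = [parse_row(r) for r in csv.DictReader(sample)]
```

Here the second sample row falls back to the defaults for every optional column, and the third row yields `None` because its keyword is empty.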

---

## 🔍 Duplicate Handling

**Duplicate Check:** `keyword + country` (case-insensitive) within the same industry + sector

- If a keyword with the same country already exists in the same industry + sector, the row is **skipped** and not imported
- Example: "physical therapy [US]" in "HealthCare Medical > Physiotherapy Rehabilitation" is skipped if it already exists
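
The rule above can be sketched with an in-memory set of keys. The real scripts presumably check against the database instead; `duplicate_key` and `should_import` are hypothetical names used only for this illustration.

```python
from typing import Set, Tuple

DuplicateKey = Tuple[str, str, str, str]


def duplicate_key(keyword: str, country: str, industry: str, sector: str) -> DuplicateKey:
    """Build a key that is case-insensitive on keyword + country, scoped to industry + sector."""
    return (keyword.strip().casefold(), country.strip().casefold(), industry, sector)


def should_import(seen: Set[DuplicateKey], key: DuplicateKey) -> bool:
    """Return True and remember the key if it is new; return False for a duplicate."""
    if key in seen:
        return False
    seen.add(key)
    return True


seen: Set[DuplicateKey] = set()
industry, sector = "HealthCare Medical", "Physiotherapy Rehabilitation"
results = [
    should_import(seen, duplicate_key("physical therapy", "US", industry, sector)),
    should_import(seen, duplicate_key("Physical Therapy", "us", industry, sector)),
    should_import(seen, duplicate_key("physical therapy", "GB", industry, sector)),
]
print(results)  # [True, False, True]
```

The second call is a duplicate because only letter case differs; the third passes this check because the country differs.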

---

## 🗄️ Database Models

### Industry
- `name` - Industry name (e.g., "HealthCare Medical")
- `slug` - URL-friendly slug (e.g., "healthcare-medical")
- `is_active` - Active status (default: True)

### IndustrySector
- `name` - Sector name (e.g., "Physiotherapy Rehabilitation")
- `slug` - URL-friendly slug (e.g., "physiotherapy-rehabilitation")
- `industry` - Foreign key to Industry
- `is_active` - Active status (default: True)

### SeedKeyword
- `keyword` - Keyword text
- `industry` - Foreign key to Industry
- `sector` - Foreign key to IndustrySector
- `country` - Country code (e.g., "US")
- `volume` - Search volume
- `difficulty` - Keyword difficulty (0-100)
- `is_active` - Active status (default: True)

**Unique Constraint:** `keyword + industry + sector` (at model level)
**Script Duplicate Check:** `keyword + country + industry + sector` (narrower than the model constraint: the same keyword with a different country passes the script check but can still be rejected by the model-level constraint)
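
As an illustration of how the `slug` values relate to the `name` values listed above, the following sketch derives a slug with a regular expression. The scripts may well use a different routine (Django ships `django.utils.text.slugify`, for instance); `make_slug` is a hypothetical helper that only handles ASCII letters and digits.

```python
import re


def make_slug(name: str) -> str:
    """Lowercase the name and replace each run of non-alphanumeric characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


print(make_slug("HealthCare Medical"))            # healthcare-medical
print(make_slug("Physiotherapy Rehabilitation"))  # physiotherapy-rehabilitation
```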

---

## ✅ Verification

After import, verify the data:

```bash
# Check counts in Django shell
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import Industry, IndustrySector, SeedKeyword
>>> Industry.objects.count()
>>> IndustrySector.objects.count()
>>> SeedKeyword.objects.count()
>>> SeedKeyword.objects.filter(industry__name="HealthCare Medical").count()
```

Or check in Django admin:
- Industries: `/admin/auth/industry/`
- Sectors: `/admin/auth/industrysector/`
- Keywords: `/admin/auth/seedkeyword/`

---

## 🐛 Troubleshooting

### Issue: "ModuleNotFoundError: No module named 'igny8_core'"
**Solution:** The script must run inside the Docker container:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 /app/scripts/...
```

### Issue: "CSV file not found"
**Solution:** Use the full path as seen from inside the container: `/data/app/igny8/KW_DB/...`

### Issue: Keywords not importing (showing as duplicates)
**Solution:** Check whether the keywords already exist:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.filter(keyword__iexact="physical therapy", country="US")
```

### Issue: Want to re-import after cleaning database
**Solution:** Delete the existing keywords first, using one of the two deletes below:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.all().delete()  # Option A: delete all keywords
>>> SeedKeyword.objects.filter(industry__slug="healthcare-medical").delete()  # Option B: delete one industry only
```

---

## 📈 Import Statistics

The scripts provide detailed statistics:
- Total rows processed
- Keywords imported successfully
- Duplicates skipped (keyword + country)
- Invalid rows skipped (empty keywords, bad data)
- Errors encountered
- Industries/Sectors created

Example output:
```
======================================================================
IMPORT SUMMARY
======================================================================
Total rows processed: 4,523
✓ Imported: 4,201
⊘ Skipped (duplicate): 280
⊘ Skipped (invalid): 37
✗ Errors: 5
======================================================================
```
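
In the example above, the four outcome counts add up to the total (4,201 + 280 + 37 + 5 = 4,523). A minimal sketch of a counter that keeps this invariant might look like the following; `ImportStats` is a hypothetical class, the 70-character rule width is an assumption, and the status symbols are left out to keep the output plain ASCII.

```python
from dataclasses import dataclass


@dataclass
class ImportStats:
    imported: int = 0
    skipped_duplicate: int = 0
    skipped_invalid: int = 0
    errors: int = 0

    @property
    def total(self) -> int:
        """Every processed row lands in exactly one of the four buckets."""
        return self.imported + self.skipped_duplicate + self.skipped_invalid + self.errors

    def summary(self) -> str:
        """Render the counters as a text block with thousands separators."""
        rule = "=" * 70
        return "\n".join([
            rule,
            "IMPORT SUMMARY",
            rule,
            f"Total rows processed: {self.total:,}",
            f"Imported: {self.imported:,}",
            f"Skipped (duplicate): {self.skipped_duplicate:,}",
            f"Skipped (invalid): {self.skipped_invalid:,}",
            f"Errors: {self.errors:,}",
            rule,
        ])


stats = ImportStats(imported=4201, skipped_duplicate=280, skipped_invalid=37, errors=5)
print(stats.summary())
```

Deriving the total from the buckets, rather than counting it separately, means the summary cannot drift out of sync with the individual counters.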

---

## 🚀 Quick Start Guide

1. **Test with a single file first (dry run):**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_seed_keywords_single.py \
     --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
     --industry "HealthCare Medical" \
     --sector "Physiotherapy Rehabilitation" \
     --dry-run
   ```

2. **If successful, remove `--dry-run` and run the actual import:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_seed_keywords_single.py \
     --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
     --industry "HealthCare Medical" \
     --sector "Physiotherapy Rehabilitation"
   ```

3. **Verify in Django admin:** `/admin/auth/seedkeyword/`

4. **Dry-run the import of all files:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_all_seed_keywords.py \
     --base-path /data/app/igny8/KW_DB \
     --dry-run
   ```

5. **If successful, run the actual bulk import:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_all_seed_keywords.py \
     --base-path /data/app/igny8/KW_DB
   ```

---

## 📝 Notes

- Scripts automatically create Industries and Sectors if they don't exist
- Folder names are converted to display names (underscores → spaces)
- Slugs are auto-generated from names
- All imports happen within transactions for data integrity
- Dry-run mode uses transaction rollback (no database changes)
- Empty or invalid CSV rows are skipped with warnings
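
The dry-run note above can be illustrated with the standard-library `sqlite3` module: the inserts run inside a transaction, and a dry run rolls it back instead of committing. This is only a sketch of the idea. The actual scripts run against the Django ORM (where `transaction.atomic()` would be the usual tool), and the `seed_keyword` table and `import_keywords` function here are hypothetical.

```python
import sqlite3
from typing import List


def import_keywords(conn: sqlite3.Connection, keywords: List[str], dry_run: bool) -> int:
    """Insert keywords in one transaction; roll back instead of committing on a dry run."""
    try:
        cursor = conn.executemany(
            "INSERT INTO seed_keyword (keyword) VALUES (?)",
            [(k,) for k in keywords],
        )
        inserted = cursor.rowcount
    except sqlite3.Error:
        conn.rollback()  # leave the database untouched on any failure
        raise
    if dry_run:
        conn.rollback()  # preview only: discard everything done above
    else:
        conn.commit()
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seed_keyword (keyword TEXT NOT NULL)")
conn.commit()

previewed = import_keywords(conn, ["physical therapy", "tens unit"], dry_run=True)
rows_after_preview = conn.execute("SELECT COUNT(*) FROM seed_keyword").fetchone()[0]
```

After the dry run, `previewed` reports two rows while `rows_after_preview` is still zero, which is why a dry run can show realistic counts without changing the database.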

---

## 🔗 Related Documentation

- [Database Models](/docs/90-REFERENCE/MODELS.md)
- [Django Admin Access](/docs/90-REFERENCE/DJANGO-ADMIN-ACCESS-GUIDE.md)
- [System Architecture](/docs/00-SYSTEM/ARCHITECTURE.md)

---

**Author:** IGNY8 Team
**Created:** January 13, 2026
**Last Updated:** January 13, 2026