KW_DB & Management of KW
89
backend/scripts/IMPORT_STATUS.md
Normal file
@@ -0,0 +1,89 @@
# Seed Keywords Import - Quick Reference

## ✅ Single File Import (COMPLETED)

Successfully imported **23 keywords** from `google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv`

- Industry: **HealthCare Medical**
- Sector: **Physiotherapy Rehabilitation**
- Status: ✅ **COMPLETED**

## 📊 Full Import Preview (Dry-Run)

- Total files discovered: **11 CSV files**
- Total keywords to import: **4,347 keywords**

### Files breakdown:
1. `google_us_physical-therapy_matching-terms_2025-12-19_04-25-15.csv` - **2,500 keywords**
2. `google_us_knee-brace_matching-terms_2025-12-19_04-25-58.csv` - **490 keywords**
3. `google_us_knee-brace_matching-terms_2025-12-19_04-26-08.csv` - **490 keywords**
4. `google_us_heating-pad_matching-terms_2025-12-19_04-25-43.csv` - **268 keywords**
5. `google_us_tens-unit_matching-terms_2025-12-19_04-25-37.csv` - **249 keywords**
6. `google_us_back-brace_matching-terms_2025-12-19_04-26-43.csv` - **199 keywords**
7. `google_us_physiotherapy_related-terms_2025-12-19_04-25-01.csv` - **72 keywords**
8. `google_us_posture-corrector_matching-terms_2025-12-19_04-25-50.csv` - **30 keywords**
9. `google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv` - **23 keywords** ✅ (already imported)
10. `google_us_therapy-equipment_matching-terms_2025-12-19_04-25-26.csv` - **22 keywords**
11. `google_us_rehab-equipment_matching-terms_2025-12-19_04-25-22.csv` - **4 keywords**

## 🚀 Next Steps

### Run Full Import

Since the single-file test succeeded, you can now run the full import:

```bash
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB
```

**Note:** This will skip the 23 keywords already imported from the muscle-stimulator file (duplicate check: keyword + country, within the same industry and sector). The dry run does not check for duplicates, so the figures below are estimates.

Expected results:
- **4,347 rows processed**
- **~4,324 keywords imported** (4,347 - 23 already imported; fewer if the two knee-brace exports, files 2 and 3, contain the same keywords)
- **~23 duplicates skipped** (from the muscle-stimulator file, plus any overlap between the two knee-brace exports)
- **0 errors** (if all goes well)
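
The expected totals can be cross-checked against the per-file counts in the breakdown. The following is only an arithmetic sketch: the counts are copied from the list above, and the variable names are illustrative rather than taken from the import scripts.

```python
# Per-file keyword counts copied from the "Files breakdown" list.
file_counts = [2500, 490, 490, 268, 249, 199, 72, 30, 23, 22, 4]

# Rows already imported during the single-file test (muscle-stimulator file).
already_imported = 23

total_rows = sum(file_counts)
expected_new = total_rows - already_imported

print(f"Files discovered:      {len(file_counts)}")
print(f"Total rows:            {total_rows:,}")
print(f"Expected new keywords: {expected_new:,}")
```

This reproduces the 11 files, 4,347 rows, and roughly 4,324 new keywords quoted above. It does not account for keywords repeated across files.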

## 📍 Files Location

All import scripts are in: `/data/app/igny8/backend/scripts/`

- `import_seed_keywords_single.py` - Import single CSV file
- `import_all_seed_keywords.py` - Import all CSV files from folder structure
- `README.md` - Full documentation

## 🔍 Verification Commands

After full import, verify the data:

```bash
# Check total keywords
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell -c "from igny8_core.auth.models import SeedKeyword; print(f'Total keywords: {SeedKeyword.objects.count()}')"

# Check by industry
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell -c "from igny8_core.auth.models import SeedKeyword; print(f'HealthCare Medical: {SeedKeyword.objects.filter(industry__slug=\"healthcare-medical\").count()}')"

# Check by sector
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell -c "from igny8_core.auth.models import SeedKeyword; print(f'Physiotherapy: {SeedKeyword.objects.filter(sector__slug=\"physiotherapy-rehabilitation\").count()}')"
```

Or check in Django admin:
- Keywords: `/admin/auth/seedkeyword/`
- Industries: `/admin/auth/industry/`
- Sectors: `/admin/auth/industrysector/`

## 🎯 Key Features

- ✅ **Automatic Industry/Sector Creation** - Creates from folder names
- ✅ **Duplicate Detection** - keyword + country (case-insensitive)
- ✅ **Transaction Safety** - All imports in transactions
- ✅ **Dry-Run Mode** - Preview before import
- ✅ **Detailed Statistics** - Import counts and errors
- ✅ **Error Handling** - Skips invalid rows gracefully
- ✅ **Verbose Logging** - Optional detailed progress

## 📚 Documentation

Full documentation: `/data/app/igny8/backend/scripts/README.md`
265
backend/scripts/README.md
Normal file
@@ -0,0 +1,265 @@
# IGNY8 Seed Keywords Import Scripts

This folder contains scripts for importing seed keywords from the KW_DB folder structure into the IGNY8 global keywords database.

## 📁 Folder Structure

```
/data/app/igny8/KW_DB/
  {Industry}/          # e.g., HealthCare_Medical
    {Sector}/          # e.g., Physiotherapy_Rehabilitation
      *.csv            # Keyword CSV files
```
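
As a rough illustration of how this layout can be walked, the sketch below yields one tuple per CSV file. It is an assumption-laden outline, not the script's actual code: `discover_csv_files` is a hypothetical helper name, and only the underscore-to-space conversion of folder names is taken from this README.

```python
from pathlib import Path


def discover_csv_files(base_path):
    """Yield (industry_name, sector_name, csv_path) for each CSV under base_path.

    Hypothetical helper: expects the {Industry}/{Sector}/*.csv layout and
    ignores files that are not inside a sector folder.
    """
    base = Path(base_path)
    for industry_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        for sector_dir in sorted(p for p in industry_dir.iterdir() if p.is_dir()):
            for csv_path in sorted(sector_dir.glob("*.csv")):
                yield (
                    industry_dir.name.replace("_", " "),  # folder name -> display name
                    sector_dir.name.replace("_", " "),
                    csv_path,
                )
```

Calling `list(discover_csv_files("/data/app/igny8/KW_DB"))` inside the container would list every file the bulk import is expected to pick up.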

## 🔧 Available Scripts

### 1. `import_seed_keywords_single.py`
Import keywords from a **single CSV file** (for testing).

**Usage:**
```bash
# Dry run (preview only)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation" \
  --dry-run --verbose

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation"
```

**Options:**
- `--csv` - Path to CSV file (required)
- `--industry` - Industry name (required)
- `--sector` - Sector name (required)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword

---

### 2. `import_all_seed_keywords.py`
Import keywords from **all CSV files** in the KW_DB folder structure.

**Usage:**
```bash
# Dry run (preview all imports)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB \
  --dry-run

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB
```

**Options:**
- `--base-path` - Base path to KW_DB folder (default: /data/app/igny8/KW_DB)
- `--dry-run` - Preview without saving to database
- `--verbose` - Show detailed progress for each keyword

---

## 📊 CSV File Format

Expected CSV columns:
- **Keyword** (required) - The keyword text
- **Country** (optional) - Country code (default: US)
- **Volume** (optional) - Search volume (default: 0)
- **Difficulty** (optional) - Keyword difficulty 0-100 (default: 0)
- **CPC** (ignored) - Not imported
- **Parent Keyword** (ignored) - Not imported

Example:
```csv
Keyword,Country,Volume,Difficulty,CPC,Parent Keyword
physical therapy,us,12000,45,3.20,
tens unit,us,5000,32,2.50,physical therapy
```
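
The column rules above (required keyword, default country, integer volume, difficulty clamped to 0-100) can be sketched with the standard library alone. This is a simplified stand-in for the scripts' row parsing, not a copy of it: `parse_row` and `to_int` are hypothetical names, and the second and third sample rows are invented to exercise the defaults.

```python
import csv
import io

# First data row is from the README example; the other two are made-up edge cases.
SAMPLE = """Keyword,Country,Volume,Difficulty,CPC,Parent Keyword
physical therapy,us,12000,45,3.20,
tens unit,,n/a,140,2.50,physical therapy
,us,10,5,1.00,
"""


def to_int(value, default=0):
    """Convert a CSV cell to int, falling back to default on blank or bad input."""
    try:
        return int((value or "").strip() or default)
    except ValueError:
        return default


def parse_row(row):
    """Return normalized keyword data, or None when the keyword is missing."""
    keyword = (row.get("Keyword") or "").strip()
    if not keyword:
        return None
    country = (row.get("Country") or "").strip().upper() or "US"
    volume = to_int(row.get("Volume"))
    difficulty = max(0, min(100, to_int(row.get("Difficulty"))))  # clamp to 0-100
    return {
        "keyword": keyword,
        "country": country,
        "volume": volume,
        "difficulty": difficulty,
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

The CPC and Parent Keyword columns are read by `csv.DictReader` but never used, which matches the "ignored" entries in the column list.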

---

## 🔍 Duplicate Handling

**Duplicate Check:** `keyword + country` (case-insensitive) within same industry+sector

- If a keyword with the same country already exists in the same industry+sector → **SKIPS import**
- Example: "physical therapy [US]" in "HealthCare Medical > Physiotherapy Rehabilitation" will be skipped if already exists
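
A minimal in-memory sketch of this rule follows. It replaces the database lookup with a Python set, so it only approximates the real check: `duplicate_key` and `filter_new` are hypothetical names, and `str.casefold()` stands in for the case-insensitive match.

```python
def duplicate_key(keyword, country, industry, sector):
    """Build the lookup key: case-insensitive keyword plus country, industry, sector."""
    return (keyword.strip().casefold(), country.strip().upper(), industry, sector)


def filter_new(rows, existing_keys):
    """Split rows into (to_import, skipped) according to the duplicate key."""
    to_import, skipped = [], []
    seen = set(existing_keys)
    for row in rows:
        key = duplicate_key(*row)
        if key in seen:
            skipped.append(row)
        else:
            seen.add(key)  # later repeats of the same row are skipped too
            to_import.append(row)
    return to_import, skipped


INDUSTRY = "HealthCare Medical"
SECTOR = "Physiotherapy Rehabilitation"

existing = {duplicate_key("physical therapy", "US", INDUSTRY, SECTOR)}
rows = [
    ("Physical Therapy", "us", INDUSTRY, SECTOR),  # same keyword, different case
    ("tens unit", "US", INDUSTRY, SECTOR),         # not yet present
]
to_import, skipped = filter_new(rows, existing)
```

Here "Physical Therapy" is skipped because it differs from the existing entry only in letter case, while "tens unit" is kept.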

---

## 🗄️ Database Models

### Industry
- `name` - Industry name (e.g., "HealthCare Medical")
- `slug` - URL-friendly slug (e.g., "healthcare-medical")
- `is_active` - Active status (default: True)

### IndustrySector
- `name` - Sector name (e.g., "Physiotherapy Rehabilitation")
- `slug` - URL-friendly slug (e.g., "physiotherapy-rehabilitation")
- `industry` - Foreign key to Industry
- `is_active` - Active status (default: True)

### SeedKeyword
- `keyword` - Keyword text
- `industry` - Foreign key to Industry
- `sector` - Foreign key to IndustrySector
- `country` - Country code (e.g., "US")
- `volume` - Search volume
- `difficulty` - Keyword difficulty (0-100)
- `is_active` - Active status (default: True)

**Unique Constraint:** `keyword + industry + sector` (at model level)

**Script Duplicate Check:** `keyword + country + industry + sector` (case-insensitive on keyword). Because it also matches on country, this check is narrower than the model constraint: a keyword that already exists under a different country passes the script check but can still be rejected by the model constraint.
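
The difference between the two checks can be illustrated with plain tuples. This sketch assumes the model constraint is exactly as stated in this README (it has not been verified against the model definition), and both function names are hypothetical.

```python
# Each row: (keyword, country, industry_slug, sector_slug)
existing_rows = [
    ("physical therapy", "US", "healthcare-medical", "physiotherapy-rehabilitation"),
]


def passes_script_check(candidate, rows):
    """True when no row matches on keyword (case-insensitive), country, industry, sector."""
    kw, country, industry, sector = candidate
    return not any(
        r[0].lower() == kw.lower() and r[1] == country and r[2] == industry and r[3] == sector
        for r in rows
    )


def violates_model_constraint(candidate, rows):
    """True when a row with the same keyword, industry and sector already exists."""
    kw, _country, industry, sector = candidate
    return any(r[0] == kw and r[2] == industry and r[3] == sector for r in rows)


# Same keyword, industry and sector, but a different country.
candidate = ("physical therapy", "GB", "healthcare-medical", "physiotherapy-rehabilitation")

script_ok = passes_script_check(candidate, existing_rows)
model_conflict = violates_model_constraint(candidate, existing_rows)
```

With these inputs `script_ok` and `model_conflict` are both true, which is the case where a row would pass the script's duplicate check and then fail on insert.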

---

## ✅ Verification

After import, verify the data:

```bash
# Check counts in Django shell
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import Industry, IndustrySector, SeedKeyword
>>> Industry.objects.count()
>>> IndustrySector.objects.count()
>>> SeedKeyword.objects.count()
>>> SeedKeyword.objects.filter(industry__name="HealthCare Medical").count()
```

Or check in Django admin:
- Industries: `/admin/auth/industry/`
- Sectors: `/admin/auth/industrysector/`
- Keywords: `/admin/auth/seedkeyword/`

---

## 🐛 Troubleshooting

### Issue: "ModuleNotFoundError: No module named 'igny8_core'"
**Solution:** Script must run inside Docker container:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 /app/scripts/...
```

### Issue: "CSV file not found"
**Solution:** Use full path inside container: `/data/app/igny8/KW_DB/...`

### Issue: Keywords not importing (showing as duplicates)
**Solution:** Check if keywords already exist:
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.filter(keyword__iexact="physical therapy", country="US")
```

### Issue: Want to re-import after cleaning database
**Solution:** Delete existing keywords first (run only one of the two delete statements):
```bash
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.all().delete()  # Delete all keywords
>>> SeedKeyword.objects.filter(industry__slug="healthcare-medical").delete()  # Delete specific industry
```

---

## 📈 Import Statistics

The scripts provide detailed statistics:
- Total rows processed
- Keywords imported successfully
- Duplicates skipped (keyword + country)
- Invalid rows skipped (empty keywords, bad data)
- Errors encountered
- Industries/Sectors created

Example output:
```
======================================================================
IMPORT SUMMARY
======================================================================
Total rows processed: 4,523
✓ Imported: 4,201
⊘ Skipped (duplicate): 280
⊘ Skipped (invalid): 37
✗ Errors: 5
======================================================================
```

---

## 🚀 Quick Start Guide

1. **Test with single file first:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_seed_keywords_single.py \
     --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
     --industry "HealthCare Medical" \
     --sector "Physiotherapy Rehabilitation" \
     --dry-run
   ```

2. **If successful, remove `--dry-run` and run actual import:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_seed_keywords_single.py \
     --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
     --industry "HealthCare Medical" \
     --sector "Physiotherapy Rehabilitation"
   ```

3. **Verify in Django admin:** `/admin/auth/seedkeyword/`

4. **Preview the full import (dry run):**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_all_seed_keywords.py \
     --base-path /data/app/igny8/KW_DB \
     --dry-run
   ```

5. **If the preview looks correct, run the actual bulk import:**
   ```bash
   docker compose -f docker-compose.app.yml exec igny8_backend \
     python3 /app/scripts/import_all_seed_keywords.py \
     --base-path /data/app/igny8/KW_DB
   ```

---

## 📝 Notes

- Scripts automatically create Industries and Sectors if they don't exist
- Folder names are converted to display names (underscores → spaces)
- Slugs are auto-generated from names
- All imports happen within transactions for data integrity
- Dry-run mode makes no database changes: writes are skipped and the transaction is rolled back. The duplicate check is also skipped, so dry-run counts include rows that a real import would skip as duplicates
- Empty or invalid CSV rows are skipped with warnings
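
The folder-name conversion in these notes can be sketched without Django. The scripts use `django.utils.text.slugify`; the `simple_slug` function below is only a rough ASCII-only stand-in so the example runs on its own, and both function names are illustrative.

```python
import re


def display_name(folder_name):
    """Convert a KW_DB folder name to a display name (underscores -> spaces)."""
    return folder_name.replace("_", " ")


def simple_slug(name):
    """Approximate slug generation for ASCII names (not Django's slugify)."""
    slug = re.sub(r"[^\w\s-]", "", name.lower())   # drop punctuation
    return re.sub(r"[-\s]+", "-", slug).strip("-_")  # collapse spaces and hyphens


industry = display_name("HealthCare_Medical")
sector = display_name("Physiotherapy_Rehabilitation")

industry_slug = simple_slug(industry)
sector_slug = simple_slug(sector)
```

For the folders used in this README this gives the slugs `healthcare-medical` and `physiotherapy-rehabilitation`, the same values used in the verification queries.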

---

## 🔗 Related Documentation

- [Database Models](/docs/90-REFERENCE/MODELS.md)
- [Django Admin Access](/docs/90-REFERENCE/DJANGO-ADMIN-ACCESS-GUIDE.md)
- [System Architecture](/docs/00-SYSTEM/ARCHITECTURE.md)

---

**Author:** IGNY8 Team
**Created:** January 13, 2026
**Last Updated:** January 13, 2026
388
backend/scripts/import_all_seed_keywords.py
Normal file
@@ -0,0 +1,388 @@
#!/usr/bin/env python3
"""
Import All Seed Keywords from KW_DB Folder Structure

This script automatically scans the KW_DB folder structure and imports all CSV files.
It extracts Industry and Sector from folder names and imports keywords with duplicate checking.

FOLDER STRUCTURE EXPECTED:
    /data/app/igny8/KW_DB/
        {Industry}/
            {Sector}/
                *.csv files

DUPLICATE HANDLING:
    - Checks: keyword + country (case-insensitive)
    - If duplicate exists in same industry+sector: SKIPS import

Usage:
    # Dry run (preview all imports)
    docker compose -f docker-compose.app.yml exec igny8_backend \\
        python3 /app/scripts/import_all_seed_keywords.py \\
        --base-path /data/app/igny8/KW_DB \\
        --dry-run --verbose

    # Actual import
    docker compose -f docker-compose.app.yml exec igny8_backend \\
        python3 /app/scripts/import_all_seed_keywords.py \\
        --base-path /data/app/igny8/KW_DB

Author: IGNY8 Team
Date: January 13, 2026
"""

import os
import sys
import csv
import argparse
import django
from pathlib import Path

# Change to app directory for Django imports
sys.path.insert(0, '/app')
os.chdir('/app')

# Setup Django
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'igny8_core.settings')
django.setup()

from django.utils.text import slugify
from django.db import transaction
from igny8_core.auth.models import Industry, IndustrySector, SeedKeyword


class BulkKeywordImporter:
    """Import keywords from entire folder structure"""

    def __init__(self, base_path, dry_run=False, verbose=False):
        self.base_path = Path(base_path)
        self.dry_run = dry_run
        self.verbose = verbose
        self.global_stats = {
            'total_files': 0,
            'total_processed': 0,
            'total_imported': 0,
            'total_skipped_duplicate': 0,
            'total_skipped_invalid': 0,
            'total_errors': 0,
            'industries_created': 0,
            'sectors_created': 0
        }
        self.file_stats = []

    def log(self, message, force=False):
        """Print message if verbose or forced"""
        if self.verbose or force:
            print(message)

    def get_or_create_industry(self, name):
        """Get or create Industry record"""
        slug = slugify(name)

        if self.dry_run:
            self.log(f"[DRY RUN] Would get/create Industry: {name} (slug: {slug})")
            class MockIndustry:
                def __init__(self):
                    self.id = 0
                    self.name = name
                    self.slug = slug
            return MockIndustry(), False

        industry, created = Industry.objects.get_or_create(
            slug=slug,
            defaults={
                'name': name,
                'is_active': True,
                'description': 'Auto-imported from KW_DB'
            }
        )

        if created:
            self.log(f"✓ Created Industry: {name}", force=True)
            self.global_stats['industries_created'] += 1
        else:
            self.log(f"✓ Found existing Industry: {name}")

        return industry, created

    def get_or_create_sector(self, industry, name):
        """Get or create IndustrySector record"""
        slug = slugify(name)

        if self.dry_run:
            self.log(f"[DRY RUN] Would get/create Sector: {name} (slug: {slug})")
            class MockSector:
                def __init__(self):
                    self.id = 0
                    self.name = name
                    self.slug = slug
            return MockSector(), False

        sector, created = IndustrySector.objects.get_or_create(
            industry=industry,
            slug=slug,
            defaults={
                'name': name,
                'is_active': True,
                'description': 'Auto-imported from KW_DB'
            }
        )

        if created:
            self.log(f" ✓ Created Sector: {name}", force=True)
            self.global_stats['sectors_created'] += 1
        else:
            self.log(f" ✓ Found existing Sector: {name}")

        return sector, created

    def is_duplicate(self, keyword, country, industry, sector):
        """
        Check if keyword already exists with same country in this industry+sector.
        Duplicate check: keyword + country (case-insensitive)
        """
        if self.dry_run:
            return False  # Skip duplicate check in dry run

        exists = SeedKeyword.objects.filter(
            keyword__iexact=keyword,
            country=country,
            industry=industry,
            sector=sector
        ).exists()

        return exists

    def parse_csv_row(self, row):
        """Parse CSV row and extract keyword data"""
        try:
            keyword = row.get('Keyword', '').strip()
            if not keyword:
                return None

            # Parse country (default to US)
            country_raw = row.get('Country', 'us').strip().upper()
            if not country_raw:
                country_raw = 'US'

            # Parse volume (default to 0)
            volume_raw = row.get('Volume', '0').strip()
            try:
                volume = int(volume_raw) if volume_raw else 0
            except (ValueError, TypeError):
                volume = 0

            # Parse difficulty (default to 0, clamp to 0-100)
            difficulty_raw = row.get('Difficulty', '0').strip()
            try:
                difficulty = int(difficulty_raw) if difficulty_raw else 0
                difficulty = max(0, min(100, difficulty))  # Clamp to 0-100
            except (ValueError, TypeError):
                difficulty = 0

            return {
                'keyword': keyword,
                'country': country_raw,
                'volume': volume,
                'difficulty': difficulty
            }

        except Exception as e:
            self.log(f" ⚠ Error parsing row: {e}")
            return None

    def import_csv_file(self, csv_path, industry, sector):
        """Import keywords from single CSV file"""
        file_stats = {
            'file': csv_path.name,
            'processed': 0,
            'imported': 0,
            'skipped_duplicate': 0,
            'skipped_invalid': 0,
            'errors': 0
        }

        try:
            # utf-8-sig strips a leading BOM (otherwise the first header would not
            # match 'Keyword'); newline='' is what the csv module expects.
            with open(csv_path, 'r', encoding='utf-8-sig', newline='') as f:
                reader = csv.DictReader(f)

                for row in reader:
                    file_stats['processed'] += 1
                    self.global_stats['total_processed'] += 1

                    keyword_data = self.parse_csv_row(row)
                    if not keyword_data:
                        file_stats['skipped_invalid'] += 1
                        self.global_stats['total_skipped_invalid'] += 1
                        continue

                    keyword = keyword_data['keyword']
                    country = keyword_data['country']
                    volume = keyword_data['volume']
                    difficulty = keyword_data['difficulty']

                    # Check for duplicate (keyword + country)
                    if self.is_duplicate(keyword, country, industry, sector):
                        self.log(f" ⊘ SKIP (duplicate): {keyword} [{country}]")
                        file_stats['skipped_duplicate'] += 1
                        self.global_stats['total_skipped_duplicate'] += 1
                        continue

                    if self.dry_run:
                        self.log(f" [DRY RUN] Would import: {keyword} [{country}] (vol:{volume}, diff:{difficulty})")
                    else:
                        # Create keyword
                        SeedKeyword.objects.create(
                            keyword=keyword,
                            industry=industry,
                            sector=sector,
                            volume=volume,
                            difficulty=difficulty,
                            country=country,
                            is_active=True
                        )
                        self.log(f" ✓ Imported: {keyword} [{country}]")

                    file_stats['imported'] += 1
                    self.global_stats['total_imported'] += 1

        except Exception as e:
            self.log(f" ❌ Error processing file: {e}", force=True)
            file_stats['errors'] += 1
            self.global_stats['total_errors'] += 1

        return file_stats

    def scan_and_import(self):
        """Scan folder structure and import all CSV files"""

        if not self.base_path.exists():
            print(f"❌ ERROR: Base path not found: {self.base_path}")
            return False

        print(f"\n{'='*70}")
        print("BULK IMPORT: ALL SEED KEYWORDS FROM KW_DB")
        print(f"{'='*70}")
        print(f"Base Path: {self.base_path}")
        if self.dry_run:
            print("Mode: DRY RUN (no database changes)")
        print(f"{'='*70}\n")

        # Scan directory structure: Industry / Sector / *.csv
        for industry_dir in sorted(self.base_path.iterdir()):
            if not industry_dir.is_dir():
                continue

            industry_name = industry_dir.name.replace('_', ' ')

            print(f"\n📁 Industry: {industry_name}")
            print(f"{'─'*70}")

            industry, _ = self.get_or_create_industry(industry_name)

            for sector_dir in sorted(industry_dir.iterdir()):
                if not sector_dir.is_dir():
                    continue

                sector_name = sector_dir.name.replace('_', ' ')

                print(f"\n 📂 Sector: {sector_name}")

                sector, _ = self.get_or_create_sector(industry, sector_name)

                # Find all CSV files
                csv_files = list(sector_dir.glob('*.csv'))

                if not csv_files:
                    print(" ⚠ No CSV files found")
                    continue

                print(f" Found {len(csv_files)} CSV files")

                # Import each CSV file
                with transaction.atomic():
                    for csv_file in sorted(csv_files):
                        self.global_stats['total_files'] += 1
                        print(f"\n 📄 {csv_file.name}")

                        file_stats = self.import_csv_file(csv_file, industry, sector)
                        self.file_stats.append(file_stats)

                        print(f" ✓ {file_stats['imported']} imported, "
                              f"⊘ {file_stats['skipped_duplicate']} duplicates, "
                              f"⊘ {file_stats['skipped_invalid']} invalid")

                    # Rollback in dry run mode
                    if self.dry_run:
                        transaction.set_rollback(True)

        # Print summary
        print(f"\n\n{'='*70}")
        print("GLOBAL IMPORT SUMMARY")
        print(f"{'='*70}")
        print(f"Total CSV files: {self.global_stats['total_files']}")
        print(f"Industries created: {self.global_stats['industries_created']}")
        print(f"Sectors created: {self.global_stats['sectors_created']}")
        print("─────────────────────────────────────────────────────────────────────")
        print(f"Total rows processed: {self.global_stats['total_processed']}")
        print(f"✓ Total imported: {self.global_stats['total_imported']}")
        print(f"⊘ Skipped (duplicate): {self.global_stats['total_skipped_duplicate']}")
        print(f"⊘ Skipped (invalid): {self.global_stats['total_skipped_invalid']}")
        print(f"✗ Total errors: {self.global_stats['total_errors']}")
        print(f"{'='*70}\n")

        if self.dry_run:
            print("ℹ This was a DRY RUN - no data was saved to database")
            print("Remove --dry-run flag to perform actual import\n")
        else:
            print("✓ Import completed successfully!")
            print("✓ Check Django admin: /admin/auth/seedkeyword/\n")

        return True


def main():
    parser = argparse.ArgumentParser(
        description='Import all seed keywords from KW_DB folder structure',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Dry run (preview all imports)
  docker compose -f docker-compose.app.yml exec igny8_backend \\
    python3 /app/scripts/import_all_seed_keywords.py \\
    --base-path /data/app/igny8/KW_DB \\
    --dry-run --verbose

  # Actual import
  docker compose -f docker-compose.app.yml exec igny8_backend \\
    python3 /app/scripts/import_all_seed_keywords.py \\
    --base-path /data/app/igny8/KW_DB

Folder Structure Expected:
  /data/app/igny8/KW_DB/
    HealthCare_Medical/
      Physiotherapy_Rehabilitation/
        google_us_physical-therapy*.csv
        google_us_muscle-stimulator*.csv
        ...
"""
    )

    parser.add_argument('--base-path', default='/data/app/igny8/KW_DB',
                        help='Base path to KW_DB folder (default: /data/app/igny8/KW_DB)')
    parser.add_argument('--dry-run', action='store_true',
                        help='Preview without saving to database')
    parser.add_argument('--verbose', action='store_true',
                        help='Show detailed progress for each keyword')

    args = parser.parse_args()

    # Create importer and run
    importer = BulkKeywordImporter(args.base_path, dry_run=args.dry_run, verbose=args.verbose)
    success = importer.scan_and_import()

    sys.exit(0 if success else 1)


if __name__ == '__main__':
    main()
321
backend/scripts/import_seed_keywords_single.py
Normal file
@@ -0,0 +1,321 @@
#!/usr/bin/env python3
"""
Import Seed Keywords from Single CSV File

This script imports keywords from a single CSV file into the IGNY8 global keywords database.
Use this for testing before running full import.

DUPLICATE HANDLING:
    - Checks: keyword + country (case-insensitive)
    - If duplicate exists in same industry+sector: SKIPS import

Usage:
    docker compose -f docker-compose.app.yml exec igny8_backend \\
        python3 /app/scripts/import_seed_keywords_single.py \\
        --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_physical-therapy_matching-terms_2025-12-19_04-25-15.csv \\
        --industry "HealthCare Medical" \\
        --sector "Physiotherapy Rehabilitation" \\
        --dry-run

Author: IGNY8 Team
Date: January 13, 2026
"""

import os
import sys
import csv
import argparse
import django
from pathlib import Path

# Change to app directory for Django imports
sys.path.insert(0, '/app')
os.chdir('/app')

# Setup Django
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'igny8_core.settings')
django.setup()

from django.utils.text import slugify
from django.db import transaction
from igny8_core.auth.models import Industry, IndustrySector, SeedKeyword


class KeywordImporter:
    """Import keywords from CSV into database"""

    def __init__(self, dry_run=False, verbose=False):
        self.dry_run = dry_run
        self.verbose = verbose
        self.stats = {
            'processed': 0,
            'imported': 0,
            'skipped_duplicate': 0,
            'skipped_invalid': 0,
            'errors': 0
        }

    def log(self, message, force=False):
        """Print message if verbose or forced"""
        if self.verbose or force:
            print(message)

    def get_or_create_industry(self, name):
        """Get or create Industry record"""
        slug = slugify(name)

        if self.dry_run:
            self.log(f"[DRY RUN] Would get/create Industry: {name} (slug: {slug})")
            # Return a mock object for dry run
            class MockIndustry:
                def __init__(self):
                    self.id = 0
                    self.name = name
                    self.slug = slug
            return MockIndustry(), False

        industry, created = Industry.objects.get_or_create(
            slug=slug,
            defaults={
                'name': name,
                'is_active': True,
                'description': 'Auto-imported from KW_DB'
            }
        )

        if created:
            self.log(f"✓ Created Industry: {name}", force=True)
        else:
            self.log(f"✓ Found existing Industry: {name}")

        return industry, created

    def get_or_create_sector(self, industry, name):
        """Get or create IndustrySector record"""
        slug = slugify(name)

        if self.dry_run:
            self.log(f"[DRY RUN] Would get/create Sector: {name} (slug: {slug})")
            class MockSector:
                def __init__(self):
                    self.id = 0
                    self.name = name
                    self.slug = slug
            return MockSector(), False

        sector, created = IndustrySector.objects.get_or_create(
            industry=industry,
            slug=slug,
            defaults={
                'name': name,
                'is_active': True,
                'description': 'Auto-imported from KW_DB'
            }
        )

        if created:
            self.log(f" ✓ Created Sector: {name}", force=True)
        else:
            self.log(f" ✓ Found existing Sector: {name}")

        return sector, created

    def is_duplicate(self, keyword, country, industry, sector):
        """
        Check if keyword already exists with same country in this industry+sector.
        Duplicate check: keyword + country (case-insensitive)
        """
        if self.dry_run:
            return False  # Skip duplicate check in dry run

        exists = SeedKeyword.objects.filter(
            keyword__iexact=keyword,
            country=country,
            industry=industry,
            sector=sector
        ).exists()

        return exists

    def import_keyword(self, keyword_data, industry, sector):
        """Import single keyword record"""
        keyword = keyword_data['keyword']
        country = keyword_data['country']
        volume = keyword_data['volume']
        difficulty = keyword_data['difficulty']

        # Check for duplicate (keyword + country)
        if self.is_duplicate(keyword, country, industry, sector):
            self.log(f" ⊘ SKIP (duplicate): {keyword} [{country}]")
            self.stats['skipped_duplicate'] += 1
            return False

        if self.dry_run:
            self.log(f" [DRY RUN] Would import: {keyword} [{country}] (vol:{volume}, diff:{difficulty})")
            return True

        # Create keyword
        SeedKeyword.objects.create(
            keyword=keyword,
            industry=industry,
            sector=sector,
            volume=volume,
            difficulty=difficulty,
            country=country,
            is_active=True
        )

        self.log(f" ✓ Imported: {keyword} [{country}] (vol:{volume}, diff:{difficulty})")
        return True
|
||||
|
||||
    def parse_csv_row(self, row):
        """Parse CSV row and extract keyword data"""
        try:
            # csv.DictReader fills missing trailing columns with None, so
            # coerce to '' before calling .strip(); otherwise a short row
            # would be skipped as invalid instead of getting the defaults.
            keyword = (row.get('Keyword') or '').strip()
            if not keyword:
                return None

            # Parse country (default to US)
            country_raw = (row.get('Country') or '').strip().upper()
            if not country_raw:
                country_raw = 'US'

            # Parse volume (default to 0)
            volume_raw = (row.get('Volume') or '').strip()
            try:
                volume = int(volume_raw) if volume_raw else 0
            except (ValueError, TypeError):
                volume = 0

            # Parse difficulty (default to 0, clamp to 0-100)
            difficulty_raw = (row.get('Difficulty') or '').strip()
            try:
                difficulty = int(difficulty_raw) if difficulty_raw else 0
                difficulty = max(0, min(100, difficulty))  # Clamp to 0-100
            except (ValueError, TypeError):
                difficulty = 0

            return {
                'keyword': keyword,
                'country': country_raw,
                'volume': volume,
                'difficulty': difficulty
            }

        except Exception as e:
            self.log(f" ⚠ Error parsing row: {e}")
            return None

    def import_csv(self, csv_path, industry_name, sector_name):
        """Import keywords from CSV file"""
        csv_path = Path(csv_path)

        if not csv_path.exists():
            print(f"❌ ERROR: CSV file not found: {csv_path}")
            return False

        print(f"\n{'='*70}")
        print("IMPORTING SEED KEYWORDS FROM CSV")
        print(f"{'='*70}")
        print(f"File: {csv_path.name}")
        print(f"Industry: {industry_name}")
        print(f"Sector: {sector_name}")
        if self.dry_run:
            print("Mode: DRY RUN (no database changes)")
        print(f"{'='*70}\n")

        # Get or create Industry and Sector
        industry, _ = self.get_or_create_industry(industry_name)
        sector, _ = self.get_or_create_sector(industry, sector_name)

        # Read and import CSV
        print("Processing keywords...\n")

        try:
            with transaction.atomic():
                # newline='' lets the csv module handle line endings itself,
                # as the csv documentation recommends
                with open(csv_path, 'r', encoding='utf-8', newline='') as f:
                    reader = csv.DictReader(f)

                    for row in reader:
                        self.stats['processed'] += 1

                        keyword_data = self.parse_csv_row(row)
                        if not keyword_data:
                            self.stats['skipped_invalid'] += 1
                            continue

                        if self.import_keyword(keyword_data, industry, sector):
                            self.stats['imported'] += 1

                # Rollback in dry run mode
                if self.dry_run:
                    transaction.set_rollback(True)

        except Exception as e:
            print(f"\n❌ ERROR: {e}")
            import traceback
            traceback.print_exc()
            self.stats['errors'] += 1
            return False

        # Print summary
        print(f"\n{'='*70}")
        print("IMPORT SUMMARY")
        print(f"{'='*70}")
        print(f"Total rows processed: {self.stats['processed']}")
        print(f"✓ Imported: {self.stats['imported']}")
        print(f"⊘ Skipped (duplicate): {self.stats['skipped_duplicate']}")
        print(f"⊘ Skipped (invalid): {self.stats['skipped_invalid']}")
        print(f"✗ Errors: {self.stats['errors']}")
        print(f"{'='*70}\n")

        if self.dry_run:
            print("ℹ This was a DRY RUN - no data was saved to database")
            print("Remove --dry-run flag to perform actual import\n")
        else:
            print("✓ Import completed successfully!")
            print("✓ Check Django admin: /admin/auth/seedkeyword/\n")

        return True


def main():
    parser = argparse.ArgumentParser(
        description='Import seed keywords from single CSV file (with duplicate check: keyword+country)',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Dry run (preview only)
  docker compose -f docker-compose.app.yml exec igny8_backend \\
    python3 /app/scripts/import_seed_keywords_single.py \\
    --csv /app/../KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \\
    --industry "HealthCare Medical" \\
    --sector "Physiotherapy Rehabilitation" \\
    --dry-run --verbose

  # Actual import
  docker compose -f docker-compose.app.yml exec igny8_backend \\
    python3 /app/scripts/import_seed_keywords_single.py \\
    --csv /app/../KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \\
    --industry "HealthCare Medical" \\
    --sector "Physiotherapy Rehabilitation"
        """
    )

    parser.add_argument('--csv', required=True, help='Path to CSV file')
    parser.add_argument('--industry', required=True, help='Industry name')
    parser.add_argument('--sector', required=True, help='Sector name')
    parser.add_argument('--dry-run', action='store_true', help='Preview without saving to database')
    parser.add_argument('--verbose', action='store_true', help='Show detailed progress')

    args = parser.parse_args()

    # Create importer and run
    importer = KeywordImporter(dry_run=args.dry_run, verbose=args.verbose)
    success = importer.import_csv(args.csv, args.industry, args.sector)

    sys.exit(0 if success else 1)


if __name__ == '__main__':
    main()
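The row normalization and duplicate rules used by the script above (country defaults to US, volume defaults to 0, difficulty is clamped to 0-100, duplicates are keyed on case-insensitive keyword plus country) can be tried without Django or a database. The following is a minimal sketch under stated assumptions, not the script's own code: `parse_row`, `dedupe`, and the `SAMPLE` rows are hypothetical names and invented data, and the in-memory set only stands in for the `SeedKeyword` query, which additionally scopes the check to an industry and sector.

```python
import csv
import io


def parse_row(row):
    """Normalize one CSV row; return None when the keyword is missing."""
    keyword = (row.get("Keyword") or "").strip()
    if not keyword:
        return None

    # An empty or missing country falls back to US
    country = (row.get("Country") or "").strip().upper() or "US"

    def to_int(raw):
        # Blank or non-numeric values fall back to 0
        try:
            return int((raw or "").strip() or 0)
        except ValueError:
            return 0

    volume = to_int(row.get("Volume"))
    difficulty = max(0, min(100, to_int(row.get("Difficulty"))))  # clamp to 0-100
    return {
        "keyword": keyword,
        "country": country,
        "volume": volume,
        "difficulty": difficulty,
    }


def dedupe(rows):
    """Keep the first row per (lowercased keyword, country) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["keyword"].lower(), row["country"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Invented sample data, for illustration only
SAMPLE = """Keyword,Country,Volume,Difficulty
tens unit,us,1200,35
TENS Unit,us,900,140
,us,50,10
knee brace,,n/a,-5
"""

parsed = [p for p in map(parse_row, csv.DictReader(io.StringIO(SAMPLE))) if p]
unique = dedupe(parsed)

for row in unique:
    print(row)
```

In this sample the row with an empty keyword is dropped, the second "TENS Unit" row is treated as a duplicate of the first, and the "knee brace" row receives the default country, volume, and difficulty.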