igny8/backend/scripts/README.md
2026-01-13 12:00:16 +00:00

IGNY8 Seed Keywords Import Scripts

This folder contains scripts for importing seed keywords from the KW_DB folder structure into the IGNY8 global keywords database.

📁 Folder Structure

/data/app/igny8/KW_DB/
    {Industry}/                    # e.g., HealthCare_Medical
        {Sector}/                  # e.g., Physiotherapy_Rehabilitation
            *.csv                  # Keyword CSV files
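
As an illustration of how this layout can be traversed, here is a minimal sketch that yields every CSV found two levels below the base folder. The function name `iter_keyword_csvs` is hypothetical and not taken from the actual scripts.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_keyword_csvs(base_path: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (industry_folder, sector_folder, csv_path) for each CSV file.

    Assumes the {Industry}/{Sector}/*.csv layout described above; files at
    other depths or with other extensions are ignored.
    """
    base = Path(base_path)
    for industry_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        for sector_dir in sorted(p for p in industry_dir.iterdir() if p.is_dir()):
            for csv_path in sorted(sector_dir.glob("*.csv")):
                yield industry_dir.name, sector_dir.name, csv_path
```

Sorting the directories and files keeps the traversal order stable between runs, which makes dry-run output easier to compare.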

🔧 Available Scripts

1. import_seed_keywords_single.py

Import keywords from a single CSV file (for testing).

Usage:

# Dry run (preview only)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation" \
  --dry-run --verbose

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_seed_keywords_single.py \
  --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
  --industry "HealthCare Medical" \
  --sector "Physiotherapy Rehabilitation"

Options:

  • --csv - Path to CSV file (required)
  • --industry - Industry name (required)
  • --sector - Sector name (required)
  • --dry-run - Preview without saving to database
  • --verbose - Show detailed progress for each keyword
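
For readers who want to see how these options fit together, the following is a sketch of an `argparse` parser that accepts the same flags. It is an assumption about how such a command line could be defined, not a copy of the real script; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Import seed keywords from one CSV file"
    )
    parser.add_argument("--csv", required=True, help="Path to CSV file")
    parser.add_argument("--industry", required=True, help="Industry name")
    parser.add_argument("--sector", required=True, help="Sector name")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview without saving to database")
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed progress for each keyword")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --dry-run as args.dry_run
    print(args.csv, args.industry, args.sector, args.dry_run, args.verbose)
```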

2. import_all_seed_keywords.py

Import keywords from all CSV files in the KW_DB folder structure.

Usage:

# Dry run (preview all imports)
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB \
  --dry-run

# Actual import
docker compose -f docker-compose.app.yml exec igny8_backend \
  python3 /app/scripts/import_all_seed_keywords.py \
  --base-path /data/app/igny8/KW_DB

Options:

  • --base-path - Base path to KW_DB folder (default: /data/app/igny8/KW_DB)
  • --dry-run - Preview without saving to database
  • --verbose - Show detailed progress for each keyword

📊 CSV File Format

Expected CSV columns:

  • Keyword (required) - The keyword text
  • Country (optional) - Country code (default: US)
  • Volume (optional) - Search volume (default: 0)
  • Difficulty (optional) - Keyword difficulty 0-100 (default: 0)
  • CPC (ignored) - Not imported
  • Parent Keyword (ignored) - Not imported

Example:

Keyword,Country,Volume,Difficulty,CPC,Parent Keyword
physical therapy,us,12000,45,3.20,
tens unit,us,5000,32,2.50,physical therapy
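
The column rules above can be sketched as a small row parser. This is only an illustration: `parse_row` is a hypothetical helper, and two behaviours are assumptions rather than documented facts, namely that the country code is upper-cased and that a non-numeric Volume or Difficulty falls back to the default instead of invalidating the row.

```python
import csv
import io
from typing import Optional


def _to_int(value, default: int = 0) -> int:
    """Convert a CSV cell to int, falling back to the default (assumption)."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default


def parse_row(row: dict) -> Optional[dict]:
    """Return a cleaned record, or None when the required Keyword is empty."""
    keyword = (row.get("Keyword") or "").strip()
    if not keyword:
        return None
    return {
        "keyword": keyword,
        # Assumption: country codes are normalised to upper case.
        "country": (row.get("Country") or "US").strip().upper() or "US",
        "volume": _to_int(row.get("Volume")),
        "difficulty": _to_int(row.get("Difficulty")),
        # CPC and Parent Keyword are deliberately not read.
    }


sample = """Keyword,Country,Volume,Difficulty,CPC,Parent Keyword
physical therapy,us,12000,45,3.20,
tens unit,us,5000,32,2.50,physical therapy
"""
records = [parse_row(r) for r in csv.DictReader(io.StringIO(sample))]
```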

🔍 Duplicate Handling

Duplicate check: keyword + country (case-insensitive) within the same industry + sector.

  • If a keyword with the same country already exists in the same industry + sector, the row is skipped and not imported.
  • Example: "physical therapy [US]" in "HealthCare Medical > Physiotherapy Rehabilitation" is skipped if it already exists.
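
The rule above can be sketched in plain Python as follows. The helper names are hypothetical, and the sketch additionally assumes that repeats within the same file are treated as duplicates, which the description does not state explicitly.

```python
from typing import Iterable, List, Set, Tuple

Key = Tuple[str, str, str, str]


def duplicate_key(keyword: str, country: str, industry: str, sector: str) -> Key:
    """Build a case-insensitive comparison key for keyword + country."""
    return (keyword.strip().casefold(), country.strip().casefold(), industry, sector)


def split_new_and_duplicates(
    rows: Iterable[dict], existing: Set[Key], industry: str, sector: str
) -> Tuple[List[dict], List[dict]]:
    """Separate rows to import from rows to skip as duplicates."""
    seen = set(existing)  # copy so the caller's set is not mutated
    new_rows, skipped = [], []
    for row in rows:
        key = duplicate_key(row["keyword"], row["country"], industry, sector)
        if key in seen:
            skipped.append(row)
        else:
            seen.add(key)  # assumption: in-file repeats also count as duplicates
            new_rows.append(row)
    return new_rows, skipped
```

In the real scripts the lookup presumably happens against the database rather than an in-memory set; the set is used here only to keep the example self-contained.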

🗄️ Database Models

Industry

  • name - Industry name (e.g., "HealthCare Medical")
  • slug - URL-friendly slug (e.g., "healthcare-medical")
  • is_active - Active status (default: True)

IndustrySector

  • name - Sector name (e.g., "Physiotherapy Rehabilitation")
  • slug - URL-friendly slug (e.g., "physiotherapy-rehabilitation")
  • industry - Foreign key to Industry
  • is_active - Active status (default: True)
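
The example names and slugs listed for Industry and IndustrySector are consistent with a simple mapping from folder name to display name to slug. The sketch below reproduces that mapping with the standard library only. Both function names are hypothetical; a Django project would more likely rely on `django.utils.text.slugify`, and this stand-in is not guaranteed to match it for every input.

```python
import re


def folder_to_display_name(folder_name: str) -> str:
    """Turn a folder name such as 'HealthCare_Medical' into a display name."""
    return folder_name.replace("_", " ").strip()


def simple_slug(name: str) -> str:
    """Lower-case the name and join alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


display = folder_to_display_name("HealthCare_Medical")
slug = simple_slug(display)
```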

SeedKeyword

  • keyword - Keyword text
  • industry - Foreign key to Industry
  • sector - Foreign key to IndustrySector
  • country - Country code (e.g., "US")
  • volume - Search volume
  • difficulty - Keyword difficulty (0-100)
  • is_active - Active status (default: True)

Unique constraint (model level): keyword + industry + sector
Script duplicate check: keyword + country + industry + sector (also compares country, which the model-level constraint does not)
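
To make the difference between the two keys concrete, the following sketch compares them for two rows that differ only by country. It assumes both rules work exactly as listed above and that the model constraint compares keywords exactly; `model_key` and `script_key` are hypothetical names.

```python
def model_key(keyword: str, industry: str, sector: str) -> tuple:
    """Key implied by the model-level unique constraint (no country)."""
    return (keyword, industry, sector)


def script_key(keyword: str, country: str, industry: str, sector: str) -> tuple:
    """Key implied by the script's duplicate check (includes country)."""
    return (keyword.casefold(), country.casefold(), industry, sector)


ind, sec = "HealthCare Medical", "Physiotherapy Rehabilitation"
us_row = ("tens unit", "US")
gb_row = ("tens unit", "GB")

# The script check treats the two rows as different keywords...
script_sees_duplicate = script_key(*us_row, ind, sec) == script_key(*gb_row, ind, sec)
# ...while the model-level key is identical for both.
model_sees_duplicate = model_key(us_row[0], ind, sec) == model_key(gb_row[0], ind, sec)
```

Under these assumptions, a keyword that already exists for one country would pass the script check for another country and could then be rejected by the database, which would likely surface as an error in the import summary.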


Verification

After import, verify the data:

# Check counts in Django shell
docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import Industry, IndustrySector, SeedKeyword
>>> Industry.objects.count()
>>> IndustrySector.objects.count()
>>> SeedKeyword.objects.count()
>>> SeedKeyword.objects.filter(industry__name="HealthCare Medical").count()

Or check in Django admin:

  • Industries: /admin/auth/industry/
  • Sectors: /admin/auth/industrysector/
  • Keywords: /admin/auth/seedkeyword/

🐛 Troubleshooting

Issue: "ModuleNotFoundError: No module named 'igny8_core'"

Solution: Run the script inside the Docker container:

docker compose -f docker-compose.app.yml exec igny8_backend python3 /app/scripts/...

Issue: "CSV file not found"

Solution: Use the full path as seen from inside the container: /data/app/igny8/KW_DB/...

Issue: Keywords not importing (showing as duplicates)

Solution: Check if keywords already exist:

docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.filter(keyword__iexact="physical therapy", country="US")

Issue: Need to re-import after cleaning the database

Solution: Delete the existing keywords first. The two delete calls below are alternatives; run only one of them:

docker compose -f docker-compose.app.yml exec igny8_backend python3 manage.py shell

>>> from igny8_core.auth.models import SeedKeyword
>>> SeedKeyword.objects.all().delete()  # Option A: delete all keywords
>>> SeedKeyword.objects.filter(industry__slug="healthcare-medical").delete()  # Option B: delete one industry only

📈 Import Statistics

The scripts provide detailed statistics:

  • Total rows processed
  • Keywords imported successfully
  • Duplicates skipped (keyword + country)
  • Invalid rows skipped (empty keywords, bad data)
  • Errors encountered
  • Industries/Sectors created

Example output:

======================================================================
IMPORT SUMMARY
======================================================================
Total rows processed:  4,523
✓ Imported:            4,201
⊘ Skipped (duplicate): 280
⊘ Skipped (invalid):   37
✗ Errors:              5
======================================================================
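
A counter object along the following lines could produce a summary of this shape. This is a sketch, not the scripts' actual code: the class name `ImportStats`, its field names, and the 70-character rule width are assumptions. The consistency check relies on the further assumption that each processed row ends up in exactly one outcome, which the example figures above happen to satisfy (4,201 + 280 + 37 + 5 = 4,523).

```python
from dataclasses import dataclass


@dataclass
class ImportStats:
    processed: int = 0
    imported: int = 0
    skipped_duplicate: int = 0
    skipped_invalid: int = 0
    errors: int = 0

    def is_consistent(self) -> bool:
        """True when every processed row landed in exactly one outcome."""
        outcomes = (self.imported + self.skipped_duplicate
                    + self.skipped_invalid + self.errors)
        return outcomes == self.processed

    def summary(self) -> str:
        """Render the counters as a text block with thousands separators."""
        rule = "=" * 70
        return "\n".join([
            rule,
            "IMPORT SUMMARY",
            rule,
            f"Total rows processed:  {self.processed:,}",
            f"✓ Imported:            {self.imported:,}",
            f"⊘ Skipped (duplicate): {self.skipped_duplicate:,}",
            f"⊘ Skipped (invalid):   {self.skipped_invalid:,}",
            f"✗ Errors:              {self.errors:,}",
            rule,
        ])


stats = ImportStats(processed=4523, imported=4201,
                    skipped_duplicate=280, skipped_invalid=37, errors=5)
```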

🚀 Quick Start Guide

  1. Test with a single file first (dry run):

    docker compose -f docker-compose.app.yml exec igny8_backend \
      python3 /app/scripts/import_seed_keywords_single.py \
      --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
      --industry "HealthCare Medical" \
      --sector "Physiotherapy Rehabilitation" \
      --dry-run
    
  2. If successful, remove --dry-run and run the actual import:

    docker compose -f docker-compose.app.yml exec igny8_backend \
      python3 /app/scripts/import_seed_keywords_single.py \
      --csv /data/app/igny8/KW_DB/HealthCare_Medical/Physiotherapy_Rehabilitation/google_us_muscle-stimulator_matching-terms_2025-12-19_04-25-32.csv \
      --industry "HealthCare Medical" \
      --sector "Physiotherapy Rehabilitation"
    
  3. Verify in Django admin: /admin/auth/seedkeyword/

  4. Preview the import of all files (dry run):

    docker compose -f docker-compose.app.yml exec igny8_backend \
      python3 /app/scripts/import_all_seed_keywords.py \
      --base-path /data/app/igny8/KW_DB \
      --dry-run
    
  5. If successful, remove --dry-run and run the actual bulk import:

    docker compose -f docker-compose.app.yml exec igny8_backend \
      python3 /app/scripts/import_all_seed_keywords.py \
      --base-path /data/app/igny8/KW_DB
    

📝 Notes

  • Scripts automatically create Industries and Sectors if they don't exist
  • Folder names are converted to display names (underscores → spaces)
  • Slugs are auto-generated from names
  • All imports happen within transactions for data integrity
  • Dry-run mode uses transaction rollback (no database changes)
  • Empty or invalid CSV rows are skipped with warnings
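
The dry-run behaviour described in these notes (do the work inside a transaction, then roll it back) can be illustrated with the standard-library `sqlite3` module. This is a standalone sketch: the table `seed_keyword` and the function `import_keywords` are hypothetical, and the real scripts presumably use Django's transaction handling (for example `django.db.transaction.atomic`) rather than raw SQL.

```python
import sqlite3
from typing import Iterable


def import_keywords(conn: sqlite3.Connection, keywords: Iterable[str],
                    dry_run: bool = False) -> int:
    """Insert keywords in one transaction; roll back when dry_run is set.

    Returns the number of rows that were (or would have been) inserted.
    """
    rows = [(k,) for k in keywords]
    try:
        conn.executemany("INSERT INTO seed_keyword (keyword) VALUES (?)", rows)
        if dry_run:
            conn.rollback()  # discard everything: preview only
        else:
            conn.commit()
        return len(rows)
    except Exception:
        conn.rollback()  # leave the database unchanged on any failure
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seed_keyword (keyword TEXT)")
conn.commit()

previewed = import_keywords(conn, ["physical therapy", "tens unit"], dry_run=True)
rows_after_preview = conn.execute("SELECT COUNT(*) FROM seed_keyword").fetchone()[0]
```

Because the dry run executes the same statements as a real import, it exercises the same validation and constraint checks while leaving the table empty afterwards.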


Author: IGNY8 Team
Created: January 13, 2026
Last Updated: January 13, 2026