igny8/v2/V2-Execution-Docs/02B-taxonomy-term-content.md
IGNY8 VPS (Salman) 0570052fec 1
2026-03-23 17:20:51 +00:00

# IGNY8 Phase 2: Taxonomy Term Content (02B)
## Rich Content Generation for Taxonomy Terms
**Document Version:** 1.0
**Date:** 2026-03-23
**Phase:** IGNY8 Phase 2 — Feature Expansion
**Status:** Build Ready
**Source of Truth:** Codebase at `/data/app/igny8/`
**Audience:** Claude Code, Backend Developers, Architects

---
## 1. CURRENT STATE
### Existing Taxonomy Infrastructure
The taxonomy system is partially built:

**ContentTaxonomy** (writer app, db_table=`igny8_content_taxonomies`):
- Stores taxonomy term references synced from WordPress
- Fields: `name`, `slug`, `external_id` (WP term ID), `taxonomy_type` (category/tag/product_cat/product_tag/attribute)
- No content generation: terms are metadata only (name + slug + external reference)

**ContentTaxonomyRelation** (writer app):
- Links `Content` to `ContentTaxonomy` (many-to-many through table)
- Allows assigning existing taxonomy terms to content pieces

**Content Model** (writer app, db_table=`igny8_content`):
- `content_type='taxonomy'` exists in CONTENT_TYPE_CHOICES but is unused by the generation pipeline
- CONTENT_STRUCTURE_CHOICES includes `category_archive`, `tag_archive`, `attribute_archive`
- `taxonomy_terms` ManyToManyField through ContentTaxonomyRelation

**Tasks Model** (writer app, db_table=`igny8_tasks`):
- `taxonomy_term` ForeignKey to ContentTaxonomy (nullable, db_column='taxonomy_id')
- Not used by the automation pipeline; present as a field only

**SiteIntegration** (integration app):
- WordPress connections exist via the `SiteIntegration` model
- `SyncEvent` logs operations, but taxonomy sync is stubbed/incomplete
### What Doesn't Exist
- No content generation for taxonomy terms (categories, tags, attributes)
- No cluster mapping for taxonomy terms
- No WordPress → IGNY8 taxonomy sync (full fetch and reconcile)
- No IGNY8 → WordPress term content push
- No AI function for term content generation
- No admin interface for managing term-to-cluster mapping
---
## 2. WHAT TO BUILD
### Overview
Make taxonomy terms first-class SEO content pages by:
1. **Syncing terms from WordPress** — fetch all categories, tags, WooCommerce taxonomies
2. **Mapping terms to clusters** — automatic keyword-overlap + semantic matching
3. **Generating rich content** — AI-generated landing page content for each term
4. **Pushing content back** — sync generated content to WordPress term descriptions + meta
### Taxonomy Sync (WordPress → IGNY8)
A full fetch-and-reconcile sync from WordPress into IGNY8, using the existing `SiteIntegration`:

**Fetch targets:**
- WordPress categories (`taxonomy_type='category'`)
- WordPress tags (`taxonomy_type='tag'`)
- WooCommerce product categories (`taxonomy_type='product_cat'`)
- WooCommerce product tags (`taxonomy_type='product_tag'`)
- WooCommerce product attributes (`taxonomy_type='attribute'`, e.g., `pa_color`, `pa_size`)

**Sync logic:**
1. Use the existing `SiteIntegration.credentials_json` to authenticate against the WP REST API
2. Fetch all terms via `GET /wp-json/wp/v2/categories`, `/tags`, `/product_cat`, etc.
3. Reconcile: create new `ContentTaxonomy` records, update changed ones, flag deleted ones
4. Store parent/child hierarchy for categories
5. Log the sync as a `SyncEvent` with `event_type='metadata_sync'`
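
The reconcile step (step 3) can be sketched as a pure function over plain dictionaries. This is only an illustration under stated assumptions: `reconcile_terms`, the dictionary shapes, and the choice to compare `name`, `slug`, and `parent` are hypothetical, and real code would work on `ContentTaxonomy` rows matched by `external_id`.

```python
from typing import Dict, List


def reconcile_terms(local: Dict[int, dict], remote: List[dict]) -> Dict[str, List[int]]:
    """Plan a sync: which WP term IDs to create, update, or flag as deleted.

    Hypothetical helper. `local` maps WP term ID -> stored fields,
    `remote` is the list of terms just fetched. Nothing is persisted here.
    """
    plan: Dict[str, List[int]] = {"create": [], "update": [], "deleted": []}
    seen = set()
    for term in remote:
        ext_id = term["id"]
        seen.add(ext_id)
        current = local.get(ext_id)
        if current is None:
            plan["create"].append(ext_id)
        elif any(current.get(k) != term.get(k) for k in ("name", "slug", "parent")):
            plan["update"].append(ext_id)
    # Stored locally but absent from the fetch: removed on the WordPress side.
    plan["deleted"] = sorted(set(local) - seen)
    return plan


local = {
    1: {"name": "News", "slug": "news", "parent": 0},
    2: {"name": "Old", "slug": "old", "parent": 0},
}
remote = [
    {"id": 1, "name": "Latest News", "slug": "news", "parent": 0},
    {"id": 3, "name": "Guides", "slug": "guides", "parent": 1},
]
print(reconcile_terms(local, remote))
# {'create': [3], 'update': [1], 'deleted': [2]}
```

Flagging rather than deleting keeps any generated term content recoverable if a term was removed in WordPress by mistake.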
### Cluster Mapping Service
A shared service (`cluster_mapping_service.py`) that maps taxonomy terms to keyword clusters:
**Algorithm:**

| Factor | Weight | Method |
|--------|--------|--------|
| Keyword overlap | 40% | Compare term name + slug against cluster keywords |
| Semantic similarity | 40% | Embedding-based cosine similarity (term name vs cluster description) |
| Title match | 20% | Exact/partial match of term name in cluster name |

**Output per term:**
- `primary_cluster_id`: best-match cluster
- `secondary_cluster_ids`: additional related clusters (up to 3)
- `mapping_confidence`: score from 0.0 to 1.0
- `mapping_status`:
  - `auto_mapped` when confidence ≥ 0.6 (assigned automatically)
  - `suggested` when 0.3 ≤ confidence < 0.6 (suggested for manual review)
  - `unmapped` when confidence < 0.3 (no good match found)
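
As a rough sketch of how the three weighted factors could combine into `mapping_confidence` and `mapping_status`: the function names `combined_confidence` and `classify` are hypothetical, the clamping and rounding are assumptions, and each factor score is assumed to already lie in the range 0.0 to 1.0.

```python
KEYWORD_OVERLAP_WEIGHT = 0.4
SEMANTIC_SIMILARITY_WEIGHT = 0.4
TITLE_MATCH_WEIGHT = 0.2
AUTO_MAP_THRESHOLD = 0.6
SUGGEST_THRESHOLD = 0.3


def combined_confidence(keyword_overlap: float, semantic: float, title_match: float) -> float:
    """Weighted sum of the three factor scores, clamped to [0.0, 1.0]."""
    score = (
        KEYWORD_OVERLAP_WEIGHT * keyword_overlap
        + SEMANTIC_SIMILARITY_WEIGHT * semantic
        + TITLE_MATCH_WEIGHT * title_match
    )
    # Rounding avoids float noise pushing a score just across a threshold.
    return round(min(max(score, 0.0), 1.0), 4)


def classify(confidence: float) -> str:
    """Translate a confidence score into a mapping_status value."""
    if confidence >= AUTO_MAP_THRESHOLD:
        return "auto_mapped"
    if confidence >= SUGGEST_THRESHOLD:
        return "suggested"
    return "unmapped"


confidence = combined_confidence(keyword_overlap=0.5, semantic=0.8, title_match=1.0)
print(confidence, classify(confidence))  # 0.72 auto_mapped
```

The lower bound of each band is inclusive here, so a score of exactly 0.3 is `suggested` and exactly 0.6 is `auto_mapped`.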
### Term Content Generation
Each taxonomy term gets rich, SEO-optimized content:

**Generated sections:**
1. **H1 Title**: optimized for the term + primary cluster keywords
2. **Rich description**: 500–1,500 words covering the topic
3. **FAQ section**: 5–8 questions and answers
4. **Related terms**: links to sibling/child terms
5. **Meta title**: 50–60 characters
6. **Meta description**: 150–160 characters

**AI function:** `GenerateTermContentFunction(BaseAIFunction)`:
- Input: term name, taxonomy_type, assigned cluster keywords, existing content titles under term, parent/sibling terms for context
- Output: structured JSON with sections (intro, overview, FAQ, related)
- Uses `ContentTypeTemplate` from 02A where `content_type='taxonomy'`
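
The length targets above lend themselves to a simple post-generation check. The sketch below is an assumption-laden illustration: `check_term_output` is a hypothetical helper, the JSON keys follow the `{content_html, faq, meta_title, meta_description}` shape used later in this document, and counting words by splitting on whitespace is a crude approximation that also counts HTML tags.

```python
import json
from typing import List


def check_term_output(raw: str) -> List[str]:
    """Return a list of problems in a generated term payload (empty when it passes)."""
    data = json.loads(raw)
    problems = []

    word_count = len(data.get("content_html", "").split())
    if not 500 <= word_count <= 1500:
        problems.append(f"content_html has {word_count} words (expected 500-1500)")

    faq_count = len(data.get("faq", []))
    if not 5 <= faq_count <= 8:
        problems.append(f"faq has {faq_count} entries (expected 5-8)")

    title_len = len(data.get("meta_title", ""))
    if not 50 <= title_len <= 60:
        problems.append(f"meta_title has {title_len} characters (expected 50-60)")

    desc_len = len(data.get("meta_description", ""))
    if not 150 <= desc_len <= 160:
        problems.append(f"meta_description has {desc_len} characters (expected 150-160)")

    return problems


sample = json.dumps({
    "content_html": " ".join(["word"] * 600),
    "faq": [{"question": f"Q{i}?", "answer": f"A{i}."} for i in range(5)],
    "meta_title": "T" * 55,
    "meta_description": "D" * 155,
})
print(check_term_output(sample))  # []
```

A check like this could run in the parse step so that out-of-range output is retried instead of saved.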
### Term Content Sync (IGNY8 → WordPress)
Push generated content to WordPress:
- Custom WP REST endpoint: `POST /wp-json/igny8/v1/terms/{id}/content`
- Stores in WordPress term meta:
  - `_igny8_term_content`: HTML content
  - `_igny8_term_faq`: JSON FAQ array
  - `_igny8_term_meta_title`: SEO title
  - `_igny8_term_meta_description`: SEO description
- Updates the native WordPress term description with the generated content
- Schema: `CollectionPage` with `itemListElement` for listed content
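
To make the term meta keys and the schema shape concrete, here is a minimal sketch. Both helper names are hypothetical, serializing the FAQ with `json.dumps` is an assumption, and nesting an `ItemList` under `mainEntity` is one common way to attach `itemListElement` to a `CollectionPage`, not necessarily the layout the plugin will use.

```python
import json
from typing import Dict, List


def build_term_meta(content_html: str, faq: List[dict],
                    meta_title: str, meta_description: str) -> Dict[str, str]:
    """Map generated fields onto the WordPress term meta keys listed above."""
    return {
        "_igny8_term_content": content_html,
        "_igny8_term_faq": json.dumps(faq),  # assumption: stored as a JSON string
        "_igny8_term_meta_title": meta_title,
        "_igny8_term_meta_description": meta_description,
    }


def build_collection_page_schema(term_name: str, term_url: str,
                                 item_urls: List[str]) -> dict:
    """Build CollectionPage JSON-LD listing the content filed under a term."""
    return {
        "@context": "https://schema.org",
        "@type": "CollectionPage",
        "name": term_name,
        "url": term_url,
        "mainEntity": {
            "@type": "ItemList",
            "itemListElement": [
                {"@type": "ListItem", "position": position, "url": url}
                for position, url in enumerate(item_urls, start=1)
            ],
        },
    }


faq = [{"question": "What sizes are available?", "answer": "Sizes 36 to 48."}]
meta = build_term_meta("<p>Intro</p>", faq, "Running Shoes", "Browse running shoes.")
schema = build_collection_page_schema(
    "Running Shoes",
    "https://example.com/category/running-shoes/",
    ["https://example.com/post-a/", "https://example.com/post-b/"],
)
print(json.dumps(schema, indent=2))
```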
---
## 3. DATA MODELS & APIs
### Modified Models
**ContentTaxonomy** (db_table=`igny8_content_taxonomies`) — add fields:
```python
# Cluster mapping
cluster = models.ForeignKey(
    'planner.Clusters', on_delete=models.SET_NULL,
    null=True, blank=True, related_name='taxonomy_terms',
    help_text="Primary cluster this term maps to"
)
secondary_cluster_ids = models.JSONField(
    default=list, blank=True,
    help_text="Additional related cluster IDs"
)
mapping_confidence = models.FloatField(
    default=0.0,
    help_text="Cluster mapping confidence score 0.0-1.0"
)
mapping_status = models.CharField(
    max_length=20, default='unmapped',
    choices=[
        ('auto_mapped', 'Auto Mapped'),
        ('manual_mapped', 'Manual Mapped'),
        ('suggested', 'Suggested'),
        ('unmapped', 'Unmapped'),
    ],
    db_index=True
)

# Generated content
term_content = models.TextField(
    blank=True, default='',
    help_text="Generated rich HTML content for the term page"
)
term_faq = models.JSONField(
    default=list, blank=True,
    help_text="Generated FAQ: [{question, answer}]"
)
meta_title = models.CharField(max_length=255, blank=True, default='')
meta_description = models.TextField(blank=True, default='')
content_status = models.CharField(
    max_length=20, default='none',
    choices=[
        ('none', 'No Content'),
        ('generating', 'Generating'),
        ('generated', 'Generated'),
        ('published', 'Published to WP'),
    ],
    db_index=True
)

# Hierarchy
parent_term = models.ForeignKey(
    'self', on_delete=models.SET_NULL,
    null=True, blank=True, related_name='child_terms'
)
term_count = models.IntegerField(
    default=0,
    help_text="Number of posts/products using this term"
)

# Sync tracking
last_synced_from_wp = models.DateTimeField(null=True, blank=True)
last_pushed_to_wp = models.DateTimeField(null=True, blank=True)
```
### New AI Function
```python
# igny8_core/ai/functions/generate_term_content.py
from typing import Any, Dict, List

# BaseAIFunction and ContentTaxonomy are imported from their existing
# locations in the codebase (exact module paths are not fixed by this document).


class GenerateTermContentFunction(BaseAIFunction):
    """Generate rich SEO content for taxonomy terms."""

    def get_name(self) -> str:
        return 'generate_term_content'

    def get_metadata(self) -> Dict:
        return {
            'display_name': 'Generate Term Content',
            'description': 'Generate rich landing page content for taxonomy terms',
            'phases': {
                'INIT': 'Initializing...',
                'PREP': 'Loading term and cluster data...',
                'AI_CALL': 'Generating term content...',
                'PARSE': 'Parsing response...',
                'SAVE': 'Saving term content...',
                'DONE': 'Complete!'
            }
        }

    def get_max_items(self) -> int:
        return 10  # Process up to 10 terms per batch

    def validate(self, payload: dict, account=None) -> Dict:
        term_ids = payload.get('ids', [])
        if not term_ids:
            return {'valid': False, 'error': 'No term IDs provided'}
        return {'valid': True}

    def prepare(self, payload: dict, account=None) -> List:
        term_ids = payload.get('ids', [])
        # Filtering by account keeps one tenant from loading another's terms
        terms = ContentTaxonomy.objects.filter(
            id__in=term_ids,
            account=account
        ).select_related('cluster', 'parent_term')
        return list(terms)

    def build_prompt(self, data: Any, account=None) -> str:
        term = data  # Single term
        # Build context: cluster keywords, existing content, siblings
        cluster_keywords = []
        if term.cluster:
            cluster_keywords = list(
                term.cluster.keywords.values_list('keyword', flat=True)[:20]
            )
        sibling_terms = list(
            ContentTaxonomy.objects.filter(
                taxonomy_type=term.taxonomy_type,
                site=term.site,
                parent_term=term.parent_term
            ).exclude(id=term.id).values_list('name', flat=True)[:10]
        )
        # Use ContentTypeTemplate from 02A if available
        # Fall back to default term prompt
        return self._build_term_prompt(term, cluster_keywords, sibling_terms)

    def parse_response(self, response: str, step_tracker=None) -> Dict:
        # Parse structured JSON: {content_html, faq, meta_title, meta_description}
        pass

    def save_output(self, parsed, original_data, account=None, **kwargs) -> Dict:
        term = original_data
        term.term_content = parsed.get('content_html', '')
        term.term_faq = parsed.get('faq', [])
        term.meta_title = parsed.get('meta_title', '')
        term.meta_description = parsed.get('meta_description', '')
        term.content_status = 'generated'
        term.save()
        return {'count': 1, 'items_updated': [term.id]}
```
Register in `igny8_core/ai/registry.py`:
```python
register_lazy_function('generate_term_content', lambda: GenerateTermContentFunction)
```
### New Service
```python
# igny8_core/business/content/cluster_mapping_service.py
from typing import Dict, List

# ContentTaxonomy is imported from its existing location in the codebase.


class ClusterMappingService:
    """Maps taxonomy terms to keyword clusters using multi-factor scoring."""

    KEYWORD_OVERLAP_WEIGHT = 0.4
    SEMANTIC_SIMILARITY_WEIGHT = 0.4
    TITLE_MATCH_WEIGHT = 0.2
    AUTO_MAP_THRESHOLD = 0.6
    SUGGEST_THRESHOLD = 0.3

    def map_terms_to_clusters(self, site_id: int, account_id: int) -> Dict:
        """
        Map all unmapped ContentTaxonomy terms to Clusters for a site.
        Returns: {mapped: int, suggested: int, unmapped: int}
        """
        pass

    def map_single_term(self, term: ContentTaxonomy) -> Dict:
        """
        Map a single term. Returns:
        {cluster_id, secondary_ids, confidence, status}
        """
        pass

    def get_suggestions(self, term: ContentTaxonomy, top_n: int = 5) -> List[Dict]:
        """
        Return the top_n candidate clusters for a term, best match first.
        Called by TaxonomyTermViewSet.cluster_suggestions; item shape to be defined.
        """
        pass

    def _keyword_overlap_score(self, term_name: str, cluster_keywords: list) -> float:
        pass

    def _semantic_similarity_score(self, term_name: str, cluster_description: str) -> float:
        pass

    def _title_match_score(self, term_name: str, cluster_name: str) -> float:
        pass
```
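
The three private scoring methods are left as stubs above. One possible shape for them, as a sketch only: the tokenization rule, the 0.5 value for a partial title match, and the module-level function names are all assumptions, and the embedding vectors fed to `cosine_similarity` would come from whatever embedding provider the project uses.

```python
import math
import re
from typing import List, Sequence, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; hyphens and spaces both act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap_score(term_name: str, cluster_keywords: List[str]) -> float:
    """Fraction of the term's tokens that occur in any cluster keyword."""
    term_tokens = _tokens(term_name)
    if not term_tokens:
        return 0.0
    cluster_tokens: Set[str] = set()
    for keyword in cluster_keywords:
        cluster_tokens |= _tokens(keyword)
    return len(term_tokens & cluster_tokens) / len(term_tokens)


def title_match_score(term_name: str, cluster_name: str) -> float:
    """1.0 for an exact match, 0.5 if the term appears inside the cluster name."""
    term = term_name.strip().lower()
    cluster = cluster_name.strip().lower()
    if not term:
        return 0.0
    if term == cluster:
        return 1.0
    return 0.5 if term in cluster else 0.0


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(keyword_overlap_score("Running Socks", ["best running shoes"]))  # 0.5
print(title_match_score("shoes", "Running Shoes"))  # 0.5
```

Normalizing keyword overlap by the term's own token count keeps short term names from being penalized against clusters with many keywords.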
### New Celery Tasks
```python
# igny8_core/tasks/taxonomy_tasks.py
from celery import shared_task


@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def sync_taxonomy_from_wordpress(self, site_id: int, account_id: int):
    """Fetch all taxonomy terms from WordPress and reconcile with ContentTaxonomy."""
    pass


@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def map_terms_to_clusters(self, site_id: int, account_id: int):
    """Run cluster mapping on all unmapped terms for a site."""
    pass


@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def generate_term_content_task(self, term_ids: list, account_id: int):
    """Generate content for a batch of taxonomy terms."""
    pass


@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def push_term_content_to_wordpress(self, term_id: int, account_id: int):
    """Push generated term content to WordPress via REST API."""
    pass
```
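
Because the AI function caps a batch at 10 terms, a bulk request would need to be split before dispatching `generate_term_content_task`. A minimal sketch of that split follows; `chunk_ids` and `MAX_TERMS_PER_BATCH` are hypothetical names, and the constant simply mirrors the value returned by `get_max_items()`.

```python
from typing import Iterator, List

MAX_TERMS_PER_BATCH = 10  # assumption: kept in step with get_max_items()


def chunk_ids(term_ids: List[int], size: int = MAX_TERMS_PER_BATCH) -> Iterator[List[int]]:
    """Yield consecutive slices of term_ids, each at most `size` long."""
    for start in range(0, len(term_ids), size):
        yield term_ids[start:start + size]


batches = list(chunk_ids(list(range(1, 24))))
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Each batch could then be queued on its own, for example with `generate_term_content_task.delay(batch, account_id)`, so that a failure and retry affects at most 10 terms.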
### Migration
```
igny8_core/migrations/XXXX_taxonomy_term_content.py
```
Fields added to `ContentTaxonomy`:
1. `cluster` — ForeignKey to Clusters (nullable)
2. `secondary_cluster_ids` — JSONField
3. `mapping_confidence` — FloatField
4. `mapping_status` — CharField
5. `term_content` — TextField
6. `term_faq` — JSONField
7. `meta_title` — CharField
8. `meta_description` — TextField
9. `content_status` — CharField
10. `parent_term` — ForeignKey to self (nullable)
11. `term_count` — IntegerField
12. `last_synced_from_wp` — DateTimeField (nullable)
13. `last_pushed_to_wp` — DateTimeField (nullable)
### API Endpoints
```
# Taxonomy Term Management
GET /api/v1/writer/taxonomy/terms/ # List terms with mapping status (filterable)
GET /api/v1/writer/taxonomy/terms/{id}/ # Term detail
GET /api/v1/writer/taxonomy/terms/unmapped/ # Terms needing cluster assignment
GET /api/v1/writer/taxonomy/terms/stats/ # Summary: mapped/unmapped/generated/published counts
# WordPress Sync
POST /api/v1/writer/taxonomy/terms/sync/ # Trigger WP → IGNY8 sync
GET /api/v1/writer/taxonomy/terms/sync/status/ # Last sync time + status
# Cluster Mapping
POST /api/v1/writer/taxonomy/terms/{id}/map-cluster/ # Manual cluster assignment
POST /api/v1/writer/taxonomy/terms/auto-map/ # Run auto-mapping for all unmapped terms
GET /api/v1/writer/taxonomy/terms/{id}/cluster-suggestions/ # Get AI cluster suggestions for a term
# Content Generation
POST /api/v1/writer/taxonomy/terms/create-tasks/ # Bulk create generation tasks for selected terms
POST /api/v1/writer/taxonomy/terms/{id}/generate/ # Generate content for single term
POST /api/v1/writer/taxonomy/terms/generate-bulk/ # Generate content for multiple terms
# Publishing to WordPress
POST /api/v1/writer/taxonomy/terms/{id}/publish/ # Push single term content to WP
POST /api/v1/writer/taxonomy/terms/publish-bulk/ # Push multiple terms to WP
```
**ViewSet:**
```python
# igny8_core/modules/writer/views/taxonomy_term_views.py
from rest_framework.decorators import action
from rest_framework.response import Response

from igny8_core.business.content.cluster_mapping_service import ClusterMappingService
from igny8_core.tasks.taxonomy_tasks import (
    generate_term_content_task,
    map_terms_to_clusters,
    push_term_content_to_wordpress,
    sync_taxonomy_from_wordpress,
)
# ContentTaxonomy, TaxonomyTermSerializer, and SiteSectorModelViewSet are
# imported from their existing locations in the codebase.


class TaxonomyTermViewSet(SiteSectorModelViewSet):
    serializer_class = TaxonomyTermSerializer
    queryset = ContentTaxonomy.objects.all()
    filterset_fields = ['taxonomy_type', 'mapping_status', 'content_status', 'site']

    @action(detail=False, methods=['get'])
    def unmapped(self, request):
        qs = self.get_queryset().filter(mapping_status='unmapped')
        return self.paginate_and_respond(qs)

    @action(detail=False, methods=['get'])
    def stats(self, request):
        qs = self.get_queryset()
        site_id = request.query_params.get('site_id')
        if site_id:  # only narrow by site when the parameter is supplied
            qs = qs.filter(site_id=site_id)
        return Response({
            'total': qs.count(),
            'mapped': qs.filter(mapping_status__in=['auto_mapped', 'manual_mapped']).count(),
            'suggested': qs.filter(mapping_status='suggested').count(),
            'unmapped': qs.filter(mapping_status='unmapped').count(),
            'content_generated': qs.filter(content_status='generated').count(),
            'content_published': qs.filter(content_status='published').count(),
        })

    @action(detail=False, methods=['post'])
    def sync(self, request):
        site_id = request.data.get('site_id')
        sync_taxonomy_from_wordpress.delay(site_id, request.account.id)
        return Response({'message': 'Taxonomy sync started'})

    @action(detail=True, methods=['post'], url_path='map-cluster')
    def map_cluster(self, request, pk=None):
        term = self.get_object()
        cluster_id = request.data.get('cluster_id')
        term.cluster_id = cluster_id
        term.mapping_status = 'manual_mapped'
        term.mapping_confidence = 1.0
        term.save()
        return Response(TaxonomyTermSerializer(term).data)

    @action(detail=False, methods=['post'], url_path='auto-map')
    def auto_map(self, request):
        site_id = request.data.get('site_id')
        map_terms_to_clusters.delay(site_id, request.account.id)
        return Response({'message': 'Auto-mapping started'})

    @action(detail=True, methods=['get'], url_path='cluster-suggestions')
    def cluster_suggestions(self, request, pk=None):
        term = self.get_object()
        service = ClusterMappingService()
        suggestions = service.get_suggestions(term, top_n=5)
        return Response({'suggestions': suggestions})

    @action(detail=True, methods=['post'])
    def generate(self, request, pk=None):
        term = self.get_object()
        generate_term_content_task.delay([term.id], request.account.id)
        return Response({'message': 'Content generation started'})

    @action(detail=True, methods=['post'])
    def publish(self, request, pk=None):
        term = self.get_object()
        push_term_content_to_wordpress.delay(term.id, request.account.id)
        return Response({'message': 'Publishing to WordPress started'})
```
**URL Registration:**
```python
# igny8_core/modules/writer/urls.py — add to existing router
router.register('taxonomy/terms', TaxonomyTermViewSet, basename='taxonomy-term')
```
### Credit Costs
| Operation | Credits | Via |
|-----------|---------|-----|
| Taxonomy sync (WordPress → IGNY8) | 1 per batch | CreditCostConfig: `taxonomy_sync` |
| Term content generation | 4–6 per term | CreditCostConfig: `term_content_generation` |
| Term content optimization | 3–5 per term | CreditCostConfig: `term_content_optimization` |

Add to `CreditCostConfig`:
```python
CreditCostConfig.objects.get_or_create(
    operation_type='taxonomy_sync',
    defaults={'display_name': 'Taxonomy Sync', 'base_credits': 1}
)
CreditCostConfig.objects.get_or_create(
    operation_type='term_content_generation',
    defaults={'display_name': 'Term Content Generation', 'base_credits': 5}
)
```
Add to `CreditUsageLog.OPERATION_TYPE_CHOICES`:
```python
('taxonomy_sync', 'Taxonomy Sync'),
('term_content_generation', 'Term Content Generation'),
```
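
For budgeting, the per-operation costs can be turned into a quick range estimate. This sketch assumes term content generation costs 4 to 6 credits per term and a sync costs 1 credit per batch; `estimate_credits` is a hypothetical helper, and the size of a sync batch is not defined in this document.

```python
from typing import Tuple

SYNC_CREDITS_PER_BATCH = 1
GENERATION_CREDITS_RANGE = (4, 6)  # assumed per-term range for generation


def estimate_credits(sync_batches: int, terms_to_generate: int) -> Tuple[int, int]:
    """Return (minimum, maximum) credits for a sync plus a generation run."""
    sync_cost = sync_batches * SYNC_CREDITS_PER_BATCH
    low, high = GENERATION_CREDITS_RANGE
    return sync_cost + terms_to_generate * low, sync_cost + terms_to_generate * high


print(estimate_credits(sync_batches=2, terms_to_generate=25))  # (102, 152)
```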
---
## 4. IMPLEMENTATION STEPS
### Step 1: Add Fields to ContentTaxonomy
File to modify:
- `backend/igny8_core/business/content/models.py` (or wherever ContentTaxonomy is defined)
- Add all 13 new fields listed in migration section
### Step 2: Create and Run Migration
```bash
cd /data/app/igny8/backend
python manage.py makemigrations --name taxonomy_term_content
python manage.py migrate
```
### Step 3: Build ClusterMappingService
File to create:
- `backend/igny8_core/business/content/cluster_mapping_service.py`
### Step 4: Create GenerateTermContentFunction
File to create:
- `backend/igny8_core/ai/functions/generate_term_content.py`

Register in:
- `backend/igny8_core/ai/registry.py`
### Step 5: Create Celery Tasks
File to create:
- `backend/igny8_core/tasks/taxonomy_tasks.py`

Register in Celery beat schedule (optional; these are primarily on-demand):
- `sync_taxonomy_from_wordpress` — can be periodic (weekly) or on-demand
### Step 6: Add Credit Cost Entries
Add `taxonomy_sync` and `term_content_generation` to:
- `CreditCostConfig` seed data
- `CreditUsageLog.OPERATION_TYPE_CHOICES`
### Step 7: Build Serializers
File to create:
- `backend/igny8_core/modules/writer/serializers/taxonomy_term_serializer.py`
### Step 8: Build ViewSet and URLs
File to create:
- `backend/igny8_core/modules/writer/views/taxonomy_term_views.py`

Modify:
- `backend/igny8_core/modules/writer/urls.py`
### Step 9: Frontend
Files to create/modify in `frontend/src/`:
- `pages/Writer/TaxonomyTerms.tsx` — term list with mapping status indicators
- `pages/Writer/TaxonomyTermDetail.tsx` — term detail with generated content preview
- `components/Writer/ClusterMappingPanel.tsx` — cluster assignment/suggestion UI
- `stores/taxonomyTermStore.ts` — Zustand store
- `api/taxonomyTerms.ts` — API client
### Step 10: Tests
```bash
cd /data/app/igny8/backend
python manage.py test igny8_core.business.content.tests.test_cluster_mapping
python manage.py test igny8_core.ai.tests.test_generate_term_content
python manage.py test igny8_core.modules.writer.tests.test_taxonomy_term_views
```
---
## 5. ACCEPTANCE CRITERIA
- [ ] All 13 new fields on ContentTaxonomy migrate successfully
- [ ] `GenerateTermContentFunction` registered in AI function registry
- [ ] WordPress → IGNY8 taxonomy sync fetches categories, tags, WooCommerce taxonomies
- [ ] Sync creates/updates ContentTaxonomy records with correct taxonomy_type
- [ ] Parent/child hierarchy preserved via parent_term FK
- [ ] SyncEvent logged with event_type='metadata_sync' after each sync operation
- [ ] ClusterMappingService maps terms with confidence scores
- [ ] Terms with confidence ≥ 0.6 auto-mapped, 0.3–0.6 suggested, < 0.3 unmapped
- [ ] Manual cluster assignment sets mapping_status='manual_mapped' with confidence=1.0
- [ ] Term content generation produces: content_html, FAQ, meta_title, meta_description
- [ ] content_status transitions: none → generating → generated → published
- [ ] Publishing pushes content to WordPress via `POST /wp-json/igny8/v1/terms/{id}/content`
- [ ] All API endpoints require authentication and enforce account isolation
- [ ] Frontend term list shows mapping status badges (mapped/suggested/unmapped)
- [ ] Frontend supports manual cluster assignment from suggestion list
- [ ] Credit deduction works for taxonomy_sync and term_content_generation operations
- [ ] Backward compatible — existing ContentTaxonomy records unaffected (new fields nullable/defaulted)
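
The `content_status` criterion can be checked with a small guard. This is a sketch under the strict reading that only the forward path none → generating → generated → published is allowed; `is_valid_transition` is a hypothetical helper, and retry or regeneration paths (for example, back from `generating` after a failure) are not specified here and would need to be added if wanted.

```python
CONTENT_STATUS_ORDER = ["none", "generating", "generated", "published"]


def is_valid_transition(current: str, new: str) -> bool:
    """True only when `new` is the status immediately after `current`."""
    if current not in CONTENT_STATUS_ORDER or new not in CONTENT_STATUS_ORDER:
        return False
    return CONTENT_STATUS_ORDER.index(new) == CONTENT_STATUS_ORDER.index(current) + 1


print(is_valid_transition("none", "generating"))  # True
print(is_valid_transition("none", "published"))   # False
```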
---
## 6. CLAUDE CODE INSTRUCTIONS
### Execution Order
1. Read `backend/igny8_core/business/content/models.py` — find ContentTaxonomy and ContentTaxonomyRelation
2. Read `backend/igny8_core/business/planning/models.py` — understand Clusters model for FK reference
3. Read `backend/igny8_core/ai/functions/generate_content.py` — reference pattern for new AI function
4. Read `backend/igny8_core/ai/registry.py` — understand registration pattern
5. Add fields to ContentTaxonomy model
6. Create migration and run it
7. Build ClusterMappingService
8. Build GenerateTermContentFunction + register it
9. Build Celery tasks
10. Build serializers, ViewSet, URLs
11. Build frontend components
### Key Constraints
- ALL primary keys are `BigAutoField` (integer). No UUIDs.
- Model class names PLURAL: `Clusters`, `Keywords`, `Tasks`, `ContentIdeas`, `Images`. `Content` stays singular. `ContentTaxonomy` stays singular.
- Frontend: `.tsx` files, Zustand stores, Vitest testing
- Celery app name: `igny8_core`
- All new db_tables use `igny8_` prefix
- Follow existing ViewSet pattern: `SiteSectorModelViewSet` for site-scoped resources
- AI functions follow `BaseAIFunction` pattern with lazy registry
### File Tree (New/Modified)
```
backend/igny8_core/
├── business/content/
│   ├── models.py                        # MODIFY: add fields to ContentTaxonomy
│   └── cluster_mapping_service.py       # NEW: ClusterMappingService
├── ai/functions/
│   └── generate_term_content.py         # NEW: GenerateTermContentFunction
├── ai/
│   └── registry.py                      # MODIFY: register generate_term_content
├── tasks/
│   └── taxonomy_tasks.py                # NEW: sync, map, generate, publish tasks
├── modules/writer/
│   ├── serializers/
│   │   └── taxonomy_term_serializer.py  # NEW
│   ├── views/
│   │   └── taxonomy_term_views.py       # NEW
│   └── urls.py                          # MODIFY: register taxonomy/terms route
└── migrations/
    └── XXXX_taxonomy_term_content.py    # NEW: auto-generated

frontend/src/
├── pages/Writer/
│   ├── TaxonomyTerms.tsx                # NEW: term list page
│   └── TaxonomyTermDetail.tsx           # NEW: term detail + content preview
├── components/Writer/
│   └── ClusterMappingPanel.tsx          # NEW: cluster assignment UI
├── stores/
│   └── taxonomyTermStore.ts             # NEW: Zustand store
└── api/
    └── taxonomyTerms.ts                 # NEW: API client
```
### Cross-References
- **02A** (content types extension): ContentTypeTemplate for content_type='taxonomy' provides prompt template
- **01A** (SAG data foundation): SAGAttribute → taxonomy mapping context
- **01D** (setup wizard): wizard creates initial taxonomy plan used for cluster mapping
- **03B** (WP plugin connected): connected plugin receives term content via REST endpoint
- **03C** (companion theme): theme renders term landing pages using pushed content