igny8/UNDER-OBSERVATION.md
2025-12-10 13:58:13 +00:00

UNDER OBSERVATION

Issue: User Logged Out During Image Prompt Generation (Dec 10, 2025)

Original Problem

The user ran the workflow auto-cluster → generate ideas → queue to writer → generate content → generate image prompt. Near the end of image prompt generation, the user was automatically logged out.

Investigation Timeline

Initial Analysis:

  • Suspected backend container restarts invalidating sessions
  • docker ps showed all containers up for 19+ minutes, so there were no restarts during the incident
  • Backend logs showed two messages: "[IsAuthenticatedAndActive] DENIED: User not authenticated" and "Client error: Authentication credentials were not provided"
  • Token was not being sent with API requests

Root Cause Identified: The logout was NOT caused by backend issues or container restarts. It was caused by frontend state corruption during HMR (Hot Module Replacement), triggered by code changes made to fix an unrelated useLocation() error.

What Actually Happened:

  1. Commit 5fb3687854 already contained the proper fix for the useLocation() error (Suspense outside Routes)

  2. Additional "fixes" applied on Dec 10, 2025:

    • Changed cacheDir: "/tmp/vite-cache" in vite.config.ts
    • Moved BrowserRouter above ErrorBoundary in main.tsx
    • Added watch.interval: 100 and fs.strict: false

  3. These changes triggered:

    • Vite cache stored in /tmp got wiped on container operations
    • Full rebuild with HMR
    • Component tree restructuring (BrowserRouter position change)
    • Auth store (Zustand persist) lost state during rapid unmount/remount cycle
    • Frontend started making API calls WITHOUT Authorization header
    • Backend correctly rejected unauthenticated requests
    • Frontend logout() triggered

Fix Applied

Reverted the problematic changes:

  • Removed cacheDir: "/tmp/vite-cache" - let Vite use default node_modules/.vite
  • Restored BrowserRouter position inside ErrorBoundary/ThemeProvider (original structure)
  • Removed watch.interval and fs.strict additions

Kept the actual fixes:

  • Backend: Removed IsSystemAccountOrDeveloper from IntegrationSettingsViewSet class-level permissions
  • Backend: Auto-cluster extra_data → debug_info parameter fix
  • Frontend: Suspense wrapping Routes (from commit 5fb3687) - THIS was the real useLocation() fix

What to Watch For

1. useLocation() Error After Container Restarts

  • Symptom: "useLocation() may be used only in the context of a <Router> component"
  • Where: Keywords page, other planner/writer module pages (50-60% of pages)
  • If it happens:
    • Check if Vite cache is stale
    • Clear node_modules/.vite inside frontend container: docker compose exec igny8_frontend rm -rf /app/node_modules/.vite
    • Restart frontend container
    • DO NOT change cacheDir or component tree structure

2. Auth State Loss During Development

  • Symptom: Random logouts during active sessions, "Authentication credentials were not provided"
  • Triggers:
    • HMR with significant component tree changes
    • Rapid container restarts during development
    • Changes to context provider order in main.tsx
  • Prevention:
    • Avoid restructuring main.tsx component tree
    • Test auth persistence after any main.tsx changes
    • Monitor browser console for localStorage errors during HMR

3. Permission Errors for Normal Users

  • Symptom: "You do not have permission to perform this action" for valid users with complete account setup
  • Check:
    • Backend logs for permission class debug output: [IsAuthenticatedAndActive], [IsViewerOrAbove], [HasTenantAccess]
    • Verify user has role='owner' and is_active=True
    • Ensure viewset doesn't have IsSystemAccountOrDeveloper at class level for endpoints normal users need
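To speed up the log check above, the permission debug lines can be filtered out of the backend output. This is a minimal sketch, assuming each debug line contains the bracketed permission class name and that rejections include the word DENIED, as in the message quoted earlier in this document. The function names are hypothetical, and how the log lines are obtained (for example, piping container logs to a file) is left out.

```python
from collections import Counter
from typing import Iterable, List

# Tags taken from the permission classes named in this document.
PERMISSION_TAGS = (
    "[IsAuthenticatedAndActive]",
    "[IsViewerOrAbove]",
    "[HasTenantAccess]",
)


def permission_lines(lines: Iterable[str]) -> List[str]:
    """Return only the lines that mention one of the permission tags."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(tag in line for tag in PERMISSION_TAGS)
    ]


def denials_by_class(lines: Iterable[str]) -> Counter:
    """Count lines containing 'DENIED', grouped by permission tag."""
    counts: Counter = Counter()
    for line in permission_lines(lines):
        if "DENIED" not in line:
            continue
        for tag in PERMISSION_TAGS:
            if tag in line:
                counts[tag] += 1
    return counts
```

Both functions accept any iterable of strings, so an open log file or sys.stdin can be passed directly. A spike in denials for a single class points at which check is rejecting the user.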

4. Celery Task Progress Polling 403 Errors

  • Symptom: Task progress endpoint returns 403 for normal users
  • Root cause: ViewSet class-level permissions blocking action-level overrides
  • Solution: Ensure IntegrationSettingsViewSet permission_classes doesn't include IsSystemAccountOrDeveloper

Lessons Learned

  1. Don't layer fixes on top of fixes - Identify root cause first
  2. Vite cache location matters - /tmp gets wiped, breaking HMR state persistence
  3. Component tree structure is fragile - Moving BrowserRouter breaks auth rehydration timing
  4. Container uptime ≠ code stability - HMR can cause issues without restart
  5. Permission debugging - Adding logging to the permission classes was critical for diagnosis
  6. The original fix was already correct - Commit 5fb3687 had it right, additional "improvements" broke it

Files Modified (Reverted)

  • frontend/vite.config.ts - Removed cacheDir and watch config changes
  • frontend/src/main.tsx - Restored original component tree structure

Files Modified (Kept)

  • backend/igny8_core/modules/system/integration_views.py - Removed IsSystemAccountOrDeveloper
  • backend/igny8_core/modules/planner/views.py - Fixed extra_data → debug_info
  • backend/igny8_core/api/permissions.py - Added debug logging (can be removed later)

Status

RESOLVED - Auth state stable, backend permissions correct, useLocation fix preserved.

ADDITIONAL FIX (Dec 10, 2025 - Evening):

  • Fixed image generation task progress polling 403 errors
  • Root cause: IsSystemAccountOrDeveloper was still in class-level permissions
  • Solution: Moved to get_permissions() method to allow action-level overrides
  • task_progress and get_image_generation_settings are now accessible to all authenticated users
  • Save/test operations still restricted to system accounts
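The difference between class-level and action-level permissions can be sketched as follows. This is a framework-free stand-in for the Django REST Framework pattern of overriding get_permissions() and branching on self.action; it is not the project's actual code. The User fields, the simplified has_permission(user) signature, the is_allowed() helper, and the save_settings action name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical minimal user model for this sketch.
    is_authenticated: bool = True
    is_active: bool = True
    is_system_account: bool = False


class IsAuthenticatedAndActive:
    def has_permission(self, user: User) -> bool:
        return user.is_authenticated and user.is_active


class IsSystemAccountOrDeveloper:
    # Simplified placeholder: only the system-account half is modelled here.
    def has_permission(self, user: User) -> bool:
        return user.is_system_account


class IntegrationSettingsViewSet:
    # Actions named in this document as open to every authenticated user.
    OPEN_ACTIONS = {"task_progress", "get_image_generation_settings"}

    def __init__(self, action: str):
        self.action = action

    def get_permissions(self):
        # Decide per action, instead of listing the restrictive class in
        # class-level permission_classes where it applies to every action.
        if self.action in self.OPEN_ACTIONS:
            return [IsAuthenticatedAndActive()]
        return [IsAuthenticatedAndActive(), IsSystemAccountOrDeveloper()]

    def is_allowed(self, user: User) -> bool:
        # Every permission in the list must pass.
        return all(p.has_permission(user) for p in self.get_permissions())
```

With this shape, a normal user polling task_progress passes the check, while any action outside OPEN_ACTIONS still requires a system account.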

Monitor for 48 hours - Watch for any recurrence of useLocation errors or auth issues after container restarts.