# UNDER OBSERVATION

## Issue: User Logged Out During Image Prompt Generation (Dec 10, 2025)

### Original Problem

User performed workflow: auto-cluster → generate ideas → queue to writer → generate content → generate image prompt. During image prompt generation (near completion), user was automatically logged out.

### Investigation Timeline

**Initial Analysis:**

- Suspected backend container restarts invalidating sessions
- `docker ps` showed all containers up 19+ minutes - NO RESTARTS during incident
- Backend logs showed: `[IsAuthenticatedAndActive] DENIED: User not authenticated` and `Client error: Authentication credentials were not provided`
- Token was not being sent with API requests

**Root Cause Identified:**

The logout was NOT caused by backend issues or container restarts. It was caused by **frontend state corruption during HMR (Hot Module Reload)** triggered by code changes made to fix an unrelated useLocation() error.

**What Actually Happened:**

1. **Commit 5fb3687854d9aadfc5d604470f3712004b23243c** - Already had proper fix for useLocation() error (Suspense outside Routes)
2. **Additional "fixes" applied on Dec 10, 2025:**
   - Changed `cacheDir: "/tmp/vite-cache"` in vite.config.ts
   - Moved BrowserRouter above ErrorBoundary in main.tsx
   - Added `watch.interval: 100` and `fs.strict: false`
3. **These changes triggered:**
   - Vite cache stored in /tmp got wiped on container operations
   - Full rebuild with HMR
   - Component tree restructuring (BrowserRouter position change)
   - Auth store (Zustand persist) lost state during rapid unmount/remount cycle
   - Frontend started making API calls WITHOUT Authorization header
   - Backend correctly rejected unauthenticated requests
   - Frontend logout() triggered

### Fix Applied

**Reverted the problematic changes:**

- Removed `cacheDir: "/tmp/vite-cache"` - let Vite use default node_modules/.vite
- Restored BrowserRouter position inside ErrorBoundary/ThemeProvider (original structure)
- Removed `watch.interval` and `fs.strict` additions

**Kept the actual fixes:**

- Backend: Removed `IsSystemAccountOrDeveloper` from IntegrationSettingsViewSet class-level permissions
- Backend: Auto-cluster `extra_data` → `debug_info` parameter fix
- Frontend: Suspense wrapping Routes (from commit 5fb3687) - THIS was the real useLocation() fix

### What to Watch For

**1. useLocation() Error After Container Restarts**

- **Symptom:** `useLocation() may be used only in the context of a <Router> component`
- **Where:** Keywords page, other planner/writer module pages (50-60% of pages)
- **If it happens:**
  - Check if Vite cache is stale
  - Clear node_modules/.vite inside frontend container: `docker compose exec igny8_frontend rm -rf /app/node_modules/.vite`
  - Restart frontend container
  - DO NOT change cacheDir or component tree structure

**2. Auth State Loss During Development**

- **Symptom:** Random logouts during active sessions, "Authentication credentials were not provided"
- **Triggers:**
  - HMR with significant component tree changes
  - Rapid container restarts during development
  - Changes to context provider order in main.tsx
- **Prevention:**
  - Avoid restructuring main.tsx component tree
  - Test auth persistence after any main.tsx changes
  - Monitor browser console for localStorage errors during HMR

**3. Permission Errors for Normal Users**

- **Symptom:** "You do not have permission to perform this action" for valid users with complete account setup
- **Check:**
  - Backend logs for permission class debug output: `[IsAuthenticatedAndActive]`, `[IsViewerOrAbove]`, `[HasTenantAccess]`
  - Verify user has role='owner' and is_active=True
  - Ensure viewset doesn't have `IsSystemAccountOrDeveloper` at class level for endpoints normal users need

**4. Celery Task Progress Polling 403 Errors**

- **Symptom:** Task progress endpoint returns 403 for normal users
- **Root cause:** ViewSet class-level permissions blocking action-level overrides
- **Solution:** Ensure IntegrationSettingsViewSet permission_classes doesn't include IsSystemAccountOrDeveloper

### Lessons Learned

1. **Don't layer fixes on top of fixes** - Identify root cause first
2. **Vite cache location matters** - /tmp gets wiped, breaking HMR state persistence
3. **Component tree structure is fragile** - Moving BrowserRouter breaks auth rehydration timing
4. **Container uptime ≠ code stability** - HMR can cause issues without restart
5. **Permission debugging** - Added logging to permission classes was critical for diagnosis
6. **The original fix was already correct** - Commit 5fb3687 had it right, additional "improvements" broke it

### Files Modified (Reverted)

- `frontend/vite.config.ts` - Removed cacheDir and watch config changes
- `frontend/src/main.tsx` - Restored original component tree structure

### Files Modified (Kept)

- `backend/igny8_core/modules/system/integration_views.py` - Removed IsSystemAccountOrDeveloper
- `backend/igny8_core/modules/planner/views.py` - Fixed extra_data → debug_info
- `backend/igny8_core/api/permissions.py` - Added debug logging (can be removed later)

### Status

**RESOLVED** - Auth state stable, backend permissions correct, useLocation fix preserved.

**ADDITIONAL FIX (Dec 10, 2025 - Evening):**

1. **Permission Fix**: Fixed image generation task progress polling 403 errors
   - Root cause: `IsSystemAccountOrDeveloper` was still in class-level permissions
   - Solution: Moved to `get_permissions()` method to allow action-level overrides
   - `task_progress` and `get_image_generation_settings` now accessible to all authenticated users
   - Save/test operations still restricted to system accounts
2. **System Account Fallback**: Fixed "Image generation settings not found" for normal users
   - Root cause: IntegrationSettings are account-specific - normal users don't have their own settings
   - Only super user account (aws-admin) has configured API keys
   - Solution: Added fallback to system account (aws-admin) settings in `process_image_generation_queue` task
   - When user's account doesn't have IntegrationSettings, falls back to system account
   - Allows normal users to use centralized API keys managed by super users
   - Files modified: `backend/igny8_core/ai/tasks.py`

**Monitor for 48 hours** - Watch for any recurrence of useLocation errors or auth issues after container restarts. Test image generation with normal user accounts (paid-2).
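
**Sketch: action-level permissions (illustrative only)**

The permission fix described above moves `IsSystemAccountOrDeveloper` out of the class-level list and decides per action instead. The following is a framework-free sketch of that idea, not the actual viewset code. In the real backend this logic would live in the Django REST Framework viewset's `get_permissions()`, where `has_permission` takes `(request, view)`. Here the permission classes are simplified stand-ins, the `User` fields are assumptions, and `save_settings` is a hypothetical name for one of the restricted save/test actions.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Stand-in for the request user; field names are assumptions."""
    is_authenticated: bool = True
    is_active: bool = True
    is_system_account: bool = False


class IsAuthenticatedAndActive:
    """Simplified stand-in: allows any logged-in, active user."""
    def has_permission(self, user: User) -> bool:
        return user.is_authenticated and user.is_active


class IsSystemAccountOrDeveloper:
    """Simplified stand-in: allows only system accounts."""
    def has_permission(self, user: User) -> bool:
        return user.is_system_account


class IntegrationSettingsViewSetSketch:
    # Actions open to every authenticated user (names taken from the notes).
    OPEN_ACTIONS = {"task_progress", "get_image_generation_settings"}

    def __init__(self, action: str):
        self.action = action

    def get_permissions(self):
        # Every action requires an authenticated, active user.
        classes = [IsAuthenticatedAndActive]
        # Only actions outside the open set also require a system account.
        if self.action not in self.OPEN_ACTIONS:
            classes.append(IsSystemAccountOrDeveloper)
        return [cls() for cls in classes]

    def is_allowed(self, user: User) -> bool:
        return all(p.has_permission(user) for p in self.get_permissions())
```

With this shape, a normal user can poll `task_progress` while a restricted action is still rejected, which is the behavior the 403 fix is meant to produce.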
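
**Sketch: system account fallback (illustrative only)**

The system account fallback described above can be pictured as a two-step lookup: use the account's own settings if they exist, otherwise use the system account's. This is a minimal sketch under assumptions, not the code in `backend/igny8_core/ai/tasks.py`. A plain dictionary stands in for the IntegrationSettings table, and the function name `resolve_integration_settings` and the `api_key` field are hypothetical.

```python
from typing import Optional

# System account named in the notes above.
SYSTEM_ACCOUNT = "aws-admin"


def resolve_integration_settings(
    account: str, settings_by_account: dict
) -> Optional[dict]:
    """Return the account's own settings, else the system account's, else None."""
    own = settings_by_account.get(account)
    if own is not None:
        return own
    # Fallback: centralized settings managed by the system account.
    return settings_by_account.get(SYSTEM_ACCOUNT)


if __name__ == "__main__":
    store = {"aws-admin": {"api_key": "system-key"}}
    # A normal account without its own settings receives the system settings.
    print(resolve_integration_settings("paid-2", store))
```

Returning `None` when neither lookup succeeds leaves the caller free to raise the existing "Image generation settings not found" error.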