# UNDER OBSERVATION

## Issue: User Logged Out During Image Prompt Generation (Dec 10, 2025)

### Original Problem

The user ran the workflow auto-cluster → generate ideas → queue to writer → generate content → generate image prompt. Near the end of image prompt generation, the user was automatically logged out.

### Investigation Timeline

**Initial Analysis:**
- Suspected backend container restarts invalidating sessions
- `docker ps` showed all containers up 19+ minutes - NO RESTARTS during incident
- Backend logs showed: `[IsAuthenticatedAndActive] DENIED: User not authenticated` and `Client error: Authentication credentials were not provided`
- The token was not being sent with API requests

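The log check above can be repeated quickly during the observation window. The following is a minimal sketch, assuming the backend log lines keep the two shapes quoted above; the function name and the summary keys are illustrative and not part of the codebase.

```python
import re
from collections import Counter

# Substrings taken from the backend log lines quoted in the analysis above.
MISSING_CREDENTIALS = "Authentication credentials were not provided"
# Matches lines shaped like "[PermissionClass] DENIED: reason".
PERMISSION_DENIED = re.compile(r"\[(\w+)\] DENIED: (.+)")


def summarize_auth_failures(lines):
    """Count missing-credential errors and DENIED entries per permission class."""
    summary = Counter()
    for line in lines:
        if MISSING_CREDENTIALS in line:
            summary["missing_credentials"] += 1
        match = PERMISSION_DENIED.search(line)
        if match:
            summary[f"denied:{match.group(1)}"] += 1
    return summary


sample = [
    "[IsAuthenticatedAndActive] DENIED: User not authenticated",
    "Client error: Authentication credentials were not provided",
    "some unrelated log line",  # placeholder for normal traffic
]
summary = summarize_auth_failures(sample)
```

A non-zero `missing_credentials` count alongside stable container uptime points at the frontend not sending the token, rather than at the backend invalidating it.
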
**Root Cause Identified:**

The logout was NOT caused by backend issues or container restarts. It was caused by **frontend state corruption during HMR (Hot Module Reload)**, triggered by code changes made to fix an unrelated `useLocation()` error.

**What Actually Happened:**

1. **Commit 5fb3687854d9aadfc5d604470f3712004b23243c** - Already had the proper fix for the `useLocation()` error (Suspense outside Routes)

2. **Additional "fixes" applied on Dec 10, 2025:**
   - Set `cacheDir: "/tmp/vite-cache"` in vite.config.ts
   - Moved BrowserRouter above ErrorBoundary in main.tsx
   - Added `watch.interval: 100` and `fs.strict: false`

3. **These changes triggered:**
   - Vite cache stored in /tmp got wiped on container operations
   - Full rebuild with HMR
   - Component tree restructuring (BrowserRouter position change)
   - Auth store (Zustand persist) lost state during rapid unmount/remount cycle
   - Frontend started making API calls WITHOUT Authorization header
   - Backend correctly rejected unauthenticated requests
   - Frontend `logout()` triggered

### Fix Applied

**Reverted the problematic changes:**
- Removed `cacheDir: "/tmp/vite-cache"` - let Vite use the default node_modules/.vite
- Restored BrowserRouter position inside ErrorBoundary/ThemeProvider (original structure)
- Removed the `watch.interval` and `fs.strict` additions

**Kept the actual fixes:**
- Backend: Removed `IsSystemAccountOrDeveloper` from IntegrationSettingsViewSet class-level permissions
- Backend: Auto-cluster `extra_data` → `debug_info` parameter fix
- Frontend: Suspense wrapping Routes (from commit 5fb3687) - THIS was the real `useLocation()` fix

### What to Watch For

**1. useLocation() Error After Container Restarts**
- **Symptom:** "useLocation() may be used only in the context of a `<Router>` component"
- **Where:** Keywords page, other planner/writer module pages (50-60% of pages)
- **If it happens:**
  - Check if Vite cache is stale
  - Clear node_modules/.vite inside frontend container: `docker compose exec igny8_frontend rm -rf /app/node_modules/.vite`
  - Restart frontend container
  - DO NOT change cacheDir or component tree structure

**2. Auth State Loss During Development**
- **Symptom:** Random logouts during active sessions, "Authentication credentials were not provided"
- **Triggers:**
  - HMR with significant component tree changes
  - Rapid container restarts during development
  - Changes to context provider order in main.tsx
- **Prevention:**
  - Avoid restructuring main.tsx component tree
  - Test auth persistence after any main.tsx changes
  - Monitor browser console for localStorage errors during HMR

**3. Permission Errors for Normal Users**
- **Symptom:** "You do not have permission to perform this action" for valid users with complete account setup
- **Check:**
  - Backend logs for permission class debug output: `[IsAuthenticatedAndActive]`, `[IsViewerOrAbove]`, `[HasTenantAccess]`
  - Verify user has role='owner' and is_active=True
  - Ensure viewset doesn't have `IsSystemAccountOrDeveloper` at class level for endpoints normal users need

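For reference, the debug output named in the checklist above follows a `[ClassName] DENIED: reason` shape. The snippet below is a minimal sketch of how such a line could be emitted with the standard `logging` module. It is not the code in `permissions.py`: the logger name, the helper `check_and_log`, the dict-based user, and the "User inactive" reason are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("permissions_debug")  # hypothetical logger name


def check_and_log(permission_name, allowed, reason):
    """Log a '[ClassName] DENIED: reason' line when a check fails; return the decision."""
    if not allowed:
        logger.warning("[%s] DENIED: %s", permission_name, reason)
    return allowed


def is_authenticated_and_active(user):
    """Stand-in for a permission check; `user` is a plain dict in this sketch."""
    name = "IsAuthenticatedAndActive"
    if not check_and_log(name, user.get("is_authenticated", False), "User not authenticated"):
        return False
    # "User inactive" is an illustrative reason string, not one quoted from the real logs.
    return check_and_log(name, user.get("is_active", False), "User inactive")
```

Keeping the class name in a fixed bracketed prefix is what makes the log lines easy to grep for when a normal user reports a permission error.
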
**4. Celery Task Progress Polling 403 Errors**
- **Symptom:** Task progress endpoint returns 403 for normal users
- **Root cause:** ViewSet class-level permissions blocking action-level overrides
- **Solution:** Ensure IntegrationSettingsViewSet `permission_classes` doesn't include `IsSystemAccountOrDeveloper`

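The pattern behind this fix can be illustrated with a simplified pure-Python model. This is not Django REST Framework code and not the actual viewset: it only models a setup in which class-level checks run for every action and restrictive checks are attached to the actions that need them. The `User` fields and the action names `task_progress` and `save_settings` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    is_authenticated: bool = True
    is_active: bool = True
    is_developer: bool = False  # hypothetical flag standing in for system/developer accounts


def is_authenticated_and_active(user: User) -> bool:
    return user.is_authenticated and user.is_active


def is_system_account_or_developer(user: User) -> bool:
    return user.is_developer


# Checks applied to every action (the "class level"). Kept minimal on purpose.
CLASS_LEVEL = [is_authenticated_and_active]

# Extra checks for specific actions only.
ACTION_LEVEL = {"save_settings": [is_system_account_or_developer]}


def is_allowed(user: User, action: str) -> bool:
    """A request passes only if every class-level and action-level check passes."""
    checks = CLASS_LEVEL + ACTION_LEVEL.get(action, [])
    return all(check(user) for check in checks)


normal_user = User()
```

In this model, putting `is_system_account_or_developer` into `CLASS_LEVEL` would block `task_progress` for every normal user regardless of what the action itself allows, which mirrors the 403 symptom described above.
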
### Lessons Learned

1. **Don't layer fixes on top of fixes** - Identify the root cause first
2. **Vite cache location matters** - /tmp gets wiped, breaking HMR state persistence
3. **Component tree structure is fragile** - Moving BrowserRouter breaks auth rehydration timing
4. **Container uptime ≠ code stability** - HMR can cause issues without a restart
5. **Permission debugging** - Logging added to the permission classes was critical for diagnosis
6. **The original fix was already correct** - Commit 5fb3687 had it right; the additional "improvements" broke it

### Files Modified (Reverted)
- `frontend/vite.config.ts` - Removed cacheDir and watch config changes
- `frontend/src/main.tsx` - Restored original component tree structure

### Files Modified (Kept)
- `backend/igny8_core/modules/system/integration_views.py` - Removed IsSystemAccountOrDeveloper
- `backend/igny8_core/modules/planner/views.py` - Fixed extra_data → debug_info
- `backend/igny8_core/api/permissions.py` - Added debug logging (can be removed later)

### Status

**RESOLVED** - Auth state stable, backend permissions correct, useLocation fix preserved.

**Monitor for 48 hours** - Watch for any recurrence of useLocation errors or auth issues after container restarts.