# Rate Limiting Guide

**Version**: 1.0.0

**Last Updated**: 2025-11-16

Complete guide for understanding and handling rate limits in the IGNY8 API v1.0.

---

## Overview
Rate limiting protects the API from abuse and ensures fair resource usage. Different operation types have different rate limits based on their resource intensity.

---

## Rate Limit Headers
Every API response includes rate limit information in headers:
- `X-Throttle-Limit`: Maximum requests allowed in the time window
- `X-Throttle-Remaining`: Remaining requests in the current window
- `X-Throttle-Reset`: Unix timestamp when the limit resets

|
### Example Response Headers

```http
HTTP/1.1 200 OK
X-Throttle-Limit: 60
X-Throttle-Remaining: 45
X-Throttle-Reset: 1700123456
Content-Type: application/json
```

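As a quick sanity check, these headers can be turned into a "how long until the window resets" estimate. A minimal sketch (the header names are the ones documented above; the function name and the dict-style `headers` argument are illustrative, not part of the API):

```python
import time

def seconds_until_reset(headers, now=None):
    """Parse the throttle headers into (remaining, seconds until reset)."""
    now = time.time() if now is None else now
    remaining = int(headers.get('X-Throttle-Remaining', 0))
    reset = int(headers.get('X-Throttle-Reset', 0))
    # Never report a negative wait; the window may already have reset
    return remaining, max(0, reset - now)

# With the example headers above, pretending "now" is 1700123400:
remaining, wait = seconds_until_reset(
    {'X-Throttle-Remaining': '45', 'X-Throttle-Reset': '1700123456'},
    now=1700123400,
)
```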
---

## Rate Limit Scopes

Rate limits are scoped by operation type:

### AI Functions (Expensive Operations)

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `ai_function` | 10/min | Auto-cluster, content generation |
| `image_gen` | 15/min | Image generation (DALL-E, Runware) |
| `planner_ai` | 10/min | AI-powered planner operations |
| `writer_ai` | 10/min | AI-powered writer operations |

### Content Operations

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `content_write` | 30/min | Content creation, updates |
| `content_read` | 100/min | Content listing, retrieval |

### Authentication

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `auth` | 20/min | Login, register, password reset |
| `auth_strict` | 5/min | Sensitive auth operations |

### Planner Operations

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `planner` | 60/min | Keywords, clusters, ideas CRUD |

### Writer Operations

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `writer` | 60/min | Tasks, content, images CRUD |

### System Operations

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `system` | 100/min | Settings, prompts, profiles |
| `system_admin` | 30/min | Admin-only system operations |

### Billing Operations

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `billing` | 30/min | Credit queries, usage logs |
| `billing_admin` | 10/min | Credit management (admin) |

### Default

| Scope | Limit | Endpoints |
|-------|-------|-----------|
| `default` | 100/min | Endpoints without explicit scope |
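
Client code can mirror these limits in a lookup table and derive a pacing interval per scope. A sketch with values copied from the tables above (keep it in sync with this document; the names `SCOPE_LIMITS` and `min_interval` are illustrative):

```python
# Requests per minute, copied from the scope tables above
SCOPE_LIMITS = {
    'ai_function': 10, 'image_gen': 15, 'planner_ai': 10, 'writer_ai': 10,
    'content_write': 30, 'content_read': 100,
    'auth': 20, 'auth_strict': 5,
    'planner': 60, 'writer': 60,
    'system': 100, 'system_admin': 30,
    'billing': 30, 'billing_admin': 10,
    'default': 100,
}

def min_interval(scope):
    """Seconds to wait between requests to stay under the scope's limit."""
    return 60 / SCOPE_LIMITS.get(scope, SCOPE_LIMITS['default'])
```

For example, `min_interval('ai_function')` is 6.0 seconds, while `min_interval('content_read')` is 0.6 seconds.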

---

## Rate Limit Exceeded (429)

When the rate limit is exceeded, you receive:

**Status Code**: `429 Too Many Requests`

**Response**:

```json
{
  "success": false,
  "error": "Rate limit exceeded",
  "request_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

**Headers**:

```http
X-Throttle-Limit: 60
X-Throttle-Remaining: 0
X-Throttle-Reset: 1700123456
```

### Handling Rate Limits

**1. Check Headers Before Request**

```python
import time

import requests

def make_request(url, headers):
    response = requests.get(url, headers=headers)

    # Check remaining requests in the current window
    remaining = int(response.headers.get('X-Throttle-Remaining', 0))

    if remaining < 5:
        # Approaching the limit, slow down
        time.sleep(1)

    return response.json()
```

**2. Handle 429 Response**

```python
import time

import requests

def make_request_with_backoff(url, headers, max_retries=3):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 429:
            # Wait until the limit resets, then retry
            reset_time = int(response.headers.get('X-Throttle-Reset', 0))
            current_time = int(time.time())
            wait_seconds = max(1, reset_time - current_time)

            print(f"Rate limited. Waiting {wait_seconds} seconds...")
            time.sleep(wait_seconds)
            continue

        return response.json()

    raise Exception("Max retries exceeded")
```

**3. Implement Exponential Backoff**

```python
import random
import time

import requests

def make_request_with_exponential_backoff(url, headers):
    max_wait = 60  # Maximum wait time in seconds
    base_wait = 1  # Base wait time in seconds

    for attempt in range(5):
        response = requests.get(url, headers=headers)

        if response.status_code != 429:
            return response.json()

        # Exponential backoff with jitter
        wait_time = min(
            base_wait * (2 ** attempt) + random.uniform(0, 1),
            max_wait
        )

        print(f"Rate limited. Waiting {wait_time:.2f} seconds...")
        time.sleep(wait_time)

    raise Exception("Rate limit exceeded after retries")
```

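Ignoring the random jitter, exponential backoff doubles the wait on each attempt and caps it at `max_wait`. With `base_wait=1` and five attempts, the deterministic part of the schedule is:

```python
base_wait, max_wait = 1, 60

# Deterministic backoff schedule (jitter excluded)
schedule = [min(base_wait * (2 ** attempt), max_wait) for attempt in range(5)]
print(schedule)  # [1, 2, 4, 8, 16]
```

The cap only matters for longer retry loops: attempt 6 would otherwise wait 64 seconds, above the 60-second ceiling.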
---

## Best Practices

### 1. Monitor Rate Limit Headers

Always check `X-Throttle-Remaining` to avoid hitting limits:

```python
def check_rate_limit(response):
    remaining = int(response.headers.get('X-Throttle-Remaining', 0))

    if remaining < 10:
        print(f"Warning: Only {remaining} requests remaining")

    return remaining
```

### 2. Implement Request Queuing

For bulk operations, pace outgoing requests to stay within limits:

```python
import time

import requests

class RateLimitedAPI:
    def __init__(self, requests_per_minute=60):
        self.requests_per_minute = requests_per_minute
        self.min_interval = 60 / requests_per_minute
        self.last_request_time = 0

    def make_request(self, url, headers):
        # Enforce a minimum interval between requests
        elapsed = time.time() - self.last_request_time
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)

        response = requests.get(url, headers=headers)
        self.last_request_time = time.time()

        return response.json()
```

### 3. Cache Responses

Cache frequently accessed data to reduce API calls:

```python
import time

import requests

class CachedAPI:
    def __init__(self, cache_ttl=300):  # 5 minutes
        self.cache = {}
        self.cache_ttl = cache_ttl

    def get_cached(self, url, headers, cache_key):
        # Return the cached copy if it is still fresh
        if cache_key in self.cache:
            data, timestamp = self.cache[cache_key]
            if time.time() - timestamp < self.cache_ttl:
                return data

        # Fetch from the API
        response = requests.get(url, headers=headers)
        data = response.json()

        # Store in the cache
        self.cache[cache_key] = (data, time.time())

        return data
```

### 4. Batch Requests When Possible

Use bulk endpoints instead of multiple individual requests:

```python
# ❌ Don't: Multiple individual requests
for keyword_id in keyword_ids:
    response = requests.get(f"/api/v1/planner/keywords/{keyword_id}/", headers=headers)

# ✅ Do: Use a bulk endpoint if available
response = requests.post(
    "/api/v1/planner/keywords/bulk/",
    json={"ids": keyword_ids},
    headers=headers
)
```

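For very long id lists, it may be worth splitting the list into fixed-size chunks and calling the bulk endpoint once per chunk. A sketch (the chunk size of 100 is an arbitrary assumption; check the endpoint's actual payload limits):

```python
def chunked(items, size):
    """Yield successive fixed-size chunks from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# e.g. one bulk call per 100 ids:
# for batch in chunked(keyword_ids, 100):
#     requests.post("/api/v1/planner/keywords/bulk/",
#                   json={"ids": batch}, headers=headers)
```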
---

## Rate Limit Bypass

### Development/Debug Mode

Rate limiting is automatically bypassed when:

- `DEBUG=True` in Django settings
- `IGNY8_DEBUG_THROTTLE=True` environment variable is set
- The user belongs to the `aws-admin` account
- The user has the `admin` or `developer` role

**Note**: Headers are still set for debugging, but requests are not blocked.

---

## Monitoring Rate Limits

### Track Usage

```python
class RateLimitMonitor:
    def __init__(self):
        self.usage_by_scope = {}

    def track_request(self, response, scope):
        if scope not in self.usage_by_scope:
            self.usage_by_scope[scope] = {
                'total': 0,
                'limited': 0
            }

        self.usage_by_scope[scope]['total'] += 1

        if response.status_code == 429:
            self.usage_by_scope[scope]['limited'] += 1

        remaining = int(response.headers.get('X-Throttle-Remaining', 0))
        limit = int(response.headers.get('X-Throttle-Limit', 0))

        # Guard against missing headers to avoid division by zero
        if limit > 0:
            usage_percent = ((limit - remaining) / limit) * 100

            if usage_percent > 80:
                print(f"Warning: {scope} at {usage_percent:.1f}% capacity")

    def get_report(self):
        return self.usage_by_scope
```

---

## Troubleshooting

### Issue: Frequent 429 Errors

**Causes**:
- Too many requests in a short time
- Not checking rate limit headers
- No request throttling implemented

**Solutions**:
1. Implement request throttling
2. Monitor the `X-Throttle-Remaining` header
3. Add delays between requests
4. Use bulk endpoints when available

### Issue: Rate Limits Too Restrictive

**Solutions**:
1. Contact support for higher limits (if justified)
2. Optimize requests (cache, batch, reduce frequency)
3. Use a development account for testing (bypass enabled)

---

## Code Examples

### Python - Complete Rate Limit Handler

```python
import time

import requests

class RateLimitedClient:
    def __init__(self, base_url, token):
        self.base_url = base_url
        self.headers = {
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        }
        self.rate_limits = {}

    def _wait_for_rate_limit(self, scope='default'):
        """Wait if approaching the rate limit"""
        if scope in self.rate_limits:
            limit_info = self.rate_limits[scope]
            remaining = limit_info.get('remaining', 0)
            reset_time = limit_info.get('reset_time', 0)

            if remaining < 5:
                wait_time = max(0, reset_time - time.time())
                if wait_time > 0:
                    print(f"Rate limit low. Waiting {wait_time:.1f}s...")
                    time.sleep(wait_time)

    def _update_rate_limit_info(self, response, scope='default'):
        """Update rate limit information from response headers"""
        limit = response.headers.get('X-Throttle-Limit')
        remaining = response.headers.get('X-Throttle-Remaining')
        reset = response.headers.get('X-Throttle-Reset')

        if limit and remaining and reset:
            self.rate_limits[scope] = {
                'limit': int(limit),
                'remaining': int(remaining),
                'reset_time': int(reset)
            }

    def request(self, method, endpoint, scope='default', **kwargs):
        """Make a rate-limited request"""
        # Wait if approaching the limit
        self._wait_for_rate_limit(scope)

        # Make the request
        url = f"{self.base_url}{endpoint}"
        response = requests.request(method, url, headers=self.headers, **kwargs)

        # Update rate limit info
        self._update_rate_limit_info(response, scope)

        # Handle rate limit error
        if response.status_code == 429:
            reset_time = int(response.headers.get('X-Throttle-Reset', 0))
            wait_time = max(1, reset_time - time.time())
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
            # Retry once
            response = requests.request(method, url, headers=self.headers, **kwargs)
            self._update_rate_limit_info(response, scope)

        return response.json()

    def get(self, endpoint, scope='default'):
        return self.request('GET', endpoint, scope)

    def post(self, endpoint, data, scope='default'):
        return self.request('POST', endpoint, scope, json=data)

# Usage
client = RateLimitedClient("https://api.igny8.com/api/v1", "your_token")

# Make requests with automatic rate limit handling
keywords = client.get("/planner/keywords/", scope="planner")
```

---

**Last Updated**: 2025-11-16

**API Version**: 1.0.0