Skip to content

API Documentation

DocuElevate provides a powerful REST API for programmatic access to all its features. This document serves as a reference for the available endpoints and their usage.

Looking for a quick way to script against DocuElevate? The built-in CLI tool wraps the API and is ready to use from a terminal or shell script — no HTTP client code required.

API Overview

  • Base URL: http://<your-docuelevate-instance>/api
  • Authentication: OAuth2 (when enabled)
  • Response Format: JSON
  • Rate Limiting: Enabled by default (see Rate Limiting section below)

Interactive API Documentation

The most up-to-date and interactive API documentation is available at:

http://<your-docuelevate-instance>/docs

This Swagger UI provides a complete reference with the ability to try out API calls directly from your browser.

Rate Limiting

DocuElevate implements rate limiting to protect against abuse and DoS attacks. Rate limits are enforced per IP address for unauthenticated requests and per user for authenticated requests.

Default Limits

  • Default endpoints: 100 requests per minute
  • File upload: 600 requests per minute (global) + 20 per user per 60 s (per-user, health-aware)
  • Authentication: 10 requests per minute

Per-user upload rate limiting: Upload endpoints (/api/ui-upload, /api/process-url) enforce a per-user sliding-window limit that adapts to system load. Under heavy queue depth or high CPU usage, the effective limit is reduced automatically. See the Configuration Guide for details.

Note: Document processing endpoints (OCR, metadata extraction) use built-in queue throttling to control processing rates and prevent upstream API overloads. No additional API-level rate limit is applied to processing endpoints.

Rate Limit Headers

When a rate limit is exceeded, the API returns a 429 Too Many Requests response:

{
  "detail": "Rate limit exceeded: 100 per 1 minute"
}

The response includes a Retry-After header indicating when the client can retry the request.

Configuration

Rate limits can be configured via environment variables:

RATE_LIMITING_ENABLED=true
RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_UPLOAD=600/minute
RATE_LIMIT_AUTH=10/minute

# Per-user upload rate limiting (health-aware)
UPLOAD_RATE_LIMIT_PER_USER=20   # Max uploads per user per window
UPLOAD_RATE_LIMIT_WINDOW=60     # Sliding window in seconds

See Configuration Guide for more details.

Best Practices

  1. Respect rate limits: Monitor your request rates and implement backoff strategies
  2. Cache responses: Reduce unnecessary API calls by caching responses when appropriate
  3. Batch operations: Use bulk endpoints when available instead of making multiple individual requests
  4. Handle 429 responses: Implement retry logic with exponential backoff when rate limits are exceeded

Example: Handling Rate Limits

import requests
import time

def make_api_request(url, max_retries=3):
    """Make API request with rate limit handling."""
    for attempt in range(max_retries):
        response = requests.get(url)

        if response.status_code == 429:
            # Rate limit exceeded
            retry_after = int(response.headers.get('Retry-After', 60))
            print(f"Rate limit exceeded. Retrying after {retry_after} seconds...")
            time.sleep(retry_after)
            continue

        return response

    raise Exception("Max retries exceeded")

Authentication

When authentication is enabled, you must include an authentication token in your requests.

DocuElevate supports personal API tokens for programmatic access. Tokens are the recommended way to authenticate scripts, CI/CD pipelines, and webhook integrations.

Creating a token:

  1. Log in to DocuElevate and navigate to API Tokens (available in your user menu or at /api-tokens).
  2. Enter a descriptive name (e.g. "CI Pipeline", "Scanner Integration") and click Create Token.
  3. Copy the token immediately — it is shown only once.

Using a token:

curl -X GET "http://<your-docuelevate-instance>/api/files" \
  -H "Authorization: Bearer <your-api-token>"

Managing tokens programmatically:

Method Endpoint Description
POST /api/api-tokens/ Create a new token
GET /api/api-tokens/ List all your tokens
DELETE /api/api-tokens/{id} Revoke (active) or permanently delete (revoked) a token
POST /api/api-tokens/{id}/reactivate Reactivate a revoked token

Session Authentication

Browser-based users authenticate via OAuth or local login. Session cookies are set automatically and used for subsequent requests:

curl -X GET "http://<your-docuelevate-instance>/api/files" \
  -H "Authorization: Bearer <your-token>"

Common Endpoints

Document Upload

Upload from Computer

Upload a file from your computer to DocuElevate for processing.

Endpoint: POST /api/upload

Request:

curl -X POST "http://<your-docuelevate-instance>/api/upload" \
  -H "Authorization: Bearer <your-token>" \
  -F "file=@/path/to/document.pdf"

Response (201 Created):

{
  "task_id": "abc-123-def",
  "status": "queued",
  "message": "File uploaded and queued for processing",
  "filename": "document.pdf"
}

Upload from URL

Download and process a file from a URL. This endpoint is used by the browser extension.

Endpoint: POST /api/process-url

Security Features: - SSRF protection (blocks private IPs, localhost, cloud metadata endpoints) - File type validation (only supported document/image types) - File size limits (enforces maximum upload size) - Timeout protection (prevents hanging on slow/malicious servers)

Request:

curl -X POST "http://<your-docuelevate-instance>/api/process-url" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/document.pdf",
    "filename": "custom-name.pdf"
  }'

Request Body:

{
  "url": "https://example.com/document.pdf",
  "filename": "optional-custom-name.pdf"
}

Response (200 OK):

{
  "task_id": "abc-123-def",
  "status": "queued",
  "message": "File downloaded from URL and queued for processing",
  "filename": "document.pdf",
  "size": 1048576
}

Error Responses:

// 400 Bad Request - Invalid URL or unsupported file type
{
  "detail": "Unsupported file type: text/html. Supported types: PDF, Office documents, images, plain text"
}

// 400 Bad Request - Private IP (SSRF protection)
{
  "detail": "Access to private/internal IP addresses is not allowed for security reasons"
}

// 408 Request Timeout
{
  "detail": "Request timeout: server took too long to respond"
}

// 413 Payload Too Large
{
  "detail": "File too large: 2097152 bytes (max 1048576 bytes)"
}

// 502 Bad Gateway
{
  "detail": "Failed to connect to URL: Connection refused"
}

Usage with Browser Extension:

The DocuElevate browser extension uses this endpoint to send files directly from your browser. See the Browser Extension Guide for installation and usage instructions.

Supported File Types: - Documents: PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, TXT, CSV, RTF - Images: JPG, PNG, GIF, BMP, TIFF, WebP, SVG

POST /api/ui-upload

Upload a file from your computer for processing.

Request: - Multipart form data with a single file field

Response (new file):

{
  "task_id": "abc-123",
  "status": "queued",
  "original_filename": "invoice.pdf",
  "stored_filename": "a1b2c3d4.pdf"
}

Response (exact duplicate, when ENABLE_DEDUPLICATION=True):

{
  "status": "duplicate",
  "original_filename": "invoice.pdf",
  "stored_filename": "e5f6a7b8.pdf",
  "duplicate_of": {
    "duplicate_type": "exact",
    "original_file_id": 42,
    "original_filename": "invoice.pdf",
    "message": "This file is an exact duplicate of an already-processed document. It has not been queued for processing again."
  }
}

Upload from URL

POST /api/process-url

Download a file from a URL and enqueue it for processing.

Security Features: - SSRF protection: Blocks private IPs, localhost, and cloud metadata endpoints - File type validation: Only allows supported document/image types - File size limits: Enforces maximum upload size - Timeout protection: Prevents hanging on slow/malicious servers

Request Body:

{
  "url": "https://example.com/document.pdf",
  "filename": "my-document.pdf"  // optional
}

Response:

{
  "task_id": "abc123",
  "status": "queued",
  "message": "File downloaded from URL and queued for processing",
  "filename": "document.pdf",
  "size": 1024000
}

Error Responses: - 400: Invalid URL, unsupported file type, or SSRF protection triggered - 408: Request timeout (server too slow) - 413: File too large - 502: Connection error - 404: File not found at URL - 500: Server error

Supported File Types: - PDF documents - Microsoft Office (Word, Excel, PowerPoint) - Images (JPEG, PNG, GIF, BMP, TIFF, WebP, SVG) - Plain text and CSV files

Example:

curl -X POST "http://localhost:8000/api/process-url" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/invoice.pdf",
    "filename": "march-invoice.pdf"
  }'

SSRF Protection: The endpoint blocks access to: - Private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) - Localhost (127.0.0.1, ::1) - Link-local addresses (169.254.0.0/16) - Cloud metadata endpoints (169.254.169.254, metadata.google.internal)

Get Files

GET /api/files

Retrieve a paginated list of processed files with advanced filtering and sorting.

Query Parameters: - page (optional, default: 1): Page number - per_page (optional, default: 25, max: 200): Items per page - sort_by (optional, default: created_at): Sort field (id, original_filename, file_size, mime_type, created_at) - sort_order (optional, default: desc): Sort order (asc or desc) - search (optional): Search in filename (partial match) - mime_type (optional): Filter by exact MIME type (e.g. application/pdf) - status (optional): Filter by processing status (pending, processing, completed, failed, duplicate) - date_from (optional): Filter files created on or after this date (ISO 8601, e.g. 2026-01-01) - date_to (optional): Filter files created on or before this date (ISO 8601, e.g. 2026-12-31) - storage_provider (optional): Filter by storage provider (e.g. dropbox, s3, google_drive, onedrive, nextcloud) - tags (optional): Filter by tags in AI metadata (comma-separated, AND logic, e.g. invoice,amazon) - ocr_quality (optional): Filter by AI-assessed OCR quality score (poor = score below threshold, good = score at or above threshold, unchecked = not yet assessed). The threshold is configured via TEXT_QUALITY_THRESHOLD (default: 85).

All filters are combinable using AND logic.

Example:

GET /api/files?status=completed&mime_type=application/pdf&tags=invoice&date_from=2026-01-01&sort_by=created_at&sort_order=desc

Response:

{
  "files": [
    {
      "id": 123,
      "original_filename": "invoice.pdf",
      "file_size": 1024000,
      "mime_type": "application/pdf",
      "created_at": "2026-04-15T12:30:45Z",
      "processing_status": {
        "status": "completed",
        "last_step": "send_to_all_destinations",
        "has_errors": false,
        "total_steps": 8
      }
    }
  ],
  "pagination": {
    "page": 1,
    "per_page": 25,
    "total": 150,
    "pages": 6,
    "next": "http://host/api/files?page=2",
    "previous": null
  }
}

Tip: Filter state is reflected in query parameters, making URLs shareable as bookmarks or direct links.

GET /api/search

Search documents by full text across OCR content, titles, filenames, tags, sender, and document type. Powered by Meilisearch.

Query Parameters:

Parameter Type Required Description
q string Yes Full-text search query (1–512 chars)
mime_type string No Filter by MIME type (e.g. application/pdf)
document_type string No Filter by document type (e.g. Invoice)
language string No Filter by language code (e.g. de, en)
tags string No Filter by tag (exact match on a single tag)
sender string No Filter by sender/absender (exact match)
text_quality string No Filter by OCR text quality: no_text, low, medium, high
date_from int No Filter results created after this Unix timestamp
date_to int No Filter results created before this Unix timestamp
page int No Page number, default: 1
per_page int No Results per page (1–100), default: 20

Example:

GET /api/search?q=invoice&document_type=Invoice&tags=amazon&text_quality=high&page=1

Response:

{
  "results": [
    {
      "file_id": 42,
      "original_filename": "2026-01-15_Invoice_Amazon.pdf",
      "document_title": "Amazon Invoice January 2026",
      "document_type": "Invoice",
      "tags": ["amazon", "invoice"],
      "_formatted": {
        "document_title": "Amazon <mark>Invoice</mark> January 2026",
        "ocr_text": "...total amount of the <mark>invoice</mark> is..."
      }
    }
  ],
  "total": 42,
  "page": 1,
  "pages": 3,
  "query": "invoice"
}

Saved Searches

Saved searches allow users to save and reuse filter combinations. Each user can store up to 50 saved searches.

Saved searches are used on both the Files page (for file management filters) and the Search page (for content-finding filters including full-text queries).

List Saved Searches

GET /api/saved-searches

Returns all saved searches for the current user.

Response:

[
  {
    "id": 1,
    "name": "Recent Invoices",
    "filters": {
      "q": "invoice total",
      "tags": "invoice",
      "document_type": "Invoice",
      "date_from": "2026-01-01"
    },
    "created_at": "2026-03-01T10:00:00Z",
    "updated_at": "2026-03-01T10:00:00Z"
  }
]

POST /api/saved-searches

Request Body:

{
  "name": "Recent Invoices",
  "filters": {
    "q": "invoice total",
    "tags": "invoice",
    "document_type": "Invoice",
    "date_from": "2026-01-01"
  }
}

Allowed filter keys:

Files-view keys: search, mime_type, status, storage_provider, sort_by, sort_order

Search-view keys: q, document_type, language, sender, text_quality

Shared keys: tags, date_from, date_to

Response (201 Created): The created saved search object.

PUT /api/saved-searches/{id}

Request Body (all fields optional):

{
  "name": "Updated Name",
  "filters": {
    "tags": "invoice,amazon"
  }
}

Response: The updated saved search object.

DELETE /api/saved-searches/{id}

Response: 204 No Content

File Metadata

GET /api/files/{file_id}/metadata

Retrieve metadata for a specific file.

Response:

{
  "document_type": "invoice",
  "date": "2023-04-10",
  "vendor": "Acme Corp",
  "amount": "$1,234.56",
  "extracted_text": "..."
}

Process Control

POST /api/files/{file_id}/reprocess

Reprocess a specific file. This queues the file for complete reprocessing through the entire pipeline.

Response:

{
  "status": "success",
  "message": "File queued for reprocessing",
  "file_id": 123,
  "filename": "invoice.pdf",
  "task_id": "a1b2c3d4-e5f6-7g8h-9i0j-k1l2m3n4o5p6"
}

Error Responses: - 404: File not found - 400: Local file not found on disk (cannot reprocess)

POST /api/files/{file_id}/reprocess-with-cloud-ocr

Reprocess a specific file with forced Cloud OCR, regardless of embedded text quality. This is useful for documents with low-quality embedded text or when higher quality OCR is needed.

Response:

{
  "status": "success",
  "message": "File queued for Cloud OCR reprocessing",
  "file_id": 123,
  "filename": "invoice.pdf",
  "task_id": "a1b2c3d4-e5f6-7g8h-9i0j-k1l2m3n4o5p6",
  "force_cloud_ocr": true
}

Error Responses: - 404: File not found - 400: Neither original nor local file found on disk (cannot reprocess)

Note: This endpoint forces Azure Document Intelligence OCR processing even if the PDF contains embedded text. The original file (if available) is used for reprocessing to ensure the highest quality result.

Bulk Operations

POST /api/files/bulk-delete

Delete multiple file records in a single request.

Request body: JSON array of file IDs

curl -X POST "http://<your-instance>/api/files/bulk-delete" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3]'

Response:

{
  "status": "success",
  "message": "Successfully deleted 3 file records",
  "deleted_ids": [1, 2, 3]
}

Error Responses: - 403: File deletion is disabled in configuration - 404: No files found with the provided IDs


POST /api/files/bulk-reprocess

Queue multiple files for full reprocessing.

Request body: JSON array of file IDs

Response:

{
  "status": "success",
  "message": "Successfully queued 2 files for reprocessing",
  "processed_files": [
    {"file_id": 1, "filename": "a.pdf", "task_id": "abc123"},
    {"file_id": 2, "filename": "b.pdf", "task_id": "def456"}
  ],
  "errors": [],
  "task_ids": ["abc123", "def456"]
}

POST /api/files/bulk-reprocess-cloud-ocr

Queue multiple files for reprocessing with forced Cloud OCR (Azure Document Intelligence). Useful for files that have missing or low-quality OCR text.

Request body: JSON array of file IDs

Response:

{
  "status": "success",
  "message": "Successfully queued 2 files for Cloud OCR reprocessing",
  "processed_files": [
    {"file_id": 1, "filename": "a.pdf", "task_id": "abc123"}
  ],
  "errors": [],
  "task_ids": ["abc123"]
}

POST /api/files/bulk-download

Download multiple files as a single ZIP archive. For each file, the processed version is preferred; falls back to the original. Files not found on disk are silently skipped.

Request body: JSON array of file IDs

Response: application/zip stream with Content-Disposition: attachment; filename="docuelevate_bulk_<timestamp>.zip"

curl -X POST "http://<your-instance>/api/files/bulk-download" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3]' \
  --output bulk_download.zip

Error Responses: - 404: No files found with the provided IDs, or none of the selected files exist on disk

Document Ownership (Multi-User Mode)

These endpoints are available when MULTI_USER_ENABLED=true.


POST /api/files/{file_id}/claim

Claim an unclaimed document (owner_id is NULL) for the current user.

curl -X POST "http://<your-instance>/api/files/42/claim"

Response:

{
  "status": "success",
  "message": "Document claimed successfully",
  "file_id": 42,
  "owner_id": "alice@example.com"
}

Error Responses: - 400: Multi-user mode is not enabled - 401: Authentication required - 403: Document is already owned by another user


POST /api/files/bulk-claim

Claim multiple unclaimed documents at once. Already-owned documents are skipped.

Request body: JSON array of file IDs

curl -X POST "http://<your-instance>/api/files/bulk-claim" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3]'

Response:

{
  "status": "success",
  "claimed_count": 2,
  "claimed_ids": [1, 3],
  "skipped": [{"file_id": 2, "reason": "already owned"}],
  "owner_id": "alice@example.com"
}

POST /api/files/assign-owner

Admin only. Assign an owner to documents. If file_ids body is omitted, assigns all currently unclaimed documents to the specified owner.

Query Parameters: - owner_id (required): The user identifier to assign

Request body (optional): JSON array of specific file IDs

# Assign all unclaimed documents to a user
curl -X POST "http://<your-instance>/api/files/assign-owner?owner_id=alice@example.com"

# Assign specific files
curl -X POST "http://<your-instance>/api/files/assign-owner?owner_id=alice@example.com" \
  -H "Content-Type: application/json" \
  -d '[1, 2, 3]'

Response:

{
  "status": "success",
  "message": "Assigned owner to 5 document(s)",
  "updated_count": 5,
  "owner_id": "alice@example.com"
}

Error Responses: - 400: Multi-user mode is not enabled - 403: Only admins can assign document owners


GET /api/users/search

Search known user identifiers from existing documents. Powers the autocomplete widget in the settings page for the DEFAULT_OWNER_ID field.

Query Parameters: - q (optional): Substring to match against known owner IDs (case-insensitive) - limit (optional): Maximum results to return (default: 5, max: 20)

curl "http://<your-instance>/api/users/search?q=risti&limit=5"

Response:

{
  "users": ["christianlouis"]
}

Admin User Management

Admin only. These endpoints let administrators list all known users, view per-user statistics, and manage per-user settings such as custom upload limits, display names, and blocked status.


GET /api/admin/users/

List all known users — anyone who has uploaded a document or has an explicit profile. Returns aggregate document statistics merged with profile data.

Query Parameters: - q (optional): Substring filter on user ID (case-insensitive) - page (optional): Page number (default: 1) - per_page (optional): Items per page (default: 25, max: 100)

curl "http://<your-instance>/api/admin/users/" \
  -H "Cookie: session=<admin-session>"

Response:

{
  "users": [
    {
      "user_id": "alice@example.com",
      "display_name": "Alice Smith",
      "daily_upload_limit": 50,
      "notes": null,
      "is_blocked": false,
      "profile_id": 1,
      "document_count": 42,
      "last_upload": "2026-02-15T10:23:00"
    }
  ],
  "total": 1,
  "page": 1,
  "per_page": 25,
  "pages": 1
}

GET /api/admin/users/{user_id}

Return profile and document statistics for a specific user.

curl "http://<your-instance>/api/admin/users/alice%40example.com"

PUT /api/admin/users/{user_id}

Create or update the admin-managed profile for a user. If no profile exists one is created.

Request body:

{
  "display_name": "Alice Smith",
  "daily_upload_limit": 50,
  "notes": "VIP customer",
  "is_blocked": false
}
  • display_name (optional): Human-readable name shown in the admin UI
  • daily_upload_limit (optional): Per-user daily cap; null = use global default; 0 = unlimited
  • notes (optional): Admin-only text notes
  • is_blocked: When true, blocks new uploads from this user

DELETE /api/admin/users/{user_id}

Delete the admin-managed profile for a user. Documents owned by the user are not removed. Returns 204 No Content on success, 404 if no profile exists.


POST /api/admin/users/{user_id}/payment-issue

Report a payment issue for a user. Sends an admin notification (via configured Apprise channels) and fires a user.payment_issue webhook event. Use this endpoint when a payment processor (e.g. Stripe, PayPal) sends a failed-charge notification or when a manual billing review identifies a problem.

Request body:

{
  "issue": "Card declined: insufficient funds"
}
  • issue (required): Human-readable description of the payment problem (1–2048 characters)

Response (200):

{
  "acknowledged": true,
  "user_id": "alice@example.com",
  "profile": { ... }
}

Error Responses: - 404: User profile not found - 403: Admin access required - 422: Validation error (e.g. empty issue string)


GET /api/admin/users/local

List all local (email/password) user accounts with basic metadata.


POST /api/admin/users/local

Create a new local user account (admin-only, immediately active — no email verification required).

Request body:

{
  "email": "user@example.com",
  "username": "alice",
  "display_name": "Alice Smith",
  "password": "securepassword",
  "is_admin": false
}

PATCH /api/admin/users/local/{local_user_id}

Update an existing local user account. Only the provided (non-null) fields are modified. If the email is changed, the associated UserProfile.user_id is also updated automatically.

Request body (all fields optional):

{
  "email": "newemail@example.com",
  "display_name": "Alice Wonderland",
  "is_admin": true,
  "is_active": false
}

Error Responses: - 404: Local user not found - 409: New email already taken by another account


POST /api/admin/users/local/{local_user_id}/send-password-reset

Send a password reset email to a local user on their behalf. Useful when a user is locked out. Returns {"sent": true} on success or {"sent": false, "reason": "..."} when SMTP is not configured or sending fails (never returns an error status so the admin always gets feedback).

Error Responses: - 404: Local user not found


POST /api/admin/users/local/{local_user_id}/set-password

Directly set a new password for a local user without requiring an email token (last resort when email delivery is unavailable). The user should be advised to change their password after logging in.

Request body:

{
  "password": "temporarypassword"
}

Error Responses: - 404: Local user not found - 422: Password shorter than 8 characters


DELETE /api/admin/users/local/{local_user_id}

Delete a local user account by numeric ID. The associated UserProfile is also removed. Documents owned by this user are not deleted. Returns 204 No Content on success.


Local Authentication (self-service)

These endpoints are for local (email/password) users and do not require authentication.

POST /api/auth/request-password-reset

Send a password reset email. Always returns 200 to avoid leaking whether an email is registered.

Request body:

{ "email": "user@example.com" }

POST /api/auth/reset-password

Set a new password using a valid reset token (received via email).

Request body:

{
  "token": "the-token-from-email",
  "new_password": "newpassword",
  "new_password_confirm": "newpassword"
}

Error Responses: - 400: Token is invalid or expired - 422: Passwords do not match


POST /api/auth/forgot-username

Send a username reminder email. Always returns 200 to avoid leaking whether an email is registered.

Request body:

{ "email": "user@example.com" }

Settings Suggestions (Autocomplete)

GET /api/settings/{key}/suggestions

Return dynamic autocomplete suggestions for a setting. Providers attempt to resolve values from cloud SDKs or installed tools and fall back to curated static lists when unavailable.

Supported keys: aws_region, azure_region, tesseract_language, easyocr_languages, embedding_model

Query Parameters: - q (optional): Substring to filter suggestions (case-insensitive) - limit (optional): Maximum results to return (default: 10, max: 50)

curl "http://<your-instance>/api/settings/aws_region/suggestions?q=east&limit=5"

Response:

{
  "key": "aws_region",
  "suggestions": ["ap-east-1", "ap-northeast-1", "ap-southeast-1", "us-east-1", "us-east-2"]
}

Error Responses: - 404: No suggestion provider registered for the given key

File Preview

GET /api/files/{file_id}/preview

Retrieve the file content for preview purposes.

Parameters: - version (required): Either original or processed - original: Returns the immutable original file from the original directory - processed: Returns the file after metadata embedding from the processed directory

Response: Returns the file content with appropriate MIME type for browser display.

Example:

# Preview original file
curl "http://<your-instance>/api/files/123/preview?version=original"

# Preview processed file
curl "http://<your-instance>/api/files/123/preview?version=processed"

Error Responses: - 404: File not found in database or on disk - 400: Invalid version parameter

File Download

GET /api/files/{file_id}/download

Download a file as an attachment. The Content-Disposition header is set to attachment with the original filename so the browser prompts a save dialog.

Parameters: - version (optional, default: processed): Either processed or original - processed (default): Downloads the post-processing file (with embedded metadata) - original: Downloads the raw file as originally uploaded

Response: File content with Content-Disposition: attachment; filename="<original_filename>".

Example:

# Download processed file (default)
curl -OJ "http://<your-instance>/api/files/123/download"

# Download original upload
curl -OJ "http://<your-instance>/api/files/123/download?version=original"

Error Responses: - 404: File not found in database or on disk - 400: Invalid version parameter (must be processed or original)

Similar Documents

GET /api/files/{file_id}/similar

Find documents similar to the specified file using pre-computed text embeddings and cosine similarity. Similarity scores range from 0 (completely different) to 1 (identical content). Embeddings are computed automatically during document ingestion and cached in the database.

Parameters: - limit (optional, default: 5, max: 20): Maximum number of similar documents to return - threshold (optional, default: 0.3, range: 0.0–1.0): Minimum similarity score to include

Response:

{
  "file_id": 42,
  "similar_documents": [
    {
      "file_id": 15,
      "original_filename": "Invoice_2026-01.pdf",
      "document_title": "January Invoice",
      "similarity_score": 0.8934,
      "mime_type": "application/pdf",
      "created_at": "2026-01-15T10:30:00+00:00"
    }
  ],
  "count": 1
}

Example:

# Find top 5 similar documents
curl "http://<your-instance>/api/files/42/similar"

# Find top 10 documents with at least 50% similarity
curl "http://<your-instance>/api/files/42/similar?limit=10&threshold=0.5"

Error Responses: - 404: File not found - 422: Invalid query parameters (limit or threshold out of range) - 500: Internal error

Note: Only pre-computed embeddings are used — no API calls are made during the query. If a file's embedding has not been computed yet, the response includes a message field explaining this. Documents without OCR text are excluded from similarity comparisons.

Similarity Pairs (Corpus-Wide)

GET /api/similarity/pairs

Scan the entire document corpus for pairs of highly similar documents, ranked by score. Unlike the per-file /files/{id}/similar endpoint, this discovers all matching pairs across all files.

Parameters: - threshold (optional, default: 0.7, range: 0.0–1.0): Minimum similarity score for a pair - limit (optional, default: 50, max: 200): Maximum pairs per page - page (optional, default: 1): Page number

Response:

{
  "pairs": [
    {
      "file_a": {
        "file_id": 1,
        "original_filename": "invoice_jan.pdf",
        "document_title": "January Invoice",
        "mime_type": "application/pdf",
        "created_at": "2026-01-15T10:30:00+00:00"
      },
      "file_b": {
        "file_id": 5,
        "original_filename": "invoice_feb.pdf",
        "document_title": "February Invoice",
        "mime_type": "application/pdf",
        "created_at": "2026-02-15T10:30:00+00:00"
      },
      "similarity_score": 0.94
    }
  ],
  "total_pairs": 12,
  "threshold": 0.7,
  "page": 1,
  "pages": 1,
  "per_page": 50,
  "embedding_coverage": {
    "total_files": 120,
    "files_with_embedding": 95
  }
}

Example:

# Find all document pairs above 90% similarity
curl "http://<your-instance>/api/similarity/pairs?threshold=0.9"

Embedding Diagnostics

GET /api/files/{file_id}/embedding-status

Check the embedding status for a specific file: whether OCR text is available, whether an embedding has been computed, and how many dimensions it has.

curl "http://<your-instance>/api/files/42/embedding-status"

POST /api/files/{file_id}/compute-embedding

Manually trigger embedding computation for a single file. Useful for debugging or re-computing after configuration changes. Requires OCR text to be available.

curl -X POST "http://<your-instance>/api/files/42/compute-embedding"

GET /api/diagnostic/embeddings

Get an overview of embedding coverage across all files: total files, how many have OCR text, how many have embeddings, and per-file status.

curl "http://<your-instance>/api/diagnostic/embeddings"

POST /api/diagnostic/compute-all-embeddings

Queue embedding computation for all files that have OCR text but no embedding yet. Each file is processed as a separate background task.

curl -X POST "http://<your-instance>/api/diagnostic/compute-all-embeddings"

Batch Processing

POST /api/processall

Process all PDF files in the configured workdir directory.

Throttling: For large batches (>20 files by default), tasks are automatically staggered to prevent overwhelming downstream APIs. The throttling behavior can be configured via environment variables:

  • PROCESSALL_THROTTLE_THRESHOLD: Number of files above which throttling is applied (default: 20)
  • PROCESSALL_THROTTLE_DELAY: Delay in seconds between each task submission when throttling (default: 3)

Example: When processing 25 files with default settings, the first file is queued immediately, the second after 3 seconds, the third after 6 seconds, etc., spreading the load over 72 seconds total.

Response:

{
  "message": "Enqueued 25 PDFs for processing (throttled over 72 seconds)",
  "pdf_files": ["file1.pdf", "file2.pdf", ...],
  "task_ids": ["a1b2c3...", "d4e5f6...", ...],
  "throttled": true
}

POST /send_to_google_drive/

Send a processed file to Google Drive.

Parameters: - file_path: Path to the file to upload

Response:

{
  "task_id": "a1b2c3d4-e5f6-7g8h-9i0j-k1l2m3n4o5p6",
  "status": "queued"
}

Integrations

Manage per-user integrations (sources and destinations). All endpoints require authentication and are scoped to the current user's integrations. Subscription-tier quota enforcement is applied on creation.

Quota Enforcement

When creating an integration, the API checks the user's subscription tier:

Tier Storage Destinations IMAP Sources
Free 1 0
Starter 2 1
Professional 5 3
Power 10 Unlimited

Exceeding a quota returns HTTP 403 with a descriptive error message.

GET /api/integrations/

List all integrations for the current user. Supports optional query-string filters.

Query Parameters:

Parameter Type Description
direction string Filter by SOURCE or DESTINATION
integration_type string Filter by type (e.g. IMAP, S3, DROPBOX)

Response (200):

[
  {
    "id": 1,
    "owner_id": "user@example.com",
    "direction": "DESTINATION",
    "integration_type": "S3",
    "name": "Archive Bucket",
    "config": {"bucket": "my-bucket", "region": "us-east-1"},
    "has_credentials": true,
    "is_active": true,
    "last_used_at": null,
    "last_error": null,
    "created_at": "2025-01-01T00:00:00",
    "updated_at": "2025-01-01T00:00:00"
  }
]

POST /api/integrations/

Create a new integration. Quota is enforced before creation.

Request:

{
  "direction": "DESTINATION",
  "integration_type": "S3",
  "name": "Archive Bucket",
  "config": {"bucket": "my-bucket", "region": "us-east-1"},
  "credentials": {"access_key_id": "AKIA...", "secret_access_key": "..."},
  "is_active": true
}

Response (201): The created integration (same shape as list response).

Response (403): Quota exceeded.

{
  "detail": "You have reached your plan limit of 1 storage destination(s). Please remove an existing destination or upgrade your plan."
}

PUT /api/integrations/{id}

Update an existing integration. Only provided fields are changed.

DELETE /api/integrations/{id}

Delete an integration permanently. Returns 204 on success.

POST /api/integrations/test

Test an integration connection without saving. Useful for "Test connection" UI buttons.

Request:

{
  "integration_type": "IMAP",
  "config": {"host": "imap.gmail.com", "port": 993, "username": "user@example.com", "use_ssl": true},
  "credentials": {"password": "app-password"}
}

Response (200):

{"success": true, "message": "IMAP connection successful"}

Supported connection tests: DROPBOX, IMAP, S3, WEBDAV, NEXTCLOUD. Other types return a message that testing is not yet supported.

GET /api/integrations/quota/

Get the current user's integration quota usage.

Response (200):

{
  "tier_id": "starter",
  "tier_name": "Starter",
  "destinations": {
    "current_count": 1,
    "max_allowed": 2,
    "can_add": true
  },
  "sources": {
    "current_count": 0,
    "max_allowed": 1,
    "can_add": true
  }
}

Cloud Provider Folder Browser

Browse folders in connected cloud storage providers. These endpoints are used by the OAuth callback pages to let users select a target folder after authorization.

POST /api/dropbox/list-folders

List folders in a Dropbox account. Requires a short-lived OAuth access token obtained during the authorization flow.

Request (form-data):

Field Type Required Description
access_token string Yes Dropbox OAuth access token
path string No Folder path to list (default: root)

Response (200):

{
  "folders": [
    { "name": "Documents", "path": "/Documents", "id": "id:abc123" },
    { "name": "Photos", "path": "/Photos", "id": "id:def456" }
  ],
  "path": "/",
  "has_more": false
}

POST /api/onedrive/list-folders

List folders in a OneDrive account. Requires a short-lived OAuth access token obtained during the authorization flow.

Request (form-data):

Field Type Required Description
access_token string Yes Microsoft Graph access token
path string No Folder path to list (default: root)

Response (200):

{
  "folders": [
    { "name": "Documents", "path": "/Documents", "id": "abc123", "child_count": 5 },
    { "name": "Pictures", "path": "/Pictures", "id": "def456", "child_count": 12 }
  ],
  "path": "/"
}

Webhooks

Manage webhook configurations for notifying external systems when document events occur. All webhook endpoints require admin access.

Supported Events

Event Description
document.uploaded A new document has been ingested
document.processed A document finished processing successfully
document.failed Document processing failed
user.signup A new user account was created
user.plan_changed A user's subscription plan changed
user.payment_issue A payment issue was reported for a user

GET /api/webhooks/events/

List all valid webhook event types.

Response (200):

["document.failed", "document.processed", "document.uploaded", "user.payment_issue", "user.plan_changed", "user.signup"]

GET /api/webhooks/

List all webhook configurations. Secrets are never included in responses.

Response (200):

[
  {
    "id": 1,
    "url": "https://example.com/webhook",
    "events": ["document.processed", "document.uploaded"],
    "is_active": true,
    "description": "Production webhook",
    "has_secret": true
  }
]

POST /api/webhooks/

Create a new webhook configuration.

Request:

{
  "url": "https://example.com/webhook",
  "secret": "my-shared-secret",
  "events": ["document.uploaded", "document.processed", "document.failed"],
  "is_active": true,
  "description": "My integration"
}

Response (201):

{
  "id": 1,
  "url": "https://example.com/webhook",
  "events": ["document.failed", "document.processed", "document.uploaded"],
  "is_active": true,
  "description": "My integration",
  "has_secret": true
}

GET /api/webhooks/{webhook_id}

Get a single webhook configuration.

Response (200): Same shape as list items above.

PUT /api/webhooks/{webhook_id}

Update an existing webhook. Only supplied fields are changed.

Request:

{
  "url": "https://new-url.example.com/webhook",
  "is_active": false
}

DELETE /api/webhooks/{webhook_id}

Delete a webhook configuration. Returns 204 No Content on success.

Webhook Payload Format

When a subscribed event occurs, a JSON POST request is sent to the configured URL:

{
  "event": "document.processed",
  "timestamp": 1709322559.123456,
  "data": {
    "file_id": 42,
    "filename": "invoice.pdf"
  }
}

HMAC Signature

If a secret is configured, an X-Webhook-Signature header is included with each request. The signature is computed as sha256=<hex-digest> using HMAC-SHA256 over the raw JSON body.

To verify in Python:

import hashlib, hmac

def verify_signature(body: bytes, secret: str, signature: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

Retry Behavior

Failed deliveries (non-2xx responses or network errors) are automatically retried with exponential backoff: 60 s, 300 s, then 900 s (up to 3 retries with ±20 % jitter).

Error Handling

Errors follow standard HTTP status codes with descriptive messages:

{
  "detail": "File not found",
  "status_code": 404
}

Queue Monitoring

GET /api/queue/stats

Get comprehensive queue and processing statistics, including Redis queue lengths, Celery worker inspection data, and database-level processing summaries.

Authentication: Required

Response (200 OK):

{
  "queues": {
    "document_processor": 12,
    "default": 0,
    "celery": 0
  },
  "total_queued": 12,
  "celery": {
    "active": [
      {"id": "abc123", "name": "process_document", "args": "[42]", "started": 1700000000}
    ],
    "reserved": [],
    "scheduled": [],
    "workers_online": 1
  },
  "db_summary": {
    "total_files": 5000,
    "processing": 3,
    "failed": 1,
    "completed": 4900,
    "pending": 96,
    "recent_processing": [
      {"file_id": 42, "filename": "invoice.pdf", "current_step": "extract_metadata_with_gpt"}
    ]
  }
}

GET /api/queue/pending-count

Lightweight endpoint returning the total number of queued + in-progress items. Designed for the files page banner indicator.

Authentication: Required

Response (200 OK):

{
  "total_pending": 15
}

Diagnostic

GET /api/diagnostic/healthz/live

Lightweight liveness probe for Kubernetes. Returns 200 OK as long as the process is running. This endpoint does not check external dependencies and is intentionally cheap.

Authentication: None (designed for kubelet probes)

Response (200 OK):

{
  "status": "ok"
}

GET /api/diagnostic/healthz/ready

Readiness probe for Kubernetes. Verifies that the application can serve traffic by checking database and Redis connectivity.

Authentication: None (designed for kubelet probes)

Response (200 OK) – ready to serve traffic:

{
  "status": "ready",
  "checks": {
    "database": {"status": "ok"},
    "redis":    {"status": "ok"}
  }
}

Response (503 Service Unavailable) – database unreachable:

{
  "status": "not_ready",
  "checks": {
    "database": {"status": "error", "detail": "..."},
    "redis":    {"status": "ok"}
  }
}

GET /api/diagnostic/health

System health endpoint designed for monitoring tools such as Grafana, Uptime Kuma, Prometheus blackbox exporter, or any HTTP-based health checker.

Checks the database and Redis connectivity and returns a machine-readable JSON summary.

Authentication: Required (bypassed when AUTH_ENABLED=False)

Response (200 OK) – all subsystems healthy:

{
  "status": "healthy",
  "version": "1.2.3",
  "timestamp": "2024-01-15T10:30:00+00:00",
  "checks": {
    "database": {"status": "ok"},
    "redis":    {"status": "ok"}
  }
}

Response (200 OK) – one or more non-critical checks failed:

{
  "status": "degraded",
  "version": "1.2.3",
  "timestamp": "2024-01-15T10:30:00+00:00",
  "checks": {
    "database": {"status": "ok"},
    "redis":    {"status": "error", "detail": "Connection refused"}
  }
}

Response (503 Service Unavailable) – critical check (database) failed:

{
  "status": "unhealthy",
  "version": "1.2.3",
  "timestamp": "2024-01-15T10:30:00+00:00",
  "checks": {
    "database": {"status": "error", "detail": "..."},
    "redis":    {"status": "ok"}
  }
}

The status field is always one of: - "healthy" – all checks passed - "degraded" – at least one non-critical check failed (Redis unavailable) - "unhealthy" – a critical check failed (database unavailable); HTTP 503 is returned

Grafana / Uptime Kuma integration: point your health check at GET /api/diagnostic/health and check for HTTP 200 or the JSON status field.

POST /api/diagnostic/test-notification

Send a test notification through all configured notification channels.

Authentication: Required

Response (200 OK):

{
  "status": "success",
  "message": "Test notification sent successfully to 2 service(s)",
  "services_count": 2
}

Rate Limiting

The API implements rate limiting to ensure system stability. If you exceed the limits, you'll receive a 429 Too Many Requests response.

Database Configuration Wizard

Endpoints for building and testing database connection strings and migrating data between databases. All write endpoints require admin authentication.

GET /api/database/backends

List supported database backends with metadata.

Response (200):

[
  {
    "id": "sqlite",
    "label": "SQLite (Development)",
    "default_port": null,
    "description": "File-based database. Best for development and single-user setups.",
    "requires_host": false
  },
  {
    "id": "postgresql",
    "label": "PostgreSQL (Recommended for Production)",
    "default_port": 5432,
    "description": "Robust, full-featured database. Recommended for production.",
    "requires_host": true
  }
]

POST /api/database/build-url

Build a SQLAlchemy connection string from individual components.

Request:

{
  "backend": "postgresql",
  "host": "my-db.rds.amazonaws.com",
  "port": 5432,
  "database": "docuelevate",
  "username": "admin",
  "password": "secret",
  "ssl_mode": "require"
}

Response (200):

{
  "url": "postgresql://admin:secret@my-db.rds.amazonaws.com:5432/docuelevate?sslmode=require"
}

POST /api/database/test-connection

Test connectivity to a database.

Request:

{
  "url": "postgresql://admin:secret@my-db.rds.amazonaws.com:5432/docuelevate?sslmode=require"
}

Response (200):

{
  "success": true,
  "message": "Connection successful",
  "backend": "postgresql",
  "server_version": "PostgreSQL 16.2 on x86_64-pc-linux-gnu"
}

POST /api/database/preview-migration

Preview a data migration (table-by-table row counts) without copying data.

Request:

{
  "url": "sqlite:///./app/database.db"
}

Response (200):

{
  "success": true,
  "tables": [
    {"name": "documents", "row_count": 42},
    {"name": "files", "row_count": 150}
  ],
  "total_rows": 192
}

POST /api/database/migrate

Execute a full data migration from source to target database.

Request:

{
  "source_url": "sqlite:///./app/database.db",
  "target_url": "postgresql://admin:secret@host:5432/docuelevate"
}

Response (200):

{
  "success": true,
  "tables_copied": 8,
  "rows_copied": 192,
  "errors": []
}

Pipelines

The pipeline API lets you build and manage custom document processing workflows. Each pipeline is owned by a single user (or by the system when owner_id is null).

Step-types catalogue

GET /api/pipelines/step-types

Returns the catalogue of built-in step types.

Response (200):

{
  "convert_to_pdf": {
    "label": "Convert to PDF",
    "description": "Convert non-PDF documents to PDF format using Gotenberg.",
    "config_schema": {}
  },
  "ocr": {
    "label": "OCR Processing",
    "description": "Extract text using Azure Document Intelligence or local Tesseract.",
    "config_schema": {
      "force_cloud_ocr": { "type": "boolean", "default": false },
      "ocr_language": {
        "type": "select",
        "default": "auto",
        "description": "Language(s) for OCR. Overrides the global setting for Tesseract/EasyOCR. Azure/Mistral auto-detect.",
        "options": [
          { "value": "auto", "label": "Auto (use system default)" },
          { "value": "eng",  "label": "English" },
          { "value": "deu",  "label": "German" },
          { "value": "fra",  "label": "French" },
          { "value": "spa",  "label": "Spanish" },
          "..."
        ]
      }
    }
  }
}

The ocr_language field accepts Tesseract language codes (e.g. "eng", "deu", "eng+deu" for multi-language) or "auto" to fall back to the global system setting. The full list of 28 supported language codes is returned by the step-types endpoint.

List pipelines

GET /api/pipelines

Returns pipelines visible to the current user (own + system pipelines). Admins see all pipelines.

Create pipeline

POST /api/pipelines
Content-Type: application/json

{
  "name": "My Workflow",
  "description": "Converts, OCRs, and stores documents.",
  "is_default": false,
  "is_active": true
}

Response (201):

{
  "id": 1,
  "owner_id": "alice",
  "name": "My Workflow",
  "description": "Converts, OCRs, and stores documents.",
  "is_default": false,
  "is_active": true,
  "created_at": "2026-03-07T10:00:00+00:00",
  "updated_at": "2026-03-07T10:00:00+00:00"
}

Create system pipeline (admin only)

POST /api/pipelines/admin/system
Content-Type: application/json

{
  "name": "Global Default",
  "is_default": true
}

Get pipeline with steps

GET /api/pipelines/{pipeline_id}

Response (200):

{
  "id": 1,
  "owner_id": "alice",
  "name": "My Workflow",
  "steps": [
    { "id": 1, "position": 0, "step_type": "convert_to_pdf", "enabled": true, "config": {} },
    { "id": 2, "position": 1, "step_type": "ocr", "enabled": true, "config": { "force_cloud_ocr": false } }
  ]
}

Update pipeline

PUT /api/pipelines/{pipeline_id}
Content-Type: application/json

{ "name": "Renamed Workflow", "is_default": true }

Delete pipeline

DELETE /api/pipelines/{pipeline_id}

Returns 204 No Content.

Add step

POST /api/pipelines/{pipeline_id}/steps
Content-Type: application/json

{
  "step_type": "ocr",
  "label": "German OCR",
  "config": { "force_cloud_ocr": false, "ocr_language": "deu" },
  "enabled": true
}

Multi-language (Tesseract +-separated codes):

{
  "step_type": "ocr",
  "config": { "ocr_language": "eng+deu" }
}

Use "ocr_language": "auto" (or omit the field) to fall back to the global system language setting.

Update step

PUT /api/pipelines/{pipeline_id}/steps/{step_id}
Content-Type: application/json

{ "enabled": false }

Delete step

DELETE /api/pipelines/{pipeline_id}/steps/{step_id}

Returns 204 No Content.

Reorder steps

PUT /api/pipelines/{pipeline_id}/steps/reorder
Content-Type: application/json

[3, 1, 2]

Provide a complete ordered list of all step IDs. Their positions are reassigned 0, 1, 2, … in the given order.

Assign pipeline to a file

POST /api/files/{file_id}/assign-pipeline?pipeline_id=2

Pass no pipeline_id query parameter (or omit it) to clear the assignment.

Response (200):

{ "file_id": 42, "pipeline_id": 2 }

Routing Rules

Routing rules let you conditionally assign documents to different pipelines based on file properties such as type, size, filename, or AI-extracted metadata. Rules are evaluated in position order (lowest first); the first rule that matches wins. If no rule matches, the system falls back to the owner's (or global) default pipeline.

Supported operators and fields

GET /api/routing-rules/operators

Returns the catalogue of valid operators and built-in fields so UIs can populate dropdowns without hard-coding values.

Response (200):

{
  "operators": ["contains", "equals", "gt", "gte", "lt", "lte", "not_contains", "not_equals", "regex"],
  "builtin_fields": ["category", "document_type", "file_type", "filename", "size"],
  "metadata_prefix": "metadata."
}

Tip: For AI metadata fields use the metadata. prefix, e.g. metadata.sender, metadata.amount.

List routing rules

GET /api/routing-rules

Returns the current user's rules plus any system-wide rules (owner_id = null), ordered by position.

Response (200):

[
  {
    "id": 1,
    "owner_id": "alice",
    "name": "Route invoices",
    "position": 0,
    "field": "document_type",
    "operator": "equals",
    "value": "Invoice",
    "target_pipeline_id": 3,
    "is_active": true,
    "created_at": "2026-03-09T12:00:00+00:00",
    "updated_at": "2026-03-09T12:00:00+00:00"
  }
]

Create routing rule

POST /api/routing-rules
Content-Type: application/json

{
  "name": "Route invoices",
  "field": "document_type",
  "operator": "equals",
  "value": "Invoice",
  "target_pipeline_id": 3
}

Optional fields: position (auto-assigned if omitted), is_active (default true).

Response (201 Created): The created rule object.

Get routing rule

GET /api/routing-rules/{rule_id}

Response (200): A single rule object.

Update routing rule

PUT /api/routing-rules/{rule_id}
Content-Type: application/json

{ "name": "Renamed rule", "operator": "contains", "is_active": false }

Only the supplied fields are updated.

Response (200): The updated rule object.

Delete routing rule

DELETE /api/routing-rules/{rule_id}

Returns 204 No Content.

Reorder routing rules

PUT /api/routing-rules/reorder
Content-Type: application/json

{ "rule_ids": [3, 1, 2] }

Provide the complete ordered list of your rule IDs. Positions are reassigned 0, 1, 2, … in the given order.

Evaluate rules (dry run)

POST /api/routing-rules/evaluate
Content-Type: application/json

{
  "file_type": "application/pdf",
  "filename": "invoice_2024.pdf",
  "size": 204800,
  "document_type": "Invoice",
  "metadata": { "sender": "Acme Corp" }
}

Tests which rule (if any) would match the given properties without actually routing a document.

Response (200) – match found:

{
  "matched": true,
  "rule": { "id": 1, "name": "Route invoices", "..." : "..." },
  "target_pipeline": { "id": 3, "name": "Invoice Pipeline", "is_active": true }
}

Response (200) – no match:

{
  "matched": false,
  "rule": null,
  "target_pipeline": null
}

API Tokens

Personal API tokens allow programmatic access to the DocuElevate API without session cookies. Tokens are ideal for CI/CD pipelines, webhook integrations, and automation scripts.

Each token is prefixed with de_ for easy identification. Only a SHA-256 hash is stored server-side; the plaintext is returned exactly once at creation time.

Usage tracking records when each token was last used and from which IP address.

POST /api/api-tokens/

Create a new API token. Optionally specify a lifetime in days via expires_in_days (1–3650). If omitted the token never expires.

Request:

{
  "name": "CI Pipeline",
  "expires_in_days": 90
}

Response (201 Created):

{
  "id": 1,
  "name": "CI Pipeline",
  "token_prefix": "de_Ab3xY7kL",
  "token": "de_Ab3xY7kLmN9pQrStUvWxYz0123456789abcdef",
  "is_active": true,
  "last_used_at": null,
  "last_used_ip": null,
  "created_at": "2026-03-08T12:00:00Z",
  "revoked_at": null,
  "expires_at": "2026-06-06T12:00:00Z"
}

Important: The token field is only included in the creation response. Copy it immediately — it will not be shown again.

GET /api/api-tokens/

List all tokens for the authenticated user. The full token value is never included.

Response (200):

[
  {
    "id": 1,
    "name": "CI Pipeline",
    "token_prefix": "de_Ab3xY7kL",
    "is_active": true,
    "last_used_at": "2026-03-08T15:30:00Z",
    "last_used_ip": "203.0.113.42",
    "created_at": "2026-03-08T12:00:00Z",
    "revoked_at": null,
    "expires_at": "2026-06-06T12:00:00Z"
  }
]

DELETE /api/api-tokens/{token_id}

Revoke or permanently delete a token:

  • Active token – soft-revoked (kept for audit purposes, marked inactive). Response: {"detail": "Token revoked"}
  • Already-revoked token – permanently deleted from the database. Response: {"detail": "Token deleted"}

Response (200):

{
  "detail": "Token revoked"
}

POST /api/api-tokens/{token_id}/reactivate

Reactivate a previously revoked token. Clears revoked_at and sets is_active back to true.

Response (200): The updated TokenResponse object.

Using API Tokens

Include the token in the Authorization header of any API request:

# Upload a document
curl -X POST "http://your-instance/api/files/ui-upload" \
  -H "Authorization: Bearer de_your_token_here" \
  -F "file=@/path/to/document.pdf"

# List files
curl -X GET "http://your-instance/api/files" \
  -H "Authorization: Bearer de_your_token_here"

Python example:

import requests

response = requests.post(
    "http://your-instance/api/files/ui-upload",
    headers={"Authorization": "Bearer de_your_token_here"},
    files={"file": open("document.pdf", "rb")},
)
print(response.json())

Classification Rules

The classification rules API lets you manage custom document classification rules. Rules are evaluated during the classify pipeline step to assign a category to each document based on filename patterns, content keywords, and metadata fields.

Built-in Categories

GET /api/classification-rules/categories

Returns the pre-built classification categories.

Response (200):

{
  "invoice": "Invoice",
  "contract": "Contract",
  "receipt": "Receipt",
  "letter": "Letter",
  "report": "Report",
  "bank_statement": "Bank Statement",
  "tax_document": "Tax Document",
  "insurance": "Insurance Document",
  "payslip": "Payslip",
  "unknown": "Unknown"
}

Rule Types

GET /api/classification-rules/rule-types

Returns the supported rule types with descriptions.

Response (200):

[
  {
    "type": "filename_pattern",
    "label": "Filename Pattern",
    "description": "Regex pattern matched against the original filename."
  },
  {
    "type": "content_keyword",
    "label": "Content Keyword",
    "description": "Pipe-separated keywords matched against the OCR text."
  },
  {
    "type": "metadata_match",
    "label": "Metadata Match",
    "description": "field=value pattern matched against existing AI metadata."
  }
]

List Rules

GET /api/classification-rules/

List all classification rules visible to the current user (system rules + own rules).

Response (200):

[
  {
    "id": 1,
    "owner_id": "user@example.com",
    "name": "German Invoice Filename",
    "category": "invoice",
    "rule_type": "filename_pattern",
    "pattern": "(?i)rechnung",
    "priority": 10,
    "case_sensitive": false,
    "enabled": true
  }
]

Create Rule

POST /api/classification-rules/

Create a new custom classification rule.

Request:

{
  "name": "German Invoice Filename",
  "category": "invoice",
  "rule_type": "filename_pattern",
  "pattern": "(?i)rechnung",
  "priority": 10,
  "case_sensitive": false,
  "enabled": true
}
Field Type Required Description
name string Yes Unique rule name (per user)
category string Yes Target category (e.g. invoice, contract, or custom)
rule_type string Yes One of: filename_pattern, content_keyword, metadata_match
pattern string Yes Regex (filename), pipe-separated keywords (content), or field=value (metadata)
priority integer No Higher priority rules are evaluated first (default: 0)
case_sensitive boolean No Case-sensitive matching (default: false)
enabled boolean No Whether the rule is active (default: true)

Response (201):

{
  "id": 1,
  "owner_id": "user@example.com",
  "name": "German Invoice Filename",
  "category": "invoice",
  "rule_type": "filename_pattern",
  "pattern": "(?i)rechnung",
  "priority": 10,
  "case_sensitive": false,
  "enabled": true
}

Get Rule

GET /api/classification-rules/{rule_id}

Update Rule

PUT /api/classification-rules/{rule_id}

Request (partial update):

{
  "priority": 20,
  "enabled": false
}

Delete Rule

DELETE /api/classification-rules/{rule_id}

Response: 204 No Content

Automation (Zapier / Make.com)

Manage automation hook subscriptions for integrating DocuElevate with external platforms like Zapier and Make.com. All endpoints require API token authentication (Authorization: Bearer <token>).

Supported Events

The automation system shares event types with the Webhooks subsystem:

Event Description
document.uploaded A new document has been ingested
document.processed A document finished processing successfully
document.failed Document processing failed
user.signup A new user account was created
user.plan_changed A user's subscription plan changed
user.payment_issue A payment issue was reported for a user

GET /api/automation/events

List all valid event types that automation hooks can subscribe to.

Response (200):

["document.failed", "document.processed", "document.uploaded", "user.payment_issue", "user.plan_changed", "user.signup"]

POST /api/automation/hooks/subscribe

Subscribe to DocuElevate events. Zapier and Make.com call this endpoint to register a webhook URL that receives event notifications.

Request:

curl -X POST "http://your-instance/api/automation/hooks/subscribe" \
  -H "Authorization: Bearer de_your_token_here" \
  -H "Content-Type: application/json" \
  -d '{
    "target_url": "https://hooks.zapier.com/hooks/catch/123456/abcdef/",
    "events": ["document.processed", "document.uploaded"],
    "hook_type": "zapier",
    "secret": "optional-signing-secret",
    "description": "My Zap for processed documents"
  }'

Response (201):

{
  "id": 1,
  "target_url": "https://hooks.zapier.com/hooks/catch/123456/abcdef/",
  "events": ["document.processed", "document.uploaded"],
  "is_active": true,
  "hook_type": "zapier",
  "description": "My Zap for processed documents",
  "has_secret": true
}

GET /api/automation/hooks

List all automation hook subscriptions.

Response (200):

[
  {
    "id": 1,
    "target_url": "https://hooks.zapier.com/hooks/catch/123456/abcdef/",
    "events": ["document.processed", "document.uploaded"],
    "is_active": true,
    "hook_type": "zapier",
    "description": "My Zap for processed documents",
    "has_secret": true
  }
]

DELETE /api/automation/hooks/{hook_id}

Unsubscribe an automation hook. Zapier calls this when a Zap is turned off or deleted.

Response (204): No content.

GET /api/automation/triggers/sample/{event}

Get sample trigger data for Zapier field mapping. Zapier uses this during Zap setup to discover available fields.

Request:

curl "http://your-instance/api/automation/triggers/sample/document.processed" \
  -H "Authorization: Bearer de_your_token_here"

Response (200):

[
  {
    "id": "evt_sample0002",
    "event": "document.processed",
    "timestamp": 1710000060.0,
    "document_id": 42,
    "filename": "invoice_2024.pdf",
    "status": "processed",
    "title": "Invoice #1234",
    "owner_id": "user@example.com"
  }
]

POST /api/automation/actions/upload

Upload a document from an automation platform. This incoming action endpoint allows Zapier or Make.com to push documents into DocuElevate for processing.

Request:

curl -X POST "http://your-instance/api/automation/actions/upload" \
  -H "Authorization: Bearer de_your_token_here" \
  -F "file=@/path/to/document.pdf"

Response (200):

{
  "status": "accepted",
  "filename": "document.pdf",
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Zapier-Compatible Payload Format

When events fire, automation hooks receive a flat JSON payload (no nested data key) that Zapier and Make.com can easily map:

{
  "id": "evt_a1b2c3d4e5f67890",
  "event": "document.processed",
  "timestamp": 1710000060.0,
  "document_id": 42,
  "filename": "invoice_2024.pdf",
  "status": "processed",
  "title": "Invoice #1234",
  "owner_id": "user@example.com"
}

The id field is unique per event and is used by Zapier for deduplication. If a secret was provided during subscription, an X-Webhook-Signature header with an HMAC-SHA256 signature is included.

Retry Behavior

Automation hook deliveries follow the same retry policy as regular webhooks: up to 3 retries with exponential backoff (60 s, 300 s, 900 s) and ±20% jitter.

Further Assistance

For additional help with the API, please contact our support team or refer to the Development Guide.

Mobile App API

The mobile API provides endpoints used by the native iOS and Android app. All endpoints require authentication (Bearer token or active session cookie).

For full mobile app documentation see MobileApp.md.

POST /api/mobile/generate-token

Exchange an active web session for a long-lived API token scoped to the mobile app.

Request:

{ "device_name": "John's iPhone" }

Response (201 Created):

{
  "token": "de_AbCdEfGhIjKl...",
  "token_id": 42,
  "name": "Mobile App – John's iPhone",
  "created_at": "2026-03-10T09:30:00Z"
}

The token is shown once only.

POST /api/mobile/register-device

Register an Expo push token to receive push notifications.

Request:

{
  "push_token": "ExponentPushToken[xxxxxx]",
  "device_name": "John's iPhone",
  "platform": "ios"
}

Response (201 Created): Device record with id, platform, is_active, created_at.

GET /api/mobile/devices

List all registered push-notification devices for the current user.

Response (200 OK): Array of device records.

DELETE /api/mobile/devices/{device_id}

Deactivate or permanently delete a push-notification device:

  • Active device – soft-deactivated (record kept, will no longer receive push notifications). Response: {"detail": "Device deactivated"}
  • Already-inactive device – permanently deleted from the database. Response: {"detail": "Device deleted"}

Response (200)

GET /api/mobile/whoami

Return basic profile information for the authenticated user.

Response (200 OK):

{
  "owner_id": "john@example.com",
  "display_name": "John Doe",
  "email": "john@example.com",
  "avatar_url": "https://www.gravatar.com/avatar/...",
  "is_admin": false
}

GraphQL API

DocuElevate exposes a GraphQL API at /graphql alongside the REST API. It supports flexible queries with field selection, making it ideal for dashboards and integrations that only need a subset of the available data.

Endpoint

Method URL Description
POST /graphql Execute a GraphQL query or mutation
GET /graphql Open the GraphiQL interactive playground

Authentication

The GraphQL endpoint honours the same authentication rules as the REST API:

  • AUTH_ENABLED=False (default, single-user mode): all queries are allowed without credentials.
  • AUTH_ENABLED=True (multi-user mode): a valid session cookie or an Authorization: Bearer <token> API token is required. Admin-only queries (settings, users) additionally require the is_admin flag.

Available Queries

Field Returns Notes
documents(ownerId, limit, offset) [DocumentType] Paginated list of documents
document(id) DocumentType Single document by primary key
pipelines(ownerId, limit, offset) [PipelineType] Paginated list of pipelines with steps
pipeline(id) PipelineType Single pipeline by primary key
settings(limit, offset) [SettingType] Non-sensitive app settings (admin only)
users(limit, offset) [UserType] User profiles (admin only)
user(userId) UserType Single user profile (admin only)

Note: Sensitive configuration keys (API secrets, passwords, tokens) are automatically excluded from the settings query regardless of the caller's privilege level.

GraphiQL Playground

Navigate to http://<your-instance>/graphql in a browser to open the interactive GraphiQL IDE, which provides schema documentation, auto-complete, and the ability to run queries directly.

Example Queries

List recent documents:

{
  documents(limit: 5) {
    id
    originalFilename
    mimeType
    fileSize
    documentTitle
    createdAt
  }
}

Fetch a pipeline with its steps:

{
  pipeline(id: 1) {
    id
    name
    description
    isDefault
    isActive
    steps {
      position
      stepType
      label
      enabled
    }
  }
}

List application settings (admin only):

{
  settings {
    key
    value
    updatedAt
  }
}

List user profiles (admin only):

{
  users(limit: 10) {
    userId
    displayName
    subscriptionTier
    isBlocked
  }
}

Using variables:

query GetDocument($id: Int!) {
  document(id: $id) {
    id
    originalFilename
    documentTitle
    isDuplicate
    ocrQualityScore
  }
}

Variables: { "id": 42 }

System Reset

Admin-only endpoints for resetting the system to a clean state. Requires ENABLE_FACTORY_RESET=True.

GET /api/admin/system-reset/status

Check whether the system reset feature is enabled.

Response (200):

{
  "enabled": true,
  "factory_reset_on_startup": false
}

POST /api/admin/system-reset/full

Wipe all user data (database + work-files).

Request:

{
  "confirmation": "DELETE"
}

Response (200):

{
  "status": "ok",
  "result": {
    "database": { "files": 42, "processing_logs": 100 },
    "filesystem": { "deleted_dirs": 5, "deleted_files": 12 }
  }
}

POST /api/admin/system-reset/reimport

Move original files to a reimport folder, wipe everything, and configure the reimport folder as a watch folder for re-ingestion.

Request:

{
  "confirmation": "REIMPORT"
}

Response (200):

{
  "status": "ok",
  "result": {
    "database": { "files": 42 },
    "filesystem": { "deleted_dirs": 5, "deleted_files": 12 },
    "reimport": { "files_moved": 42, "reimport_folder": "/workdir/reimport" }
  }
}

Comments & Annotations

Threaded comments and PDF annotations for document collaboration.

List Comments

GET /api/files/{file_id}/comments

Returns all comments for a document, organized into a threaded tree.

Response (200):

{
  "file_id": 1,
  "comments": [
    {
      "id": 1,
      "file_id": 1,
      "user_id": "alice",
      "parent_id": null,
      "body": "Please review section 3.",
      "mentions": ["bob"],
      "is_resolved": false,
      "created_at": "2026-03-21T12:00:00+00:00",
      "updated_at": "2026-03-21T12:00:00+00:00",
      "replies": [
        {
          "id": 2,
          "file_id": 1,
          "user_id": "bob",
          "parent_id": 1,
          "body": "Done!",
          "mentions": [],
          "is_resolved": false,
          "created_at": "2026-03-21T12:05:00+00:00",
          "updated_at": "2026-03-21T12:05:00+00:00",
          "replies": []
        }
      ]
    }
  ],
  "total": 2
}

Create Comment

POST /api/files/{file_id}/comments

Create a new comment on a document. @mentions are automatically extracted from the body.

Request:

{
  "body": "Hey @bob, please review this section.",
  "parent_id": null
}

Response (201):

{
  "id": 3,
  "file_id": 1,
  "user_id": "alice",
  "parent_id": null,
  "body": "Hey @bob, please review this section.",
  "mentions": ["bob"],
  "is_resolved": false,
  "created_at": "2026-03-21T12:10:00+00:00",
  "updated_at": "2026-03-21T12:10:00+00:00"
}

Update Comment

PUT /api/files/{file_id}/comments/{comment_id}

Update the body of an existing comment. Only the comment author may update it.

Request:

{
  "body": "Updated comment text @charlie"
}

Delete Comment

DELETE /api/files/{file_id}/comments/{comment_id}

Delete a comment. Only the comment author may delete it.

Response: 204 No Content

Resolve / Unresolve Comment

PATCH /api/files/{file_id}/comments/{comment_id}/resolve

Mark a comment thread as resolved or unresolved.

Request:

{
  "is_resolved": true
}

List Annotations

GET /api/files/{file_id}/annotations

Returns all PDF page annotations for a document, ordered by page then creation time.

Response (200):

{
  "file_id": 1,
  "annotations": [
    {
      "id": 1,
      "file_id": 1,
      "user_id": "alice",
      "page": 1,
      "x": 100.0,
      "y": 200.0,
      "width": 150.0,
      "height": 20.0,
      "content": "Important paragraph",
      "annotation_type": "highlight",
      "color": "#ffff00",
      "created_at": "2026-03-21T12:00:00+00:00",
      "updated_at": "2026-03-21T12:00:00+00:00"
    }
  ],
  "total": 1
}

Create Annotation

POST /api/files/{file_id}/annotations

Create a new annotation on a PDF page.

Request:

{
  "page": 1,
  "x": 100.0,
  "y": 200.0,
  "width": 150.0,
  "height": 20.0,
  "content": "Important paragraph",
  "annotation_type": "highlight",
  "color": "#ffff00"
}

Allowed annotation_type values: note, highlight, underline, strikethrough.

Update Annotation

PUT /api/files/{file_id}/annotations/{annotation_id}

Update an existing annotation. Only the annotation author may update it.

Request (all fields optional):

{
  "content": "Updated note",
  "color": "#00ff00"
}

Delete Annotation

DELETE /api/files/{file_id}/annotations/{annotation_id}

Delete an annotation. Only the annotation author may delete it.

Response: 204 No Content

List Mentionable Users

GET /api/users/mentionable

Returns all non-blocked user profiles for the @mention autocomplete.

Response (200):

[
  { "user_id": "alice", "display_name": "Alice Anderson" },
  { "user_id": "bob", "display_name": "Bob Baker" }
]

File Sharing & Permissions

DocuElevate supports per-user document sharing with role-based access control.

Roles

Role View Comment / Annotate Edit metadata Delete Share
owner
editor
viewer
  • Only the file owner can share a document, change roles, or delete the document.
  • When a user is @mentioned in a comment they are automatically granted viewer access to the document (multi-user mode only).

List Shares

GET /api/files/{file_id}/shares

Returns all active shares for a document. Only the file owner (or an admin) may call this endpoint.

Response (200):

[
  {
    "id": 1,
    "file_id": 42,
    "owner_id": "alice",
    "shared_with_user_id": "bob",
    "role": "viewer",
    "created_at": "2026-03-22T10:00:00+00:00",
    "updated_at": "2026-03-22T10:00:00+00:00"
  }
]

Error Responses: - 403: Not the file owner - 404: File not found


Create Share

POST /api/files/{file_id}/shares

Share a document with another user. Only the file owner may call this endpoint. If the user already has a share, their role is updated.

Request:

{
  "shared_with_user_id": "bob",
  "role": "viewer"
}

role must be "viewer" (default) or "editor".

Response (201):

{
  "id": 1,
  "file_id": 42,
  "owner_id": "alice",
  "shared_with_user_id": "bob",
  "role": "viewer",
  "created_at": "2026-03-22T10:00:00+00:00",
  "updated_at": "2026-03-22T10:00:00+00:00"
}

Error Responses: - 403: Not the file owner - 422: Invalid role, empty user ID, or sharing with self


Update Share Role

PUT /api/files/{file_id}/shares/{share_id}

Change the role of an existing share. Only the file owner may call this endpoint.

Request:

{
  "role": "editor"
}

Response (200): Updated share object.

Error Responses: - 403: Not the file owner - 404: Share not found - 422: Invalid role


Revoke Share

DELETE /api/files/{file_id}/shares/{share_id}

Remove a user's access to a document. Only the file owner may revoke shares.

Response (200):

{ "status": "success", "message": "Share revoked successfully" }

Error Responses: - 403: Not the file owner - 404: Share not found


List Shared With

GET /api/files/{file_id}/shared-with

Returns who a document is shared with. Accessible to any user with at least viewer access (owner, editors, and viewers can all call this).

Response (200):

[
  {
    "share_id": 1,
    "user_id": "bob",
    "display_name": "Bob Baker",
    "role": "viewer"
  }
]