Configuration Guide

DocuElevate is designed to be highly configurable through environment variables. This guide explains all available configuration options and how to use them effectively.

Environment Variables

Configuration is primarily done through environment variables specified in a .env file.

Core Settings

Variable Description Example
DATABASE_URL Path/URL to the SQLite database (or other SQL backend). Use the Database Wizard for guided setup. See Database Configuration. sqlite:///./app/database.db
DB_POOL_SIZE Number of persistent connections in the pool per worker (PostgreSQL/MySQL only; ignored for SQLite). 10
DB_MAX_OVERFLOW Additional connections beyond DB_POOL_SIZE under burst load (PostgreSQL/MySQL only). 20
DB_POOL_TIMEOUT Seconds to wait for a pool connection before raising TimeoutError (PostgreSQL/MySQL only). 30
DB_POOL_RECYCLE Recycle connections after this many seconds to avoid stale connections (PostgreSQL/MySQL only). 1800
REDIS_URL URL for Redis, used by Celery for broker & result store. redis://redis:6379/0
WORKDIR Working directory for the application. /workdir
GOTENBERG_URL Gotenberg PDF processing URL. http://gotenberg:3000
EXTERNAL_HOSTNAME The external hostname for the application. docuelevate.example.com
PUBLIC_BASE_URL Full public base URL including scheme (e.g., https://docuelevate.example.com). When set, overrides auto-detected URLs used for OAuth redirect URIs. Required when your reverse proxy does not forward X-Forwarded-Proto headers. (not set)
ALLOW_FILE_DELETE Enable file deletion in the web interface (true/false). true
COMPLIANCE_ENABLED Enable the compliance templates dashboard (GDPR, HIPAA, SOC 2). true
FACTORY_RESET_ON_STARTUP Wipe all user data on every startup (demo/testing). false
ENABLE_FACTORY_RESET Show the System Reset page in the admin UI. false

Batch Processing Settings

Control how the /processall endpoint handles large batches of files to prevent overwhelming downstream APIs.

Variable Description Default
PROCESSALL_THROTTLE_THRESHOLD Number of files above which throttling is applied. Files <= threshold are processed immediately. 20
PROCESSALL_THROTTLE_DELAY Delay in seconds between each task submission when throttling is active. 3

Example usage: when processing 25 files with default settings:

  • Files are staggered: file 0 at 0 s, file 1 at 3 s, file 2 at 6 s, and so on.
  • Total queue time: (25 - 1) × 3 = 72 seconds.
  • Staggering prevents API rate limit issues and keeps processing smooth.
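The staggering arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name submission_countdowns is hypothetical and is not taken from the DocuElevate code base; it simply applies the threshold and delay rules described in the table.

```python
def submission_countdowns(file_count: int, threshold: int = 20, delay: int = 3) -> list[int]:
    """Return the delay in seconds before each file's task is submitted."""
    if file_count <= threshold:
        # At or below the threshold, every file is submitted immediately.
        return [0] * file_count
    # Above the threshold, file i is submitted after i * delay seconds.
    return [index * delay for index in range(file_count)]


countdowns = submission_countdowns(25)
print(countdowns[:3], countdowns[-1])  # [0, 3, 6] 72
```

The last value is the total queue time, which makes it easy to estimate how long a batch takes to enqueue before changing PROCESSALL_THROTTLE_DELAY.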

Task Retry Settings

Failed Celery tasks are automatically retried with exponential backoff and optional jitter. Different task types use different default delays (OCR tasks wait longer than upload tasks to account for API rate limits).

Variable Description Default
TASK_RETRY_MAX_RETRIES Maximum number of retry attempts for any failed task. 3
TASK_RETRY_DELAYS Comma-separated list of countdown values in seconds for each retry attempt. Values beyond the list double the last entry for subsequent retries. 60,300,900
TASK_RETRY_JITTER Apply ±20 % random jitter to countdowns to prevent thundering-herd problems when many tasks fail at the same time. true

Per-task-type policies (set in code; only the rows marked "Controlled by TASK_RETRY_DELAYS" follow the environment variable):

Task type Default delays (s) Notes
General tasks 60, 300, 900 Controlled by TASK_RETRY_DELAYS
OCR / AI tasks 120, 600, 1800 Longer waits for API rate-limit windows to clear
Cloud-storage uploads 60, 300, 900 Controlled by TASK_RETRY_DELAYS

Example – aggressive retries for a high-availability setup:

TASK_RETRY_MAX_RETRIES=5
TASK_RETRY_DELAYS=30,120,600,1800,3600
TASK_RETRY_JITTER=true

Example – conservative retries with longer back-off:

TASK_RETRY_MAX_RETRIES=3
TASK_RETRY_DELAYS=300,900,3600
TASK_RETRY_JITTER=true
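To see what a given TASK_RETRY_DELAYS value means in practice, the following sketch computes the countdown for each retry attempt. It is an assumption-laden illustration, not the application's implementation: the function names are hypothetical, attempts are counted from zero, and "double the last entry" is read as doubling once more for every attempt beyond the end of the list.

```python
import random
from typing import Optional


def parse_delays(raw: str) -> list[int]:
    """Parse a comma-separated TASK_RETRY_DELAYS value into seconds."""
    return [int(part) for part in raw.split(",") if part.strip()]


def retry_countdown(
    attempt: int,
    delays: list[int],
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Return the wait in seconds before retry number `attempt` (zero-based)."""
    if attempt < len(delays):
        base = delays[attempt]
    else:
        # Assumption: each attempt past the end of the list doubles again.
        extra_attempts = attempt - len(delays) + 1
        base = delays[-1] * 2 ** extra_attempts
    if not jitter:
        return float(base)
    # Apply +/-20 % random jitter so simultaneous failures do not retry in lockstep.
    rng = rng or random.Random()
    return base * rng.uniform(0.8, 1.2)


delays = parse_delays("60,300,900")
for attempt in range(5):
    print(attempt, retry_countdown(attempt, delays, jitter=False))
```

With jitter disabled the output is deterministic, which is convenient for checking a schedule before deploying it.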

Client-Side Upload Throttling

Control how the web UI queues and paces file uploads to avoid overwhelming the backend, especially when dragging large directories (potentially thousands of files) onto the upload area.

Variable Description Default
UPLOAD_CONCURRENCY Maximum number of files uploaded simultaneously from the browser. 3
UPLOAD_QUEUE_DELAY_MS Delay in milliseconds between starting each upload slot. Staggers upload starts to smooth out server load. 500

Adaptive back-off: The browser automatically slows down if the server responds with HTTP 429 (Too Many Requests). It reads the Retry-After header, pauses the queue for the indicated time, doubles the inter-slot delay (exponential back-off, capped at 30 s), and reduces concurrency to 1. After 5 consecutive successes it gradually recovers toward the configured values.
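The back-off behaviour can be modelled as a small state machine. The real client runs in the browser; the Python model below is only a sketch for reasoning about the rules. The class and method names are hypothetical, and the recovery step (halve the delay and add one upload slot after every 5 consecutive successes) is an assumption, since the guide only says that recovery is gradual.

```python
from dataclasses import dataclass, field

MAX_DELAY_MS = 30_000   # back-off cap of 30 s, as described above
RECOVERY_STREAK = 5     # consecutive successes needed before recovering


@dataclass
class UploadPacer:
    configured_concurrency: int = 3
    configured_delay_ms: int = 500
    concurrency: int = field(init=False)
    delay_ms: int = field(init=False)
    successes: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # Start at the configured values.
        self.concurrency = self.configured_concurrency
        self.delay_ms = self.configured_delay_ms

    def on_rate_limited(self, retry_after_s: float) -> float:
        """Handle an HTTP 429 and return how long to pause the queue."""
        self.delay_ms = min(self.delay_ms * 2, MAX_DELAY_MS)
        self.concurrency = 1
        self.successes = 0
        return retry_after_s

    def on_success(self) -> None:
        """Count a successful upload and recover one step after a full streak."""
        self.successes += 1
        if self.successes >= RECOVERY_STREAK:
            # Assumed recovery step: halve the delay and add one slot.
            self.delay_ms = max(self.delay_ms // 2, self.configured_delay_ms)
            self.concurrency = min(self.concurrency + 1, self.configured_concurrency)
            self.successes = 0


pacer = UploadPacer()
pause = pacer.on_rate_limited(retry_after_s=10)
print(pause, pacer.delay_ms, pacer.concurrency)  # 10 1000 1
```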

Example: With UPLOAD_CONCURRENCY=3 and UPLOAD_QUEUE_DELAY_MS=500, a directory of 5,000 files is uploaded ≈ 3 at a time with 500 ms pacing – the backend processes files at its own rate while the queue drains in the background without triggering API rate limits.

Per-User Upload Rate Limiting

Server-side rate limiting that prevents any single user from overwhelming the system with bulk uploads. The limiter uses a Redis-backed sliding window and dynamically adjusts limits based on system health.

Variable Description Default
UPLOAD_RATE_LIMIT_PER_USER Maximum uploads allowed per user within the sliding window. Effective limit may be reduced under load. 20
UPLOAD_RATE_LIMIT_WINDOW Sliding window size in seconds. 60

Health-aware dynamic limiting: The effective per-user limit is automatically reduced when the system is under heavy load:

System condition Effective limit Trigger
Normal 100 % of base Queue < 50, CPU load normal
Moderate load 50 % of base Queue 50–100 or CPU > 1.5×
High load 25 % of base Queue 100–200 or CPU > 2×
Critical load 10 % of base Queue > 200 or CPU > 3×

When a user exceeds the limit, the server returns HTTP 429 Too Many Requests with a Retry-After header. The browser client (see Client-Side Upload Throttling above) automatically pauses and retries.

Note: The limiter fails open. If Redis is unavailable, all uploads are allowed through, so a Redis outage never blocks document processing.
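The health table can be read as a simple lookup from system condition to effective limit. The sketch below is one possible reading under stated assumptions: the function and parameter names are hypothetical, cpu_load_ratio stands for the "CPU > N×" figure (the guide does not say what it is measured against), a queue length that sits exactly on a shared boundary (100) is treated as the more severe level, and the result is never allowed to fall below 1.

```python
def effective_upload_limit(base_limit: int, queue_length: int, cpu_load_ratio: float) -> int:
    """Scale the per-user limit down as the queue grows or CPU load rises."""
    if queue_length > 200 or cpu_load_ratio > 3.0:
        percent = 10   # critical load
    elif queue_length >= 100 or cpu_load_ratio > 2.0:
        percent = 25   # high load
    elif queue_length >= 50 or cpu_load_ratio > 1.5:
        percent = 50   # moderate load
    else:
        percent = 100  # normal
    # Assumption: always allow at least one upload per window.
    return max(1, base_limit * percent // 100)


for queue in (10, 60, 150, 250):
    print(queue, effective_upload_limit(20, queue, cpu_load_ratio=0.5))
```

With the default UPLOAD_RATE_LIMIT_PER_USER of 20, the four conditions give 20, 10, 5, and 2 uploads per window.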

File Upload Size Limits

Security Feature: Control file upload sizes to prevent resource exhaustion attacks. See SECURITY_AUDIT.md for security details.

Variable Description Default
MAX_UPLOAD_SIZE Maximum file upload size in bytes. Files exceeding this limit are rejected. 1073741824 (1GB)
MAX_SINGLE_FILE_SIZE Optional: Maximum size for a single file chunk in bytes. Files exceeding this are split into smaller parts. None (no splitting)
MAX_REQUEST_BODY_SIZE Maximum request body size in bytes for non-file-upload requests (JSON, form data, etc.). File uploads use MAX_UPLOAD_SIZE instead. 1048576 (1MB)

Configuration Examples:

# Default: Allow up to 1GB uploads, no splitting, 1MB JSON/form body limit
MAX_UPLOAD_SIZE=1073741824
MAX_REQUEST_BODY_SIZE=1048576

# Conservative: 100MB max, split files over 50MB
MAX_UPLOAD_SIZE=104857600
MAX_SINGLE_FILE_SIZE=52428800

# Large files: 2GB max, split files over 500MB
MAX_UPLOAD_SIZE=2147483648
MAX_SINGLE_FILE_SIZE=524288000
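The byte values in these examples are binary multiples (1 MB here means 1024 × 1024 bytes). A small helper avoids typing them by hand; the names MIB, GIB, and env_line are illustrative only.

```python
MIB = 1024 ** 2  # bytes in one binary megabyte
GIB = 1024 ** 3  # bytes in one binary gigabyte


def env_line(name: str, size_bytes: int) -> str:
    """Format a size setting as a line for the .env file."""
    return f"{name}={size_bytes}"


print(env_line("MAX_UPLOAD_SIZE", 100 * MIB))       # MAX_UPLOAD_SIZE=104857600
print(env_line("MAX_SINGLE_FILE_SIZE", 50 * MIB))   # MAX_SINGLE_FILE_SIZE=52428800
```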

File splitting behavior:

  • When MAX_SINGLE_FILE_SIZE is configured and a PDF exceeds this size, it is automatically split into smaller chunks.
  • Important: splitting is done at page boundaries, not by byte position.
  • pypdf is used to parse the PDF structure properly.
  • Each output file is a complete, valid PDF containing whole pages.
  • There is no risk of corrupted or broken PDF files.
  • Pages are distributed across output files to stay under the size limit.
  • Each chunk is processed sequentially as a separate task.
  • Splitting only works for PDF files (images and office documents are converted to PDF first).
  • The original file is removed after successful splitting.
  • Splitting is useful for very large PDFs, to prevent memory issues during processing.

Use cases:

  • Default (1GB, no splitting): suitable for most deployments handling typical documents.
  • With splitting: recommended for servers with limited memory or when processing very large scanned documents.
  • Higher limits: for environments specifically designed to handle large architectural plans, books, or scanned archives.

Watch Folder Ingestion

DocuElevate can automatically monitor directories for new files and ingest them without any manual action. This works for:

  • Local filesystem paths, including SMB/CIFS shares, NFS mounts, or any path accessible to the Docker container
  • FTP server directories, using the configured FTP connection credentials
  • SFTP server directories, using the configured SFTP connection credentials

Local Watch Folders

Mount the share or directory into the Docker container and configure one or more paths to watch.

Variable Description Default
WATCH_FOLDERS Comma-separated list of absolute local filesystem paths to poll for new files. (empty)
WATCH_FOLDER_POLL_INTERVAL How often to scan the folders, in minutes. 1
WATCH_FOLDER_DELETE_AFTER_PROCESS Delete source files from the watch folder after they are successfully enqueued. When false, processed files are tracked in a cache file to prevent re-ingestion. false

Example (docker-compose.yaml):

services:
  worker:
    volumes:
      - /mnt/smb/scanner:/watchfolders/scanner  # SMB/CIFS share mounted on the host
      - /mnt/nfs/inbox:/watchfolders/inbox       # NFS mount
    environment:
      WATCH_FOLDERS: /watchfolders/scanner,/watchfolders/inbox
      WATCH_FOLDER_POLL_INTERVAL: 1
      WATCH_FOLDER_DELETE_AFTER_PROCESS: false

Tip for HP Scanners and MFPs: Configure your scanner's "Scan to Network Folder" to point at an SMB share that is also mounted into the DocuElevate worker container. DocuElevate will pick up the scan files automatically every minute. No email forwarding is required.

FTP Ingest (Watch Folder)

DocuElevate can poll an FTP server directory for new files. It reuses the FTP connection settings already configured for uploads.

Variable Description Default
FTP_INGEST_ENABLED Enable FTP folder watching (true/false). false
FTP_INGEST_FOLDER Path on the FTP server to poll (e.g. /incoming). Uses the existing FTP connection settings. (empty)
FTP_INGEST_DELETE_AFTER_PROCESS Delete files from the FTP server after they are downloaded and enqueued. false

Example:

# Existing FTP upload settings (also used for ingest)
FTP_HOST=ftp.example.com
FTP_USERNAME=docuelevate
FTP_PASSWORD=secret

# FTP ingest configuration
FTP_INGEST_ENABLED=true
FTP_INGEST_FOLDER=/incoming
FTP_INGEST_DELETE_AFTER_PROCESS=false

SFTP Ingest (Watch Folder)

DocuElevate can poll an SFTP server directory for new files. It reuses the SFTP connection settings already configured for uploads.

Variable Description Default
SFTP_INGEST_ENABLED Enable SFTP folder watching (true/false). false
SFTP_INGEST_FOLDER Path on the SFTP server to poll (e.g. /uploads/inbox). Uses the existing SFTP connection settings. (empty)
SFTP_INGEST_DELETE_AFTER_PROCESS Delete files from the SFTP server after they are downloaded and enqueued. false

Example:

# Existing SFTP upload settings (also used for ingest)
SFTP_HOST=sftp.example.com
SFTP_USERNAME=docuelevate
SFTP_PRIVATE_KEY=/run/secrets/sftp_key

# SFTP ingest configuration
SFTP_INGEST_ENABLED=true
SFTP_INGEST_FOLDER=/uploads/inbox
SFTP_INGEST_DELETE_AFTER_PROCESS=false

Supported File Types for Watch Folders

Watch folder ingestion accepts the same file types as the web upload interface: PDF, Word, Excel, PowerPoint, images (JPEG, PNG, TIFF, BMP, GIF), plain text, CSV, RTF, and more. Unsupported files (executables, archives, etc.) are silently skipped.

Dropbox Ingest (Watch Folder)

DocuElevate can poll a Dropbox folder for new files. It reuses the Dropbox OAuth credentials already configured for uploads.

Variable Description Default
DROPBOX_INGEST_ENABLED Enable Dropbox folder watching (true/false). false
DROPBOX_INGEST_FOLDER Dropbox folder path to poll (e.g. /Inbox/Scanner). Uses the existing Dropbox OAuth credentials. (empty)
DROPBOX_INGEST_DELETE_AFTER_PROCESS Delete files from Dropbox after they are downloaded and enqueued. false

Google Drive Ingest (Watch Folder)

DocuElevate can poll a Google Drive folder for new files. It reuses the existing Google Drive service-account or OAuth credentials.

Variable Description Default
GOOGLE_DRIVE_INGEST_ENABLED Enable Google Drive folder watching (true/false). false
GOOGLE_DRIVE_INGEST_FOLDER_ID Google Drive folder ID to poll (copy from the URL of the target folder in Drive). Uses the existing Google Drive credentials. (empty)
GOOGLE_DRIVE_INGEST_DELETE_AFTER_PROCESS Delete files from Google Drive after they are downloaded and enqueued. false

OneDrive Ingest (Watch Folder)

DocuElevate can poll a OneDrive folder for new files. It reuses the existing OneDrive MSAL (client ID/secret/refresh token) credentials.

Variable Description Default
ONEDRIVE_INGEST_ENABLED Enable OneDrive folder watching (true/false). false
ONEDRIVE_INGEST_FOLDER_PATH OneDrive folder path to poll (e.g. /Inbox/Scanner). Uses the existing OneDrive credentials. (empty)
ONEDRIVE_INGEST_DELETE_AFTER_PROCESS Delete files from OneDrive after they are downloaded and enqueued. false

Nextcloud Ingest (Watch Folder)

DocuElevate can poll a Nextcloud folder via WebDAV for new files. It reuses the existing Nextcloud upload URL and credentials.

Variable Description Default
NEXTCLOUD_INGEST_ENABLED Enable Nextcloud folder watching (true/false). false
NEXTCLOUD_INGEST_FOLDER Nextcloud folder path to poll (e.g. /Scans/Inbox). Uses the existing Nextcloud upload URL and credentials. (empty)
NEXTCLOUD_INGEST_DELETE_AFTER_PROCESS Delete files from Nextcloud after they are downloaded and enqueued. false

Amazon S3 Ingest (Watch Folder)

DocuElevate can poll an S3 bucket prefix for new objects. It reuses the existing S3/AWS credentials and bucket name.

Variable Description Default
S3_INGEST_ENABLED Enable S3 prefix watching (true/false). false
S3_INGEST_PREFIX S3 key prefix to poll (e.g. inbox/scanner/). Uses the existing S3 bucket and AWS credentials. (empty)
S3_INGEST_DELETE_AFTER_PROCESS Delete objects from S3 after they are downloaded and enqueued. false

WebDAV Ingest (Watch Folder)

DocuElevate can poll a WebDAV folder for new files. It reuses the existing WebDAV URL and credentials.

Variable Description Default
WEBDAV_INGEST_ENABLED Enable WebDAV folder watching (true/false). false
WEBDAV_INGEST_FOLDER WebDAV folder path to poll. Uses the existing WebDAV URL and credentials. (empty)
WEBDAV_INGEST_DELETE_AFTER_PROCESS Delete files from WebDAV after they are downloaded and enqueued. false

Per-User Watch Folder Integrations

In addition to system-level watch folders, each user can configure personal watch folder sources through the Integrations dashboard (/integrations). Documents ingested from per-user watch folder integrations are automatically attributed to the owning user's owner_id.

Per-user watch folder integrations are stored in the user_integrations table with integration_type='WATCH_FOLDER' and direction='SOURCE'. The config JSON field stores:

  • source_type: the type of source to scan (local, s3, dropbox, google_drive, onedrive, nextcloud, webdav; default: local)
  • folder_path: path to the directory/folder to scan (used by local, Dropbox, OneDrive, Nextcloud, WebDAV)
  • delete_after_process: whether to remove source files after ingestion (default: false)

Additional type-specific config fields:

  • S3: bucket, region, prefix, endpoint_url
  • Google Drive: folder_id
  • Nextcloud / WebDAV: url, folder_path

Security: Path traversal protection is enforced on local watch folder paths. Relative paths, .. components, and symlink escapes are rejected. Cloud source types use per-user encrypted credentials instead.
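The three rejection rules (relative paths, .. components, symlink escapes) can be illustrated with pathlib. This is a minimal sketch rather than the application's actual check: the function name is hypothetical, and the idea of a single allowed_root directory that every watch path must stay inside is an assumption made for the example.

```python
from pathlib import Path


def is_safe_watch_path(candidate: str, allowed_root: str) -> bool:
    """Return True only if candidate is an absolute path that stays inside allowed_root."""
    path = Path(candidate)
    if not path.is_absolute():
        return False  # relative paths are rejected
    if ".." in path.parts:
        return False  # explicit parent-directory components are rejected
    # resolve() follows symlinks, so a link that points outside the root is caught here.
    resolved = path.resolve()
    root = Path(allowed_root).resolve()
    return resolved == root or root in resolved.parents


print(is_safe_watch_path("scanner", "/watchfolders"))                    # False
print(is_safe_watch_path("/watchfolders/../etc", "/watchfolders"))       # False
```

Resolving both sides before comparing is what makes the symlink case work: the comparison is done on the real location, not on the spelling of the path.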

  • Individual scan failures are handled gracefully and recorded on the integration's last_error field without interrupting the scanning of other integrations.
  • The scan runs alongside the system-level watch folder polling cycle.

IMAP Email Ingestion

DocuElevate can automatically pull document attachments from IMAP mailboxes — no need to forward emails manually. Configure one or two system-wide mailboxes using environment variables, and/or let each user configure their own IMAP sources via the Integrations dashboard.

For HP Scanners (Scan to Email): If your scanner is set up to email scanned documents to a dedicated mailbox, configure that mailbox in DocuElevate using the settings below. DocuElevate will automatically retrieve the scanned PDFs from the inbox and process them. You do not need to configure DocuElevate as an email server — it acts as an email client that reads from your existing mailbox.

System-Level IMAP Configuration

Variable Description Example
IMAP1_HOST Hostname for first IMAP server. mail.example.com
IMAP1_PORT Port number (usually 993). 993
IMAP1_USERNAME IMAP login (first mailbox). user@example.com
IMAP1_PASSWORD IMAP password (first mailbox). *******
IMAP1_SSL Use SSL (true/false). true
IMAP1_POLL_INTERVAL_MINUTES Frequency in minutes to poll for new mail. 5
IMAP_READONLY_MODE When true, fetches and processes attachments but does not modify the mailbox (no starring, labeling, deleting, or flag changes). Use for pre-production instances sharing a mailbox with production. Default: false. false
IMAP_ATTACHMENT_FILTER System-wide fallback for which attachment types are ingested when no ingestion profile is assigned to a mailbox. documents_only (default) ingests PDFs and office files only — images are skipped. all ingests every supported file type including images. Individual IMAP accounts can override this using ingestion profiles. documents_only

IMAP Ingestion Profiles

For fine-grained control, DocuElevate supports Ingestion Profiles — named configurations that let you choose exactly which file-type categories to accept from each mailbox.

Each profile contains a list of enabled categories:

Category Description
pdf PDF documents (.pdf)
office Microsoft Office files (Word, Excel, PowerPoint — .docx, .xlsx, .pptx, …)
opendocument LibreOffice/OpenOffice files (.odt, .ods, .odp, …)
text Plain text, CSV and RTF files (.txt, .csv, .rtf)
web HTML and Markdown files (.html, .htm, .md, .markdown)
images Image files (.jpg, .png, .gif, .bmp, .tiff, .webp, .svg)

Two built-in system profiles are seeded automatically:

Profile Categories
Documents Only pdf, office, opendocument, text, web (no images)
All Files All categories, including images
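A profile is essentially a list of enabled categories, and each category is a set of file extensions. The sketch below shows how an attachment name could be checked against a profile. It is illustrative only: the names are hypothetical, and the extension sets contain just the extensions spelled out in the category table, which is not exhaustive for the office and opendocument rows.

```python
from pathlib import PurePosixPath

# Only the extensions listed explicitly in the category table (partial lists).
CATEGORY_EXTENSIONS = {
    "pdf": {".pdf"},
    "office": {".docx", ".xlsx", ".pptx"},
    "opendocument": {".odt", ".ods", ".odp"},
    "text": {".txt", ".csv", ".rtf"},
    "web": {".html", ".htm", ".md", ".markdown"},
    "images": {".jpg", ".png", ".gif", ".bmp", ".tiff", ".webp", ".svg"},
}

# The two built-in profiles, expressed as lists of enabled categories.
DOCUMENTS_ONLY = ["pdf", "office", "opendocument", "text", "web"]
ALL_FILES = list(CATEGORY_EXTENSIONS)


def accepts(filename: str, enabled_categories: list[str]) -> bool:
    """Return True if the file extension belongs to an enabled category."""
    extension = PurePosixPath(filename).suffix.lower()
    return any(extension in CATEGORY_EXTENSIONS[name] for name in enabled_categories)


print(accepts("invoice.PDF", DOCUMENTS_ONLY))  # True
print(accepts("photo.png", DOCUMENTS_ONLY))    # False
print(accepts("photo.png", ALL_FILES))         # True
```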

Users can create their own custom profiles via the Email Ingestion dashboard (/imap-accounts) by clicking the Manage profiles link or the + button next to the profile dropdown. Custom profiles are private to the creating user and can be freely edited or deleted.

API endpoints for ingestion profiles:

Method Endpoint Description
GET /api/imap-profiles/ List all visible profiles (system + user's own)
POST /api/imap-profiles/ Create a new profile
GET /api/imap-profiles/categories List available file-type categories
GET /api/imap-profiles/{id} Get a single profile
PUT /api/imap-profiles/{id} Update a profile (not built-in)
DELETE /api/imap-profiles/{id} Delete a profile (not built-in)

Per-User IMAP Integrations

In addition to system-level mailboxes, each user can configure personal IMAP sources through the Integrations dashboard (/integrations). Documents ingested from per-user IMAP integrations are automatically attributed to the owning user's owner_id.

Per-user IMAP integrations are stored in the user_integrations table with integration_type='IMAP' and direction='SOURCE'. The config JSON field stores:

  • host: IMAP server hostname (required)
  • port: IMAP server port (default: 993)
  • username: IMAP login username (required)
  • use_ssl: whether to use SSL/TLS (default: true)
  • delete_after_process: whether to delete emails from the mailbox after processing (default: false)
  • gmail_apply_labels: whether to apply Gmail-specific labels and stars to processed emails (default: true). When enabled, processed emails are starred and tagged with an "Ingested" label. Only applies to Gmail hosts.

Credentials are encrypted at rest using Fernet encryption.

  • Individual connection failures are handled gracefully and recorded on the integration's last_error field without interrupting the polling of other integrations.
  • The polling loop runs every minute and processes all active IMAP sources (system-level and per-user) in sequence.

Authentication

Variable Description
AUTH_ENABLED Enable or disable authentication (true/false).
SESSION_SECRET Secret key used to encrypt sessions and cookies (at least 32 chars).
SESSION_LIFETIME_DAYS Number of days before a server-side session expires. Default: 30.
SESSION_LIFETIME_CUSTOM_DAYS Override for SESSION_LIFETIME_DAYS when set.
QR_LOGIN_CHALLENGE_TTL_SECONDS How long a QR login challenge is valid (seconds). Default: 120.
ADMIN_USERNAME Username for basic authentication (when not using OIDC).
ADMIN_PASSWORD Password for basic authentication (when not using OIDC).
ADMIN_GROUP_NAME Group name in OIDC claims that grants admin access. Default: admin.
AUTHENTIK_CLIENT_ID Client ID for Authentik OAuth2/OIDC authentication.
AUTHENTIK_CLIENT_SECRET Client secret for Authentik OAuth2/OIDC authentication.
AUTHENTIK_CONFIG_URL Configuration URL for Authentik OpenID Connect.
OAUTH_PROVIDER_NAME Display name for the OAuth provider button.

Social Login Providers

Social login lets users sign in with their existing Google, Microsoft, Apple, Dropbox, or GitHub accounts. Each provider is independently enabled and configured. For detailed setup instructions see the Social Login Setup Guide.

Variable Description Default
SOCIAL_AUTH_GOOGLE_ENABLED Enable Google Sign-In. false
SOCIAL_AUTH_GOOGLE_CLIENT_ID Google OAuth2 client ID from the Google Cloud Console. (empty)
SOCIAL_AUTH_GOOGLE_CLIENT_SECRET Google OAuth2 client secret. (empty)
SOCIAL_AUTH_MICROSOFT_ENABLED Enable Microsoft Sign-In (Azure AD / Microsoft Entra ID). false
SOCIAL_AUTH_MICROSOFT_CLIENT_ID Microsoft application (client) ID from Azure App Registrations. (empty)
SOCIAL_AUTH_MICROSOFT_CLIENT_SECRET Microsoft client secret. (empty)
SOCIAL_AUTH_MICROSOFT_TENANT Azure AD tenant: common, organizations, consumers, or a tenant GUID. common
SOCIAL_AUTH_APPLE_ENABLED Enable Sign in with Apple. false
SOCIAL_AUTH_APPLE_CLIENT_ID Apple Services ID (e.g. com.example.docuelevate). (empty)
SOCIAL_AUTH_APPLE_TEAM_ID Apple Developer Team ID. (empty)
SOCIAL_AUTH_APPLE_KEY_ID Apple Sign-In private key ID. (empty)
SOCIAL_AUTH_APPLE_PRIVATE_KEY Apple Sign-In private key (PEM format). (empty)
SOCIAL_AUTH_DROPBOX_ENABLED Enable Dropbox Sign-In. false
SOCIAL_AUTH_DROPBOX_CLIENT_ID Dropbox OAuth2 App Key. (empty)
SOCIAL_AUTH_DROPBOX_CLIENT_SECRET Dropbox OAuth2 App Secret. (empty)
SOCIAL_AUTH_GITHUB_ENABLED Enable GitHub Sign-In. false
SOCIAL_AUTH_GITHUB_CLIENT_ID GitHub OAuth2 client ID from GitHub Developer Settings. (empty)
SOCIAL_AUTH_GITHUB_CLIENT_SECRET GitHub OAuth2 client secret. (empty)
SSO_AUTO_LOGIN Automatically redirect to SSO login when authentication is required. false

SSO Providers

Variable Description Default
SOCIAL_AUTH_KEYCLOAK_ENABLED Enable Keycloak SSO. false
SOCIAL_AUTH_KEYCLOAK_CLIENT_ID Keycloak OAuth2 client ID. (empty)
SOCIAL_AUTH_KEYCLOAK_CLIENT_SECRET Keycloak OAuth2 client secret. (empty)
SOCIAL_AUTH_KEYCLOAK_SERVER_URL Keycloak server base URL (e.g. https://keycloak.example.com). (empty)
SOCIAL_AUTH_KEYCLOAK_REALM Keycloak realm name. (empty)
SOCIAL_AUTH_GENERIC_OAUTH2_ENABLED Enable a generic OAuth2 SSO provider. false
SOCIAL_AUTH_GENERIC_OAUTH2_CLIENT_ID Generic OAuth2 client ID. (empty)
SOCIAL_AUTH_GENERIC_OAUTH2_CLIENT_SECRET Generic OAuth2 client secret. (empty)
SOCIAL_AUTH_GENERIC_OAUTH2_AUTHORIZE_URL Generic OAuth2 authorization URL. (empty)
SOCIAL_AUTH_GENERIC_OAUTH2_TOKEN_URL Generic OAuth2 token endpoint URL. (empty)
SOCIAL_AUTH_GENERIC_OAUTH2_USERINFO_URL Generic OAuth2 userinfo endpoint URL. (empty)
SOCIAL_AUTH_GENERIC_OAUTH2_SCOPE Space-separated list of OAuth2 scopes. openid profile email
SOCIAL_AUTH_GENERIC_OAUTH2_NAME Display name for the provider button. OAuth2
SOCIAL_AUTH_SAML2_ENABLED Enable SAML2 SSO authentication. false
SOCIAL_AUTH_SAML2_ENTITY_ID SAML2 Identity Provider Entity ID. (empty)
SOCIAL_AUTH_SAML2_SSO_URL SAML2 Identity Provider SSO URL. (empty)
SOCIAL_AUTH_SAML2_CERTIFICATE SAML2 Identity Provider X.509 certificate (PEM format). (empty)
SOCIAL_AUTH_SAML2_NAME Display name for the SAML2 provider. SAML2

Multi-User Mode

When multi-user mode is enabled, each authenticated user gets their own isolated document space. Uploads, search results, and file management are scoped to the individual user. Shared settings (AI configuration, OCR providers, storage destinations) remain global.

Admin users (determined by ADMIN_GROUP_NAME) bypass the user filter and can see all documents.

Requires AUTH_ENABLED=true.

Variable Description Default
MULTI_USER_ENABLED Enable multi-user mode with individual document spaces per user. false
DEFAULT_DAILY_UPLOAD_LIMIT Maximum document uploads allowed per user per day. 0 = unlimited. 0
UNOWNED_DOCS_VISIBLE_TO_ALL Show unclaimed documents (no owner) to all users. When false, only admins see them. true
DEFAULT_OWNER_ID Automatically assign this owner to newly ingested documents without a session (e.g. IMAP, API). Leave empty to keep unowned. (empty)

Unclaimed Documents

Documents ingested via system-level sources (environment variable IMAP mailboxes, system watch folders) without a user session have owner_id = NULL unless DEFAULT_OWNER_ID is set. These are called unclaimed documents.

Documents ingested via per-user integrations (IMAP or Watch Folder integrations configured through the Integrations dashboard) are automatically attributed to the owning user's owner_id and are never unclaimed.

  • When UNOWNED_DOCS_VISIBLE_TO_ALL=true (default), every authenticated user sees unclaimed documents alongside their own files. This allows users to discover and claim them.
  • When UNOWNED_DOCS_VISIBLE_TO_ALL=false, only admins can see unclaimed documents.
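The visibility rules for owned and unclaimed documents can be summarised as a single predicate. The sketch below is an illustration of the rules described in this section, not the application's query logic; the function and parameter names are hypothetical, and it assumes that every document is visible to everyone when multi-user mode is off.

```python
from typing import Optional


def can_view(
    doc_owner_id: Optional[str],
    user_id: str,
    is_admin: bool,
    multi_user_enabled: bool = True,
    unowned_visible_to_all: bool = True,
) -> bool:
    """Decide whether a user may see a document."""
    if not multi_user_enabled or is_admin:
        # Assumption: no filtering without multi-user mode; admins bypass the user filter.
        return True
    if doc_owner_id is None:
        # Unclaimed document: governed by UNOWNED_DOCS_VISIBLE_TO_ALL.
        return unowned_visible_to_all
    return doc_owner_id == user_id


print(can_view(None, "alice", is_admin=False))                                # True
print(can_view(None, "alice", is_admin=False, unowned_visible_to_all=False))  # False
print(can_view("bob", "alice", is_admin=False))                               # False
```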

Claiming Documents

Users can claim unclaimed documents via the API:

  • POST /api/files/{file_id}/claim — Claim a single unclaimed document.
  • POST /api/files/bulk-claim — Claim multiple unclaimed documents at once.

Only documents with owner_id = NULL can be claimed. Already-owned documents cannot be claimed by another user.

Admin Owner Assignment

Admins can assign ownership of documents to any user:

  • POST /api/files/assign-owner?owner_id=<user_id> — Assign all unclaimed documents to the specified user, or pass a file_ids JSON body to assign specific files.

The DEFAULT_OWNER_ID setting can also be configured via the Settings page, which provides an autocomplete field that searches existing users by substring.

Subscriptions & Upload Quotas

DocuElevate supports configurable subscription plans with per-user upload quotas enforced at upload time. Plans are managed via the Plan Designer at /admin/plans. The following global setting controls the default overage buffer applied across all plans.

Variable Description Default
SUBSCRIPTION_OVERAGE_PERCENT Soft-limit overage buffer in percent (0–200). The announced monthly quota is multiplied by (1 + percent/100) for actual enforcement. E.g. 20 means a 150-doc/month plan enforces at 180 docs (150 × 1.20). Set 0 to enforce exactly at the announced limit. Per-plan overage_percent configured in the Plan Designer overrides this global default. 20
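The enforcement arithmetic is easy to check by hand or in code. The following sketch uses hypothetical function names; integer arithmetic with rounding down is an assumption, since the guide does not say how fractional results are rounded.

```python
from typing import Optional


def resolve_overage_percent(plan_percent: Optional[int], global_percent: int = 20) -> int:
    """A per-plan overage_percent, when set, overrides the global default."""
    return global_percent if plan_percent is None else plan_percent


def enforced_quota(announced_quota: int, overage_percent: int = 20) -> int:
    """Return the document count at which uploads are actually blocked."""
    if not 0 <= overage_percent <= 200:
        raise ValueError("overage percent must be between 0 and 200")
    # Assumption: fractional results are rounded down.
    return announced_quota * (100 + overage_percent) // 100


print(enforced_quota(150, resolve_overage_percent(None)))  # 180
print(enforced_quota(150, resolve_overage_percent(0)))     # 150
```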

Security Headers

DocuElevate supports HTTP security headers to improve browser-side security. These headers are disabled by default since most deployments use a reverse proxy (Traefik, Nginx, etc.) that already adds them. Enable only if deploying directly without a reverse proxy. See Deployment Guide - Security Headers for detailed configuration examples.

Application Logging

DocuElevate uses Python's standard logging module. Two environment variables control log verbosity:

Variable Description Default
LOG_LEVEL Root logger level. Accepts standard Python level names: DEBUG, INFO, WARNING, ERROR, CRITICAL. INFO
DEBUG Enable debug mode. When true and LOG_LEVEL is not explicitly set, the effective log level is automatically lowered to DEBUG. false

Precedence rules (standard behaviour):

  1. If LOG_LEVEL is explicitly set, it always wins — regardless of DEBUG.
  2. If only DEBUG=true is set (no LOG_LEVEL), the effective level becomes DEBUG.
  3. If neither is set, the default level is INFO.

Example:

# Typical production (default)
# LOG_LEVEL=INFO

# Quick debug mode — sets level to DEBUG automatically
DEBUG=true

# Explicit level override (DEBUG flag is ignored for level selection)
LOG_LEVEL=WARNING

Tip: At DEBUG level, noisy third-party libraries (httpx, authlib, urllib3, etc.) are automatically pinned to WARNING so that application debug output remains readable.
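The precedence rules can be expressed as a short function over the environment. This is a sketch of the rules as documented, not the application's code: the function name is hypothetical, only the literal value true is treated as enabling DEBUG, and raising ValueError for an unknown level name is an assumption.

```python
import logging
from typing import Mapping

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def effective_log_level(env: Mapping[str, str]) -> int:
    """Apply the LOG_LEVEL / DEBUG precedence rules and return a logging level."""
    explicit = env.get("LOG_LEVEL", "").strip().upper()
    if explicit:
        # Rule 1: an explicit LOG_LEVEL always wins.
        if explicit not in LEVELS:
            raise ValueError(f"unknown log level: {explicit}")
        return LEVELS[explicit]
    if env.get("DEBUG", "").strip().lower() == "true":
        # Rule 2: DEBUG=true without LOG_LEVEL lowers the level to DEBUG.
        return logging.DEBUG
    # Rule 3: default.
    return logging.INFO


print(logging.getLevelName(effective_log_level({"DEBUG": "true"})))                         # DEBUG
print(logging.getLevelName(effective_log_level({"DEBUG": "true", "LOG_LEVEL": "WARNING"})))  # WARNING
```

Passing os.environ instead of a literal dictionary applies the same rules to the running process.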

Structured JSON Logging

Set LOG_FORMAT=json to emit structured JSON lines on stdout — one JSON object per log message. This is the standard format for log collectors and SIEM tools:

Variable Description Default
LOG_FORMAT Log output format: text (human-readable) or json (structured JSON lines). text

Each JSON log line contains: timestamp (ISO 8601), level, logger, message, module, funcName, lineno, and exc_info (when an exception is logged).

# Enable JSON logging for SIEM / log aggregation
LOG_FORMAT=json

Example JSON output:

{"timestamp": "2025-03-16T09:18:05.192000+00:00", "level": "INFO", "logger": "app.auth", "message": "[SECURITY] OAUTH_LOGIN_SUCCESS user=alice@example.com admin=False", "module": "auth", "funcName": "oauth_callback", "lineno": 654}

Compatible with:

  • Grafana Loki: Promtail scrapes JSON from Docker stdout
  • Splunk: Universal Forwarder or HEC with JSON sourcetype
  • ELK / OpenSearch: Filebeat with JSON codec
  • Datadog: Agent auto-parses JSON logs
  • Fluentd / Vector: JSON input plugin
  • Docker log drivers: --log-driver=json-file (default) preserves structure
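For readers who want to see how such a line is produced, the sketch below builds a formatter on Python's standard logging module that emits the fields listed in this section. It is an illustration only: the class name JsonLineFormatter is hypothetical and DocuElevate's own formatter may differ in detail.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 timestamp in UTC, built from the record's creation time.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "funcName": record.funcName,
            "lineno": record.lineno,
        }
        if record.exc_info:
            # Included only when an exception is being logged.
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonLineFormatter())
demo_logger = logging.getLogger("demo.json")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.info("upload finished for %s", "report.pdf")
```

Because every line is a complete JSON object, a collector can parse the stream line by line without multi-line handling.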

Syslog Forwarding (Application Logs)

For traditional (non-container) deployments, application logs can be forwarded directly to a syslog receiver. This is separate from audit-log SIEM forwarding (see below) — it sends every Python log message, not just audit events.

Variable Description Default
LOG_SYSLOG_ENABLED Forward application logs to a syslog receiver in addition to stdout. false
LOG_SYSLOG_HOST Hostname or IP of the syslog receiver. localhost
LOG_SYSLOG_PORT Port of the syslog receiver. 514
LOG_SYSLOG_PROTOCOL Protocol: udp or tcp. udp

Example:

# Forward all application logs to syslog
LOG_SYSLOG_ENABLED=true
LOG_SYSLOG_HOST=syslog.internal.example.com
LOG_SYSLOG_PORT=514
LOG_SYSLOG_PROTOCOL=udp

# Combine with JSON format for structured syslog messages
LOG_FORMAT=json
LOG_SYSLOG_ENABLED=true

Note: When LOG_FORMAT=json, syslog messages are also sent as JSON. When LOG_FORMAT=text, syslog messages use the standard name - level - message format.

Audit Logging

DocuElevate provides comprehensive audit logging that records significant actions (logins, document CRUD, settings changes) to an append-only database table. Every entry captures the timestamp, user, action, resource, client IP, and optional JSON details.

Variable Description Default
AUDIT_LOGGING_ENABLED Enable the HTTP request audit-logging middleware. true
AUDIT_LOG_INCLUDE_CLIENT_IP Include the client IP address in audit log entries. Disable for GDPR-sensitive deployments. true

SIEM Integration

Audit events can be forwarded in real time to external SIEM systems for centralised monitoring, alerting, and long-term retention. Two transports are supported:

  • Syslog – RFC 5424 structured-data messages over UDP or TCP. Works with rsyslog, syslog-ng, Graylog, Datadog, etc.
  • HTTP – JSON POST payloads compatible with Splunk HEC, Logstash HTTP input, Grafana Loki push API, and any generic webhook.

Variable Description Default
AUDIT_SIEM_ENABLED Enable forwarding of audit events to an external SIEM system. false
AUDIT_SIEM_TRANSPORT Transport: syslog or http. syslog
AUDIT_SIEM_SYSLOG_HOST Hostname or IP of the syslog receiver. localhost
AUDIT_SIEM_SYSLOG_PORT Port of the syslog receiver. 514
AUDIT_SIEM_SYSLOG_PROTOCOL Protocol for syslog: udp or tcp. udp
AUDIT_SIEM_HTTP_URL HTTP endpoint URL for SIEM delivery (e.g. Splunk HEC, Logstash, Loki). (empty)
AUDIT_SIEM_HTTP_TOKEN Bearer / HEC token for the SIEM HTTP endpoint. (empty)
AUDIT_SIEM_HTTP_CUSTOM_HEADERS Comma-separated Key:Value extra headers for SIEM HTTP requests. (empty)

Example – Syslog to rsyslog:

AUDIT_SIEM_ENABLED=true
AUDIT_SIEM_TRANSPORT=syslog
AUDIT_SIEM_SYSLOG_HOST=syslog.internal.example.com
AUDIT_SIEM_SYSLOG_PORT=514
AUDIT_SIEM_SYSLOG_PROTOCOL=udp

Example – Splunk HEC:

AUDIT_SIEM_ENABLED=true
AUDIT_SIEM_TRANSPORT=http
AUDIT_SIEM_HTTP_URL=https://splunk.example.com:8088/services/collector/event
AUDIT_SIEM_HTTP_TOKEN=your-hec-token

Example – Logstash HTTP input:

AUDIT_SIEM_ENABLED=true
AUDIT_SIEM_TRANSPORT=http
AUDIT_SIEM_HTTP_URL=https://logstash.example.com:8080
AUDIT_SIEM_HTTP_TOKEN=
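
The comma-separated Key:Value format of AUDIT_SIEM_HTTP_CUSTOM_HEADERS can be read as in the following sketch. The helper name build_siem_headers is hypothetical, and the use of the Bearer scheme for the token is an assumption; check what your SIEM endpoint expects.

```python
def build_siem_headers(token, custom_headers):
    """Combine AUDIT_SIEM_HTTP_TOKEN and AUDIT_SIEM_HTTP_CUSTOM_HEADERS into a dict.

    Hypothetical helper for illustration; not DocuElevate's actual code.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the token is sent with the Bearer scheme.
        headers["Authorization"] = f"Bearer {token}"
    for item in custom_headers.split(","):
        if ":" not in item:
            continue  # skip empty or malformed entries
        # Split on the first colon only, so values such as URLs stay intact.
        key, value = item.split(":", 1)
        if key.strip():
            headers[key.strip()] = value.strip()
    return headers
```

Because entries are separated by commas, a header value that itself contains a comma cannot be expressed in this format.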

Rate Limiting

DocuElevate implements rate limiting to protect against DoS attacks and API abuse. Rate limiting is enabled by default and uses Redis for distributed rate limiting across multiple workers.

Master Control

Variable Description Default
RATE_LIMITING_ENABLED Enable/disable rate limiting middleware. Recommended for production. true

Rate Limit Configuration

Rate limits are specified in the format count/period, where:

  • count is the maximum number of requests allowed
  • period is one of: second, minute, hour, day
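
As a quick way to sanity-check a value before putting it in .env, the count/period format can be parsed as sketched below. The function parse_rate_limit is a hypothetical helper written for this guide; it only mirrors the format described above.

```python
PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(value):
    """Parse 'count/period' into (count, period_in_seconds)."""
    count_text, separator, period = value.strip().partition("/")
    if not separator or period not in PERIOD_SECONDS:
        raise ValueError(f"expected count/period, got {value!r}")
    count = int(count_text)  # raises ValueError for non-numeric counts
    if count <= 0:
        raise ValueError("count must be a positive integer")
    return count, PERIOD_SECONDS[period]
```

For example, parse_rate_limit("100/minute") yields 100 requests per 60-second window.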

Variable Description Default Applies To
RATE_LIMIT_DEFAULT Default rate limit for all API endpoints 100/minute Most API endpoints
RATE_LIMIT_UPLOAD Rate limit for file upload endpoints (prevents resource exhaustion) 600/minute /api/ui-upload and similar
RATE_LIMIT_AUTH Stricter rate limit for authentication (prevents brute force) 10/minute Login, authentication endpoints

Note: Processing endpoints (OCR, metadata extraction) use built-in queue throttling via Celery to control processing rates and prevent upstream API overloads. No additional API-level rate limit is configured for processing endpoints.

How Rate Limiting Works

  1. Per-User Tracking: For authenticated requests, limits are enforced per user ID.
  2. Per-IP Tracking: For unauthenticated requests, limits are enforced per IP address.
  3. 429 Response: When a limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header.
  4. Redis Backend: Redis is used for distributed rate limiting (required for multi-worker deployments).
  5. In-Memory Fallback: If Redis is unavailable, rate limiting falls back to in-memory storage (not recommended for production).
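
On the client side, the 429 response and its Retry-After header can be honoured with a small retry loop. The sketch below is one possible approach, not part of DocuElevate: send is a caller-supplied function (an assumption) that performs the request and returns a (status, headers, body) tuple.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=1.0):
    """Convert a Retry-After header (seconds or HTTP-date) into a wait in seconds."""
    if not header_value:
        return default
    try:
        return max(0.0, float(header_value))
    except ValueError:
        pass  # not a number; try the HTTP-date form next
    try:
        when = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(retry_after_seconds(headers.get("Retry-After")))
```

Passing sleep as a parameter keeps the loop easy to test without real delays.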

Configuration Example

# Enable rate limiting (recommended for production)
RATE_LIMITING_ENABLED=true

# Configure Redis for distributed rate limiting
REDIS_URL=redis://redis:6379/0

# Customize rate limits
RATE_LIMIT_DEFAULT=100/minute     # 100 requests per minute per user/IP
RATE_LIMIT_UPLOAD=600/minute      # 600 uploads per minute
RATE_LIMIT_AUTH=10/minute         # 10 auth attempts per minute (brute force protection)

Small Deployment (1-10 users):

RATE_LIMIT_DEFAULT=200/minute
RATE_LIMIT_UPLOAD=1200/minute
RATE_LIMIT_AUTH=20/minute

Medium Deployment (10-100 users):

RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_UPLOAD=600/minute
RATE_LIMIT_AUTH=10/minute

Large Deployment (100+ users):

RATE_LIMIT_DEFAULT=50/minute
RATE_LIMIT_UPLOAD=300/minute
RATE_LIMIT_AUTH=5/minute

Disabling Rate Limiting (Development Only)

For development or testing, you can disable rate limiting:

RATE_LIMITING_ENABLED=false

Warning: Do not disable rate limiting in production environments.

Monitoring Rate Limits

When rate limits are exceeded, check application logs for details:

2024-02-10 16:00:00 - Rate limiting by user: testuser
2024-02-10 16:00:01 - Rate limit exceeded: 100 per 1 minute

For more information on handling rate-limited responses in API clients, see API Documentation - Rate Limiting.


Security Headers Configuration

DocuElevate supports HTTP security headers to improve browser-side security. These headers are disabled by default since most deployments use a reverse proxy (Traefik, Nginx, etc.) that already adds them. Enable only if deploying directly without a reverse proxy. See Deployment Guide - Security Headers for detailed configuration examples.

Master Control

Variable Description Default
SECURITY_HEADERS_ENABLED Enable/disable security headers middleware. Set to true if deploying without reverse proxy. false

Strict-Transport-Security (HSTS)

Forces browsers to use HTTPS for all future requests to this domain. Only effective over HTTPS.

Variable Description Default
SECURITY_HEADER_HSTS_ENABLED Enable HSTS header. true
SECURITY_HEADER_HSTS_VALUE HSTS header value (max-age in seconds, subdomain support). max-age=31536000; includeSubDomains

Common Values:

  • max-age=31536000; includeSubDomains (1 year, recommended for production)
  • max-age=300 (5 minutes, for testing)
  • max-age=63072000; includeSubDomains; preload (2 years with HSTS preload)

Content-Security-Policy (CSP)

Controls which resources browsers are allowed to load. Helps prevent XSS attacks and code injection.

Variable Description Default
SECURITY_HEADER_CSP_ENABLED Enable CSP header. true
SECURITY_HEADER_CSP_VALUE CSP policy directives. See below

Default Policy:

default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; font-src 'self' data:;

Common Customizations:

# Stricter CSP (remove 'unsafe-inline', use nonces)
SECURITY_HEADER_CSP_VALUE="default-src 'self'; script-src 'self'; style-src 'self';"

# Allow specific external domains
SECURITY_HEADER_CSP_VALUE="default-src 'self'; script-src 'self' https://cdn.example.com; style-src 'self' 'unsafe-inline';"

Note: The default policy includes 'unsafe-inline' for compatibility with inline JavaScript. Tailwind CSS v3 is compiled at build time into a static file served from 'self', so no external style CDN is needed.

X-Frame-Options

Prevents the page from being loaded in frames/iframes. Protects against clickjacking attacks.

Variable Description Default
SECURITY_HEADER_X_FRAME_OPTIONS_ENABLED Enable X-Frame-Options header. true
SECURITY_HEADER_X_FRAME_OPTIONS_VALUE X-Frame-Options header value. DENY

Valid Values:

  • DENY - Page cannot be displayed in a frame (most secure)
  • SAMEORIGIN - Page can only be displayed in a frame on the same origin
  • ALLOW-FROM uri (deprecated) - Page can only be displayed in a frame on the specified origin. This directive is deprecated in modern browsers; use the CSP frame-ancestors directive instead.

X-Content-Type-Options

Prevents browsers from MIME-sniffing responses away from the declared content-type. Helps prevent XSS attacks.

Variable Description Default
SECURITY_HEADER_X_CONTENT_TYPE_OPTIONS_ENABLED Enable X-Content-Type-Options header. true

Note: This header is always set to nosniff when enabled (no configuration needed).

Configuration Examples

Reverse Proxy Deployment (Default - Traefik, Nginx):

# Headers disabled by default - reverse proxy handles them
# SECURITY_HEADERS_ENABLED=false  # Can be omitted

Direct Deployment (No Reverse Proxy):

# Enable all security headers
SECURITY_HEADERS_ENABLED=true
SECURITY_HEADER_HSTS_ENABLED=true
SECURITY_HEADER_CSP_ENABLED=true
SECURITY_HEADER_X_FRAME_OPTIONS_ENABLED=true
SECURITY_HEADER_X_CONTENT_TYPE_OPTIONS_ENABLED=true

Custom Configuration:

# Enable headers but customize values
SECURITY_HEADERS_ENABLED=true
SECURITY_HEADER_HSTS_VALUE="max-age=300"  # 5 minutes for testing
SECURITY_HEADER_X_FRAME_OPTIONS_VALUE="SAMEORIGIN"  # Allow same-origin framing
SECURITY_HEADER_CSP_VALUE="default-src 'self'; script-src 'self' https://trusted-cdn.com;"

See Also:

  • Deployment Guide - Security Headers for Traefik/Nginx examples
  • SECURITY_AUDIT.md for security rationale

AI Provider & Model Selection

DocuElevate supports multiple AI providers for metadata extraction and OCR text refinement. Select the provider via AI_PROVIDER and configure the matching credentials below.

Variable Description Default
AI_PROVIDER Active AI provider. See supported values below. openai
AI_MODEL Model name for the selected provider. Falls back to OPENAI_MODEL when not set. (unset)
OPENAI_MODEL Default model name (used when AI_MODEL is not set). gpt-4o-mini

Supported AI_PROVIDER values: openai, azure, anthropic, gemini, ollama, openrouter, portkey, litellm
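
The fallback between AI_MODEL and OPENAI_MODEL described in the table can be summarised in a few lines. This is a sketch of the documented behaviour only; resolve_ai_settings and the dict-based settings are hypothetical, and the real application may validate values differently.

```python
SUPPORTED_PROVIDERS = {
    "openai", "azure", "anthropic", "gemini",
    "ollama", "openrouter", "portkey", "litellm",
}


def resolve_ai_settings(env):
    """Return (provider, model) using the documented defaults and fallback order."""
    provider = env.get("AI_PROVIDER", "openai").strip()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported AI_PROVIDER: {provider!r}")
    # AI_MODEL wins; otherwise fall back to OPENAI_MODEL, then its default.
    model = env.get("AI_MODEL") or env.get("OPENAI_MODEL") or "gpt-4o-mini"
    return provider, model
```

With no variables set, this resolves to the openai provider and the gpt-4o-mini model.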


OpenAI (default)

Variable Description Default
OPENAI_API_KEY OpenAI API key. (required)
OPENAI_BASE_URL API base URL. Change for compatible proxies. https://api.openai.com/v1

AI_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini

Azure OpenAI

Variable Description Default
OPENAI_API_KEY Azure OpenAI API key. (required)
OPENAI_BASE_URL Azure resource endpoint URL. (required)
AZURE_OPENAI_API_VERSION Azure OpenAI API version string. 2024-02-01

AI_PROVIDER=azure
OPENAI_API_KEY=<azure-key>
OPENAI_BASE_URL=https://my-resource.openai.azure.com
AI_MODEL=gpt-4o   # deployment name in Azure

Anthropic Claude

Variable Description
ANTHROPIC_API_KEY Anthropic API key.

AI_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
AI_MODEL=claude-3-5-sonnet-20241022

Google Gemini

Variable Description
GEMINI_API_KEY Google AI Studio API key.

AI_PROVIDER=gemini
GEMINI_API_KEY=AIza...
AI_MODEL=gemini-1.5-pro

Ollama (local LLMs – CPU-friendly)

Run models locally using Ollama. Recommended for CPU-only deployments:

Variable Description Default
OLLAMA_BASE_URL Ollama server URL. http://localhost:11434

AI_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama:11434   # Docker service name
AI_MODEL=llama3.2                     # or qwen2.5, phi3, etc.

Recommended models for document processing on CPU:

  • llama3.2 (3B) – good balance of speed and JSON output quality
  • qwen2.5 (3B/7B) – excellent at structured extraction
  • phi3 (3.8B) – strong reasoning, very fast on CPU

OpenRouter

OpenRouter provides access to 100+ models from a single endpoint using the provider/model name format.

Variable Description Default
OPENROUTER_API_KEY OpenRouter API key. (required)
OPENROUTER_BASE_URL Override the gateway URL. https://openrouter.ai/api/v1

AI_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-...
AI_MODEL=anthropic/claude-3.5-sonnet

Portkey AI Gateway

Portkey is an AI gateway that adds observability, caching, fallbacks, and load balancing across 200+ models behind a single OpenAI-compatible endpoint.

Variable Description Default
PORTKEY_API_KEY Portkey account API key. (required)
PORTKEY_VIRTUAL_KEY Optional Virtual Key (stores provider credentials in Portkey vault, keeping them out of your env file). (unset)
PORTKEY_CONFIG Optional saved Config ID (e.g. pc-fallback-abc123) for routing rules, fallbacks, and load balancing. (unset)
PORTKEY_BASE_URL Override the Portkey gateway URL (for self-hosted deployments). https://api.portkey.ai/v1

AI_PROVIDER=portkey
PORTKEY_API_KEY=pk-...
PORTKEY_VIRTUAL_KEY=vk-openai-abc123   # optional – routes to your OpenAI key stored in Portkey
AI_MODEL=gpt-4o

Using a Config for fallback routing:

AI_PROVIDER=portkey
PORTKEY_API_KEY=pk-...
PORTKEY_CONFIG=pc-fallback-config-xyz  # applies your saved routing rules
AI_MODEL=gpt-4o

LiteLLM (aggregator proxy)

LiteLLM provides a unified provider/model interface for 100+ LLMs including OpenAI, Anthropic, Gemini, Cohere, Ollama, and many more.

Variable Description Default
OPENAI_API_KEY API key forwarded to LiteLLM (provider-specific). (depends on model)
OPENAI_BASE_URL Optional proxy/gateway URL. https://api.openai.com/v1

AI_PROVIDER=litellm
AI_MODEL=anthropic/claude-3-5-sonnet-20241022
OPENAI_API_KEY=sk-ant-...   # passed as the api_key to LiteLLM

Document Translation

After processing, DocuElevate can automatically translate a document's extracted text into a configurable default language (e.g. English). This reference translation is stored alongside the original text so users always have a version in a language they understand.

Other languages are translated on the fly via the AI provider and are not persisted.

Settings

Variable Description Default
DEFAULT_DOCUMENT_LANGUAGE ISO 639-1 code for the default translation target (e.g. en, de, fr). Documents whose detected language differs are automatically translated into this language after processing. en

Each user can override this global default in their profile (UserProfile.default_document_language).

How It Works

  1. During metadata extraction the AI detects the document language (stored as detected_language on the file record).
  2. If the detected language differs from the default target language, a background Celery task (translate_to_default_language) translates the extracted text.
  3. The translated text is persisted in default_language_text and the target code in default_language_code.
  4. The file detail view shows both the original text and the default-language version.
  5. Users can also request on-the-fly translations to any language via the Translate dropdown.

API Endpoints

Endpoint Method Description
/api/files/{id}/translation/default GET Returns the persisted default-language translation (404 if unavailable)
/api/files/{id}/translate?lang=xx GET On-the-fly translation to any ISO 639-1 language code
/files/{id}/text/default-language GET View endpoint returning the default-language text as JSON

Example

# Get the stored English translation of a German document
curl http://localhost:8000/api/files/42/translation/default

# Translate on the fly to French
curl "http://localhost:8000/api/files/42/translate?lang=fr"
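
The same two requests can be issued from Python with the standard library. The sketch below assumes the host and file ID from the curl examples, that both endpoints return JSON, and that no authentication is required; none of these is confirmed by this guide, so adjust them for your deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # same host as the curl examples above


def default_translation_url(file_id):
    """URL of the persisted default-language translation for a file."""
    return f"{BASE_URL}/api/files/{int(file_id)}/translation/default"


def on_the_fly_translation_url(file_id, lang):
    """URL for an on-the-fly translation to an ISO 639-1 language code."""
    query = urlencode({"lang": lang})
    return f"{BASE_URL}/api/files/{int(file_id)}/translate?{query}"


def fetch_json(url):
    """GET a URL and decode a JSON body (assumed response format).

    urlopen raises urllib.error.HTTPError for error statuses such as 404.
    """
    with urlopen(url) as response:
        return json.load(response)
```

Calling fetch_json(default_translation_url(42)) corresponds to the first curl command; a 404 there means no stored translation is available.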

OCR Providers

DocuElevate supports multiple OCR engines that can be used individually or in combination. Configure the list of active providers with OCR_PROVIDERS and tune each provider with the settings below.

Provider Selection

Variable Description Default
OCR_PROVIDERS Comma-separated list of OCR engines to use, e.g. azure, mistral, or azure,tesseract. azure
OCR_MERGE_STRATEGY Strategy for combining results from multiple providers: ai_merge, longest, or primary. ai_merge

Supported OCR_PROVIDERS values: azure, tesseract, easyocr, mistral, google_docai, aws_textract

When multiple providers are listed, all run in parallel and their results are merged according to OCR_MERGE_STRATEGY.
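
To illustrate how a merge strategy might pick a result, the sketch below implements plausible readings of longest and primary. The semantics are assumptions inferred from the strategy names (longest keeps the longest text, primary keeps the first listed provider that produced text); ai_merge needs an AI call and is left out. The helper names are hypothetical.

```python
def parse_providers(value):
    """Split an OCR_PROVIDERS value into a clean list of provider names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def merge_results(results, providers, strategy):
    """Pick one text from `results` (a dict mapping provider name to text).

    Assumed semantics, for illustration only:
      primary - text from the first listed provider that returned anything
      longest - the longest text among all providers
    """
    available = [results[name] for name in providers if results.get(name)]
    if not available:
        return ""
    if strategy == "primary":
        return available[0]
    if strategy == "longest":
        return max(available, key=len)
    raise NotImplementedError(f"strategy {strategy!r} is not covered by this sketch")
```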

Embedded Text Quality Check

DocuElevate can automatically assess whether the text already embedded in a PDF is of sufficient quality before deciding to skip OCR. This prevents poor OCR output from a previous scan being silently used for downstream processing.

Variable Description Default
ENABLE_TEXT_QUALITY_CHECK Enable AI-based quality assessment of embedded PDF text. true
TEXT_QUALITY_THRESHOLD Minimum quality score (0–100) required to accept embedded text without re-OCR. 85
TEXT_QUALITY_SIGNIFICANT_ISSUES Comma-separated issue labels that force re-OCR even when the score meets the threshold. excessive_typos,garbage_characters,incoherent_text,fragmented_sentences

How it works:

  1. When a PDF with embedded text is received, DocuElevate first examines the PDF metadata (/Producer, /Creator).
  2. If the PDF was digitally created (e.g., exported from Word, LibreOffice, LaTeX, or any modern authoring tool), the embedded text is considered trustworthy and the quality check is skipped, because digital text cannot be improved by re-OCR.
  3. If the PDF was previously OCR'd (Tesseract, ABBYY, ocrmypdf, etc.) or the origin is unknown, an AI model evaluates a sample of the extracted text for:
       • Excessive typos and character-substitution artefacts typical of OCR
       • Garbage characters or symbol soup
       • Incoherent or nonsensical sentences
       • Heavy fragmentation
  4. The text is rejected (and re-OCR triggered) when either of these conditions is true:
       • The quality score is below TEXT_QUALITY_THRESHOLD (default 85), or
       • The AI identifies one or more issues listed in TEXT_QUALITY_SIGNIFICANT_ISSUES, even if the numeric score meets the threshold. This prevents text that reaches a passing score but is flagged for excessive_typos and garbage_characters from being silently accepted.
  5. After the re-OCR pass, the fresh OCR result is compared head-to-head against the original embedded text using an AI side-by-side review. The higher-quality text is passed to downstream processing (metadata extraction, AI analysis). This ensures re-OCR never degrades quality.
  6. All quality decisions (score, source, AI feedback, comparison outcome) are recorded in the processing log for review.
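
The rejection rule (score below the threshold, or any significant issue present) can be expressed compactly. The sketch below mirrors the documented defaults; needs_reocr is a hypothetical name, and the real check also involves the metadata inspection and AI review described in the steps.

```python
DEFAULT_SIGNIFICANT_ISSUES = (
    "excessive_typos,garbage_characters,incoherent_text,fragmented_sentences"
)


def needs_reocr(score, issues, threshold=85,
                significant_issues=DEFAULT_SIGNIFICANT_ISSUES):
    """Return True when embedded text should be rejected and re-OCR triggered.

    score  - quality score from 0 to 100
    issues - issue labels reported for the text
    """
    significant = {
        label.strip() for label in significant_issues.split(",") if label.strip()
    }
    if score < threshold:
        return True
    # A significant issue forces re-OCR even when the score meets the threshold.
    return any(issue in significant for issue in issues)
```

For instance, a score of 92 with the excessive_typos label still triggers re-OCR, while a score of exactly 85 with no issues is accepted.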

Tip: Set ENABLE_TEXT_QUALITY_CHECK=false to disable the check entirely and always use embedded text as-is. This is useful when the AI provider is unavailable or when processing speed is more important than text accuracy.

Tuning the threshold: The default of TEXT_QUALITY_THRESHOLD=85 is intentionally strict. Lower it (e.g., 70) for environments with consistently good existing OCR. Raise it (up to 100) for maximum quality enforcement.

Searchable PDF Text Layer

Not all OCR providers embed a searchable text layer in the output PDF. The table below summarises each provider's behaviour and how DocuElevate handles it:

Provider Embeds text layer? Notes
azure ✅ Yes Azure Document Intelligence returns a PDF/A with an embedded text layer.
tesseract ❌ No (text only) Text is extracted but the PDF is not modified. embed_text_layer post-processing is applied automatically.
easyocr ❌ No (text only) Same as above.
mistral ❌ No (text only) Mistral OCR API returns plain text; embed_text_layer post-processing is applied automatically.
google_docai ❌ No (text only) Google Cloud Document AI returns plain text; embed_text_layer post-processing is applied automatically.
aws_textract ❌ No (text only) AWS Textract returns plain text; embed_text_layer post-processing is applied automatically.

For providers that do not embed a text layer, DocuElevate automatically runs ocrmypdf --skip-text after OCR to add an invisible Tesseract-generated text layer to the PDF. This makes the file selectable and searchable in PDF viewers. If ocrmypdf is not available on PATH, the step is skipped and a warning is logged.

Azure Document Intelligence

Variable Description How to Obtain
AZURE_DOCUMENT_INTELLIGENCE_KEY Azure Document Intelligence API key for OCR. Azure Portal
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT Endpoint URL for Azure Document Intelligence API. Azure Portal

Tesseract (self-hosted)

Requires tesseract-ocr to be installed in the Docker image or on the host. The default Docker image ships with Tesseract (English language data only).

Automatic language data download: DocuElevate automatically downloads missing Tesseract .traineddata files at startup using wget from the tessdata_fast repository. No manual installation is required — simply set TESSERACT_LANGUAGE to the desired language codes and the data files are fetched on first start. The container must have outbound internet access for this to work.

Variable Description Default
TESSERACT_CMD Path to the tesseract binary (optional; auto-detected from PATH). (auto)
TESSERACT_LANGUAGE Tesseract language code(s), e.g. eng, eng+deu, deu. eng+deu

OCR_PROVIDERS=tesseract
TESSERACT_LANGUAGE=eng+deu

Language codes: Use ISO 639-2 codes separated by +, e.g. eng+deu+fra for English + German + French. All codes supported by Tesseract are available. See the tessdata repository for the full list.

No internet access? Set TESSDATA_PREFIX to a writable directory and pre-populate it with the required .traineddata files. Alternatively, build a custom Docker image that installs the needed language packages via apt-get install tesseract-ocr-<lang>.
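
When pre-populating a tessdata directory for an offline deployment, it helps to check which files are still missing for a given TESSERACT_LANGUAGE value. The sketch below relies on Tesseract's <code>.traineddata file naming; the function name missing_traineddata is a hypothetical helper, not part of DocuElevate.

```python
from pathlib import Path


def missing_traineddata(languages, tessdata_dir):
    """List the .traineddata files required by `languages` that are absent.

    languages    - a TESSERACT_LANGUAGE value such as "eng+deu"
    tessdata_dir - the directory that TESSDATA_PREFIX points to
    """
    codes = [code for code in languages.split("+") if code]
    directory = Path(tessdata_dir)
    return [
        f"{code}.traineddata"
        for code in codes
        if not (directory / f"{code}.traineddata").is_file()
    ]
```

An empty list means every requested language file is already in place.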

EasyOCR (self-hosted)

Requires the easyocr Python package. Install it separately as it is not included in the base requirements.

Automatic model download: EasyOCR model files are downloaded automatically on first use (or at startup) to ~/.EasyOCR/model/. The container must have outbound internet access. Model download can take several minutes depending on the language.

Variable Description Default
EASYOCR_LANGUAGES Comma-separated EasyOCR language codes, e.g. en,de,fr. en,de
EASYOCR_GPU Enable GPU acceleration for EasyOCR (true/false). false

Mistral OCR

Variable Description How to Obtain
MISTRAL_API_KEY Mistral API key. console.mistral.ai
MISTRAL_OCR_MODEL Mistral OCR model name. mistral-ocr-latest

Google Cloud Document AI

Variable Description Default
GOOGLE_DOCAI_PROJECT_ID GCP project ID (required). (required)
GOOGLE_DOCAI_PROCESSOR_ID Document AI processor ID (required). (required)
GOOGLE_DOCAI_LOCATION Processor location, e.g. us or eu. us
GOOGLE_DOCAI_CREDENTIALS_JSON Service account JSON (optional; falls back to GOOGLE_DRIVE_CREDENTIALS_JSON). (optional)

AWS Textract

Reuses the AWS credentials already configured for S3 integration.

Variable Description
AWS_ACCESS_KEY_ID AWS access key ID.
AWS_SECRET_ACCESS_KEY AWS secret access key.
AWS_REGION AWS region, e.g. us-east-1.

Multi-Provider Example

# Use both Azure (for accuracy) and Tesseract (for redundancy); merge via AI
OCR_PROVIDERS=azure,tesseract
OCR_MERGE_STRATEGY=ai_merge
AZURE_AI_KEY=...
AZURE_ENDPOINT=https://...
TESSERACT_LANGUAGE=eng+deu

Azure Document Intelligence (Legacy)

Note: This section documents the standalone Azure Document Intelligence credentials. When using OCR_PROVIDERS=azure these same credentials are used automatically.

Variable Description How to Obtain
AZURE_DOCUMENT_INTELLIGENCE_KEY Azure Document Intelligence API key for OCR. Azure Portal
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT Endpoint URL for Azure Doc Intelligence API. Azure Portal

Paperless NGX

Variable Description
PAPERLESS_ENABLED Set to false to disable Paperless-ngx uploads without removing credentials. Default: true
PAPERLESS_NGX_API_TOKEN API token for Paperless NGX.
PAPERLESS_HOST Root URL for Paperless NGX (e.g. https://paperless.example.com).
PAPERLESS_CUSTOM_FIELD_ABSENDER (Optional, Legacy) Name of the custom field in Paperless-ngx to store the sender ("absender") information. If set, the extracted sender will be automatically set as a custom field after document upload. Example: Absender or Sender
PAPERLESS_CUSTOM_FIELDS_MAPPING (Optional, Recommended) JSON mapping of extracted metadata fields to Paperless custom field names. This allows you to map multiple fields at once. Format: {"metadata_field": "CustomFieldName", ...}. See examples below.

Custom Fields Mapping Examples

Single Field (Legacy Method):

PAPERLESS_CUSTOM_FIELD_ABSENDER=Absender

Multiple Fields (Recommended Method):

# Map multiple metadata fields to custom fields in Paperless
PAPERLESS_CUSTOM_FIELDS_MAPPING='{"absender": "Sender", "empfaenger": "Recipient", "language": "Language"}'

All Available Metadata Fields: DocuElevate extracts the following fields, which can be mapped to Paperless custom fields:

  • absender - Sender/author of the document
  • empfaenger - Recipient of the document
  • correspondent - The issuing entity/company (shortened name)
  • document_type - Type classification (Invoice, Contract, etc.)
  • language - Document language (ISO 639-1 code, e.g., "de", "en")
  • kommunikationsart - Communication type (German classification)
  • kommunikationskategorie - Communication category (German classification)
  • reference_number - Invoice/order/reference number if found
  • title - Human-readable document title
  • tags - List of thematic keywords (array)

Complete Example:

PAPERLESS_CUSTOM_FIELDS_MAPPING='{"absender": "Sender", "empfaenger": "Recipient", "correspondent": "Correspondent", "language": "Language", "reference_number": "ReferenceNumber"}'

Note: Custom fields must be created in your Paperless-ngx instance before DocuElevate can use them. The field names in the mapping (right side of the JSON) must exactly match the names in Paperless (case-sensitive).
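
To show how the mapping is read, the sketch below applies a PAPERLESS_CUSTOM_FIELDS_MAPPING value to a dictionary of extracted metadata. It is an illustration under assumptions: map_custom_fields is a hypothetical helper, and skipping empty values is a choice made for this example rather than documented behaviour.

```python
import json


def map_custom_fields(mapping_json, metadata):
    """Translate extracted metadata into {Paperless custom field name: value}.

    mapping_json - the PAPERLESS_CUSTOM_FIELDS_MAPPING string (may be empty)
    metadata     - dict of extracted fields such as absender or language
    """
    mapping = json.loads(mapping_json) if mapping_json else {}
    fields = {}
    for metadata_field, custom_field_name in mapping.items():
        value = metadata.get(metadata_field)
        if value is None or value == "" or value == []:
            continue  # assumption: nothing is sent for empty values
        fields[custom_field_name] = value
    return fields
```

The keys of the result are the right-hand names from the JSON mapping, which is why they must match the field names in Paperless exactly.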

Dropbox

Variable Description
DROPBOX_ENABLED Set to false to disable Dropbox uploads without removing credentials. Default: true
DROPBOX_APP_KEY Dropbox API app key.
DROPBOX_APP_SECRET Dropbox API app secret.
DROPBOX_REFRESH_TOKEN OAuth2 refresh token for Dropbox.
DROPBOX_FOLDER Default folder path for Dropbox uploads.

For detailed setup instructions, see the Dropbox Setup Guide.

Nextcloud

Variable Description
NEXTCLOUD_ENABLED Set to false to disable Nextcloud uploads without removing credentials. Default: true
NEXTCLOUD_UPLOAD_URL Nextcloud WebDAV URL (e.g. https://nc.example.com/remote.php/dav/files/<USERNAME>).
NEXTCLOUD_USERNAME Nextcloud login username.
NEXTCLOUD_PASSWORD Nextcloud login password.
NEXTCLOUD_FOLDER Destination folder in Nextcloud (e.g. "/Documents/Uploads").

Google Drive

Variable Description
GOOGLE_DRIVE_ENABLED Set to false to disable Google Drive uploads without removing credentials. Default: true
GOOGLE_DRIVE_USE_OAUTH Set to true to use OAuth flow (recommended)
GOOGLE_DRIVE_CLIENT_ID OAuth Client ID (required if using OAuth flow)
GOOGLE_DRIVE_CLIENT_SECRET OAuth Client Secret (required if using OAuth flow)
GOOGLE_DRIVE_REFRESH_TOKEN OAuth Refresh Token (required if using OAuth flow)
GOOGLE_DRIVE_FOLDER_ID Google Drive folder ID for file uploads
GOOGLE_DRIVE_CREDENTIALS_JSON JSON string containing service account credentials (alternative method)
GOOGLE_DRIVE_DELEGATE_TO Email address to delegate permissions (optional for service accounts)

Note: For OAuth method with non-verified apps, refresh tokens expire after 7 days. For production use, either complete the Google verification process or use the Service Account method.

For detailed setup instructions, see the Google Drive Setup Guide.

WebDAV

Variable Description
WEBDAV_ENABLED Set to false to disable WebDAV uploads without removing credentials. Default: true
WEBDAV_URL WebDAV server URL (e.g. https://webdav.example.com/path).
WEBDAV_USERNAME WebDAV authentication username.
WEBDAV_PASSWORD WebDAV authentication password.
WEBDAV_FOLDER Destination folder on WebDAV server (e.g. "/Documents/Uploads").
WEBDAV_VERIFY_SSL Whether to verify SSL certificates (default: True).

FTP

Variable Description
FTP_ENABLED Set to false to disable FTP uploads without removing credentials. Default: true
FTP_HOST FTP server hostname or IP address.
FTP_PORT FTP port (default: 21).
FTP_USERNAME FTP authentication username.
FTP_PASSWORD FTP authentication password.
FTP_FOLDER Destination folder on FTP server (e.g. "/Documents/Uploads").
FTP_USE_TLS Try to use FTPS with TLS encryption first (default: True).
FTP_ALLOW_PLAINTEXT Allow fallback to plaintext FTP if TLS fails (default: True).

SFTP

Variable Description
SFTP_ENABLED Set to false to disable SFTP uploads without removing credentials. Default: true
SFTP_HOST SFTP server hostname or IP address.
SFTP_PORT SFTP port (default: 22).
SFTP_USERNAME SFTP authentication username.
SFTP_PASSWORD SFTP authentication password (if not using private key).
SFTP_FOLDER Destination folder on SFTP server.
SFTP_PRIVATE_KEY Path to private key file for authentication (optional).
SFTP_PRIVATE_KEY_PASSPHRASE Passphrase for private key if required (optional).

Email (shared SMTP – password reset & verification)

Note: These settings configure the shared SMTP connection used for system emails such as password resets and account verification. They do not enable the email delivery destination. To send processed documents via email, configure the dedicated DEST_EMAIL_* variables below.

Variable Description
EMAIL_HOST SMTP server hostname.
EMAIL_PORT SMTP port (default: 587).
EMAIL_USERNAME SMTP authentication username.
EMAIL_PASSWORD SMTP authentication password.
EMAIL_USE_TLS Whether to use TLS (default: True).
EMAIL_SENDER From address (e.g., "DocuElevate <docuelevate@example.com>").

Email Destination (document delivery)

Note: These settings are intentionally separate from the shared EMAIL_* settings above. Configuring EMAIL_HOST for password resets does not automatically activate the email delivery destination. You must set DEST_EMAIL_HOST to enable it.

Variable Description
DEST_EMAIL_ENABLED Set to false to disable email delivery without removing credentials. Default: true
DEST_EMAIL_HOST SMTP server hostname for document delivery.
DEST_EMAIL_PORT SMTP port for document delivery (default: 587).
DEST_EMAIL_USERNAME SMTP authentication username for document delivery.
DEST_EMAIL_PASSWORD SMTP authentication password for document delivery.
DEST_EMAIL_USE_TLS Whether to use TLS for document delivery (default: True).
DEST_EMAIL_SENDER From address for delivered documents (e.g., "DocuElevate Delivery <docuelevate@example.com>").
DEST_EMAIL_DEFAULT_RECIPIENT Fallback recipient email when none is specified for a delivery task.

Evernote

Variable Description
EVERNOTE_ENABLED Set to false to disable Evernote uploads without removing credentials. Default: true
EVERNOTE_AUTH_TOKEN Evernote developer token or OAuth access token used to create notes.
EVERNOTE_SANDBOX Use Evernote sandbox API endpoints. Default: false
EVERNOTE_NOTEBOOK_GUID Optional target notebook GUID. If omitted, Evernote uses the default notebook.
EVERNOTE_DEFAULT_TAGS Optional comma-separated tags applied to every created note.
EVERNOTE_INCLUDE_METADATA Include extracted metadata in the Evernote note body. Default: true

For detailed setup instructions, see the Evernote Setup Guide.

OneDrive / Microsoft Graph

Variable Description
ONEDRIVE_ENABLED Set to false to disable OneDrive uploads without removing credentials. Default: true
ONEDRIVE_CLIENT_ID Azure AD application client ID
ONEDRIVE_CLIENT_SECRET Azure AD application client secret
ONEDRIVE_TENANT_ID Azure AD tenant ID: use "common" for personal accounts or your tenant ID for corporate accounts
ONEDRIVE_REFRESH_TOKEN OAuth 2.0 refresh token (required for personal accounts)
ONEDRIVE_FOLDER_PATH Folder path in OneDrive for storing documents

For detailed setup instructions, see the OneDrive Setup Guide.

SharePoint Online

Variable Description
SHAREPOINT_CLIENT_ID Azure AD application client ID
SHAREPOINT_CLIENT_SECRET Azure AD application client secret
SHAREPOINT_TENANT_ID Azure AD tenant ID (use "common" for multi-tenant apps)
SHAREPOINT_REFRESH_TOKEN OAuth 2.0 refresh token
SHAREPOINT_SITE_URL SharePoint site URL (e.g. https://tenant.sharepoint.com/sites/sitename)
SHAREPOINT_DOCUMENT_LIBRARY Document library name (default: Documents)
SHAREPOINT_FOLDER_PATH Subfolder path inside the document library

SharePoint uses the same Microsoft Graph API as OneDrive. See the OneDrive Setup Guide for Azure AD app registration instructions — the same app registration can be reused for SharePoint with the Sites.ReadWrite.All permission.

Amazon S3

Variable Description
S3_ENABLED Set to false to disable S3 uploads without removing credentials. Default: true
AWS_ACCESS_KEY_ID AWS IAM access key ID
AWS_SECRET_ACCESS_KEY AWS IAM secret access key
AWS_REGION AWS region where your S3 bucket is located (default: us-east-1)
S3_BUCKET_NAME Name of your S3 bucket
S3_FOLDER_PREFIX Optional prefix/folder path for uploaded files
S3_STORAGE_CLASS Storage class for uploaded objects (default: STANDARD)
S3_ACL Access control for uploaded files (default: private)

For detailed setup instructions, see the Amazon S3 Setup Guide.

iCloud Drive (Apple)

Variable Description
ICLOUD_ENABLED Set to false to disable iCloud uploads without removing credentials. Default: true
ICLOUD_USERNAME Apple ID email address
ICLOUD_PASSWORD App-specific password (generate at appleid.apple.com)
ICLOUD_FOLDER Target folder path in iCloud Drive (e.g. Documents/Uploads)
ICLOUD_COOKIE_DIRECTORY Optional directory for session cookie persistence (default: ~/.pyicloud)

Note: Apple does not provide a public REST API for iCloud Drive. This integration uses the pyicloud library which relies on an unofficial, reverse-engineered protocol. Because most Apple IDs have two-factor authentication enabled, you must generate an app-specific password and use it as ICLOUD_PASSWORD.

Notification System

Variable Description
NOTIFICATION_URLS Comma-separated list of Apprise notification URLs
NOTIFY_ON_TASK_FAILURE Send notifications on task failures (True/False)
NOTIFY_ON_CREDENTIAL_FAILURE Send notifications on credential failures (True/False)
NOTIFY_ON_STARTUP Send notification when system starts (True/False)
NOTIFY_ON_SHUTDOWN Send notification when system shuts down (True/False)
NOTIFY_ON_FILE_PROCESSED Send notification when a file is successfully processed (True/False)
NOTIFY_ON_USER_SIGNUP Send admin notification when a new user signs up (True/False, default True)
NOTIFY_ON_PLAN_CHANGE Send admin notification when a user changes their subscription plan (True/False, default True)
NOTIFY_ON_PAYMENT_ISSUE Send admin notification when a payment issue is reported for a user (True/False, default True)
TELEGRAM_ENABLED Enable Telegram bot notifications.
TELEGRAM_BOT_TOKEN Telegram Bot API token from @BotFather.
TELEGRAM_CHAT_ID Telegram chat ID to send notifications to.

User-Event Notifications

DocuElevate sends admin push notifications (via Apprise) and fires outbound webhooks for three user-lifecycle events:

Event Trigger Controlling variable
New signup A first-time user logs in and a UserProfile is created NOTIFY_ON_USER_SIGNUP
Plan change A user selects a new subscription tier during onboarding, or an admin changes their tier NOTIFY_ON_PLAN_CHANGE
Payment issue An admin POSTs to /api/admin/users/{user_id}/payment-issue NOTIFY_ON_PAYMENT_ISSUE

In addition to the Apprise push notification, each event also fires the matching webhook event (user.signup, user.plan_changed, user.payment_issue) to all active webhook configurations subscribed to that event, enabling integration with CRM, helpdesk (Jira, Zendesk, etc.), or payment processors.

For detailed setup instructions, see the Notifications Setup Guide.

Per-User Notification System

In addition to the system-level Apprise notifications, DocuElevate includes a per-user notification system that gives each user full control over how they are notified about their own document events.

Notification Dashboard — available at /notifications for every logged-in user. It has three tabs:

Tab Description
Inbox In-app bell-icon notification feed. Persisted in the database; shows unread count badge in the navigation bar. Users can mark individual items or all items as read.
Targets User-defined notification channels: Email (SMTP) and Webhook (HTTP POST). Each target can be tested independently from the UI.
Preferences Event/channel matrix. Users choose which channels are triggered for each event type. In-app notifications are always enabled.

User-centric event types:

Event Description
document.processed A document uploaded by the user was successfully processed and uploaded to destinations
document.failed A document uploaded by the user failed during processing

Email target configuration fields:

Field Description
smtp_host SMTP server hostname
smtp_port SMTP port (default 587)
smtp_username SMTP login username
smtp_password SMTP login password (stored in database, masked in UI)
smtp_use_tls Enable STARTTLS (true/false, default true)
sender_email From address (defaults to smtp_username if omitted)
recipient_email Destination address for this target

Webhook target configuration fields:

Field Description
url HTTP(S) URL to POST the notification payload to
secret Optional secret string sent as X-DocuElevate-Secret header

Webhook payload format:

{
  "event": "document.processed",
  "title": "Document processed: invoice.pdf",
  "message": "Your document 'invoice.pdf' has been successfully processed and uploaded."
}
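A receiving service can check the shared secret before trusting the payload. The following is a minimal sketch, assuming the receiver already has the request headers as a dict and the raw body as bytes; the function name `parse_notification` is hypothetical and not part of DocuElevate.

```python
import hmac
import json


def parse_notification(headers: dict, body: bytes, expected_secret: str) -> dict:
    """Validate the X-DocuElevate-Secret header, then decode the JSON payload."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    normalised = {key.lower(): value for key, value in headers.items()}
    received = normalised.get("x-docuelevate-secret", "")
    # compare_digest avoids leaking the secret through timing differences.
    if not hmac.compare_digest(received.encode(), expected_secret.encode()):
        raise PermissionError("secret header missing or incorrect")
    payload = json.loads(body)
    # Keep only the three documented fields.
    return {key: payload[key] for key in ("event", "title", "message")}
```

If no secret is configured for the target, the header check would need to be skipped; that case is left out of the sketch.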

Note: There are no additional environment variables for the per-user notification system — all settings are stored in the database and managed through the user-facing /notifications dashboard.

Webhooks

Webhooks notify external systems via HTTP POST when document events occur. Configurations are stored in the database and managed through the API (see API docs).

Variable Description Default
WEBHOOK_ENABLED Enable or disable webhook delivery globally (True/False) True

Webhook URLs, secrets, and subscribed events are configured per-webhook via the /api/webhooks/ endpoints (admin access required). Each delivery includes an optional HMAC-SHA256 signature for verification and is retried with exponential backoff on failure.
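The signature check and retry schedule can be sketched as follows. This is an illustration under assumptions: the source does not state the signature header name, the digest encoding, or the retry delays, so the hex encoding and the doubling delays here are hypothetical choices. Consult the API docs for the actual format.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """HMAC-SHA256 of the raw request body, hex-encoded (encoding is an assumption)."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid(secret: str, body: bytes, received_signature: str) -> bool:
    """Recompute the signature on the receiver side and compare in constant time."""
    return hmac.compare_digest(sign(secret, body), received_signature)


def backoff_delays(retries: int, base_seconds: float = 1.0) -> list[float]:
    """Exponential backoff: each delay is double the previous one (illustrative values)."""
    return [base_seconds * 2 ** attempt for attempt in range(retries)]
```

The receiver must verify against the exact bytes it received; re-serialising the JSON first can change whitespace and invalidate the signature.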

Automation Hooks (Zapier / Make.com)

Automation hooks enable integration with external automation platforms such as Zapier and Make.com (formerly Integromat).

Variable Description Default
AUTOMATION_HOOKS_ENABLED Enable or disable Zapier / Make.com automation hook subscriptions and delivery (True/False) True

When enabled, external platforms can:

  • Subscribe to DocuElevate events via POST /api/automation/hooks/subscribe (outgoing triggers)
  • Send documents to DocuElevate via POST /api/automation/actions/upload (incoming actions)
  • Discover fields via GET /api/automation/triggers/sample/{event} (Zapier field mapping)

Automation hooks share the same event types as webhooks (document.uploaded, document.processed, document.failed, user.signup, user.plan_changed, user.payment_issue) and use a flat Zapier-compatible JSON payload format. See the API docs for endpoint details and payload examples.

Backup & Restore

DocuElevate automatically backs up the database on a scheduled basis. Backups are managed from the Admin → Backup & Restore dashboard.

Supported database backends: SQLite (.db.gz), PostgreSQL (.pgsql.gz), MySQL / MariaDB (.mysql.gz). For PostgreSQL and MySQL backups the respective CLI client (pg_dump / psql or mysqldump / mysql) must be installed on the Celery worker host. See the Database Configuration Guide for setup details.

Variable Description Default
BACKUP_ENABLED Enable or disable automatic scheduled backups (True/False). True
BACKUP_DIR Filesystem path where local backup archives are stored. Defaults to <WORKDIR>/backups. (workdir/backups)
BACKUP_REMOTE_DESTINATION Storage provider to copy backups to. Options: s3, dropbox, google_drive, onedrive, nextcloud, webdav, ftp, sftp, email. Leave empty for local-only storage. (empty)
BACKUP_REMOTE_FOLDER Sub-folder / key prefix used when uploading to the remote destination. backups
BACKUP_RETAIN_HOURLY Number of hourly snapshots to keep (one per hour, so 96 covers 4 days). 96
BACKUP_RETAIN_DAILY Number of daily snapshots to keep (21 = 3 weeks). 21
BACKUP_RETAIN_WEEKLY Number of weekly snapshots to keep (13 ≈ 3 months). 13

Retention schedule:

Tier Frequency Default retention Coverage
Hourly Every hour 96 snapshots ~4 days
Daily Daily at 02:00 21 snapshots ~3 weeks
Weekly Sundays at 03:00 13 snapshots ~3 months

Archives beyond the retention window are automatically pruned after each new backup. The Clean Up button on the dashboard applies retention immediately. When a remote destination is configured, remote copies follow the same retention policy.
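The pruning rule for a single tier can be sketched like this: keep the newest N snapshots and discard the rest. It is a simplified model, assuming each tier is pruned independently; the `RETENTION` mapping and `prune` function are hypothetical names, and DocuElevate's actual pruning code is not shown in this guide.

```python
from datetime import datetime, timedelta

# Default retention counts from the table above.
RETENTION = {"hourly": 96, "daily": 21, "weekly": 13}


def prune(snapshots: list[datetime], keep: int) -> tuple[list[datetime], list[datetime]]:
    """Split one tier's snapshot timestamps into (kept, pruned), newest first."""
    ordered = sorted(snapshots, reverse=True)
    return ordered[:keep], ordered[keep:]


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    hourly = [start + timedelta(hours=i) for i in range(100)]
    kept, pruned = prune(hourly, RETENTION["hourly"])
    print(len(kept), "kept;", len(pruned), "pruned")
```

With 100 hourly snapshots and the default of 96, the four oldest fall outside the window.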

Note: SQLite backups need no extra tooling. PostgreSQL and MySQL / MariaDB backups depend on the CLI clients listed above being installed on the Celery worker host.

Uptime Kuma

Variable Description
UPTIME_KUMA_URL Uptime Kuma push URL for monitoring the application's health.
UPTIME_KUMA_PING_INTERVAL How often to ping Uptime Kuma in minutes (default: 5).

UI / Appearance

DocuElevate supports a dark mode toggle in the navbar. Users can switch between light and dark themes at any time; their choice is stored in localStorage and persists across page reloads in the same browser.

Administrators can set the site-wide default colour scheme that is applied when a user has not yet made a personal choice:

Variable Description Default
UI_DEFAULT_COLOR_SCHEME Default colour scheme for all users. Options: system (follow OS preference), light, dark. Users can always override with the navbar toggle. system

How it works:

  1. On page load an inline script checks the user's localStorage preference first.
  2. If no stored preference exists, the server-supplied UI_DEFAULT_COLOR_SCHEME is used.
  3. When the value is system (the default), the OS-level prefers-color-scheme media query is respected.
  4. Clicking the 🌙 / ☀️ toggle in the navbar saves the new preference to localStorage immediately.
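The precedence described in these steps can be sketched as a small function. The real logic runs as an inline browser script; this Python version only models the decision order, and `resolve_scheme` with its parameters is a hypothetical name.

```python
def resolve_scheme(stored, server_default="system", os_prefers_dark=False):
    """Return 'light' or 'dark' following the documented precedence."""
    explicit = ("light", "dark")
    # 1. A stored per-browser preference always wins.
    if stored in explicit:
        return stored
    # 2. Otherwise use the server default if it names a concrete scheme.
    if server_default in explicit:
        return server_default
    # 3. 'system' (or anything unrecognised) defers to the OS preference.
    return "dark" if os_prefers_dark else "light"
```

For example, `resolve_scheme(None, "system", os_prefers_dark=True)` follows the OS and yields the dark theme, while any stored choice overrides both other inputs.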

WCAG AA compliance: All dark-mode colour pairs have been chosen with a minimum 4.5:1 contrast ratio for normal text and 3:1 for large text.
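To check a custom colour pair against these thresholds, the WCAG 2.x contrast formula can be computed directly. The sketch below implements the published relative-luminance definition; the function names are hypothetical and any colours passed in are your own, not values taken from DocuElevate's stylesheet.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#rrggbb'."""
    value = hex_colour.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1 (identical) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```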

Example:

# Force dark mode for all users by default
UI_DEFAULT_COLOR_SCHEME=dark

Support / Help Center – Zammad Integration

The Help Center page (/help) can optionally integrate with a Zammad instance to offer live chat and a ticket-creation form directly within DocuElevate.

Variable Description Default
ZAMMAD_URL Base URL of your Zammad instance (e.g. https://zammad.example.com). Required for chat and form. (unset)
ZAMMAD_CHAT_ENABLED Show a Zammad live-chat widget on the Help Center page. false
ZAMMAD_CHAT_ID Zammad chat topic ID (see Channels → Chat → Topics in Zammad admin). 1
ZAMMAD_FORM_ENABLED Show a "Submit a Ticket" feedback form on the Help Center page. false
SUPPORT_EMAIL Support e-mail address displayed on the Help Center page. (unset)

Example:

ZAMMAD_URL=https://zammad.example.com
ZAMMAD_CHAT_ENABLED=true
ZAMMAD_CHAT_ID=1
ZAMMAD_FORM_ENABLED=true
SUPPORT_EMAIL=support@example.com

Note: The live-chat widget requires at least one Zammad agent to be online. If no agent is available, the widget will not appear. Enable Zammad's debug mode (debug: true) for troubleshooting.

Automatic User Context (Auto-Fill)

When a user is logged in, DocuElevate automatically passes their identity to the Zammad widgets:

  • Ticket form: The user's name and email are pre-filled in the form fields. A DocuElevate User Context block containing the user's name, email, and username is appended to the ticket body so the support agent can immediately identify the requester.
  • Live chat: The user's name and email are passed to the Zammad chat widget constructor. Depending on your Zammad version, the agent may see this information in the chat session details.

No additional configuration is required — the auto-fill uses the authenticated session data (OAuth, local login, or admin credentials). Anonymous visitors see the standard Zammad widgets without pre-filled data.

Observability – Sentry

DocuElevate integrates with Sentry for real-time error tracking and performance monitoring. See SentrySetup.md for a full setup guide.

Server-side (Python SDK)

Variable Description Default
SENTRY_DSN Sentry DSN URL. When set, error reporting and performance tracing are enabled automatically. Leave blank to disable. (unset)
SENTRY_ENVIRONMENT Environment label attached to every Sentry event (development, staging, production, …). production
SENTRY_TRACES_SAMPLE_RATE Fraction of requests captured for performance tracing (0.0 – 1.0). 0.0 disables tracing entirely. 0.1
SENTRY_PROFILES_SAMPLE_RATE Fraction of profiled transactions sent to Sentry (0.0 – 1.0). Only active when SENTRY_TRACES_SAMPLE_RATE is greater than 0. 0.0
SENTRY_SEND_DEFAULT_PII Attach PII (IP addresses, user agents) to Sentry events. Disabled by default for GDPR/CCPA compliance. false
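The way these variables could map onto the Sentry Python SDK is sketched below. This is an illustration, not DocuElevate's startup code: `sentry_options` is a hypothetical helper that reads the documented variables and defaults, and returns `None` when Sentry should stay disabled.

```python
import os


def sentry_options(env=None):
    """Build keyword arguments for sentry_sdk.init, or None if no DSN is set."""
    env = os.environ if env is None else env
    dsn = env.get("SENTRY_DSN", "").strip()
    if not dsn:
        return None  # opt-in: no DSN means no SDK initialisation
    return {
        "dsn": dsn,
        "environment": env.get("SENTRY_ENVIRONMENT", "production"),
        "traces_sample_rate": float(env.get("SENTRY_TRACES_SAMPLE_RATE", "0.1")),
        "profiles_sample_rate": float(env.get("SENTRY_PROFILES_SAMPLE_RATE", "0.0")),
        "send_default_pii": env.get("SENTRY_SEND_DEFAULT_PII", "false").lower() == "true",
    }
```

With the `sentry-sdk` package installed, the result could be passed on as `sentry_sdk.init(**options)` when it is not `None`.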

Browser SDK (JavaScript)

The Sentry Browser SDK is loaded automatically on every rendered page when SENTRY_DSN is set. The same DSN is used for both server and browser — the DSN is a public key in Sentry's security model and is intentionally embedded in client-side code.

Variable Description Default
SENTRY_JS_TRACES_SAMPLE_RATE Fraction of browser page-loads captured for client-side performance tracing (0.0 – 1.0). 0.0
SENTRY_JS_REPLAY_SESSION_SAMPLE_RATE Fraction of sessions recorded by Sentry Session Replay (0.0 – 1.0). 0.0
SENTRY_JS_REPLAY_ON_ERROR_SAMPLE_RATE Fraction of error sessions captured with session replay context (0.0 – 1.0). 0.1

Example:

# Minimal example (server + browser)
SENTRY_DSN=https://<key>@o<org>.ingest.sentry.io/<project>
SENTRY_ENVIRONMENT=production

# Optional server-side tuning
SENTRY_TRACES_SAMPLE_RATE=0.1
SENTRY_PROFILES_SAMPLE_RATE=0.0
SENTRY_SEND_DEFAULT_PII=false

# Optional browser-side tuning
SENTRY_JS_TRACES_SAMPLE_RATE=0.1
SENTRY_JS_REPLAY_SESSION_SAMPLE_RATE=0.0
SENTRY_JS_REPLAY_ON_ERROR_SAMPLE_RATE=0.1

Note: Sentry is completely opt-in — if SENTRY_DSN is not set, neither SDK is initialised and no data leaves your infrastructure.

Duplicate Document Detection

DocuElevate detects and flags documents that share the same content, even if they arrive as separate uploads.

Exact Duplicate Detection (SHA-256)

When ENABLE_DEDUPLICATION=True (the default), each new document is hashed with SHA-256 before processing begins. If the hash matches an existing file record, the upload is rejected immediately: no processing task is created, and the temporary file is removed from disk. The /api/ui-upload response returns "status": "duplicate" together with a duplicate_of object that identifies the original file.

If the same file somehow reaches the Celery worker (e.g. via a watch-folder ingest) it is still caught there and stored as a duplicate (is_duplicate=True, duplicate_of_id=<original_id>) with no further processing.
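The hash-then-lookup flow can be sketched as follows. It is a simplified model: `known_hashes` stands in for the database lookup, and the `"queued"` status for non-duplicates is a hypothetical placeholder rather than a documented response value.

```python
import hashlib


def sha256_of(stream, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_upload(stream, known_hashes: dict) -> dict:
    """Reject exact duplicates before any processing task is created."""
    file_hash = sha256_of(stream)
    original_id = known_hashes.get(file_hash)
    if original_id is not None:
        return {
            "status": "duplicate",
            "duplicate_of": {"duplicate_type": "exact", "original_file_id": original_id},
        }
    # Hypothetical status for new files; remember the hash for later uploads.
    known_hashes[file_hash] = len(known_hashes) + 1
    return {"status": "queued", "filehash": file_hash}
```

Because the comparison is byte-exact, a rescanned copy of the same letter produces a different hash and is not caught at this stage.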

Variable Description Default
ENABLE_DEDUPLICATION Hash-based exact duplicate detection on ingest. True
SHOW_DEDUPLICATION_STEP Show the "Check for Duplicates" step in the processing timeline UI. True

When the upload is an exact duplicate the /api/ui-upload response looks like:

{
  "status": "duplicate",
  "original_filename": "invoice.pdf",
  "stored_filename": "abc-123.pdf",
  "duplicate_of": {
    "duplicate_type": "exact",
    "original_file_id": 42,
    "original_filename": "invoice.pdf",
    "message": "This file is an exact duplicate of an already-processed document. It has not been queued for processing again."
  }
}

Near-Duplicate Detection (Content Similarity)

Near-duplicate detection catches documents that contain the same content but carry different SHA-256 hashes — for example, the same letter scanned twice on different days.

After OCR processes a document, its extracted text is converted to a vector embedding using the configured AI provider. The cosine similarity between two documents' embeddings reflects how semantically similar their content is.

Variable Description Default
NEAR_DUPLICATE_THRESHOLD Minimum cosine similarity (0–1) for two documents to be considered near-duplicates. With 0.85, pairs scoring 0.85 or higher are flagged. 0.85
EMBEDDING_MODEL Model name for generating text embeddings via the OpenAI-compatible API. Must be supported by the endpoint configured with OPENAI_BASE_URL. text-embedding-3-small
EMBEDDING_MAX_TOKENS Maximum tokens to send to the embedding model. Text is truncated to approximately this many tokens before calling the API. Set below the model's context window (e.g. 8 000 for an 8 192-token model). 8000

Near-duplicate detection:

  • Embeddings are computed automatically during document ingestion as a processing step ("Compute Embedding").
  • A periodic backfill task (every 5 minutes) picks up any files that were processed before the embedding pipeline was enabled.
  • The Similarity dashboard (/similarity) shows all pairs of documents above the threshold, ranked by score.
  • The Duplicates management page (/duplicates → "Near-Duplicate Finder" tab) allows per-file lookup.
  • Debug endpoints are available to inspect embedding status and trigger recomputation (see API docs).
  • Documents without OCR text cannot be compared and are excluded from results.

A score of ≥ 0.90 reliably identifies the same document scanned twice. A score of 0.70–0.90 suggests partial content overlap. Adjust NEAR_DUPLICATE_THRESHOLD to tune sensitivity.
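The threshold comparison itself is plain cosine similarity. A minimal sketch, assuming two embeddings of equal length are already available as lists of floats; the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat it as dissimilar to everything.
    return dot / norm if norm else 0.0


def is_near_duplicate(a: list[float], b: list[float], threshold: float = 0.85) -> bool:
    """Apply NEAR_DUPLICATE_THRESHOLD to a pair of embeddings."""
    return cosine_similarity(a, b) >= threshold
```

Raising the threshold reduces false positives at the cost of missing looser matches; lowering it does the opposite.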

PDF/A Archival Conversion

DocuElevate can optionally generate PDF/A archival copies of both the original ingested file and the processed file. PDF/A copies are saved as parallel variants alongside the standard files; they do not replace the originals. This provides better legal coverage by producing self-contained archival documents, optionally with RFC 3161 timestamps, suitable for long-term storage and compliance.

The conversion uses ocrmypdf (backed by Ghostscript), which is already bundled in the Docker images.

Note: PDF/A conversion may alter font rendering, especially for OCR text overlays produced by Microsoft Azure Document Intelligence. This is expected and is why PDF/A copies are kept as parallel variants rather than replacements.

Variable Description Default
ENABLE_PDFA_CONVERSION Enable PDF/A archival variant generation for both original and processed files. false
PDFA_FORMAT PDF/A format variant: 1 (PDF/A-1b), 2 (PDF/A-2b), 3 (PDF/A-3b). 2
PDFA_UPLOAD_ORIGINAL Upload the original-file PDF/A variant to all configured storage providers. false
PDFA_UPLOAD_PROCESSED Upload the processed-file PDF/A variant to all configured storage providers. false
PDFA_UPLOAD_FOLDER Subfolder name appended to each provider's folder for PDF/A uploads. pdfa
GOOGLE_DRIVE_PDFA_FOLDER_ID Google Drive folder ID for PDF/A uploads (uses folder IDs, not paths). Empty = use default folder. (empty)
PDFA_TIMESTAMP_ENABLED Enable RFC 3161 timestamping of PDF/A files (creates .tsr proof-of-existence files). false
PDFA_TIMESTAMP_URL URL of the RFC 3161 Timestamp Authority. https://freetsa.org/tsr

Storage Layout

When enabled, PDF/A copies are stored under workdir/pdfa/:

workdir/
├── original/          # Immutable copy of ingested file
├── processed/         # Processed file with embedded metadata
├── pdfa/
│   ├── original/      # PDF/A copy of the ingested file
│   │   └── *.pdf.tsr  # RFC 3161 timestamps (when timestamping enabled)
│   └── processed/     # PDF/A copy of the processed file (with -PDFA suffix)
│       └── *.pdf.tsr  # RFC 3161 timestamps (when timestamping enabled)
└── tmp/               # Temporary processing area

Per-Provider Folder Overrides

When uploading PDF/A files to storage providers, DocuElevate appends the PDFA_UPLOAD_FOLDER value as a subfolder to each provider's configured folder. For example:

Provider Regular Folder PDF/A Upload Folder
Dropbox /Documents /Documents/pdfa
S3 docs/uploads/ docs/uploads/pdfa/
Nextcloud /Files /Files/pdfa
OneDrive Documents/Uploads Documents/Uploads/pdfa
SharePoint Uploads Uploads/pdfa
Google Drive (folder ID) GOOGLE_DRIVE_PDFA_FOLDER_ID

Set PDFA_UPLOAD_FOLDER to an empty string to upload PDF/A files into the same folder as regular uploads.
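The folder derivation in the table can be modelled with a small path helper. This is a sketch of the documented behaviour for path-based providers only (Google Drive uses a folder ID instead); `pdfa_folder` is a hypothetical name.

```python
def pdfa_folder(provider_folder: str, pdfa_subfolder: str = "pdfa") -> str:
    """Append the PDF/A subfolder to a provider's configured folder."""
    sub = pdfa_subfolder.strip("/")
    if not sub:
        # Empty PDFA_UPLOAD_FOLDER: upload next to the regular files.
        return provider_folder
    if not provider_folder:
        return sub
    # Preserve a trailing slash, as used by S3-style key prefixes.
    trailing = "/" if provider_folder.endswith("/") else ""
    return provider_folder.rstrip("/") + "/" + sub + trailing
```

For instance, `pdfa_folder("/Documents")` gives `/Documents/pdfa`, matching the Dropbox row above.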

RFC 3161 Timestamping

When PDFA_TIMESTAMP_ENABLED=true, each PDF/A file is timestamped using the configured TSA (default: FreeTSA). This creates a .tsr file alongside each PDF/A file, providing cryptographic proof that the document existed at a specific point in time.

Requires openssl on the PATH (included in Docker images).
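For reference, a typical RFC 3161 round trip consists of building a timestamp query with `openssl ts` and posting it to the TSA. The sketch below only assembles the two command lines; it is not DocuElevate's internal invocation, and the use of `curl` for the HTTP step, the `.tsq` query file name, and the `timestamp_commands` helper are all assumptions made for illustration.

```python
def timestamp_commands(pdf_path: str, tsa_url: str = "https://freetsa.org/tsr"):
    """Return (build_query, send_query) argv lists for timestamping one file."""
    query_path = pdf_path + ".tsq"   # intermediate timestamp query (assumed name)
    reply_path = pdf_path + ".tsr"   # timestamp response stored next to the PDF/A file
    build_query = [
        "openssl", "ts", "-query",
        "-data", pdf_path,           # file whose hash is timestamped
        "-sha256",                   # digest algorithm for the query
        "-cert",                     # ask the TSA to include its certificate
        "-out", query_path,
    ]
    send_query = [
        "curl", "-sS",
        "-H", "Content-Type: application/timestamp-query",
        "--data-binary", "@" + query_path,
        "-o", reply_path,
        tsa_url,
    ]
    return build_query, send_query
```

Each list could be run with `subprocess.run(cmd, check=True)`. Only the file's hash is sent to the TSA, never the document itself.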

Other TSA options:

  • GlobalSign – enterprise, eIDAS qualified
  • DigiStamp – high assurance, legal
  • IdenTrust – legal, free with certificate purchase

Configuration Example

# Enable PDF/A archival copies
ENABLE_PDFA_CONVERSION=true

# Use PDF/A-2b format (default, recommended for most use cases)
PDFA_FORMAT=2

# Upload both original and processed PDF/A to providers
PDFA_UPLOAD_ORIGINAL=true
PDFA_UPLOAD_PROCESSED=true

# PDF/A files go into a 'pdfa' subfolder on each provider
PDFA_UPLOAD_FOLDER=pdfa

# Enable RFC 3161 timestamping via FreeTSA
PDFA_TIMESTAMP_ENABLED=true
PDFA_TIMESTAMP_URL=https://freetsa.org/tsr

Performance & Caching

DocuElevate automatically optimizes database access and uses Redis as a caching layer for frequently accessed data.

Database Indexes

On startup the application creates indexes on columns used for filtering, sorting, and joining in the file listing and status computation queries:

Table Column Purpose
files created_at Default sort order
files mime_type MIME type filter & dropdown
processing_logs file_id Log retrieval by file
processing_logs timestamp Log ordering
file_processing_steps status Status filter sub-queries

These indexes are created idempotently on every startup so no manual migration step is required.
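Idempotent index creation usually relies on `CREATE INDEX IF NOT EXISTS`. The sketch below demonstrates the idea against SQLite using the standard library; the index naming scheme (`ix_<table>_<column>`) and the `ensure_indexes` function are hypothetical, and the application's own startup code may differ.

```python
import sqlite3

# Table/column pairs from the list above.
INDEXES = [
    ("files", "created_at"),
    ("files", "mime_type"),
    ("processing_logs", "file_id"),
    ("processing_logs", "timestamp"),
    ("file_processing_steps", "status"),
]


def ensure_indexes(conn: sqlite3.Connection) -> None:
    """Create each index only if it is missing, so repeated calls are harmless."""
    for table, column in INDEXES:
        name = f"ix_{table}_{column}"  # hypothetical naming scheme
        conn.execute(f"CREATE INDEX IF NOT EXISTS {name} ON {table} ({column})")
    conn.commit()
```

Calling `ensure_indexes` on every startup leaves an already-indexed database unchanged.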

Redis Query Cache

When Redis is available (configured via REDIS_URL), DocuElevate caches selected query results to avoid redundant database round-trips:

Cache Key TTL Description
mime_types 120 s Distinct MIME types shown in the file-list filter dropdown

The cache is fail-open: if Redis is unreachable the application falls back to querying the database directly with no user-visible impact.
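A fail-open read-through cache can be sketched as below. The `cache` argument is assumed to expose `get` and `setex` the way a redis-py client does; `cached_query` and `load_from_db` are hypothetical names, and the JSON serialisation is an illustrative choice.

```python
import json


def cached_query(cache, key: str, ttl_seconds: int, load_from_db):
    """Return the cached value if possible; otherwise query the database."""
    try:
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)
    except Exception:
        pass  # fail open: a cache outage must not break the request
    value = load_from_db()
    try:
        cache.setex(key, ttl_seconds, json.dumps(value))
    except Exception:
        pass  # a failed write only costs a future cache miss
    return value
```

A call such as `cached_query(client, "mime_types", 120, fetch_mime_types)` would mirror the cache entry in the table above, with `fetch_mime_types` standing in for the real query.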

Configuration Examples

Minimal Configuration

This is the minimal configuration needed to run DocuElevate with local storage only:

DATABASE_URL=sqlite:///./app/database.db
REDIS_URL=redis://redis:6379/0
WORKDIR=/workdir
GOTENBERG_URL=http://gotenberg:3000

Full Configuration with All Services

# Core settings
DATABASE_URL=sqlite:///./app/database.db
REDIS_URL=redis://redis:6379/0
WORKDIR=/workdir
GOTENBERG_URL=http://gotenberg:3000
EXTERNAL_HOSTNAME=docuelevate.example.com
ALLOW_FILE_DELETE=true

# IMAP settings
IMAP1_HOST=mail.example.com
IMAP1_PORT=993
IMAP1_USERNAME=user@example.com
IMAP1_PASSWORD=password
IMAP1_SSL=true
IMAP1_POLL_INTERVAL_MINUTES=5
IMAP1_DELETE_AFTER_PROCESS=false

# AI services
OPENAI_API_KEY=sk-...
AZURE_DOCUMENT_INTELLIGENCE_KEY=...
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://...

# Authentication
AUTH_ENABLED=true
SESSION_SECRET=a-very-long-and-secure-random-secret-key-string-for-session-encryption
ADMIN_USERNAME=admin
ADMIN_PASSWORD=your_secure_password
ADMIN_GROUP_NAME=admin
AUTHENTIK_CLIENT_ID=...
AUTHENTIK_CLIENT_SECRET=...
AUTHENTIK_CONFIG_URL=https://auth.example.com/.well-known/openid-configuration
OAUTH_PROVIDER_NAME=Authentik SSO

# Multi-user mode (requires AUTH_ENABLED=true)
MULTI_USER_ENABLED=false
DEFAULT_DAILY_UPLOAD_LIMIT=0

# Storage services
PAPERLESS_NGX_API_TOKEN=...
PAPERLESS_HOST=https://paperless.example.com

DROPBOX_APP_KEY=...
DROPBOX_APP_SECRET=...
DROPBOX_REFRESH_TOKEN=...
DROPBOX_FOLDER=/Documents/Uploads

NEXTCLOUD_UPLOAD_URL=https://nc.example.com/remote.php/dav/files/username
NEXTCLOUD_USERNAME=username
NEXTCLOUD_PASSWORD=password
NEXTCLOUD_FOLDER=/Documents/Uploads

# Google Drive
GOOGLE_DRIVE_CREDENTIALS_JSON={"type":"service_account","project_id":"..."}
GOOGLE_DRIVE_FOLDER_ID=1a2b3c4d5e6f7g8h9i0j
GOOGLE_DRIVE_DELEGATE_TO=optional-user@example.com
GOOGLE_DRIVE_USE_OAUTH=true
GOOGLE_DRIVE_CLIENT_ID=your_client_id
GOOGLE_DRIVE_CLIENT_SECRET=your_client_secret
GOOGLE_DRIVE_REFRESH_TOKEN=your_refresh_token

# WebDAV
WEBDAV_URL=https://webdav.example.com/path
WEBDAV_USERNAME=username
WEBDAV_PASSWORD=password
WEBDAV_FOLDER=/Documents/Uploads
WEBDAV_VERIFY_SSL=True

# FTP
FTP_HOST=ftp.example.com
FTP_PORT=21
FTP_USERNAME=username
FTP_PASSWORD=password
FTP_FOLDER=/Documents/Uploads
FTP_USE_TLS=True
FTP_ALLOW_PLAINTEXT=True

# SFTP
SFTP_HOST=sftp.example.com
SFTP_PORT=22
SFTP_USERNAME=username
SFTP_PASSWORD=password
SFTP_FOLDER=/Documents/Uploads
# SFTP_PRIVATE_KEY=/path/to/key.pem
# SFTP_PRIVATE_KEY_PASSPHRASE=passphrase

# Email (shared SMTP – password reset & verification)
EMAIL_HOST=smtp.example.com
EMAIL_PORT=587
EMAIL_USERNAME=docuelevate@example.com
EMAIL_PASSWORD=password
EMAIL_USE_TLS=True
EMAIL_SENDER=DocuElevate System <docuelevate@example.com>

# Email Destination (document delivery – separate from shared email above)
DEST_EMAIL_HOST=smtp.example.com
DEST_EMAIL_PORT=587
DEST_EMAIL_USERNAME=docuelevate@example.com
DEST_EMAIL_PASSWORD=password
DEST_EMAIL_USE_TLS=True
DEST_EMAIL_SENDER=DocuElevate Delivery <docuelevate@example.com>
DEST_EMAIL_DEFAULT_RECIPIENT=recipient@example.com

# Notification Settings
# Configure notification services using Apprise URL format
NOTIFICATION_URLS=discord://webhook_id/webhook_token,mailto://user:pass@gmail.com,tgram://bot_token/chat_id
NOTIFY_ON_TASK_FAILURE=True
NOTIFY_ON_CREDENTIAL_FAILURE=True
NOTIFY_ON_STARTUP=True
NOTIFY_ON_SHUTDOWN=False

# OneDrive (Personal Account)
ONEDRIVE_CLIENT_ID=12345678-1234-1234-1234-123456789012
ONEDRIVE_CLIENT_SECRET=your_client_secret
ONEDRIVE_TENANT_ID=common
ONEDRIVE_REFRESH_TOKEN=your_refresh_token
ONEDRIVE_FOLDER_PATH=Documents/Uploads

# SharePoint Online
SHAREPOINT_CLIENT_ID=12345678-1234-1234-1234-123456789012
SHAREPOINT_CLIENT_SECRET=your_client_secret
SHAREPOINT_TENANT_ID=your-tenant-id
SHAREPOINT_REFRESH_TOKEN=your_refresh_token
SHAREPOINT_SITE_URL=https://tenant.sharepoint.com/sites/sitename
SHAREPOINT_DOCUMENT_LIBRARY=Documents
SHAREPOINT_FOLDER_PATH=Uploads

# Amazon S3
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
AWS_REGION=us-east-1
S3_BUCKET_NAME=my-document-bucket
# Files are placed under this prefix
S3_FOLDER_PREFIX=documents/uploads/2023/
S3_STORAGE_CLASS=STANDARD
S3_ACL=private

# Uptime Kuma
UPTIME_KUMA_URL=https://kuma.example.com/api/push/abcde12345?status=up
UPTIME_KUMA_PING_INTERVAL=5

# Backup & Restore
BACKUP_ENABLED=True
BACKUP_DIR=/data/backups
# Options: s3, dropbox, google_drive, onedrive, nextcloud, webdav, ftp, sftp, email
BACKUP_REMOTE_DESTINATION=s3
BACKUP_REMOTE_FOLDER=backups
BACKUP_RETAIN_HOURLY=96
BACKUP_RETAIN_DAILY=21
BACKUP_RETAIN_WEEKLY=13

Selective Service Configuration

You can choose which document storage services to use by including only the relevant environment variables. For example, if you only want to use Dropbox, set the Dropbox variables and omit the variables for every other provider (Paperless NGX, Nextcloud, and so on).

System Reset / Factory Reset

DocuElevate provides two mechanisms for resetting the system to a clean state. Both are disabled by default and must be explicitly enabled.

Automatic Reset on Startup

Set FACTORY_RESET_ON_STARTUP=true to wipe all user data (database rows and work-files) every time the application starts. This is useful for demo, testing, or ephemeral environments where you always want a fresh instance.

FACTORY_RESET_ON_STARTUP=true

Warning: This destroys all documents, processing history, audit logs, and backups on every restart. Application settings and configuration are preserved.

Admin UI Reset Page

Set ENABLE_FACTORY_RESET=true to display the System Reset page in the admin navigation menu. From this page, administrators can:

Action Confirmation Description
Full Reset Type DELETE Wipes all database rows and work-files. The system returns to its initial state.
Reset & Re-import Type REIMPORT Copies original files to a reimport/ folder inside the workdir, wipes everything, then configures the reimport folder as a watch folder so files are automatically re-ingested with the same processing pipeline, rate limits, and backoff strategy as regular uploads.

ENABLE_FACTORY_RESET=true

API Endpoints

When ENABLE_FACTORY_RESET=true, three admin-only API endpoints are available:

  • POST /api/admin/system-reset/full — body: {"confirmation": "DELETE"}
  • POST /api/admin/system-reset/reimport — body: {"confirmation": "REIMPORT"}
  • GET /api/admin/system-reset/status — returns current feature-flag state
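The two POST requests can be prepared with the standard library as sketched below. The helper only builds the request object and does not send it; `reset_request` is a hypothetical name, and admin authentication (session cookie or token) is deliberately left out because this guide does not describe it.

```python
import json
import urllib.request

# Confirmation strings required by each endpoint.
CONFIRMATIONS = {"full": "DELETE", "reimport": "REIMPORT"}


def reset_request(base_url: str, mode: str) -> urllib.request.Request:
    """Build the POST request for a full reset or a reset with re-import."""
    body = json.dumps({"confirmation": CONFIRMATIONS[mode]}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/admin/system-reset/{mode}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` would trigger the destructive reset, so the confirmation string acts as a last safeguard against accidental calls.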

What Gets Deleted

Deleted Preserved
All document records (files table) Application settings (application_settings table)
Processing logs and steps User accounts and profiles
Audit logs Subscription plans
Backup records Pipelines and scheduled jobs
Original, processed, and temporary files The workdir directory itself
Watch-folder caches and ingestion state OAuth and integration configuration

Configuration File Location

The .env file should be placed at the root of the project directory. When using Docker Compose, you can reference it with the env_file directive in your docker-compose.yml.