Home PIKA Docs Contact
Self-hosted OCR

Validate AI-extracted text with confidence.

VERA combines PaddleOCR with human review. Upload scanned documents, review every token, correct errors before they propagate, and export validated data. All on your infrastructure.

Features

OCR you can trust.

Raw OCR is never good enough for production. VERA puts a human in the loop — so you export data you've actually verified.

PaddleOCR engine

High-accuracy text extraction powered by PaddleOCR, running locally. Supports scanned documents, photos, and multi-page PDFs.

PDF JPEG / PNG Multi-page

Token-level review

Review every extracted token with confidence scores. Accept, correct, or reject individual words — not entire pages.

Confidence scoring Inline editing

AI-powered summaries

Ollama generates document summaries from validated text. Useful for quick triage and downstream workflows.

Local LLM Ollama

Async processing

Documents are processed in the background by Celery workers. Upload a batch, come back when they're ready. Stuck documents are automatically recovered.

Centralised auth via Hub

User accounts, roles, and licensing managed through the ai.doo Hub. No separate user database to maintain.

Verification-first design.

VERA is built around the principle that OCR output must be verified before it's trusted. The human corrects before export — not after errors have propagated through your pipeline.

Workflow

Upload. Review. Correct. Export.

Four steps from scanned document to validated, exportable data.

01

Upload

Drop scanned documents (PDF, JPEG, PNG) into VERA. Multi-page PDFs are split automatically.

02

OCR extraction

PaddleOCR processes each page in the background. Confidence scores are assigned to every token.

03

Human review

Review tokens side-by-side with the original scan. Correct low-confidence words, accept the rest.

04

Export

Export validated text. Only verified data leaves the system.

Architecture

Built for self-hosted deployment.

Runs entirely on your infrastructure via Docker Compose. No cloud dependency.

Backend

FastAPI + Celery workers for async OCR processing. PostgreSQL for structured data, Redis for task queues.

Frontend

Next.js application with a token-level review interface. Responsive, fast, production-built.

Infrastructure

Docker Compose deployment. Connects to shared Ollama instance for AI summaries. GPU-accelerated where available.

Pricing

Licensed per seat. Self-hosted.

VERA is licensed through the ai.doo Hub. No per-document fees, no cloud costs. You own your infrastructure.

Professional

Per-seat annual license

Full VERA deployment with Hub authentication, async processing, AI summaries, and ongoing updates.

  • Unlimited documents
  • PaddleOCR + AI summaries
  • Multi-user with Hub auth
  • Background processing (Celery)
  • Priority email support
Enterprise

Custom deployment

Bespoke integrations, custom export formats, dedicated support, and on-site deployment assistance.

  • Everything in Professional
  • Custom integrations
  • Dedicated support channel
  • Deployment assistance
  • SLA available

Ready to validate with confidence?

Email hello@aidoo.biz to discuss VERA for your organisation.