Folio turns unstructured documents into structured data. Upload a file, Folio runs an OCR → classify → extract pipeline, and you retrieve a typed JSON result via polling or webhook.
  • What it does — reads PDFs, images, and scanned documents; identifies the document type; and pulls out named fields with per-field confidence scores.
  • Who it’s for — health-tech teams that need reliable, privacy-preserving extraction from clinical and administrative documents at scale.
  • Where it runs — Azure Canada Central, Law 25-aligned, with optional de-identification built into the pipeline.

How the pipeline works

Upload (POST /v1/documents)

   OCR  →  Classify  →  Extract

Result available (GET /v1/documents/{id}/result)

Processing is asynchronous. A POST /v1/documents call returns a 202 Accepted response with a document id almost immediately. The result becomes available seconds to minutes later, depending on document size and pipeline configuration.
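
The polling side of this flow can be sketched as follows. This is a minimal illustration, not the official client. The host name, the Bearer authorization header, and the "status" key in the response body are assumptions made for the example. Only the GET /v1/documents/{id}/result path and the "completed" state name come from these docs.

```python
import json
import time
import urllib.request

BASE_URL = "https://folio.example.invalid"  # placeholder host, not the real one


def fetch_result(document_id, api_key):
    """GET /v1/documents/{id}/result and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/documents/{document_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(document_id, fetch, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the document reports "completed" or the timeout elapses.

    `fetch` is any callable that takes a document id and returns a dict.
    The "status" key and its values are assumptions for this sketch.
    """
    waited = 0.0
    while True:
        body = fetch(document_id)
        if body.get("status") == "completed":
            return body
        if waited >= timeout:
            raise TimeoutError(f"document {document_id} not ready after {timeout}s")
        sleep(interval)
        waited += interval
```

Against a live API, a call might look like `wait_for_result(doc_id, lambda d: fetch_result(d, api_key))`. Passing `fetch` and `sleep` as parameters keeps the loop testable without network access. A webhook subscription removes the need for this loop entirely.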

How these docs are organised

Section | What you’ll find
Get started | Quickstart walkthrough, API key authentication
Guides | Async model, webhooks, custom extraction schemas, de-identification, confidence & HITL, language support
API reference | Full endpoint specs auto-generated from the OpenAPI schema

Start here

Quickstart

Submit your first document and retrieve a structured result in under five minutes.

Authentication

Learn how API keys work and how to keep them safe.

Async model

Understand the queued → processing → completed lifecycle and how to poll or subscribe to results.

Custom schemas

Define your own field list for any document type.