Folio

Folio turns unstructured documents into structured data. Upload a file, Folio runs an OCR → classify → extract pipeline, and you retrieve a typed JSON result via polling or webhook.

What it does — reads PDFs, images, and scanned documents; identifies the document type; and pulls out named fields with per-field confidence scores.
Who it’s for — health-tech teams that need reliable, privacy-preserving extraction from clinical and administrative documents at scale.
Where it runs — Azure Canada Central, Law 25-aligned, with optional de-identification built into the pipeline.

How the pipeline works

Upload (POST /v1/documents)
        ↓
   OCR  →  Classify  →  Extract
        ↓
Result available (GET /v1/documents/{id}/result)

Processing is asynchronous. POST /v1/documents returns 202 with a document id almost immediately, before the pipeline has finished. The result becomes available seconds to minutes later, depending on document size and pipeline configuration.
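As a rough illustration of the polling side of this flow, here is a minimal Python sketch. Only the path /v1/documents/{id}/result comes from this page. The base URL, the Bearer header, the helper names (result_url, fetch_json, wait_for_result), and the rule that 202 or 404 means "not ready yet" are all assumptions made for the example; check the API reference for the actual behaviour.

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical base URL; substitute the one from your Folio account.
BASE_URL = "https://api.folio.example"


def result_url(document_id):
    """Build the result path shown in the pipeline diagram."""
    return f"{BASE_URL}/v1/documents/{document_id}/result"


def fetch_json(url, api_key):
    """GET a URL and return (status_code, parsed_body_or_None).

    The Bearer header is an assumption; check the Authentication page.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            raw = response.read()
            return response.status, (json.loads(raw) if raw else None)
    except urllib.error.HTTPError as error:
        return error.code, None


def wait_for_result(document_id, fetch, interval=2.0, timeout=300.0,
                    sleep=time.sleep):
    """Poll the result endpoint until it returns 200 or the timeout passes.

    Assumption: 200 means the result is ready, and 202 or 404 means
    "not ready yet". Any other status is treated as an error.
    """
    deadline = time.monotonic() + timeout
    while True:
        status, body = fetch(result_url(document_id))
        if status == 200:
            return body
        if status not in (202, 404):
            raise RuntimeError(f"unexpected status {status} for {document_id}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result for {document_id} after {timeout}s")
        sleep(interval)
```

With a document id from the upload response, a call might look like wait_for_result(doc_id, lambda url: fetch_json(url, api_key)). Passing the fetch function in keeps the loop testable without network access. For long-running documents, a webhook avoids polling altogether.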

How these docs are organised

Section	What you’ll find
Get started	Quickstart walkthrough, API key authentication
Guides	Async model, webhooks, custom extraction schemas, de-identification, confidence & HITL, language support
API reference	Full endpoint specs auto-generated from the OpenAPI schema

Start here

Quickstart

Submit your first document and retrieve a structured result in under five minutes.

Authentication

Learn how API keys work and how to keep them safe.

Async model

Understand the queued → processing → completed lifecycle and how to poll or subscribe to results.

Custom schemas

Define your own field list for any document type.
