en) and French (fr) documents, with automatic language detection (auto) enabled by default. Use the language form field to influence OCR accuracy and extraction quality.
The language parameter
| Value | Behaviour |
|---|---|
| auto | Folio detects the document language automatically. This is the default. |
| en | Force English OCR and extraction. |
| fr | Force French OCR and extraction. |
Set the value through the language form field on the request.
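A minimal sketch of validating the value before it is sent, assuming the form field is named language as described above. The helper name `build_form_fields` is hypothetical and not part of Folio; the sketch only builds the field dictionary and makes no request, because the endpoint and client are not described here.

```python
from typing import Dict

# Values taken from the table above.
SUPPORTED_LANGUAGES = ("auto", "en", "fr")


def build_form_fields(language: str = "auto") -> Dict[str, str]:
    """Return the form fields for a submission (hypothetical helper)."""
    normalized = language.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language {language!r}; "
            f"expected one of {SUPPORTED_LANGUAGES}"
        )
    return {"language": normalized}


fields = build_form_fields("fr")
print(fields)  # {'language': 'fr'}
```

The returned dictionary can be merged into whatever form data your HTTP client sends alongside the document. Rejecting unknown values locally surfaces typos before a submission is made.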
When to use auto vs. explicit
Use auto (default) when:
- Your document corpus contains a mix of English and French documents and you don’t know the language at submission time.
- You want Folio to handle language detection without adding any logic on your side.
Use an explicit value (en or fr) when:
- You know the document language in advance and want to avoid a small detection overhead.
- OCR accuracy for a specific language is critical and you want to eliminate any ambiguity from the classifier.
- Documents contain a mix of characters but should be treated as one primary language (e.g. an English form with a few French words).
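The choice between auto and an explicit value can be reduced to a small rule on the client side. The sketch below assumes the caller may already know the document language, for example from upload metadata. The function name `choose_language` and its `known_language` input are hypothetical.

```python
from typing import Optional

EXPLICIT_LANGUAGES = ("en", "fr")


def choose_language(known_language: Optional[str] = None) -> str:
    """Pick a value for the language field.

    known_language is whatever the caller already knows about the
    document (hypothetical input); None means the language is unknown.
    """
    if known_language is None:
        return "auto"
    code = known_language.strip().lower()
    # Anything outside the supported explicit values falls back to detection.
    return code if code in EXPLICIT_LANGUAGES else "auto"


print(choose_language("FR"))  # fr
print(choose_language(None))  # auto
```

Falling back to auto for unrecognised codes is a design choice in this sketch. A stricter client could raise an error instead.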
Bilingual documents
Some documents (e.g. Canadian government forms, Quebec-regulated health records) contain content in both English and French. In these cases:
- Use auto: Folio will detect the dominant language and apply the best OCR model for that language. Fields in the secondary language are still extracted with reasonable accuracy.
- Extraction field values are returned in whichever language they appear in the source document. Folio does not translate values.
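How well a given setting handles bilingual documents can be checked by running the same sample under each language value and comparing per-field confidence. The sketch below assumes each result has already been reduced to a dictionary of field name to confidence (a float between 0 and 1). That shape and both function names are hypothetical, not Folio's documented response format.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical shape: one {field name: confidence} dict per document.
FieldScores = Dict[str, float]


def mean_confidence_by_field(results: List[FieldScores]) -> Dict[str, float]:
    """Average each field's confidence across a sample of documents."""
    collected: Dict[str, List[float]] = {}
    for scores in results:
        for field, confidence in scores.items():
            collected.setdefault(field, []).append(confidence)
    return {field: mean(values) for field, values in collected.items()}


def best_setting_per_field(runs: Dict[str, List[FieldScores]]) -> Dict[str, str]:
    """For each field, name the setting with the highest mean confidence."""
    averages = {name: mean_confidence_by_field(r) for name, r in runs.items()}
    fields = {f for per_field in averages.values() for f in per_field}
    return {
        field: max(
            (name for name in averages if field in averages[name]),
            key=lambda name: averages[name][field],
        )
        for field in sorted(fields)
    }
```

Here `runs` maps a language value ("auto", "en", or "fr") to the results collected for the sample under that setting. If one setting wins for most fields, it is a reasonable candidate for the explicit value.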
If bilingual accuracy is critical for your use case, test both auto and each explicit language against a representative sample of your documents and compare the per-field confidence scores in the result.
Language and extraction schemas
Language does not affect how extraction schemas are defined or applied. Schema key names, type constraints, and pattern checks are language-agnostic. When using hint values in a schema, writing the hint in the same language as the target documents may improve extraction accuracy.