OCR Token Calculation

OCR Tokens play a critical role in how documents are processed and converted into records inside Odoo. When you use an OCR Engine, token usage directly affects the processing cost, performance, and scalability of your document automation workflows.

In this OCR workflow, tokens are consumed specifically for text processing. This includes both the instructions sent to the OCR Engine and the text extracted from the document. Together, these are referred to as OCR Tokens.

OCR Tokens represent the total amount of textual data that the OCR Engine processes and generates during a single OCR request. By understanding how OCR tokens work, businesses can better estimate processing costs, manage extracted data size, and design efficient and predictable OCR workflows within Odoo.


Understanding OCR Token Consumption

OCR token usage is not a fixed or straightforward calculation. It varies because every document differs in structure, format, and content. Even documents that look similar may require very different levels of processing effort.

The OCR Engine performs more than basic character reading. It analyzes multiple elements within a document, including text positioning, tables, field labels, and overall layout structure. This deep analysis allows the system to convert unstructured document data into structured Odoo records with high accuracy.

OCR token usage includes:

  • Text instructions and requests sent to the OCR Engine
  • Text extracted from the document as OCR output

Both inputs and outputs are processed together and counted as OCR Tokens.

Because of this approach, two documents with the same number of pages can consume different numbers of tokens. A document with clean formatting and minimal tables may require fewer tokens, while another document with dense text, complex tables, or inconsistent layouts may consume significantly more.
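To illustrate why page count alone does not determine token usage, here is a minimal Python sketch. It assumes the rough "1 token is approximately 4 characters" guideline for English text and looks only at extracted text. The OCR Engine's actual count also includes instructions and depends on processing effort, so real numbers will differ. The helper names and sample documents are hypothetical.

```python
# Sketch only: assumes ~4 characters per token and counts extracted text alone
# (no instructions, no layout or table analysis).

def rough_tokens(text: str) -> int:
    """Hypothetical helper: estimate tokens as ceil(len(text) / 4)."""
    return -(-len(text) // 4)  # integer ceiling division

def estimate_document_tokens(pages: list[str]) -> int:
    """Hypothetical helper: sum the rough estimate over every page."""
    return sum(rough_tokens(page) for page in pages)

# Two hypothetical 2-page documents; each string is the text of one page.
simple_invoice = [
    "Invoice INV-001\nTotal: 120.00",
    "Thank you for your business.",
]
dense_invoice = [
    "Invoice INV-002\n"
    + "\n".join(f"Item {i} | Qty 2 | Unit 15.00 | Tax 15% | 34.50" for i in range(1, 21)),
    "Terms and conditions: " + "payment is due within 30 days. " * 20,
]

for name, pages in [("simple", simple_invoice), ("dense", dense_invoice)]:
    print(f"{name}: {len(pages)} pages, ~{estimate_document_tokens(pages)} tokens")
```

Both documents have two pages, but the one with a dense table and long terms produces a far larger estimate.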


How Token Usage Is Determined

OCR Token calculation is dynamic and usage-based, not formula-driven. Instead of simply counting characters or pages, the system evaluates the actual processing effort required to convert a document into a valid Odoo record.

What Is an OCR Token?

An OCR Token is a unit used to measure the amount of text processed internally by the OCR Engine. Tokens are not the same as characters or words; they are smaller text segments, often parts of words, used for efficient processing.

General guidelines for English text:

  • 1 token is approximately equal to 4 characters
  • 1 token is approximately equal to 0.75 words

These values are estimates. Actual token usage may vary depending on formatting, content structure, and processing complexity.
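The two rules of thumb above can be turned into a quick estimate before a document is sent for processing. This is a minimal Python sketch, not the OCR Engine's tokenizer. The function and constant names are hypothetical, and the two estimates will usually differ slightly, as expected for approximations.

```python
import math

# Rules of thumb for English text (approximations, not the engine's tokenizer).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75

def estimate_tokens_from_chars(text: str) -> int:
    """Hypothetical helper: about 1 token per 4 characters, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_tokens_from_words(text: str) -> int:
    """Hypothetical helper: about 0.75 words per token, rounded up."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)

sample = "Please extract the vendor name, invoice date, and total amount."
print("by characters:", estimate_tokens_from_chars(sample))
print("by words:", estimate_tokens_from_words(sample))
```

Rounding up keeps the estimate on the conservative side. The exact figure is the one reported in the OCR Dashboard.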

How OCR Token Calculation Works

OCR token usage is calculated based on the total text processed by the OCR Engine, including:

  • OCR instructions and request messages
  • Text extracted from the document as OCR output

These together form the total OCR Tokens consumed during processing.
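Because instructions and extracted text are counted together, a per-request usage record needs only two fields. The following minimal Python sketch models that idea. `OcrRequestUsage`, `total_for_batch`, and their field names are hypothetical and are not part of Odoo or any OCR Engine API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OcrRequestUsage:
    """Hypothetical record of the OCR Tokens consumed by a single request."""
    instruction_tokens: int  # instructions and request messages sent to the engine
    output_tokens: int       # text extracted from the document (OCR output)

    @property
    def total_tokens(self) -> int:
        # Inputs and outputs are counted together as OCR Tokens.
        return self.instruction_tokens + self.output_tokens

def total_for_batch(requests: list[OcrRequestUsage]) -> int:
    """Hypothetical helper: sum OCR Tokens over several requests."""
    return sum(r.total_tokens for r in requests)

batch = [OcrRequestUsage(50, 250), OcrRequestUsage(50, 900)]
print(batch[0].total_tokens, total_for_batch(batch))
```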

Factors Considered During OCR Token Calculation

While calculating OCR token usage, the OCR Engine evaluates:

  • Number of pages and their complexity
  • Quantity and structure of tables
  • Density of textual content
  • Field extraction and mapping requirements
  • Validation and error handling effort

This ensures that token usage reflects the actual processing work rather than a rigid rule.


OCR Token Calculation Example

Example scenario:

  • OCR instruction text: 200 characters, approximately 50 tokens
  • Extracted OCR text: 1,000 characters, approximately 250 tokens

Total OCR Tokens consumed:

50 + 250 = 300 OCR Tokens
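The arithmetic in this example can be reproduced with a few lines of Python. This minimal sketch uses the same "about 4 characters per token" assumption and estimates tokens from character counts only. The helper name `approx_tokens` is hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # approximate, English text

def approx_tokens(char_count: int) -> int:
    """Hypothetical helper: estimate tokens from a character count, rounded up."""
    return math.ceil(char_count / CHARS_PER_TOKEN)

instruction_chars = 200   # OCR instruction text
extracted_chars = 1_000   # extracted OCR text

instruction_tokens = approx_tokens(instruction_chars)  # 50
extracted_tokens = approx_tokens(extracted_chars)      # 250
total = instruction_tokens + extracted_tokens          # 300

print(f"{instruction_tokens} + {extracted_tokens} = {total} OCR Tokens")
```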

For transparency and tracking, the exact OCR token usage is shown in your OCR Dashboard.

Note: This OCR token calculation methodology applies only to the Standard OCR Plan.