Odoo-Modules/fusion_accounting_ocr/__manifest__.py
gsinghpal 125f48377a feat(fusion_accounting_ocr): pluggable OCR for vendor bills
Replaces Enterprise's account_invoice_extract with a Fusion-native pipeline:

Stage 1 (text extraction): Tesseract OCRs the bill attachment via
pytesseract + pdf2image. Pluggable OCRProvider adapter pattern allows
future Mindee / Google Document AI / Ollama-vision backends.

Stage 2 (field parsing): The fusion_accounting_ai LLMProvider reads the
raw OCR text and returns structured invoice fields (vendor, invoice
number, dates, amounts, line items) as JSON.
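A sketch of how the LLM reply might be turned into a dict of fields, covering the cases the tests mention (empty reply, clean JSON, a markdown-fenced reply, invalid JSON). The function names `strip_fence` and `parse_llm_fields` are hypothetical, and returning an empty dict on failure is an assumption; the real parser may report errors differently.

```python
import json

# Built at runtime so the example does not embed a literal fence marker.
FENCE = "`" * 3


def strip_fence(text):
    """Remove a surrounding markdown code fence, if the reply has one."""
    text = text.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        inner = text[len(FENCE):-len(FENCE)]
        first_newline = inner.find("\n")
        if first_newline != -1:
            head = inner[:first_newline].strip()
            # Drop the first line only when it is empty or a language tag.
            if head == "" or head.isalnum():
                inner = inner[first_newline + 1:]
        return inner.strip()
    return text


def parse_llm_fields(raw):
    """Return the parsed fields, or {} when the reply is unusable."""
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(strip_fence(raw))
    except json.JSONDecodeError:
        return {}
    # Only a JSON object is a valid set of invoice fields.
    return data if isinstance(data, dict) else {}
```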

Draft invoice fields are auto-populated only where the field is empty,
so user-entered data is never overwritten. Vendors are matched by name
against res.partner records with supplier_rank > 0.
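The empty-only rule and the vendor lookup can be illustrated with plain dicts standing in for Odoo records. Everything here is a simplified assumption: `fill_empty_fields`, `match_vendor`, and `_is_empty` are hypothetical helpers, the exact-name, case-insensitive comparison is one possible matching rule, and the real module presumably goes through the ORM rather than Python lists.

```python
def _is_empty(value):
    """Treat None, False, and empty strings/containers as unset.

    A numeric 0 counts as a real value here; that is a choice made for
    this sketch, not something the commit message specifies.
    """
    return value is None or value is False or value == "" or value == [] or value == {}


def fill_empty_fields(current, extracted):
    """Return only the updates that would not overwrite existing data."""
    updates = {}
    for field, value in extracted.items():
        if _is_empty(value):
            continue  # nothing useful was extracted for this field
        if _is_empty(current.get(field)):
            updates[field] = value
    return updates


def match_vendor(partners, vendor_name):
    """Find the first supplier whose name equals `vendor_name`, ignoring case."""
    wanted = (vendor_name or "").strip().casefold()
    if not wanted:
        return None
    for partner in partners:
        is_supplier = partner.get("supplier_rank", 0) > 0
        if is_supplier and partner.get("name", "").strip().casefold() == wanted:
            return partner
    return None
```

For example, a bill whose `ref` was already typed in by the user keeps that value, while a blank `invoice_date` is filled from the extracted data.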

Adds:
- account.move.ocr_state (selection: not_requested/pending/processing/
  done/failed/manual)
- account.move.ocr_raw_text, ocr_extracted_data (Json), ocr_backend,
  ocr_confidence
- fusion.ocr.log (audit trail per OCR run)
- res.company.fusion_ocr_enabled / fusion_ocr_default_backend / auto_run
- /fusion/ocr/request_for_invoice JSON-RPC endpoint

Backend availability detected at runtime via OCRProvider.is_available()
classmethods. Tesseract 5.3.4 + pytesseract 0.3.13 + pdf2image 1.17.0
are installed in the container.
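One way such a runtime check could look, using only the standard library. The class name `TesseractAvailability`, its attributes, and the two helper functions are hypothetical; the actual adapter may probe its dependencies differently (for instance, pdf2image also relies on the poppler utilities, which this sketch does not check).

```python
import importlib.util
import shutil


def python_module_available(name):
    """True when `name` can be located without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def binary_available(name):
    """True when an executable called `name` is on PATH."""
    return shutil.which(name) is not None


class TesseractAvailability:
    """Illustrative availability probe for a Tesseract-based backend."""

    required_modules = ("pytesseract", "pdf2image", "PIL")
    required_binaries = ("tesseract",)

    @classmethod
    def is_available(cls):
        modules_ok = all(python_module_available(m) for m in cls.required_modules)
        binaries_ok = all(binary_available(b) for b in cls.required_binaries)
        return modules_ok and binaries_ok
```

Checking at call time rather than at import time lets the module load on hosts where Tesseract is missing and report the backend as unavailable instead of failing.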

Tests: 13 (TesseractAdapter availability + image OCR; flow tests for
draft autofill, no-attachment guard, customer-invoice guard, ref-not-
overwritten; field parser empty/clean-json/markdown-fence/bad-JSON/
provider-exception). All pass on westin-v19 OrbStack VM.

Made-with: Cursor
2026-04-20 00:32:50 -04:00


{
    'name': 'Fusion Accounting — Invoice OCR',
    'version': '19.0.1.0.0',
    'category': 'Accounting/Accounting',
    'summary': 'OCR for vendor bills via tesseract + LLM-driven field extraction.',
    'description': """
Fusion Accounting — Invoice OCR
================================
Replaces Enterprise's account_invoice_extract with a Fusion-native pipeline:
1. Tesseract OCRs the bill attachment (PDF or image) into raw text
2. The fusion_accounting_ai LLMProvider parses the raw text into structured
fields (vendor, invoice number, dates, amounts, line items)
3. Draft invoice fields are populated for the AP user to confirm
Pluggable backend architecture: future Mindee, Google Document AI, or
Ollama-vision adapters can be dropped in alongside the default tesseract
adapter.
""",
    'icon': '/fusion_accounting_ocr/static/description/icon.png',
    'author': 'Westin / Fusion Suite',
    'depends': [
        'fusion_accounting_core',
        'fusion_accounting_ai',
        'account',
    ],
    'external_dependencies': {
        'python': ['pytesseract', 'pdf2image', 'PIL'],
    },
    'data': [
        'security/ir.model.access.csv',
        'views/account_move_views.xml',
        'views/res_config_settings_views.xml',
    ],
    'auto_install': False,
    'installable': True,
    'application': False,
    'license': 'LGPL-3',
}