feat(fusion_accounting_ocr): pluggable OCR for vendor bills

Replaces Enterprise's account_invoice_extract with a Fusion-native pipeline:

Stage 1 (text extraction): Tesseract OCRs the bill attachment via
pytesseract + pdf2image. Pluggable OCRProvider adapter pattern allows
future Mindee / Google Document AI / Ollama-vision backends.

Stage 2 (field parsing): The fusion_accounting_ai LLMProvider reads the
raw OCR text and returns structured invoice fields (vendor, invoice
number, dates, amounts, line items) as JSON.

Draft invoice fields are auto-populated for empty-only fields (never
overwriting user-entered data). Vendor matching by name against
res.partner with supplier_rank > 0.

Adds:
- account.move.ocr_state (selection: not_requested/pending/processing/
  done/failed/manual)
- account.move.ocr_raw_text, ocr_extracted_data (Json), ocr_backend,
  ocr_confidence
- fusion.ocr.log (audit trail per OCR run)
- res.company.fusion_ocr_enabled / fusion_ocr_default_backend / auto_run
- /fusion/ocr/request_for_invoice JSON-RPC endpoint

Backend availability detected at runtime via OCRProvider.is_available()
classmethods. Tesseract 5.3.4 + pytesseract 0.3.13 + pdf2image 1.17.0 are
installed in the container.

Tests: 13 (TesseractAdapter availability + image OCR; flow tests for
draft autofill, no-attachment guard, customer-invoice guard,
ref-not-overwritten; field parser empty/clean-json/markdown-fence/
bad-JSON/provider-exception). All pass on westin-v19 OrbStack VM.

Made-with: Cursor
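
The adapter pattern and runtime availability check described in the commit message could look roughly like the sketch below. Only the names OCRProvider, TesseractAdapter and is_available() come from the message; the `name` attribute, the `extract_text` method, the `pick_backend` helper and its fall-back-to-first-available behaviour are assumptions made for illustration, not the module's actual code.

```python
import importlib.util
import io
import shutil
from abc import ABC, abstractmethod


class OCRProvider(ABC):
    """Hypothetical base class for OCR backends."""

    name = "base"  # assumed identifier, e.g. the value stored in ocr_backend

    @classmethod
    @abstractmethod
    def is_available(cls) -> bool:
        """Report whether this backend can run in the current environment."""

    @abstractmethod
    def extract_text(self, data: bytes, mimetype: str) -> str:
        """Return the raw text found in a PDF or image attachment."""


class TesseractAdapter(OCRProvider):
    name = "tesseract"

    @classmethod
    def is_available(cls) -> bool:
        # Both the Python bindings and the tesseract binary must be present.
        modules = ("pytesseract", "pdf2image", "PIL")
        has_modules = all(importlib.util.find_spec(m) is not None for m in modules)
        return has_modules and shutil.which("tesseract") is not None

    def extract_text(self, data: bytes, mimetype: str) -> str:
        # Imported lazily so the class can be defined without the libraries.
        import pytesseract
        from pdf2image import convert_from_bytes
        from PIL import Image

        if mimetype == "application/pdf":
            pages = convert_from_bytes(data)  # one PIL image per PDF page
        else:
            pages = [Image.open(io.BytesIO(data))]
        return "\n".join(pytesseract.image_to_string(page) for page in pages)


def pick_backend(preferred, providers):
    """Return the preferred provider class if it is available, otherwise
    the first available one, otherwise None (assumed fallback policy)."""
    by_name = {p.name: p for p in providers}
    chosen = by_name.get(preferred)
    if chosen is not None and chosen.is_available():
        return chosen
    return next((p for p in providers if p.is_available()), None)
```

Keeping is_available() a classmethod means a backend can be probed without instantiating it, so a future Mindee or Ollama-vision adapter would only need to subclass OCRProvider and be added to the list passed to the selection helper.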
39
fusion_accounting_ocr/__manifest__.py
Normal file
@@ -0,0 +1,39 @@
{
    'name': 'Fusion Accounting — Invoice OCR',
    'version': '19.0.1.0.0',
    'category': 'Accounting/Accounting',
    'summary': 'OCR for vendor bills via tesseract + LLM-driven field extraction.',
    'description': """
Fusion Accounting — Invoice OCR
================================
Replaces Enterprise's account_invoice_extract with a Fusion-native pipeline:

1. Tesseract OCRs the bill attachment (PDF or image) into raw text
2. The fusion_accounting_ai LLMProvider parses the raw text into structured
   fields (vendor, invoice number, dates, amounts, line items)
3. Draft invoice fields are populated for the AP user to confirm

Pluggable backend architecture: future Mindee, Google Document AI, or
Ollama-vision adapters can be dropped in alongside the default tesseract
adapter.
    """,
    'icon': '/fusion_accounting_ocr/static/description/icon.png',
    'author': 'Westin / Fusion Suite',
    'depends': [
        'fusion_accounting_core',
        'fusion_accounting_ai',
        'account',
    ],
    'external_dependencies': {
        'python': ['pytesseract', 'pdf2image', 'PIL'],
    },
    'data': [
        'security/ir.model.access.csv',
        'views/account_move_views.xml',
        'views/res_config_settings_views.xml',
    ],
    'auto_install': False,
    'installable': True,
    'application': False,
    'license': 'LGPL-3',
}
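
Steps 2 and 3 of the pipeline in the manifest description (parse the LLM reply into fields, then populate only empty draft fields) can be sketched as below. This is a minimal illustration under stated assumptions, not the module's implementation: the function names `parse_llm_fields`, `extract_fields` and `autofill_empty`, the `complete` callable standing in for the LLMProvider, and the choice of `ref`, `invoice_date` and `invoice_date_due` as target keys are all hypothetical. The cases handled mirror the field parser tests listed in the commit message (empty reply, clean JSON, markdown-fenced JSON, bad JSON, provider exception).

```python
import json

FENCE = "`" * 3  # markdown code-fence marker some LLMs wrap JSON in


def parse_llm_fields(response):
    """Return the parsed field dict, or {} when the reply is unusable."""
    if not response or not response.strip():
        return {}
    text = response.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with any language tag) and the
        # closing fence line, keeping only the payload in between.
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def extract_fields(complete, raw_text):
    """Call the (hypothetical) LLM callable and never let it raise."""
    try:
        return parse_llm_fields(complete(raw_text))
    except Exception:
        # A provider failure yields no fields rather than aborting the flow.
        return {}


def autofill_empty(current, extracted,
                   allowed=("ref", "invoice_date", "invoice_date_due")):
    """Return only the values that may be written: the key is allowed,
    the current value is empty, and the extracted value is non-empty."""
    return {
        key: extracted[key]
        for key in allowed
        if not current.get(key) and extracted.get(key)
    }
```

For example, with a draft bill whose `ref` was already typed in by the user, `autofill_empty({"ref": "MANUAL"}, {"ref": "INV-1", "invoice_date": "2025-01-31"})` returns only the `invoice_date` entry, so the user-entered reference is left untouched.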