Replaces Enterprise's account_invoice_extract with a Fusion-native pipeline: Stage 1 (text extraction): Tesseract OCRs the bill attachment via pytesseract + pdf2image. Pluggable OCRProvider adapter pattern allows future Mindee / Google Document AI / Ollama-vision backends. Stage 2 (field parsing): The fusion_accounting_ai LLMProvider reads the raw OCR text and returns structured invoice fields (vendor, invoice number, dates, amounts, line items) as JSON. Draft invoice fields are auto-populated for empty-only fields (never overwriting user-entered data). Vendor matching by name against res.partner with supplier_rank > 0. Adds: - account.move.ocr_state (selection: not_requested/pending/processing/ done/failed/manual) - account.move.ocr_raw_text, ocr_extracted_data (Json), ocr_backend, ocr_confidence - fusion.ocr.log (audit trail per OCR run) - res.company.fusion_ocr_enabled / fusion_ocr_default_backend / auto_run - /fusion/ocr/request_for_invoice JSON-RPC endpoint Backend availability detected at runtime via OCRProvider.is_available() classmethods. Tesseract 5.3.4 + pytesseract 0.3.13 + pdf2image 1.17.0 are installed in the container. Tests: 13 (TesseractAdapter availability + image OCR; flow tests for draft autofill, no-attachment guard, customer-invoice guard, ref-not- overwritten; field parser empty/clean-json/markdown-fence/bad-JSON/ provider-exception). All pass on westin-v19 OrbStack VM. Made-with: Cursor
44 lines
1.3 KiB
Python
44 lines
1.3 KiB
Python
"""Helper: turn an ir.attachment into a list of PIL.Image pages.
|
|
|
|
Kept separate from the adapters so future backends (Ollama-vision, Mindee)
|
|
that want PIL images directly don't have to re-implement the PDF rendering.
|
|
"""
|
|
|
|
import base64
|
|
import io
|
|
import logging
|
|
|
|
_logger = logging.getLogger(__name__)
|
|
|
|
|
|
def attachment_to_pages(attachment):
|
|
"""Decode an ir.attachment into a list of PIL.Image pages.
|
|
|
|
Returns ``[]`` on failure (caller should treat as no pages).
|
|
"""
|
|
try:
|
|
from PIL import Image
|
|
from pdf2image import convert_from_bytes
|
|
except ImportError as e:
|
|
_logger.warning("attachment_to_pages requires PIL + pdf2image: %s", e)
|
|
return []
|
|
|
|
if not attachment or not attachment.datas:
|
|
return []
|
|
|
|
try:
|
|
data = base64.b64decode(attachment.datas)
|
|
except Exception as e:
|
|
_logger.warning("Could not decode attachment %s: %s", attachment.id, e)
|
|
return []
|
|
|
|
mimetype = attachment.mimetype or ''
|
|
is_pdf = mimetype == 'application/pdf' or data[:4] == b'%PDF'
|
|
try:
|
|
if is_pdf:
|
|
return convert_from_bytes(data, dpi=200)
|
|
return [Image.open(io.BytesIO(data))]
|
|
except Exception as e:
|
|
_logger.warning("Could not render attachment %s: %s", attachment.id, e)
|
|
return []
|