Replaces Enterprise's account_invoice_extract with a Fusion-native pipeline:

Stage 1 (text extraction): Tesseract OCRs the bill attachment via
pytesseract + pdf2image. A pluggable OCRProvider adapter pattern allows
future Mindee / Google Document AI / Ollama-vision backends.

Stage 2 (field parsing): The fusion_accounting_ai LLMProvider reads the raw
OCR text and returns structured invoice fields (vendor, invoice number,
dates, amounts, line items) as JSON.

Draft invoice fields are auto-populated only where they are empty, so
user-entered data is never overwritten. The vendor is matched by name
against res.partner records with supplier_rank > 0.

Adds:
- account.move.ocr_state (selection: not_requested/pending/processing/done/failed/manual)
- account.move.ocr_raw_text, ocr_extracted_data (Json), ocr_backend, ocr_confidence
- fusion.ocr.log (audit trail per OCR run)
- res.company.fusion_ocr_enabled / fusion_ocr_default_backend / auto_run
- /fusion/ocr/request_for_invoice JSON-RPC endpoint

Backend availability is detected at runtime via OCRProvider.is_available()
classmethods. Tesseract 5.3.4 + pytesseract 0.3.13 + pdf2image 1.17.0 are
installed in the container.

Tests: 13 (TesseractAdapter availability + image OCR; flow tests for draft
autofill, no-attachment guard, customer-invoice guard, ref-not-overwritten;
field parser empty/clean-json/markdown-fence/bad-JSON/provider-exception).
All pass on the westin-v19 OrbStack VM.

Made-with: Cursor
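The Stage 1 adapter pattern and the runtime `is_available()` check can be sketched as follows. This is a minimal illustration and not the module's code. `OCRResult`, `TesseractSketch`, `PROVIDERS` and `pick_provider` are hypothetical names. The fields of `OCRResult` mirror the attributes the test file asserts on (`backend`, `raw_text`, `pages`, `error`). The OCR calls themselves (`pdf2image.convert_from_bytes`, `pytesseract.image_to_string`) are the real library functions.

```python
import importlib.util
import io
import shutil
from dataclasses import dataclass
from typing import Optional


@dataclass
class OCRResult:
    # Hypothetical result type; fields mirror what the test file asserts on.
    backend: str
    raw_text: str = ''
    pages: int = 0
    error: str = ''


class OCRProvider:
    """Hypothetical base class: each backend reports availability and extracts text."""

    name = 'base'

    @classmethod
    def is_available(cls) -> bool:
        raise NotImplementedError

    def extract(self, data: bytes, mimetype: str) -> OCRResult:
        raise NotImplementedError


class TesseractSketch(OCRProvider):
    name = 'tesseract'

    @classmethod
    def is_available(cls) -> bool:
        # Python wrappers importable and the tesseract binary on PATH.
        # (pdf2image also needs poppler's pdftoppm to rasterize PDFs.)
        return (
            importlib.util.find_spec('pytesseract') is not None
            and importlib.util.find_spec('pdf2image') is not None
            and shutil.which('tesseract') is not None
        )

    def extract(self, data: bytes, mimetype: str) -> OCRResult:
        try:
            # Lazy imports so this module loads even when OCR libraries are absent.
            import pytesseract
            from PIL import Image

            if mimetype == 'application/pdf':
                from pdf2image import convert_from_bytes
                images = convert_from_bytes(data, dpi=300)  # one PIL image per page
            else:
                images = [Image.open(io.BytesIO(data))]
            text = '\n'.join(pytesseract.image_to_string(img) for img in images)
            return OCRResult(backend=self.name, raw_text=text, pages=len(images))
        except Exception as exc:
            # Report failures in the result rather than raising into the caller.
            return OCRResult(backend=self.name, error=str(exc) or type(exc).__name__)


# Registry of known backends; Mindee, Google Document AI or Ollama-vision
# adapters would be further OCRProvider subclasses registered here.
PROVIDERS = {TesseractSketch.name: TesseractSketch}


def pick_provider(preferred: Optional[str] = None) -> Optional[OCRProvider]:
    """Instantiate the preferred backend if available, else the first available one."""
    candidates = [PROVIDERS[preferred]] if preferred in PROVIDERS else []
    candidates += [cls for cls in PROVIDERS.values() if cls not in candidates]
    for cls in candidates:
        if cls.is_available():
            return cls()
    return None
```

In this sketch, the `preferred` argument stands in for what res.company.fusion_ocr_default_backend presumably selects. Because availability is a classmethod, the registry can be probed without constructing any backend.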
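Stage 2 has to cope with LLM replies that are empty, wrapped in a markdown fence, or not JSON at all. These are the field-parser cases the tests list. The sketch below assumes the LLM provider can be reduced to a callable that takes a prompt and returns text. `complete` and `parse_invoice_fields` are hypothetical stand-ins; the real fusion_accounting_ai LLMProvider interface is not shown on this page.

```python
import json
import re
from typing import Callable

# Matches a ```json ... ``` (or bare ```) fence wrapped around the whole reply.
_FENCE_RE = re.compile(r'^\s*```(?:json)?\s*(.*?)\s*```\s*$', re.DOTALL | re.IGNORECASE)


def parse_invoice_fields(raw_text: str, complete: Callable[[str], str]) -> dict:
    """Ask an LLM for invoice fields as JSON; return {} on any failure.

    `complete` is a hypothetical stand-in for the LLM provider call.
    """
    if not raw_text or not raw_text.strip():
        return {}  # nothing to parse, so skip the LLM call entirely
    prompt = (
        'Extract vendor, invoice number, dates, amounts and line items '
        'from this invoice text. Reply with JSON only.\n\n' + raw_text
    )
    try:
        reply = complete(prompt)
    except Exception:
        return {}  # a provider error must not break the invoice form
    match = _FENCE_RE.match(reply or '')
    payload = match.group(1) if match else (reply or '')
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return {}  # the model answered with something that is not JSON
    return data if isinstance(data, dict) else {}
```

Every failure path returns an empty dict, so the draft invoice is left untouched. That is one plausible reading of the bad-JSON and provider-exception tests.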
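The empty-only autofill rule and the supplier-restricted vendor lookup can be illustrated with plain dicts. Here the dicts stand in for the draft account.move values and res.partner records. This is a sketch under assumptions. The commit only says vendors are matched "by name", so the case-insensitive, whitespace-normalized exact match below is a guess. `match_vendor` and `autofill_empty` are hypothetical helpers. In the ORM, a comparable lookup might use a domain such as `[('name', '=ilike', name), ('supplier_rank', '>', 0)]`.

```python
from typing import Optional


def _norm(text: str) -> str:
    # Collapse runs of whitespace and ignore case for name comparison.
    return ' '.join((text or '').split()).casefold()


def match_vendor(name: str, partners: list[dict]) -> Optional[dict]:
    """Return the first supplier (supplier_rank > 0) whose normalized name matches."""
    wanted = _norm(name)
    if not wanted:
        return None
    for partner in partners:
        if partner.get('supplier_rank', 0) > 0 and _norm(partner['name']) == wanted:
            return partner
    return None


def autofill_empty(move: dict, extracted: dict) -> dict:
    """Return updates only for fields that are currently empty on the draft move."""
    updates = {}
    for field, value in extracted.items():
        if value in (None, '', []):
            continue  # nothing useful was extracted for this field
        if move.get(field):
            continue  # the user already entered a value: never overwrite it
        updates[field] = value
    return updates
```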
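A client call to the /fusion/ocr/request_for_invoice endpoint would use the JSON-RPC 2.0 envelope that Odoo JSON routes accept. The sketch below makes two assumptions, since the controller signature is not part of this page. First, it assumes the route takes a single `move_id` parameter. Second, it assumes authentication uses the browser `session_id` cookie. `build_request`, `unwrap_response` and `request_ocr` are hypothetical helpers.

```python
import json
import urllib.request
from itertools import count
from typing import Any, Optional

ENDPOINT = '/fusion/ocr/request_for_invoice'
_ids = count(1)  # monotonically increasing JSON-RPC request ids


def build_request(params: dict) -> dict:
    """Wrap params in the JSON-RPC 2.0 envelope used by Odoo JSON routes."""
    return {'jsonrpc': '2.0', 'method': 'call', 'params': params, 'id': next(_ids)}


def unwrap_response(body: dict) -> Any:
    """Return `result`, or raise if the server answered with an `error` object."""
    if 'error' in body:
        err = body['error']
        detail = (err.get('data') or {}).get('message')
        raise RuntimeError(detail or err.get('message', 'JSON-RPC error'))
    return body.get('result')


def request_ocr(base_url: str, move_id: int, session_id: Optional[str] = None) -> Any:
    # `move_id` is an assumed parameter name; check the actual controller.
    req = urllib.request.Request(
        base_url.rstrip('/') + ENDPOINT,
        data=json.dumps(build_request({'move_id': move_id})).encode(),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    if session_id:
        req.add_header('Cookie', f'session_id={session_id}')  # assumed auth method
    with urllib.request.urlopen(req, timeout=30) as resp:
        return unwrap_response(json.loads(resp.read().decode()))
```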
48 lines
1.6 KiB
Python
import io

from PIL import Image, ImageDraw

from odoo.tests import tagged
from odoo.tests.common import TransactionCase

from odoo.addons.fusion_accounting_ocr.services.ocr_providers.tesseract_adapter import (
    TesseractAdapter,
)


@tagged('post_install', '-at_install')
class TestTesseractAdapter(TransactionCase):

    def test_is_available(self):
        # In our container tesseract + pytesseract + pdf2image are pre-installed.
        self.assertTrue(TesseractAdapter.is_available())

    def test_extract_simple_text_image(self):
        # Generate a tiny PNG with the text "INVOICE 12345 Total $100".
        # Use a slightly larger image and try to load a TTF font for
        # tesseract reliability; fall back to default bitmap font otherwise.
        img = Image.new('RGB', (800, 120), color='white')
        draw = ImageDraw.Draw(img)
        try:
            from PIL import ImageFont
            font = ImageFont.truetype(
                '/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf', 36,
            )
        except Exception:
            font = None
        draw.text((20, 30), "INVOICE 12345 Total $100", fill='black', font=font)

        buf = io.BytesIO()
        img.save(buf, format='PNG')
        png_bytes = buf.getvalue()

        adapter = TesseractAdapter()
        result = adapter.extract(png_bytes, mimetype='image/png')

        self.assertEqual(result.backend, 'tesseract')
        self.assertEqual(result.error, '')
        self.assertEqual(result.pages, 1)
        self.assertGreater(len(result.raw_text), 0)
        # Tesseract should pick up the digits at minimum.
        self.assertIn('12345', result.raw_text.replace(' ', ''))