We are seeing incorrect results for the supplier name field. The OCR returns the brand name from the page logo instead of the legal entity that issued the document.
Example: on a receipt the header logo reads "Make", while the actual supplier (legal entity, tied to the address and US EIN 61-1797223) is "Celonis Inc.". The API returns "Make".
What we have already tried, without success:
Post-processing with a RAG step
Adding explicit field guidance instructing the model to prefer the legal entity near the address and tax ID and to ignore header logos
Neither resolves the issue. The model still favors the visually dominant logo over the legal name in the sender block.
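To make the expected behavior concrete, here is a minimal sketch of the preference we are describing. It is illustrative only: it is not Mindee's API and not our production code. The function name `pick_supplier_name`, the suffix list, and the idea of passing the sender block as a list of text lines are all assumptions, and the street line in the usage example is a placeholder.

```python
import re

# Hypothetical list of legal-form suffixes; extend for other jurisdictions.
LEGAL_SUFFIX = re.compile(
    r"\b(Inc|LLC|Ltd|GmbH|Corp|S\.A|B\.V)\.?(?=\s|,|$)", re.IGNORECASE
)
# US EIN shape: two digits, a hyphen, seven digits.
EIN = re.compile(r"\b\d{2}-\d{7}\b")


def pick_supplier_name(extracted_name, sender_block_lines):
    """Prefer a legal-entity line from the sender block over a logo-like name.

    Falls back to the extracted name whenever the evidence is weak.
    """
    # The extracted value already looks like a legal entity: keep it.
    if LEGAL_SUFFIX.search(extracted_name):
        return extracted_name
    # Only trust the sender block if it also carries a tax ID.
    if not any(EIN.search(line) for line in sender_block_lines):
        return extracted_name
    # Take the first line in the block that ends in a legal-form suffix.
    for line in sender_block_lines:
        if LEGAL_SUFFIX.search(line):
            return line.strip()
    return extracted_name


# Usage with the values from this report (street line is a placeholder).
sender_block = ["Celonis Inc.", "123 Example Street", "EIN 61-1797223"]
print(pick_supplier_name("Make", sender_block))
```

A rule like this only works as a client-side fallback when the sender block text is available separately, which is why we would prefer the model itself to apply the preference.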
Questions:
Is there a recommended way to make supplier_name prefer the legal entity (near supplier_address and supplier_company_registrations) instead of the header logo?
Can this layout be submitted for model review or correction?
If we annotate sample documents, will that improve extraction for this case?
Link to document: https://app.mindee.com/model/701fa3d9-d54b-4e7c-a506-7efed7a5079e/review?docs=98b124b8-2349-4a9a-82c6-6b7699a4a07c
File name: 5dea48c8-35ad-4df5-8543-1aa1164691a2.pdf
Happy to share sample documents. Thanks for your help.
In Progress
Bug Report
2 days ago

Hennadii Shevchyk