Supplier name parsed from logo instead of legal entity

We are seeing incorrect results for the supplier name field. The OCR returns the brand name from the page logo instead of the legal entity that issued the document.

Example: on a receipt the header logo reads "Make", while the actual supplier (legal entity, tied to the address and US EIN 61-1797223) is "Celonis Inc.". The API returns "Make".

What we have already tried, without success:

  • Post-processing with a RAG step

  • Adding explicit field guidance instructing the model to prefer the legal entity near the address and tax ID and to ignore header logos

Neither resolves the issue. The model still favors the visually dominant logo over the legal name in the sender block.
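
For illustration, the selection behavior we expect can be sketched as a simple ranking rule. This is only a hypothetical sketch and not Mindee API code: the function name `pick_supplier_name`, the `(text, line_index)` candidate format, and the short suffix list are all assumptions made for this example.

```python
import re

# Assumed, non-exhaustive list of legal-form suffixes (illustrative only).
LEGAL_SUFFIX = re.compile(r"\b(?:inc|llc|ltd|corp|gmbh)\.?$", re.IGNORECASE)


def pick_supplier_name(candidates, tax_id_line):
    """Pick the most likely legal entity name.

    candidates: list of (text, line_index) tuples, a hypothetical shape
                for name-like strings found on the page.
    tax_id_line: line index where the tax ID (e.g. the EIN) was found.
    """
    if not candidates:
        return None

    def score(candidate):
        text, line_index = candidate
        # Prefer names that end in a legal-form suffix...
        has_suffix = bool(LEGAL_SUFFIX.search(text.strip()))
        # ...then names closest to the tax ID, i.e. in the sender block.
        distance = abs(line_index - tax_id_line)
        return (has_suffix, -distance)

    return max(candidates, key=score)[0]


# Header logo on line 0, sender block around the tax ID on line 7.
print(pick_supplier_name([("Make", 0), ("Celonis Inc.", 5)], tax_id_line=7))
```

With the receipt described above, this rule would select "Celonis Inc." because it both carries a legal-form suffix and sits closer to the tax ID than the header logo text.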

Questions:

  1. Is there a recommended way to make supplier_name prefer the legal entity (near supplier_address and supplier_company_registrations) instead of the header logo?

  2. Can this layout be submitted for model review or correction?

  3. If we annotate sample documents, will that improve extraction for this case?

Link to document: https://app.mindee.com/model/701fa3d9-d54b-4e7c-a506-7efed7a5079e/review?docs=98b124b8-2349-4a9a-82c6-6b7699a4a07c

File name: 5dea48c8-35ad-4df5-8543-1aa1164691a2.pdf

Happy to share sample documents. Thanks for your help.

Status

In Progress

Board

🐛 Bug Report

Date

2 days ago

Author

Hennadii Shevchyk
