It's really, really good at tables.
You have to use the Layout Model and not just the base Document Intelligence.
A bit pricey, but if you're processing content one time and it's high value (my use case is clinical trial protocol documents, and a trial will run anywhere from 6 to 24 months), then it's worth it, IMO.
[0] https://learn.microsoft.com/en-us/azure/ai-services/document...
In my experience, the latest Gemini is best at vision and OCR.
There's reliable, and there's reliable. For example [1] is a conversation where I ask ChatGPT 4o questions about a seven-page tabular PDF from [2] which contains a list of election polling stations.
The results are simultaneously impressive and unimpressive. The document contains some repeated addresses, and the LLM correctly identifies all 11 of them... then says it found ten.
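That count-versus-list mismatch is the kind of error a deterministic pass can't make, because the count is computed from the same list it reports. Here is a minimal sketch. It assumes the PDF table has already been extracted into rows (dicts with a hypothetical `address` key). The sample rows are made up and are not taken from the Stevenage document.

```python
from collections import Counter

def repeated_addresses(rows):
    """Return addresses that appear on more than one row, in first-seen order."""
    # Normalize case and whitespace so "1 EXAMPLE ROAD" and "1 Example Road " match.
    counts = Counter(" ".join(row["address"].split()).upper() for row in rows)
    return [addr for addr, n in counts.items() if n > 1]

# Hypothetical sample rows, for illustration only.
rows = [
    {"address": "1 EXAMPLE ROAD"},
    {"address": "1 Example Road "},
    {"address": "2 SAMPLE STREET"},
]
```

`len(repeated_addresses(rows))` is then the number of repeated addresses, so the list and the count can't disagree.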
It gracefully deals with the PDF table, and converts the all-caps input data into Title Case.
The table is split across multiple pages, and the title row repeats each time. It deals with that easily.
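Both of those steps are also easy to do deterministically once the per-page tables are extracted. Here is a minimal sketch. It assumes each page arrives as a list of string rows whose first row is the repeated title row. The function names and sample values are hypothetical.

```python
def merge_pages(pages):
    """Concatenate per-page tables, keeping the repeated title row only once."""
    header = pages[0][0]
    merged = [header]
    for page in pages:
        for row in page:
            if row == header:
                continue  # drop the title row that repeats on every page
            merged.append(row)
    return merged

def to_title_case(row):
    # str.title() capitalizes after any non-letter ("ST JOHN'S" -> "St John'S"),
    # so capitalize each whitespace-separated word instead.
    # Note: split() also collapses runs of whitespace inside a cell.
    return [" ".join(word.capitalize() for word in cell.split()) for cell in row]

# Hypothetical two-page table.
pages = [
    [["STATION", "ADDRESS"], ["1", "ST JOHN'S HALL"]],
    [["STATION", "ADDRESS"], ["2", "BUNYAN BAPTIST CHURCH"]],
]
```

Capitalizing word by word avoids the apostrophe quirk. It still won't handle names like "McDonald", which come out as "Mcdonald".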
It correctly finds all five schools mentioned.
When asked to extract an address that isn't in the document it correctly refuses, instead of hallucinating an answer.
When asked to count churches, "Bunyan Baptist Church" gets missed out. Of two church halls, only one gets counted.
The "Friends Meeting House" also doesn't get counted, but arguably that's not a church even if it is a place of worship.
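Whether halls or meeting houses count as churches is a definitional choice. A rule-based classifier at least makes that choice explicit and repeatable. Here is a sketch using a hypothetical whole-word keyword rule. "Example Church Hall" is made up. The other names come from the comments above.

```python
import re
from collections import Counter

# Hypothetical rule: whole-word "church" marks a church, but "church hall"
# is reported as its own category rather than silently merged or dropped.
CHURCH_HALL = re.compile(r"\bchurch\s+hall\b", re.IGNORECASE)
CHURCH = re.compile(r"\bchurch\b", re.IGNORECASE)

def classify_venue(name):
    if CHURCH_HALL.search(name):
        return "church hall"
    if CHURCH.search(name):
        return "church"
    return "other"  # e.g. "Friends Meeting House": no "church" keyword in the name

def count_venues(names):
    return Counter(classify_venue(n) for n in names)

venues = [
    "Bunyan Baptist Church",
    "Longmeadow Evangelical Church",
    "Example Church Hall",  # hypothetical
    "Friends Meeting House",
]
```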
Longmeadow Evangelical Church has one address, three rows and two polling station numbers. When asked how many polling stations are in the table, the LLM counts that as two. A reasonable person might have expected one, two, three, or a warning. If I was writing an invoice parser, I would want this to be very predictable.
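A parser can avoid silently picking one interpretation by computing each plausible count and warning when they disagree. Here is a minimal sketch with hypothetical rows modelled on the Longmeadow case (one address, three rows, two station numbers). The field names, address, and station numbers are invented.

```python
import warnings

def polling_station_counts(rows):
    """Compute each defensible answer to 'how many polling stations?'."""
    counts = {
        "rows": len(rows),
        "station_numbers": len({r["station"] for r in rows}),
        "addresses": len({r["address"] for r in rows}),
    }
    if len(set(counts.values())) > 1:
        # Surface the ambiguity instead of silently choosing one interpretation.
        warnings.warn(f"ambiguous polling-station count: {counts}")
    return counts

# Hypothetical rows: one address, three rows, two station numbers.
rows = [
    {"station": "S1", "address": "1 EXAMPLE ROAD"},
    {"station": "S1", "address": "1 EXAMPLE ROAD"},
    {"station": "S2", "address": "1 EXAMPLE ROAD"},
]
```

For these rows the function returns 3, 2 and 1 for the three interpretations and emits a warning. That matches the "one, two, three, or a warning" expectation above.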
So, it's a mixed bag. I've certainly seen worse attempts at parsing a PDF.
[1] https://chatgpt.com/share/67812ad9-f2bc-8011-96be-faea40e48d...
[2] https://www.stevenage.gov.uk/documents/elections/2024-pcc-el...
From your description, it does perfectly at the task asked about upthread (extraction) and has mixed results on other question-answering tasks that weren't the subject.
¯\_(ツ)_/¯
Which do you think was which?
This is much lighter weight and more reliable than vLLM
And that's with Prometheus integration, a 160GB VRAM requirement, and so on?
Looks like this is targeted at enterprises, or maybe governments, trying to digitize at scale.
Does that mean it's yet another wrapper library that calls their proprietary cloud API?
Or that, if you have the specific access rights, you can pull a proprietary Docker image with secret binaries inside, which acts as the server for the library available on GitHub?
You can imagine how fun it is to debug.
Also: I noticed that it mentioned images… does it do any kind of OCR or summary of them?
The open question is whether to use rule-based parsing using simpler software or model-based parsing using this software.
"Devin Robison" is the author of the package! Funny, I guess it'll go the same way as the name Alexa.
Prerequisites
Hardware
GPU Family        Memory  # of GPUs (min.)
H100 SXM or PCIe  80GB    2
A100 SXM or PCIe  80GB    2
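The 160GB figure mentioned in the thread follows from this minimum configuration (2 GPUs × 80GB). Below is a hypothetical pre-flight check, not part of nv-ingest. It assumes the table above is the complete list of supported GPUs. It also assumes that mixing H100 and A100 cards is acceptable, which the table doesn't say either way.

```python
# Minimum configuration transcribed from the table above. Treating it as the
# complete list of supported GPU families is an assumption.
SUPPORTED_FAMILIES = {"H100", "A100"}
MIN_MEMORY_GB = 80
MIN_GPUS = 2

def meets_minimum(gpus):
    """Hypothetical check; gpus is a list of (family, memory_gb) tuples."""
    eligible = [
        (family, mem)
        for family, mem in gpus
        if family in SUPPORTED_FAMILIES and mem >= MIN_MEMORY_GB
    ]
    # Mixed H100/A100 hosts are accepted here; the table doesn't say either way.
    return len(eligible) >= MIN_GPUS

MIN_TOTAL_VRAM_GB = MIN_GPUS * MIN_MEMORY_GB  # 2 x 80GB = 160GB
```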
Hmm, perhaps this is not for me.
I genuinely appreciate your perspective, but as a smaller, lesser-known provider, I'd like to understand your concerns better.
Are you worried that I might misuse your data and compromise my entire business, by selling it to the highest bidder? Do you feel uncertain about the security of my systems? Or is it a belief that owning and managing the hardware yourself gives you greater control over security?
What kind of validation or reassurance would help address these concerns?
https://hotaisle.xyz/shared-responsibility-model/
I'm not sure what you mean by "security services". Can you please expand on that?
CONTAINER ID IMAGE
0f2f86615ea5 nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.10
de44122c6ddc otel/opentelemetry-collector-contrib:0.91.0
02c9ab8c6901 nvcr.io/ohlfw0olaadg/ea-participants/cached:0.2.0
d49369334398 nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.1.0
508715a24998 nvcr.io/ohlfw0olaadg/ea-participants/nv-yolox-structured-images-v1:0.2.0
5b7a174a0a85 nvcr.io/ohlfw0olaadg/ea-participants/deplot:1.0.0
430045f98c02 nvcr.io/ohlfw0olaadg/ea-participants/paddleocr:0.2.0
8e587b45821b grafana/grafana
aa2c0ec387e2 redis/redis-stack
bda9a2a9c8b5 openzipkin/zipkin
ac27e5297d57 prom/prometheus:latest
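The container listing above (apparently trimmed `docker ps` output) partly answers the earlier question about proprietary images. Grouping by registry separates the images pulled from NVIDIA's registry (nvcr.io) from the public Docker Hub images that make up the observability stack. Here is a small sketch that parses the listing. The registry rule is a simplification of Docker's image-reference parsing, and the function names are hypothetical.

```python
LISTING = """\
CONTAINER ID IMAGE
0f2f86615ea5 nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.10
de44122c6ddc otel/opentelemetry-collector-contrib:0.91.0
02c9ab8c6901 nvcr.io/ohlfw0olaadg/ea-participants/cached:0.2.0
d49369334398 nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.1.0
508715a24998 nvcr.io/ohlfw0olaadg/ea-participants/nv-yolox-structured-images-v1:0.2.0
5b7a174a0a85 nvcr.io/ohlfw0olaadg/ea-participants/deplot:1.0.0
430045f98c02 nvcr.io/ohlfw0olaadg/ea-participants/paddleocr:0.2.0
8e587b45821b grafana/grafana
aa2c0ec387e2 redis/redis-stack
bda9a2a9c8b5 openzipkin/zipkin
ac27e5297d57 prom/prometheus:latest
"""

def registry_of(image):
    """Return the registry host for an image reference (simplified rule)."""
    first, sep, _ = image.partition("/")
    # The first path component is a registry host only when the reference has a
    # "/" and that component contains "." or ":" or is "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"  # no explicit host: Docker Hub

def images_by_registry(listing):
    groups = {}
    for line in listing.splitlines()[1:]:  # skip the "CONTAINER ID IMAGE" header
        fields = line.split()
        if len(fields) >= 2:
            groups.setdefault(registry_of(fields[1]), []).append(fields[1])
    return groups
```

On this listing, six images come from nvcr.io (the ingestion service and model containers) and five from Docker Hub (the OpenTelemetry collector, Grafana, Redis, Zipkin and Prometheus).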