66 points by redbell a day ago | 7 comments
  • AlbertoGP a day ago
    > [...] self-hostable solution that leverages state-of-the-art (SOTA) vision models for segment extraction and OCR, unifying the output through a Rust Actix server. This setup allows you to process PDFs and extract segments at an impressive speed of approximately 5 pages per second on a single NVIDIA L4 instance, offering a cost-effective and scalable solution for high-accuracy bounding box segment extraction and OCR. This solution has models that accommodate for both GPU and CPU environments.
  • oliwarner 14 hours ago
    > To use Chunkr privately without complying to the AGPL-3.0 license terms you can contact us

    AGPL has no bearing on how I use software, only how I can redistribute it. AGPL does not stop a person or company using Chunkr or its product in a commercial environment without further license.

    • Onavo 13 hours ago
      Yes, but tools that handle PDFs are usually accessed over the network. It would be a minefield if legal says the AGPL will affect every single microservice that interacts with your service. This is why there is a blanket AGPL ban at most tech companies. The AGPL is effectively an EULA.
      • oliwarner 11 hours ago
        First, the quote talks about what I do privately. AGPL explicitly encourages me to do whatever the hell I like with it. I don't need another license.

        But more broadly, the interpretation that AGPL microservices are viral is just one interpretation. If it really is just a swappable backend interface, why should it affect other subsystems? IANAL, but it seems pretty trivial to insulate a microservice with the same sort of GPL condom companies ship to avoid "linking" to (e.g.) the Kernel.

        https://medium.com/swlh/understanding-the-agpl-the-most-misu...

  • kybernetikos a day ago
    It'd be great to see some examples on the web site.
  • infecto 15 hours ago
    As with all of these startups in this space, there is never a comparison of output between them and the ($$) competition. I realize they are doing some segmentation in the workflow, but imo the valuable part is the actual document text and table extraction piece. Textract in its cheapest and simplest form is cheaper than this service. With tables turned on, Textract is more expensive, but I would be curious whether Textract is doing a better job.
    • mistrial9 15 hours ago
      What you want is work in itself. Who pays the reviewer? How do you discover the reviewer? Secondly, why must there be one "winner"? Maybe there are niches, local markets, business groups. They want something and someone provides it.
      • infecto 12 hours ago
        Huh? This is a company selling a product/service. I am saying they have done nothing to compare themselves to the competition beyond saying it is expensive, and I am arguing that the competition is not much more expensive and might offer superior quality.
  • ollivera 15 hours ago
    Initially, I didn’t like having the tables as images, but using GPT Vision might be a more accurate way to obtain the markdown. I was also considering using the Adobe Extraction API to extract markdown from the CSV file. So, I will try your API over the weekend and see the results.
  • saaaaaam 21 hours ago
    Although the docs say “get started by creating an account on chunkr.ai” there doesn’t seem to be any way to create an account.
  • canterburry 15 hours ago
    Task Fails