8 points by emot 6 hours ago | 2 comments
  • TIPSIO 5 hours ago
    This is cool, and I'm glad Cloudflare is offering AI options to everyone (tools to block, tools to better enable).

    This is probably fast, but FWIW I would bet that doing a simple str replace on HTML elements with '' would yield mostly the same result. Structured content of any sort (like markdown) isn't really even needed for an LLM. Make it messy and super fast, and don't accidentally lose anything; it's an LLM.
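
    A minimal sketch of that str-replace idea, using only the Python standard library. The name strip_tags is made up for illustration. One assumption differs from the literal suggestion: tags are replaced with a space rather than '', so that adjacent elements like "</p><p>" do not glue words together. Note that this naive version keeps the contents of script and style elements, since it only removes the tags themselves.

```python
import html
import re

# Matches anything that looks like a tag: "<", then any non-">" chars, then ">".
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    """Naively remove HTML tags and return whitespace-normalized text."""
    # Assumption: replace with " " instead of "" to keep word boundaries.
    text = TAG_RE.sub(" ", markup)
    # Decode entities such as &amp; only after the tags are gone.
    text = html.unescape(text)
    # Collapse runs of whitespace left behind by the removed tags.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p><p>A &amp; B</p>"))
```

    This is messy by design: no structure survives, but no text is dropped either.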

    If compression were really the goal, you could take it further and probably remove all words like "the" and "and", punctuation, maybe even spaces.
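
    A rough sketch of that compression step, again plain Python. The STOP_WORDS set and the name squeeze are hypothetical choices for illustration, not a standard list or API. Whether this actually saves LLM tokens (as opposed to characters) depends on the tokenizer, so treat it as something to measure rather than a given; removing spaces is left out here for that reason.

```python
import string

# Hypothetical stop-word list; a real one would be longer and language-specific.
STOP_WORDS = {"the", "and", "a", "an", "of"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def squeeze(text: str) -> str:
    """Drop ASCII punctuation and stop words, keeping the remaining words in order."""
    words = text.translate(_PUNCT_TABLE).split()
    return " ".join(w for w in words if w.lower() not in STOP_WORDS)


if __name__ == "__main__":
    original = "The cat, and the dog."
    shorter = squeeze(original)
    print(shorter, f"({len(original)} -> {len(shorter)} chars)")
```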

  • hedora 6 hours ago
    Why would agents use this?

    HTML -> Markdown software is readily available, and some percentage of the internet is hostile towards agents.

    Also, isn't the conversion lossy? I imagine an agent would rather have access to the HTML and iteratively try strategies until it gets good extraction quality. If the conversion happens automatically inside the network, you're stuck with second-class content extraction some percentage of the time.

    Has anyone built libraries to do that?
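
    For what it's worth, the "try several strategies and keep the best one" loop can be sketched with the standard library alone. Everything named here (regex_strip, parser_strip, score, best_extraction) is hypothetical, and the scoring heuristic in particular is an assumption: it simply prefers output that is mostly letters and whitespace, which penalizes leaked script or markup debris.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def regex_strip(markup: str) -> str:
    """Strategy 1: delete anything tag-shaped; script/style bodies leak through."""
    return " ".join(re.sub(r"<[^>]+>", " ", markup).split())


def parser_strip(markup: str) -> str:
    """Strategy 2: parse the document and keep only visible text nodes."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return " ".join(" ".join(collector.parts).split())


def score(text: str) -> float:
    """Assumed heuristic: fraction of characters that are letters or whitespace."""
    if not text:
        return 0.0
    return sum(c.isalpha() or c.isspace() for c in text) / len(text)


def best_extraction(markup: str, strategies=(regex_strip, parser_strip)):
    """Run every strategy and return (strategy_name, text) with the top score."""
    results = ((s.__name__, s(markup)) for s in strategies)
    return max(results, key=lambda pair: score(pair[1]))


if __name__ == "__main__":
    page = ("<html><head><script>var x = {a: 1};</script></head>"
            "<body><p>Hello world</p></body></html>")
    print(best_extraction(page))
```

    An agent with access to the raw HTML could extend the strategies tuple and swap in a better score; none of that is possible if the conversion has already happened upstream.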