3 points by thetall0ne 16 hours ago | 2 comments
  • toomuchtodo 12 hours ago
    A cool feature would be the ability to read WARC files from a previous crawl of the target site.

    https://github.com/webrecorder/warcio

    https://github.com/ArchiveTeam/grab-site
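The WARC suggestion could start small. Below is a minimal sketch, assuming only the Python standard library, that walks the records of a crawl archive (such as one produced by grab-site) and pulls out the archived HTTP response bodies so they could be handed to an LLM. `iter_warc_responses` is a hypothetical helper, not part of warcio or of the tool being discussed. It relies only on the WARC record layout: a `WARC/` version line, named fields such as `WARC-Type`, `WARC-Target-URI` and `Content-Length`, a blank line, then the record block. warcio handles the many edge cases this sketch skips, such as chunked or compressed HTTP bodies.

```python
import gzip
from typing import Dict, Iterator, Tuple


def iter_warc_responses(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (target URI, HTTP body) for every 'response' record in a WARC file.

    Hypothetical helper, minimal sketch: handles plain .warc files and .warc.gz
    files (gzip.open reads concatenated per-record gzip members transparently),
    but does not undo chunked or compressed HTTP transfer encodings.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                return  # end of file
            if not line.strip():
                continue  # skip the blank lines that terminate each record
            if not line.startswith(b"WARC/"):
                raise ValueError(f"expected a WARC version line, got {line!r}")
            # Named header fields follow the version line until an empty line.
            headers: Dict[str, str] = {}
            while True:
                field = f.readline()
                if not field.strip():
                    break
                name, _, value = field.decode("utf-8").partition(":")
                headers[name.strip().lower()] = value.strip()
            # Content-Length gives the exact size of the record block.
            block = f.read(int(headers.get("content-length", "0")))
            if headers.get("warc-type") == "response":
                # A response block holds the raw HTTP message; the body
                # starts after the first blank line that ends the HTTP headers.
                _, _, body = block.partition(b"\r\n\r\n")
                yield headers.get("warc-target-uri", ""), body
```

Usage would be something like `for uri, body in iter_warc_responses("site.warc.gz"): ...`, where `site.warc.gz` is a placeholder file name. Streaming records one at a time keeps memory flat even for large crawls.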

  • mojomark 16 hours ago
    Nice, right? How annoying is that? If a human can read the public content, why can't an LLM? ChatGPT/Claude also (at least for me) don't consistently review the full content I upload. Sometimes they read all of it, but most of the time (especially with a larger upload, say a 100-page PDF or 15 Python scripts), I have to keep pushing them to go through everything.

    Really annoying, so thank you for this! Now the question is whether I'm too lazy to use it.

    • thetall0ne 16 hours ago
      lol! No problem. It's such an annoying problem. One time an LLM said "I can't directly access URLs", I replied "yes you can", and then it did it! WTF!?