1 point by thm 10 hours ago | 2 comments
  • robtherobber 9 hours ago
    When publishers complain, it's never to protect the authors or journalists, but to protect a specific (extractive) business model. Not that I side with Common Crawl on this one.
  • sharemywin 10 hours ago
    Maybe robots.txt should be upgraded with license specifics.

    "Not for commercial use", etc.

    So I believe all content is covered under fair use, which to me means Common Crawl has a right to scrape everything, and it's up to the users of Common Crawl to sort out the details.
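
    To make the robots.txt suggestion above concrete, here is a rough sketch of how a crawler might read a license hint. The `License:` line is purely hypothetical: it is not part of the Robots Exclusion Protocol (RFC 9309), and existing crawlers would most likely ignore it. The function name and the `non-commercial` value are likewise made up for illustration.

```python
# Sketch only: "License:" is a hypothetical robots.txt field, not a real standard.
EXAMPLE = """
User-agent: CCBot
Allow: /
License: non-commercial

User-agent: *
Disallow: /private/
"""


def parse_license_hints(robots_txt: str) -> dict[str, list[str]]:
    """Collect hypothetical 'License:' values for each user-agent group."""
    hints: dict[str, list[str]] = {}
    agents: list[str] = []   # user-agents of the group being read
    in_rules = False         # True once the group has at least one rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:
                # A user-agent line after rule lines starts a new group.
                agents = []
                in_rules = False
            agents.append(value.lower())
        else:
            in_rules = True
            if field == "license":
                for agent in agents:
                    hints.setdefault(agent, []).append(value)
    return hints


if __name__ == "__main__":
    print(parse_license_hints(EXAMPLE))
```

    Whether anyone downstream would honor such a hint is a separate question; the sketch only shows that carrying it alongside the existing allow/disallow rules would be simple to parse.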