4 points by EnricoShippole 4 hours ago | 3 comments
  • EnricoShippole 4 hours ago
    Given the increasingly closed-source nature of the U.S. AI ecosystem, it is now more important than ever to push for the proliferation of open model and dataset releases. Datamule, TeraflopAI, and Daft collaborated to release 43 billion tokens of SEC EDGAR data.
  • jgfriedman1999 4 hours ago
    Neat! Surprised at how cheap it was.
    • jaychia 3 hours ago
      Very cool that this kind of work can now be done at this price point: 24 hours for 8M filings on just 12 cores :)

      Excited for unstructured/multimodal data processing to become increasingly commoditized and abstracted away so that more such datasets can be built.