13 points by rochoa 5 hours ago | 2 comments
  • denysvitali an hour ago
    Yes. It's _that_ Milla Jovovich (actress known for Resident Evil). This definitely wasn't on my 2026 Bingo Card.

    Confirmed in this post: https://www.instagram.com/reel/DWzNnqwD2Lu

    This really shows how ideas are worth more than the code itself nowadays. Haven't really tried the project myself yet, but if the benchmark is correct, this looks like a major breakthrough. Even more so coming from someone who (AFAIK) is not technical.

    This is amazing. Well done Milla & team!

    Btw, I already love the memes around this: "Missed the chance to call this Resident Eval"

  • darkhanakh 2 hours ago
    so the 100% LongMemEval score is a bit misleading if you actually look at what's going on. they took the 3 questions that were failing and applied targeted fixes for those specifically, plus LLM reranking on top. if you hold out those fixes the actual score is 98.4%. still good but not "100%" good yknow

    same story with LoCoMo, the 100% score uses top-k=50 which literally exceeds the session count lol, with reranking on top. honest top-10 no rerank gets you 88.9%
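
    to make the top-k point concrete, here's a minimal sketch of retrieval recall. the session count (30), the ids, and the ranking are all made up for illustration, they are not taken from LoCoMo or from the project, and `recall_at_k` is a hypothetical helper. it only covers the retrieval step, not reranking or answer quality:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k of a ranking."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


# Hypothetical conversation with 30 sessions (assumed number, for illustration).
sessions = [f"s{i}" for i in range(30)]
relevant = ["s0"]

# Worst possible retriever: the one relevant session is ranked dead last.
worst_ranking = list(reversed(sessions))

recall_top10 = recall_at_k(worst_ranking, relevant, k=10)
recall_top50 = recall_at_k(worst_ranking, relevant, k=50)

print(recall_top10)  # relevant session falls outside the top 10
print(recall_top50)  # k exceeds the session count, so everything is returned
```

    once k is at least the number of sessions, even the worst ranking returns every session, so retrieval recall can't tell a good retriever from a bad one.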

    this is giving openclaw energy where you engineer your benchmark results to look perfect and then market it as some breakthrough. the underlying tech might be interesting but leading with "highest score ever published" when the methodology has these kinds of asterisks is not great

    cool that milla jovovich is vibe coding tho i guess