5 points by kokakiwi 4 hours ago | 2 comments
  • gilles_oponono 3 hours ago
    Which part do you compress, more specifically?
    • kokakiwi 3 hours ago
      For coding agents, mainly the tools' output. Tool outputs are often the heaviest "messages" sent on the user side and also the noisiest. For "cargo test", for example, Codex doesn't really care about the build output, only the test results.
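
To make the "cargo test" case concrete, here is a minimal sketch of that kind of filtering. It is not Edgee's actual implementation: the function name and the prefix list are hypothetical, and it assumes cargo's build-phase status lines start with words like "Compiling", "Finished" and "Running".

```python
# Hypothetical list of prefixes for cargo's build-phase status lines.
# This is an assumption for illustration, not Edgee's real rule set.
BUILD_PREFIXES = (
    "Compiling",
    "Downloading",
    "Downloaded",
    "Updating",
    "Finished",
    "Running",
)


def compress_cargo_test_output(output: str) -> str:
    """Drop build-phase lines and keep test results and failure details."""
    kept = []
    for line in output.splitlines():
        stripped = line.strip()
        # Caveat: a test that prints a line starting with one of these
        # words would lose that line too.
        if stripped.startswith(BUILD_PREFIXES):
            continue
        # Collapse leading and repeated blank lines.
        if not stripped and (not kept or not kept[-1]):
            continue
        kept.append(line.rstrip())
    # Drop a trailing blank line, if any.
    while kept and not kept[-1]:
        kept.pop()
    return "\n".join(kept)
```

A drop-list is used here rather than a keep-list so that panic messages and the "failures:" section survive untouched, since those are the parts the agent needs most.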
  • MallocVoidstar 3 hours ago
    No discussion on problem difficulty, or on result quality besides "the Edgee run generated slightly more output tokens than the baseline".
    • sachamorard 3 hours ago
      More info in the GitHub repo, in the reports folder (sorry, I'm not sure I can add the link here without being flagged).

      "Codex + Edgee consumes roughly half the fresh tokens of the normal Codex baseline. Output tokens are marginally higher (+3,312, +19.5%), suggesting the Edgee scenario produces slightly more verbose responses but dramatically reduces context ingestion."

    • kokakiwi 3 hours ago
      I think the problem given to Codex for the benchmark is the one in the attached video, where two Codex instances run side by side, working on a "standard" dev task.