4 points by filipn9 5 hours ago | 2 comments
  • filipn9 5 hours ago
    Hi HN, I built llmSHAP, a multi-threaded, open-source Python library that attributes how much each part of an LLM’s input contributed to its output, using Shapley values.

    It’s designed for LLM prompts and RAG contexts where you want something more structured than token masking. You can define “features” as tokens, sentences, paragraphs, tools, images, or arbitrary fields; pin “permanent” context that should always be included (e.g., system prompt + question); and compute attributions with caching and multi-threading to speed up the many model calls.
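
    To make the mechanics concrete, here’s a toy sketch of exact Shapley attribution over prompt “features”, with the permanent context pinned into every coalition and a cache over coalition evaluations. This is deliberately simplified and not the real API: call_llm, score, and the feature layout are stand-ins, and the echo model only exists so the snippet runs on its own.

        from itertools import combinations
        from math import factorial

        def call_llm(prompt: str) -> str:
            # Stand-in for a real (expensive) model call; a trivial echo
            # model keeps the sketch self-contained.
            return prompt

        def score(output: str, reference: str) -> float:
            # Stand-in value function: token overlap with a reference answer.
            out, ref = set(output.split()), set(reference.split())
            return len(out & ref) / max(len(ref), 1)

        def shapley(features: dict, permanent: str, reference: str) -> dict:
            # features: name -> text; each unit (a sentence, a document, or
            # even a whole image in a multimodal setting) is one player.
            names = list(features)
            n = len(names)
            cache = {}  # memoize coalition evaluations across players

            def value(coalition: frozenset) -> float:
                if coalition not in cache:
                    parts = [permanent] + [features[f] for f in names if f in coalition]
                    cache[coalition] = score(call_llm(" ".join(parts)), reference)
                return cache[coalition]

            attributions = {}
            for f in names:
                others = [g for g in names if g != f]
                total = 0.0
                for k in range(len(others) + 1):
                    # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
                    weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                    for subset in combinations(others, k):
                        s = frozenset(subset)
                        total += weight * (value(s | {f}) - value(s))
                attributions[f] = total
            return attributions

        feats = {"doc1": "cats purr", "doc2": "dogs bark", "doc3": "fish swim"}
        print(shapley(feats, permanent="Question: what do cats do?", reference="cats purr"))

    Exact Shapley needs up to 2^n model calls for n features, which is why the caching and multi-threading matter so much in practice.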

    Repo: https://github.com/filipnaudot/llmSHAP
    Tutorial: https://filipnaudot.github.io/llmSHAP/tutorial.html

    I’d love feedback: do you think this could be genuinely useful in your LLM workflows (prompt engineering, RAG debugging, evals, guardrail auditing, etc.)?

  • bradley01 5 hours ago
    Hi Filip,

    Cool project! I’ve used TokenSHAP, but I came across llmSHAP via the arXiv preprint a few weeks ago and found it to be a good alternative.

    Are you planning on adding in-image object attribution?

    • filipn9 5 hours ago
      Thanks!

      No, we’re not planning that at the moment. We did, however, just add image attribution, but only at the level of how important the whole image is to the final output, not which parts of the image matter most.