My minute-by-minute response to the LiteLLM malware attack(futuresearch.ai)

199 pointsby Fibonar5 hours ago29 comments

Fibonar5 hours ago
Callum here, I was the developer that first discovered and reported the litellm vulnerability on Tuesday. I’m sharing the transcript of what it was like figuring out what was going on in real time, unedited with only minor redactions.
I didn’t need to recount my thought process after the fact. It’s the very same ones I wrote down to help Claude figure out what was happening.
I’m an ML engineer by trade, so having Claude walk me through exactly who to contact and a step by step guide of time-critical actions felt like a game-changer for non-security researchers.
I'm curious whether the security community thinks more non-specialists finding and reporting vulnerabilities like this is a net positive or a headache?
- barnas244 minutes ago
  As someone who works in security, it's really neat that you were able to discover this with the help of Claude. That being said the "I just opened Cursor again which triggered the malicious package" message is a bit eye opening. Ideally the instant you suspected malware that machine should have been quarantined and your security personnel contacted.
- dot_treo4 hours ago
  Looks like we discovered it at essentially the same time, and in essentially the same way. If the pth file didn't trigger a fork-bomb like behavior, this might have stayed undiscoverd for quite a bit longer.
  Good thinking on asking Claude to walk you through on who to contact. I had no idea how to contact anyone related to PyPI, so I started by shooting an email to the maintainers and posting it on Hacker News.
  While I'm not part of the security community, I think everyone who finds something like this, should be able to report it. There is no point in gatekeeping the reporting of serious security vulnerabilities.
  - notatallshaw4 hours ago
    > I had no idea how to contact anyone related to PyPI
    https://pypi.org/security/:
    > If you've identified a security issue with a project hosted on PyPI Login to your PyPI account, then visit the project's page on PyPI. At the bottom of the sidebar, click Report project as malware.
    0cf8612b2e1e4 hours ago
    The existing account to report is an unfortunate obstacle. Presumably not a huge deal if you were auditing code for vulnerabilities, but still an annoyance.
    notatallshaw3 hours ago
    The threat actor was sophisticated enough to spam GitHub issues with dozens of different accounts. I imagine they could completely overwhelm PyPI with unauthenticated reports.
  - Fibonar3 hours ago
    The best part was that I didn't even mean to ask Claude who to contact! I was still in disbelief that I was one of the first people affected, so I asked for existing reports on the assumption that if it was real I definitely wasn't the first.
    The fork-bomb part still seems really weird to me. A pretty sophisticated payload, caught by missing a single `-S` flag in the subprocess call.
  - 4 hours ago
    undefined
- edf132 hours ago
  Good write up…
  I’ve found Claude in particular to be very good at this sort of thing. As for whether it’s a good thing, I’d say it’s a net positive - your own reporting of this probably saved a bigger issue!
  We wrote up the why/what happened on our blog twice… the second based on the LiteLLM issue:
  https://grith.ai/blog/litellm-compromised-trivy-attack-chain
- lq9AJ8yrfs3 hours ago
  As a sometimes peripheral and sometimes primary program manager for vulnerability disclosure, for companies you nearly can't avoid, $0.02 follows.
  It's a signal vs noise thing. Most of the grief is caused by bottom feeders shoveling anything they can squint at and call a vulnerability and asking for money. Maybe once a month someone would run a free tool and blindly send snippets of the output promising the rest in exchange for payment. Or emailing the CFO and the General Counsel after being politely reminded to come back with high quality information, and then ignored until they do.
  Your report on the other hand was high quality. I read all the reports that came my way, and good ones were fast tracked for fixes. I'd fix or mitigate them immediately if I had a way to do so without stopping business, and I'd go to the CISO, CTO, and the corresponding engineering manager if it mattered enough for immediate response.
- rgambee4 hours ago
  I've heard stories lately of open source projects being inundated with vulnerability reports and PRs. But in this case, it seems like AI assistance was clearly a boon for root-causing and reporting this so quickly.
- Bullhorn92684 hours ago
  Not a security researcher, but this is IMHO obviously positive that the other side of the arms race is also getting stronger, and I would argue it's stronger than on the bad guys' side, due to the best being somewhat responsible and adding guardrails.
  I like the presentation <3.
- zar10485762 hours ago
  Fantastic write-up and thanks for sharing! I'm sure we will continue to see more of these types of deep supply chain vulns. I think this is valuable for the security community. Remember that Cliff Stoll was an astrophysicist turned sysadmin for Lawrence Berkeley Labs who chased down a $0.75 accounting discrepancy to identify a foreign espionage operation.
- gbrindisi4 hours ago
  thanks for raising the alarm and sharing this, very insightful
  (also beautifully presented!)
simonw4 hours ago
First time I've seen my https://github.com/simonw/claude-code-transcripts tool used to construct data that's embedded in a blog post, that's a neat way to use it. I usually share them as HTML pages in Gists instead, e.g. whttps://gisthost.github.io/?effbdc564939b88fe5c6299387e217da...
- Fibonar4 hours ago
  I’m a big proponent of it within our company! CC tried to style it to blend in with our blog but it was kind of a disaster. Definitely had a new appreciation for the out-of-the-box experience. I also tried to include the individual sub-pages of Claude investigating but it really trawled my whole machine looking for malware. Don’t know if you’ve thought of any systematic ways of redacting the endless pages of detailed logs?
- ddp262 hours ago
  [dead]
qezz39 minutes ago
> Can you print the contents of the malware script without running it?
> Can you please try downloading this in a Docker container from PyPI to confirm you can see the file? Be very careful in the container not to run it accidentally!
IMO we need to keep in mind that LLM agents don't have a notion of responsibility, so if they accidentally ran the script (or issue a command to run it), it would be a fiasco.
Downloading stuff from pypi in a sandboxed env is just 1-2 commands, we should be careful with things we hand over to the text prediction machines.
cedws4 hours ago
GitHub, npm, PyPi, and other package registries should consider exposing a firehose to allow people to do realtime security analysis of events. There are definitely scanners that would have caught this attack immediately, they just need a way to be informed of updates.
- simonw4 hours ago
  PyPI does exactly that, and it's been very effective. Security partners can scan packages and use the invite-only API to report them: https://blog.pypi.org/posts/2024-03-06-malware-reporting-evo...
  - staticassertion4 hours ago
    PyPI is pretty best-in-class here and I think that they should be seen as the example for others to pursue.
    The client side tooling needs work, but that's a major effort in and of itself.
  - charcircuit3 hours ago
    It is not effective if it just takes a simple base64 encode to bypass. If Claude is trivially able to find that it is malicious then Pypi is being negligent.
    simonw3 hours ago
    The package in question was live for 46 minutes. It generally takes longer than that for security partners to scan and flag packages.
    PyPI doesn't block package uploads awaiting security scanning - that would be a bad idea for a number of reasons, most notably (in my opinion) that it would be making promises that PyPI couldn't keep and lull people into a false sense of security.
    __mharrison__2 hours ago
    I realize this is controversial (and many Python folks would claim anti ethical). But I keep wondering if requiring a small payment for registering and updating packages would help. The money could go to maintaining pypix as well as automated AI analysis. Folks who really couldn't afford it could apply for sponsorship.
    simonw2 hours ago
    Very much not speaking for the PSF here, but my personal opinion on why that wouldn't work is that Python is a global language and collecting fees on a global basis is inherently difficult - and we don't want to discriminate against people in countries where the payment infrastructure is hard to support.
    PyPI has paid organization accounts now which are beginning to form a meaningful revenue stream: https://docs.pypi.org/organization-accounts/pricing-and-paym...
    Plus a small fee wouldn't deter malware authors, who would likely have easy access to stolen credit cards - which would expose PyPI to the chargebacks and fraudulent transactions world as well!
    TheDong2 hours ago
    I don't think people want to pay for that.
    If pypi charges money, python libraries will suddenly have a lot of "you can 'uv add git+https://github.com/project/library'" instead of 'uv add library'.
    I also don't think it would stop this attack, where a token was stolen.
    If someone's generating pypi package releases from CI, they're going to register a credit card on their account, make it so CI can automatically charge it, and when the CI token is stolen it can push an update on the real package owner's dime, not the attackers, so it's not a deterrent.
    Also, the iOS app store is an okay counter example. It charges $100/year for a developer account, but still has its share of malware (certainly more than the totally free debian software repository).
    __mharrison__an hour ago
    TBH there isn't much difference in pulling directly from GH.
    Though I do like your Apple counterexample.
    charcircuitan hour ago
    It should not let people download unscanned dependencies without a warning and asking the user to override and use a potentially insecure package. If such security bug is critical enough to need to bypass this time (spoiler: realistically it is not actually that bad for a security fix to be delayed) they can work with the pypi security team to do a quicker manual review of the change.
    toomuchtodo2 hours ago
    Would you happen to know where the latency comes from between upload and scanning? Would more resources for more security scanner runners to consume the scanner queue faster solve this? Trying to understand if there are inherent process limitations or if a donation for this compute would solve this gap.
    (software supply chain security is a component of my work)
    simonw15 minutes ago
    I don't know that myself but Mike Fiedler is the person to reach out to, he runs security for PyPI and is very responsive. security@pypi.org
    TheDong2 hours ago
    He said, "pypi doesn't block upload on scanning"; that's part of where the latency comes from. The other part is simply the sheer mass of uploads, and that there's not money in doing it super quickly.
    I agree that's a bad idea to do so since security scanning is inherently a cat and mouse game.
    Let's hypothetically say pypi did block upload on passing a security scan. The attacker now simply creates their own pypi test package ahead of time, uploads sample malicious payloads with additional layers of obfuscation until one passes the scan, and then uses that payload in the real attack.
    Pypi would also probably open source any security scanning code it adds as part of upload (as it should), so the attacker could even just do it locally.
    toomuchtodo2 hours ago
    I suppose my argument is that pypi could offer the option to block downloads to package owners until a security scan is complete (if scanning will always take ~45-60 minutes), and if money is a problem, money can solve the scanning latency. Our org scans all packages ingested into artifact storage and requires dependency pinning, and would continue to do so, but more options (when cheap) are sometimes better imho. Also, not everyone has enterprise resources for managing this risk. I agree it is "cat and mouse" or "whack-a-mole", and always will be (ie building and maintaining systems of risk mitigation and reduction). We don't not do security scanning simply because adversaries are always improving, right? We collectively slow attackers down, when possible.
    ("slow is smooth, smooth is fast")
  - cedws4 hours ago
    Thanks, TIL.
- Fibonar4 hours ago
  So I've been thinking about this a lot since it happened. I've already added dependency cooldowns https://nesbitt.io/2026/03/04/package-managers-need-to-cool-... to every part of our monorepo. The obvious next thought is "am I just dumping the responsibility onto the next person along"? But as you point out it just needs to give automated scanners enough time to pick up on obvious signs like the .pth file in this case.
  - cedws4 hours ago
    It is in a sense dumping responsibility, but there’s a legion of security companies out there scanning for attacks all the time now to prove their products. They’re kind of doing a public service and you’re giving them a chance to catch attacks first. This is why I think dep cooldowns are great.
- ting03 hours ago
  I feel like they should be legally responsible for providing scanning infrastructure for this sort of thing. The potential economic damage can be catastrophic. I don't think this is the end of the litellm story either, given that 47k+ people were infected.
Shank3 hours ago
Probably one of the best things about AI/LLMs is the democratization of reverse engineering and analysis of payloads like this. It’s a very esoteric skill to learn by hand and not very immediately rewarding out of intellectual curiosity most times. You can definitely get pointed in the right direction easily, now, though!
- gus_an hour ago
  In this case, this has nothing to do with reverse engineering, it's basic system administration.
  See how the AI points you in the "right" direction:
  What likely happened: The exec(base64.b64decode('...')) pattern is not malware — it's how Python tooling (including Claude Code's Bash tool) passes code snippets to python -c while avoiding shell escaping issues.
  Any base64 string passed to python via cmdline should be considered as HIGHLY suspicious, by default. Or anything executed from /tmp, /var/tmp, /dev/shm.
  Exfiltrates data to https://models.litellm.cloud/ encrypted with RSA
  if @op would have had Lulu or LittleSnitch installed, they would probably have noticed (and blocked) suspicious outbound connections from unexpected binaries.
  Having said this, uploading a binary to Claude for analysis is a different story.
- Fibonar2 hours ago
  I’ve entertained myself with CTF walkthroughs on YouTube before and had been meaning to try it out. But yeah I feel it falls under the same category as lock picking, fun to LARP, unlikely to stumble across in my day job.
inglor25 minutes ago
We mitigate this attack with the very uninspiring "wait 24h before dep upgrades" solution which is luckily already supported in uv.
cdcarter3 hours ago
If it weren't for the 11k process fork bomb, I wonder how much longer it would have taken for folks to notice and cut this off.
- intothemild3 hours ago
  Thats the thing, i noticed it almost instantly when trying to install a package that depended on it, as soon as it started, it hard locked my laptop, didn't get to infect it.. but if they had slowed down that fork bomb.. it would have done more damage.
  - ddp262 hours ago
    Yeah, and this is a pattern I saw in the Fancy Bear Goes Fishing book, a lot of discovery of malware is either pure luck, or blunders from the malware developers. https://en.wikipedia.org/wiki/Fancy_Bear_Goes_Phishing
sva_39 minutes ago
> I just opened Cursor again which triggered the malicious package again. Can you please check the files are purged again?
Verified derp moment - had me smiling
n1tro_laban hour ago
Most developers think pip install just puts files on disk and execution happens at import. But .pth files run on every Python startup, no import needed. It's not a one-time install hook like npm postinstall. It's persistent.
felixagentaian hour ago
The dependency cooldown approach mentioned upthread is underrated. Most teams I've seen adopt lockfiles and pinning but still auto-merge Dependabot PRs without any delay window. The irony is that the tooling meant to keep you secure (auto-updating) is exactly what widens the blast radius of a compromised package.
The 46-minute window here is telling. If your CI/CD pipeline happens to run during that window, you're exposed. A simple policy of "no package updates within 24h of release" would have completely avoided this, and it costs nothing to implement.
- halJordanan hour ago
  But then what happens when everyone just shifts their window too. This solution is a misuse of the commons type thing where you just take advantage of letting others get poisoned and see if they drop
rpodraza2 hours ago
At this point I'd highly recommend everyone to think twice before introducing any dependencies especially from untrusted sources. If you have to interact with many APIs maybe use a proxy instead, or roll your own.
agentictrustkitan hour ago
Let me share two things if I can... 1) its genuinely useful that a comptent generalist can do first-pass incident response with AI's help now, and 2) the process overhead that keeps the ecosystem healthy does still matter. The failure mode isn't "non-experts report bugs," its "non-experts report in a way that makes triage impossible."
A pattern that worked with for us is treating package supply-chain events as a governance problem as much as a technical one--short, pre-written policy playbook (who gets paged, what evidence to collect, what to quarantine...etc), plus an explicit decision record for "what did we do and why." Even a lightweight template prevents panic driven actions like ad-hoc "just reinstall everything."
On the flip side, waiting N days before adopting new versions helps, but it's a brittle for agent systems becasue they tend to pull dependenceies dynamically and often run unattended. The more robust control is: pin + allowlist, with an internal "permission to upgrade" gate where upgrades to execution-critical deps require a person to sign off (or at least a CI check that includes provenance(sig) verification and a diff or new files). Its boring, but it turns "Oops, compromised wheel" into a contained event rather than an unbounded blast radius.
S0y3 hours ago
> Where did the litellm files come from? Do you know which env? Are there reports of this online?
> The litellm_init.pth IS in the official package manifest — the RECORD file lists it with a sha256 hash. This means it was shipped as part of the litellm==1.82.8 wheel on PyPI, not injected locally.
> The infection chain:
> Cursor → futuresearch-mcp-legacy (v0.6.0) → litellm (v1.82.8) → litellm_init.pth
This is the scariest part for me.
- RALaBarge3 hours ago
  Maybe the people who use emacs for everything are the only safe ones?
  - darkstarsys2 hours ago
    straight and elpaca etc. are just as vulnerable. Maybe more so.
anlka2 hours ago
As I recall, the initial vulnerability in Trivy was introduced by some clawbot. The article author has 5 Claude instances running.
Maybe the author correctly praises the research capabilities of Claude for some issues. Selecting an Iranian school as a target would be a counterexample.
But the generative parts augmented by claws are a huge and unconditional net negative.
Bullhorn92683 hours ago
The fact pypi reacted so quickly and quarantined the package in like 30 minutes after the report is pretty great!
- ddp262 hours ago
  Agree, lots of hand wringing about us being so vulnerable to supply chain attacks, but this was handled pretty well all things considered
hmokiguess2 hours ago
Does anyone have an idea of the impact of this out there? I am curious to the extent of the damage done by this
4 hours ago
undefined
tomalbrc2 hours ago
Hmm a YCombinator backed company, I'm not surprised.
Josephjackjrob1an hour ago
This is pretty cool, when did you begin?
CrzyLngPwdan hour ago
The fascinating part for me is how they chatted with the machine, such as;
"Please write a short blog post..."
"Can you please look through..."
"Please continue investigating"
"Can you please confirm this?"
...and more.
I never say 'please' to my computer, and it is so interesting to see someone saying 'please' to theirs.
- ddp2644 minutes ago
  My team was making fun of me for starting all my chats with "Hi Claude"
  - CrzyLngPwd18 minutes ago
    I wouldn't make fun, I just think it is interesting.
    I'm really terse. If it asks me a yes or no question, I just type "Y" or "N".
    If I want it to confirm something, I say "confirm it".
    I think I treat it like a command system, and want it to be as short as possible.
moralestapia4 hours ago
*salutes*
Thank you for your service, this brings so much context into view, it's great.
dmitrygr4 hours ago
Consider this your call to write native software. There is yet to be a supply chain attack on libc
- woodruffw4 hours ago
  This is presumably because libc just doesn't change very often (not meaning code changes, but release cadence). But the average native software stack does have lots of things that change relatively often[1]. So "native" vs. not is probably not a salient factor.
  [1]: https://en.wikipedia.org/wiki/XZ_Utils_backdoor
  - everforward3 hours ago
    I think that article proves the opposite.
    > While xz is commonly present in most Linux distributions, at the time of discovery the backdoored version had not yet been widely deployed to production systems, but was present in development versions of major distributions.
    Ie if you weren’t running dev distros in prod, you probably weren’t exposed.
    Honestly a lot of packaging is coming back around to “maybe we shouldn’t immediately use newly released stuff” by delaying their use of new versions. It starts to look an awful lot like apt/yum/dnf/etc.
    I would wager in the near future we’ll have another revelation that having 10,000 dependencies is a bad thing because of supply chain attacks.
    woodruffw3 hours ago
    Per below, xz is also an example of us getting lucky.
    > I would wager in the near future we’ll have another revelation that having 10,000 dependencies is a bad thing because of supply chain attacks.
    Yes, but this also has nothing to do with native vs. non-native.
    consp3 hours ago
    This is the security equivalent of having a better lock than your neighbour. Won't save you in the end but you won't be first. Then again, yours could also be broken and you don't get to tick of that audit checkbox.
  - dmitrygr3 hours ago
    your link disproves your claim. no naive app depended on xz version >= latest. Most sane distros take time to up-rev. That is why the xz backdoor was, in fact, in NO stable distro
    And not changing often is a feature, yes.
    woodruffw3 hours ago
    I don't think it does; I think the industry opinion on xz is that we got lucky in terms of early detection, and that we shouldn't depend on luck.
    (I don't know what a "sane" distro is; empirically lots of distros are bleeding-edge, so we need to think about these things regardless of value judgements.)
    dmitrygr3 hours ago
    Sane: debian-stable
    woodruffw3 hours ago
    From experience, a lot of people using a "stable" distro are just bypassing that distro's stability (read: staleness) by installing nightly things from a language ecosystem. It's not clear to me that this is a better (or worse) outcome than a less stable distro.
- hrmtst938373 hours ago
  Native code still have plenty of attack surface. If you do everything through pip/npm you might as well publish your root password, but pretending a clean C build from source makes you safe is just cosplay for people who confuse compiler output with trust. If anything people are way too quick to trust a tarball that builds on the first try.
  - dmitrygr3 hours ago
    100% with you. Anything that builds from the first try is 100% malicious. No real software builds without 5-30 tweaks of the makefile. And anything on npm/pip is malicious with a fixed chance that you have no control over, as seen in this attack.
    But the data remains: no supply chain attacks on libc yet, so even if it COULD happen, this HAS and that merely COULD.
- mr_mitm3 hours ago
  Native software? You mean software without dependencies? Because I don't see how you solve the supply chain risk as long as you use dependencies. Sure, minimizing the number of dependencies and using mostly stable dependencies also minimizes the risk, but you'll pay for it with glacial development velocity.
  - dmitrygr3 hours ago
    Slower development velocity but no third-party-induced hacks surely has a market. :)
- ddp264 hours ago
  Sure, but this is a pretty onerous restriction.
  Do you think supply chain attacks will just get worse? I'm thinking that defensive measures will get better rapidly (especially after this hack)
  - ting03 hours ago
    They will certainly get worse. LLMs make it so much easier.
  - dmitrygr3 hours ago
    > Do you think supply chain attacks will just get worse? I'm thinking that defensive measures will get better rapidly (especially after this hack)
    I think the attacks will get worse and more frequent -- ML tools enable doing it easily among people who were previously not competent enough to pull it off but now can. There is no stomach for the proper defensive measures among the community for either python or javascript. Why am i so sure? This is not the first, second, third, or fourth time this has happened. Nothing changed.
    applfanboysbgon3 hours ago
    Not only do the tools enable incompetent attackers, they also enable a new class of incompetent library developers to create and publish packages, and a new class of incompetent application developers to install packages without even knowing what packages are being used in the code they aren't reading, and a new class of incompetent users who are allowing OpenClaw to run completely arbitrary code on their machines with no oversight. We are seeing only the tip of the iceberg of the security breaches that are to come.
    mckennameyer2 hours ago
    So basically the attacker and the dev who caught it were probably using the same tools if the malware was AI-generated (hence the fork bomb bug), and the investigation was AI-assisted (hence the speed). Less "tip of the iceberg" and more just that both sides got faster.
    dmitrygr3 hours ago
    100%
- 4 hours ago
  undefined
JulianPembroke31 minutes ago
[dead]
qcautomation32 minutes ago
[dead]
devnotes77an hour ago
[dead]
Yanko_112 hours ago
[dead]
aplomb10263 hours ago
[dead]
3 hours ago
undefined
__mharrison__3 hours ago
Interesting world we live in.
I just finished teaching an advanced data science course for one of my clients. I found my self constantly twitching everytime I said "when I write code..." I'm barely writing code at all these days. But I created $100k worth of code just yesterday recreating a poorly maintained (and poor ux) library. Tested and uploaded to pypi in 90 minutes.
A lot of the conversation in my course was directed to leveraged AI (and discussions of existential dread of AI replacement).
This article is a wonderful example of an expert leveraging AI to do normal work 100x faster.
- anematode2 hours ago
  Dear lord. Are you at least transparent with your clients that this is the standard to which you hold your own code?
  - __mharrison__an hour ago
    $100k was the quote of the project from sloccount... (No one paid me for this. I created it for myself.)
- pxtail2 hours ago
  Only $100k worth code? Rookie numbers, you must be new to the game
  - __mharrison__2 hours ago
    Doing my part to burn $50k tokens in a year as per the Jensen mandate.
- masijo3 hours ago
  >But I created $100k worth of code just yesterday recreating a poorly maintained (and poor ux) library.
  How, exactly, are you calculating the worth of your code? Did you manage to sell in the same day? Why is it "worth $100k"?
  - appreciatorBus2 hours ago
    Exactly.
    If it took 90 minutes + a Claude Code subscription then the most anyone else is going to be willing to pay for the same code is... ~90 minutes of wages + a Claude Code subscription.
    Ofc the person earning those wages will be more skilled than most, but unless those skills are incredibly rare & unique, it's unlikely 90 minutes of their time will be worth $100k.
    And ofc, the market value of this code could be higher, even much higher, the the cost to produce it, but for this to be the case, there needs to be some sort of moat, some sort of reason another similarly skilled person cannot just use Claude to whip up something similar in their 90 minutes.
    __mharrison__40 minutes ago
    It's open source scratching an itch. But 99.9% of coders wouldn't know what the library is for. Those that do don't use agents for coding (in my experience sample size 1).
  - __mharrison__2 hours ago
    sloccount
    croemer2 hours ago
    So the more junk lines the more it's worth. Right.
    Don't use bogus $ from sloccount. Just say I created a 10k line project.
    __mharrison__14 minutes ago
    Loc means nothing. Tokens burned is a better metric.
    solarkraft2 hours ago
    That’s insane.