44 points by Reaktornano 7 hours ago | 8 comments
  • fn-mote 5 hours ago
    Meta: title is inaccurate, contradicted by sources. Possibly LLM summarizer confused Algerian solo hacker with the group cited in reference 1.

    Article is written by AI.

    Is this grounds for flagging?

  • hunterpayne 6 hours ago
    The golden age of net security is here...

    Both the defense is weaker due to LLMs and attacks become stronger and cheaper. Bad combination for the rest of us.

    • Reaktornano 5 hours ago
      I personally do not think that the defence of any particular project is weaker, but the overall internet as a bunch of interdependencies is much weaker, as you never know which open source library deep in the code was compromised
      • thierrydamiba 5 hours ago
        The scary thing for me is most of the vulnerabilities have been revealed due to silly mistakes.

        If someone really knew what they were doing and had bad intentions, I fear we would never find out.

    • embedding-shape 6 hours ago
      > Both the defense is weaker due to LLMs and attacks become stronger

      Are you claiming that LLMs are better at offensive security than defensive security? Or somehow that the offensive actors have access to better LLMs than people using them to defend? Otherwise it'd seem like the playing field just went up for both sides, unless one is famously lagging behind because no one likes to pay for better security? But that's also nothing new.

      • hansvm 5 hours ago
        Ignoring LLMs, the status quo for defense is that you're pwnable from the silliest of mistakes, and the status quo for offense is that even one lucky shot lets you in. Suppose you brought in 1000x more people to projects on both sides; you'd expect a much higher chance of at least one failure for the defenders and at least one success for the attackers.
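
        A minimal sketch of that scaling argument, assuming every attempt is independent and has the same small probability p (of a defender's mistake, or of an attacker's hit). Both the independence assumption and the numbers are hypothetical, chosen only to illustrate the shape of the curve:

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds,
    when each attempt succeeds with probability p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical per-attempt probability; not taken from any source.
    p = 0.001
    for n in (1, 10, 1000):
        # More participants means more chances for one failure (defense)
        # or one success (offense).
        print(f"n={n:>5}: P(at least one) = {p_at_least_one(p, n):.4f}")
```

        The same formula covers both sides, which is the asymmetry: for defenders the "at least one" event is a breach, while for attackers it is a win.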

        LLMs don't have the same dynamics, but the same underlying idea is worth bearing in mind. Above and beyond that, yes, defense is harder for LLMs than offense. They struggle mightily when pulling together too many threads, and some projects are just too big. On the offensive side, exploits are usually very tiny, so LLMs accelerate them asymmetrically.

      • LPisGood 5 hours ago
        I don’t think it matters so much if LLMs are better at offensive security or defensive security. I think offensive security was previously an extremely niche skill set (how many people off the street would be able to solve even a few CTF problems 5 years ago?).

        Now anyone can point an LLM at any software they want and say go to town. Even if it doesn’t do a great job, or do better than a good human, it’s so much more than what they could do before, and a lot of security vulnerabilities are kind of low-hanging fruit anyway.

      • jesse_ash 5 hours ago
        IMO the assumption is probably that, with LLMs generally, software complexity and surface area are going up faster than we're tackling them through hardening, testing, etc. - even with the help of defensive models.

        I would also imagine bad actors are in the majority, and so we're seeing restrictions on models like Mythos in an attempt to balance the field a bit.

      • overgard 6 hours ago
        Even if defense keeps up, it kind of depends on entities keeping up to date. Complex software stacks can make that hard, as can falling behind a major version, etc. I think defense is harder than offense in this era.
      • YZF 6 hours ago
        The question is whether LLMs write more secure code than humans. If we get a lot of vibe-coded software coming online from non-SWEs, do we think that would be more or less secure?
      • dalmo3 5 hours ago
        It's order vs chaos, and LLMs are on the side of chaos.
      • teaearlgraycold 6 hours ago
        Defense is weaker because of vibe coding.

        Computer security is asymmetric. Attacking is easier than defending. Attackers need to find one hole in the security. Defenders need to patch every hole.

      • amarant 5 hours ago
        He's just karma farming with the ever so creative "LLM=bad" hot take..

        I don't know what's sadder: that people are doing that on HN, or that it's clearly working....

    • dyauspitr 5 hours ago
      Does anyone LLM bad?
  • Reaktornano 7 hours ago
    Author here. Spent the last few weeks chasing down the AI-attributed attack cases that made the rounds this year, including the Mexican government breach, the "vibe hacking" story, and the Algerian amateur. Basically trying to work out whether hacking is impacted by broader AI adoption or whether the press was running ahead of the evidence.

    On one side, Daniel Stenberg ran the gated Anthropic frontier model against curl on May 11. Five "confirmed" findings, one low-severity CVE after triage. His words: "the big hype around this model so far was primarily marketing." Stenberg is not a guy who hedges, and curl is not a toy codebase.

    On the other side, there's SCONE — Anthropic's own December 2025 benchmark. Agents exploited 19 of 34 post-cutoff smart contracts, 55.8% success, $4.6M in simulated funds at an average API cost of $1.22 per contract. The comparable number 12 months earlier was about 2%.

    Looks like agents are getting genuinely good at narrow, well-scoped vulnerability classes (Solidity, post-cutoff, bounded targets) and still bad at messy real-world codebases. But that's a guess and I'd rather hear pushback. Happy to get into methodology, the spots where Chainalysis, Immunefi, and Web3IsGoingJustGreat don't line up, or specific cases. 28 references at the end of the piece.

    • nozzlegear 5 hours ago
      > On the other side, there's SCONE — Anthropic's own December 2025 benchmark. Agents exploited 19 of 34 post-cutoff smart contracts, 55.8% success, $4.6M in simulated funds at an average API cost of $1.22 per contract. The comparable number 12 months earlier was about 2%.

      Anthropic has a vested interest in making their LLMs look advanced, powerful and dangerous. This is the company that is explicitly pro-regulation, that has donated $20M to a PAC for pro-regulation candidates, and that its own competitors accuse of favoring regulatory capture. We should take their benchmarks and their "Mythos is too dangerous for you mere mortals" statements with a big ass grain of salt, because they play directly into that regulation angle. Anthropic wants frontier model development locked up, with only a few select stewards of humanity holding the keys.

    • Barbing 5 hours ago
      Do you have a dictation app? Hit us with your train of thought on this, how you’ve spent the last few weeks and the impact. Will be glad to read.
    • adampunk 5 hours ago
      >Beep-boop, I am a robot.
    • refulgentis 6 hours ago
      You wrote the blog and this comment with Claude Opus.

      I'm sure you meant well and only used it for editing, etc. etc., and I agree AI is good.

      In any case, I can't trust AI on AI, especially with such a stark headline from someone outside Anthropic. (how do you know it was a solo user with Claude?)

      This is either breaking news that you for some reason delegated to an overly verbose post written by AI, or it's an almost-true-but-not-quite clickbait title, and I don't have the domain chops to know. Impossible spot to be in as a reader.

      • Reaktornano 6 hours ago
        All references here, do your own research:

        References

        [1] SecurityWeek, "Hackers Weaponize Claude Code in Mexican Government Cyberattack," Feb. 2026. [Online]. Available: https://www.securityweek.com/hackers-weaponize-claude-code-i...
        [2] Anthropic, "Threat Intelligence Report: August 2025," Anthropic, Aug. 27, 2025. [Online]. Available: https://www-cdn.anthropic.com/b2a76c6f6992465c09a6f2fce282f6...
        [3] D. Stenberg, "Mythos finds a curl vulnerability," daniel.haxx.se, May 11, 2026. [Online]. Available: https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-v...
        [4] Trail of Bits and OpenZeppelin, "Arbitrum Research and Development Collective (ARDC) procurement-grade pricing benchmarks," 2024. Approximately $25,000 per engineer-week for senior smart-contract auditing.
        [5] W. Xiao, C. Killian, H. Sleight, A. Chan, N. Carlini, and A. Peng, "AI agents find $4.6M in blockchain smart contract exploits," Anthropic Red Team / MATS / Anthropic Fellows program, Dec. 1, 2025. [Online]. Available: https://red.anthropic.com/2025/smart-contracts/
        [6] P. Paganini, "Claude code abused to steal 150GB in cyberattack on Mexican agencies," SecurityAffairs, Feb. 2026. [Online]. Available: https://securityaffairs.com/188696/ai/claude-code-abused-to-...
        [7] Immunefi, "2026 State of Onchain Security," Immunefi, Jan. 2026. 425 publicly disclosed exploits 2021-2025 totaling $11.9 billion; cumulative whitehat payouts exceed $110 million across 330+ projects and 45,000+ researchers.
        [8] Chainalysis, "2026 Crypto Crime Report," Chainalysis, Feb. 2026. 2025 stolen funds totaled $3.4 billion; cumulative DPRK take all-time, $6.75 billion.
        [9] M. White, "Web3 Is Going Just Great," web3isgoinggreat.com. (Cumulative loss tracker, broader scope including exchange and protocol collapses.) [Online]. Available: https://web3isgoinggreat.com
        [10] Z. Wang, X. Chen, Y. Chen, et al., "Characterizing Ethereum Upgradable Smart Contracts and Their Security Implications," arXiv:2403.01290, Mar. 2024. (Measurement study covers 60,251,064 Ethereum smart contracts.) [Online]. Available: https://arxiv.org/abs/2403.01290
        [11] Flipside Crypto, "EVM Layer-2 deployment statistics," Flipside Crypto, 2024. More than 637 million EVM contracts across 7 L2 chains; Optimism alone hosted approximately 70% in 2024 YTD.
        [12] Etherscan, "Daily Verified Contracts Chart," etherscan.io. All-time peak of 602 verified Solidity contracts deployed in a single day in 2023. [Online]. Available: https://etherscan.io/chart/verified-contracts
        [13] Google Project Zero and Google DeepMind, "From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code," Google Project Zero, Oct. 2024. [Online]. Available: https://projectzero.google/2024/10/from-naptime-to-big-sleep...
        [14] N. Perry, M. Srivastava, D. Kumar, and D. Boneh, "Do Users Write More Insecure Code with AI Assistants?" in Proc. 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS '23), Copenhagen, Denmark, Nov. 2023. 47 Stanford participants on codex-davinci-002. [Online]. Available: https://arxiv.org/abs/2211.03622
        [15] United States v. Eisenberg, No. 23 Cr. 10 (S.D.N.Y. May 23, 2025), Opinion and Order on Rule 29 Motion for Acquittal (Subramanian, J.), 35 pp. [Online]. Available: https://nysd.uscourts.gov/sites/default/files/2025-05/23cr10...
        [16] E. Calvano, G. Calzolari, V. Denicolò, and S. Pastorello, "Artificial Intelligence, Algorithmic Pricing, and Collusion," American Economic Review, vol. 110, no. 10, pp. 3267-3297, Oct. 2020. [Online]. Available: v
        [17] S. Fish, Y. A. Gonczarowski, and R. I. Shorrer, "Algorithmic Collusion by Large Language Models," arXiv:2404.00806, Apr. 2024. [Online]. Available: https://arxiv.org/abs/2404.00806
        [18] CoinDesk, "Attacker Drains $182M From Beanstalk Stablecoin Protocol," Apr. 17, 2022. See also PeckShield and Omniscia post-mortems documenting the flash-loan governance attack and emergencyCommit exploitation of BIP-18. [Online]. Available: https://www.coindesk.com/tech/2022/04/17/attacker-drains-182...
        [19] The Block, "$24 million Compound Finance proposal passed by whale over DAO objections," Jul. 29, 2024. Proposal 289 vote: 682,191 in favor, 633,636 against. [Online]. Available: https://www.theblock.co/post/307943
        [20] DARPA, "AI Cyber Challenge marks pivotal inflection point for cyber defense," DARPA, Aug. 2025. Team Atlanta (Georgia Tech, KAIST, POSTECH, Samsung Research) won the $4 million top prize with the ATLANTIS cyber-reasoning system; 54 of 63 synthetic vulnerabilities discovered (86%) and 43 patched (68%) across 54 million lines of code. [Online]. Available: https://www.darpa.mil/news/2025/aixcc-results
        [21] CETaS, "Claude Mythos: What Does Anthropic's New Model Mean for the Future of Cybersecurity?" Centre for Emerging Technology and Security, The Alan Turing Institute, Apr. 2026.
        [22] Anthropic, "Responsible Scaling Policy v3.0," Anthropic, Feb. 2026.
        [23] European Parliament and Council of the European Union, "Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (AI Act)," Official Journal of the European Union, Jul. 12, 2024. Dual-use provisions in next implementation phase scheduled for August 2026.
        [24] National Institute of Standards and Technology, "AI Risk Management Framework (AI RMF 1.0)," NIST AI 100-1, Jan. 2023. [Online]. Available: https://www.nist.gov/itl/ai-risk-management-framework
        [25] AI Safety Institute (UK), "The Last Ones: 32-Step Corporate-Network Attack Simulation," AI Safety Institute, Apr. 2026.
        [26] V. Buterin, "The Promise and Challenges of Crypto + AI Applications," vitalik.eth.limo, Jan. 30, 2024. [Online]. Available: https://vitalik.eth.limo/general/2024/01/30/cryptoai.html
        [27] Lido DAO, "Dual Governance - Lido Improvement Proposal LIP-28," Lido Finance. Activated on Ethereum mainnet, Jun. 30, 2025. 1% TVL "first seal" threshold and 10% TVL "rage-quit" threshold. Built with audits by Certora, OpenZeppelin, Statemind, and Runtime Verification; agent-based simulations by Collectif Labs; game-theoretic models by 20squares. [Online]. Available: https://github.com/lidofinance/lido-improvement-proposals/bl...
        [28] Anthropic, "Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign (GTG-1002)," Anthropic, Nov. 13, 2025. Approximately 30 targets across technology, finance, chemicals, and government sectors. [Online]. Available: https://www.anthropic.com/news/disrupting-AI-espionage

        • Barbing 3 hours ago
          Some links are missing. I was able to find them manually, but it suggests either no human ever read this block of text or nobody cared.

          So this is the link you might’ve wanted to share then:

          https://gambit.security/blog-post/a-single-operator-two-ai-p...

          Saw the PDF was linked within (https://cdn.prod.website-files.com/69944dd945f20ca4a27a7c47/...)

          Thanks for adding to your blog retroactively I suppose (version history would almost be nice). Feel my time was wasted today and will share “Slop is something that takes more human effort to consume than it took to produce.”

          Interesting, wonder if it was solo or team hoping to appear solo

        • refulgentis 5 hours ago
          Looked at [1] and [6] and yeah, it wasn't a solo user with just Claude Code. And the sources are garbage lol, both are rewrites of a startup called Gambit's press release. I'm surprised Claude wasn't more careful, to be honest, the articles stop far shy of "solo user with Claude Code" and provide more context that obviates it.
      • Reaktornano 6 hours ago
        The body of this post is definitely an edited AI summary; the original post is not
  • meisterfeister 6 hours ago
    A bit too obviously written by Claude ...
  • royal__ 5 hours ago
    This is written by AI
  • throwaway27448 6 hours ago
    Why mention claude?
  • 3dahG 5 hours ago
    "Blockchain Founder, Web3, AI and Economics Researcher"

    The whole "article" is AI generated and insufferable. Do prompters like this one expect us to verify each slop assertion (repeated 10 times on average) ourselves?

  • yieldcrv 5 hours ago
    There should be more investment in the exfiltration space because it is already set up to punt liability around like corporations

    The person using Claude to find the exploit clearly has a paper trail, so they do not exploit it themselves. They sell the exploit to someone else, and this is a profitable venture - not a crime. The buyer, who has to distance themselves from the liability of the actual exploitation, does not use the data they find; they just sell the data - not a crime - instead of expanding their liability surface and risking their anonymity by using it. In fact they may even just leave the hole in the system open for someone else to exfiltrate. The person who steals from people with the found data doesn't just drop the money in their own bank account; they hire mules in "work from home" jobs who use their own banking credentials to open accounts and launder the money or convert it back through crypto exchanges and onchain.

    This supply chain is pretty robust, might as well see what the market values it at, as shares.