It makes sense that SAST is better for the provided task. The CWE Top 25 are mostly pattern-centric issues: each one has a strictly enumerated set of vulnerable patterns you can scan for, and the tool's job then reduces to finding an exploitable path to that pattern. That lends itself to static methods. Every known weakness of LLMs, like hallucinations, needle-in-a-haystack retrieval, and context overflow, shows up in this kind of taint-analysis problem.
I also think this is why SAST did so much better in Java. Pattern-based vulns plus a statically typed language make static taint analysis really powerful. LLMs have no advantage there, while all of their disadvantages are on full display.
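To make the "pattern plus path" point concrete, here's a minimal Java sketch (the class and query are hypothetical, not from the article) of a classic CWE-89 flow. A taint rule only has to prove that the untrusted parameter reaches the query string unmodified; no understanding of the application is required.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical example of the pattern a taint-analysis rule enumerates for CWE-89:
// untrusted input flows, unchanged, from a source into a SQL sink.
public class OrderLookup {
    private final Connection db;

    public OrderLookup(Connection db) {
        this.db = db;
    }

    // customerId comes straight from an HTTP parameter (the taint source).
    public ResultSet findOrders(String customerId) throws Exception {
        Statement stmt = db.createStatement();
        // Taint sink: user input concatenated into a query. A scanner only needs to
        // match "source reaches Statement.executeQuery without sanitization".
        return stmt.executeQuery(
            "SELECT * FROM orders WHERE customer_id = '" + customerId + "'");
    }
}
```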
This article doesn't go into the issues LLMs can find that traditional SAST can't. Auth vulnerabilities, for example: privilege escalation is a software pattern but not a code one, and it takes reasoning to build a permissions model and then test it for breaches. Business logic issues are another: ways users can get around usage limits, or get at premium features or private data they shouldn't see.
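As a contrast to the taint example above, here's a hypothetical sketch of the kind of authorization bug I mean. Syntactically it's a clean lookup with no tainted source or dangerous sink; the flaw only exists relative to an unwritten permissions model, which is exactly the sort of reasoning a rule-based scanner doesn't do.

```java
import java.util.Map;

// Hypothetical endpoint handler: nothing here matches a taint rule, but the
// authorization model is broken. Any authenticated user can read any invoice,
// because the code never checks that the invoice belongs to the caller.
public class InvoiceService {
    private final Map<Long, Invoice> invoices;

    public InvoiceService(Map<Long, Invoice> invoices) {
        this.invoices = invoices;
    }

    public Invoice getInvoice(User caller, long invoiceId) {
        // Missing check: invoice.ownerId() == caller.id() (or an admin role).
        // Spotting this requires a model of who should see what, not a code pattern.
        return invoices.get(invoiceId);
    }

    record Invoice(long id, long ownerId, String body) {}
    record User(long id, String name) {}
}
```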
The interesting question isn't whether LLMs outperform SAST today, but whether they can complement it: identifying logic-level issues, insecure design patterns, or unusual edge cases that rule-based tools miss.
It feels like the future is hybrid: deterministic scanners for known classes of vulnerabilities, and LLMs for higher-level semantic and architectural analysis.