Lately, though, I’ve been wrestling with a slightly uncomfortable thought: are we spending a disproportionate amount of money on traditional penetration tests relative to what they actually provide?
Don’t get me wrong — good testers are worth every cent. The sharp ones don’t just run tools. They think. They chain “low-risk” findings into real impact. They notice when something feels off, even if it doesn’t trigger a scanner. Some of the most critical issues I’ve seen were uncovered purely because a human followed a hunch.
But if I’m honest, a large chunk of many commercial engagements doesn’t look like that.
A lot of it is structured, repeatable work:
Recon
Enumeration
Checking common misconfigurations
Validating known vulnerability classes
Re-testing issues from last year’s report
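That last step in particular lends itself to scripting. As a rough sketch (every finding ID, path, and simulated response below is invented for illustration), last year's findings could be encoded as re-runnable checks instead of a manual pass through an old PDF:

```python
# Sketch: last year's findings expressed as scripted regression checks.
# All IDs, paths, and the simulated target state are made up.

def check_directory_listing(responses):
    # Finding is fixed if /backup/ no longer returns 200.
    return responses.get("/backup/") != 200

def check_debug_endpoint(responses):
    # Finding is fixed if the debug endpoint no longer returns 200.
    return responses.get("/api/debug") != 200

previous_findings = [
    ("2023-01", "Directory listing on /backup/", check_directory_listing),
    ("2023-02", "Debug endpoint exposed", check_debug_endpoint),
]

def retest(responses):
    """Return IDs of last year's findings that are still reproducible."""
    return [fid for fid, _, check in previous_findings if not check(responses)]

# Simulated current state: /backup/ was fixed, the debug endpoint was not.
current = {"/backup/": 403, "/api/debug": 200}
print(retest(current))  # prints ['2023-02'] for this simulated state
```

A real system would issue actual requests rather than read a dict, but the point stands: once a finding is written down precisely, re-verifying it is mechanical.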
And companies pay significant amounts — often tens of thousands — for a time-boxed assessment that results in a PDF. A snapshot in time.
Meanwhile, their environment changes constantly.
New features ship weekly. Cloud permissions drift. APIs get added. Infrastructure gets rebuilt from scratch with Terraform.
Yet testing often happens once a year, sometimes primarily to satisfy compliance requirements.
That disconnect is hard to ignore.
I’m starting to wonder whether there’s room for a different layer in the model — something that sits between vulnerability scanners and full-blown human red teams.
Specifically: an AI-driven system that behaves more like a persistent junior offensive analyst than a static scanner. Something that can:
Maintain authenticated sessions
Traverse application flows
Model attack paths instead of isolated findings
Re-test automatically after deployments
Continuously evaluate cloud permissions and exposure
Not to replace human testers. But to reduce the repetitive groundwork and provide continuous coverage between manual engagements.
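"Model attack paths instead of isolated findings" is the part that separates this from scanners. One way to picture it (all hosts and findings below are invented): treat each finding as an edge in a graph, then search for chains from an entry point to a sensitive asset:

```python
# Sketch: findings as edges in an attack graph; chains found via BFS.
# Every node and finding here is a hypothetical example.
from collections import deque

# (from_position, to_position, finding_description)
findings = [
    ("internet", "web-app", "exposed admin panel"),
    ("web-app", "app-server", "SSRF in PDF export"),
    ("app-server", "iam-role", "instance metadata readable"),
    ("iam-role", "s3-bucket", "role can read all buckets"),
]

def attack_paths(findings, start, target):
    """Breadth-first search for finding chains from start to target."""
    graph = {}
    for src, dst, desc in findings:
        graph.setdefault(src, []).append((dst, desc))
    queue = deque([(start, [])])
    paths = []
    while queue:
        node, chain = queue.popleft()
        if node == target:
            paths.append(chain)
            continue
        for nxt, desc in graph.get(node, []):
            if all(nxt != step[0] for step in chain):  # avoid cycles
                queue.append((nxt, chain + [(nxt, desc)]))
    return paths

for path in attack_paths(findings, "internet", "s3-bucket"):
    print(" -> ".join(f"{node} ({desc})" for node, desc in path))
```

Each edge on its own might be rated "low"; the chain from the internet to the bucket is what a human tester would flag as critical. A graph model lets a system surface that automatically.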
The economics are interesting. We repeatedly pay experienced professionals to perform work that, in many cases, follows established patterns. That expertise is valuable — but not every task in an engagement requires senior-level creativity.
If 60–70% of the mechanical work could be automated in a way that’s context-aware and stateful (not just signature-based), it might free human testers to focus on the genuinely hard problems: business logic abuse, novel chaining, adversarial thinking.
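To make the "stateful, not signature-based" distinction concrete, here is a contrast sketch. The app, credentials, and IDOR bug are all invented; the target is a stand-in class so the example runs offline:

```python
# Signature-based: one request, one pattern, no context.
def signature_check(response_body):
    return "X-Powered-By: OldFramework/1.2" in response_body

class FakeApp:
    """Invented stand-in for a real target, so the sketch runs offline."""
    def login(self, user, password):
        return {"token": "abc"} if password == "hunter2" else None
    def get_invoice(self, session, invoice_id):
        # Simulated bug: any authenticated user can read any invoice (IDOR).
        return {"id": invoice_id, "owner": "someone-else"}

# Stateful: log in, carry the session, then exercise an authorized flow.
def stateful_idor_check(app):
    session = app.login("tester", "hunter2")
    other = app.get_invoice(session, invoice_id=999)
    # Issue reproduces if we can read an invoice we don't own.
    return other is not None and other["owner"] != "tester"
```

The first function is what scanners mostly do today; the second requires holding state across steps, which is exactly the kind of mechanical-but-contextual work in question.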
Of course, there are real challenges:
Legal boundaries around active exploitation
Avoiding destructive actions in production
False positives eroding trust
Compliance frameworks that require “independent” third parties
The cultural weight of recognizable consultancy names
And there’s the deeper question: would security teams actually trust such a system? Or would it always be seen as “just another tool,” no matter how advanced it becomes?
I don’t have a product to pitch. I’m genuinely trying to sanity-check the idea.
Is there a real niche for continuous, AI-driven offensive coverage that complements — not replaces — human pen testers?
Or is this one of those concepts that sounds efficient on paper but collapses under real-world complexity?
Curious how others here see it.