> Several critics seemed to assume I claimed Claude had "decompiled" the executable in the traditional sense. In reality, as I described in our conversation, it analyzed visible strings and inferred functionality - which is still impressive but different from true decompilation.
So I’m not sure the implications are as big as the article's author claims. Claude seems good at de-minifying JavaScript, but that is a long way from decompiling highly optimized binary code.
He had Claude gin up a post that would go viral and it added this hallucination:
Perfectly replicated the functionality [of the original game]
https://claude.ai/share/3eecebc5-ff9a-4363-a1e6-e5c245b81a16
Is the claim that a general system exists? I'm extremely doubtful of that claim. But one that could port every published NES game to some current Linux environment? That would definitely be easier than making something like the current Claude.
> Understand dear reader that this technique can be done on any programming language and even from pre-existing binaries themselves.
Following that sentence, the author included a Twitter embed pointing to a Reddit thread about decompiling a binary. Only after I went to the Reddit thread did I find that there was no decompilation involved.
> all information relevant to the execution of the program can certainly be recovered.
This is moving the goalposts; it has little to do with my original premise.
Of course we're a long way off from this, but there's no reason it couldn't be done in theory.
1:1 Source recovery is never the goal of decompilation, or reverse engineering in general. You just want functionally equivalent, idiomatic code.
> 1:1 Source recovery is never the goal of decompilation
I am aware, but I appreciate the lesson. Now try to consider why that has no bearing on the points I've made.
Your original comment was a reply to one about “decompilation” of binaries. You seem to be talking about some fantasy perfect decompilation that recovers variable names too, but no such decompilation tool or human-employable method does this, and I would argue that variable names aren’t information so much as implicit context: proper code implies certain variable names (stylistic differences notwithstanding) and vice versa.
> Some transformations irrecoverably lose information. A recontextualization engine such as an LLM might be able to "recover" some information by comparing it to other code in its training set, but it's still a guess and not all code will have representation in the training set.
It is a completely general statement about lossy transformations, i.e., transformations that aren't reversible by a one-to-one map. It says nothing in particular about decompilation or about recovering variable names.
> I would argue that variable names aren’t information so much as they’re implicit context
You are free to argue what you want, but that doesn't change reality--variable names are information about the original state before it was irreversibly transformed. Is the information important to you and your current task? Who knows. But my original comment is absolutely correct.
That means that even though information is irrecoverably lost during compilation, an LLM could come up with a semantically valid expanded solution that both fits the compiled artifact (that is, it can be recompiled successfully) and makes sense semantically to whoever eventually wants to evolve the decompiled code with an understanding of the domain.
So the goal isn't to recover exactly what code really was, but to provide a possible solution that fits and is useful.
It’s a guess, sure, but I don’t see why it would be a less good guess than a human’s.
> Stuff like variable names won’t be exact
Yes, this is the point of my comment around information loss. Lossy transformations inherently cannot be recovered with absolute certainty, and the degree of certainty depends on available context.
This is true for absolutely any system which produces lossy transformations, not just in the context of code and LLMs, but encoders, etc.
This is still a loss of information. You're completely misunderstanding the point: Minification is an irreversible process unless you happen to have exactly the context you need, and can verify the provenance of the source in question. Coming up with "better" variable names is not recovering information, it's extrapolating information about the old state to the point of error.
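A tiny illustration of that point, as a hedged sketch (both function names below are mine, purely for demonstration): two different sources can minify to the identical output, so no tool, LLM or otherwise, can tell you which one the author actually wrote.

```typescript
// After type stripping, both of these minify to the same output: const f=a=>a*2
const double = (value: number): number => value * 2;
const toDiameter = (radius: number): number => radius * 2;

// Given only the minified form, there is no fact of the matter about which
// name the original author used; any "recovered" name is a guess, however
// plausible it reads.
```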
For that purpose there is no reason LLMs can’t be as good as the best humans.
You lose a lot more than whitespace: you lose semantics, metadata, and lots of other information besides.
Obviously bytecode languages are a bit easier, you may get lucky, experts help a lot, and perhaps LLMs can help a little.
Both Rice's theorem and the system identification problem from the cybernetics days relate to why it is so hard.
> Given a system in the form of a black box (BB) allowing finite input-output interactions, deduce a complete specification of the system’s machine table (i.e., algorithm or internal dynamics).
AND
> Given a complete specification of a machine table (i.e., algorithm or internal dynamics), recognize any BB having that description.
Decompilation doesn't give you the equivalent of the original source code; it gives you new source code that appears to be functionally similar to it, and many people who have been forced to use decompilation to recover from lost source code or consultant time bombs have run into problems that admittedly seem pretty counterintuitive.
Remember that Rice-Shapiro and Kreisel-Lacombe-Shoenfield-Tseitin extend Rice to partial and total functions in finite time.
Telling whether a program is equivalent to a fixed other program, even for total functions, is still undecidable, since equivalence is a 'non-trivial' property.
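For reference, a standard statement of Rice's theorem (my phrasing, with program equivalence spelled out as the special case at issue):

```latex
\textbf{Rice's theorem.} Let $\mathcal{PC}$ be the set of all partial
computable functions and let $P \subseteq \mathcal{PC}$ be a semantic
property with $\varnothing \neq P \neq \mathcal{PC}$ (i.e.\ non-trivial).
Then the index set
\[
  I_P = \{\, e \in \mathbb{N} \mid \varphi_e \in P \,\}
\]
is undecidable. Equivalence to a fixed program $q$ is the special case
$P = \{\, f \mid f = \varphi_q \,\}$, which is non-trivial, so
$\{\, e \mid \varphi_e = \varphi_q \,\}$ is undecidable.
```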
Most of the time you can make it work, but it isn't a case of:
> “let’s construct source that can be used interchangeably with the original”
I do appreciate your comment as it's not defensive, it's made in earnest, so please don't take my above statement to be a reflection of you in particular.
There is no good way to reconstruct that, AI/ML or not.
So why would the blue teams care beyond "oh fun, a new tool for speeding up malware decompilation"?
Edit: To be clear, I get the new reverse engineering and reimplementation possibilities got much better and simpler. But the alarmist tone seems weird.
That makes the tone make a bit more sense to me.
With decent backend controls, apps don't (and shouldn't) do much in the end. Once you show information on a screen, consider it potentially gone.
You really need to be able to build + run + verify features + compare compiled outputs; then you can be somewhat confident it really did what the author is claiming.
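A minimal sketch of the "run and compare" step, assuming both bundles expose some pure function you can probe (the file paths and the `normalize` export here are hypothetical, not from the article):

```typescript
// compare.mts - spot-check the reconstructed bundle against the original.
import assert from 'node:assert';

const original = await import('./dist/original.mjs');
const rebuilt = await import('./dist/reconstructed.mjs');

for (const input of ['', 'hello', 'a'.repeat(10_000)]) {
  // Identical outputs on sampled inputs are necessary, but not sufficient,
  // evidence of equivalence (see the Rice's theorem discussion above).
  assert.deepStrictEqual(rebuilt.normalize(input), original.normalize(input));
}
console.log('spot checks passed');
```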
> Systemically, I'm concerned that there is a lack of professional liability, rigorous industry best practices, and validation in the software industry which contributes to why we see Boeings flying themselves into the ground, financial firms losing everyone's data day in and out, and stories floating around our industry publications about people being concerned about the possibility of a remotely exploitable lunar lander on Mars.
> There's a heap of [comical?] tropes in the software industry that are illogical/counterproductive to the advancement of our profession and contribute to why other professions think software developers are a bunch of immature spoiled children that require constant supervision.
3 weeks ago you posted something titled "The future belongs to people who can just do things".
Today you post this:
> Because cli.mjs is close to 5mb - which is way bigger than any LLM context window out here. You're going to need baby sit it for a while and feed it reward tokens of kind words ("your doing good, please continue") and encourage it to keep on going on - even if it gives up. It will time out, lots...
I don't think you are someone who can just "do things" if you think a good way to de-obfuscate 5MB of minified javascript is to pass it to a massive LLM.
Do you think you are advancing your profession?
Obviously you don’t need an LLM to prettify obfuscated JavaScript. But take a look at the repo. It didn’t just add the whitespace back — it restored the original file structure, inferred function and variable names, wrote TypeScript type definitions based on usage, and added (actually decent) comments throughout the source code. That simply isn’t possible without an LLM.
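To make that concrete, here is the kind of transformation being described (a made-up example; every name, type, and comment in the "reconstruction" is the model's guess, not the original author's):

```typescript
// Minified/obfuscated input:
//   function a(b,c){return b.filter(function(d){return d.e>c})}
// Plausible LLM reconstruction:
interface Task {
  priority: number;
}

/** Returns the tasks whose priority exceeds the given threshold. */
function filterByPriority(tasks: Task[], threshold: number): Task[] {
  return tasks.filter((task) => task.priority > threshold);
}
```

A prettifier can only restore the whitespace in the first line; everything else in the second version is inference.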
Do you have a lot of experience with minified code?
Please link to a tool that can infer function/variable names and TypeScript type definitions from minified JS without using LLMs or requiring significant user input.
Also, it's guessing at the names, guessing at type definitions, and it makes further guesses based on its previous ones, correct or no. If you don't already know what you're doing, you're in trouble.
For someone to claim that they're sharing the "decompiled" source of Claude Code for the public good is self-important back-patting nonsense, let alone misleading.
This is profiling/projection. You're incapable of responding to the GP's points so you're instead emotionally lashing out and attacking them. This is not really suitable for HN.
> But making someone’s day worse on Hacker News isn’t a good way to deal with that.
This suggests that you're incapable of distinguishing criticism of someone's work from personal attacks on them (furthered by the profiling that you tried to conduct above). Those things are not the same. If your day is ruined by someone posting reasonable criticism of an article that you personally submitted to HN, a place explicitly designed for intellectual curiosity, your expectations need to be adjusted.
And you also profiled and personally attacked them.
> I didn’t say “day is ruined.”
> making someone’s day worse on Hacker News
Now you're continuing to be dishonest. For the purposes of this discussion, those are the same thing.
> I’m also not OP.
Reading my comment will show that I never said you were nor are any of the points I made predicated on that.
> I get the sense that you’re frustrated that something you’ve invested a lot of time and energy into learning is being automated. Maybe you’re scared because the technology has moved so quickly and your understanding of how to use it hasn’t kept up
Wrong on all counts. I use LLMs to write code all the time, and I know how they work, which is why I find processing 5MB of JS through one to be an obscene waste of energy.
I do not use LLMs to publicly claim abilities I don't already have myself. Reading this article does not worry me one bit about my job security.
At no point in this process does the author seem to stop and inspect the results to see if they actually amount to what he’s asking for. Claiming that this output represents a decompilation of the obfuscated target seems to require at least demonstrating that the resulting code produces an artifact that does the same thing.
Further, the claim that “Using the above technique you can clean-room any software in existence in hours or less” is horrifyingly naive. This would in no way be considered a ‘clean room’ implementation of the supplied artifact. It’s explicitly a derived work based on detailed study of the published, copyrighted artifact.
Please step away from the LLM before you hurt someone.
Having spent my misguided youth doing horrible things to Sentinel Rainbow and its cousins - I can only chuckle.
SaaS pretty much made all of that obsolete, since with SaaS you get unbeatable DRM for free.
If you create a new design that doesn't have the proprietary elements in it, that's not grounds for copyright infringement, is it?
https://en.m.wikipedia.org/wiki/Clean-room_design
But maybe you could use one LLM to study the software and write a specification, then throw that over the wall to a different human who uses an LLM to write software based on that spec
LLM1: code -> english description of code
LLM2: english description of code -> code
And that would be clean room? Might be cool to automate that. I bet you could train LLMs to do exactly that.
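A back-of-the-napkin sketch of that pipeline. The `LLM` type stands in for whatever completion API you'd actually call; nothing here is a real vendor SDK, and the prompts are illustrative only:

```typescript
// Hypothetical two-stage "clean room" pipeline: the first model sees the code
// and writes English only; the second model sees English only and writes code.
type LLM = (prompt: string) => Promise<string>;

async function cleanRoomRewrite(
  describeModel: LLM, // sees the original source, outputs a prose spec
  implementModel: LLM, // never sees the source, only the spec
  originalSource: string,
): Promise<string> {
  const spec = await describeModel(
    `Describe what this program does in plain English. Do not quote code:\n${originalSource}`,
  );
  // Only the English specification crosses the wall to the second model.
  return implementModel(`Implement this specification in TypeScript:\n${spec}`);
}
```

Whether a court would treat the generated spec as a sufficiently clean wall is, as the surrounding comments note, a separate question.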
Asking it for its source code (AI never lies, right?) and then buying it on your personal card so corporate security doesn’t know what you’re doing makes me feel a lot better about it.
If you had it generate tests then handed the tests off to a second agent to implement against...
This reads to me like "Please understand that legal protections no longer matter because computers can now break the law for you automatically".
The AI has just made educated guesses about the functionality, written some sensible-looking code, and hallucinated a whole lot.
The provided code on GitHub does not compile, does not work in the slightest, does not include any of the prompts from the original source, does not contain any API URLs and endpoints from the original, and uses Claude 3 Opus! And this is just from a cursory 5-minute look.
I let the author know on Twitter too: https://x.com/thegeomaster/status/1895869781229912233
If it's the former, I assume he will update or take down the blog post.
"if it's not" is so troubling
I’m pretty sure translation of a text into another language would still count as copyright infringement. It may be hard to prove, but this isn’t a copyright bypass.
All they've done so far is add an unnecessary step by putting a bounty on who will be the first to extract all the prompts and the agent orchestration layer.
> This is the meat of the application itself. It is your typical commonjs application which has been compiled from typescript.
Why is it .mjs then?
> After examining the provided code, I've determined that this appears to be a CLI application for Claude code-related functionality, built as a CommonJS TypeScript application that has been compiled with webpack.
Although, looking at the minified code, it seems to be using module.createRequire for CommonJS compatibility, so maybe it isn't completely wrong: https://nodejs.org/api/module.html#modulecreaterequirefilena...
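For anyone unfamiliar, `module.createRequire` is how an ES module (`.mjs`) obtains a CommonJS-style `require`, which is why a webpack-built CommonJS bundle can plausibly end up wrapped in a `.mjs` file:

```typescript
// In an ES module (.mjs), "require" doesn't exist by default;
// createRequire builds one scoped to this file's location.
import { createRequire } from 'node:module';

const require = createRequire(import.meta.url);
const lodash = require('lodash'); // CommonJS resolution from inside ESM
```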
I wonder if it is possible to transpile all the CPython C modules to an API version that has no GIL this way.
The author thinks this invalidates the business models of companies with closed source or mixed open and closed components. This misunderstands why companies license software. They want to be compliant with the license, and they want support from the team that builds the software.
Yes, hustlers can and will fork things just like they always have. There are hustlers that will fork open source software and turn it into proprietary stuff for app stores, for example. That's a thing right now. Or even raise investment money on it (IMHO this is borderline fraud if you aren't adding anything). Yet the majority of them will fail long term because they will not be good at supporting, maintaining, or enhancing the product.
I don't see why this is so apocalyptic. It's also very useful for debugging and for security researchers. It makes it a lot easier to hunt for bugs or back doors in closed software.
The stuff about Grok planning a hit on Elon is funny, but again not apocalyptic. The hard part about carrying out a hit is doing the thing, and someone who has no clue what they're doing is probably going to screw that up. Anyone with firearms and requisite tactical training probably doesn't need much help from an LLM. This is sensationalism.
I've also seen stuff about Grok spitting out how to make meth. So what? You can find guides on making meth -- whole PDF books -- on the clear web, and even more on dark web sites. There are whole forums. There are even subreddits that do not not (wink wink, nudge nudge) provide help for people cooking drugs. This too is AI doom sensationalism. You can find designs for atomic bombs too. The hard part about making an a-bomb is getting the materials. The rest could be done by anyone with grad-level physics knowledge, a machine shop, and expertise in industrial and electrical engineering. If you don't have the proper facilities, you might get some radiation exposure, though.
There is one area that does alarm me a little: LLMs spitting out detailed info on chemical and biological weapons manufacture. This is less obvious and less easy to find. Still: if you don't have the requisite practical expertise you will probably kill yourself trying to do it. So it's concerning but not apocalyptic.
> All you need is access to their source-code and they have given you the keys to the kingdom on a golden platter by going to market as...
Definitely going on my wall of shame.
No thanks.
I opened another article someone posted by the same author, and now that I know they write like this, I couldn't make it through the first paragraph. Absolute trash.
The real Thompson'd be more likely to say this guy has given his mind to a goddam machine and he's not even using his soul.
It also contains some gems previously unknown to me like Claude's binary VB to Python capabilities.