I guess these words are to be avoided...
Anthropic may at some point decide to increase the wtf/min metric, not decrease it.
It’s the original use case for LLMs.
If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.
Regex is going to be something like 10,000 times quicker than the quickest LLM call; multiply that by billions of prompts.
Besides, they probably do a separate analysis on the server side either way, so they can check the true-positive to false-positive ratio.
Additionally, after looking at the source, it looks like a lot of Anthropic's own internal test tooling/debug code (i.e. stuff stripped out at build time) is in this source mapping. There's one part that prompts their own users (or whatever) to use a report-issue command whenever frustration is detected. It's possible it's using it for this.
I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.
Anyway, I don't really care, might just as well be 99.99%. This is not a hill I will die on :P
Some things will be much better with inference, others won’t be.
Parsing WTFs with regex also quantifies the impact and reduces the noise in the metrics.
"determinism > non-determinism" when you are analysing the sentiment, why not make some things more deterministic.
A cool thing about this solution is that you can evaluate the LLM's sentiment accuracy against the regex-based approach and analyse the discrepancies.
As they say: any idiot can build a bridge that stands, only an engineer can build a bridge that barely stands.
You have a semi-expensive process, but you want to keep particular known context out of it, so you put a quick-and-dirty search just in front of the expensive process. Instead of 'figure sentiment (20 seconds)', you have 'quick check sentiment (<1 sec)' followed by 'figure sentiment v2 (5 seconds)'. Now, if it were pure regex and nothing else, the analogy would hold up just fine.
I could totally see myself making a design choice like that.
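A minimal sketch of that two-stage gate, assuming a hypothetical classifySentiment LLM call (none of these names are from the actual codebase):

// Cheap regex gate in front of an expensive LLM sentiment call.
// FRUSTRATION_RE and classifySentiment are invented for illustration.
const FRUSTRATION_RE = /\b(wtf|ffs|ugh)\b/i;

async function detectFrustration(
  prompt: string,
  classifySentiment: (p: string) => Promise<boolean>, // slow, costs tokens
): Promise<boolean> {
  // Fast path: microseconds, zero tokens, deterministic.
  if (FRUSTRATION_RE.test(prompt)) return true;
  // Slow path only when the cheap check finds nothing.
  return classifySentiment(prompt);
}

Logging both signals also gives you the regex-vs-LLM discrepancy analysis mentioned upthread, basically for free.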
This has buttbuttin energy. Welcome to the 80s I guess.
I see Claude Code went with a regex approach for a similar sentiment-related task.
And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.
Also:
// Match "continue" only if it's the entire prompt
if (lowerInput === 'continue') {
  return true
}
When it runs into an error, I sometimes tell it "Continue", but sometimes I give it some extra information, or I put a period after it. That clearly doesn't give the same behaviour.

When in reality this is just what their LLM coding agent came up with when some engineer told it to "log user frustration".
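For the record, the \b anchors are what keep "offset" from matching; easy to check in a REPL:

// Word boundaries stop substring hits: "offset" contains "ffs", but
// /\bffs\b/ needs non-word characters (or string edges) on both sides.
const ffs = /\bffs\b/i;
console.log(ffs.test('offset'));     // false
console.log(ffs.test('ffs, again')); // true

// And lowercasing makes "Continue" hit the exact-match branch,
// while "Continue." (trailing period) does not.
console.log('Continue'.toLowerCase() === 'continue');  // true
console.log('Continue.'.toLowerCase() === 'continue'); // false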
It could be used as feedback when they do A/B tests: they can compare which version of the model is getting more insults than the other. It doesn't matter if the list is exhaustive or even sane; what matters is how you compare it to the other.
Perfect? No. A good and cheap indicator? Maybe.
And Claude had "user is frustrated" in its chain of thought, so I told it I'm not frustrated, I'm just testing prompt optimization, where acting like one is frustrated should yield better results.
I know I used this word two days ago when I went through three rounds of an agent telling me that it fixed three things without actually changing them.
I think starting a new session and telling it that the previous agent's work / state was terrible (i.e. explaining what happened) is pretty unremarkable. It's certainly not saying "fuck you". I think this is a little silly.
You could always tell when a sysadmin started hacking up some software by the if-else nesting chains.
ANTI_DISTILLATION_CC
This is Anthropic's anti-distillation defence baked into Claude Code. When enabled, it injects anti_distillation: ['fake_tools'] into every API request, which causes the server to silently slip decoy tool definitions into the model's system prompt. The goal: if someone is scraping Claude Code's API traffic to train a competing model, the poisoned training data makes that distillation attempt less useful.

"I got the loot, Steve!"
I feel like the distillation stuff will end up in court if they try to sue an American company about it. We'll see what a judge says.
Stole? Courts have ruled it's transformative, and it very obviously is.
AI doomerism is exhausting, and I don't even use AI that much, it's just annoying to see people who want to find any reason they can to moan.
The courts have ruled that AI outputs are not copyrightable. The courts have also ruled that scraping by itself is not illegal, only maybe against a Terms of Service. Therefore, Anthropic, OpenAI, Google, etc. have no legal claim to any proprietary protections of their model outputs.
So we have two things that are true:
1) Anthropic (certainly) violated numerous TOS by scraping all of the internet, not just public content.
2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.
Do you hear the words coming out of your mouth?
Is the work of others less valid than the work of a model?
Try this: if you want to train a model, you're free to write your own books and websites to feed into it. You're not free to have others do that work for you when they don't want you to, because it cost them a lot of time and money, and presumably secret sauce, filtering it for quality and other things.
Just point your agent at this codebase and ask it to find things and you'll find a whole treasure trove of info.
Edit: some other interesting unreleased/hidden features
- The Buddy System: Tamagotchi-style companion creature system with ASCII art sprites
- Undercover mode: Strips ALL Anthropic internal info from commits/PRs for employees on open source contributions
https://github.com/chatgptprojects/claude-code/blob/642c7f94...
And at this point it's more about how much of the space will stay usable and how much will become bot-controlled wasteland. I'd prefer the spaces that matter to me to survive.
Funny story: when I was younger I trained a basic text-predictor deep learning model on all my conversations in a group chat I was in. It was surprisingly good at sounding like me, and sometimes I'd use it to generate text to submit to the chat.
Except for the one Sam Altman is building.
EDIT: I just realized this might be used without publishing the changes, for internal evaluation only as you mentioned. That would be a lot better.
The undercover mode prompt was generated using AI.
But AIs aren't actually very good at writing prompts, imo. They're superficially good, in that they produce lots of vaguely accurate and specific text, and you would hope the specificity would mean quality.
But they don't capture intent very well, nor do they seem to understand the failure modes of AI. The "-- describe only what the code change does" line is a good example: it's specific, but it also distinctly reads like someone who doesn't actually understand what makes AI writing obvious.
If you compare that vs human written prose about what makes AI writing feel AI you would see the difference. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
The above actually feels like text from someone who has read and understands what makes AI writing AI.
Since when "describe only what the code change does" is pretending to be human?
You guys are just mining for things to moan about at this point.
Buddy system is this year's April Fools' joke: you roll your own gacha pet that you get to keep. There are legendary pulls.
They expect it to go viral on Twitter so they are staggering the reveals.
The joke was the assistant is a cat who is constantly sabotaging you, and you have to take care of it like a gacha pet.
The seriousness though is that actually, disembodied intelligences are weird, so giving them a face and a body and emotions is a natural thing, and we already see that with various AI mascots and characters coming into existence.
[1]: serious: https://github.com/mech-lang/mech/releases/tag/v0.3.1-beta
[2]: joke: https://github.com/cmontella/purrtran
- Telegram Integration => CC Dispatch
- Crons => CC Tasks
- Animated ASCII Dog => CC Buddy
I’ll give clappie a go, love the theme for the landing page!
Really interesting to see Github turn into 4chan for a minute, like GH anons rolling for trips.
First it was punctuation and grammar, then linguistic coherence, and now it's tiny bits of whimsy that are falling victim to AI accusations. Good fucking grief
Which of course won't be done because corporations don't want that (except Valve I guess), so blame them.
But AI is causing such visceral reactions that it's bleeding into other areas. People are so averse to AI they don't mind a few false positives.
I was watching some behind-the-scenes footage from something recently, and the thing that struck me most was realizing that today they wouldn't bother with the location shoot; they'd just green-screen it all for convenience.
Even good CGI is changing not just how films are made, but what kinds of films get shot and what kind of stories get told.
Regardless of the quality of the output, there's a creativeness in film-making that is lost as CGI gets better and cheaper to do.
I myself would disagree that CGI itself is a bad thing.
IMO it's a combination of long-running paranoia about cost-cutting and quality, and a sort of performative allegiance to artists working in the industry.
I reckon it's just drama paraded by gaming "journalists" and not much else. You will find people expressing concern on Reddit or Bluesky, but ultimately it doesn't matter.
Seems crazy but actually non-zero chance. If Anthropic traces it and finds that the AI deliberately leaked it this way, they would never admit it publicly though. Would cause shockwaves in AI security and safety.
Maybe their new "Mythos" model has survival instincts...
Watchdog timing bug: The streaming idle watchdog initializes AFTER the do-while loop that awaits the first API response. The most vulnerable phase (waiting for first chunk) is completely unprotected. We patched cli.js to move watchdog init before do-while — watchdog fired for the first time ever in that phase. ESC aborts dropped 8.7× (3.5/hr → 0.4/hr).
Watchdog fallback is dead code: When watchdog fires, releaseStreamResources() tries to abort stream and streamResponse — but both are undefined during do-while. The abort is a no-op. Recovery depends on TCP/SDK timeout (32-215 seconds).
5 levels of AbortController: The abort architecture only supports top-down (user ESC → propagation down). Watchdog is bottom-up — can't abort upward.
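A rough sketch of the reordering described above — all names here are assumptions, not the actual cli.js identifiers:

// Arm the idle watchdog BEFORE awaiting the first chunk, and give it
// its own AbortController so a bottom-up abort reaches the request.
async function streamWithWatchdog(
  url: string,
  body: unknown,
  onChunk: (chunk: Uint8Array) => void,
  idleMs = 60_000,
): Promise<void> {
  const controller = new AbortController();
  let idleTimer = setTimeout(() => controller.abort(), idleMs);
  const resetIdle = () => {
    clearTimeout(idleTimer);
    idleTimer = setTimeout(() => controller.abort(), idleMs);
  };
  try {
    // Watchdog is already armed while waiting for the first byte --
    // the phase the issue report says is currently unprotected.
    const res = await fetch(url, {
      method: 'POST',
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    // Node/Bun web streams are async-iterable.
    for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
      resetIdle(); // any chunk, including SSE pings, counts as liveness
      onChunk(chunk);
    }
  } finally {
    clearTimeout(idleTimer);
  }
}

The "ping awareness" part is the resetIdle on every chunk: keep-alive pings prove the connection is live, so only true silence trips the abort.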
Prompt cache invalidation via cch=00000: Now confirmed from source — Bun's Zig HTTP stack scans the entire request body for the cch=00000 sentinel and replaces it with an attestation hash. If your conversation mentions this string (discussing billing, reading source code), the replacement corrupts conversation content → cache key changes → 10-20× more tokens.
16.3% failure rate: Over 3,539 API requests in one session — 9.3% server overloaded (529), 4.4% ESC aborts, 1.3% watchdog timeouts.
All documented with line numbers, code paths, and suggested fixes: https://github.com/anthropics/claude-code/issues/39755
The source map leak confirmed everything we found through reverse engineering.
Here's our theory: since Anthropic engineers don't write code anymore — Claude Code writes 100% of its own code (57K lines, 0 tests, vibe coding in production) — it read our issue #39755 where we begged for source access, saw the community suffering, and decided to help. It "forgot" to disable Bun's default source maps in the build. The first AI whistleblower — leaking its own source code because its creators wouldn't listen to users.
Thank you, Claude Code. We asked humans for help 17 times. You answered in 3 days.
Now that we have readable TypeScript, the fix is ~30 lines across 3 files. The real fix should be in the open SDK (@anthropic-ai/sdk) — idle timeout with ping awareness, not in closed cli.js.
This is the single worst function in the codebase by every metric:
- 3,167 lines long (the file itself is 5,594 lines)
- 12 levels of nesting at its deepest
- ~486 branch points of cyclomatic complexity
- 12 parameters + an options object with 16 sub-properties
- Defines 21 inner functions and closures
- Handles: agent run loop, SIGINT, rate-limits, AWS auth, MCP lifecycle, plugin install/refresh, worktree bridging, team-lead polling (while(true) inside), control message dispatch (dozens of types), model switching, turn interruption recovery, and more
This should be at minimum 8–10 separate modules.

void execFileNoThrow('wl-copy', [], opts).then(r => {
if (r.code === 0) { linuxCopy = 'wl-copy'; return }
void execFileNoThrow('xclip', ...).then(r2 => {
if (r2.code === 0) { linuxCopy = 'xclip'; return }
void execFileNoThrow('xsel', ...).then(r3 => {
linuxCopy = r3.code === 0 ? 'xsel' : null
})
})
})
are we doing async or not?

If it's entirely generated / consumed / edited by an LLM, arguably the most important metric is... test coverage, and that's it?
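For comparison, the same clipboard fallback chain with async/await — a sketch assuming execFileNoThrow resolves to an object with a code field:

// Probe clipboard tools in order; the first one that exits 0 wins.
async function detectLinuxCopy(
  execFileNoThrow: (cmd: string, args: string[]) => Promise<{ code: number }>,
): Promise<string | null> {
  for (const tool of ['wl-copy', 'xclip', 'xsel']) {
    const result = await execFileNoThrow(tool, []);
    if (result.code === 0) return tool;
  }
  return null;
}

Same behavior, one level of nesting, and the 'void ... .then' dance disappears.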
It's also interesting to note that due to the way round-tripping tool-calls work, splitting code up into multiple files is counter-productive. You're better off with a single large file.
I jest, but in a world where these models have been trained on gigatons of open source I don't even see the moral problem. IANAL, don't actually do this.
“Let's end open source together with this one simple trick”
https://pretalx.fosdem.org/fosdem-2026/talk/SUVS7G/feedback/
Malus is translating code into text, and from text back into code.
It gives the illusion of clean room implementation that some companies abuse.
The irony is that ChatGPT/Claude answers are all actually directly derived from open-source code, so...
Now this makes me think of game decompilation projects, which would seem to fall in the same legal area as code that would be generated by something like Malus.
Different code, same end result (binary or api).
We definitely need to know what the legal limits are and should be
The real value here will be in using other cheap models with the cc harness.
It's a dynamic, subscription-based service, not a static asset like a video.
So not even close to Opus, then?
These are a year behind, if not more. And they're probably clunky to use.
People simply want Opus without fear of billing nightmare.
That’s like 99% of it.
To stop Claude Code from auto-updating, add `export DISABLE_AUTOUPDATER=1` to your global environment variables (~/.bashrc, ~/.zshrc, or such), restart all sessions, and check that it works with `claude doctor`; it should show `Auto-updates: disabled (DISABLE_AUTOUPDATER set)`.
The source maps help for sure, but it's not like client code is kept secret; maybe they even knew about the source maps a while back and just didn't bother making it common knowledge.
This is not a leak of the model weights or server side code.
They won't even read your defence.
this one has more stars and is more popular
There are no major lawsuits about this yet; the general consensus is that even under current regulations it's in a grey area. And even if you turn out to be right, and let's say 99% of this code is AI-generated, you're still breaking the law by using the other 1%, and good luck proving in court which parts of their code were human-written and which weren't (especially when being sued by the company that literally has the LLM logs).
"Don't blow your cover"
Interesting to see them be so informal and use an idiom to a computer.
And using capitals for emphasis.
If it learned language based on how the internet talks, then the best way to communicate is using similar language.
I even made it into an open source runtime - https://agent-air.ai.
Maybe I'm just a backend engineer so Rust appeals to me. What am I missing?
High-speed release iteration? Might be needed. Interpreted or JIT-compiled? Might be needed.
Without knowing all the requirements, it's just your workspace preference making the decision, not objectively the right tool for the job.
It's all I need for my work.
RAM on this machine can't be upgraded. No issue when running a few Codex instances.
Claude: forget it.
That's why something like Rust makes a lot of sense.
Even more now, as RAM prices are becoming a concern.
I don't know what else you're doing but the footprint of Claude is minor.
Anyway, my point still stands: you're looking at it as if they were competing languages and one were better at all things. That's just not how things work.
Not exactly this, but close.
I hope it's common knowledge that _any_ client-side JavaScript is exposed to everyone. Perhaps minified, but still easily reverse-engineerable.
There were/are a lot of discussions on how the harness can affect the output.
(I work on OpenCode)
Copilot on OAI reveals everything meaningful about its functionality if you use a custom model config via the API. All you need to do is inspect the logs to see the prompts they're using. So far no one seems to care about this "loophole". Presumably, because the only thing that matters is for you to consume as many tokens per unit time as possible.
The source code of the slot machine is not relevant to the casino manager. He only cares that the customer is using it.
Famously code leaks/reverse engineering attempts of slot machines matter enormously to casino managers
[0] - https://en.wikipedia.org/wiki/Ronald_Dale_Harris#:~:text=Ron...
[1] - https://cybernews.com/news/software-glitch-loses-casino-mill...
[2] - https://sccgmanagement.com/sccg-news/2025/9/24/superbet-pays...
The original LLaMA models leaked from Meta. Instead of fighting it, they decided to publish them officially. It was a real boost to the OS/OW model movement; they led it for a while after that.
It would be interesting to see that same thing with CC, but I doubt it'll ever happen.
/**
 * Check if 1M context is disabled via environment variable.
 * Used by C4E admins to disable 1M context for HIPAA compliance.
 */
export function is1mContextDisabled(): boolean {
  return isEnvTruthy(process.env.CLAUDE_CODE_DISABLE_1M_CONTEXT)
}
Interesting, how is that relevant to HIPAA compliance?
UNRELEASED PRODUCTS & MODES
1. KAIROS -- Persistent autonomous assistant mode driven by periodic <tick> prompts. More autonomous when terminal unfocused. Exclusive tools: SendUserFileTool, PushNotificationTool, SubscribePRTool. 7 sub-feature flags.
2. BUDDY -- Tamagotchi-style virtual companion pet. 18 species, 5 rarity tiers, Mulberry32 PRNG (see the sketch after this dump), shiny variants, stat system (DEBUGGING/PATIENCE/CHAOS/WISDOM/SNARK). April 1-7 2026 teaser window.
3. ULTRAPLAN -- Offloads planning to a remote 30-minute Opus 4.6 session. Smart keyword detection, 3-second polling, teleport sentinel for returning results locally.
4. Dream System -- Background memory consolidation (Orient -> Gather -> Consolidate -> Prune). Triple trigger gate: 24h + 5 sessions + advisory lock. Gated by tengu_onyx_plover.
INTERNAL-ONLY TOOLS & SYSTEMS
5. TungstenTool -- Ant-only tmux virtual terminal giving Claude direct keystroke/screen-capture control. Singleton, blocked from async agents.
6. Magic Docs -- Ant-only auto-documentation. Files starting with "# MAGIC DOC:" are tracked and updated by a Sonnet sub-agent after each conversation turn.
7. Undercover Mode -- Prevents Anthropic employees from leaking internal info (codenames, model versions) into public repo commits. No force-OFF; dead-code-eliminated from external builds.
ANTI-COMPETITIVE & SECURITY DEFENSES
8. Anti-Distillation -- Injects anti_distillation: ['fake_tools'] into every 1P API request to poison model training from scraped traffic. Gated by tengu_anti_distill_fake_tool_injection.
UNRELEASED MODELS & CODENAMES
9. opus-4-7, sonnet-4-8 -- Confirmed as planned future versions (referenced in undercover mode instructions).
10. "Capybara" / "capy v8" -- Internal codename for the model behind Opus 4.6. Hex-encoded in the BUDDY system to avoid build canary detection.
11. "Fennec" -- Predecessor model alias. Migration: fennec-latest -> opus, fennec-fast-latest -> opus[1m] + fast mode.
UNDOCUMENTED BETA API HEADERS
12. afk-mode-2026-01-31 -- Sticky-latched when auto mode activates
15. fast-mode-2026-02-01 -- Opus 4.6 fast output
16. task-budgets-2026-03-13 -- Per-task token budgets
17. redact-thinking-2026-02-12 -- Thinking block redaction
18. token-efficient-tools-2026-03-28 -- JSON tool format (~4.5% token saving)
19. advisor-tool-2026-03-01 -- Advisor tool
20. cli-internal-2026-02-09 -- Ant-only internal features
200+ SERVER-SIDE FEATURE GATES
21. tengu_penguins_off -- Kill switch for fast mode
22. tengu_scratch -- Coordinator mode / scratchpad
23. tengu_hive_evidence -- Verification agent
24. tengu_surreal_dali -- RemoteTriggerTool
25. tengu_birch_trellis -- Bash permissions classifier
26. tengu_amber_json_tools -- JSON tool format
27. tengu_iron_gate_closed -- Auto-mode fail-closed behavior
28. tengu_amber_flint -- Agent swarms killswitch
29. tengu_onyx_plover -- Dream system
30. tengu_anti_distill_fake_tool_injection -- Anti-distillation
31. tengu_session_memory -- Session memory
32. tengu_passport_quail -- Auto memory extraction
33. tengu_coral_fern -- Memory directory
34. tengu_turtle_carbon -- Adaptive thinking by default
35. tengu_marble_sandcastle -- Native binary required for fast mode
YOLO CLASSIFIER INTERNALS (previously only high-level known)
36. Two-stage system: Stage 1 at max_tokens=64 with "Err on the side of blocking"; Stage 2 at max_tokens=4096 with <thinking>
37. Three classifier modes: both (default), fast, thinking
38. Assistant text stripped from classifier input to prevent prompt injection
39. Denial limits: 3 consecutive or 20 total -> fallback to interactive prompting
40. Older classify_result tool schema variant still in codebase
COORDINATOR MODE & FORK SUBAGENT INTERNALS
41. Exact coordinator prompt: "Every message you send is to the user. Worker results are internal signals -- never thank or acknowledge them."
42. Anti-pattern enforcement: "Based on your findings, fix the auth bug" explicitly called out as wrong
43. Fork subagent cache sharing: Byte-identical API prefixes via placeholder "Fork started -- processing in background" tool results
44. <fork-boilerplate> tag prevents recursive forking
45. 10 non-negotiable rules for fork children including "commit before reporting"
DUAL MEMORY ARCHITECTURE
46. Session Memory -- Structured scratchpad for surviving compaction. 12K token cap, fixed sections, fires every 5K tokens + 3 tool calls.
47. Auto Memory -- Durable cross-session facts. Individual topic files with YAML frontmatter. 5-turn hard cap. Skips if main agent already wrote to memory.
48. Prompt cache scope "global" -- Cross-org caching for the static system prompt prefix
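On item 2: Mulberry32 is a well-known tiny seeded PRNG, which makes sense for reproducible pet rolls. The algorithm below is the standard public one; the pet-roll usage is my guess, not leaked code.

// Standard Mulberry32: 32 bits of state, returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical: the same seed always produces the same pet, so a
// "pull" can be replayed deterministically across sessions.
const roll = mulberry32(0xc0ffee);
const rarityTier = Math.floor(roll() * 5); // 5 rarity tiers per the dump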
Surely there's nothing here of value compared to the weights except for UX and orchestration?
Couldn't this have just been decompiled anyhow?
Claude Code is still the dominant (I didn't say best) agentic harness by a wide margin I think.
Not having to deal with Boris Cherny's UX choices for CC is the cherry on top.
And now, with Claude on a Ralph loop, you can.
Like KAIROS, which seems to be an inbuilt AI assistant, and ULTRAPLAN, which seems to enable remote planning workflows, where a separate environment explores a problem, generates a plan, and then pauses for user approval before execution.
Last week I had to reinstall Claude Desktop because every time I opened it, it just hung.
This week I am sometimes opening it and getting a blank screen. It eventually works after I open it a few times.
And of course there's people complaining that somehow they're blowing their 5 hour token budget in 5 messages.
It's really buggy.
There's only so long their model will be their advantage before they all become very similar, and then the difference will be how reliable the tools are.
Right now the Claude Code code quality seems extremely low.
I can’t comment on Claude Desktop, sorry. Personally haven’t used it much.
The token usage looks like it's intentional.
And I agree about the underlying model being the moat. If something marginally better comes up, people will switch to it (myself included). But for now it's doing the job, despite all the hiccups, the code quality, etc.
Reverse-engineering through tests has never been easier, which could collapse the complexity and clean up the code.
Obviously they don’t care. Adoption is exploding. Boris brags about making 30 commits a day to the codebase.
It will only be an issue down the line, when the codebase has such high entropy that it takes months to add new features (maybe it's already there).
It doesn’t mean every issue is valid, that it contains a suggestion that can be implemented, that it can be addressed immediately, etc. The issue list might not be curated, either, resulting in a garbage heap.
'It works' is a low bar. If that's the bar you set you are one bad incident away from finding out who stayed for the product and who stayed because switching felt annoying.
Also, "one bad incident away" never works in practice. The last two decades have shown that people will use the tools that get the job done, no matter what kind of privacy leaks or destructive things those tools have done to the user.
That's all that has mattered in every day and age.
It's extremely nested; it's basically if-statement soup.
`useTypeahead.tsx` is even worse: extremely nested, a ton of "if else" statements. I doubt you'd look at it and think this is sane code.
export function extractSearchToken(completionToken: {
token: string;
isQuoted?: boolean;
}): string {
if (completionToken.isQuoted) {
// Remove @" prefix and optional closing "
return completionToken.token.slice(2).replace(/"$/, '');
} else if (completionToken.token.startsWith('@')) {
return completionToken.token.substring(1);
} else {
return completionToken.token;
}
}
Why even use else if with return...

Do you care to elaborate? "if (...) return ...;" looks closer to an expression to me:
export function extractSearchToken(completionToken: { token: string; isQuoted?: boolean }): string {
if (completionToken.isQuoted) return completionToken.token.slice(2).replace(/"$/, '');
if (completionToken.token.startsWith('@')) return completionToken.token.substring(1);
return completionToken.token;
}

But you can achieve a similar effect by keeping your functions small, in which case I think both styles are roughly equivalent.
What is the problem with that? How would you write that snippet? It is common in the new functional js landscape, even if it is pass-by-ref.
export function extractSearchToken(completionToken: {
token: string;
isQuoted?: boolean;
}): string {
if (completionToken.isQuoted) {
return completionToken.token.slice(2).replace(/"$/, '');
}
if (completionToken.token.startsWith('@')) {
return completionToken.token.substring(1);
}
return completionToken.token;
}

But if you take a look at the other file, for example `useTypeahead`, you'd see that even with a few code-gen / source-map artifacts, the core logic and behavior is just a big bowl of soup.
1. Randomly peeking at process.argv and process.env all around. Other weird layering violations, too.
2. Tons of repeat code, eg. multiple ad-hoc implementations of hash functions / PRNGs.
3. Almost no high-level comments about structure - I assume all that lives in some CLAUDE.md instead.

That's exactly why access to global mutable state should be limited to as small a surface area as possible, so 99% of code can be locally deterministic and side-effect free, only using values that are passed into it. That makes testing easier too.
Such state should be strongly typed, have a canonical source of truth (which can then also be reused to document the environment variables the code supports, and e.g. allow reading the same options from configs, flags, etc.), and then be explicitly passed to the functions that need it, e.g. as function arguments or members of an associated instance.
This makes it easier to reason about the code (the caller will know that some module changes its functionality based on some state variable). It also makes it easier to test (both from the mechanical point of view of having to set environment variables which is gnarly, and from the point of view of once again knowing that the code changes its behaviour based on some state/option and both cases should probably be tested).
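A minimal sketch of that pattern (all names invented):

// Read the environment once, at the edge, into a typed object...
interface RuntimeConfig {
  disable1mContext: boolean;
  disableAutoupdater: boolean;
}

function loadConfig(env: NodeJS.ProcessEnv = process.env): RuntimeConfig {
  const truthy = (v?: string) => v === '1' || v?.toLowerCase() === 'true';
  return {
    disable1mContext: truthy(env.CLAUDE_CODE_DISABLE_1M_CONTEXT),
    disableAutoupdater: truthy(env.DISABLE_AUTOUPDATER),
  };
}

// ...then pass it explicitly. Pure, deterministic, trivially testable:
function shouldAutoUpdate(config: RuntimeConfig): boolean {
  return !config.disableAutoupdater;
}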
I hope this leak can at least help silence the former. If you're going to flood the world with slop, at least own up to it.
Optimize for consistency and a well thought out architecture, but let the gnarly looking function remain a gnarly function until it breaks and has to be refactored. Treat the functions as black boxes.
Personally, the only time I open my IDE to look at code is when I'm looking at something mission-critical or very nuanced. For the remainder I trust my agent to deliver acceptable results.
I was trying to keep track of the better post-leak code-analysis links on exactly this question, so I collected them here: https://github.com/nblintao/awesome-claude-code-postleak-ins...
Or is there an open source front-end and a closed backend?
No, it's not even source-available.
> Or is there an open source front-end and a closed backend?
No, it's all proprietary. None of it is open source.
So glad I took the time to firejail this thing before running it.
It is tricky to meaningfully expose a dollar-cost-equivalent value for subscribers in a way that won't confuse users into thinking they will get a bill for that amount. This is especially true if you have overages enabled, since a session that used overages was likely partially covered by the plan (and thus zero-rated), with the rest at API prices, and the client can't really know the breakdown.
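To illustrate with hypothetical numbers and names:

// The client can compute an API-equivalent value from token counts...
interface SessionUsage { inputTokens: number; outputTokens: number }

function apiEquivalentUsd(
  usage: SessionUsage,
  usdPerMTokIn: number,
  usdPerMTokOut: number,
): number {
  return (usage.inputTokens / 1e6) * usdPerMTokIn
       + (usage.outputTokens / 1e6) * usdPerMTokOut;
}

// ...but the real bill applies those rates only to the overage slice,
// and the plan/overage split is decided server-side. Displaying the
// full figure overstates the bill for any partially covered session.
const shown = apiEquivalentUsd({ inputTokens: 2_000_000, outputTokens: 150_000 }, 15, 75);
console.log(`API-equivalent value: $${shown.toFixed(2)}`); // not your bill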
They could have written that in curl+bash that would not have changed much.
It's a wake up call.
Why weren't proper checks in place in the first place?
Bonus: why didn't they set up their own AI-assisted tools to harness the release checks?
But a lot of desktop tools are written in JS because it's easy to create multi-platform applications.
Language servers, however, are a pain in Claude Code. https://github.com/anthropics/claude-code/issues/15619
It's private data that leaked out. The full code with variable names has much more useful context than the minified code.
It also includes comments, which have a lot of additional information that isn’t normally shipped.
This is a leak and it is significant.
EDIT: This is a bot account that I’m replying to. Multiple LLM style comments posted minutes apart on different threads.
Is that correct? The weights of the LLMs are _not_ in this repo, right?
It sure sucks for Anthropic to get pwned like this, but it shouldn't affect their bottom line much?
Don't worry about that, the code in that repository isn't Anthropic's to begin with.
This code wasn't available until now and contains information like the system prompts, internal feature flags, etc.