https://chatgpt.com/share/68e82db9-7a28-8007-9a99-bc6f0010d1...
if random.random() < 0.01:
    logging.warning("This feels wrong. Aborting just in case.")
    return None
try:
    result = a / b
    if math.isnan(result):
        raise ArithmeticError("Result is NaN. I knew this would happen.")
It's like a fine wine pairing for "The Moon is a Harsh Mistress."
If you completely excise anything too distasteful for a current-day blockbuster but still want a film about a space-mining colony uprising, you might as well just adapt the game Red Faction instead: have the brave heroes blast away with abandon at corpo guards, mad genetic experimenters and mercenaries, and the media coverage can talk about how it's a genius deconstruction of Elon Musk's Martian dream or whatever.
The only reason their libertarian revolution succeeds is because they have a centralised computer that secretly does everything for them.
Same with pretty much every sci-fi movie and book from my youth. Any movies that weren't rendered ridiculous by the invention of the cellphone were done in by the hairstyles or fashion.
This was my favorite line after asking it to review my resume and roast me:
> Structure & Flow: “Like Kubernetes YAML — powerful, but not human-readable.”
Some other good ones:
> Content & Tone: “You’re a CTO — stop talking like a sysadmin with a thesaurus.”
> Overall Impression: “This resume is a technical symphony… that goes on for too many movements.”
I've got some resume work to do haha
But then, I veered that same conversation into asking for GTM (go to market) advice, and it was actually really good. It actually felt tailored to me (unsurprisingly) and a lot more useful.
As always, I don't know whether this is a very light form of "AI psychosis", haha, but still, I'm super grateful for the advice. Cheers
<press enter>
damn these AIs are good!
<begins shopping for new username>
I can't say I'm not impressed. That's very funny
I love this and hate this at the same time.
} catch (Exception e) {
    if (!((_ok) ? true : (Math.random() > 0.1))) {
        return res;
    }
    final StringBuilder logError = (new StringBuilder("Server seen down: ")).append(_addr);
    /* edited for brevity: log the error */
https://github.com/mongodb/mongo-java-driver/blob/1d2e6faa80...
System.DmlException: Insert failed. First exception on row 0; first error: UNKNOWN_EXCEPTION, Something is very wrong: []
The AIs in general feel really focused on making the user happy. Your example is one case; another is how they love adding emojis to stdout and over-commenting simple code.
With RLVR (reinforcement learning with verifiable rewards), the LLM is trained to pursue rewards that can be checked automatically. On coding tasks, the reward is usually something like the percentage of passing tests.
Let's say you have some code that iterates over a set of files and does processing on them. The way a normal dev would write it, an exception in that code would crash the entire program. If you swallow and log the exception, however, you can continue processing the remaining files. This is an easy way to get "number of files successfully processed" up, without actually making your code any better.
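A minimal Python sketch of the two styles just described, assuming a hypothetical `process` step that fails on one input. The fail-fast loop aborts at the first error; the swallow-and-log loop keeps going, so more files get "processed" even though the processing itself is no better.

```python
import logging

def process(name: str) -> str:
    # Hypothetical processing step: fails on one "corrupt" input.
    if name == "corrupt.dat":
        raise ValueError(f"cannot parse {name}")
    return name.upper()

def process_all_fail_fast(names):
    # Any exception propagates and aborts the whole run.
    return [process(n) for n in names]

def process_all_swallowing(names):
    # Exceptions are logged and skipped, so the loop always finishes.
    done = []
    for n in names:
        try:
            done.append(process(n))
        except Exception:
            logging.warning("skipping %s", n, exc_info=True)
    return done

files = ["a.txt", "corrupt.dat", "b.txt", "c.txt"]
try:
    process_all_fail_fast(files)
except ValueError as e:
    print("aborted:", e)
print(len(process_all_swallowing(files)))
```

Neither version fixes the corrupt file; the second one just stops it from ending the run.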
Well, it depends a bit on what your goal is.
Sometimes the user wants to, e.g., back up as many files as possible from a failing hard drive, and doesn't want the whole process to fail just because one item is broken.
However, LLM-generated code will often, at least in my experience, avoid raising any errors at all, in any case. This is undesirable, because some errors should result in a complete failure, for example errors which are not transient or environment-related but a bug. And in any case, an LLM will prefer turning these single-file errors into warnings, though the way I see it, they are errors. They just don't need to abort the process, but they are errors nonetheless.
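One way to get that behaviour, sketched in Python under the assumption that per-file problems surface as `OSError` (missing file, permission denied, unreadable sector): record those as errors rather than warnings and keep going, while anything else is treated as a bug and allowed to crash. `backup_file`, `backup_all` and `BackupReport` are hypothetical names.

```python
from __future__ import annotations

import logging
import shutil
from dataclasses import dataclass, field
from pathlib import Path

log = logging.getLogger(__name__)

@dataclass
class BackupReport:
    copied: list[Path] = field(default_factory=list)
    failed: dict[Path, OSError] = field(default_factory=dict)

def backup_file(src: Path, dest_dir: Path) -> Path:
    # Hypothetical single-file step; I/O failures raise OSError subclasses.
    return Path(shutil.copy2(src, dest_dir / src.name))

def backup_all(sources: list[Path], dest_dir: Path) -> BackupReport:
    report = BackupReport()
    for src in sources:
        try:
            report.copied.append(backup_file(src, dest_dir))
        except OSError as exc:
            # Environment problem with this one file: logged at ERROR level,
            # but not a reason to abandon the remaining files.
            log.error("failed to back up %s: %s", src, exc)
            report.failed[src] = exc
        # Anything that is not an OSError (TypeError, AttributeError, ...) is
        # most likely a bug, so it propagates and crashes the run.
    return report
```

The caller gets both lists back and can decide whether a non-empty `failed` should turn into a non-zero exit code.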
> And in any case, an LLM will prefer turning these single-file errors into warnings, though the way I see it, they are errors.
Well, in general they are something that the caller should have opportunity to deal with.
In some cases, aborting back to the caller at the first problem is the best course of action. In some other cases, going forward and taking note of the problems is best.
In some systems, you might even want to tell the caller about failures (and successes) as they occur, instead of waiting until the end.
It's all very similar to the different options people have available when their boss sends them on an errand and something goes wrong. A good underling uses their best judgement to pick the right way to cope with problems; but computer programs don't have that, so we need to be explicit.
See https://en.wikipedia.org/wiki/Mission-type_tactics for a related concept in the military.
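The three ways of reporting problems to the caller described above can be sketched in Python, assuming a hypothetical `step` function that raises `ValueError` for bad items: abort at the first problem, collect problems and hand them back at the end, or stream each outcome to the caller as it happens via a generator.

```python
from __future__ import annotations

from typing import Iterator

def step(x: int) -> int:
    # Hypothetical unit of work: rejects negative inputs.
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x * 2

def fail_fast(items: list[int]) -> list[int]:
    # 1. Abort back to the caller at the first problem.
    return [step(x) for x in items]

def collect(items: list[int]) -> tuple[list[int], list[ValueError]]:
    # 2. Keep going, note the problems, return both at the end.
    results, errors = [], []
    for x in items:
        try:
            results.append(step(x))
        except ValueError as exc:
            errors.append(exc)
    return results, errors

def stream(items: list[int]) -> Iterator[tuple[int, int | ValueError]]:
    # 3. Report each success or failure as soon as it occurs.
    for x in items:
        try:
            yield x, step(x)
        except ValueError as exc:
            yield x, exc
```

Which one is right depends on the errand, which is exactly why the choice has to be made explicitly rather than defaulted to "log and continue".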
// Return the result
return result;
I find this quite frustrating when reading/reviewing code generated by AI, but have started to appreciate that it does make subsequent changes by LLMs work better.
It makes me wonder if we'll end up in a place where IDEs hide comments by default (similar to how imports are often collapsed by default/automatically managed), or introduce some way of distinguishing between a more valuable human written comment and LLM boilerplate comments.
It sounds fine and flows nicely, but it doesn't quite make sense. Too much training over-fits an LLM; that's not what we're describing. Bad training might traumatize a model, but bad how? A creative response would suggest an answer to that question—perhaps the model has been made paranoid, scarred by repeat exposure to the subtlest and most severe bugs ever discovered—but the LLM isn't being creative. Its response has that spongy, plastic LLM texture that comes from the model rephrasing its prompt to provide a sycophantic preamble for the thing that was actually being asked for. It uses new words for the same old idea, and a bit of the precision is lost during the translation.
There are plenty of "over-x" phrases in English associated with trauma or harm. Do a web search in quotes for "traumatic over{extension/exertion/stimulation}" (off the top of my head) and you'll get direct hits. And this isn't a Markov chain—it doesn't have to pull n-grams directly from its training material. That it could glue trauma and training into "traumatic over-training" is deeply unsurprising to me.
> I couldn't in a million years put it into writing as succinctly and as precisely as the LLM.
If that's the case, then (with respect) that may be down to your skills as a writer. The LLM puts it decently enough, but it's not very expressive and it doesn't add anything.
> Connecting RL, poor LLMs, extreme fear, and welfare to excess training and severe lasting emotional pain is pretty darn impressive
Is it? Really, we're just analogizing it to an abused pet. You over-train your dog, so it gets traumatized. The LLM connects the ideas and then synthesizes a lukewarm sentence to capture that connection at the cost of losing a degree of precision, because LLMs aren't animals. Models are good at those vector-embedding-style conceptual connections—I won't begrudge them that. Expressive use of language and fine-grained reasoning, though? Not so much.
"King" and "rex" (Latin for king) map to different tokens but will map to very similar vectors.
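To make "very similar vectors" concrete: similarity between embeddings is commonly measured with cosine similarity. The 3-dimensional vectors below are made-up toy values, not real embeddings from any model; they only illustrate the calculation.

```python
import math

def cosine_similarity(u: list, v: list) -> float:
    # Dot product divided by the product of the two vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical hand-picked toy vectors, NOT taken from a real model.
king = [0.90, 0.80, 0.10]
rex = [0.85, 0.75, 0.20]
banana = [0.10, 0.05, 0.95]

print(round(cosine_similarity(king, rex), 3))
print(round(cosine_similarity(king, banana), 3))
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.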
And "毛片免费观看" (Free porn movies), "天天中彩票能" (Win the lottery every day), "热这里只有精品" (Hot, only fine products here) etc[1].
Some LLMs can output Nerd Font glyphs and others can't.
If I recall correctly, Grok Code Fast can, but Codex and Sonnet can't.
Because, and this is a hot take, LLMs have emergent intelligence
My uninformed suspicion is that this kind of defensive programming somehow improves performance during RLVR. Perhaps the model sometimes comes up with programs that are buggy enough to emit exceptions, but close enough to correct that they produce the right answer after swallowing the exceptions. So the model learns that swallowing exceptions sometimes improves its reward. It also learns that swallowing exceptions rarely reduces its reward, because if the model does come up with fully correct code, that code usually won’t raise exceptions in the first place (at least not in the test cases it’s being judged on), so adding exception swallowing won’t fail the tests even if it’s theoretically incorrect.
Again, this is pure speculation. Even if I’m right, I’m sure another part of the reason is just that the training set contains a lot of code written by human beginners, who also like to ignore errors.
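A toy Python illustration of the mechanism speculated about above, using a hypothetical test suite scored by pass rate (a stand-in for a verifiable reward, not any lab's actual setup). The buggy `mean` crashes on empty input; a catch-all that returns `0.0` happens to satisfy the one test expecting `0.0`, so the score goes up, while wrapping the correct version changes nothing.

```python
from __future__ import annotations

from typing import Callable

def mean_buggy(xs: list[float]) -> float:
    return sum(xs) / len(xs)  # raises ZeroDivisionError on []

def mean_correct(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0

def swallow(fn: Callable[[list[float]], float]) -> Callable[[list[float]], float]:
    # Catch-all wrapper: any exception becomes a default value.
    def wrapped(xs: list[float]) -> float:
        try:
            return fn(xs)
        except Exception:
            return 0.0
    return wrapped

TESTS = [([1.0, 2.0, 3.0], 2.0), ([4.0], 4.0), ([], 0.0)]

def pass_rate(fn: Callable[[list[float]], float]) -> float:
    # "Reward" = fraction of tests passed; a crash counts as a failure.
    passed = 0
    for xs, expected in TESTS:
        try:
            passed += fn(xs) == expected
        except Exception:
            pass
    return passed / len(TESTS)

for name, fn in [("buggy", mean_buggy), ("buggy+swallow", swallow(mean_buggy)),
                 ("correct", mean_correct), ("correct+swallow", swallow(mean_correct))]:
    print(f"{name:16} {pass_rate(fn):.2f}")
```

Under this kind of scoring, adding the wrapper is never penalised and is sometimes rewarded, which is the asymmetry the comment describes.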
These aren't operating on reward functions because there's no internal model to reward. It's word prediction; there's no intelligence.
Subsequently, ChatGPT/Claude/Gemini/etc. go through additional training with supervised fine-tuning and reinforcement learning, whether driven by human feedback (RLHF) or by automatically checked rewards (RLVR, reinforcement learning with verifiable rewards).
Whether that fine-tuning and reward-driven training give them real "intelligence" is open to interpretation, but it's not 100% plagiarism.
In this, at least, AI may very well have copied our worst habits of “learning to the test.”
1. the code is actually wrong (and is wrong regardless of the absurd exception handling situation)
2. some of the exception handling makes no sense regardless, or is incoherent
3. a less absurd version of this actually happens (edit: commonly in actual irl scenarios) if you put emphasis on exception handling in the prompt
The RL objectives probably heavily penalize exceptions, but don't reward much for code readability or simplicity.
It's so annoying.
Furthermore, the code is happy to return NaN from the pre-checks, but replaces a NaN result from the division by None. That doesn't make any sense from an API design standpoint.
In Go, all SOTA agents are obsessed with being ludicrously defensive against concurrency bugs. Probably because, in addition to what-if-driven development, there are a lot of blog posts warning about concurrency bugs.
In particular, I can't think of any non-pathological situation where a python developer should import logging and update logging.basicConfig within an inner function.
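For comparison, the usual Python arrangement, sketched here with hypothetical function names: a module-level logger obtained once via `logging.getLogger(__name__)`, and `logging.basicConfig` called a single time at the program's entry point rather than inside library or inner functions.

```python
import logging

# Module-level logger; configuration is left to whoever runs the program.
log = logging.getLogger(__name__)

def divide(a: float, b: float) -> float:
    # Library code only emits records; it never touches global logging setup.
    log.debug("dividing %r by %r", a, b)
    return a / b

def main() -> None:
    # Configure logging exactly once, at the application entry point.
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s %(name)s: %(message)s")
    log.info("result: %s", divide(1.0, 4.0))

if __name__ == "__main__":
    main()
```

Calling `basicConfig` deep inside a helper is also mostly a no-op after the first call, since it does nothing if the root logger already has handlers.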
I think the Vexing Exceptions post is on the same tier as other seminal works in computer science; definitely worth a quick read or re-read once in a while.
BUT, to play devil's advocate a little: Most human coders should be writing a lot more try/catch blocks than they actually do. It's very common that you don't actually want an error in one section (however unlikely) to interrupt the overall operation. (and sometimes you do, it just depends)
One thought is that often I do want error handling, but also often I either know the error just won't happen or, if it does, something is very wrong and we should just crash fast to make the bug easy to fix.
But I am not really sure I would expect someone to know the difference in all cases just by looking at some code. This is often about holistically knowing how the app works.
A second thought: remember the experiment where an LLM was fine-tuned on bad code (exploitable security problems, for example) and the LLM became broadly misaligned on all sorts of unrelated (non-coding) tasks/contexts? It's as if "good or bad" alignment is encoded as a pretty general concept.
Error handling is "good-aligned", which I think is why, even with lots of instructions to fail fast, it's still hard to get the LLM to allow crashing by leaving out error checking. It's going to be even harder if you do want it to do some error checking and the code it's looking at already has some error checking.
Less sarcastically but equally as true: they've learned from the tests you stole from people on the internet as well as the code you stole from people on the internet.
Most developers write tests for the wrong things, and many developers write tests that contain some bullshit edge case that they've been told to test (automatically to meet some coverage metric, or by a "senior" developer who got Dilbert principled away from the coalface and doesn't understand diminishing returns).
But then the end goal is to turn out code about as good as the average developer so they can be replaced more cheaply, so your LLM is meeting its objectives. Congrats.
One reason for this is that you typically lack a type system that allows 'making illegal states unrepresentable' to some extent, or possibly lack a team that can leverage the available type system to that effect due to organisational pressure, insufficient experience or whatever.
I really dislike their underuse of exceptions. I'm working on ETL/ELT scripts. Just let stuff blow up on me if something is wrong. Like, that config entry "foo" is required. There's no point in using config.get("foo") with a None check which then prints a message and returns False or whatever. Just use config["foo"] and I'll know what's wrong from the stack trace and exception text.
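The difference in Python terms, with a hypothetical `config` dict: indexing a missing key raises `KeyError('foo')`, which shows up in the stack trace, whereas `.get()` plus a `None` check turns the problem into a printed message and a return value the caller may never inspect.

```python
from __future__ import annotations

def load_defensive(config: dict) -> str | bool:
    # The pattern being complained about: hide the failure behind a return value.
    foo = config.get("foo")
    if foo is None:
        print("config entry 'foo' is missing")
        return False
    return foo

def load_fail_fast(config: dict) -> str:
    # Fail fast: a missing required key raises KeyError('foo') with a traceback.
    return config["foo"]
```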
If you are actually doing safety critical software, e.g. aerospace, medicine or automotive, then this is a good precaution, although you will not be writing in Python.
I have to constantly remind Claude that we want to fail fast.
Just raise god damn it
I know it's Karpathy, which is why the entire prompt is all the more important to see.
[1] Probably with some "make sure you handle ALL cases in existence", or emphasis along those lines.
LLMs often write tutorial-ish code without much care for how it integrates with the rest of the codebase.
Swallowing exceptions is one such example.
# Step 3: Preemptively check for catastrophic magnitude differences
if abs(a) > sys.float_info.max / 2:
    logging.warning("Value of a might cause overflow. Returning infinity just to be sure")
    return math.copysign(float('inf'), a)
if abs(b) < sys.float_info.epsilon:
    logging.warning("Value of b dangerously close to zero. Returning NaN defensively.")
    return math.nan
Does the above code make any sense? I've not worked with this sort of stuff before, but it seems entirely unreasonable to me to check them individually. E.g. if 1 < b < a, then it seems insane to me to return float('inf') for a large but finite a.
I even had this Cursor rule when I was using Claude:
"- Do not use statements to catch all possible errors to mask an error - let it crash, to see what happened and for easier debugging."
And even with this rule, Claude would not always adhere. Never had this issue with GPT-5.
Checked Exceptions are a good concept which just needed more syntactic-sugar. (Like easily specifying that one kind of exception should be wrapped into another.) The badness is not in the logic but in the ecology, the ways that junior/lazy developers are incentivized to take horrible shortcuts.
Checked exceptions are fundamentally the same as managing the types of return-values... except the language doesn't permit the same horrible-shortcuts for people to abuse.
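Python has no checked exceptions, but the "wrap one kind of exception into another" sugar mentioned above does exist there: `raise NewError(...) from exc` translates a low-level exception into a domain-level one while keeping the original attached as `__cause__`. `ConfigError` and `read_port` are hypothetical names.

```python
from __future__ import annotations

class ConfigError(Exception):
    """Domain-level error that callers of read_port are expected to handle."""

def read_port(settings: dict[str, str]) -> int:
    try:
        return int(settings["port"])
    except (KeyError, ValueError) as exc:
        # Wrap the low-level exception; the original stays attached as __cause__.
        raise ConfigError("invalid or missing 'port' setting") from exc
```

Callers only need to know about `ConfigError`, and the traceback still shows the underlying `KeyError` or `ValueError`.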
Meme reaction: http://imgur.com/iYE5nLA
_____
Prior discussion: https://news.ycombinator.com/item?id=42946597
Also, division by zero should return Inf
You don’t need exceptions, and they can be replaced by more intricate return types.
OTOH, for the intended use case for signalling conditions that most code directly calling a function does not expect and cannot do anything about, unchecked exceptions reduce code clutter (checked exceptions are isomorphic to "more intricate return types"), at the expense of making the potential error cases less visible.
Whether this tradeoff is a net benefit is somewhat subjective and, IMO, highly situational. But if (unchecked) exceptions are available, you can always convert any encountered in your code into return values by way of handlers (and conversely you can also do the opposite), whereas if they aren’t available, you have no choice.
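The conversion in both directions, sketched in Python with a hypothetical minimal result type (`Ok` / `Err`); third-party result libraries exist, but their APIs differ, so none is assumed here. `to_result` turns an exception into a return value via a handler, and `unwrap` turns an error value back into a raised exception.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")

@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T

@dataclass(frozen=True)
class Err:
    error: Exception

Result = Union[Ok[T], Err]

def to_result(fn: Callable[[], T]) -> Result[T]:
    # Exception -> return value: a handler catches and wraps the error.
    try:
        return Ok(fn())
    except Exception as exc:
        return Err(exc)

def unwrap(res: Result[T]) -> T:
    # Return value -> exception: re-raise the stored error.
    if isinstance(res, Err):
        raise res.error
    return res.value
```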
Most problems stem from poor PL semantics[1] and badly designed stdlibs/APIs.
For exogenous errors, Let It Crash, and let the layer above deal with it, i.e., Erlang/OTP-style.
For endogenous errors, simply use control flow based on return values/types (or algebraic type systems with exhaustive type checking). For simple cases, something like Railway Oriented Programming.
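A loose Python rendering of the railway idea for endogenous errors, assuming each step returns either a value or a `Failure` value (the `Failure` type and the step functions are hypothetical). Once any step lands on the failure track, the remaining steps are skipped and the failure is returned unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Failure:
    reason: str

def run_pipeline(value: Any, *steps: Callable[[Any], Any]) -> Any:
    # Stay on the success track until some step returns a Failure.
    for step in steps:
        if isinstance(value, Failure):
            break
        value = step(value)
    return value

def parse_age(raw: str) -> int | Failure:
    s = raw.strip()
    return int(s) if s.isdigit() else Failure(f"not a number: {raw!r}")

def check_adult(age: int) -> int | Failure:
    return age if age >= 18 else Failure(f"too young: {age}")

def greeting(age: int) -> str:
    return f"welcome, age {age}"
```

Every outcome is an ordinary return value the caller can inspect; nothing is thrown for expected, domain-level problems.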
---
1. division by zero in Julia:
julia> 1 / 0
Inf
julia> 0 / 0
NaN
julia> -1 / 0
-Inf
Sometimes yes, sometimes no?
It's a domain specific answer, even ignoring the 0/0 case.
And also even ignoring the "which side of the limit are you coming from?" question, where "a" and/or "b" might be negative. (Is it positive infinity or negative infinity? The sign of "a" alone doesn't tell you the answer.)
Because sometimes the question is like "how many things per box if there's N boxes"? Your answer isn't infinity, it's an invalid answer altogether.
The limit of 1/x or -1/x might be infinity (or negative infinity), and in some cases that might be what you want. But sometimes it's not.
For floating point there is the interesting property that 0 is signed due to its signed magnitude representation. Mathematically 0 is not signed but in floating point signed magnitude representation, "+0" is equivalent to lim x->0+ x and "-0" is equivalent to lim x->0- x.
This is the only situation where a floating point division by "zero" makes mathematical sense, where a finite number divided by a signed zero will return a signed +/-Inf, and a 0/0 will return a NaN.
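Plain Python `float` division raises `ZeroDivisionError` instead of returning the IEEE 754 special values, so the rule above has to be spelled out by hand there. A minimal sketch (the function name is hypothetical): the sign of an infinite result is sign(a) XOR sign(b), and `math.copysign` exposes the sign bit even for `-0.0`.

```python
import math

def ieee_div(a: float, b: float) -> float:
    # Mimic IEEE 754 division for the b == 0 cases Python refuses to compute.
    if b != 0.0:  # note: -0.0 == 0.0, so both signed zeros fall through
        return a / b
    if a == 0.0 or math.isnan(a):
        return math.nan  # 0/0 (and NaN/0) has no meaningful value
    # Nonzero a divided by a signed zero: the result's sign is
    # sign(a) XOR sign(b), read from the sign bits via copysign.
    negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
    return -math.inf if negative else math.inf
```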
Why should 0/0 return a NaN instead of Inf? Because lim x->0 4x/x = 4, NOT Inf.
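The limit argument written out: for any constant k, the ratio kx/x tends to k, so a 0/0 result could have come from a limit with any value at all, and no single number (Inf included) is the right answer, whereas 1/x does have a definite one-sided limit.

```latex
\lim_{x \to 0} \frac{kx}{x} = \lim_{x \to 0} k = k \quad \text{for every } k \in \mathbb{R},
\qquad \text{whereas} \qquad \lim_{x \to 0^{+}} \frac{1}{x} = +\infty .
```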
I think the most pragmatic solution is to have 2 tiers:
1. use existing standards (i.e. IEEE 754 for FP, de-facto standards for integers, like two's complement, Big-Endian, etc.)
2. fast, native format per each compute device, using different sub-types so you will not be able to mix them in the same expression
a/0 = Inf when a>0
a/0 = -Inf when a<0
a/0 = NaN when a=0
In the context of, say, a/-0.001, a/-0.00000001, a/-0.0000000001, a/<negative minimum epsilon for denormalized floating point>, a/0, the denominator is approaching zero from below.
Then a/0 is negative when a>0, and positive when a<0.
> According to the IEEE 754 standard, floating-point division by zero is not an error but results in special values: positive infinity, negative infinity, or Not a Number (NaN). The specific result depends on the numerator
Way back when, during my EE course days, we had, like, a whole semester devoted to weird edge cases like this, and spent a month on IEEE 754 (precision loss, NaN, divide by zero, etc.).
When you took an IEEE 754 divide-by-zero value as gospel and put it in the context of a voltage divisor that is always negative or zero, getting a positive infinity value out of a divide by zero was very wrong, in the sense of "flip the switch and oh shit, there's the magic smoke". The solution was a custom divide function that would know the context and yield negative infinity (or some placeholder value). It was a contrived example for an EE lab, but the lesson was: sometimes the standard is wrong for your problem, and you will cause problems if it's blindly followed.
Sometimes it's fine, but it depends on the domain
Can you give more context on your voltage math? Was the numerator sometimes negative? If the problem is that your divisor calculation sometimes resulted in positive zero, that doesn't sound like the standard being wrong without more info.
The numerator was always positive. The denominator was always negative (negative voltage is a pretty common thing), except when it became zero. That led to surprising behavior.
Right, the whole point of the exercise was that sometimes the standard is wrong for your specific problem at hand. We spent lecture after lecture going over exactly how IEEE 754 precision loss worked, and other edge cases, so we would know how to follow the standard exactly.
Then we had an example where the sudden sign flip from a/-0.00000000001 = <huge_negative_number> to a/0 = <positive_infinity> would cause big problems with a calculation. If you didn't explicitly handle the divide by zero case and do the "correct for domain, but not following ieee754 standard" way, then you'd fry a component.
It's been a long time so I don't remember the exact setup, just the higher level lesson of "don't blindly follow standards and assume you don't need to check edge cases (exception or otherwise) because the standard does things a certain way".
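A rough Python sketch of the kind of context-aware divide described in this anecdote, assuming (as in the story) a denominator that is physically never positive, so a zero denominator is read as the limit from the negative side. Under IEEE 754, 1/+0 gives +Inf, which is the sign flip the story warns about. The function name and the choice of infinities as placeholders are illustrative, not from the original lab.

```python
import math

def divide_nonpositive_denominator(num: float, den: float) -> float:
    # Domain assumption: den approaches zero from below (e.g. a voltage that
    # is always negative or zero), so den == 0 is treated as the 0- limit,
    # whether the computation produced +0.0 or -0.0.
    if den > 0.0:
        raise ValueError(f"denominator must be <= 0 in this domain, got {den}")
    if den == 0.0:
        if num == 0.0:
            return math.nan
        # Approaching from below flips the numerator's sign.
        return -math.copysign(math.inf, num)
    return num / den
```

A positive denominator is rejected outright, since in this hypothetical domain it can only mean a bug upstream.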
But with exceptions you can’t use SIMD / vectorization.
(it is in principle possible to construct such a stack, potentially with more context, with a Result type, but I don't know of any way to do so that doesn't sacrifice a lot of performance because you're doing all the book-keeping even on caught errors where you don't use that information)
If you only need it for debugging, then maybe better instrumentation and observability is the answer.
I haven't needed to use a service like Fortinet recently and am now wondering if a LLM is part of their tool and if it's better/worse?
(I used to look out for Karpathy's papers ten years ago... I tend to let out an audible sigh when I see his name today)
I for one really enjoy both his longer form work and his shorter takes.