This intensification of work will not be good for workers’ health. Like, put your phone down man. You can’t be modeling this behavior to young people.
Further, the intensification of work is probably not even good for productivity in the long term. This periodic half-thinking about things without stepping away from the problems you are working to solve will lead to more half-assed solutions. Ideas need room to breathe and dedicated focus.
Burnout is real and these people have lost track of what's important.
That meme about how different countries treat out-of-office is very accurate.
For my personal efforts yes I very much want on demand access, I want the thoughts to flow. It doesn't feel like that would be anything but great at my job too, if my job didn't define such narrow bounds of mandatory butts in chairs. What work says it wants is not effective, is not going to get my best work, is not going to really make sense. It didn't make sense before & it's absurdly out of pace with the Happy Warrior mode of software development today.
1. I feel genuinely more productive: I spend a lot less time on boilerplate, and much more of my time goes into thinking and communicating the thinking process.
2. I can take a ton of breaks, basically whenever I want. "Flow" is now entirely design flow and can be interrupted much more easily without damaging it.
3. If there is anything I actively dislike in my workflow or that makes me not enjoy my time... I can fix my workflow so that either I'm not the one doing it, or the item in question is no longer necessary.
AI is crazy. I get it, if you're at a shitty job that doesn't understand how to adapt well, it's tough... but if you're working on your own (like Simon does on this project), it's absolutely amazing and you're in full control of your life.
If I go “find issues in this code” it will hallucinate some, but if I say “can you check the recent change, there might be some things that introduced regressions, maybe?” Then it will be more cautious.
Also, Fable especially, but Opus too, can talk back and advise you against going in a direction it thinks unwise.
And I’ve had much more success spelling out why I think something is a better approach, or asking it to clarify itself. If I tell it my assumption, it sometimes self-corrects and starts doing what I needed in the first place; it was just coming at it from a different direction before. For example, it might assume I don’t care about cost and provide “the best solution”, or try to make something reusable when what I needed was something quick (or vice versa).
It really is best to think of it as a gradient plane where it might get stuck in local minima, or you can prime it to “teeter on the edge” and be able to flow in different directions.
If you had multiple people look at your PRs multiple times on different days results would be very similar.
It’s not perfect but usually it works pretty well, and I’ve had the model come back to me with “oh actually the test passed, the bug doesn’t exist”.
As a bonus, you’ve now got a test that can detect that bug if it comes up again.
The "keep improving the code base" prompt has been tried and it never works. The LLM has no sense of where to stop and where to draw the lines of reasonableness.
For normal review loops, you can ask the model to report nothing if it finds nothing, and not to invent things, and it will do a better job of exiting with no findings.
typically this means there is some ambiguity in the specification, and the model flips between alternative interpretations
Tell it something like:
Before doing any commits or producing a summary for the user, you must run a verification sub-agent.
Its goal is to adversarially and critically check your supposed findings to look out for false positives and hallucinations.
Doing so with a separate sub-agent with relatively clean context (but with all the relevant details of the problem space that appear to be facts) should improve our confidence in the findings.
Maybe also something like: Try to classify each found issue as either SERIOUS, CRITICAL or NITPICK, discard nitpicks, we only care about impactful issues.
It should somewhat cut down on the useless output.
I've largely found the same in regards to generating code - the initial pass will often have bugs that the model itself can find but only when run as a separate sub-agent without the confidence poisoning in its own previous output.
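To make the triage idea above concrete, here is a minimal Python sketch of what happens after the model reports back: keep only findings that a separate verification pass confirmed, and drop nitpicks. The `Finding` class, its `verified` flag, and the `triage` function are hypothetical names made up for this illustration; they are not part of any agent framework, and how the verification sub-agent actually sets `verified` is left out.

```python
from dataclasses import dataclass

# Severity labels taken from the prompt above; NITPICK is deliberately absent.
SEVERITY_ORDER = {"CRITICAL": 0, "SERIOUS": 1}


@dataclass
class Finding:
    severity: str           # "CRITICAL", "SERIOUS" or "NITPICK"
    summary: str
    verified: bool = False  # assumed to be set by the separate verification pass


def triage(findings):
    """Keep verified, impactful findings only, most severe first."""
    kept = [
        f for f in findings
        if f.verified and f.severity.upper() in SEVERITY_ORDER
    ]
    return sorted(kept, key=lambda f: SEVERITY_ORDER[f.severity.upper()])
```

Anything the verifier could not confirm is treated as a likely false positive and dropped, which is the conservative choice if the goal is less noise rather than maximum recall.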
Like when you do recursive programming, have you tried providing more/better stop conditions? If you literally just say "Continue until there are no more issues" then it'll do just that, but if you scope it better, like "Only mention issues related to X, Y or that leads to Z" and so on, you'll get less noise and more focus on issues that actually matter (to you).
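A rough Python sketch of the "better stop conditions" idea, under stated assumptions: `review_fn` is a hypothetical stand-in for whatever call asks the model for issues (it just returns a list of strings here), and `in_scope` is a hypothetical predicate encoding "only issues related to X, Y". The loop stops when a round yields nothing new in scope, or when a hard round limit is hit.

```python
def review_loop(review_fn, in_scope, max_rounds=3):
    """Collect in-scope issues until a round finds nothing new.

    review_fn: callable returning a list of issue descriptions (assumed).
    in_scope:  callable deciding whether an issue matters to you (assumed).
    """
    seen = set()
    reported = []
    for _ in range(max_rounds):
        # Ignore out-of-scope issues and anything already reported.
        new = [i for i in review_fn() if in_scope(i) and i not in seen]
        if not new:
            break  # explicit stop condition: nothing new that we care about
        seen.update(new)
        reported.extend(new)
    return reported
```

The round limit is the blunt backstop; the scope predicate is what actually cuts the noise, because out-of-scope findings never count as a reason to keep going.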
Anyway, it will never match your judgement completely unless you upload your brain dump into the model.
Not entirely true IME. Eventually the bug hunt will end with general design advice that may not be suitable for your use case and that you can skip.
I've had the same experience, but whenever I've reviewed what it finds it's basically right. It's pedantic, and a lot of the problems aren't things I really care about, but they definitely are real problems.
I'm not sure you can blame the AI for always finding problems if a) you asked it to, and b) there are problems to find.
Would you like it to stop when there's still flaws in the code?
(The fixed prices are just temporary discounts)
At least 2 things the random LinkedIn post will ignore, on purpose or not :
- price today remains low (even though they might feel higher than before), Uber is the business model, no secret there, it's a VC classic
- $150 spent by an expert, a software engineer with significant practical knowledge in AI, is not equivalent to the exact same amount spent by a novice.
Yet now that a number is out, you bet it will be used. Expect alarmist posts tomorrow morning in your feed claiming building software is now as cheap as dinner at a restaurant.
Expert software engineers will still accidentally burn $500 or $5000 on tasks that don't work, or are not efficient. Amateurs will accidentally spend $100 to get something great.
So part of the change is a change in the risk structure of using frontier models. Before, you'd burn your quota; now, you can burn uncapped (less-capped) money.
Therefore, I created apsw-utils, a port of sqlite-utils to the amazingly-awesome apsw lib -- which is a really idiomatic sqlite lib for python. It's here: https://answerdotai.github.io/apswutils/
I've used it in lots of projects including in significant production stuff, and it's always worked great for me. IMO if you're serious about doing sqlite in python, at some point you'll probably want to check out apsw.
What specifically are you referring to? The apswutils website also does not explain.
https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...
You can still use previous behavior with "legacy" mode that lets you control when transactions are opened in which isolation level.
In what way does having autocommit=False hide more fine-grained transaction control?
autocommit=False gives full control to the programmer to do whatever they want.
SQLite behavior is here: https://sqlite.org/lang_transaction.html . The regular implicit transactions there plus explicit where needed aren’t supported in any python mode.
Specific examples would be extremely useful. You've done some work learning and deducing this stuff, others could learn if you would share and explain it.
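As one small, concrete data point for the transaction discussion above (not a resolution of it): a minimal sketch using only the standard-library `sqlite3` module with `isolation_level=None`, which tells the module not to open transactions implicitly. Plain statements are then committed by SQLite itself, and explicit `BEGIN` / `ROLLBACK` are issued by hand. The function name and the table `t` are made up for this illustration, and this says nothing about how apsw or apswutils behave.

```python
import sqlite3


def demo_explicit_transactions():
    # isolation_level=None: the sqlite3 module never issues BEGIN on its own.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")  # committed immediately by SQLite
    states = [conn.in_transaction]

    conn.execute("BEGIN IMMEDIATE")           # explicit transaction, chosen by us
    conn.execute("INSERT INTO t VALUES (2)")
    states.append(conn.in_transaction)

    conn.execute("ROLLBACK")                  # discards only the second insert
    states.append(conn.in_transaction)

    rows = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY x")]
    conn.close()
    return states, rows
```

Calling `demo_explicit_transactions()` returns the `in_transaction` flag at each step plus the surviving rows, which makes it easy to compare against whatever mode you are actually running in.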
> I upgraded to the Claude Max $200/month plan (I was previously on $100/month) to increase my Fable allowance for the remaining time until the July 7th Fablepocalypse, when even Claude Max subscribers will have to pay full API cost for the model.
I really wonder if Anthropic will stick with their decision to keep Fable on extra usage credits until they "get more compute", especially in the light of GPT 5.6 very likely coming out next week (it's confirmed to have the exact same pricing as GPT 5.5)
Finally have an explanation why GPT 5.5 xhigh felt dumber and dumber these last few weeks, always the same thing when a new model release is about to come out...
Yet the same claim is being posted every single day, including new claims that the Fable 5 model has degraded compared to the initial release, guardrails aside.
Anyways, heard about A/B testing before? ML people tend to like it a lot. It's hard to imagine that OpenAI and Anthropic aren't already deep into categorizing people into buckets and running a wild amount of A/B testing all over the place, in various ways, especially in the weeks leading up to new model releases.
They are also testing the new models in their coding tools with select customers first.
People working at OpenAI have publicly denied that they are performing any kind of hidden routing or quantization of models after release for Codex. I tend to believe them.
It’s silly to act like this was an added cost in a vacuum, or that any costs translate directly into charity for arbitrary families. Also in some places it would even cover rent for half a day.
no need for charities of any sort. plenty of people in Software and plenty of people laid off every day.
yep. same. my electricity is ~100 USD / year.
So obviously people are going to take their lead and not get legal advice from some greasy dweeb at the bottom of HN.