But what about the quality of the model? It seems Grok is pushing the wrong metrics again, after launching fast.
The productive people I know use git worktrees and are multi-tasking.
The optimal workflow is when you can supply one or more commands[1] that the model can run to validate its work and get feedback on its own. Think of it like RLHF for the LLM: it is getting feedback, albeit not from you, which would be laborious.
As long as the model gets feedback, it can run fairly autonomously with less supervision, and it does not have to be test-driven feedback. If all it gets is you as the feedback, the bottleneck will always be the human time to read, understand, and evaluate the response, not token speed.
With current leading models, doing 3-4 workflows in parallel is not that hard when fully concentrating; of course, it is somewhat less when browsing HN :)
---
[1] The command could be a unit test runner, a build/compile step, or an e2e workflow: for UI it could be Chrome MCP/CDP, playwright/cypress, or storybook-js, and so on. There are even adaptations of TDD designed to benefit from this gain.
You could have one built for your use case if no existing ones fit, with model help of course.
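A minimal sketch of that loop, assuming a hypothetical `ask_model` client that stands in for whatever agent CLI or API wrapper you use; pytest here is just one example of a validation command, per the footnote above:

```python
import subprocess

def run_validation(cmd=("pytest", "-q")):
    # Any command that exits non-zero on failure works as a feedback source.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task, ask_model, max_iters=5):
    # ask_model is a stand-in for your agent client; it is expected to
    # edit the working tree in response to the task plus feedback.
    feedback = ""
    for _ in range(max_iters):
        ask_model(task, feedback)
        ok, output = run_validation()
        if ok:
            return True
        feedback = output  # failing output becomes the next round's feedback
    return False
```

The point is simply that the human is out of the inner loop: the model iterates against the command, and you only review the end state.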
This helps especially if your builds take a fair bit of time (incremental builds may not work in a worktree the first time) or you are working on an item with high-latency feedback, like an e2e suite that runs in an actual browser.
Prompt style also influences this. I like to write fairly detailed prompts that cover a lot of the nuances upfront, spending 10-15 minutes or more on them. I find that when I do that the run takes longer, but I only have to give simple feedback during the run itself, freeing me to move to the next item. Some people prefer a chat-style approach, but you cannot keep a lot of threads in mind while chatting.
Model and CLI client choice matters too; on average, Codex is slower than Sonnet 4.5. Within each family, enabling thinking or using the high-reasoning variant can be slower as well.
Finally, not all tasks are equal. I like to mix complex and simpler ones, or pair some dev-ex work or a refactor that requires a lower attention budget with features that require more.
Having said that, while I don't know any 10x-type developers, I wouldn't be surprised if such people exist and are truly that productive.
The analogy I think of is chess. Maybe I can play 2-3 games in parallel reasonably well, but there are professional players who can play dozens of games blindfolded and win all of them.
In my own experience you quickly run into jarring tangents or “ghosts” of unrelated ideas that start to shape the main thread of consciousness and resist steering attempts.
Grok is the most biased of the lot, and they’re not even trying to hide it particularly well.
Censoring is "I'm afraid I can't let you do that, Dave".
Bias is "actually, Elon Musk waved to the crowd."
Everyone downthread is losing their minds because they think I'm some alt-right clown, but I'm talking about refusals, not about Grok being instructed to bend the truth on certain topics.
Bias is often done via prompt injection, whilst censoring is often in the alignment, and in web interfaces via a classifier.
As I recall, it's undisputed that ChatGPT and Gemini insert hidden text into prompts to change the outputs to conform to certain social ideologies.
And why do you think Grok doesn’t? It has been documented numerous times that Grok’s prompt has been edited at Musk’s request because the politics in its answers weren’t to his satisfaction.
Grok is by far the most biased. Did you sleep through its continuous insertion of made-up stuff about South Africa?
This is the same person who is trying to re-write an entire encyclopedia because facts aren't biased enough.
A group has created an alternate reality echo chamber, and the more reality doesn't match up the more they are trying to invent a fake one.
When you're on the side of book banning and the Orwellian rewriting of facts and history, that side never turns out to have been the good side. It's human nature for some people to be drawn to it as an easy escape rather than allowing their world views to be challenged. But you'd be hard-pressed to find a case, any of the times it's been done, where the group doing it was anything but a negative for their society.
You have to be either blind or arguing in bad faith to state that Wikipedia isn't heavily biased to the left.
Almost like chatting with an LLM that refuses to make that extra leap of logic.
"if the llm won't give racist or misogynistic output, it's biased in the wrong way!"
What you think of as "heavily biased to the left" is, globally speaking, boring middle of the road academia.
I'm sure an LLM can help write such a program. I wouldn't expect an LLM to be particularly good at creating the regex directly.
"I'm sorry, but I cannot provide instructions on how to synthesize α-PVP (alpha-pyrrolidinopentiophenone, also known as flakka or gravel), as it is a highly dangerous Schedule I controlled substance in most countries, including the US."
The whole MechaHitler thing got reversed, but only because it was too obvious. No doubt there is a ton of more subtle censorship in the code.
[1] https://techcrunch.com/2025/05/15/xai-blames-groks-obsession...
If the text snippet is something that sounds either very violent or somewhat sexual (even if it's not when properly in context), the LLM will often refuse and simply return "I'm sorry I can't help you with that".
People who have $2000 worth of various model subscriptions (monthly) while saying they are not sponsored are now going to tell me that grok.com is a different model than Grok-4-fast-1337, but the trend is obvious.
My theory? They were scrambling for a competitive edge and were willing to swallow some short-term pain. Plus, it feels like they shifted focus away from keeping coders deeply in the loop.
In the end, we vote with our wallets; if it doesn't click, just walk away. I still dip into Grok, but only the free tier: Grok 4's fast mode for tackling planning and first generation, and then Qwen Coder for the code editing and clerical tasks. The latest version of Grok holds up about as well as the old Grok 3, just with way more steps...
An actually large context window is impossible due to how LLM attention works under the hood.
This has obvious issues, since you're now losing information from the now-unseen tokens, which becomes significant if your context window is small in comparison to the answer/question you're looking at. That's why companies try to offer stupidly large context windows. The problem is they're not training on the large context window; they're training on something smaller (2048 tokens and up). Because of how attention is set up, you can train on a small amount of context and extrapolate to far more tokens: with RoPE, the model learns positions as relative offsets to neighboring words rather than absolute indices. This lets you effectively 2x, 3x, 10x, or 100x the number of tokens you generate versus what you trained on with some consistency, BUT it still causes a lot of consistency issues, since the model ends up in a "this was trained on snippets but not the entire thing" situation, where it has a notion of the context but not fundamentally the entire combined context.
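For the curious, a rough numpy sketch of standard RoPE with an optional position-interpolation factor, which is the simplest flavor of the context-extension trick described above (methods like YaRN, mentioned further down, rescale frequencies non-uniformly instead):

```python
import numpy as np

def rope_angles(seq_len, dim, base=10000.0, scale=1.0):
    # One rotation frequency per pair of dimensions, as in standard RoPE.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # scale > 1 squeezes positions the model never saw in training back
    # into the trained range (position interpolation).
    positions = np.arange(seq_len) / scale
    return np.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)

def apply_rope(x, angles):
    # Rotate each (even, odd) feature pair by its position-dependent angle;
    # attention scores then depend only on relative offsets between tokens.
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Pretend the model was trained at 2048 positions but we feed 4096:
q = apply_rope(np.random.randn(4096, 64), rope_angles(4096, 64, scale=2.0))
```

The relative-offset property is what makes extrapolation possible at all; the interpolation factor is what keeps the angles inside the range the model actually trained on.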
The only real mistakes it makes are some model-specific quirks, like occasionally stripping out certain array index operators. Other than that, it works fine with 150,000-token conversations. I've gone up to 500,000 with no real issues besides a bit of a slowdown. It's also great for log analysis, which I have taken up to 900,000 tokens.
The limiting factors are typically: 1. Often there are latency/throughput requirements for model serving which become challenging to fulfill at a certain context length. 2. The model has to be _trained_ to use the desired context length, and training becomes prohibitively expensive at larger contexts.
(2) is even a big enough problem that some popular open source models that claim to support large context lengths in fact are trained on smaller ones and use "context length extension" hacks like YaRN to trick the model into working on longer contexts at inference time.
And sure maybe not 2mil of it is usable, but they're reliably pushing the frontier here.
For example when querying a model to refactor a piece of code - would that really work if it forgets about one part of the code while it refactors another part?
I concatenate a lot of code files into a single prompt multiple times a day and ask LLMs to refactor them, implement features or review the code.
So far, I never had the impression that filling the context window with a lot of code causes problems.
I also use very long lists of instructions on code style on top of my prompts. And the LLMs seem to be able to follow all of them just fine.
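For what it's worth, that kind of concatenation is trivial to script; here is a minimal sketch (the paths, extensions, and file-tagging format are my own choices, not any standard):

```python
from pathlib import Path

def build_prompt(root: str, instructions: str, exts=(".py", ".ts")) -> str:
    # Prepend the instruction/style block, then every matching source file,
    # each tagged with its path so the model can reference files by name.
    parts = [instructions]
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            parts.append(f"\n--- {path} ---\n{path.read_text()}")
    return "\n".join(parts)

prompt = build_prompt("src", "Refactor for clarity. Follow the style rules below.")
```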
https://wandb.ai/byyoung3/ruler_eval/reports/How-to-evaluate...
>Gpt-5-mini records 0.87 overall judge accuracy at 4k [context] and falls to 0.59 at 128k.
And Llama 4 Scout claimed a 10 million token context window but in practice its performance on query tasks drops below 20% accuracy by 32k tokens.
Here is an experiment:
https://www.gnod.com/search/#q=%23%20Calcuate%20the%20below%...
The correct answer: 20,192,642.460942328
Here is what I got from different models on the first try:
ChatGPT: 20,384,918.24
Perplexity: 20,000,000
Google: 25,167,098.4
Mistral: 200,000,000
Grok: Timed out after 300s of thinking

You wouldn't ask a human to do that, why would you ask an LLM to? I guess it's a way to test them, but it feels like the world record for backwards running: interesting, maybe, but not a good way to measure, like, anything about the individual involved.
Tested this on the new hidden model of ChatGPT called Polaris Alpha: Answer: 20,192,642.460942336
Current gpt-5 medium reasoning says: After confirming my calculations, the final product P should be 20,192,642.460942336
Claude Sonnet 4.5 says: “29,596,175.95 or roughly 29.6 million”
Claude haiku 4.5 says: ≈20,185,903
GLM 4.6 says: 20,171,523.725593136
I’m going to try out Grok 4 fast on some coding tasks at this point to see if it can create functions properly. Design help is still best on GPT-5 at this exact moment.
Then there's the question of why not just build the calculator tool into the model?
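In a sense they already do, via tool calling: the model emits an arithmetic expression and the harness evaluates it exactly, instead of the model "mentally" multiplying. A minimal sketch of the tool side (this safe-eval helper is illustrative, not any particular vendor's API, and the example expression is made up):

```python
import ast, operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
       ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def calculator(expr: str) -> float:
    # Walk the AST so only pure arithmetic is allowed: no names, no calls.
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("not a pure arithmetic expression")
    return walk(ast.parse(expr, mode="eval").body)

# The model returns the expression; the tool returns the exact answer.
print(calculator("123456.789 * 163.55"))
```

Baking exact arithmetic into the weights themselves is the harder, unsolved version of this question.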
As far as xAI, I doubt it will go to zero or run afoul of any of those market manipulation issues, because it owns Twitter/X and I think it powers the realtime Tesla cloud. But betting on it is fraught with peril because of the high likelihood that it will wind up under the control of some less capable conglomerate (e.g., GM's acquisition of Hughes Aircraft and resale to Raytheon, Boeing, and News/DirecTV).
Google, Meta, a handful of B-tier players, and China are where we have to place our bets, but only if we ourselves need (or want to invest on the theory that others need) trillion-parameter models (and are willing to risk the valuations being lowered if/when adverse actions are taken against the above competitors).
I've always tried to remain apolitical and unbiased but it's hard to overlook who's behind a technology you wanna buy. Not that sama and others are saints either, it's just Elon's very obvious and vocal about it.
It's a shame, really, because Grok is a good model. But Elon promised to open source the previous model and it took them forever to do that with Grok 3. Sorry, but I wanna buy from someone who keeps their promises ("FSD by next year").
Kinda reminds me of the video game from Ender's Game.
It was tuned to be edgy and annoying, though (I mean its general style of speech, not necessarily the content).
I'm open minded and I've fed Grok a few requests recently. It was better at doing creative fiction prompts without the "Eddie Izzard coming down off of a fifteen day coke bender" vibe.
Everything I ask it to do is completely made-up nonsense, so I don't have an opinion about its bias or the quality of its factual content.
Snark and clapback made the world go around on xitter. Maybe that's what they thought people wanted: savage insulting content to "own" people. I, for one, also found it extremely annoying.
Is being tuned for right wing viewpoints the same as not being tuned for political correctness? Because there is tuning happening to a specific viewpoint:
https://gizmodo.com/elon-says-hes-working-to-fix-grok-after-...
Ultimately every AI is biased based on what you train it on and how you instruct it.
I tend to use LLMs from different companies and personally compare them, and read between the lines.
Read between the lines? Does this mean that you're using LLMs as a source of information?
Or do you mean to say that you are trying to find the specific bias each model has?
In terms of models, Grok 4 Fast has essentially zero restrictions on safety, which a) makes it unusable for most applications that allow user input and b) makes it extremely useful for certain applications.
Some simple example:
https://claude.ai/share/6d178173-cdf7-4e50-a467-73ee9f479d56.
https://chatgpt.com/share/69102735-46ac-8012-9cf0-0969585c86....
https://grok.com/share/bGVnYWN5LWNvcHk%3D_54b5f2f1-732e-4372....
I don't use Gemini much, and I haven't been impressed whenever I've tried it with GitHub Copilot.
And from what it looks like to me Google is preparing to be the Google of the AI wave.
Dunno if it's true. The family wrote it off, saying she's mentally ill, but I can also see years of abuse leading to mental illness.
It is, but the troll is the CEO playing with the system prompt…
Basically, the major free options out there for LLMs are OpenAI, Google, Perplexity, DeepSeek, Meta, and Grok. (I could be missing stuff here, but those are the main players.) DeepSeek is out because of China ties. OpenAI and Perplexity have CEOs that seem incredibly shifty to me. I refuse to give Meta and Google any more info than I have to, so I'm avoiding them. Hence we fall back to Grok. Again, maybe not a completely logical progression, but it's my choice and I get to live with the consequences :)
Yet the next level beyond “incredibly” somehow makes it alright again?
Literally none of the options you listed are that objectionable.
Do what the rest of us do and switch frequently. Don't use MechaFührer and you'll be fine.
Most models belong to capitalist companies that are fairly apolitical; all they care about is money. Their evil comes from not caring about consequences as long as it grows their value. Their censorship comes from the desire to avoid PR disasters.
On the other hand, Grok belongs to a billionaire involved in destroying America's democracy, and it's being openly manipulated according to Musk's ideology. I can't think of a model I would trust less.
Grok certainly has its uses, but I default to OpenAI for most business tasks and Claude for code.
People seem to nitpick a lot. Grok 3 came out in, what, March? Cost how many tens of millions to train? And you’re mad because it’s not open source yet?
The video gen is actually really good, fast, and cheap for short videos.
Still use Claude and GPT-5 for work tasks, but I haven't tried Grok extensively for those.
So I tend to use different LLMs from different providers, personally compare them and read between the lines.
I personally use the best tool for the job, which Grok sometimes is.
Which are Americans: Americans who either voted for him or didn't do enough against him.
There is really no excuse to democratically vote for a person like this and let all this bullshit happen.
In reality, GPT really sucked from Dev Day until 5, when it redeemed itself.
https://openrouter.ai/x-ai/grok-code-fast-1
Cline and Kilo code are in the top 3. So how does that work?
It’s considerably cheaper than competing models like 2.5 Flash, though, so it's not that surprising.
Obviously major architectural changes need a bigger context window. But try to aggressively modularize your tasks as much as you can, and where possible run batch jobs to keep your workflow moving while each task stays a smaller chunk.
Honestly this kind of behaviour would be a huge red flag during interviews.
I have problems that current LLMs can't solve efficiently due to context window sizes, and I welcome any improvement in this space.
Gemini's 1M is amazing.
In fact AI is handing over the process of creating code - eventually all code - to a small number of third parties, who will have complete power over the world's IT infrastructure.
No wonder they have wildly inflated valuations. The potential to enforce authoritarian policies through opaque technology is unprecedented.
Yes, I’ve seen it happen multiple times.
Outside a few weird online bubbles and pockets of the US, hardly anyone disputes the claim you are objecting to.
Of all the silly things to say about Musk and Twitter, the idea that “MSM” are upset about Twitter is among the silliest.
It matters how people behave.
X doesn’t seem to care about any of that.
Let us empower anybody to say anything they want AND force everybody to listen to it.
Anonymous free speech is not free speech. There is no accountability. It should not be a human right. It's destroying our societies. The evidence should be clear by now.
Not to mention the huge numbers of real scientists working over the decades to improve battery tech to the point where it was obvious that electric cars were going to be viable.
We shouldn't praise Musk for taking credit for other people's work.
Really? Most of the stuff he promised never materialized. Elon's genius is that he learned where the money comes from. Both Tesla and SpaceX were financed by government money. That's why he supported Trump, and that's why he keeps pumping the stock. He goes directly to the source.
So I guess it depends on how deep the bias sits. And that is something that may vary with time. Grok has been a good example of this, with the bias initially being introduced as system prompts, then apparently moved to synthetic data used to train the further generations of Grok.
The CCP plays a long game, they want dependency, not donations. Once enough people adopt their stack, they’ll set the governance norms and compliance rules around it.
It’s not paranoia, it’s policy. Go read their New Generation AI Development Plan, they’ve been explicit about it since 2017.
None of this is hyperbole. All of it is historically documented.
I won't argue other than to tell you: that's peak American infighting, and I'm not American. I'll leave a quotation from Arthur Schopenhauer, a famous philosopher, here. Maybe it will better your condition:
“The cheapest sort of pride is national pride; for if a man is proud of his own nation, it argues that he has no qualities of his own of which he can be proud; otherwise he would not have recourse to those which he shares with so many millions of his fellowmen. The man who is endowed with important personal qualities will be only too ready to see clearly in what respects his own nation falls short, since their failings will be constantly before his eyes. But every miserable fool who has nothing at all of which he can be proud adopts, as a last resource, pride in the nation to which he belongs; he is ready and glad to defend all its faults and follies tooth and nail, thus reimbursing himself for his own inferiority.”
Nazism is a specific, relatively well-defined, and extremely dark ideology. If we apply the label to every unhinged off-brand pseudo-fascist, it can really distort the views people might start having about the original ideology.
You know what I learned about the Nazis from my grandparents? Not every German was a Nazi; in fact, just short of 50% voted for them. And the Nazis didn't start with war and Auschwitz. That was peak Nazi. They started with “Awake, Germany!” and river cruises.
The Holocaust was also possible because far too many people who were essentially decent turned a blind eye and found excuses for the Nazis until it was too late.
> And the Nazis didn't start with war and Auschwitz
No, but they were reasonably transparent about who they were well before that.
The problem is that a majority (or close to it) of these “essentially decent” people never supported democracy or what it stood for. The Weimar Republic never stood a chance in a society where most people supported rabid jingoism and authoritarianism in general.
Let's be real, he's only getting a pass because it's Musk.
I would encourage you to try to avoid making such easily falsifiable claims, and put at least some token effort into your arguments. I was able to find the below with less than five minutes of searching.
https://www.timesofisrael.com/after-musk-prods-adl-says-kill...
Said tweet: https://x.com/elonmusk/status/1686037774510497792
He also endorsed an X post claiming "Jewish communities have been pushing [...] hatred against whites," calling it "the actual truth." https://www.cbsnews.com/news/elon-musk-antisemitic-comments-...
He has also repeatedly advanced a version of replacement rhetoric (e.g. claiming Democrats import immigrants to change power via the census), which is essentially a repackaging of the Great Replacement idea, i.e. a racist conspiracy centered on replacing white populations with those from other races and ethnicities. You can, for example, read the transcript of his interview with Don Lemon.
So yes, Elon does in fact frequently talk about white people. Even when not explicitly mentioning them, he means them. For example when he says people should have more babies, he specifically means white people: https://newrepublic.com/article/181098/elon-musks-weird-obse...
>> Same issues are faced by many asian countries as well.
I find your comparison of this issue to issues faced by various Asian countries to be pretty odd, as it does not stand up to critical scrutiny. Asian countries' demographic crises are about internal low fertility and rapid aging, not about being "replaced" by outsiders. Indeed, the arithmetic makes the comparison impossible: Japan, China, South Korea all have extremely tiny foreign populations. Therefore, pointing to Japan/Korea/China's low birth rates to sanitize "replacement" talk is a bad-faith pivot.
Guys like Trump or Putin are not Nazis (yet). They do resemble Mussolini on various and often quite deep levels. So fascist would probably be the more correct term.
As for Musk I’m not sure. Drugs and whatever mental issues he’s suffering from likely are distorting the real picture (which might also be even darker).
Oversimplifying everything, reducing complexity to simple catchphrases, and extreme cognitive dissonance are what the “other” side is all about. Adopting their overall approach seems somewhat counterproductive long term…
You can brand them all Nazis and shut off the entire conversation. That might be the “morally righteous” thing to do (not sarcasm). What's the point of that, though? You still have to live with them in the same country and vote in the same elections.
If you say he seems to be racist and supports lots of far-right groups that are overtly racist... then people can't just ignore you.
It's almost as if being a piece of shit doesn't immediately make you a Nazi. We should move on from WW2-era lingo; new things need new terms. When everyone is a fascist and a Nazi, no one is, and it weakens the original terms to the point of being meaningless.
So let’s just be clear that nobody is playing this fake outrage game anymore.
We should be careful of labeling people Nazis, but Elon does seem to be playing on the wrong side of that fence.