The six lines of Python suggested for that function can also be replaced with a simple “return max(arr)”. The suggested code works, but it's absolutely junior level.
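For illustration, the contrast is roughly the following (a hypothetical reconstruction, not the actual suggestion; the function name is invented):

# Hypothetical reconstruction of the kind of hand-rolled maximum search being described...
def largest_element(arr):
    if not arr:
        raise ValueError("arr must not be empty")
    largest = arr[0]
    for value in arr[1:]:
        if value > largest:
            largest = value
    return largest

# ...versus the one-liner the built-in already gives you:
def largest_element_builtin(arr):
    return max(arr)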
I am terrified of what is to come. Not just the horrible code, but also how people who blindly “autocomplete” this code are going to stall in their skill progression.
You may score some story points but did you actually get any better at your craft?
I'm just not worried about this, LLMs don't ship.
But if you’re asking for something you don’t know how to do, you might end up with junk and not even know it.
I think there's a whole meta level of the actual dynamic between human<>LLM interactions that is not being sufficiently talked about. I think there are, potentially, many secondary benefits that can come from using them simply due to the ways you have to react to their outputs (if a person decides to rise to that occasion).
> And in that process I've found is where the real magic happens
It might be a good way to learn if there's someone supervising the process who _knows_ that the code is incorrect and tells you to figure out what's wrong and how to fix it.
If you are shipping this stuff yourself, this sounds like a way of deploying giant foot-guns into production.
I still think it's better to learn if you try to understand the code from the beginning (in the same way that a person should try to understand code they read from tutorials and Stack Overflow), rather than delaying the learning until something doesn't work. This is like trying to make yourself do reinforcement learning on the outputs of an LLM, which sounds really inefficient to me.
What I find (being in the latter category) is most LLM code output falls on the spectrum of “small snippets that work but wouldn’t have taken me long to type out anyway” to “large chunk that saves me time to write but that I have to thoroughly check/test/tweak”. In other words, the more time it saves typing the more time I have to spend on it afterwards. Novices probably spend more time on the former part of that spectrum and experienced devs on the latter. I suspect the average productivity increase across the spectrum is fairly level which means the benefits don’t really scale with user ability.
I think this tracks with the main thing people need to understand about LLMs: they are a tool. Like any tool simply having access to it doesn’t automatically make you good at the thing it helps with. It might help you learn and it might help you do the thing better, but it will not do your job for you.
Machine -> Asm -> C -> Python -> LLM (Human language)
It compiles a human prompt into some intermediate code (in this case Python). The initial version of CPython probably wasn't perfect at all either, and engineers were terrified then too. If we are lucky, this new "compiler" will keep getting better and more efficient. Never perfect, but people will be paying the same price they already pay for not dealing directly with assembly.
Something that you neglected to mention is that, with every abstraction layer up to Python, everything is predictable and repeatable. With LLMs, we can give the exact same instructions and not be guaranteed the same code.
The question that matters is: can businesses solve their problems cheaper for the same quality, or at lower quality while beating the previous Pareto-optimal cost/quality frontier.
The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
Also, maybe recognizing the repetition remains the human's job, but refactoring is exponentially easier and so again we get better code as a result.
Seems to me to be pretty early to be making confident predictions about how this is all going to pan out.
but why doesn't that happen today? Cheap code can be had by hiring in cheap locations (outsourced for example).
The reality is that customers are the ultimate arbiters, and if it satisfies them, the business will not collapse. And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
Code quality translates into how quickly changes and defect fixes can be introduced, and how many user-facing defects there are.
While customers may not express any care about code quality directly, they can and will express (dis)satisfaction with the product's performance and defects.
If you outsource and like what you get, you would assume the place you outsourced to can help provide continued support. What assurance do you have with LLMs? A working solution doesn't mean it can be easily maintained and/or evolved.
> And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
That is true, but they will complain if bugs cannot be fixed and features cannot be added. It is true that customers don't care, and they shouldn't, until it does matter, of course.
The challenge with software development isn't necessarily with the first iteration, but rather it is with continued support. Where I think LLMs can really shine is in providing domain experts (those who understand the problem) with a better way to demonstrate their needs.
... which is the whole idea behind training, isn't it?
> The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
The problem is really the opposite -- most programmers are employed to create very minor variations on work done either by other programmers elsewhere, by other programmers in the same organization, or by their own younger selves. The resulting inefficiency is massive in human terms, not just in managerial metrics. Smart people are wasting their lives on pointlessly repetitive work.
When it comes to the art of computer programming, there are more painters than there are paintings to create. That's why a genuinely-new paradigm is so important, and so overdue... and it's why I get so frustrated when supposed "hackers" stand in the way.
>> Recognizable repetition can be abstracted
> ... which is the whole idea behind training, isn't it?
The comment I was answering specifically dismissed LLMs' inability to answer the same question with the same... answer as unimportant. My point is that this ability is crucial to software engineering: answers to similar problems should be as similar as possible. Also, I bet that LLMs are not trained to abstract. In my experience, they have lately been trained to keep users engaged in pointless dialogue for as long as possible.
Nor is whether the implementation is the same from one build to the next.
and the GPU scheduler isn't deterministic
Unfortunately, this is only deterministic on the same hardware, but there is no reason why one couldn't write reasonably efficient deterministic LLM kernels. It just has not been a priority.
Nevertheless, I still agree with the main point that it is difficult to get LLMs to produce the same output reliably. A small change in the context might trigger all kinds of changes in the generated code.
> Something that you neglected to mention is, with every abstraction layer up to Python, everything is predictable and repeatable.
As long as you consider C and dragons flying out of your nose predictable.
(Insert similar quip about hardware)
That's something we'll have to give up and get over.
See also: understanding how the underlying code actually works. You don't need to know assembly to use a high-level programming language (although it certainly doesn't hurt), and you won't need to know a high-level programming language to write the functional specs in English that the code generator model uses.
I say bring it on. 50+ years was long enough to keep doing things the same way.
Set temperature appropriately, that problem is then solved, no?
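Concretely, "setting temperature" means something like this (a minimal sketch assuming the OpenAI Python client; temperature=0 makes decoding greedy and seed requests best-effort reproducibility, though as noted elsewhere in the thread neither is a hard guarantee across hardware or model revisions):

# Minimal sketch, assuming the OpenAI Python client and an API key in the environment.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Write a Python function that returns the largest element of a list."}],
    temperature=0,  # always pick the most probable next token
    seed=42,        # best-effort reproducibility, not a guarantee
)
print(response.choices[0].message.content)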
- Are local models good enough?
- What are we giving up for deterministic behaviour?
For example, will it be much more difficult to write prompts? Will the output be nonsensical? And so on.
What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance?
With existing tools, we know if we need to do something, we can. The issue with LLMs, is they are very much black boxes.
> What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance?
Honestly, having a compiler interface for LLMs isn't a bad idea...for some use cases. What I don't see us being able to do is use natural language to build complex apps in a deterministic manner. Solving this problem would require turning LLMs into deterministic machines, which I don't believe will be an easy task, given how LLMs work today.
I'm a strong believer that LLMs will change how we develop and create software development tools. In the past, you would need Google or Microsoft levels of funding to integrate natural language into a tool, but with LLMs we can easily have a model parse input and map it to deterministic functions in days.
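A minimal sketch of the pattern I mean, with the LLM call stubbed out (the function names and the JSON shape are illustrative assumptions): the model only picks a function and its arguments, and everything downstream is ordinary deterministic code.

import json

# Deterministic business logic the LLM is allowed to dispatch to.
def create_invoice(customer: str, amount: float) -> str:
    return f"Invoice for {customer}: ${amount:.2f}"

def list_invoices(customer: str) -> str:
    return f"Listing invoices for {customer}"

HANDLERS = {"create_invoice": create_invoice, "list_invoices": list_invoices}

def call_llm(user_input: str) -> str:
    # Stub for the model call. In practice this would be a chat/completions
    # request asking the model to emit {"function": ..., "args": {...}}.
    return json.dumps({"function": "create_invoice",
                       "args": {"customer": "Acme", "amount": 120.0}})

def handle(user_input: str) -> str:
    choice = json.loads(call_llm(user_input))
    fn = HANDLERS.get(choice["function"])
    if fn is None:
        raise ValueError(f"model picked an unknown function: {choice['function']}")
    return fn(**choice["args"])

print(handle("Bill Acme $120 for the January work"))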
When you want to make changes to the code (which is what we spend most of our time on), you’ll have to either (1) modify the prompt and accept the risk of using the new code or (2) modify the original code, which you can’t do unless you know the lower level of abstraction.
Recommended reading: https://ian-cooper.writeas.com/is-ai-a-silver-bullet
No goal of becoming a programmer, but I like to build programs.
Built a rather complex AI-ecosystem simulator with me as the director and GPT-4, now Claude 3.5, as the programmer.
Would never have been able to do this beforehand.
By all means, though: if someone gets us to the point where the "code" I am checking in is a bunch of English -- for which I will likely need a law degree in addition to an engineering background to not get evil genie with a cursed paw results from it trying to figure out what I must have meant from what I said :/ -- I will think that's pretty cool and will actually be a new layer of abstraction in the same class as compiler... and like, if at that point I don't use it, it will only be because I think it is somehow dangerous to humanity itself (and even then I will admit that it is probably more effective)... but we aren't there yet and "we're on the way there" doesn't count anywhere near as much as people often want it to ;P.
https://aider.chat/docs/usage/modes.html#architect-mode-and-...
Cheering for remote work leading to loads of new positions being offered overseas as opposed to domestically, and now loudly celebrating LLMs writing "boilerplate" for them.
How folks don't see the consequences of their actions is remarkable to me.
Case in point, I'm working on a game that's essentially a website right now. Since I'm very very bad with web design I'm using an LLM.
It's perfect 75% of the time. The other 25% it just doesn't work. Multiple LLMs will misunderstand basic tasks, adding made-up properties and inventing functions.
It's like you've hired a college junior who insists they're never wrong and keeps pushing non-functional code.
The entire mindset is "whatever, it's close enough, good luck."
God forbid you need to do anything using an uncommon node module or anything like that.
“Often wrong but never in doubt” is not proprietary to LLMs. It’s off-putting and we want them to be correct and to have humility when they’re wrong. But we should remember LLMs are trained on work created by people, and many of those people have built successful careers being exceedingly confident in solutions that don’t work.
"I don't know how to do this".
When it comes to programming, tell me you don't know so I can do something else. I ended up just refactoring my UX to work around it. In this case it's a personal prototype, so it's not a big deal.
To the other point, not admitting to gaps in knowledge or experience is also something that people do all the time. "I copied & pasted that from the top answer in Stack Overflow so it must be correct!" is a direct analog.
Also, let's not forget LLMs are a product of the internet and anonymity. Human interaction on the internet is significantly different from in person interaction, where typically people are more humble and less overconfident. If someone at my office acted like some overconfident SO/reddit/HN users I would probably avoid them like the plague.
The LLM's overconfidence comes from it spitting out the most probable tokens based on its training data and your prompt. When LLMs learn real hubris from actual anonymous internet jackholes, we will have made significant progress toward AGI.
AI is just going to widen the skill level bell curve. Enables some people to get away with far more mediocre work than before, but also enables some people to become far more capable. You can't make someone put in more effort, but the ones who do will really shine.
But in my experience there are nuances to this. It's less about "good" vs "bad"/"sloppy" code and more about whether the code is discernible. If it's discernibly sloppy (i.e. the kind of sloppy a beginning programmer might produce, which is familiar to all of us), I would say that's better than opaque "good" code ("good" really only meaning functional).
These things predict tokens. So when you use them, help them increase their chances of predicting the thing you want. Good comments on code, good function names, explain what you don't know, etc. etc. The same things you would ideally do if working with another person on a codebase.
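As a concrete illustration of helping the prediction, compare how much a model (or a colleague) has to guess between these two stubs (a hypothetical example; the point is just that descriptive names, type hints, and docstrings narrow the space of plausible completions):

# Weak context: the model has to guess what "process" is supposed to do.
def process(d):
    ...

# Stronger context: the name, type hints, and docstring make the intended
# completion far more predictable, for the model and for human readers alike.
def median_order_value(orders: list[dict]) -> float:
    """Return the median 'total' across the given orders, ignoring refunds."""
    ...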
Businesses pay big when they need to recover from that kind of thing and save face with investors.
Sure, they've gotten much cheaper on a per-token basis, but that cost reduction has come with a non-trivial accuracy/reliability cost.
The problem is, tokens that are 10x cheaper are still useless if what they say is straight up wrong.
This only holds for OpenAI.
We have seen no noticeable improvements (at usable prices) for 7 months, since the original Sonnet 3.5 came out.
Maybe specialized hardware for LLM inference will improve so rapidly that o1 (full) will be quick and cheap enough a year from now, but it seems extremely unlikely. For the end user, the top models hadn't gotten cheaper for more than a year until the release of Deepseek v3 a few weeks ago. Even that is currently very slow at non-Deepseek providers, and who knows just how subsidized the pricing and speed at Deepseek itself are, given political interests.
For my caveat "at usable prices", no, there haven't been any. o1 (full) and now o3 have been advancements, but are hardly available for real-world use given limitations and pricing.
I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5. I do believe things will improve over time, mainly due to advancements in hardware. With better hardware, we can better brute force correct answers.
> junior devs will always be junior devs
Junior developers turn into senior developers over time.
Progress by Google, Meta, Microsoft, Qwen and Deepseek is unhampered by OpenAI’s schedule. Their latest — including Gemini 2.0, Llama 3.3, Phi 4 — and the coding fine-tunes that follow are all pretty good.
Sure, but if the advancements are to catch up to OpenAI, then major improvements by other vendors are nice and all, but I don't believe that was what the commenter was implying. Right now the leaders in my opinion are OpenAI and Anthropic and unless they are making major improvements every few months, the industry as a whole is not making major improvements.
In the space covered by Tabby, Copilot, aider, Continue and others, capabilities continue to improve considerably month-over-month.
In the segments of the industry I care most about, I agree 100% with what the commenter said w/r/t expecting major improvements every few months. Pay even passing attention to Hugging Face and GitHub and see work being done by indies as well as corporate behemoths happening at breakneck pace. Some work is pushing the SOTA. Some is making the SOTA more widely available. Lots of it is different approaches to solving similar challenges. Most of it benefits consumers and creators looking to use and learn from all of this.
From my experience I wouldn't even say LLMs are stupid. The LLM is a carrier and the intelligence is in the training data. Unfortunately, the training data is not going to get smarter.
If any of this had anything to do with reality, then we should already have a programming-specific model trained only on CS and math textbooks that is awesome. Of course, that doesn't work, because the LLM is not abstracting the concepts in the way we normally think of, so calling it stupid or intelligent doesn't really apply.
It's hardly shocking that next-token prediction on math and CS textbooks is of limited use. You hardly have to think about it to see how flawed the whole idea is.
Don't worry. Like everything else in life, you get what you pay for.
This isn't far from the current status quo. Good software companies pay for people who write top-quality code, and the rest pay juniors to work far above their pay grade or offshore it to the cheapest bidder. Now it will be offloaded to LLMs instead. Same code, different writer, and the same work remains for a contractor who knows what they're doing to come in and fix later.
And so the cycle continues.
(Not to hide your point though -- people please review your LLM-generated code!)
Tabby has undergone significant development since its launch two years ago [0]. It is now a comprehensive AI developer platform featuring code completion and a codebase chat, with a team [1] / enterprise focus (SSO, Access Control, User Authentication).
Tabby's adopters [2][3] have discovered that Tabby is the only platform providing a fully self-service onboarding experience as an on-prem offering. It also delivers performance that rivals other options in the market. If you're curious, I encourage you to give it a try!
[1]: https://demo.tabbyml.com/search/how-to-add-an-embedding-api-...
[2]: https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ
[3]: https://www.linkedin.com/posts/kelvinmu_last-week-i-introduc...
Edit: looks like there is a separate page with instructions for macbooks[2] that has more context.
> The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA or ROCm.
[1]: https://github.com/TabbyML/tabby#run-tabby-in-1-minute
docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct
[2]: https://tabby.tabbyml.com/docs/quick-start/installation/appl...
A teeny tiny model such as a 1.5B model is really dumb, and not good at interactively generating code in a conversational way, but models in the 3B or less size can do a good job of suggesting tab completions.
There are larger "open" models (in the 32B - 70B range) that you can run locally that should be much, much better than gpt-4o-mini at just about everything, including writing code. For a few examples, llama3.3-70b-instruct and qwen2.5-coder-32b-instruct are pretty good. If you're really pressed for RAM, qwen2.5-coder-7b-instruct or codegemma-7b-it might be okay for some simple things.
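If you want to try one of these locally, the lowest-friction route is probably something like Ollama (a sketch using the ollama Python package; it assumes the Ollama server is running and that you've pulled the model tag below, e.g. with "ollama pull qwen2.5-coder:7b"):

import ollama  # assumes a local Ollama server is running

response = ollama.chat(
    model="qwen2.5-coder:7b",  # swap in any model tag you have pulled
    messages=[{"role": "user",
               "content": "Write a Python function that checks whether a string is a palindrome."}],
)
print(response["message"]["content"])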
> medium specced macbook pro
medium specced doesn't mean much. How much RAM do you have? Each "B" (billion) of parameters is going to require about 1GB of RAM, as a rule of thumb. (500MB for really heavily quantized models, 2GB for un-quantized models... but, 8-bit quants use 1GB, and that's usually fine.)
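That rule of thumb is just parameter count times bytes per parameter; a quick back-of-the-envelope sketch (the bytes-per-parameter figures follow the comment above, and real usage adds overhead for the KV cache and runtime, so treat these as lower bounds):

# Rough memory estimate: parameters (in billions) x bytes per parameter.
# ~2.0 bytes for fp16 (un-quantized), ~1.0 for 8-bit, ~0.5 for 4-bit quants.
def approx_ram_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

for name, size in [("1.5B", 1.5), ("7B", 7), ("32B", 32), ("70B", 70)]:
    print(name,
          f"fp16 ~{approx_ram_gb(size, 2.0):.1f} GB,",
          f"8-bit ~{approx_ram_gb(size, 1.0):.1f} GB,",
          f"4-bit ~{approx_ram_gb(size, 0.5):.1f} GB")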
Cannot be turned off in the Community Edition. What does this telemetry data contain?
struct HealthState {
model: String,
chat_model: Option<String>,
device: String,
arch: String,
cpu_info: String,
cpu_count: usize,
cuda_devices: Vec<String>,
version: Version,
webserver: Option<bool>,
}
https://tabby.tabbyml.com/docs/administration/usage-collecti...
LLMs - a spam bot for your codebase?
| Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for cuda) or HIP_VISIBLE_DEVICES (for rocm) accordingly.
So using 2 NVLinked GPUs for inference is not supported? Or is that situation different because NVLink treats the two GPUs as a single one?
To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example
The effectiveness of a coding assistant is directly proportional to context length, and the open models you can run on your computer are usually much smaller. Would love to see something more quantified around the usefulness on more complex codebases.
For extremely tiny models like you would use for tab completion, even an old AMD CPU is probably going to do okay.
[1]: https://github.com/TabbyML/tabby/tree/3bd73a8c59a1c21312e812...
It’s an extra step to install Ollama, so not as plug-and-play as TFA, but the license is MIT, which makes it worthwhile for me.
I was wondering, how does this company make money?
From the pricing there is a free/community/opensource option, but how is the "up to 5 users" monitored?
https://www.tabbyml.com/pricing
* Up to 5 users
* Local deployment
* Code Completion, Answer Engine, In-line chat & Context Provider
What if we have more than 5 users?
If you want to drill into the details of the licenses: https://github.com/TabbyML/tabby/blob/main/LICENSE
All I want is a self-hosted AI assistant for VS2022. VS2022 supports plugins, yes, so what gives?
Example: https://demo.tabbyml.com/search/how-to-configure-sso-in-tabb...
Settings page: https://demo.tabbyml.com/settings/providers/doc
Were you applying as a Software Dev? Because that's not a software (or an interview) assignment.
Now my logic is: if a take-home test is designed to take more than two hours, we need to redesign it. Two hours of interviews, two hours of take-home test; that ought to suffice.
If we were still unsure after that, I sometimes offered the candidate a time-limited freelance position, paid obviously. We've ended up hiring everyone who went through that process, though.