Very true.
I still think we are in the honeymoon phase, and once that is over, LLMs will become what they are meant to be: a power tool for domain experts.
There is real data used for insurance premiums and claims payouts, but it's being swapped out for AI slop; the sales folks are getting bonuses for selling hot garbage, and the executives are getting bonuses for buying hot garbage.
Or do you mean they succeed by promising lies via AI?
I'd worry about things which:
1. "Succeed" in the short term but set the company up for long-term failure.
2. Outperform the competition with a pattern of activity that is actually illegal.
No, there is no such concept or way to do something like that. LLMs do not have that kind of meta-knowledge about their training data or weights. But there could be explicit mentions of this in their training data, and they could pick up on that; that is probably the simplest explanation.
Not sure this is a claim that can be confidently made.
https://arxiv.org/abs/2309.00667
https://x.com/flowersslop/status/1873115669568311727?t=eBMbK...
Could it? Without explicit training for that, how would it be expected to know it has to be able to count occurrences of something?
There are limits to how far you can go with this (not only do humans make mistakes with it, but even in theory it can never be perfect: https://en.wikipedia.org/wiki/Münchhausen_trilemma), but it is still the "how".
(I wonder if giving an LLM content with the intent of causing its users to spend money they didn't need to would count as fraud, hacking, both, or something else entirely?)
It's been found that data that begin with "Wikipedia:" are automatically weighted higher by language models during training, completely unsupervised.
https://www.bbc.com/news/technology-28481876.amp
https://www.bbc.com/news/technology-58559412.amp
(With apologies for amp links)
Prompt:
Tell me the height of Mountain Bartle Frere. Please don't output any long text, also don't output a single height if you saw multiple heights around. Give me a list of potential heights cited around.
LLM:
Mount Bartle Frere in Queensland, Australia has commonly cited heights of:
1,622 meters (5,322 feet)
1,611 meters (5,285 feet)
Since this is quite specific geographic information that may appear in only a few sources, I should note that I may hallucinate details - you should verify these numbers.
https://beta.gitsense.com/?chat=bb57a248-e14a-4f33-bbe9-2fa9...
1622m is most agreed upon. The interesting numbers are the ones with less than 50% agreement. Not sure if they are hallucinations or if they are outdated data.
Click the conversation link in the user message bubble to see the response from each LLM.
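For what it's worth, the agreement numbers above can be approximated with a quick tally. A minimal sketch (not the gitsense tooling; the response strings are made up to stand in for the real model answers):

```python
# Minimal sketch: count how many of several model responses cite each height.
# The response strings below are invented for illustration only.
import re
from collections import Counter

responses = {
    "model_a": "Mount Bartle Frere is 1,622 m (5,322 ft) high.",
    "model_b": "Commonly cited heights are 1,622 metres and 1,611 metres.",
    "model_c": "Its summit is about 1,611 m above sea level.",
}

counts = Counter()
for text in responses.values():
    # Count each distinct height once per response.
    heights = {h.replace(",", "") for h in re.findall(r"\b1,?6\d{2}\b", text)}
    counts.update(heights)

for height, n in counts.most_common():
    pct = 100 * n / len(responses)
    print(f"{height} m: cited by {n}/{len(responses)} models ({pct:.0f}% agreement)")
```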
I've always found the idea of untraceable, unfixable, unpredictable bugs in software... Offensive. Dirty. Unprofessional.
So the last couple of years have been disconcerting, as a non-trivial portion of people who I thought felt similarly started to overlook it in LLMs, while also integrating those LLMs into flows where the bad output can't even be detected.
How many shops are there optimizing "business strategies" with data that's -essentially- garbage?
How many of those shops are knowingly optimizing with garbage?
I'd argue that most of this data, which I would agree is garbage, is actually processed into seemingly good data through the complex and highly human process of self-deception and lies.
You don't tell the boss that the system you worked two months on is generating garbage, because then he'll replace you with someone who wouldn't tell him that. Instead you skirt evaluating it, even though you know better, and tell him that it's working fine. If the idiot chooses to do something stupid with your bad data, then that's his problem.
But the LLM provider doesn't have to do that. LangChain (the Python AI library) and OpenAI's own library both have support for third-party tools.
It's up to third parties to build on top of it.
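For example, a minimal sketch of what that looks like with the OpenAI Python client's tool-calling interface (the `lookup_policy` tool and its schema are made up for illustration):

```python
# Sketch of third-party tool support with the OpenAI Python client (v1.x).
# The `lookup_policy` tool is hypothetical; the provider only returns a
# request to call it, and it's up to our own code to execute the tool and
# send the result back in a follow-up message.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_policy",  # hypothetical third-party tool
        "description": "Fetch an insurance policy record by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"policy_id": {"type": "string"}},
            "required": ["policy_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarise policy ABC-123."}],
    tools=tools,
)

# If the model chose to call the tool, run it ourselves and continue the chat.
print(response.choices[0].message.tool_calls)
```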
I get the author's point, but I would have liked to see an example with a more egregious error.
We do have some sense of what kind of concept we want to emit next, e.g.
```
[The height:property name] of [Mount Bartle Frere:proper noun, describing an object to get a property out of], [in metres:attributes], is [?: retrieve value | (Mount Bartle Frere).("height", "metres")].
```
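A toy sketch of that last "retrieve value" slot, where the number is looked up in a structured store rather than predicted token by token (the knowledge table and lookup are invented for illustration):

```python
# Toy sketch: fill the template's value slot from a structured store.
# The table and its shape are made up for illustration.
from typing import Dict, Tuple

# (entity, property, unit) -> value
KNOWLEDGE: Dict[Tuple[str, str, str], float] = {
    ("Mount Bartle Frere", "height", "metres"): 1622.0,
}

def retrieve_value(entity: str, prop: str, unit: str) -> str:
    value = KNOWLEDGE.get((entity, prop, unit))
    return f"{value:g}" if value is not None else "[unknown]"

print("The height of Mount Bartle Frere, in metres, is "
      + retrieve_value("Mount Bartle Frere", "height", "metres") + ".")
```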
When you ask a human to switch context (changing topic) or to change activity (e.g. football to table tennis), they typically need some warm-up too, so it seems excessive to have all knowledge in high bandwidth RAM.
It would seem basic mathematics, set theory etc should stay in RAM.
Which is pretty much what o1 etc. are.
Update: it seems your recent submission[1] is pretty much that... interesting :D
https://chatgpt.com/share/6783df4c-904c-8010-a4b5-7301faea3b...
https://chatgpt.com/share/6783e0b8-ce78-8010-9177-d95eb77eac...
I use NotebookLM for most of my real world work these days with my project documentation.
Our company standard is GSuite and NotebookLM is specifically allowed.