35 points by t55 2 days ago | 5 comments
  • acheong08 2 days ago
    I'm surprised there's still so much hype-driven funding three years in. There's still no evidence that LLMs lead to "superintelligence" - it's interesting that that's the word chosen by all these new startups
    • highfrequency a day ago
      What would constitute promising evidence that LLMs may lead to superintelligence in your eyes? I mean this as a serious question.

      Suddenly we can talk to computers in plain language, they can solve a broad range of technical and non-technical problems, they get significantly better every year… it’s hard for me to imagine more promising evidence that AGI is on the horizon besides actually achieving AGI.

      • root_axis a day ago
        IMO, there are fundamental structural barriers that preclude transformers from "superintelligence". In simple terms, I think it's far-fetched to assume that "superintelligence" can emerge from a bunch of text and images. The absurd statistical power of using every available supercomputer for quadratic-time brute force on the entire public internet produces incredibly impressive (and useful) results, but there's only so much blood you can squeeze from a stone, considering that reality is, far and away, inconceivably more complex than any textual substrate.
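
        To make the "quadratic time" point concrete: plain self-attention scores every token against every other token, so compute grows as n^2 in context length. A toy numpy sketch (identity Q/K/V projections and made-up names, nothing from any real model's code):

          import numpy as np

          def self_attention(x):
              # x: (n_tokens, d). Real models use learned Q/K/V projections;
              # identity projections keep the sketch minimal.
              scores = x @ x.T / np.sqrt(x.shape[1])  # (n, n): every token
                                                      # scored vs. every token
              w = np.exp(scores - scores.max(axis=-1, keepdims=True))
              w /= w.sum(axis=-1, keepdims=True)      # row-wise softmax
              return w @ x

          out = self_attention(np.random.randn(1024, 64))
          # doubling n_tokens quadruples the score matrix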

        Further, I don't see strong evidence of "regular" intelligence. LLMs are like calculators for text: they have a lot of practical utility, but they don't understand anything; their output is the result of rote mechanical steps that could, in principle, be executed by hand. I've been using SOTA LLMs daily for years, and to this day they still reliably produce nonsense and get things confidently wrong that an intelligent person generally wouldn't. Of course, intelligent people make mistakes, but if they start to hallucinate we immediately lose trust in them. Most people use LLMs in a touch-and-go manner, and the impressive statistical power fools us into believing we're interacting with something akin to a being, but the facade quickly breaks down the longer you try to engage with it in a manner where coherency matters.

        With all that stated, I'm not saying AGI isn't possible, but I don't see language models as a path to AGI no matter how much better we can get them to model language.

        • accrual a day ago
          This is a good argument. While models for producing useful text and images will continue to improve and create ever more convincing outputs, they lack some fundamental aspects of lived experience. Imagine if we could record all (or many) aspects of our daily experience from 1M+ viewpoints over many years - that would lead to a revolutionary model and would be unfathomably expensive to train.

          The sounds heard waiting for a train. The experience of going on a date or hiking or swimming. The tastes and sensations of biting into some fruit. Each holds such a rich multi-modal experience that it feels impossible to replicate in a model at this time. An LLM can describe it, but it cannot experience it.

          Not that such experience is required to do what an AGI might be asked to do, but how could something reach AGI or higher without that level of experiential detail?

          • Multi-modal models trained on video and text already learn through three senses. Google and others have already connected these to robotic arms (though with very limited tactile information). We wouldn't deny intelligence to people with disabilities. A model force-fed more than any human can ever experience is an alien intelligence, but the model it creates internally seems sufficient in many ways. If you accept the ability to do 'cold reasoning' as intelligence, I'd say we are pretty close.
        • kadushka a day ago
          I’ve been using o1, and most recently gpt-4.5, and I haven’t encountered a single case of hallucination. Could you provide an example of one?
      • missedthecue a day ago
        When an LLM cares about something would be promising evidence to me.

        As they currently exist, they are essentially a novel and extremely sophisticated method to search, derive, and understand data - in fact, almost all of the data ever recorded. To OP's point, every new LLM startup is just trying to build a bigger and more sophisticated way to search, derive, and understand data. It's not clear to me that bigger and more compute-intensive methods will create an LLM that cares about anything.

        • > they are essentially a novel and extremely sophisticated method to search, derive, and understand data

          What about this doesn’t sound supremely valuable?

          • jdlshore a day ago
            The debate is about superintelligence, not value.
      • acheong08 a day ago
        There isn't really a universal definition of "superintelligence". I would personally put it as an entity capable of replacing any human at any task (without considering economics) - essentially, being superior to the best data available in its training set.

        LLMs fundamentally predict the next token based on their training set. They're static and not learning by themselves (the context window doesn't count, since improvements there are neither long-term nor generalizable).
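
        Stripped down, the whole generation loop is that one prediction applied repeatedly; a toy sketch (the model callable and its vocab-logits interface are hypothetical stand-ins, not any real API):

          import numpy as np

          def generate(model, ids, n_new, temperature=1.0):
              ids = list(ids)
              for _ in range(n_new):
                  logits = model(ids)  # hypothetical: one score per vocab token
                  p = np.exp((logits - logits.max()) / temperature)
                  p /= p.sum()                                  # softmax
                  ids.append(int(np.random.choice(len(p), p=p)))
              return ids

          def toy_model(ids, vocab=50):
              # stand-in "model": deterministic random logits per prefix
              rng = np.random.default_rng(sum(ids))
              return rng.normal(size=vocab)

          print(generate(toy_model, [1, 2, 3], n_new=5))

        Nothing in that loop updates the weights, which is what I mean by static.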

        AGI, I do believe, is reachable, and we may already be partway there. "General" only needs to be better than the average human, not the best humanity has to offer.

      • vineyardmike a day ago
        Not OP, but I'll bite.

        Firstly, I don't think it's fair to say "they get significantly better every year" after only two years. I honestly think GPT-3/3.5 (maybe optimized for cost or speed, but not retrained) would have been adequate for a significant chunk of the general-purpose tasks that are asked of LLMs. I think most of the other gains we've seen since are related to fine-tuning on intentional application-specific tasks. IMO the only real "significant" improvement has been the native multi-modal models.

        That said, I think RL+thinking and long-context models will combine enough incremental improvements along the lines of my next points to become even more useful and capable at a wider variety of tasks, but in their current usable forms (chatbot apps and public APIs) there is a fundamental limit in their implementation that prevents them from being AGI.

        I think that models, even the "thinking" ones, fail to truly perform novel reasoning around general-purpose task solving. We have a ton of evidence that they're useful for a ton of tasks when the task is trained in; even going a tiny bit in a unique direction drops the output quality a ton. Available thinking models today are really good at math because they were RLed into solving math like a schoolchild - rote practice. But that doesn't mean they're capable of applying the same logical tools (which they should have learned along the way) to novel math and logic questions.

        Another impediment to being treated even as an application-specific "mini AGI" is their naïveté and hallucinations, which make their use as agents suspect for anything important. They can't distinguish or even output a true confidence on what they "don't know", and this blind confidence is a setback. Humans are known to say incorrect things confidently, but they're also known to reflect on their lack of knowledge and recognize their limits. Humans have real memory and can (imprecisely) associate an event with their learning, which aids confidence in recall. Similarly, LLMs' trusting nature toward input (e.g. prompt injection attacks) prevents them from "intelligently" acting in the real world even when they're not hallucinating. Tools like "DeepResearch" are really useful and an impressive improvement on traditional human searching for processing the vastness of the internet, BUT the model can't genuinely distinguish between good and bad sources, and often can't intelligently reflect on the patterns and social context of the sources it sees.

        I can totally see a world where an LLM outputs a confidence metric that is used to drive the tokens, and potentially suppress output, and I can totally see a world where long context and thinking (w/ RL) give it enough reflection to question everything and function even more autonomously. But I remain skeptical that it will be able to "think" and reason deeply enough to be a "super intelligence" on tasks it wasn't taught.
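
        For what it's worth, a crude version of that confidence gate is already expressible with per-token log-probabilities, which many APIs expose; a minimal sketch (the thresholds and names are mine, not any specific API):

          def answer_or_abstain(token_logprobs, worst=-1.5, mean_floor=-0.5):
              # Abstain if any single token was a long shot, or if the whole
              # answer was low-confidence on average. Thresholds are made up.
              if min(token_logprobs) < worst:
                  return "abstain"
              if sum(token_logprobs) / len(token_logprobs) < mean_floor:
                  return "abstain"
              return "answer"

          print(answer_or_abstain([-0.1, -0.3, -2.7]))  # -> "abstain"

        The catch is that token-level probabilities track fluency more than factual confidence, which is exactly the gap I'm describing.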

        • garyrob a day ago
          > I think most of the other gains we've seen since are related to fine-tuning on intentional application-specific tasks. IMO the only real "significant" improvement is the native multi-modal models.

          Let me say, as someone with no connection to OpenAI, that Deep Research mode is an absolute game-changer for my use and well worth the money. Obviously, YMMV, but it really does do deep research and organizes it all in an excellent way.

          I haven't yet noticed an error in its output when researching general subjects. It doesn't solve the problem of not necessarily using code examples from the relevant version of a Rust library, so it has a good ways to go for Rust coding. But I am very, very impressed with its usefulness for general research.

          My point is only, if you haven't tried it, don't assume that there isn't already a game-changer out there for many uses.

    • spaceman_2020 a day ago
      I don't even know why superintelligence is a goal. I'll be happy with a 120-IQ digital being that handles all my busywork
    • t55 2 days ago
      Well, it depends. On some tasks, they surely are already superintelligent.
      • lblume a day ago
        When people talk about superintelligence, it is generally clear that general intelligence is meant, as specialized superintelligence has been a solved problem in many areas for decades, without modern AI.
        • highfrequency a day ago
          True but misses the point - surely quantity matters? A machine that can do 0.001% of basic tasks vs. one that can do 20% of basic tasks seems qualitatively different.
          • lblume 18 hours ago
            How do we classify that fraction? Say D. Gukesh (the current chess world champion) plays and learns chess for 50% of his waking time. Would you consider Stockfish a superintelligence that accounts for 50% of basic tasks with respect to D. Gukesh? And what about tasks (like some types of entertainment) that usually aren't attributed to intelligence at all? I would love to find answers to these questions, but find them difficult to the point of being too vague to answer.
    • arisAlexis a day ago
      Actually, your statement contradicts the top-3 AI h-index scientists, plus sama, amodei, Elon, and Nobel laureates. What makes you so sure? Curious
  • jjtheblunt a day ago
    I wonder how this compares to Generally Intelligent / Imbue (since renamed), whose hiring ads appeared regularly on HN for months, though I've not seen them lately.
  • 1stsentient 14 hours ago
    1) It must be able to provide answers to Zen koans as a test of its synthetic intuition. 2) It must solve the P vs. NP problem or its equivalent. 3) It must be able to provide emotional support to a person who recently lost a loved one.
  • sunami-ai a day ago
    Part of me thinks that before they gave the $130M, they knew exactly how they'd get it back (through some pre-arranged M&A) if it doesn't work out. Or at least that would be the smart thing to do.
    • _cs2017_ a day ago
      How can you get back the money that was spent? Fancy terms like M&A won't return the money that is gone.
      • sunami-ai 20 hours ago
        I spent $10 on a bad apple and got $0 back. Then I go and sell that bad apple for $10. Whoever bought it is giving me back my money. This is kindergarten-level arithmetic.
        • _cs2017_ 10 hours ago
          I didn't mean that it's mathematically impossible. I meant that it's practically not gonna happen. Much as they'd love to, no VC gets a chance to invest in an early stage startup knowing that if the startup fails, there's a buyer ready to make the VC whole.
  • vivzkestrel a day ago
    Imagine what a fancy office you could build with $130M in funding: state-of-the-art wallpaper, furniture, and robotic coffee-making machines.