Meta Superintelligence Labs' first paper is about RAG (paddedinputs.substack.com)

423 points by skadamat 4 months ago | 35 comments

ipsum24 months ago
This has nothing to do with superintelligence, it's just the people that were working on the paper prior to the re-org happened to publish after the name change.
Though it is notable that, contrary to what many (on HN and Twitter) predicted, Meta has not stopped publishing papers to become like other AI labs (e.g. OpenAI). They've continued their rapid pace of releasing papers AND open source models.
- pityJuke4 months ago
  What model(s) have Meta released since the Lab re-org?
  Also, that wasn't based purely on hearsay; Zuck explicitly said:
  > We believe the benefits of superintelligence should be shared with the world as broadly as possible. That said, superintelligence will raise novel safety concerns. We'll need to be rigorous about mitigating these risks and careful about what we choose to open source. Still, we believe that building a free society requires that we aim to empower people as much as possible. [0]
  [0]: https://www.meta.com/superintelligence/
  - ipsum24 months ago
    That has always been the policy. To answer your question, Meta has released ~100 models since the Superintelligence Lab reorg.
    https://huggingface.co/facebook/models
    The most interesting ones to me are:
    - CWM (Code world model), an LLM for coding https://github.com/facebookresearch/cwm
    - DINOv3, A vision encoder https://ai.meta.com/dinov3/
    - MAPAnything, a 3d reconstruction model https://huggingface.co/facebook/map-anything
    - VJEPA v2, Self-supervised video pre-training model https://github.com/facebookresearch/vjepa2
  - parpfish4 months ago
    > We believe the benefits of superintelligence should be shared with the world as broadly as possible.
    i'd interpret that as meaning "everybody is welcome to be our customer, but we still control all of it"
  - gessha4 months ago
    You still believe anything that comes out of his mouth?
  - PatronBernard4 months ago
    When did Zuck start caring about society?
    cwmoore4 months ago
    Is this a trick question? Probably before he was even born.
    throwaway3141554 months ago
    Is this a trick response? There's no way he ever cared about society in a way that wasn't completely plastic.
    cwmoore4 months ago
    Sure, for some variation on the meaning of “society”, or “care”, or “plastic”, and maybe all the best ones, but it’s hard to argue he had never seen value in groups of people before starting Facebook, and that is arguably a motivator for every human being ever born.
- ekianjo4 months ago
  Open weights models, not open source. And even their weights are under a specific license that is not as permissive as Apache 2.0.
  - HPsquared4 months ago
    This is the right terminology. Model weights are literally compiled binary data; they are the output of an algorithm run on a bunch of source data. That training dataset is the "source" of the model. Training data (or the scripts used to generate it) is human-readable and modifiable, like source code. Binary weights are not.
    carom4 months ago
    Just to note though, source copyright extends to its compiled form. There is probably an analogue there for model weights.
    jeremyjh4 months ago
    Tell me about the companies that own the copyrights to their training data.
    phkahler4 months ago
    Binary weights can still be "edited" with additional training.
    HPsquared4 months ago
    Binary executables can also be edited after compilation.
    ekianjo4 months ago
    Binary images can also be edited after they are created.
  - sdeframond4 months ago
    I propose that from now on we call freeware "open binaries".
  - drexlspivey4 months ago
    Does an “open source” model the way you describe it exist or is it a mythical creature?
    qcoret4 months ago
    Unicorns also don't exist, but we don't change the definition to include horses.
    jakupovic4 months ago
    Prove to me that unicorns don't exist, first level arguments only!
    aerhardt4 months ago
    The first level argument is that old horse, burden of proof.
    ayewo4 months ago
    An open source model does exist now [1] and is multilingual. Previous discussion [2].
    [1] https://ethz.ch/en/news-and-events/eth-news/news/2025/07/a-l...
    [2] https://news.ycombinator.com/item?id=44535637
    CaptainOfCoit4 months ago
    It does, but does it matter? Even if every piece of software released in 2025 were proprietary, that wouldn't make the published binaries "open source" just because no other software could be classified as "open source".
    We name things based on what they are, not based on the lack of other things.
    omneity4 months ago
    There aren’t many but they do exist. OLMo for example.
    Rattled4 months ago
    Olmo by AllenAI and Pythia by EleutherAI.
    kaufmann4 months ago
    Apertus by EPFL and ETH Zürich.
    ekianjo4 months ago
    I believe it does. OLMO.
  - hippo224 months ago
    I'm not a lawyer, but I believe that the weights aren't subject to copyright. So, you can use them outside of Meta's license agreement provided you get them from somewhere else.
- RataNova4 months ago
  Still, I think the optics matter... the fact that Meta's still putting out technical work (and open sourcing it) after the restructure says a lot about where they want to position themselves
- Zacharias0304 months ago
  Should be the top comment.
  MSL is not only those few high profile hires.
godelski4 months ago
It's kinda funny, Meta has long had some of the best in the field, but left them untapped. I really think if they just took a step back, stopped being so metric focused, and let their people freely explore, then they'd be winning the AI race. But with this new team, I feel like Meta mostly hired the people who are really good at gaming the system. The people that care more about the money than the research.
A bit of this is true at every major lab. There's tons of untapped potential. But these organizations are very risk averse. I mean, why not continue with the strategy that got us to the point we're at in the first place? Labs used to hire researchers and give them a lot of free rein. But those times ended and AI progress also slowed down. Maybe if you want to get ahead you gotta stop thinking like everyone else.
Well, Meta... you can "hold me hostage" for a lot cheaper than those guys. I'm sure this is true for hundreds of passionate ML researchers. I'd take a huge pay cut to have autonomy and resources. I know for a fact there are many working at Meta right now that would do the same. So maybe if you're going to throw money at the problem, diversify a bit and look back at what made SV what it is today and what made AI take leaps forward.
- hamasho4 months ago
  My theory is that as more people compete, the top candidates become those who are best at gaming the system rather than actually being the best. Someone has probably studied this. My only evidence is job applications for GAFAM and Tinder tho.
  - crystal_revenge4 months ago
    I've spent most of my career working, chatting and hanging out with what might be best described as "passionate weirdos" in various quantitative areas of research. I say "weirdos" because they're people driven by an obsession with a topic, but don't always fit the mold by having the ideal combination of background, credentials and personality to land them on a big tech company research team.
    The other day I was spending some time with a researcher from DeepMind and I was surprised to find that while they were sharp and curious to an extent, nearly every ounce of energy they expended on research was strategic. They didn't write about research they were fascinated by; they wrote and researched on topics they strategically felt had the highest probability of getting into a major conference in a short period of time to earn them a promotion. While I was a bit disappointed, I certainly didn't judge them because they are just playing the game. This person probably earns more than many rooms of smart, passionate people I've been in, and that money isn't for smarts alone; it's for appealing to the interests of people with the money.
    You can see this very clearly by comparing the work being done in the LLM space to that being done in the Image/Video diffusion model space. There's much more money in LLMs right now, and the field is flooded with papers on any random topic. If you dive in, most of them are not reproducible or make very questionable conclusions based on the data they present, but that's not of very much concern so long as the paper can be added to a CV.
    In the stable diffusion world it's mostly people driven by personal interest (usually very non-commercial personal interests) and you see tons of innovation in that field but almost no papers. In fact, if you really want to understand a lot of the most novel work coming out of the image generation world you often need to dig into PRs made by anonymous users with anime-themed profile pics.
    The bummer of course is that there are very hard limits on what any researcher can do with a home GPU training setup. It does lead to creative solutions to problems, but I can't help but wonder what the world would look like if more of these people had even a fraction of the resources available exclusively to people playing the game.
    kcexn4 months ago
    This is such a nuanced problem. Like any creative endeavour, the most powerful and significant research is driven by an innate joy of learning, creating, and sharing ideas with others. How far the research can be taken is then shaped by resource constraints. The more money you throw at the researchers, the more results they can get. But there seems to be a diminishing returns kind of effect as individual contributors become less able to produce results independently. The research narrative also gets distorted by who has the most money and influence, and not always for the better (as recent events in Alzheimer's research have shown).
    The problem is once people's livelihoods depend on their research output rather than the research process, the whole research process becomes steadily distorted to optimise for being able to reliably produce outputs.
    Anyone who has invested a great deal of time and effort into solving a hard problem knows that the 'eureka' moment is not really something that you can force. So people end up spending less time working on problems that would contribute to 'breakthroughs' and more time working on problems that will publish.
    RataNova4 months ago
    The tragedy is exactly what you said: all that energy, creativity, and deep domain obsession locked out of impact because it’s not institutionally “strategic.”
    smokel4 months ago
    > I certainly didn't judge them because they are just playing the game.
    Please do judge them for being parasitical. They might seem successful by certain measures, like the amount of money they make, but I for one simply dislike it when people only think about themselves.
    As a society, we should be more cautious about narcissism and similar behaviors. Also, in the long run, this kind of behaviour makes them an annoying person at parties.
    danielmarkbruce4 months ago
    There is an implication that passionate weirdos are good by nature. You either add value in the world or you don't. A passionate, strange actor or musician who continues trying to "make it" who isn't good enough to be entertaining is a parasite and/or narcissist. A plumber who is doing the job purely for money is a value add (assuming they aren't ripping people off) - and they are playing the game - the money for work game.
    smokel4 months ago
    I'm not a plumber, but I think the analogy would require the plumber to be strategic about every job they take on. Within a few months, our plumber would only be plumbing for millionaires, installing golden faucets at extreme price points. I would then stop befriending said plumber.
    danielmarkbruce4 months ago
    You have a friend who is a plumber, and he figures a way to serve customers with money at high price points with an honest service, and you kick him to the curb?
    smokel4 months ago
    Extreme price points, not honest service. It's all fairly hypothetical. I don't know exactly where you'd like the discussion to go. In reality things are always more complicated. I just think that it is generally a good idea to call out people on anti-social behavior. When, where and how exactly to do that differs in every situation.
    idiotsecant4 months ago
    This take is simply wrong in a way that I would normally just sigh and move on, but it's such a privileged HN typical pov that I feel like I need to address it. If a plumber did plumbing specifically because someone needed it and he would be paid, would you call them a narcissist? If a gardener built a garden how their customer wanted would you call them a narcissist? Most of the world doesn't get to float around in a sea of VC money doing whatever feels good. They find a need, address it, and get to live another day. Productively addressing what other people need and making money from it isn't narcissism, it's productivity.
    lkey4 months ago
    You are comparing a skilled trade that commands ~100k annual compensation to positions that have recently commanded 100 million dollars in compensation upon signing, no immediate productivity required, as this talent denial is considered strategic.
    You consider the person who expects eventual ethical behavior from people that have 'won' capitalism (never have to labour again) to be privileged.
    szundi4 months ago
    [dead]
    bradleyjg4 months ago
    > but I for one simply dislike it when people only think about themselves
    The key word there is "only". Nothing in the post you replied to suggested "only". You have one vignette about one facet of this guy’s life.
    I really dislike the resurgence in Puritanism.
    smokel4 months ago
    Please don't read too much into this single word. The comment above mentioned "nearly every ounce of energy they expended on research was strategic", and I was keeping that in mind while writing my remark.
    Please read my sibling comment where I expand a bit on what I meant to say.
    what-the-grump4 months ago
    But this is in itself selfish right?
    You dislike them because they don’t benefit you indirectly by benefiting society at large.
    The incentive structure is wrong, incentivizing things that benefit society would be the solution not judging those that exist in the current system by pretending altruism is somehow not part of the same game.
    smokel4 months ago
    I agree that the system itself is dysfunctional, and I understand the argument that individuals are shaped or even constrained by it. However, in this case, we are talking about people who are both exceptionally intelligent and materially secure. I think it's reasonable to expect such individuals to feel some moral responsibility to use their abilities for broader good.
    As for whether that expectation is "selfish" on my part, I think that question has been debated for centuries in ethics, and I'm quite comfortable landing on the side that says not all disapproval is self-interest. In my own case, I'm not benefiting much either :)
    what-the-grump4 months ago
    I just don't think so, these exceptionally intelligent people are masters at pattern recognition, logic, hyper-focus, task completion in a field. Every single thing will tell them don't go against the flow, don't stick your neck out, don't be a hero, don't take on risk. Or you will end up nailed to a cross.
    To me this is an insane position to take or to expect from anyone; it's some just-world fallacy thing perpetuated by too much Hollywood.
    I am going to flip the script for a minute. I am a killer, driver, pilot, mechanic, one of the best ones out there. I beat the game, I won. So let me just stop and change the world, for what?
    godelski4 months ago
    > Every single thing will tell them don't go against the flow, don't stick your neck out, don't be a hero, don't take on risk. Or you will end up nailed to a cross.
    Except the situation is more like monkeys and a ladder. The ones "nailing them to the cross" are the same ones in those positions. This is the same logic as "life was tough for me, so life should be tough for you." It's idiotic!
    > So let me just stop and change the world, for what?
    This is some real "fuck you, I got mine" attitude. Pulling the ladder up behind you.
    We have a long history in science of seeing that sticking your neck out, taking risks, and being different are successful tools to progressing science[0]. Why? Because you can't make paradigm shifts by maintaining the current paradigm. We've also seen that this behavior is frequently combated by established players. Why? Because of the same attitude, ego.
    So we've created this weird system where we tell people to think different and then punish them for doing so. Yeah, people are upset about it. I find that unsurprising. So yeah, fuck you, stop pulling the ladder up behind you. You're talking as if they just leave the ladder alone, but these are the same people who end up reviewing papers, grants, and are thus the gatekeepers of progress. Their success gives them control of the ladders and they make the rules.
    [0] Galileo, Darwin, Gauss, Kepler, Einstein, and Turing are not the only members of this large club. Even more recently we have Karikó who ended up getting the 2023 Nobel prize in Medicine and Akerlof, Spence, Stiglitz who got the 2001 Nobel prize in economics for their rejected work. This seems to even be more common among Nobel laureates!
    Eisenstein4 months ago
    There is a difference between being selfish in the sense that you want others to contribute back to the society that we are all part of, and being selfish in the sense that you want to compete for exclusive rewards.
    You can call this difference whatever you want, don't pretend that they are morally or effectively equivalent.
    esafak4 months ago
    Reciprocal altruism, and inclusive fitness.
    kakacik4 months ago
    Selfish for the long term future and prosperity of mankind? That's some good selfishness all right.
  - godelski4 months ago
    > Someone has probably studied this
    There's even a name for it
    https://en.wikipedia.org/wiki/Goodhart%27s_law
    ivanbelenky4 months ago
    Thanks for sharing. I did not know this law existed and had a name. I know nothing about nothing but it appears to be the case that the interpretation of metrics for policies assume implicitly the "shape" of the domain. E.g. in RL for games we see a bunch of outlier behavior for policies just gaming the signal.
    There seem to be two types:
    - Specification failure: signal is bad-ish, a completely broken behavior --> local optimal points achieved for policies that phenomenologically do not represent what was expected/desired to cover --> signaling an improvable reward signal definition
    - Domain constraint failure: signal is still good and optimization is "legitimate", but you are prompted with the question "do I need to constrain my domain of solutions?"
      - finding a bug that reduces time to completion of a game in a speedrun setting would be a new acceptable baseline, because there are no rules against finishing the game earlier
      - shooting amphetamines on a 100m run would probably minimize time, but other factors will make people consider disallowing such practices.
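The two failure types above can be sketched in a few lines of Python. This is only a toy illustration under made-up assumptions: the `Strategy` class, the three strategies, and every number are hypothetical, not taken from any real RL environment. The metric (finish time) stays the same throughout; only the domain of allowed solutions changes.

```python
from dataclasses import dataclass

# Hypothetical toy data: all names and numbers are invented for illustration.
@dataclass(frozen=True)
class Strategy:
    name: str
    finish_time: float   # the metric being optimized (seconds, lower is better)
    intended_play: bool  # does it reflect the behavior the designer wanted?
    allowed: bool        # does it stay inside the rules people accept?

STRATEGIES = [
    Strategy("play the level normally", 120.0, True, True),
    Strategy("clip through a wall (glitch)", 45.0, False, True),
    Strategy("edit the timer in memory", 0.1, False, False),
]

def best(strategies, constraint=lambda s: True):
    """Return the fastest strategy among those that satisfy `constraint`."""
    candidates = [s for s in strategies if constraint(s)]
    return min(candidates, key=lambda s: s.finish_time)

# Optimizing the raw metric picks whatever games the signal hardest.
unconstrained = best(STRATEGIES)
# Constraining the domain to "allowed" keeps the glitch (speedrun-style).
within_rules = best(STRATEGIES, constraint=lambda s: s.allowed)
# Only a stricter constraint recovers the behavior the designer had in mind.
as_intended = best(STRATEGIES, constraint=lambda s: s.intended_play)

print(unconstrained.name)
print(within_rules.name)
print(as_intended.name)
```

Which constraint is the "right" one is a judgment call that the metric itself cannot make, which is roughly the point of the speedrun versus 100m comparison.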
    Eisenstein4 months ago
    I view Goodhart's law more as a lesson for why we can never achieve a goal by offering specific incentives if we are measuring success by the outcome of the incentives and not by the achievement of the goal.
    This is of course inevitable if the goal cannot be directly measured but is composed of many constantly moving variables such as education or public health.
    This doesn't mean we shouldn't bother having such goals, it just means we have to be diligent at pivoting the incentives when it becomes evident that secondary effects are being produced at the expense of the desired effect.
    godelski4 months ago
    > This is of course inevitable if the goal cannot be directly measured
    It's worth noting that no goal can be directly measured[0].
    I agree with you, this doesn't mean we shouldn't bother with goals. They are fantastic tools. But they are guides. The better aligned our proxy measurement is with the intended measurement then the less we have to interpret our results. We have to think less, spending less energy. But even poorly defined goals can be helpful, as they get refined as we progress in them. We've all done this since we were kids and we do this to this day. All long term goals are updated as we progress in them. It's not like we just state a goal and then hop on the railroad to success.
    It's like writing tests for code. Tests don't prove that your code is bug free (can't write a test for a bug you don't know about: unknown unknown). But tests are still helpful because they help evidence the code is bug free and constrain the domain in which bugs can live. It's also why TDD is naive, because tests aren't proof and you have to continue to think beyond the tests.
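A minimal sketch of the testing point, assuming a deliberately incomplete leap-year check (the function and the test cases are hypothetical, chosen only for illustration). The small suite passes, yet a bug lives outside the inputs it covers.

```python
def is_leap_year(year: int) -> bool:
    """Deliberately incomplete: ignores the Gregorian century rules."""
    return year % 4 == 0

def run_tests() -> bool:
    """A plausible-looking suite that never probes a century year."""
    cases = {2020: True, 2023: False, 2024: True}
    return all(is_leap_year(y) == expected for y, expected in cases.items())

print(run_tests())         # True: every test passes
print(is_leap_year(1900))  # True, although 1900 was not a leap year
```

The passing suite narrows where bugs can hide (ordinary years near 2020) without proving their absence, so the tests act as a proxy for correctness in the same sense as the metrics discussed above.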
    [0] https://news.ycombinator.com/item?id=45555551
    esafak4 months ago
    You can measure revenue exactly; it has limited precision.
    julienreszka4 months ago
    It’s a false law tho. Collapses under scrutiny
    NBJack4 months ago
    If I hadn't seen it in action countless times, I would believe you. Changelists, line counts, documents made, collaborator counts, teams led, reference counts in peer reviewed journals... the list goes on.
    You are welcome to prove me wrong though. You might even restore some faith in humanity, too!
    godelski4 months ago
    Sorry, remind me; how many cobras are there in India?
    bandrami4 months ago
    The Zoological Survey of India would like to know but hasn't figured out a good way to do a full census. If you have any ideas they would love to hear them.
    Naja naja has Least Concern conservation status, so there isn't much funding in doing a full count, but there are concerns as encroachment both reduces their livable habitat and puts them into more frequent contact with humans and livestock.
    oblio4 months ago
    The comment was a joke.
    https://en.wikipedia.org/wiki/Perverse_incentive
    epwr4 months ago
    Could you elaborate or link something here? I think about this pretty frequently, so would love to read something!
    vasco4 months ago
    Metric: time to run 100m
    Context: track athlete
    Does it cease to be a good metric? No. After this you can likely come up with many examples of target metrics which never turn bad.
    noosphr4 months ago
    If it were a good metric there wouldn't be a few phone books worth of regulations on what you can do before and during running 100 meters. From banning rocket shoes, to steroids, to robot legs the 100 meter run is a perfect example of a terrible metric both intrinsically as a measure of running speed and extrinsically as a measure of fitness.
    AnthonyMouse4 months ago
    > Metric: time to run 100m
    > Context: track athlete
    > Does it cease to be a good metric? No.
    What do you mean? People start doping or showing up with creatively designed shoes and you need to layer on a complicated system to decide if that's cheating, but some of the methods are harder to detect and then some people cheat anyway, or you ban steroids or stimulants but allow them if they're by prescription to treat an unrelated medical condition and then people start getting prescriptions under false pretexts in order to get better times. Or worse, someone notices that the competition can't set a good time with a broken leg.
    godelski4 months ago
    So what is your argument, that it doesn't apply everywhere therefore it applies nowhere?
    You're misunderstanding the root cause. Your example works as the metric is well aligned. I'm sure you can also think of many examples where the metric is not well aligned and maximizing it becomes harmful. How do you think we ended up with clickbait titles? Why was everyone so focused on clicks? Let's think about engagement metrics. Is that what we really want to measure? Do we have no preference over users being happy vs users being angry or sad? Or are those things much harder to measure, if not impossible to, and thus we focus on our proxies instead? So what happens when someone doesn't realize it is a proxy and becomes hyper fixated on it? What happens if someone does realize it is a proxy but is rewarded via the metric so they don't really care?
    Your example works in the simple case, but a lot of things look trivial when you only approach them from a first order approximation. You left out all the hard stuff. It's kinda like...
    Edit: Looks like some people are bringing up metric limits that I couldn't come up with. Thanks!
    vasco4 months ago
    > So what is your argument, that it doesn't apply everywhere therefore it applies nowhere?
    I never said that. Someone said the law collapses, someone asked for a link, I gave an example to prove it does break down in some cases at least, but many cases once you think more about it. I never said all cases.
    If it works sometimes and not others, it's not a law. It's just an observation of something that can happen or not.
    godelski4 months ago
    > I never said all cases.
    You're right. My bad. I inferred that through the context of the conversation.
    > If it works sometimes and not others, it's not a law.
    I think you are misreading and that is likely what led to the aforementioned misunderstanding. You're right that it isn't a scientific law, but the term "law" gets thrown around a lot in a more colloquial manner. Unfortunately words are overloaded and have multiple meanings. We do the same thing to "hypothesis", "paradox", and lots of other things. I hope this clarifies the context. (even many of the physics laws aren't as strong as you might think)
    But there are many "laws" used in the same form. They're eponymous laws[0], not scientific ones. Read "adage". You'll also find that word used in the opening sentence on the Wiki article I linked as well as most (if not all) of them in [0]
    [0] https://en.wikipedia.org/wiki/List_of_eponymous_laws
    exe344 months ago
    it doesn't break down - see comments about rules above. it was the perfect example to prove yourself wrong.
    vasco4 months ago
    I disagree with all of those examples, they are misunderstanding what it means for the metric to break down in the context of the law, but alas. "If you run a different race" lol.
    godelski4 months ago
    > in the context of the law
    That's the key part. The metric has context, right?
    And that's where Goodhart's "Law" comes in. A metric has no meaning without context. This is why metrics need to be interpreted. They need to be evaluated in context. Sometimes this context is explicit but other times it is implicit. Often people will hack the metric as the implicit rule is not explicit and well that's usually a quick way to make those rules explicit.
    Here's another way to think about it: no rule can be so perfectly written that it has no exceptions.
    exe344 months ago
    could you explain what you think the difference is?
    a metric is chosen, people start to game the system by doing things that make the metric improve but the original intent is lost. increasingly specific rules/laws have to be made up to make the metric appear to work, but it becomes a lost cause as more and more creative ways are found to work around the rules.
    vasco4 months ago
    Exactly, that's the definition. It doesn't apply to timing a 100m race. There are many such situations that are simple enough and with perfect information available where this doesn’t break down and a metric is just a metric and it works great.
    Which is not to the detriment of the observation being true in other contexts, all I did was provide a counter example. But the example requires the metric AND the context.
    godelski4 months ago
    Do you know certain shoes are banned in running competitions?
    There's a really fine line here. We make shoes to help us run faster and keep our feet safe, right? Those two are directly related, as we can't run very fast if our feet are injured. But how far can this be taken? You can make shoes that dramatically reduce the impact when the foot strikes the ground, which reduces stress on the foot and legs. But that might take away running energy, which adds stresses and strains to the muscles and ligaments. So you modify your material to put energy back into the person's motion. This all makes running safer. But it also makes the runner faster.
    Does that example hack the metric? You might say yes but I'm certain someone will disagree with you. There's always things like this where they get hairy when you get down to the details. Context isn't perfectly defined and things aren't trivial to understand. Hell, that's why we use pedantic programming languages in the first place, because we're dealing with machines that have to operate void of context[0]. Even dealing with humans is hard because there's multiple ways to interpret anything. Natural language isn't pedantic enough for perfect interpretation.
    [0] https://www.youtube.com/watch?v=FN2RM-CHkuI
    exe344 months ago
    it wasn't a very good counter example.
    ccortes4 months ago
    > Does it cease to be a good metric?
    Yes if you run anything other than the 100m
    MR_Bulldops4 months ago
    Do you have an example that doesn't involve an objective metric? Of course objective metrics won't turn bad. They're more measurements than metrics, really.
    godelski4 months ago
    > an objective metric
    I'd like to push back on this a little, because I think it's important to understanding why Goodhart's Law shows up so frequently.
    *There are no /objective/ metrics*, only proxies.
    You can't measure a meter directly, you have to use a proxy like a tape measure. Similarly you can't measure time directly, you have to use a stopwatch. In a normal conversation I wouldn't be nitpicking like this because those proxies are so well aligned with our intended measures and the lack of precision is generally inconsequential. But once you start measuring anything with precision you cannot ignore the fact that you're limited to proxies.
    The difference of when we get more abstract in our goals is not too dissimilar. Our measuring tools are just really imprecise. So we have to take great care to understand the meaning of our metrics and their limits, just like we would if we were doing high precision measurements with something more "mundane" like distance.
    I think this is something most people don't have to contend with because frankly, very few people do high precision work. And unfortunately we often use algorithms as black boxes. But the more complex a subject is the more important an expert is. It looks like they are just throwing data into a black box and reading the answer, but that's just a naive interpretation.
    AnthonyMouse4 months ago
    This isn't what Goodhart's law is about.
    Sure, if you get a ruler from the store it might be off by a fraction of a percent in a way that usually doesn't matter and occasionally does, but even if you could measure distance exactly that doesn't get you out of it.
    Because what Goodhart's law is really about is bureaucratic cleavage. People care about lots of diverging and overlapping things, but bureaucratic rules don't. As soon as you make something a target, you've created the incentive to make that number go up at the expense of all the other things you're not targeting but still care about.
    You can take something which is clearly what you actually want. Suppose you're commissioning a spaceship to take you to Alpha Centauri and then it's important that it go fast because otherwise it'll take too long. We don't even need to get into exactly how fast it needs to go or how to measure a meter or anything like that, we can just say that going fast is a target. And it's a valid target; it actually needs to do that.
    Which leaves you already in trouble. If your organization solicits bids for the spaceship and that's the only target, you better not accept one before you notice that you also need things like "has the ability to carry occupants" and "doesn't kill the occupants" and "doesn't cost 999 trillion dollars" or else those are all on the chopping block in the interest of going fast.
    So you add those things as targets too and then people come up with new and fascinating ways to meet them by sacrificing other things you wanted but didn't require.
    What's really happening here is that if you set targets and then require someone else to meet them, they will meet the targets in ways that you will not like. It's the principal-agent problem. The only real way out of it is for principals to be their own agents, which is exactly the thing a bureaucracy isn't.
    godelski4 months ago
    I agree with you, in a way.
    I've just taken another step to understand the philosophy of those bureaucrats. Clearly they have some logic, right? So we have to understand why they think they can organize and regulate from the spreadsheet. Ultimately it comes down to a belief that the measurements (or numbers) are "good enough" and that they have a good understanding of how to interpret them. With many bureaucracies, that becomes the belief that no interpretation is needed. But we also see that behavior with armchair experts who try to use data to evidence their conclusion rather than interpret data and conclude from that interpretation.
    Goodhart had focused on the incentive structure of the rule, but that does not tell us how this all happens and why the rule is so persistent. I think you're absolutely right that there is a problem with agents, and it's no surprise that when many introduce the concept of "reward hacking" they reference Goodhart's Law. Yes, humans can typically see beyond the metric and infer the intended outcome, but they ignore this because they don't care, and so they fixate on the measurement because that gives them the reward. Bureaucracies no doubt amplify this behavior as they are well known to be soul crushing.
    But we should also be asking ourselves if the same effect can apply in settings where we have the best of intentions and all the agents are acting in good faith and trying to interpret the measure instead of just game it. The answer is yes. Idk, call it Godelski's Corollary if you want (I wouldn't), but this relates to Goodhart's Law at a fundamental level. You can still have metric hacking even when agents aren't aware of it or intending it. Bureaucracy is not required.
    AnthonyMouse4 months ago
    In a sense you can do the same thing to yourself. If you self-impose a target and try to meet it while ignoring a lot of things that you're not measuring even though they're still important, you can unintentionally sacrifice those things. But there's a difference.
    In that case you have to not notice it, which sets a much lower cap on how messed up things can get. If things are really on fire then you notice right away and you have the agency to do something different.
    Whereas if the target is imposed by a far-off hierarchy or regulatory bureaucracy, the people on the ground who notice that things are going wrong have no authority to change it, which means they carry on going wrong.
    Or put it this way: The degree to which it's a problem is proportional to the size of the bureaucracy. You can cause some trouble for yourself if you're not paying attention but you're still directly exposed to "hear reason or she'll make you feel her". If it's just you and your boss who you talk to every day, that's not as good but it's still not that bad. But if the people imposing the target aren't even in the same state, you can be filling the morgue with bodies and still not have them notice.
    godelski4 months ago
    > In a sense you can do the same thing to yourself.
    Of course. I said you can do it unknowingly too.
    > The degree to which it's a problem is proportional to the size of the bureaucracy.
    Now take a few steps more and answer "why". What are the reasons this happens and what are the reasons people think it is reasonable? Do you think it happens purely because people are dumb? Or smart but unintended. I think you should look back at my comment because it handles both cases.
    To be clear, I'm not saying you're wrong. We're just talking about the concept at different depths.
    AnthonyMouse4 months ago
    I don't think the premise that everything is a proxy is right. We can distinguish between proxies and components.
    A proxy is something like, you're trying to tell if hiring discrimination is happening or to minimize it so you look at the proportion of each race in some occupation compared to their proportion of the general population. That's only a proxy because there could be reasons other than hiring discrimination for a disparity.
    A component is something like, a spaceship needs to go fast. That's not the only thing it needs to do, but space is really big so going fast is kind of a sine qua non of making a spaceship useful and that's the direct requirement rather than a proxy for it.
    Goodhart's law can apply to both. The problem with proxies is they're misaligned. The problem with components is they're incomplete. But this is where we come back to the principal-agent problem.
    If you could enumerate all of the components and target them all then you'd have a way out of Goodhart's law. Of course, you can't because there are too many of them. But, many of the components -- especially the ones people take for granted and fail to list -- are satisfied by default or with minimal effort. And then enumerating the others, the ones that are both important and hard to satisfy, gets you what you're after in practice.
    As long as the person setting the target and the person meeting it are the same person. When they're not, the person setting the target can't take anything for granted because otherwise the person meeting the target can take advantage of that.
    > What are the reasons this happens and what are the reasons people think it is reasonable? Do you think it happens purely because people are dumb? Or smart but unintended.
    In many cases it's because there are people (regulators, corporate bureaucrats) who aren't in a position to do something without causing significant collateral damage because they only have access to weak proxies, and then they cause the collateral damage because we required them to do it regardless, when we shouldn't have been trying to get them to do something they're in no position to do well.
    godelski4 months ago
    > I don't think the premise that everything is a proxy is right.
    I said every measurement. That is a key word.
    I know we're operating at a level that most people never encounter, but you cannot in fact measure a meter. You can use a calibrated reference tool like a ruler to try to measure distance. But that's a proxy. You aren't measuring a meter, you're measuring with a tool that is estimating a meter. You can get really precise and use a laser. But now you're actually doing a time of flight measurement, where a laser is bouncing off of something and you're measuring the time it takes to come back. Technically you're always getting 2x the measurement, but either way you're not actually measuring distance: you're measuring a light impulse (which is going to have units like candelas or watts) and timing it, and we then convert those units to meters. You can continue this further to even recognize the limits of each of those estimates, and this is an important factor if you're trying to determine the sensitivity (and thus error) of your device.
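    To make the time of flight point concrete, here is a minimal sketch in Python. The function names and the 1 ns timing uncertainty are assumptions chosen for illustration, not the behavior of any particular device. The only thing the instrument observes is a round-trip time; the distance is inferred from it, and the timer's error carries straight through.

```python
# Speed of light in m/s (exact by the SI definition of the meter).
C = 299_792_458.0


def tof_distance(round_trip_s: float) -> float:
    """Infer one-way distance from a measured round-trip time.

    The pulse travels out and back, so the path is twice the distance.
    """
    return C * round_trip_s / 2.0


def tof_uncertainty(timing_sigma_s: float) -> float:
    """Distance uncertainty implied by the timer's uncertainty alone."""
    return C * timing_sigma_s / 2.0


# Hypothetical reading: a 66.7 ns round trip timed with 1 ns uncertainty.
distance_m = tof_distance(66.7e-9)
sigma_m = tof_uncertainty(1e-9)
print(f"distance ~ {distance_m:.3f} m, +/- {sigma_m:.3f} m from timing alone")
```

    Under these assumptions a single nanosecond of timing error is already about 15 cm of distance, which is why the limits of the proxy matter as soon as you care about precision.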
    So I think you really aren't understanding this point. There is no possible way you can directly measure even the most fundamental scientific units (your best chance is going to probably be a mole but quantum mechanics is going to fuck you up).
    > The problem with proxies is they're misaligned. The problem with components is they're incomplete.
    If you pay close attention to what I'm talking about then you might find that these aren't as different as you think they are.
    > If you could enumerate all of the components and target them all then you'd have a way out of Goodhart's law.
    Which is my point. It isn't just that you can't because they are abstract; you can't because the physical limits of the universe prevent you from doing so even in the non-abstract cases.
    I am 100% behind you in that we should better define what we're trying to measure. But this is no different than talking about measuring something with higher precision. Our example above moved from a physical reference device to a laser and a stopwatch. That's a pretty dramatic shift, right? Uses completely different mechanisms. So abstract what you're thinking just a little so we can generalize the concept. I think if you do then we'll be on the same page.
    > In many cases
    I think you misunderstood my point here. Those were rhetorical questions and the last sentence tells you why I used them. They were not questions I needed answering. Frankly, I believe something similar is happening throughout our conversation since you are frequently trying to answer questions that don't need answering and telling me things which I have even directly acknowledged. It's creating a weird situation where I don't know how to answer because I don't know how you'll interpret what I'm saying. You seem to think that I'm disagreeing with you on everything and that just isn't true. For the most part I do agree. But to get you on the same level as me I need you to be addressing why these things are happening. Keep asking why until you don't know. That exists at some depth, right? It's true for everyone since we're not omniscient gods. My conclusion certainly isn't all comprehensive, but it does find this interesting and critical part where we run into something you would probably be less surprised about if you looked at my name.
  - t_serpico4 months ago
    But there is no way to know who is truly the 'best'. The people who position and market themselves to be viewed as the best are the only ones who even have a chance to be viewed as such. So if you're a great researcher but don't project yourself that way, no one will ever know you're a great researcher (except for the other great researchers who aren't really invested in communicating how great you are). The system seems to incentivize people to not only optimize for their output but also their image. This isn't a bad thing per se, but is sort of antithetical to the whole shoulders of giants ethos of science.
    kcexn4 months ago
    The problem is that the best research is not a competitive process but a collaborative one. Positioning research output as a race or a competition is already problematic.
    bwfan1234 months ago
    right. Also, the idea that there is a "best" researcher is already problematic. You could have 10 great people in a team, and it would be hard to rank them. Rating people in order of performance in a team is contradictory to the idea of building a great team. ie, you could have 10 people all rated 10 which is really the goal when building a team.
  - bjornsing4 months ago
    Yeah I think this is a general principle. Just look at the quality of US presidents over time, or generations of top physicists. I guess it’s just a numbers game: the number of genuinely interested people is relatively constant while the number of gamers grows with the compensation and perceived status of the activity. So when compensation and perceived status skyrocket the ratio between those numbers changes drastically.
    godelski4 months ago
    I think the number of generally interested people goes up. Maybe the percent stays the same? But honestly, I think we kill passion for a lot of people. To be cliche, how many people lose the curiosity of a child? I think the cliche exists for a reason. It seems the capacity is in all of us and even once existed.
    bjornsing4 months ago
    To some extent I think that’s just human nature, or even animal nature. The optimal explore / exploit tradeoff changes as we age. When we’re children it’s beneficial to explore. As adults it’s often more beneficial to exploit. But you need cultural and organizational safeguards that protect those of us who are more childish and explorative from those that are more cynical and exploitative. Otherwise pursuits of truth aren’t very fruitful.
  - xvector4 months ago
    I have seen absolutely incredible, best in the world type engineers, much smarter than myself, get fired from my FAANG because of the performance games.
    I persist because I'm fantastic at politics while being good enough to do my job. Feels weird man.
  - nathan_compton4 months ago
    It is pretty simple - if the rewards are great enough and the objective difficult enough, at some point it becomes more efficient to kneecap your competitors rather than to try to outrun them.
    I genuinely think science would be better served if scientists got paid modest salaries to pursue their own research interests and all results became public domain. So many universities now fancy themselves startup factories, and startups are great for some things, no doubt, but I don't think pure research is always served by this strategy.
    godelski4 months ago
    > if scientists got paid modest salaries to pursue their own research interests and all results became public domain
    I would make that deal in a heartbeat[0,1].
    We made a mistake by making academia a business. The point was that certain research creates the foundation for others to stand on. It is difficult to profit off those innovations directly, but by making them public, society at large profits by several orders of magnitude more than you could have even if you had been able to monetize them. Newton and Leibniz didn't become billionaires by inventing calculus, yet we wouldn't have the trillion dollar businesses and half the technology we have today if they hadn't. You could say the same about Tim Berners-Lee's innovation.
    The idea that we have to justify our research and sell it as profitable is insane. It is as if we are unaware of the past itself. Yeah, there's lots of failures in research, it's hard to push the bounds of human knowledge (surprise?). But there are hundreds, if not millions, of examples where that innovation results in so much value that the entire global revenue is not enough. Because the entire global revenue stands on this very foundation. I'm not saying scientists need to be billionaires, but it's fucking ridiculous that we have to fight so hard to justify buying a fucking laptop. It is beyond absurd.
    [0] https://news.ycombinator.com/item?id=45422828
    [1] https://news.ycombinator.com/item?id=43959309
  - bwfan1234 months ago
    I would categorize people into 2 broad extremes: 1) those that don't care two hoots about what others or the system expects of them, and in that sense are authentic, and 2) those that only care about what others or the system expects of them, and in that sense are not authentic. There is a spectrum in there.
    godelski4 months ago
    If you haven't heard this already, you might be interested in Pournelle's Iron Law of Bureaucracy
    https://jerrypournelle.com/reports/jerryp/iron.html
  - RataNova4 months ago
    Anytime a system gets hyper-competitive and the stakes are high, it starts selecting for people who are good at playing the system rather than just excelling at the underlying skill
  - b00ty4breakfast4 months ago
    that's what happens at the top of most competitive domains. Just take a look at pro sports; guys are looking for millimeters to shave off and they turn to "playing the game" rather than merely improving athletic performance. Watch a football game (either kind) and a not-small portion of the action is guys trying to draw penalties or exploit the rules to get an edge.
  - rightbyte4 months ago
    This is an interesting theory. I think there is something to it. It is really hard to do good in a competitive environment. Very constrained.
  - meindnoch4 months ago
    Goodhart's law
- contrarian12344 months ago
  > Labs used to hire researchers and give them a lot of free reign.
  I can't think of it ever really paying off. Bell Labs is the best example. Amazing research that was unrelated to the core business of the parent company. Microsoft Research is another great one. Lots of interesting research that .. got MS some nerd points? But has materialized into very very few actual products and revenue streams. Moving AI research forward doesn't help Meta build any moats or revenue streams. It just progresses our collective knowledge.
  On the "human progress" scale it's fantastic to put lots of smart people in a room and let them do their thing. But from a business perspective it seems to almost never pay off. Waiting on the irrational charity of business executives is probably not the best way to structure things.
  I'd tell them to go become academics.. but all the academics I know are just busy herding their students and attending meetings
  - Gigachad4 months ago
    Perhaps these companies just end up with so much money that they can't possibly find ways to spend all of it rationally for purely product driven work and just end up funding projects with no clear business case.
    trenchpilgrim4 months ago
    Or they hire researchers specifically so a competitor or upstart can't hire them and put them to work on something that disrupts their cash cow.
  - gopher_space4 months ago
    The problem here is management expecting researchers to dump out actionable insights like a chicken laying eggs. Researchers exist so that you can rifle through their notes and steal ideas.
    cindyllm4 months ago
    [dead]
  - iisan74 months ago
    It paid off for PARC, iirc the laser printer justified lots of other things that Xerox didn't profit from but turned out to be incredibly important.
  - zipy1244 months ago
    W. L. Gore (of Gore-Tex fame, among other materials) and similar companies are excellent examples. They have a super interesting management structure called open allocation, which is exactly this: employees get to choose what they work on. Valve is similar but slightly less formal.
  - whiplash4514 months ago
    Indeed. And it feels like there is this untold in-between where if you belong to an unknown applied AI team, you don’t have to deal with academia’s yak shaving, you don’t have to deal with Meta’s politics and you end up single handedly inventing TRMs.
  - heavyset_go4 months ago
    How many patents did that research result in that paid off in terms of use, licensing and royalties?
  - godelski4 months ago
    > I can't think of it ever really paying off
    Sure worked for Bell Labs
    Also it is what big tech was doing until LLMs hit the scene
    So I'm not sure what you mean by it never paying off. We were doing it right up till one of those things seemed to pay off and then hyper focused on it. I actually think this is a terrible thing we frequently do in tech. We find promise in a piece of tech, then hyper focus on it. Specifically, we hyper focus on how to monetize it, which ends up stunting the technology because it hasn't had time to mature and we're trying to monetize the alpha product instead of trying to get that thing to beta.
    > But from a business perspective it seems to almost never pay off.
    So this is actually what I'm trying to argue. It actually does pay off. It has paid off. Seriously, look again at Silicon Valley and how we got to where we are today. And look at how things changed in the last decade...
    Why is it that we like off the wall thinkers? That programmers used to be known as a bunch of nerds and weirdos. How many companies were started out of garages (Apple)? How many started as open source projects (Android)? Why did Google start giving work lifestyle perks and 20% time?
    So I don't know what you're talking about. It has frequently paid off. Does it always pay off? Of course not! It frequently fails! But that is pretty true for everything. Maybe the company stocks are doing great[0], but let's be honest, the products are not. Look at the last 20 years and compare it to the 20 years before that. The last 20 years have been much slower. Now maybe it is a coincidence, but the biggest innovation in the last 20 years has been in AI, and from 2012 to 2021 there were a lot of nice free-rein AI research jobs at these big tech companies where researchers got paid well, had a lot of autonomy in research, and had a lot of resources at their disposal. It really might be a coincidence, but a number of times things like this have happened in history and they tend to be fairly productive. So idk, you be the judge. Hard to conclude that this is definitely what creates success, but I find it hard to rule this out.
    > I'd tell them to go become academics.. but all the academics I know are just busy herding their students and attending meetings
    Same problem, different step of the ladder
    [0] https://news.ycombinator.com/item?id=45555175
- didip4 months ago
  I always wonder about that. Those $100m Mathematicians... how can they have room to think under Meta's crushing IMPACT pressure?
  - trhway4 months ago
    For just 10% of that money a $100M mathematician can hire 10 $1M mathematicians or a whole math dept in some European university to do the work and the thinking for them and thus beat any pressure while resting and vesting on the remaining 90%.
    lblume4 months ago
    Sure, but they weren't hired as managers, right?
    vasco4 months ago
    Ok ok, another $1m/year to hire a manager.
- RataNova4 months ago
  The money chase is real. You can kind of tell who's in it for the comp package vs. who'd be doing the same work on a laptop in their garage if that's what it took
- zer0zzz4 months ago
  > I really think if they just took a step back and stop being so metric focused and let their people freely explore then they'd be win..
  This is very true, and more than just in ai.
  I think if they weren’t so metric focused they probably wouldn’t have hit so much bad publicity and scandal too.
- bboygravity4 months ago
  AI progress has slowed down?! By what metric?
  Quite the statement for anybody who follows developments (without excluding xAI).
- rhetocj234 months ago
  "Maybe if you want to get ahead you gotta stop thinking like everyone else"
  Well for starters you need a leader who can rally the troops who "think(s) different" - something like a S Jobs.
  That person doesn't seem to exist in the industry right now.
- ProofHouse4 months ago
  winning the AI race? Meta? Oh that was a good one. Zuck is a follower not a leader. It is in his DNA
- bobxmax4 months ago
  I thought Alex Wang was a very curious choice. There are so many foundational AI labs with interesting CEOs... I get that Wang is remarkable in his own right, but he basically just built MTurk and timed the bubble.
  Doesn't really scream CEO of AGI to me.
  - godelski4 months ago
    A lot of people also don't know that many of the well known papers are just variations on small time papers with a fuck ton more compute thrown at the problem. Probably the strongest feature that correlates with being a successful researcher is compute. Many have taken this to claim that the GPU poor can't contribute but that ignores so many other valid explanations... and we wonder why innovation has slowed... It's also weird because if compute was all you need then there's a much cheaper option than what Zuck paid. But he's paying for fame.
    crystal_revenge4 months ago
    > A lot of people also don't know that many of the well known papers are just variations on small time papers with a fuck ton more compute thrown at the problem.
    I worked for a small research heavy AI startup for a bit and it was heart breaking how many people I would interact with in that general space with research they worked hard and passionately on only to have been beaten to the punch by a famous lab that could rush the paper out quicker and at a larger scale.
    There were also more than a few instances of high-probability plagiarism. My team had a paper that had existed for years basically re-written without citation by a major lab. After some complaining they added a footnote. But it doesn't really matter because no big lab is going to have to defend themselves publicly against some small startup, and their job at the big labs is to churn out papers.
    godelski4 months ago
    > only to have been beaten to the punch by a famous lab that could rush the paper out quicker and at a larger scale.
    This added at least a year to my PhD... Reviewers kept rejecting my works with comments like "add more datasets". That's nice and all, but on the few datasets I did use I beat out top labs and used a tenth of the compute. I'd love to add more datasets but even though I only used a tenth of the compute I blew my entire compute budget. Guess state of the art results, a smaller model, higher throughput, and 3rd party validation were not enough (I used an unpopular model architecture).
    I always felt like my works were being evaluated as engineering products, not as research.
    > a few instances of high-probability plagiarism
    I was reviewing a work once and I actually couldn't tell if the researchers knew that they ripped me off or not. They compared to my method, citing it and showing figures using it. But then they dropped its performance metrics from the table. So I asked. I got them in return and saw that there was no difference... So I dove in and worked out that they were just doing 99% my method with additional complexity (computational overhead). I was pretty upset.
    I was also upset because otherwise the paper was good. The results were nice and they even tested our work in a domain we hadn't. Had they just been upfront I would have gladly accepted the work. Though I'm pretty confident the other reviewers wouldn't have due to "lack of novelty."
    It's a really weird system that we've constructed. We're our own worst enemies.
    > their job at the big labs is to churn out papers.
    I'd modify this slightly. Their job is to get citations. Churning out papers really helps with that, but so does all the tweeting and evangelizing of their works. It's an unfortunate truth that as researchers we have to sell our works, and not just by the scientific merit that they hold. People have to read them after all. But we should also note that it is easier for some groups to get noticed more than others. Prestige doesn't make a paper good, but it sure acts as a multiplying factor for all the metrics we use for determining if it is good.
    BobbyTables24 months ago
    It’s funny.
    I learnt the hard way that communications/image/signal processing research basically doesn’t care about Computer Architecture at the nuts and bolts level of compiler optimization and implementation.
    When they encounter a problem whose normal solution requires excessive amounts of computation, they reduce complexity algorithmically using mathematical techniques, and quantify the effects.
    They don’t quibble about a 10x speed up, they reduce the “big O()” complexity. They couldn't care less whether it was implemented in interpreted Python or hand-optimized assembly code.
    On one hand, I know there’s a lot of talent in AI today. But throwing hardware at the problem is the dumbest way forward.
    WiFi adapters would be wheeled luggage if we had the same mentality during their development.
    shwaj4 months ago
    At some point it becomes difficult to improve the O() complexity. How do you do better than the O(n-squared) of the Transformer, with acceptable tradeoffs? Many big brains in all the big labs are very aware of the importance of algorithmic advances. There is no low hanging fruit, but they're doing their best.
    Then, in parallel to that, they're looking at compiler optimizations and other higher-level algorithmic innovations such as Flash Attention (a classic at this point), which had a drastic impact on performance due to cache awareness, without changing the O() complexity.
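    A toy illustration of that last point, in pure Python. This is only a sketch of the running-softmax idea that Flash Attention builds on, not the actual tiled GPU kernel; the function names are made up for this example. Both versions do O(n-squared) dot products, but the first stores the full n-by-n score matrix while the second keeps only a running max, a running normalizer, and one accumulator per query.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(Q, K, V):
    """Softmax attention that materializes the full n x n score matrix."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    scores = [[dot(q, k) * scale for k in K] for q in Q]  # n x n in memory
    out = []
    for row in scores:
        m = max(row)
        weights = [math.exp(s - m) for s in row]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / z
                    for j in range(len(V[0]))])
    return out


def streaming_attention(Q, K, V):
    """Same result, same O(n^2) work, but no n x n matrix is ever stored."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        running_max = -math.inf
        normalizer = 0.0
        acc = [0.0] * len(V[0])
        for k, v in zip(K, V):
            s = dot(q, k) * scale
            new_max = max(running_max, s)
            # Rescale what was accumulated under the old max (exp(-inf) == 0.0).
            correction = math.exp(running_max - new_max)
            p = math.exp(s - new_max)
            normalizer = normalizer * correction + p
            acc = [a * correction + p * vj for a, vj in zip(acc, v)]
            running_max = new_max
        out.append([a / normalizer for a in acc])
    return out


Q = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
K = [[1.0, 1.0], [0.0, 1.0], [2.0, -1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
max_diff = max(abs(a - b)
               for ra, rb in zip(naive_attention(Q, K, V), streaming_attention(Q, K, V))
               for a, b in zip(ra, rb))
print(f"largest difference between the two versions: {max_diff:.2e}")
```

    The real speedup comes from doing this over tiles that fit in fast on-chip memory, which a pure Python loop cannot show; the sketch only demonstrates that the memory footprint can change while the arithmetic stays quadratic.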
    tomrod4 months ago
    Sometimes it's the theory, sometimes it's the engineering, and often it's both.
    godelski4 months ago
    > They don’t quibble about a 10x speed up, they reduce the “big O()” complexity. They couldn't care less whether it was implemented in interpreted Python or hand-optimized assembly code.
    I can at least say that's not all of us. But you're probably right that this is dominating. I find it so weird since everyone stresses empirics yet also seems to not care about them. It took me my entire PhD to figure out what was really going on. I've written too many long winded rants on this site though
    helix2784 months ago
    You make it sound like reducing the big O complexity is a dumb thing to do in research, but this is really the only way to make lasting progress in computer science. Computer architectures become obsolete as hardware changes, but any theoretical advances in the problem space will remain true forever.
    BobbyTables24 months ago
    No, my point was the opposite, I agree with you. But the commercial focus on throwing hardware at the problem seems to have gotten entirely out of hand.
    rhetocj234 months ago
    Frankly this is the reason why I'm not convinced the current movement of LLMs will yield anything close to the dream.
    The right people to deliver immense progress don't exist right now.
    godelski4 months ago
    > The right people to deliver immense progress don't exist right now.
    I wouldn't go this far. But I would say that we're not giving them a good shot.
    The people are always there, you just need to find them and enable them.
    How do you manage genius? You don’t. — Mervin Kelly
  - thereitgoes4564 months ago
    Reports at the time said that he was Mark’s 5th choice or similar. It is fairly clear he would have preferred Ilya, Murati, Mark Chen, and perhaps others, but they said no, and Alex Wang was the first one to say yes.
    tsunamifury4 months ago
    Why in the world would he want Murati? She has absolutely no technical chops and was not functionally CTO of OpenAI.
    hn_throwaway_994 months ago
    > was not functionally CTO of OpenAI.
    Why do you say that?
    tsunamifury4 months ago
    Her history was entirely non technical up until OpenAI.
    hn_throwaway_994 months ago
    I think that's total BS, based on this article about her, https://fortune.com/2025/10/03/mira-murati-career-ai-thinkin...
    1. She has 2 BAs, one in math and one in mechanical engineering.
    2. She was an "Advanced Concepts Engineer at Zodiac Aerospace from 2012 to 2013".
    3. She was a product manager at Tesla on the Model X
    4. She was VP of product and engineering at Leap Motion.
    Going from the fact that she wasn't a deep learning researcher to "her history was entirely non technical up until OpenAI" is plain false. Plus, the job of CTO is 90%+ people management, and she appears more than smart enough and experienced enough to evaluate technical decisions of her team.
    tsunamifury4 months ago
    I think you haven't been in tech long enough to know what that resume is.
    bobxmax4 months ago
    [dead]
    shuckles4 months ago
    Because she was CTO of OpenAI.
    CuriouslyC4 months ago
    Pretty ironic when access to trade secrets and people skills is seen as more important in a technical field than technical competence.
    shuckles3 months ago
    For the record, I doubt the CTO of OpenAI is the best person to fund if you're looking for trade secrets on training and deploying SOTA LLMs. They are two levels too far from reality to know anything useful.
    bobxmax4 months ago
    What technical chops does Sam Altman have?
    seanmcau4 months ago
    He started coding at age 8
    arthurcolle4 months ago
    The self-supervised mesa-optimizer strikes again
  - tsunamifury4 months ago
    Alexandr Wang is not interesting and a few steps short of a fraud that Mark had to bail out because he was so co invested.
    Shareholders should be livid if they knew a single thing about what was going on.
    typpilol4 months ago
    Tell me more
    tsunamifury4 months ago
    Scale promised cutting-edge data pipelines and model-training infra but mostly sold outsourced labeling with a tech veneer. Great margins, weak moat — classic Valley overclaim, not outright fraud.
mark_l_watson4 months ago
A great idea, bypassing as much conversion as possible between vector space and natural language tokens. Reminds me of a discussion of having AIs “talk” to each other using vector space.
There was an interesting quote, “plain old BM25 from 1994 outperforms vector search on recall”, which is super relevant to what I did yesterday. I am trying to use small local models more often, and yesterday I wrote Common Lisp code that uses a large corpus of text and a user query or prompt to construct a fairly concise one-shot prompt with select context from the text corpus. This is RAG, and I used both BM25 and vector embeddings matching. I added the code and an example as a new chapter in my CL book (link directly to new material: https://leanpub.com/lovinglisp/read#leanpub-auto-autocontext...) yesterday afternoon. BM25 is fast. This is new code, and I will certainly be experimenting more with it, but as-is it is useful when working with small local LLMs.
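For readers who want to see the shape of this, here is a minimal Python sketch of BM25-ranked context selection feeding a one-shot prompt. It is not the Common Lisp code from the book chapter: the function names (`bm25_scores`, `build_prompt`), the whitespace tokenizer, the prompt layout, and the defaults `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real systems stem and strip punctuation).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def build_prompt(query, docs, top_k=2):
    """Keep the top_k documents with a non-zero score and wrap them in a prompt."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    context = "\n".join(docs[i] for i in ranked[:top_k] if scores[i] > 0)
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "stock markets fell sharply today",
    ]
    print(build_prompt("cat mat", corpus))
```

A hybrid setup like the one described would presumably merge these scores with embedding similarity before picking the context; that merging step is left out here.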
schmorptron4 months ago
One thing I don't get about the ever-recurring RAG discussions and hype men proclaiming "RAG is dead" is that people seem to be talking about wholly different things. My mental model is that what is called RAG can either be:
- a predefined document store / document chunk store where every chunk gets a vector embedding, and a lookup decides what gets pulled into context, so as not to have to pull in whole classes of documents and fill it up
- the web-search-like features in LLM chat interfaces, where they do a keyword search and pull relevant documents into context, but somehow only ephemerally, with the full documents not taking up context later in the thread (unsure about this, did I understand it right?).
With the new models with million-plus-token context windows, some were arguing that we can just throw whole books into the context non-ephemerally, but doesn't that significantly reduce the diversity of possible sources we can include at once if we hard-commit to everything staying in context forever? I guess it might help with consistency? But is the mechanism with which we decide what to keep in context not still some kind of RAG, just with larger chunks of whole documents instead of only parts?
I'd be ecstatic if someone who really knows their stuff could clear this up for me.
- kgeist4 months ago
  Technically, RAG is anything that augments generation with external search. However, it often has a narrower meaning: "uses a vector DB."
  Throwing everything into one large context window is often impractical - it takes much more time to process, and many models struggle to find information accurately if too much is going on in the context window ("lost in the middle").
  The "classic" RAG still has its place when you want low latency (or you're limited by VRAM) and the results are already good enough.
- impossiblefork4 months ago
  We can't throw in infinite things in the context though.
  My impression is that GPT-5 gets confused, not quite right away, but after a couple of pages it has no idea. It doesn't take pages on pages before it forgets things.
  - aerhardt4 months ago
    I’m currently experimenting with prompts of ~300k tokens for a certain classification task and I think I might be able to make it work. GPT5 chokes but Gemini 2.5 Pro is showing promise. Jury’s still out and I might change my tune in a couple of weeks.
    impossiblefork4 months ago
    It should also be said, that what I say here is focused on things where these models have problems.
    For example, I consider the model confused when it starts outputting stereotyped or cliche responses, and I intentionally go at problems that I know that the models have problems with (I already know they can program and do some maths, but I want to see what they can't do). But if you're using them for things they're made for, and which aren't confusing, such as people arguing with each other, then you are probably likely to succeed.
    Prompts with lots of examples are reasonable and I know they can get very long.
- GistNoesis4 months ago
  The answer is adaptability.
  In both cases, "Question Answering" is about similarity search, but there are two main orthogonal differences between RAG and Non-RAG:
  - Knowing the question at the time of index building
  - Higher-order features: the ability to compare fetched documents with one another and refine the question
  Non-RAG, aka a multi-layer (non-causal) transformer with infinite context, is the more generic version. It is fully differentiable, meaning you can use machine learning to learn how to Non-RAG better. Each layer of the transformer can use the previous layer to reason and refine the similarity search. (A causal transformer knows the question only once it is fed the question, and can choose to focus its attention on different parts of the previously computed features of the provided documents, but it may benefit from having some reflection tokens or, better, from being given the question before being presented the documents, provided you've trained it to answer like that.)
  RAG is an approximation of the generic case to make it faster and cheaper. Usually it breaks end-to-end differentiability by using external tools, which means that if you want to use machine learning to learn how to RAG better, you will need some variant of reinforcement learning, which is slower to learn things. RAG usually doesn't know the question at the time of index building, and documents are treated independently of each other, so there are no (automatic) higher-order features (embeddings are fixed).
  A third usual approximation is to feed the output of RAG into Non-RAG, to hopefully get the best of both worlds. You can learn the Non-RAG given RAG with machine learning (if you train it with some conversations where it used RAG), but the RAG part won't improve by itself.
  Non-RAG needs to learn, so it needs a big training dataset, but fortunately it can pick up question-answer pairs in an unsupervised fashion when you feed it the whole web, and you only need a small instruction training and preference optimization dataset to shape it to your needs. If performance isn't what you expect in a specific case, you can provide more specific examples and retrain the model until it gets it and you get better performance for the case you were interested in. You can improve the best case but it's hard to improve the worst case.
  RAG gives you more control over what you feed it, but the content needs to be more structured. You can prevent worst cases more easily, but it's hard to improve the good cases.
- edanm4 months ago
  > My mental model is that what is called RAG can either be:
  RAG is confusing, because if you look at the words making up the acronym RAG, it seems like it could be either of the things you mentioned. But it originally referred to a specific technique of embeddings + vector search - this was the way it was used in the ML article that defined the term, and this is the way most people in the industry actually use the term.
  It annoys me, because I think it should refer to all techniques of augmenting, but in practice it's often not used that way.
  There are reasons that specifically make the "embeddings" idea special - namely, it's a relatively new technique that actually fits LLMs very well, because it's a semantic search - meaning, it works on "the same input" as LLMs do, which is a free-text query. (As opposed to traditional lookups that work on keyword search or similar.)
  As for whether RAG is dead - if you mean specifically vector-embeddings and semantic search, it's possible - because you could theoretically use other techniques for augmentation, e.g. an agent that understands a user question about a codebase and uses grep/find/etc to look for the information, or composes a search to search the internet for something. But it's definitely not going to die in that second sense of "we need some way to augment LLMs' knowledge before text generation"; that will probably always be relevant, as you say.
  - schmorptron4 months ago
    Okay yeah that makes sense, thanks!
- make34 months ago
  no one is saying rag is dead, you're never going to put the whole Internet in the context of the model, & the more you put the more expensive it is.
  - viraptor4 months ago
    Lots of people say rag is dead: https://kagi.com/search?q=rag+is+dead&r=au&sh=g52XEb93vx691I...
zem4 months ago
this was really weird to read:
> But RAG is a very real world, practical topic for something as significant as a new lab’s first paper.
I would expect exactly the opposite - that a new lab would put out a few random papers that happen to be in areas their researchers were interested in and already working on, and once people had been working together a while and developed some synergy they would maybe come out with something really groundbreaking.
do people really view a "first paper" as something deeply significant and weighty? because that just seems like a good way to get bogged down in trying to second guess whether any given paper was good enough to be your all-important debut!
- Al-Khwarizmi4 months ago
  As an academic I would expect the same as you, and no, to my knowledge "first paper" is meaningless, at least in academia. Most people's first paper is some small contribution to what their PhD supervisor is doing at the time, where the student tries their best at writing but it ends up so heavily edited that probably 90% of the final text comes from the supervisor :) So typically first papers don't define or represent a researcher. When you start you just don't have the experience to have a great idea and carry it through to a good paper.
  Of course here we are talking about a lab, not an individual person, but still I haven't heard of first papers being considered special in any way, even for labs.
elyobo4 months ago
Can we have a more informative, less clickbaity, title?
- dang4 months ago
  What would a more informative, less clickbaity title be?
  (preferably using representative language from the article)
  - airstrike4 months ago
    Meta Superintelligence Labs' first paper is about RAG
    dang4 months ago
    Ok thanks! Belatedly updated.
- smeeger4 months ago
  there should be a guideline to get rid of clickbait titles. it's an epidemic here
  - dang4 months ago
    There is of course such a guideline: https://news.ycombinator.com/newsguidelines.html
    We don't catch every case, but if you're talking about the frontpage, I'm surprised to hear you say "epidemic". What are some recent examples?
    lanyard-textile4 months ago
    I wouldn’t give much weight to the person that had an opinion about the guidelines without reading them :)
    dang4 months ago
    Acknowledging their existence in principle is already a lot!
jongjong4 months ago
Interesting. All developers I know who tinkered around with embeddings and vector similarity scoring were instantly hooked. The efficiency of computing the embeddings once and then reusing as many times as needed, comparing the vectors with a cheap <30-line function is extremely appealing. Not to mention the indexing capabilities to make it work at scale.
IMO vector embedding is the most important innovation in computing of the last decade. There's something magical about it. These people deserve some kind of prize. The idea that you can reduce almost any intricate concept including whole paragraphs to a fixed-size vector which encapsulates its meaning and proximity to other concepts across a large number of dimensions is pure genius.
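To make the "cheap <30-line function" concrete, here is a minimal sketch of cosine similarity over precomputed embeddings, in plain Python. The names `cosine_similarity` and `most_similar`, and the dict-based index, are hypothetical and only for illustration; at scale an approximate nearest-neighbor index would normally replace the linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query_vec, index, top_k=3):
    """Linear scan over an {id: vector} index; embeddings are computed once and reused."""
    scored = [(cosine_similarity(query_vec, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:top_k]]
```

The embeddings themselves would come from whatever model you use; only the comparison step is shown.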
- _jayhack_4 months ago
  Vector embedding is not an invention of the last decade. Featurization in ML goes back to the 60s - even deep learning-based featurization is decades old at a minimum. Like everything else in ML this became much more useful with data and compute scale
  - senderista4 months ago
    Yup, when I was at MSFT 20 years ago they were already productizing vector embedding of documents and queries (LSI).
    jongjong4 months ago
    Interesting. Makes one think.
    senderista4 months ago
    To be clear, LSA[1] is simply applied linear algebra, not ML. I'm sure learned embeddings outperform the simple SVD[2] used in LSA.
    [1] https://en.wikipedia.org/wiki/Latent_semantic_analysis
    [2] https://en.wikipedia.org/wiki/Singular_value_decomposition
- liampulles4 months ago
  If you take the embedding for king, subtract the embedding for male, add the embedding for female, and lookup the closest embedding you get queen.
  The fact that vector addition can encode the concept of royalty and gender (among all other sorts) is kind of magic to me.
  - puttycat4 months ago
    This was actually shown to not really work in practice.
    intelkishan4 months ago
    I have seen this particular example work. You don't get the exact match but the closest one is indeed Queen.
    godelski4 months ago
    Yes but it doesn't generalize very well. Even on simple features like gender. If you go look at embeddings you'll find that man and woman are neighbors, just as king and queen are[0]. This is a better explanation for the result as you're just taking very small steps in the latent space.
    Here, play around[1]
    mother - parent + man = woman
    father - parent + woman = man
    father - parent + man = woman
    mother - parent + woman = man
    woman - human + man = girl
    Or some that should be trivial
    woman - man + man = girl
    man - man + man = woman
    woman - woman + woman = man
    Working in very high dimensions is funky stuff. Embedding high dimensions into low dimensions results in even funkier stuff
    [0] https://projector.tensorflow.org/
    [1] https://www.cs.cmu.edu/~dst/WordEmbeddingDemo/
    liampulles4 months ago
    Thank you for the comment!
    This led me to do a bit more research, and I see indeed the queen result is in itself in fact "cheating" a bit: https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...
    #TheMoreYouKnow
    yellowcake04 months ago
    so addition is not associative?
    godelski4 months ago
    I think you're missing the point
    yellowcake04 months ago
    It's a pretty exotic type of addition that would lead to the second set of examples, just trying to get an idea of its nature.
    godelski4 months ago
    Calling it addition is hairy here. Do you just mean an operator? If so, I'm with you. But normally people are expecting addition to have the full abelian group properties, which this certainly doesn't. It's not a ring because it doesn't have the multiplication structure. But it also isn't even a monoid[0] since, as we just discussed, it doesn't have associativity nor unitality.
    There is far less structure here than you are assuming, and that's the underlying problem. There is local structure and so the addition operation will work as expected when operating on close neighbors, but this does greatly limit the utility.
    And if you aren't aware of the terms I'm using here I think you should be extra careful. It highlights that you are making assumptions that you weren't aware were even assumptions (an unknown unknown just became a known unknown). I understand that this is an easy mistake to make since most people are not familiar with these concepts (including many in the ML world), but this is also why you need to be careful. Because even those that do are probably not going to drop these terms when discussing with anyone except other experts as there's no expectation that others will understand them.
    [0] https://ncatlab.org/nlab/show/monoid
    yellowcake04 months ago
    I think you misinterpreted the tone of my original comment as some sort of gotcha. Presumably you're overloading the addition symbol with some other operational meaning in the context of vector embeddings. I'm just calling it addition because you're using a plus sign and I don't know what else to call it, I wasn't referring to addition as it's commonly understood which is clearly associative.
    danielmarkbruce4 months ago
    You guys are debating this as though embedding models and/or layers work the same way. They don't.
    Vector addition is absolutely associative. The question is more "does it magically line up with what sounds correct in a semantic sense?".
    yellowcake04 months ago
    I'm just trying to get an idea of what the operation is such that man - man + man = woman, but it's like pulling teeth.
    danielmarkbruce4 months ago
    It's just plain old addition. There is nothing fancy about the operation. The fancy part is training a model such that it would produce vector representations of words which had this property of conceptually making sense.
    If someone says: "conceptually, what is king - man + woman". One might reasonably say "queen". This isn't some well defined math thing, just sort of a common sense thing.
    Now, imagine you have a function (lets call it an "embedding model") which turns words into vectors. The function turns king into [3,2], man into [1,1], woman into [1.5, 1.5] and queen into [3.5, 2.5].
    Now for king - man + woman you get [3,2] - [1,1] + [1.5,1.5] = [3.5, 2.5] and hey presto, that's the same as queen [3.5, 2.5].
    Now you have to ask - how do you get a function to produce those numbers? If you look at the word2vec paper, you'll come to see they use a couple of methods to train a model and if you think about those methods and the data, you'll realize it's not entirely surprising (in retrospect) that you could end up with a function that produced vectors which had such properties. And, if at the same time you are sort of mind blown, welcome to the club. It blew Jeff Dean's big brain too.
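    The toy numbers above can be run directly. The sketch below uses the made-up 2-D vectors from this comment (they do not come from a real model), plain element-wise arithmetic, and a nearest-neighbor lookup by Euclidean distance. The `exclude_inputs` flag is an assumption about how analogy demos are commonly built, namely that they skip the input words when picking the closest match; whether the demo linked earlier in the thread does this is not confirmed here.

```python
import math

# Made-up 2-D vectors from the comment above; not from a trained model.
EMBEDDINGS = {
    "king": [3.0, 2.0],
    "man": [1.0, 1.0],
    "woman": [1.5, 1.5],
    "queen": [3.5, 2.5],
}


def add(a, b):
    # Plain element-wise vector addition.
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    # Plain element-wise vector subtraction.
    return [x - y for x, y in zip(a, b)]


def nearest(vec, exclude=()):
    # Closest word by Euclidean distance, optionally skipping some words.
    candidates = {w: v for w, v in EMBEDDINGS.items() if w not in exclude}
    return min(candidates, key=lambda w: math.dist(vec, candidates[w]))


def analogy(a, b, c, exclude_inputs=False):
    # Computes a - b + c, then looks up the nearest word.
    target = add(sub(EMBEDDINGS[a], EMBEDDINGS[b]), EMBEDDINGS[c])
    exclude = {a, b, c} if exclude_inputs else ()
    return nearest(target, exclude)


if __name__ == "__main__":
    print(analogy("king", "man", "woman"))                    # queen
    print(analogy("man", "man", "man"))                       # man
    print(analogy("man", "man", "man", exclude_inputs=True))  # woman
```

    In this sketch the arithmetic always returns the vector for "man" when asked for man - man + man; a different word only appears once the lookup step is told to skip the inputs.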
    godelski4 months ago
    > It's just plain old addition
    I'm sorry, but I think you are overestimating your knowledge.
    Have you gone through abstract algebra? Are you familiar with monoids, groups, rings, fields, algebras, and so on?
    Because it seems you aren't aware that these structures exist and are a critical part of mathematics. It's probably why you're not understanding the conversation. @yellowcake seems to understand that "addition" doesn't mean 'addition' (sorry, I assumed you meant how normal people use the word lol). You may not realize it, but you're already showing that addition doesn't have a single meaning. 1+1 = 2 but [1,0] + [0, 1] = [1,1] and 1+0i + 0+i = 1 + i. The operator symbol is the same but the operation actually isn't.
    > Now for king - man + woman you get [3,2] - [1,1] + [1.5,1.5] = [3.5, 2.5] and hey presto, that's the same as queen [3.5, 2.5].
    The same as? Or is queen the closest?
    If it were just "plain old addition" then @yellowcake (or me![0]) wouldn't have any confusion. Because
    man - man + man = (man - man) + man = 0 + man = man != woman
    We literally just proved that it isn't "plain old addition". So stop being overly confident and look at the facts.
    >>> Vector addition is absolutely associative
    This is commonly true, but not necessarily. Floating point arithmetic is not associative.
    > you'll realize it's not entirely surprising that you could end up with a function that produced vectors which had such properties
    Except it doesn't work as well as you think, and that's the issue. There are many examples of it working and this is indeed surprising, but the effect does not generalize. If you go back to Jeff's papers you'll find some reasonable assumptions that are also limiting. Go look at "Distributed Representations of Words and Phrases and their Compositionality"[1] and look at Figure 2. See anything interesting? Notice that the capitals aren't always the closest? You might notice Ankara is closer to Japan than Tokyo. You'll also notice that the lines don't all point in the same direction. So if we made the assumption that the space was well defined then clearly we aren't following the geodesic. But you probably didn't realize a second issue, PCA only works on linear representations. Yet the model is not linear. Now there aren't many details on what they did for the PCA, but it is easy to add information implicitly and there's a good chance that happened here. The model definitely still is facing the challenges of metrics in high dimensional spaces, where notions such as distance become ill-defined.
    I've met Jeff and even talked with him at length. He's a brilliant dude and I have no doubt about that. But I don't believe he thinks this works in general. I'm aware he isn't a mathematician, but anyone who plays around with vector embeddings will experience the results I'm talking about. He certainly seems to understand that there are major limits to these models but also that just because something has limits doesn't mean it isn't useful. The paper says just as much and references several works that go into that even further. If you've misinterpreted me as saying embeddings are not useful then you're sorely mistaken. But neither should we talk about tools as if they are infallible and work perfectly. All that does is makes us bad tool users.
    [0] I also have no idea what mathematical structure vector embeddings follow. I'm actually not sure anyone does. This is definitely an under researched domain despite it being very important. This issue applies to even modern LLMs! But good luck getting funding for that kind of research. You're going to have a hard time getting it at a big lab (despite having high value) and you don't have the time in academia unless you're tenured, but then you got students to prioritize.
    [1] https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec...
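    As a small check on the floating-point remark above, the sketch below shows the textbook case where regrouping changes an IEEE 754 double-precision sum, and also that the specific expression a - a + a, evaluated left to right, still gives back a exactly. The helper names are hypothetical and the values are arbitrary.

```python
def vec_add(a, b):
    # Element-wise addition of two equal-length lists of floats.
    return [x + y for x, y in zip(a, b)]


def vec_sub(a, b):
    # Element-wise subtraction of two equal-length lists of floats.
    return [x - y for x, y in zip(a, b)]


# Regrouping a float sum can change the last bit of the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# For finite floats, a - a is exactly 0.0 and 0.0 + a is exactly a.
a = [0.1, 2.5, -3.75]
roundtrip = vec_add(vec_sub(a, a), a)
print(roundtrip == a)  # True
```

    So rounding alone would shift a result by a tiny amount at most; it would not by itself move a - a + a to a different word's vector.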
    danielmarkbruce4 months ago
    Maybe spend more time reading a response than writing. Yellowcake doesn't know what you are talking about either (note the "pulling teeth" comment).
    The examples you gave are a result of the embedding model in question not producing vectors which would map to most people's conceptual view of the world. Go through that website you quote from and see for yourself - it's just element wise addition.
    The examples I gave are entirely made up, 2 dimensional vectors to explain what plain old addition means (ie, plain old "add the vectors element wise") in the context of embedding models. And yes, it's the same as, because I defined it that way. Your website uses 300 dimensions, not 2.
    As I mentioned, not all embedding models work the same way (or, as you've said, "this doesn't generalize"). They get trained differently, on different data. The word "similar" is used very loosely.
    You even directly quote me and don't seem to be able to read the quote. The word "could" is there. You could end up with a model which had these nice properties.
    The entire point of my post was to highlight that yellowcake's confusion arises because he assumes it's an esoteric definition of addition that results in your examples, when it's not that.
    godelski4 months ago
    > Maybe spend more time reading a response than writing.
    Quite ironic considering
    > Yellowcake doesn't know what you are talking about either
    I actually said
    >> @yellowcake seems to understand that "addition" doesn't mean 'addition'
    Which is entirely based off of
    >>>>>> Presumably you're overloading the addition symbol
    I didn't assume their knowledge, they straight up told me and I updated my understanding based on that. That's how conversations work. And the fact that they understand operator overloading doesn't mean they understand more either. Do they understand monoids, fields, groups, and rings? Who knows? We'll have to let yellowcake tell us.
    Regardless, what you claim I assumed about yellowcake's knowledge is quite different than what I actually said. So maybe take your own advice.
    I write a lot because, unlike you, I understand these things are complex. Were it simpler, I would not need as many words.
    danielmarkbruce4 months ago
    Yeah except addition does mean addition in this case - ask anyone what plain old addition means for a vector, and they'll tell you element wise addition. The website you quoted is for a simple example using element wise addition and you made it sound as complex as possible because you are desperate to sound smart.
    godelski4 months ago
    > you are desperate to sound smart.
    Because I don't think 1-1+1=7?
    Whatever you say man
    danielmarkbruce4 months ago
    You really don't understand that the illogical sounding results from that website are due to the vectors themselves huh. It has zero to do with the definition of +.
    godelski4 months ago
    Please, tell me more. I was naively under the impression that normal addition had Abelian group properties[0]. Maybe you can inform me as what the inverse element is. That will get me to change my mind
    [0] https://en.wikipedia.org/wiki/Abelian_group
    danielmarkbruce4 months ago
    You’re lost in abstractions. ‘King’ and ‘queen’ and 'man' etc etc aren’t algebraic symbols, they’re mapped to vectors of real numbers. The model learns those mappings, then we just add and subtract numbers element wise. That’s it. You’re giving a group theory lecture about an operation that’s literally just a[i] + b[i]. The semantics come from training, not from some deep mathematical revelation you think everyone missed.
    godelski4 months ago
    > they’re mapped to vectors of real numbers
    Yes, I'm in agreement here. But you need to tell me how
    a - a + a = b
    Use what ever the fuck you want for a. A vector (e.g. [1,2,3]), a number (e.g. 1), an embedding (e.g. [[1,2,3],[4,5,6]]), words (e.g. "man"), I really don't give a damn. You have to tell me why b is a reasonable answer to that equation. You have to tell me how a==b while also a!=b.
    Because I expect the usual addition to be
    a - a + a = a
    This is the last time I'm going to say this to you.
    You're telling me I'm lost in abstraction and I'm telling you it is not usual addition because a != b. That's it! That's the whole fucking argument. You literally cannot see the contradiction right in front of you. The only way it is usual addition is if you tell me "man == woman" because that is literally the example from several comments ago. Stop being so smart and just read the damn comment.
    danielmarkbruce4 months ago
    a - a + a = b when a and b map to the same vector (or in practice, extremely close together). Your assumptions about invertibility etc don't hold in this world.... embeddings are just a bunch of empirically learned coordinates in a dense space.
    So an example: a maps to [1,2,3] and b maps to [1,2,3] . Again in practice b could map to [1,2,3.0001] or something.
    To summarize: king, man etc aren't symbols, they get mapped to vectors. + is element wise addition. = is "equal to or very close in multi dimensional space".
    Maybe tone down the attitude. You clearly aren't in this field. The properties you have assumed to be true are not. People in AI/ML are using terms and conventions differently than you assume. When someone says "vector addition" they really do mean just element wise addition in practically every case. You are the fool here.
    godelski4 months ago
    man - man + man = woman
    woman - woman + woman = man
    => man = woman
    > Your assumptions about invertibility etc don't hold in this world
    Yes? That's what I've said lol. That's what the above example shows. THAT WAS THE ENTIRE POINT
    > So an example: a maps to [1,2,3] and b maps to [1,2,3] . Again in practice b could map to [1,2,3.0001] or something.
    >>>>>>>>>> Floating point arithmetic is not associative.
    I'm glad you finally decided to agree with me. But it would have been a lot faster had you actually read my comments.
    danielmarkbruce4 months ago
    Except you were suggesting it's due to the definition of +, and your silly, irrelevant rant about abstract algebra started when I noted it's plain old addition.
    It holds for integers too, floating point arithmetic quirks are irrelevant.
    You are applying a bunch of ideas that are irrelevant because you don't have any idea how embedding models actually work.
    mirekrusin4 months ago
    Shouldn't this itself be a part of training?
    Having set of "king - male + female = queen" like relations, including more complex phrases to align embeddings.
    It seems like a terse, lightweight, information-dense way to address the essence of knowledge.
- ekidd4 months ago
  Vector embeddings are slightly interesting because they come pre-trained with large amounts of data.
  But similar ways to reduce huge numbers of dimensions to a much smaller set of "interesting" dimensions have been known for a long time.
  Examples include principal component analysis/singular value decomposition, which was the first big breakthrough in face recognition (in the early 90s), and is also used in latent semantic indexing, the Netflix prize, and a large pile of other things. And the underlying technique was invented in 1901.
  Dimensionality reduction is cool, and vector embedding is definitely an interesting way to do it (at significant computational cost).
- CuriouslyC4 months ago
  Vector embeddings are so overhyped. They're decent as a secondary signal, but they're expensive to compute and fragile. BM25 based solutions are more robust and WAY lower latency, at the cost of some accuracy loss vs hybrid solutions. You can get the majority of the lift from hybrid solutions with ingest time semantic expansion/reverse hyde type input annotation with a sparse embedding BM25 at a fraction of the computational cost.
  - jongjong4 months ago
    But it's much cheaper to compute than inference, and also you only have to compute once for any content and reuse multiple times.
- calf4 months ago
  The idea of reducing language to mere bits, in general, sounds like it would violate the Godel/Turing theorems about computability.
Imnimo4 months ago
I'm curious whether this is work that was specifically begun under the "superintelligence" umbrella, or if it's just that the people who were working on it had been shifted to the Superintelligence team by the time they wrote the paper. I would guess the former?
- lblume4 months ago
  Another commenter claims the latter: https://news.ycombinator.com/item?id=45554169
pbd4 months ago
https://github.com/simulanics/REFRAG
Palmik4 months ago
The observation about the "block-diagonal patterns" in RAG isn't new and has been exploited / explored before:
- https://arxiv.org/abs/2410.07590 (literally titled "Block-Attention for Efficient RAG")
- https://arxiv.org/abs/2409.15355v3
- https://arxiv.org/abs/2212.10947
The REFRAG paper does not cite any of these.
CShorten4 months ago
Here is a video I made diving into the paper, hopefully helpful!
https://www.youtube.com/watch?v=Ek0tZootK00
- htk4 months ago
  I like your style, subscribed!
  - CShorten4 months ago
    Thank you so much!
mountainriver4 months ago
This was a very obvious next step, I played around with implementing something similar at one point.
In general we need to make it simpler for LLMs to take in different forms of embeddings. At least frameworks that simplify it.
yalogin4 months ago
I am not surprised because the culture at Meta is not at all, even in the slightest, to focus on science for the sake of it. It’s actively purged out of you. The focus is on metrics and how the bottom line is impacted. So this is in line with that.
- georgeburdell4 months ago
  It’s not that simple. I worked at a supplier of Meta and they paid us large NREs to fund our exploratory work
- rhetocj234 months ago
  Yeah and this problem is near impossible to fix once it has infested the culture of the firm.
  - DangitBobby4 months ago
    It's not always a bad thing though, like in this case they looked for a practical win and found one because impractical wins can't make them money.
- alex11384 months ago
  "People are using our service more!" turns out to be a horrible metric when they outright lie to you (x has sent you a message! - when no message exists)
nmca4 months ago
This is not work by any of the high profile new hires, in case folks are confused.
puttycat4 months ago
Seems very incremental and very far from the pompous 'superintelligence' goal.
- antonvs4 months ago
  It’s unlikely that the existing LLM architecture will evolve into anything that resembles superintelligence any more than it does already.
  Which means that modifications to the architecture, and combining it with other components and approaches, are the next likely step. This paper fits that.
- btilly4 months ago
  If you can collapse "retrieve this complex chunk when it is needed" into a single token, what else can you put into a token?
  "Send this through the math coprocessor." "Validate against the checklist." "Call out to an agent for X." "Recheck against input stream Y." And so on.
  Retrieval augmentation is only one of many uses for this. If this winds up with better integration with agents, it is very possible that the whole is more than the sum of its parts.
- lukev4 months ago
  Think about it this way; they are encoding whole "thoughts" or "ideas" as single tokens.
  It's effectively a multimodal model, which handles "concept" tokens alongside "language" tokens and "image" tokens.
  A really big conceptual step, actually, IMO.
- naasking4 months ago
  A 30 fold improvement seems a tad more than incremental.
  - vasco4 months ago
    I can start brushing my teeth 30 times faster but it won't change my life. This is nice for RAG but it's a very localized improvement. And 30× sounds big but is just an order of magnitude improvement also.
    naasking4 months ago
    Brushing your teeth is not central to your life, recalling facts correctly is, and a 30 fold improvement in the latter very well could change your life. I'll leave it to you to figure out which is a better analogy to RAG.
    vasco4 months ago
    Just remember that in this example you don't remember 30x more things, you just remember the same things 30x faster. That is a significant difference.
bigcat123456784 months ago
https://docs.lamini.ai/memory_rag/ Similar approaches have been tried before already
macleginn4 months ago
So this looks essentially like continuous prompting (see prefix tuning) with RL-driven selection of what to present as tokens and what as continuous inputs (embeddings).
SknCode4 months ago
I am not sure if I understand things correctly.
I came to believe the LLMs work with token embeddings. Is then the REFRAG only "something" in front of the LLM, and the decoder is the RL policy which expands only some token chunk embeddings into token embeddings feedable to LLM? Or the REFRAG needs you to 'tune' the LLM to be able to work with both token embeddings and token chunk embeddings?
armcat4 months ago
I couldn't immediately see in their graphs/tables any comparison against simple lexical/statistical context compression, such as candidate selection of chunks using TF-IDF, word overlap etc. For most of us in the industry we need to find these quick wins that give us equivalent performance to sending huge amounts of information to the LLM, while compressing by 10x.
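As a concrete version of the kind of baseline meant here, the sketch below scores each retrieved chunk against the query with a simple TF-IDF sum and keeps only the top few, preserving their original order. The tokenization, the smoothed IDF formula, and the name `select_chunks` are assumptions made for the illustration; it is not something the paper evaluates.

```python
import math
from collections import Counter


def select_chunks(query, chunks, keep):
    """Keep the `keep` chunks with the highest TF-IDF overlap with the query."""
    tokenized = [c.lower().split() for c in chunks]
    n = len(chunks)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    query_terms = set(query.lower().split())

    def score(tokens):
        tf = Counter(tokens)
        # Sum of (relative term frequency * smoothed IDF) over matching query terms.
        return sum(
            tf[t] / len(tokens) * math.log((1 + n) / (1 + df[t]))
            for t in query_terms
            if t in tf
        )

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    # Return the survivors in their original document order.
    return [chunks[i] for i in sorted(ranked[:keep])]


if __name__ == "__main__":
    retrieved = [
        "paris is the capital of france",
        "bananas are yellow",
        "the eiffel tower is in paris",
    ]
    print(select_chunks("paris tower", retrieved, keep=2))
```

A baseline like this runs in microseconds per chunk and needs no model changes, which is what makes it a useful point of comparison for learned compression.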
naasking4 months ago
> the core insight here is actually: if embeddings are generated by layers within the LLM, it makes no sense to convert them back to natural language, just for another LLM to compress those tokens back to embeddings.
Doesn't this tie the two layers together in a way that they can't evolve separately?
asim4 months ago
This was inevitable. You can't keep training LLMs and expect that's the answer to the evolution of AI. Yes it'll happen and we'll keep creating new more refined and bigger models but it's like DNA or something like the cortex of the brain. After that you need these systems that essentially "live" for years digesting information and develop a more refined way to process, store and retrieve the information. Compression of RAG was also inevitable. It's like the btree index of a database. The thing is, we're probably one or two iterations away from being good enough on the RAG pipeline and then we'll need to focus more on the other pieces of sensory input that need to be connected and processed at higher throughput. Right now it's not fast or efficient enough. This is where the likes of Google will shine. They are probably two decades ahead of everyone on internal technology and there is some team with the breakthrough but it hasn't seen the light of day yet. What's coming out of DeepMind is really a forced effort in productization and publication of work in a consumable format but internally they are likely way ahead. I don't have as much faith in Meta's efforts despite seeing things like this. Quite frankly those people, the ones doing the work should move to more honourable companies. Not feed crack addiction in the form of Meta's universe.
- smeeger4 months ago
  exactly. the real focus internally is working on new architectures. there is no other possibility.
koolala4 months ago
Did a "superintelligence" lab publish a superintelligence related paper with no results for intelligence? What measured improvements did this proposal make in their LLM's intelligence?
aurohacker4 months ago
Figure 1 in the paper is all about the encoder and how the context and query is packaged and sent to the decoder. I wish it were more complete...
bigyabai4 months ago
> Long awaited first paper from Meta Superintelligence Labs is not a model layer innovation. What does this mean?
It means you're reading into it too much and need to be let down, gently, from the hype train.
mikepalmer4 months ago
I hate articles that don't define their acronyms! Lazy? Intentionally exclusive?
So that others don't also have to look it up, it's Retrieval-Augmented Generation (RAG).
They even say it's "a topic that we didn’t expect"... so... perhaps many people wouldn't have heard of it?
RataNova4 months ago
Refreshing (and slightly unexpected) to see Meta Superintelligence start with something this practical instead of a headline-grabbing new model
singularity20014 months ago
somewhere in my hacker news comment history I presented this very idea
foldl20224 months ago
So, show me the model weights, please.
i5heu4 months ago
Can we please get rid of the clickbait titles?
pppoe4 months ago
I find it absurd that large companies now have higher stock prices and more cash than ever before, yet nearly every AI lab in these companies is facing greater pressure than ever to generate short-term profits. In the midst of AI's unprecedented boom, the research environment and atmosphere in the industry seem to have worsened compared to the past.
- signatoremo4 months ago
  Is this Meta’s lab pressured to generate short term profits?
  Which other under pressure labs are you talking about?
- sefrost4 months ago
  Is it because of the "winner takes all" and "lock-in effects" of being the first to market?
cm20124 months ago
At first I thought the super intelligence wrote a novel scientific paper
nine_k4 months ago
A great post, it starts with this:
TL;DR
• MSI’s first paper, REFRAG, is about a new way to do RAG.
• This slightly modified LLM converts most retrieved document chunks into compact, LLM-aligned chunk embeddings that the LLM can consume directly.
• A lightweight policy (trained with RL) decides which chunk embeddings should be expanded back into full tokens under a budget; the LLM runs normally on this mixed input.
• The net effect is far less KV cache and attention cost, much faster first-byte latency and higher throughput, while preserving perplexity and task accuracy in benchmarks.
I wish more long posts followed this model of a scientific paper.
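The selective-expansion step in the TL;DR can be sketched as follows, under heavy simplification: each retrieved chunk is either passed to the model as one compressed placeholder or expanded back into its tokens, with at most `expand_budget` chunks expanded. In the paper the importance scores come from an RL-trained policy and the placeholders are real encoder outputs; here the scores are simply given as input, and `build_mixed_input` is a hypothetical name.

```python
def build_mixed_input(chunks, scores, expand_budget):
    """Build a mixed sequence of chunk placeholders and expanded tokens.

    chunks: list of token lists, one per retrieved chunk.
    scores: one importance score per chunk (stand-in for a learned policy).
    expand_budget: maximum number of chunks to expand into full tokens.
    """
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    expand = set(ranked[:expand_budget])
    mixed = []
    for i, chunk in enumerate(chunks):
        if i in expand:
            # Important chunk: keep every token.
            mixed.extend(("token", t) for t in chunk)
        else:
            # Everything else: a single slot standing in for the chunk embedding.
            mixed.append(("chunk_embedding", i))
    return mixed


if __name__ == "__main__":
    retrieved = [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i", "j", "k", "l"]]
    sequence = build_mixed_input(retrieved, scores=[0.1, 0.9, 0.2], expand_budget=1)
    print(len(sequence), sequence)
```

In the usage example, 12 context tokens shrink to 6 input positions, which is where the reduction in KV cache and attention cost comes from.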
xvector4 months ago
Working in big tech it's pretty wild to see how integral AI has become to our work internally, vs the public perception of it. People are NOT prepared.
- terminalshort4 months ago
  1. Hyperbolic statement about LLM capabilities with no concrete examples
  2. Wild claim that the companies that sell LLMs are actually downplaying their capabilities instead of hyping them
  - crorella4 months ago
    Personal experience here in a FAANG, there has been a considerable increase in: 1. Teams exploring how to leverage LLMs for coding. 2. Teams/orgs that already standardized some of the processes to work with LLMs (MCP servers, standardized the creation of the agents.md files, etc) 3. Teams actively using it for coding new features, documenting code, increasing test coverage, using it for code reviews etc.
    Again, personal experience, but in my team ~40-50% of the PRs are generated by Codex.
    ruszki4 months ago
    “Teams exploring how to leverage [AI]s for [anything]” has been true for about a decade now in every large multinational company at every level. It’s not new at all. AI has been the driving buzzword for a while now, even well before ChatGPT. I’ve encountered many people who just wanted the stamp that they use AI, no matter how, because my team was one of the main entry points to achieve this at that specific company. But before ChatGPT and co, you had to work for it a lot, so most of them failed miserably, or immediately backtracked when they realized this.
    rhetocj234 months ago
    I'm sure the MBA folks love stats like that - there's plenty that have infested big tech. I mean Pichai is an MBA + McKinsey alumnus.
    Ready for the impending lay off fella?
    alex-nt4 months ago
    There are places that offer Copilot to any team that wants it, and then behind the scenes they informed their managers that if the team (1+ persons) adopts it they will have to shed 10%+ human capacity (lose a person, move a person, fire a person) in the upcoming quarters next year.
  - danielmarkbruce4 months ago
    Yup, he's totally lying. Not happening. Just carry on.
    BoorishBears4 months ago
    Agreed, but why are they lying?
    danielmarkbruce4 months ago
    That was sarcasm. He's not lying.
    BoorishBears4 months ago
    Didn't read any sarcasm in what he said?
    danielmarkbruce4 months ago
    Sorry, I meant my comment was sarcasm. I was being sarcastic. The original comment was sincere, I'm quite certain. And, they are right - there are some companies that really are getting a lot of value out of LLMs already. I'd guess that the more folks who actually understand how LLMs work, the more a company can do. There just isn't a neat abstraction layer to be had, so folks who don't have a detailed mental model get caught up applying them poorly or to the wrong things.
- incompatible4 months ago
  I've heard of one study that said AI slows developers down, even when they think it's helping.
  https://www.infoworld.com/article/4061078/the-productivity-p...
  - xvector4 months ago
    AI may slow coding a bit but dramatically reduces cognitive load.
    The real value of AI isn't in helping coding. It's in having a human-like intelligence to automate processes. I can't get into details but my team is doing things that I couldn't dream of three years ago.
    qingcharles4 months ago
    It does dramatically reduce cognitive load. I think that part is understated and lost to the headline of how it writes two thousand lines of code in 30 seconds.
  - naasking4 months ago
    It is true sometimes, but other times it saves hours. We're all still in the learning stage of how best to use these new tools, and their capabilities are growing constantly.
- fishmicrowaver4 months ago
  Not prepared for what? Seems like the rest of the world is desperate to be shown the way to unlock something of value?
  - Workaccount24 months ago
    I think at this point it's software devs looking for the value unlock.
    Non-software devs are actually making functional programs for themselves for the first time ever. The value is crazy.
    ceejayoz4 months ago
    It’s not the first time ever. People did the same with Access and HyperCard in the 90s.
    fishmicrowaver4 months ago
    Sure but in the real world do you think businesses are going to deploy piles of code into production generated this way? No, non-technical people will continue to whip up MS PowerApps. AI generated code has no value to many businesses.
    xvector4 months ago
    The value of AI is not in generating code. That's just a "nice-to-have."
    The value of AI is in having a scalable, human-like decision maker that you can plug into anything, anywhere. This has unlocked countless use cases for my team, that we could scarcely imagine a few years ago.
    cbg04 months ago
    "Human-like decision maker" except it's just as if not more unpredictable than a human, has no understanding of what it's actually outputting or the impact of it, and it isn't concerned with losing their job or facing legal repercussions for their actions.
    xvector4 months ago
    There are plenty of ways to manage those drawbacks, and a mind-boggling number of use cases where it's "good enough" already.
    But it's not my job to convince you, my lived experience working with the tech is enough to convince me, and that's all I care about, to be honest. Everyone else will get there sooner or later.
    Workaccount24 months ago
    You don't need production level code to make your life easier.
    You're missing the forest for the trees. Most people can't even make a block diagram, but they can explain what they have and what they want to do with it.
    fishmicrowaver4 months ago
    I think the market reveals itself. Perhaps you're right, but it's been years, and where's the value? No offense, and it might seem cool to build an app, but that's been possible for decades.
- gdulli4 months ago
  Not everyone has given in to the crutch.
  - xvector4 months ago
    That's why I still use an abacus.
    gdulli4 months ago
    The abacus skills are safely obsolete, the skills of general thinking and creativity must not become that. This couldn't be more specious.
    Meme thinking like this, repeating something you've heard as reflex without regard to whether it fits a situation, is the exact kind of unoriginality we can't allow to become the default mode of thinking.
    xvector4 months ago
    I am not the one being unoriginal here. You are thinking that AI will obsolete critical thinking, so there's no point developing with it.
    However, in your moral crusade against using AI you are missing the big picture. No one is making you code with AI. But there are many things that you can only build if you use AI as a component.
    The ability to plug a human-like decisionmaker into anything, anywhere massively expands what we can build. There are applications and use cases that you cannot even conceptualize without having the ability to plug AI in. This does not impact critical thinking whatsoever.
    Be original. Put your engineer hat on and think on what this new tool lets you build, that you couldn't beforehand.
    throw_this_one4 months ago
    I find the AI can make me more creative. I don't have to waste mental energy on boilerplate or straightforward stuff that would take me typing through some event processing loop etc. I can extract out and reuse components easier and focus on big picture design. Or build more bespoke admin tools that I wouldn't have wanted to waste time building some JS stuff before.