181 points by pember 7 hours ago | 14 comments
  • mark_l_watson 4 hours ago
    I am rooting for Mistral with their different approach: not really competing on the largest and most advanced models, but instead doing custom engineering for customers and generally serving the needs of EU customers.
    • jerrygoyal an hour ago
      their ocr model is goated
      • stavros 4 minutes ago
        Better than Qwen? I guess the best overall is Gemini, right?
    • w4yai 2 hours ago
      Go Mistral !
  • roxolotl 4 hours ago
    Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we’d probably never do it. But with tools like this, and the new unsloth release, training feels more in reach.
  • upghost 42 minutes ago
    > Pre-training allows organizations to build domain-aware models by learning from large internal datasets.

    > Post-training methods allow teams to refine model behavior for specific tasks and environments.

    How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?

    There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.

    • anon373839 11 minutes ago
      I think they are referring to “continued pretraining”.
    • stingraycharles 30 minutes ago
      I can imagine that, as usual, you start with a few examples and then instruct an LLM to synthesize more examples out of that, and train using that. Sounds horrible, but actually works fairly well in practice.
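That seed-and-synthesize loop can be sketched roughly like this (a hypothetical sketch: the seed data, `build_augmentation_prompt`, and the stubbed `generate` function are all made up, with the stub standing in for a real LLM API call):

```python
import json

# Hypothetical sketch: format a handful of real seed examples as
# few-shot context and ask a strong model to synthesize more in the
# same format. `generate` is a stub standing in for any LLM API.
SEEDS = [
    {"instruction": "Summarize ticket #4812", "output": "Customer reports login timeouts."},
    {"instruction": "Classify this support email", "output": "Category: billing"},
]

def build_augmentation_prompt(seeds, n_new=5):
    """Few-shot prompt asking for n_new new examples in the seed format."""
    shots = "\n\n".join(json.dumps(s) for s in seeds)
    return (
        f"Here are {len(seeds)} examples of our internal data format:\n\n"
        f"{shots}\n\n"
        f"Generate {n_new} new examples in the same JSON format, "
        "varying the topics but keeping the style consistent."
    )

def generate(prompt):
    """Stub: swap in a real chat/completions API call here."""
    return "[]"

prompt = build_augmentation_prompt(SEEDS, n_new=3)
synthetic = json.loads(generate(prompt))  # would be filtered/deduped before training
```

In practice the synthesized examples would then be deduplicated and quality-filtered before being mixed into the training set.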
  • dmix 3 hours ago
    This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
  • csunoser 4 hours ago
    Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering up with customers on the pretraining side as well. But RL as well? Jeez, RL envs are really hard to get right. Best wishes, I guess.
  • hermit_dev an hour ago
    The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
  • ryeguy_24 2 hours ago
    How many proprietary use cases truly need pre-training or even fine-tuning, as opposed to a RAG approach? And at what point does it make sense to pre-train/fine-tune? Curious.
    • baby 2 hours ago
      RAG is dead
      • charcircuit 2 hours ago
        Using tools and skills to retrieve data or files is anything but dead.
      • loeg 2 hours ago
        Is it??
      • bigyabai 2 hours ago
        In what, X's hype circles? Embeddings are used in production constantly.
      • CharlesW 2 hours ago
        And yet your blog says you think NFTs are alive. Curious.

        But seriously, RAG/retrieval is thriving. It'll be part of the mix alongside long context, reranking, and tool-based context assembly for the foreseeable future.

        • elicash an hour ago
          I have no interest in anything crypto, but they are making a proposal about NFTs tied to AI (LLMs and verifiable machine learning) so they can make ownership decisions.

          So it'd be alive in the making decisions sense, not in a "the technology is thriving" sense.

        • strongly-typed 2 hours ago
          Wait, what do NFTs have to do with RAG?
          • panarky 2 hours ago
            I, for one, find NFT-shilling to be a strong signal that I should downgrade my trust in everything else a person says.
          • LoganDark 2 hours ago
            Nothing, I think they're just pointing out a seeming lack of awareness of what really is or isn't dead.
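For context, the retrieval step being debated above reduces to embedding similarity. A toy sketch, with bag-of-words counts standing in for a real embedding model (documents and query are invented for illustration):

```python
import math
from collections import Counter

DOCS = [
    "Mistral offers custom model training for enterprise customers",
    "RAG retrieves relevant documents and adds them to the prompt",
    "NFTs are tokens recorded on a blockchain",
]

def embed(text):
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

top = retrieve("which documents does RAG add to the prompt", DOCS)
```

Production systems swap in learned embeddings, an approximate-nearest-neighbour index, and often a reranker, but the shape of the step is the same.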
  • rorylawless 3 hours ago
    The fine-tuning endpoint is deprecated according to the API docs. Is this the replacement?

    https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning

    • aavci an hour ago
      Interesting to see. I thought they were promoting fine-tuning.
  • andai 3 hours ago
    They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

    It's feasible for small models, but I thought small models were not reliable for factual information?

    • simsla an hour ago
      Typical stages of training for these models are:

      Foundational:

      - Pretraining
      - Mid/post-training (SFT)
      - RLHF or alignment post-training (RL)

      And sometimes...

      - Some more customer-specific fine-tuning.

      Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
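That "same procedure, swapped dataset" point can be illustrated with a toy stand-in: a bigram counter playing the role of gradient descent on a transformer (corpora and names are invented for illustration):

```python
from collections import Counter, defaultdict

def train(counts, corpus):
    # One "training stage": accumulate bigram counts in place.
    for sentence in corpus:
        tokens = sentence.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def predict_next(counts, word):
    # Most frequent continuation seen so far, or None if unseen.
    following = counts[word]
    return max(following, key=following.get) if following else None

GENERAL = ["the model is large", "the model is open"]
DOMAIN = ["the model is compliant", "the model is compliant with eu rules"]

counts = defaultdict(Counter)
train(counts, GENERAL)   # "pretraining" on a general corpus
before = predict_next(counts, "is")
train(counts, DOMAIN)    # continued pretraining: same loop, swapped dataset
train(counts, DOMAIN)
after = predict_next(counts, "is")  # now dominated by the domain data
```

Nothing about the procedure changes between stages; only the data (and, in real training, optimiser settings like learning rate) does.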

  • aavci an hour ago
    How does this compare to fine tuning?
  • bsjshshsb 3 hours ago
    Is training or FT > context? Anyone have experience?

    Is it possible to retrain daily or hourly as info changes?

  • codance 2 hours ago
    [dead]
  • shablulman 6 hours ago
    [dead]
  • gpubridge an hour ago
    [flagged]