71 points by ikessler 9 hours ago | 7 comments
  • dabrez 2 minutes ago
    I have this written as a project I will attempt in the future. I also call it "weapons grade unemployment". In the notes I was proposing to use Granite, but the principle still stands. You beat me to it.
  • avaer 6 hours ago
    There's also the Prompt API, currently in Origin Trial, which exposes this API surface to sites:

    https://developer.chrome.com/docs/ai/prompt-api

    I just checked the stats:

      Model Name: v3Nano
      Version: 2025.06.30.1229
      Backend Type: GPU (highest quality)
      Folder size: 4,072.13 MiB
    
    Different use case but a similar approach.

    I expect this will become a native web feature at some point, but not anytime soon, since the model download is many multiples of the size of the browser itself. Maybe eventually these APIs could use LLMs built into the OS, the way we do with graphics drivers.
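    As a rough sketch of what calling it looks like: the `LanguageModel` global and its `availability()`/`create()`/`prompt()` methods follow the current Origin Trial docs, but treat them as assumptions that may shift between Chrome versions; the feature check lets it degrade outside the trial.

```javascript
// Hedged sketch of Chrome's Prompt API (Origin Trial). The
// `LanguageModel` global is browser-only and its surface has changed
// between Chrome versions, so this is illustrative, not a stable contract.
async function promptOrFallback(text) {
  // In browsers without the trial (or outside a browser), bail out cleanly.
  if (typeof LanguageModel === "undefined") {
    return { ok: false, reason: "Prompt API not available" };
  }
  const availability = await LanguageModel.availability();
  if (availability === "unavailable") {
    return { ok: false, reason: "model unavailable on this device" };
  }
  // May trigger a large model download on first use.
  const session = await LanguageModel.create();
  const reply = await session.prompt(text);
  session.destroy();
  return { ok: true, reply };
}
```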

    • veunes 2 hours ago
      That’s exactly where we’re headed. Architecturally it makes zero sense to spin up an LLM in every app's userspace. Now that we have dedicated NPUs and GPUs, we need a unified system-level orchestrator to balance inference queues across programs, exactly how the OS arbitrates access to the NIC or the audio stack. The browser should just make an IPC call to the system instead of hauling its own heavy inference engine along for the ride.
    • sheept 3 hours ago
      The Summarizer API is already shipped, and any website can use it to quietly trigger a 2 GB download by simply calling

          Summarizer.create()
      
      (requires user activation)
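      A hedged sketch of how a site might call it more defensively (the `availability()` check and the `type`/`length` options follow the current Chrome docs, but treat the exact option names as assumptions):

```javascript
// Sketch of the shipped Summarizer API with feature detection.
// Summarizer.create() can kick off a multi-gigabyte model download and,
// as noted above, requires user activation (e.g. inside a click handler).
async function summarizeIfAvailable(text) {
  if (typeof Summarizer === "undefined") {
    return null; // not a browser that ships the API
  }
  const availability = await Summarizer.availability();
  if (availability === "unavailable") return null;
  const summarizer = await Summarizer.create({
    type: "tl;dr",
    length: "short",
  });
  return summarizer.summarize(text);
}
```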
    • oyebenny 3 hours ago
      Interesting!
  • veunes 2 hours ago
    It’s a neat idea, but giving a 2B model full JS execution privileges on a live page is a bit sketchy from a security standpoint. Plus, why tie inference to the browser lifecycle at all? If Chrome crashes or the tab gets discarded, your agent's state is just gone. A local background daemon with a "dumb" extension client seems way more predictable and robust, fwiw.
    • shawabawa3 23 minutes ago
      > but giving a 2B model full JS execution privileges on a live page is a bit sketchy from a security standpoint.

      Every webpage I've ever visited has full JS execution privileges and I trust half of them less than an LLM

    • jillesvangurp 2 hours ago
      There's indexed db, opfs, etc. Plenty of ways to store stuff in a browser that will survive your browser restarting. Background daemons don't work unless you install and start them yourself. That's a lot of installation friction. The whole point of a browser app is that you don't have to install stuff.
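      As a sketch of that point, agent state can be persisted so it survives a restart, here via OPFS with an in-memory fallback. This is illustrative, not code from the extension under discussion, and the OPFS calls (`navigator.storage.getDirectory()` and friends) are browser-only.

```javascript
// Persist small bits of state to the Origin Private File System (OPFS)
// when running in a browser, falling back to an in-memory Map elsewhere.
const memoryStore = new Map();

function hasOpfs() {
  return typeof window !== "undefined" && navigator.storage?.getDirectory;
}

async function saveState(key, value) {
  const json = JSON.stringify(value);
  if (hasOpfs()) {
    const root = await navigator.storage.getDirectory();
    const handle = await root.getFileHandle(key, { create: true });
    const writable = await handle.createWritable();
    await writable.write(json);
    await writable.close();
  } else {
    memoryStore.set(key, json); // fallback for non-browser contexts
  }
}

async function loadState(key) {
  if (hasOpfs()) {
    const root = await navigator.storage.getDirectory();
    const handle = await root.getFileHandle(key);
    const file = await handle.getFile();
    return JSON.parse(await file.text());
  }
  const json = memoryStore.get(key);
  return json === undefined ? null : JSON.parse(json);
}
```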

      And what you call sketchy is what billions of people default to every day when they use web applications.

  • emregucerr 5 hours ago
    I would love to see someone build this as some kind of SDK. App builders could use it as a local-LLM plugin when handling sensitive data.

    It's usually too much to ask someone to set up a local LLM for an app, but I believe this could solve that problem?

    • jillesvangurp an hour ago
      It's not too hard to code up with an LLM's help. I've been playing with small embedding models in browsers over the last few weeks, and you don't really need that much. The limitation is that these models are fairly limited and slow to begin with, and they run even slower in a browser, even with WebGPU. But you can do some cool stuff. Adding an LLM is just more of the same.

      If you want to see an example of this, https://querylight.tryformation.com/ is where I put my search library and demo. It does vector search in the browser.
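      The search side really is just arithmetic. A toy brute-force version (not the linked library's actual code; `cosine` and `topK` are hypothetical helpers, and the vectors are assumed to come from an embedding model) looks like:

```javascript
// Brute-force vector search over precomputed embeddings: score every
// document by cosine similarity to the query vector, return the best k.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query, docs, k = 3) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Linear scan is fine at in-browser scale (thousands of documents); an index only starts paying off well beyond that.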

    • winstonp 4 hours ago
      Which apps have you seen ask for someone to setup a local LLM? Can't recall having ever seen one
  • eric_khun 2 hours ago
    It would be awesome if a local model were embedded directly into Chrome and developers could query it.

    Anyone know if this is somehow possible without going through an extension?

  • montroser 5 hours ago
    Not sure if I actually want this (pretty sure I don't) -- but very cool that such a thing is now possible...
  • Morpheus_Matrix 5 hours ago
    [flagged]