6 pointsby bear_with_me2 hours ago2 comments
  • dvtan hour ago
    I experimented with this a few years ago, and token-level search-and-replace can do wonders here. For example, you instruct the model to, whenever it wants to reference the time, to just say "[time]." Then, you intercept the token stream, and always replace it with the proper time; inject/reconstruct the stream and then continue the output. This is sort-of analogous to tool calling, but with much less overhead.

    (Obviously this only works when running local models, as you don't have access to raw token streams in the cloud.)

  • 2 hours ago
    undefined