dvt an hour ago
I experimented with this a few years ago, and token-level search-and-replace can do wonders here. For example, you instruct the model to just say "[time]" whenever it wants to reference the time. Then you intercept the token stream and always replace that placeholder with the actual time, inject/reconstruct the stream, and continue the output. This is sort of analogous to tool calling, but with much less overhead.
(Obviously this only works when running local models, as you don't have access to raw token streams in the cloud.)
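A rough sketch of the interception step, in case it helps. It assumes you already get the output as an iterable of decoded text chunks (the names `substitute_stream`, `PLACEHOLDER`, and `value_fn` are made up for illustration, not from any library). It only covers the output-side replacement; in a real local inference loop you would presumably also feed the substituted text back into the model's context so generation continues from the actual time.

```python
from datetime import datetime
from typing import Callable, Iterable, Iterator

PLACEHOLDER = "[time]"  # hypothetical marker the model is told to emit


def substitute_stream(
    chunks: Iterable[str],
    placeholder: str = PLACEHOLDER,
    value_fn: Callable[[], str] = lambda: datetime.now().strftime("%H:%M"),
) -> Iterator[str]:
    """Yield text chunks with every complete placeholder replaced.

    The placeholder may be split across chunks (e.g. "[ti" + "me]"),
    so a tail that could still grow into it is held back until resolved.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk

        # Replace every complete placeholder currently in the buffer.
        while placeholder in buffer:
            head, buffer = buffer.split(placeholder, 1)
            yield head + value_fn()

        # Hold back the longest suffix that is a proper prefix of the placeholder.
        keep = 0
        for k in range(min(len(buffer), len(placeholder) - 1), 0, -1):
            if buffer.endswith(placeholder[:k]):
                keep = k
                break

        emit, buffer = buffer[: len(buffer) - keep], buffer[len(buffer) - keep :]
        if emit:
            yield emit

    # The stream ended: whatever is left was never a full placeholder.
    if buffer:
        yield buffer


if __name__ == "__main__":
    fake_stream = ["It is ", "[ti", "me]", " right now."]
    print("".join(substitute_stream(fake_stream)))
```

This works on decoded text rather than raw token IDs, which sidesteps tokenizer differences at the cost of a little buffering whenever a chunk ends in something that looks like the start of the placeholder.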