Show HN: Pocache, preemptive optimistic caching for Go(github.com)

97 pointsby bnkamalesh9 months ago6 comments

serialx9 months ago
PSA: You can also use singleflight[1] to solve the problem. This prevents the thundering herd problem. Pocache is an interesting/alternative way to solve thundering herd indeed!
[1]: https://pkg.go.dev/golang.org/x/sync/singleflight
- kbolino9 months ago
  I'm confused by the decision in DoChan to return a channel (instead of accepting one supplied by the caller) and then, given that, also not to close that channel (is something else going to be sent to the channel in the future?). Both seem like strange/unnecessary design decisions.
  - neild9 months ago
    Returning a channel avoids questions of what happens if sending to a caller-supplied channel blocks. DoChan returns a channel with a single-element buffer, so a single send to the channel will always succeed without blocking, even if the caller has lost interest in the result and discarded the channel.
    DoChan doesn't close the channel because there isn't any reason to do so.
    kbolino9 months ago
    A non-blocking send would work just as well for that issue, is a standard part of the language, and would support user-supplied channels, but it would still be at risk of panicking when sending to a closed channel. I think there ought to be a safe way to send to a closed channel, but the language authors disagree, so that's not really on the library authors (though they could still recover from the panic).
    However, not closing the channel you specifically chose to control all sending to is just lazy/rude. Even though the caller should receive from the channel once and then forget about it, closing the channel after sending would prevent incorrect subsequent receives from hanging forever.
    All this having been said, contributing to these libraries seems better than complaining about them, but I don't know how the golang.org/x stuff is maintained; looks like this one is here: https://github.com/golang/sync
    dlock179 months ago
    Closing the channel is pointless. I don't understand why people get obsessive about closing channels.
    It's not needed by the garbage collector, it's not good practice. It's explicitly called out in the official go guide as unnecessary most of the time. [0]
    If you have a channel that is only used a single time and then discarded, closing it is literally just wasting CPU cycles. And definitely not "lazy/rude".
    [0] https://go.dev/tour/concurrency/4
    kbolino9 months ago
    I illustrated why closing the channel is beneficial: the consumer of the channel may not be using it properly. Reading the unclosed channel more than once will hang. A stuck goroutine is rarely desirable. The cost of closing a channel is similar to the cost of bounds checking; it may not be free, but it's usually worth it. Agreed that this has no benefit to the garbage collector. I also think this is a pretty clear example of when you should close a channel, as pointed out by the Tour: to inform the consumer that no more values will ever be forthcoming.
    neild9 months ago
    A non-blocking send doesn't work in this case. Consider: User provides DoChan an unbuffered channel, and then reads a value from it. If the send is nonblocking and occurs before the user reads from the channel, the value is lost.
- bnkamalesh9 months ago
  thank you for the recommendation, was a good read as well. I could even use it to replace how I'm handling the call suppression/debounce mechanism. Though I think Pocache does 1 extra thing, which is to keep the cache updated before it expires, i.e. for keys which are frequently fetched it'd serve up to date data always from the cache. If we only relied on call suppression, then the concurrent requests would just have to wait during the update stage, or the read-through mechanism would keep hitting the main database.
- bnkamalesh9 months ago
  interesting, thanks for that. I'll check it out
bww9 months ago
You may be interested in Groupcache's method for filling caches, it solves the same problem that I believe this project is aimed at.
Groupcache has a similar goal of limiting the number of fetches required to fill a cache key to one—regardless of the number of concurrent requests for that key—but it doesn't try to speculatively fetch data, it just coordinates fetching so that all the routines attempting to query the same key make one fetch between them and share the same result.
https://github.com/golang/groupcache?tab=readme-ov-file#load...
- Thaxll9 months ago
  It's using singleflight which was later on added to the Go std lib:
  https://github.com/golang/groupcache/tree/master/singlefligh...
  https://pkg.go.dev/golang.org/x/sync/singleflight
  - ecnahc5159 months ago
    Just a note that `x/sync` is not part of the Go std lib.
- iudqnolq9 months ago
  Is groupcache suitable for current use? I don't see commits in years and the issues have reports of panics due to bugs.
  - stock_toaster9 months ago
    Indeed. It also looks like there is a maintained fork[1], but no clue with regards to the quality.
    [1]: https://github.com/golang/groupcache/issues/158#issuecomment...
  - derekperkins9 months ago
    Another maintained fork https://github.com/mailgun/groupcache
- bnkamalesh9 months ago
  hey thank you for sharing this. Based on what I understand, this package focuses on the underlying storage mechanism itself rather than helping with the cache strategy. It seems like a solid storage extension which can be used Pocache!
indulona9 months ago
I have implemented my own SIEVE cache, with TTL support. It solves all these issues and requires no background workers. Author, or anyone else interested in this, should read the SIEVE paper/website and implement their own.
- bnkamalesh9 months ago
  that was an interesting read (https://junchengyang.com/publication/nsdi24-SIEVE.pdf), thanks for the recommendation. It looks like a good fit for replacing the underlying storage mechanism of Pocache, instead of the LRU. Though I do not think it addresses the thundering herd problem, where the underlying database would be flooded with calls when the cache expires. I think Pocache is focusing more on the caching strategy itself rather than the cache eviction or storage mechanisms. Hence the store is configurable for Pocache.
  - indulona9 months ago
    SIEVE is merely eviction strategy.if you need to put cache above the database, that has little to do with the cache itself. thundering herd is again different thing altogether and can be easily mitigated by simple queue/custom logic. usually none of these things belong into cache but next to it. meaning, you "get" value and if there is no entry, you proceed to fetch it from wherever you need and set it. to avoid concurrent fetching, you synchronize your code with mutex or queue or whatever. all these things belong into your code, not the cache.
- latchkey9 months ago
  https://github.com/opencoff/go-sieve
  https://github.com/scalalang2/golang-fifo
- bnkamalesh9 months ago
  interesting, will check out
pluto_modadic9 months ago
so.... if I initially got the key "foo" at time T=00:00:00, this library would re-query the backing system until time T=00:00:60? even if I requery it at T=:01? vs... being a write-through cache? I guess you're expecting other entries in the DB to go around the cache and update behind your back.
if you are on that threshold window, why not a period where the stale period is okay? T0-60 seconds, use the first query (don't retrigger a query) T60-120 seconds, use the first query but trigger a single DB query and use the new result. repeat until the key is stale for 600 seconds.
that is, a minimum of 2 queries (the first preemptive one at 60 seconds, (in the cache for 10 minutes total)
and a maximum of 11 queries (over 10 minutes) (the initial one that entered the key, and if people ask for it once a minute, a preemptive one at the end of those minutes, for 20 minutes total in the cache).
- zimpenfish9 months ago
  > if I initially got the key "foo" at time T=00:00:00, this library would re-query the backing system until time T=00:00:60? even if I requery it at T=:01?
  From what I understood of the README (10 minute expiry, 1 minute window) only cache requests between 09:00 to 09:59 will trigger a pre-emptive backing fetch.
  ie. T0-539 seconds uses the first query (no re-fetch), T540-599 does a pre-emptive re-fetch (as long as no-one else is currently doing that), T600- would do a fetch and start the whole timer again.
  - bnkamalesh9 months ago
    @zimpenfish yes you are right. refetch is initiated on the first Get between 9-10mins, and the timer is reset as soon as the back fetch is successful
    NovaX9 months ago
    One optimization for background refresh is coalescing the individual reloads into a batch operation based on a time/space window. Here is how we do it in the Java world. [1]
    [1] https://github.com/ben-manes/caffeine/tree/master/examples/c...
    mh-9 months ago
    Thank you for your OSS work! I used Caffeine many years ago.
    NovaX9 months ago
    Oh thank you, I’m glad it’s been helpful.
    bnkamalesh9 months ago
    aha yes! It's in my todo list to introduce bulk updates. On the other hand, I'll be publishing a batcher package soon which does something very close to what you suggested here. Thank you
tbiehn9 months ago
Interesting idea - do you handle ‘dead keys’ as well? Let’s say you optimistically re-fetch a few times, but no client re-requests?
- bnkamalesh9 months ago
  since the underlying storage is an LRU, I just ignored dead keys. Nope there's no client re-request or retries. That is left upto the "updater" function
sakshamhhf9 months ago
Not work
- bnkamalesh9 months ago
  ??