the perf differences are kinda wild though, didn’t expect gaps that big
would be nice to see a bit more detail on the setup, feels like that could change the results a lot
overall looks like a solid start, curious how it evolves
Part of the reason the timings vary so much is that frameworks don't always pick the right GPU/backend.
If you want to inspect or tweak the setup, be my guest at https://github.com/kvark/inferena