- batch-first instead of real-time
- scoring layer to filter out bad outputs and retry for quality
- simple heartbeat-based scheduling so jobs recover if a node dies
- 4-bit quantization to get models like FLUX.1 onto 8GB cards
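To make the "score and retry" idea concrete, here's a minimal sketch of the loop I have in mind. The threshold, attempt count, and the `generate`/`score` callables are all placeholders, not the actual implementation:

```python
QUALITY_THRESHOLD = 0.7  # assumed cutoff; the real scorer's scale may differ
MAX_ATTEMPTS = 3         # assumed retry budget per job

def generate_until_good(generate, score, max_attempts=MAX_ATTEMPTS):
    """Run a generation job, score the output, and retry until it passes
    the quality bar or the attempt budget runs out.

    `generate` stands in for the model call (e.g. one image), and `score`
    for whatever quality model filters bad outputs. Returns the best
    (output, score) pair seen, so a job always yields *something*.
    """
    best = (None, float("-inf"))
    for _ in range(max_attempts):
        out = generate()
        s = score(out)
        if s > best[1]:
            best = (out, s)
        if s >= QUALITY_THRESHOLD:
            break  # good enough, stop burning GPU time
    return best
```

Keeping the best-so-far output means a batch job degrades gracefully instead of failing outright when the scorer never passes anything.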
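The heartbeat recovery can be sketched as a tiny in-memory scheduler: workers claim jobs and ping periodically, and anything that goes quiet for too long gets requeued. The timeout value and class shape here are illustrative assumptions, not the real code:

```python
import time

STALE_AFTER = 30.0  # assumed: seconds without a heartbeat before a job is requeued

class Scheduler:
    def __init__(self):
        self.queue = []    # pending job ids, FIFO
        self.running = {}  # job_id -> timestamp of last heartbeat

    def submit(self, job_id):
        self.queue.append(job_id)

    def claim(self, now=None):
        """A worker takes the next pending job; start tracking its heartbeat."""
        now = time.time() if now is None else now
        if not self.queue:
            return None
        job_id = self.queue.pop(0)
        self.running[job_id] = now
        return job_id

    def heartbeat(self, job_id, now=None):
        """Workers call this periodically while a job is in progress."""
        now = time.time() if now is None else now
        if job_id in self.running:
            self.running[job_id] = now

    def reap(self, now=None):
        """Requeue jobs whose worker stopped heartbeating (node died)."""
        now = time.time() if now is None else now
        dead = [j for j, t in self.running.items() if now - t > STALE_AFTER]
        for job_id in dead:
            del self.running[job_id]
            self.queue.append(job_id)
        return dead
```

Because jobs are batch-first anyway, a requeued job just runs again later on whichever node is alive; there's no need for fancier failover.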
Right now it’s only a PoC focused on image generation, but long term I’m interested in whether a "scheduler for home GPUs" could work for broader workloads (LLMs, etc.). Curious how people think about the latency/cost tradeoff: would you use something slower but cheaper for background jobs, or is low latency still non-negotiable? I’d love to hear whether this "batch + filtered" approach solves a real pain point for you.
Link: https://runfra.com/