Regarding failures: Wool workers are simple gRPC services under the hood, and connections are long-lived HTTP/2 connections that persist for the life of the request. Worker-side failures simply manifest as Python exceptions on the client side, with the added nicety of preserving the FULL stack trace across worker boundaries (achieved with tbpickle). A core tenet of Wool is that it makes no assumptions about your workload, so I leave it up to you to write a try/except block and handle exceptions in a manner appropriate to your use case. The goal is to keep Wool as unopinionated about this sort of thing as possible.
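To make the "handle it yourself" point concrete, here's a minimal sketch of what caller-side handling could look like. It deliberately does not use Wool's actual API: `flaky_task`, `TransientWorkerError`, and `call_with_retry` are made-up names, and the task is a plain local coroutine standing in for a call that would run on a worker. The only thing it assumes is what's described above, namely that a worker-side failure surfaces to the caller as an ordinary Python exception.

```python
import asyncio


class TransientWorkerError(Exception):
    """Hypothetical error type standing in for a retryable worker-side failure."""


async def flaky_task(attempts: list) -> str:
    # Stand-in for a task dispatched to a worker (hypothetical, not Wool's API).
    # It records each call and fails on the first two.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientWorkerError(f"attempt {len(attempts)} failed")
    return "ok"


async def call_with_retry(task, *args, retries: int = 3, delay: float = 0.0):
    # Caller-side policy: retry the one error type we consider transient and
    # let every other exception propagate untouched.
    for attempt in range(1, retries + 1):
        try:
            return await task(*args)
        except TransientWorkerError:
            if attempt == retries:
                raise  # out of retries, surface the original exception
            await asyncio.sleep(delay)


def main() -> str:
    attempts = []
    return asyncio.run(call_with_retry(flaky_task, attempts))


if __name__ == "__main__":
    print(main())
```

Whether you retry, fall back, or just let it crash is entirely your call; the retry loop here is only one possible policy.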
I'm not sure about your specific needs, but I'm considering adding a simple CLI-based worker management tool for users who don't want or need a full service orchestrator like Kubernetes in their stack.