It’s a LoRA fine-tune of Qwen3-Reranker, in 0.6B and 4B sizes. You give it an instruction and two findings; it returns sigmoid(logit_yes − logit_no) ∈ [0,1]. “Warranted” isn’t a fixed rule, so you program it per graph with a one-sentence instruction, and it generalizes to rules it wasn’t trained on. The same setup applies to RAG filtering, deduplication, and moderation; the graph is just where we first ran into the problem.
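For anyone who wants the scoring step spelled out, here is a minimal sketch of sigmoid(logit_yes − logit_no) in plain Python. It assumes you already have the two raw logits for the "yes" and "no" tokens; the function names `warrant_score` and `softmax_yes` are made up for illustration and are not part of the released code.

```python
import math


def warrant_score(logit_yes: float, logit_no: float) -> float:
    """Return sigmoid(logit_yes - logit_no), a value in [0, 1].

    Hypothetical helper: the two logits are assumed to come from the
    model's "yes" and "no" tokens for one (instruction, finding A,
    finding B) input.
    """
    margin = logit_yes - logit_no
    # Branch on the sign so math.exp never receives a large positive value.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def softmax_yes(logit_yes: float, logit_no: float) -> float:
    """Probability of "yes" under a softmax restricted to the two tokens."""
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


if __name__ == "__main__":
    # Equal logits give 0.5: the model has no preference either way.
    print(warrant_score(0.0, 0.0))
    # A margin of 3 gives roughly 0.953.
    print(round(warrant_score(2.0, -1.0), 3))
```

The sigmoid of the logit difference is the same number as a softmax over just the two tokens, which is why the score can be read as P(yes) given that the answer is either yes or no.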
We also open-sourced WarrantBench for evaluation, since FollowIR only covers relevance instructions. One honest limitation: every rule it judges today is symmetric, so it doesn’t handle direction (“A causes B”) yet. That’s our next step.
Weights: https://huggingface.co/Hanno-Labs/bosun-xs · Bosun-4B: https://huggingface.co/Hanno-Labs/bosun-4b · WarrantBench: https://github.com/Hanno-Labs/warrantbench · writeup: https://hannolabs.ai/field-notes/introducing-bosun. Feel free to ask any questions.