2 points by trashhalo 4 hours ago | 2 comments

trashhalo 4 hours ago
I built Bosun to address a specific problem. When an agent's memory grows as a knowledge graph, every new edge needs to be checked: is it supported, is it non-redundant, is it still true? Without that check, the graph decays into noise that overwhelms the model trying to read it. No cost-effective way to do that step existed, so we built one.
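To make "a check on each new edge" concrete, here is a minimal sketch of where such a gate could sit. It assumes a hypothetical `judge(instruction, a, b)` callable returning a score in [0, 1] and a 0.5 admission threshold. `GatedGraph`, `toy_judge`, and the threshold are illustrative assumptions, not Bosun's API. The word-overlap `toy_judge` only stands in for the model so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical judge signature: (instruction, finding_a, finding_b) -> score in [0, 1].
Judge = Callable[[str, str, str], float]


@dataclass
class GatedGraph:
    """Hypothetical graph wrapper that only admits edges the judge scores as warranted."""
    judge: Judge
    instruction: str
    threshold: float = 0.5  # assumed cut-off, not a documented Bosun default
    edges: list[tuple[str, str]] = field(default_factory=list)
    rejected: list[tuple[str, str, float]] = field(default_factory=list)

    def propose(self, a: str, b: str) -> bool:
        # Score the candidate edge; keep it only if it clears the threshold.
        score = self.judge(self.instruction, a, b)
        if score >= self.threshold:
            self.edges.append((a, b))
            return True
        # Record rejections with their score so noisy proposals stay inspectable.
        self.rejected.append((a, b, score))
        return False


def toy_judge(instruction: str, a: str, b: str) -> float:
    # Stand-in for the model: Jaccard overlap of lowercase word sets (ignores the instruction).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


g = GatedGraph(toy_judge, "Link two findings only if they describe the same event.")
admitted = [
    g.propose("the server crashed at noon", "the server crashed at noon on monday"),
    g.propose("the server crashed at noon", "bananas are yellow"),
]
print(admitted)  # → [True, False]
```

Keeping rejected candidates alongside their scores is a design choice, not a requirement; it makes it easier to audit what the gate filtered out.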
It’s a LoRA fine-tune of Qwen3-Reranker in two sizes, 0.6B and 4B. You give it an instruction and two findings, and it returns sigmoid(logit_yes − logit_no) ∈ [0,1]. “Warranted” isn’t a fixed rule, so you program it per graph with a single sentence, and it generalizes to rules it wasn’t trained on. The same setup applies to RAG filtering, deduplication, and moderation; the graph is just where we first hit the problem.
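The scoring rule is easy to check in isolation. This is a minimal sketch that assumes two things. First, the two findings slot into Qwen3-Reranker's `<Instruct>` / `<Query>` / `<Document>` prompt fields, which is an assumption; Bosun's actual template may differ. Second, you already have the final-position logits for the "yes" and "no" tokens. `format_pair`, `warrant_score`, and `two_way_softmax_yes` are hypothetical helper names.

```python
import math


def format_pair(instruction: str, finding_a: str, finding_b: str) -> str:
    # Assumption: finding A fills the Query slot and finding B the Document slot,
    # following the upstream Qwen3-Reranker prompt layout.
    return f"<Instruct>: {instruction}\n<Query>: {finding_a}\n<Document>: {finding_b}"


def warrant_score(logit_yes: float, logit_no: float) -> float:
    """sigmoid(logit_yes - logit_no), computed without exp overflow."""
    d = logit_yes - logit_no
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # d < 0, so this cannot overflow
    return e / (1.0 + e)


def two_way_softmax_yes(logit_yes: float, logit_no: float) -> float:
    # P("yes") under a softmax restricted to the "yes" and "no" tokens.
    m = max(logit_yes, logit_no)
    ey, en = math.exp(logit_yes - m), math.exp(logit_no - m)
    return ey / (ey + en)


print(warrant_score(0.0, 0.0))  # → 0.5
```

sigmoid(a − b) is algebraically identical to the "yes" probability of a two-token softmax, which matches how the upstream Qwen3-Reranker model-card example turns those two logits into a relevance score. The branch in `warrant_score` only avoids overflow for extreme logit gaps.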
We also open-sourced WarrantBench for evaluation, since FollowIR only covers relevance instructions. One honest limitation: every rule it judges today is symmetric. It doesn’t handle direction (“A causes B”) yet; that’s our next step.
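One way to see why direction is a separate problem: a symmetric rule such as "are these duplicates?" should score the same in either argument order, while "A causes B" should not. In this sketch, two hypothetical toy judges stand in for the model. The sketch measures the order gap and shows that averaging both orders, one possible way to enforce symmetry, erases exactly the signal a directional rule needs. `order_gap`, `symmetric_score`, and both toy judges are illustrative assumptions.

```python
def jaccard_judge(instruction: str, a: str, b: str) -> float:
    # Toy symmetric rule: word-set overlap is identical in either order.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def toy_causal_judge(instruction: str, a: str, b: str) -> float:
    # Toy directional rule: only "rain" -> "wet" counts, never the reverse.
    return 1.0 if "rain" in a and "wet" in b else 0.0


def order_gap(judge, instruction: str, a: str, b: str) -> float:
    """|s(a, b) - s(b, a)|: 0 for a symmetric rule, possibly large for a directional one."""
    return abs(judge(instruction, a, b) - judge(instruction, b, a))


def symmetric_score(judge, instruction: str, a: str, b: str) -> float:
    # Averaging both orders forces symmetry; only sound when the rule is symmetric.
    return 0.5 * (judge(instruction, a, b) + judge(instruction, b, a))


a, b = "heavy rain overnight", "the streets are wet"
print(order_gap(jaccard_judge, "Are these duplicates?", a, b))      # → 0.0
print(order_gap(toy_causal_judge, "Does A cause B?", a, b))         # → 1.0
print(symmetric_score(toy_causal_judge, "Does A cause B?", a, b))   # → 0.5
```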
Weights: https://huggingface.co/Hanno-Labs/bosun-xs · Bosun-4B: https://huggingface.co/Hanno-Labs/bosun-4b · WarrantBench: https://github.com/Hanno-Labs/warrantbench · writeup: https://hannolabs.ai/field-notes/introducing-bosun. Feel free to ask any questions.
fjwood69 4 hours ago
[flagged]