2 points by trashhalo 4 hours ago | 2 comments
  • trashhalo 4 hours ago
    I built Bosun to address a specific problem: an agent’s memory grows as a knowledge graph, and without a way to check whether each new edge is supported, non-redundant, and still true, the graph degrades into noise that overwhelms the model reading it. No cost-effective method existed for that check, so we built one.

    It’s a LoRA fine-tune of Qwen3-Reranker, in 0.6B and 4B sizes. You give it an instruction and two findings; it returns sigmoid(logit_yes − logit_no) ∈ [0,1]. “Warranted” isn’t a fixed rule, so you program it per graph with a sentence, and it generalizes to rules it wasn’t trained on. The same structure applies to RAG filtering, deduplication, and moderation; the graph is just where we first ran into the problem.
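
    For anyone who wants the scoring formula spelled out, here is a minimal plain-Python sketch. It is not Bosun's inference code: the function names `warrant_score` and `softmax_yes` are made up for illustration, and the two logits are assumed to be already extracted from the model's "yes" and "no" tokens. It only shows that sigmoid(logit_yes − logit_no) is the same number as the softmax probability of "yes" over the two options.

```python
import math


def warrant_score(logit_yes: float, logit_no: float) -> float:
    """Hypothetical helper: sigmoid(logit_yes - logit_no), computed stably."""
    diff = logit_yes - logit_no
    if diff >= 0:
        # exp(-diff) <= 1 here, so this branch cannot overflow.
        return 1.0 / (1.0 + math.exp(-diff))
    # diff < 0: exp(diff) < 1, so this branch cannot overflow either.
    e = math.exp(diff)
    return e / (1.0 + e)


def softmax_yes(logit_yes: float, logit_no: float) -> float:
    """Probability of 'yes' under a two-way softmax over (yes, no)."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


if __name__ == "__main__":
    # Made-up logits, purely illustrative.
    for ly, ln in [(0.0, 0.0), (2.0, -1.0), (-3.5, 1.5)]:
        print(ly, ln, warrant_score(ly, ln), softmax_yes(ly, ln))
```

    Equal logits give exactly 0.5, and swapping the two logits maps a score s to 1 − s. Any cutoff for keeping or dropping an edge is a separate choice that this sketch does not make.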

    We also open-sourced WarrantBench for evaluation, as FollowIR only includes relevance instructions. An honest limitation is that every rule it judges today is symmetric—it doesn’t handle direction (“A causes B”) yet; that’s our next step.

    Weights: https://huggingface.co/Hanno-Labs/bosun-xs · Bosun-4B: https://huggingface.co/Hanno-Labs/bosun-4b · WarrantBench: https://github.com/Hanno-Labs/warrantbench · writeup: https://hannolabs.ai/field-notes/introducing-bosun. Feel free to ask any questions.

  • fjwood69 4 hours ago
    [flagged]