Mode A (activation transfer) operates at the representation level, not the parameter level. The source model's knowledge gets projected into a 2048-dim hub space, so the receiving model doesn't need to match the source architecturally or in precision. A 200M FP32 training model and a 5M INT8 edge model can both have UHS encoders/decoders. The hub space is agnostic to what's underneath.
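To make Mode A concrete, here's a toy NumPy sketch. The 2048 hub width comes from the post; everything else (the native dims, the random-linear codecs) is a hypothetical stand-in for trained UHS encoders/decoders:

```python
import numpy as np

rng = np.random.default_rng(0)
HUB_DIM = 2048  # hub width from the post; other dims here are illustrative

def make_codec(native_dim, hub_dim=HUB_DIM):
    # Stand-in for a trained UHS encoder/decoder pair: a random linear
    # projection into hub space and its pseudo-inverse back out.
    enc = rng.standard_normal((native_dim, hub_dim)) / np.sqrt(native_dim)
    dec = np.linalg.pinv(enc)
    return enc, dec

# A wide "training model" and a narrow "edge model", different native widths.
enc_big, _ = make_codec(1024)
_, dec_small = make_codec(64)

# Transfer: big model's activation -> hub -> small model's native space.
# Neither side sees the other's architecture or numeric precision.
act_big = rng.standard_normal(1024).astype(np.float32)
hub_vec = act_big @ enc_big        # native (1024,) -> hub (2048,)
act_small = hub_vec @ dec_small    # hub (2048,) -> native (64,)
```

The point of the sketch is the shapes: both sides only ever agree on the 2048-dim interface.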
Mode B (behavioural) is probably the most interesting path for your use case. It transfers decision boundaries rather than activations or weights. If the quantised model can reproduce the input-output mapping, internal precision is irrelevant.
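A minimal sketch of the Mode B idea, assuming a linear toy teacher (names and setup are mine, not the tessera-core API): probe the teacher's input-output mapping, fit a student, quantise the student's weights to an INT8 grid, and check that the decisions still match.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical full-precision teacher: a fixed linear decision boundary.
w_teacher = rng.standard_normal(16).astype(np.float32)
def teacher(x):
    return (x @ w_teacher > 0).astype(np.int8)

# Record the teacher's behaviour on probe inputs, fit a student to it.
X = rng.standard_normal((512, 16)).astype(np.float32)
y = np.where(teacher(X) > 0, 1.0, -1.0)
w_student, *_ = np.linalg.lstsq(X, y, rcond=None)

# Quantise the student's weights to an INT8 grid: only the mapping matters,
# not the internal precision it's stored at.
scale = np.abs(w_student).max() / 127
w_q = np.round(w_student / scale).astype(np.int8)
def student(x):
    return (x @ (w_q.astype(np.float32) * scale) > 0).astype(np.int8)

agreement = (student(X) == teacher(X)).mean()
```

`agreement` lands near 1.0 because the decision boundary survives quantisation even though the weight values don't match bit-for-bit.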
It's similar in spirit to distillation but decoupled through the hub space — teacher and student don't need to be online simultaneously, and you get a full audit trail of what knowledge went where (which matters if you're shipping medical/industrial edge models under EU AI Act).
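On the audit-trail point, here's a sketch of what one transfer record might capture; the field names and hashing scheme are hypothetical, not the tessera-core schema:

```python
from dataclasses import dataclass, field
import hashlib
import json
import time

def fingerprint(payload) -> str:
    # Content hash so a record can be checked against the data it describes.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

@dataclass
class TransferRecord:
    """Hypothetical audit entry for one behavioural transfer: enough to answer
    'which teacher taught which student what, and when' after the fact."""
    teacher_id: str
    student_id: str
    probe_set_hash: str    # hash of the inputs used to elicit behaviour
    response_hash: str     # hash of the teacher's recorded outputs
    timestamp: float = field(default_factory=time.time)

rec = TransferRecord(
    teacher_id="trainer-200M-fp32",
    student_id="edge-5M-int8",
    probe_set_hash=fingerprint([[0.1, 0.2], [0.3, 0.4]]),
    response_hash=fingerprint([1, 0]),
)
```

Because teacher and student never meet, this record is the only place the lineage lives, which is what makes it auditable.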
The gap today is the decoder side. DecoderMLP outputs FP32. We'd need a quantisation-aware variant that respects the INT8 grid — straight-through estimator at minimum, learned quantisation boundaries ideally. We'd also want empirical drift characterisation across FP32→FP16→INT8→INT4 so you'd know your expected fidelity floor for a given target.
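The straight-through-estimator variant would look roughly like this (a sketch, not the DecoderMLP code; symmetric per-tensor INT8 with a fixed scale is an assumption):

```python
import numpy as np

def fake_quant_int8(x, scale):
    # Forward pass: snap values to the INT8 grid the edge model will use.
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale

def fake_quant_ste_grad(x, scale, upstream_grad):
    # Straight-through estimator: round() has zero gradient almost everywhere,
    # so pass the upstream gradient through unchanged wherever the input landed
    # inside the representable INT8 range, and zero it outside (clipped region).
    inside = np.abs(np.round(x / scale)) <= 127
    return upstream_grad * inside
```

Training the decoder head through `fake_quant_int8` with this gradient rule is the minimum version; learned quantisation boundaries would make `scale` (and possibly per-channel offsets) trainable parameters.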
The swarm angle is where it gets genuinely useful for edge fleets. If you've got N devices training locally on on-site data, they contribute quantised-model tokens back to a full-precision aggregator. The robust aggregation strategy (Huber-style cosine clipping) handles quantisation noise across heterogeneous devices naturally.
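One plausible reading of "Huber-style cosine clipping", sketched below; this is my interpretation, not the actual aggregator: cap each device update's influence at the median norm (the Huber-like part), then weight it by cosine similarity to the coordinate-wise median direction, flooring at zero so badly misaligned updates contribute nothing.

```python
import numpy as np

def robust_aggregate(updates, cos_floor=0.0):
    # Hypothetical robust aggregation sketch for heterogeneous device updates.
    U = np.stack(updates)
    med = np.median(U, axis=0)                       # robust reference direction
    med_dir = med / (np.linalg.norm(med) + 1e-12)
    norms = np.linalg.norm(U, axis=1)
    cap = np.median(norms)
    # Huber-like influence cap: shrink oversized updates to the median norm.
    clipped = U * np.minimum(1.0, cap / (norms + 1e-12))[:, None]
    # Cosine weighting: downweight updates pointing away from consensus.
    cos = clipped @ med_dir / (np.linalg.norm(clipped, axis=1) + 1e-12)
    w = np.maximum(cos, cos_floor)
    return (w[:, None] * clipped).sum(axis=0) / (w.sum() + 1e-12)
```

Quantisation noise mostly perturbs update magnitudes and directions mildly, so honest INT8 devices pass through nearly untouched, while one wildly wrong contributor gets clipped and zero-weighted.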
We're planning a quantisation-aware transfer module next. If you're interested in testing against real Cortex-A INT8 workloads, we'd welcome the collaboration — repo is at github.com/incocreativedev/tessera-core.