6 points by kfallah 8 hours ago | 3 comments
  • kfallah 8 hours ago
    CLaaS is an open-source system that uses self-distillation to move feedback from context into model weights. Current approaches rely on system prompts and memory to personalize your model, but every token spent on reminders is a token your model can't use for the actual task. Instead, CLaaS triggers a weight update with every piece of feedback while avoiding the catastrophic forgetting you get with standard fine-tuning. The updated LoRA adapter hot-reloads into vLLM, so your next response comes from a better model.

    Right now it runs on a single consumer GPU (tested on an RTX 5090) with Qwen3-8B. It's easy to set up with Docker Compose alongside a locally hosted OpenClaw, but the API works with any local model.
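
    For anyone curious what the hot-reload step might look like, here is a minimal sketch. It is not taken from the CLaaS code. It assumes a vLLM OpenAI-compatible server started with runtime LoRA updating enabled (VLLM_ALLOW_RUNTIME_LORA_UPDATING=True) and uses vLLM's /v1/load_lora_adapter and /v1/unload_lora_adapter endpoints. The base URL, the adapter name, the adapter path, and the helper names build_lora_request and hot_reload are all hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: vLLM's OpenAI-compatible server is listening here.
VLLM_BASE_URL = "http://localhost:8000"


def build_lora_request(action, lora_name, lora_path=None, base_url=VLLM_BASE_URL):
    """Build (but do not send) a POST for vLLM's dynamic LoRA endpoints."""
    if action not in ("load", "unload"):
        raise ValueError("action must be 'load' or 'unload'")
    payload = {"lora_name": lora_name}
    if action == "load":
        if lora_path is None:
            raise ValueError("lora_path is required when loading an adapter")
        payload["lora_path"] = lora_path
    return Request(
        url=f"{base_url}/v1/{action}_lora_adapter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def hot_reload(lora_name, lora_path, send=urlopen):
    """Swap in freshly trained adapter weights under the same adapter name.

    `send` is injectable so the flow can be exercised without a live server.
    """
    try:
        # Drop the previous version; an HTTP error here is treated as
        # "nothing was loaded yet" and ignored.
        resp = send(build_lora_request("unload", lora_name))
        if hasattr(resp, "close"):
            resp.close()
    except HTTPError:
        pass
    resp = send(build_lora_request("load", lora_name, lora_path))
    if hasattr(resp, "close"):
        resp.close()
```

    After a weight update writes a new adapter directory, something like hot_reload("user-adapter", "/adapters/step-2") would make later requests that pass model="user-adapter" use the new weights. How CLaaS actually sequences this may well differ.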

  • zerocks2503 3 hours ago
    That's sick
  • matiszz 4 hours ago
    Cool project!