2 points by fredmendoza 3 hours ago | 1 comment
  • fredmendoza 3 hours ago
    We put all 4 Gemma 4 models in one Telegram bot. Text it, send voice memos, send docs, send photos. Switch between the 2B and the 31B (ranked #3 worldwide) mid-conversation with a slash command.
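    Mid-conversation switching can be sketched as a tiny per-chat router. This is a minimal sketch, not the bot's actual code: the `/model` command, the model names, and the `MODELS` registry are all assumptions for illustration.

```python
# Hypothetical per-chat model switching via a slash command.
# Each chat keeps its own state dict; a /model command updates it,
# and any other message is answered by the currently selected model.

MODELS = {"2b": "gemma-4-2b", "31b": "gemma-4-31b"}  # made-up registry

def handle_message(chat_state: dict, text: str) -> str:
    """Handle one incoming Telegram message for one chat."""
    if text.startswith("/model"):
        _, _, choice = text.partition(" ")
        choice = choice.strip()
        if choice in MODELS:
            chat_state["model"] = MODELS[choice]
            return f"Switched to {chat_state['model']}"
        return f"Unknown model; options: {', '.join(MODELS)}"
    # Regular message: route to whichever model this chat last selected,
    # defaulting to the small one.
    return chat_state.setdefault("model", MODELS["2b"])
```

    In a real bot the returned model name would pick which backend answers; here it just shows the routing shape.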

    Each model runs its own script on its own hardware. The 2B doesn't burn A100 hours; the 31B doesn't get squeezed onto a tiny card. We keep everything in FP16 for full quality, but if you drop to INT4 you can run these on CPUs for basically nothing.
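    The FP16-vs-INT4 tradeoff is just arithmetic on bytes per weight. A rough weights-only estimate (ignoring activations and KV cache, which add more):

```python
def model_memory_gb(params_billion: float, bits: int) -> float:
    """Rough weights-only memory footprint: params * (bits / 8) bytes."""
    return params_billion * 1e9 * bits / 8 / 1e9

# 31B at FP16 -> ~62 GB of weights (needs a big GPU);
# 31B at INT4 -> ~15.5 GB (fits in ordinary server RAM);
# 2B at FP16  -> ~4 GB (runs almost anywhere).
```

    That's why the small model never needs the big card, and why INT4 makes CPU inference plausible.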

    The Telegram thread persists your conversation, and we feed that context into the prompt on every call. So when you come back two days later it actually remembers who you are and what you were talking about. No vector database, no fancy memory system. Just the chat doing what chat already does.
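    The "chat is the memory" idea can be sketched as: replay the stored thread into the prompt on every call. The role labels and join format below are assumptions; the real prompt template isn't shown in the post.

```python
# Hypothetical prompt builder: the Telegram thread IS the memory.
# Every turn in the thread is replayed verbatim, then the new message
# is appended and the model is asked to continue as "assistant".

def build_prompt(history: list[tuple[str, str]], new_message: str) -> str:
    """Turn (role, text) pairs plus the new user message into one prompt."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"user: {new_message}")
    lines.append("assistant:")
    return "\n".join(lines)
```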

    Hardware spins up when you message, shuts down when done. No idle cost. The 31B costs about a penny per message. The 2B costs basically nothing.
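    With scale-to-zero, per-message cost is just runtime times the hourly rate. The $2/hr rate and ~18s runtime below are made-up numbers to show how "about a penny" could pencil out, not the post's actual figures.

```python
def cost_per_message(gpu_hourly_usd: float, seconds: float) -> float:
    """Compute cost of one message when hardware only runs while generating."""
    return gpu_hourly_usd * seconds / 3600

# e.g. a hypothetical $2/hr GPU generating for ~18 seconds costs ~$0.01
```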

    We built this on SeqPU mainly to show how fast you can go from "new model just dropped" to "anyone can text it and try it." Idea to shareable product in 10 minutes. Works with any model, open source or API.

    Try it: t.me/OpenGemma4Bot (grab a free key at seqpu.com)

    Full writeup: https://seqpu.com/UseGemma4In60Seconds

    Our Stab at Safe Agent Systems: https://seqpu.com/Encapsulated-Agentics