    Let’s be real: writing a novel with an LLM is a nightmare.

    You start with a great premise. The first 3 chapters are fire. Then, around Chapter 7, the AI forgets your protagonist has a scar on their left eye. By Chapter 12, the villain’s motivation has completely flipped, and the tone shifts from “Dark Fantasy” to “YA Romance.”

    The problem isn’t the model’s intelligence. It’s the Context Window Trap.

    We treat LLMs like chatbots. We feed them the last 20k tokens and hope they “get it.” But that’s not how stories work. Stories have State.

    Inventory: Does the hero have the key? (True/False)

    Relationships: Is X angry at Y? (0–100 scale)

    Timeline: Did event A happen before event B?

    If you were coding, you wouldn’t ask an LLM to “guess” the value of a variable based on the last 500 lines of comments. You would just read the variable.
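    In code terms, that state is just a struct you can read. A minimal sketch (field names illustrative, not our actual schema):

        from dataclasses import dataclass, field

        @dataclass
        class StoryState:
            """Explicit story state: read the variable instead of guessing it."""
            inventory: dict = field(default_factory=dict)      # "John" -> {"rusty sword"}
            relationships: dict = field(default_factory=dict)  # ("Sarah", "John") -> 5 on a 0-100 scale
            traits: dict = field(default_factory=dict)         # "John" -> {"scar over left eye"}
            timeline: list = field(default_factory=list)       # event ids, in story order

            def happened_before(self, a: str, b: str) -> bool:
                # Did event a happen before event b?
                return self.timeline.index(a) < self.timeline.index(b)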

    So, I decided to build an IDE (Integrated Development Environment) for writers. We call it Minotauris.

    The Architecture: “Cursor” for Prose

    I realized that to fix long-form generation, we needed to separate the Creative Agent (the writer) from the State Manager (the librarian).

    Most tools (Jasper, Sudowrite, ChatGPT) combine these. They try to be creative and consistent in the same pass. That’s too much cognitive load for one inference.

    Our Solution: Parallel Agentic Workflow

    We built a system where every time you write (or the AI generates), a secondary “Watcher Agent” runs in the background.

    The User Input: You type a scene.

    The Watcher (Parallel): This agent analyzes the new text for factual assertions.

    “John picked up the rusty sword.” -> Inventory Update: John +1 Rusty Sword

    “Sarah glared at him, still furious about the betrayal.” -> Relationship Update: Sarah -> John (Status: Hostile)

    The Diff: The system compares this new state against the Global Story Map. If there’s a conflict (e.g., John already lost his hands in Chapter 4), it flags it immediately.
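    Roughly, the Watcher’s pass looks like this (a sketch building on the StoryState above; the extraction inference is stubbed, and every name here is hypothetical):

        def extract_facts(scene_text: str) -> list:
            # Stub standing in for the Watcher's structured-extraction inference.
            return [
                {"kind": "inventory", "who": "John", "item": "rusty sword"},
                {"kind": "relationship", "a": "Sarah", "b": "John", "score": 5},
            ]

        def watcher_pass(scene_text: str, story_map: StoryState) -> list:
            """Extract assertions from new text, apply them, and flag contradictions."""
            conflicts = []
            for fact in extract_facts(scene_text):
                if fact["kind"] == "inventory":
                    who, item = fact["who"], fact["item"]
                    if "lost both hands" in story_map.traits.get(who, set()):
                        conflicts.append(f"{who} picks up '{item}' but lost both hands in an earlier chapter")
                    else:
                        story_map.inventory.setdefault(who, set()).add(item)
                elif fact["kind"] == "relationship":
                    story_map.relationships[(fact["a"], fact["b"])] = fact["score"]
            return conflicts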

    Why Speed is Everything (Enter Minimax 2.5)

    This architecture sounds great, but it requires 2–3 inferences for every single user turn.

    Inference 1: Generate text.

    Inference 2: Extract entities.

    Inference 3: Check consistency.

    If we used GPT-4o, this would cost a fortune and take 5 seconds per message. That breaks the flow state.

    We switched to Minimax 2.5.

    The latency is effectively invisible: we’re seeing sub-200ms response times for the “Watcher” agents. This lets us update the Live Story Map in real time in the sidebar without the user ever feeling a lag.
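    In pseudocode, a single turn gets scheduled something like this (asyncio sketch, with the actual model calls stubbed):

        import asyncio

        # Stubs standing in for the real model calls (the Watcher agents run on Minimax).
        async def generate_scene(prompt): return f"[scene for: {prompt}]"  # Inference 1
        async def extract_entities(text): return [{"kind": "inventory"}]   # Inference 2
        async def check_consistency(facts): return []                      # Inference 3

        async def handle_turn(prompt, on_map_update):
            scene = await generate_scene(prompt)  # the writer sees this immediately
            async def watch():                    # Watcher work stays off the critical path
                facts = await extract_entities(scene)
                on_map_update(facts, await check_consistency(facts))
            asyncio.create_task(watch())          # fire-and-forget: no perceived lag
            return scene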

    The Feature: The Live Story Map

    Instead of a static “World Bible” that you have to manually update (which no writer does), Minotauris builds the wiki for you.

    It tracks antagonists. It tracks plot threads. It tracks character physical traits. If you try to write a scene that contradicts the map, the AI nudges you. It’s like a compiler error, but for plot holes.
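    The nudge itself can literally read like a compiler diagnostic (format invented for illustration):

        def render_conflict(new, old):
            # Compiler-style diagnostic for a plot hole.
            return (f'{new["file"]}:{new["line"]}: plot error: {new["claim"]}\n'
                    f'    conflicts with {old["file"]}:{old["line"]}: {old["claim"]}')

        print(render_conflict(
            {"file": "ch12.md", "line": 87,  "claim": "John picked up the rusty sword"},
            {"file": "ch04.md", "line": 213, "claim": "John lost both hands"},
        ))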

    The “Unlimited” Problem

    We are currently testing a unique rate-limit approach. Because Minimax and our Llama 8b (via Groq) implementation are so efficient, we can offer “True Unlimited” tiers for pro users. We don’t want to punish you for being in the flow state.

    We’re experimenting with a high-volume daily bucket rather than an hourly cap, so if you want to binge-write for 8 hours on a Saturday, you can.
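    A daily bucket is simple to reason about. One way to sketch it (limits invented for illustration):

        import time

        class DailyBucket:
            """Rolling 24-hour quota instead of an hourly cap."""
            def __init__(self, daily_limit=2000):  # generations per rolling day
                self.daily_limit = daily_limit
                self.stamps = []                   # timestamps of recent generations

            def allow(self):
                now = time.time()
                self.stamps = [t for t in self.stamps if now - t < 86400]  # drop >24h old
                if len(self.stamps) >= self.daily_limit:
                    return False
                self.stamps.append(now)
                return True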

    What’s Next?

    We are currently in private beta with a small waitlist. We aren’t trying to scale to 1M users overnight. We are looking for power users: writers who are tired of “chatting” with their stories and want to “build” them.

    If that’s you, grab a spot.

    https://www.minotauris.app/waitlist