1 pointby mx-Liu1235 hours ago2 comments
  • mx-Liu1235 hours ago
    I built AgentCommander to automate the manual "trial-and-error" loops in my PhD Physics/ML research.

    While tools like OpenEvolve (population evolution) and RD-Agent (Kaggle-style automation) exist, I found them difficult to customize for specific, multi-step research workflows. I needed a system that allowed granular control over the agent's decision process—specifically, how it learns from errors and inherits code states.

    AgentCommander solves this by providing:

    Visual Graph Execution: Workflows are defined as directed graphs, allowing for complex loops, conditional branches, and human-in-the-loop checkpoints.

    Evolutionary Tree Tracking: It treats every iteration as a node in a tree. The agent automatically branches off the current "global optimum" rather than a linear history, preventing regression.

    Snapshot Integrity: To prevent LLM hallucination or "cheating" (e.g., modifying test cases), the system uses filesystem snapshots to enforce strict read-only permissions on evaluation logic.

    Native CLI Wrapper: Built on top of Gemini/Qwen CLI to leverage their native tool-use capabilities while enforcing a sandboxed working directory.

    The project is open source (Apache 2.0) and written in Python.

    Repo: https://github.com/mx-Liu123/AgentCommander

  • mx-Liu1235 hours ago
    Author's Note:

    A few technical details for those looking to try AgentCommander:

    Why Gemini/Qwen CLI?: I chose these as backends because they offer robust directory isolation. I tried integrating Claude Code, but found it difficult to restrict its file-system reach. Qwen CLI is a great alternative if you want an OpenAI-compatible API with a generous free tier (2,000 requests/day).

    Environment: Ensure you have Python 3.10+ and the latest Node.js for the Gemini CLI. If you see Node version warnings, please upgrade to the latest LTS to avoid CLI instability.

    Verification: You can audit the agent's "thought process" by running gemini -r inside any generated experiment directory. It’s crucial for verifying that the agent isn't hallucinating its research logic.

    I'm currently in Singapore (SGT). I'll stay online for as long as I can to discuss architecture or implementation details, but I'll catch up on all pending questions first thing in the morning!

    Repo: https://github.com/mx-Liu123/AgentCommander