1 point by ashmil 2 hours ago | 1 comment
    Hi HN,

    I was spending over 5 hours manually testing my Agentic AI application before every patch and release. While automating my API and backend tests was straightforward, testing the actual chat UI was a massive bottleneck. I had to sit there, type out prompts, wait for the AI to respond, read the output, and ask follow-up questions. As the app grew, releases started taking longer just because of manual QA.

    To solve this, I built Mantis. It’s an automated UI testing tool designed specifically to evaluate LLM and Agentic AI applications right from the browser.

    Here is how it works under the hood:

    Define Cases: You define the use cases for your LLM app, and for each one the specific test cases (prompts and the behavior you expect) to evaluate.

    Browser Automation: A Chrome agent takes control of your application's UI in a tab.

    Execution: It simulates a real user by typing the test questions into the chat UI and clicking send.

    Evaluation: It waits for the response, analyzes the LLM's output, and can even ask context-aware follow-up questions if the test case requires it.

    Reporting: Once a sequence is complete, it moves to the next test case. Everything is logged and aggregated into a dashboard report.

    The biggest win for me is that I can now just kick off a test run in a background Chrome tab and get back to writing code while Mantis handles the tedious chat testing.

    I’d love to hear your thoughts. How are you all handling end-to-end UI testing for your chat apps and AI agents? Any feedback or questions on the approach are welcome!