1 point by ashmil 2 hours ago | 1 comment
    Hi HN,

    I was spending over 5 hours manually testing my Agentic AI application before every patch and release. While automating my API and backend tests was straightforward, testing the actual chat UI was a massive bottleneck. I had to sit there, type out prompts, wait for the AI to respond, read the output, and ask follow-up questions. As the app grew, releases started taking longer just because of manual QA.

    To solve this, I built Mantis. It’s an automated UI testing tool designed specifically to evaluate LLM and Agentic AI applications right from the browser.

    Here is how it works under the hood:

    Define Cases: You define the use cases for your LLM app, and for each one the specific test cases (prompts and the behavior you expect) to evaluate.

    Browser Automation: A Chrome agent takes control of your application's UI in a tab.

    Execution: It simulates a real user by typing the test questions into the chat UI and clicking send.

    Evaluation: It waits for the response, analyzes the LLM's output, and can even ask context-aware follow-up questions if the test case requires it.

    Reporting: Once a sequence is complete, it moves to the next test case. Everything is logged and aggregated into a dashboard report.

    The biggest win for me is that I can now just kick off a test run in a background Chrome tab and get back to writing code while Mantis handles the tedious chat testing.

    I’d love to hear your thoughts. How are you all handling end-to-end UI testing for your chat apps and AI agents? Any feedback or questions on the approach are welcome!