3 points by tornikeo 10 days ago | 2 comments

jackfranklyn 10 days ago
Playwright + something like Claude with computer use is probably closest to what you're describing. Though I'd push back a bit on the approach.
The flaky test problem usually comes from either race conditions (waiting for the wrong things) or environmental differences. Adding AI vision on top often adds another layer of flakiness - now you're debugging "why did the model misread this button" on top of "why did the test time out."
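To illustrate the "waiting for the wrong things" point, here is a minimal, dependency-free sketch that polls for an actual condition instead of sleeping for a fixed time. `waitFor` and `startFakeApp` are hypothetical names made up for this example, not part of Playwright or any other library, and the timeouts are arbitrary.

```typescript
// Poll a condition until it holds or the deadline passes.
// Hypothetical helper for illustration only; not a Playwright API.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  timeoutMs: number = 2000,
  intervalMs: number = 25,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Check the real condition first, so a fast app is never delayed.
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // Back off briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated app: becomes "ready" after a delay the test does not control.
function startFakeApp(delayMs: number): { isReady: () => boolean } {
  let ready = false;
  setTimeout(() => {
    ready = true;
  }, delayMs);
  return { isReady: () => ready };
}

// Racy version (commented out): passes only if the app is ready within 100 ms.
//   await new Promise((r) => setTimeout(r, 100));
//   if (!app.isReady()) throw new Error("flaky failure");
//
// Condition-based version: waits exactly as long as the app needs.
//   const app = startFakeApp(50);
//   await waitFor(app.isReady, 1000);
```

Playwright's own retrying assertions, such as `await expect(locator).toBeVisible()`, apply the same idea, so a hand-rolled helper like this is rarely needed there.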
For mocking external services specifically - tools like MailHog (email) or mock OAuth providers tend to be more reliable than screenshot-based approaches. The determinism matters.
That said, if you genuinely need to test against production-like visual state - Playwright's screenshot comparison (toHaveScreenshot) combined with proper wait strategies has gotten pretty solid. The visual regression approach catches layout bugs that functional tests miss.
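A minimal sketch of that combination, assuming a project with `@playwright/test` installed. The URL, heading name, snapshot file name, and diff threshold are hypothetical placeholders, not values from this thread.

```typescript
import { test, expect } from '@playwright/test';

test('dashboard matches the visual baseline', async ({ page }) => {
  // Hypothetical URL; replace with the page under test.
  await page.goto('https://example.com/dashboard');

  // Wait for something meaningful to be rendered before capturing.
  // This assertion retries until the heading is visible or it times out.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();

  // Compare against the stored baseline image.
  await expect(page).toHaveScreenshot('dashboard.png', {
    animations: 'disabled',   // freeze CSS animations for a stable capture
    maxDiffPixelRatio: 0.01,  // assumed tolerance; tune per project
  });
});
```

The first run records the baseline image (and reports that no snapshot existed yet); later runs compare against it.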
solusmukess 10 days ago
I think you are looking for something like seekbytes.com. Please let me know if you're interested; I am the owner of this work.