I was reading the blog post about bot detection with browsers, where the first layer is the IP address of the browser.
One rather unique scenario I've been trying to work out for a scraper is eliminating network latency. My use of the site benefits from the browser's requests having the lowest possible RTT to the web server, which means being in the same cloud provider.
To do this right now, I manually navigate to the site and have a browser extension that clicks at just the right time.
I'd really like to eliminate that manual navigation but every time I've tried adding browser automation outside of the single click from the extension, I'm immediately met with bot detection.
Obviously adding a residential proxy step completely defeats the purpose of the RTT latency optimization.
Do modified browsers drive the overall bot detection heuristic low enough that the cloud IP address itself isn't a red flag? I've seen Camoufox and will try it at some point. What other options are available to drive down the overall "score" so I can still automate the browser but keep the latency low?
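For anyone wanting to quantify the RTT difference between hosting locations (or between a direct connection and a proxy hop), here is a minimal Python sketch. It times the TCP handshake, which takes roughly one network round trip, and it ignores TLS and HTTP overhead. The function name `tcp_connect_rtt_ms` and its parameters are made up for illustration and are not from the blog post or any product.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host: str, port: int = 443, samples: int = 5,
                       timeout: float = 3.0) -> float:
    """Return the median TCP connect time to host:port in milliseconds.

    Hypothetical helper: connect time is only an approximation of one
    round trip and does not include TLS or HTTP request time.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000.0)
    return statistics.median(timings)


# Example call (needs network access; the hostname is a placeholder):
# print(tcp_connect_rtt_ms("example.com"))
```

Running the same measurement from a machine in the target's cloud region and from behind a residential proxy should make the latency trade-off concrete.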
Also, one of our engineers did a write-up on bot detection systems and how they work - https://intunedhq.com/blog/how-bot-detection-works
On your automation, your tool fed back to me as follows after 3 submissions:
> The CAPTCHA is persistently blocking now — Prosopo's widget appears to have flagged the session/IP due to the repeated submissions. The checkbox won't reset this time. This is expected behavior from their bot protection product. To submit again, you'd likely need to wait a while for the rate limit to clear, or submit manually from your own browser.
I feel that you'll end up being an automation agency (you mentioned UiPath). Companies that have the skills and capacity to build will not need your service, but for those who want the full service, you might fill a gap.
I wish you all the best.
Based on your YC page, you went through a couple of pivots over the last few years:
- 4 years ago: Intuned - The data assistant for engineering leaders [0]
- 2 years ago: Intuned - The browser automation platform for developers and product teams [1]
- 1 year ago: Intuned Auth Sessions - Build authenticated scrapers and RPA [2]
What was the evolution from YC S22 4 years ago to today's launch? How did you find your differentiation in a highly commoditized space? Even within YC, there are many competitors like Firecrawl, Reworkd, BrowserUse, NotteLabs, Browserbase, etc.
Another thing that might interest HN: AI crawlers come with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported here on HN (and experienced myself).
Does Intuned respect robots.txt directives and do you disclose the identity of your crawlers via user-agent header?
[0] https://www.ycombinator.com/launches/Gqr-intuned-the-data-as...
[1] https://www.ycombinator.com/launches/LGE-intuned-the-browser...
[2] https://www.ycombinator.com/launches/Lpq-intuned-auth-sessio...
On your question about how this is different: I think if you dig into those products you will see that our focus is different. Many of the companies mentioned are focused on powering agents via APIs, and some are focused on enabling users to use AI at runtime. We do feel that our product is somewhat differentiated. The closest one is possibly Reworkd, and I would still say the product is somewhat different. Now, the hardest part is actually communicating this to customers and the market in general, and there we have a lot to figure out!
On the robots.txt and user-agent question, we think of ourselves as providing infrastructure and flexibility for our customers to do what they want. We do encourage customers in our docs to respect robots.txt, but we don't enforce it at the platform level.
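For readers who want to respect robots.txt in their own automation, a check can be sketched with Python's standard-library `urllib.robotparser`. This only illustrates the general technique and is not how Intuned does anything. The helper name `is_allowed`, the user agent string, and the sample rules are all made up.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-downloaded robots.txt content.

    Hypothetical helper: in a real crawler you would fetch robots.txt
    from the site first (RobotFileParser.set_url plus read can do that).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_ROBOTS, "example-crawler",
                 "https://example.com/private/data"))  # False
print(is_allowed(EXAMPLE_ROBOTS, "example-crawler",
                 "https://example.com/public/page"))   # True
```

Sending the same `user_agent` string in the request's User-Agent header is what lets site owners identify the crawler.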
Appreciate you taking the time to leave this comment - very thoughtful
Also, imagine that you have a case where you want to scrape 10,000 records from a website. Why have AI navigate to every page to do this? Why not write the code, run it, and get consistent and fast results? It's also predictable: if it messes up, you know what happened and you can trace it to the exact line of code.
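As a rough illustration of the "write the code once, run it many times" point, here is a small Python sketch of a deterministic paginated scraper using only the standard library. Everything in it is an assumption made for the example: the `<td class="name">` markup, the `fetch_page` callable, and the function names are hypothetical, and real templates would more likely drive a browser via Playwright.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, List


class RecordParser(HTMLParser):
    """Collects the text of every <td class="name"> cell (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "name") in attrs:
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self.records.append(data.strip())


def scrape(fetch_page: Callable[[int], str], max_pages: int) -> Iterator[str]:
    """Yield records page by page, stopping at the first empty page.

    fetch_page is a hypothetical callable that returns the HTML of page N.
    """
    for page in range(1, max_pages + 1):
        parser = RecordParser()
        parser.feed(fetch_page(page))
        if not parser.records:
            break  # No records on this page: assume the listing has ended.
        yield from parser.records
```

Because the same code runs on every page, a failure points at a specific page number and a specific line, which is the traceability argument made above.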
We are actually working on open sourcing a plugin that you can use with any coding harness!
Intuned as a platform to deploy browser automation adds a lot - anti-bot detection, jobs, observability and more.
for jobs/durability/obs i have sqlite and had codex generate an ugly but functional dashboard
im just curious to know what Intuned does that is different
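For context, the kind of SQLite-backed job tracking described above might look roughly like this minimal Python sketch. It is not the commenter's actual code. The table layout and the function names are invented for illustration, and it assumes a single worker process (no locking between competing workers).

```python
import sqlite3
from typing import Optional, Tuple


def init(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) jobs table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY,
               url TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               attempts INTEGER NOT NULL DEFAULT 0,
               error TEXT
           )"""
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, url: str) -> int:
    """Add a pending job and return its row id."""
    cur = conn.execute("INSERT INTO jobs (url) VALUES (?)", (url,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Mark the oldest pending job as running (single-worker assumption)."""
    row = conn.execute(
        "SELECT id, url FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute(
        "UPDATE jobs SET status = 'running', attempts = attempts + 1 WHERE id = ?",
        (row[0],),
    )
    conn.commit()
    return row


def finish(conn: sqlite3.Connection, job_id: int,
           error: Optional[str] = None) -> None:
    """Record the outcome; the error text is what a dashboard would show."""
    status = "failed" if error else "done"
    conn.execute(
        "UPDATE jobs SET status = ?, error = ? WHERE id = ?",
        (status, error, job_id),
    )
    conn.commit()
```

Since the state lives in a file on disk, a crashed run can be resumed by re-queuing rows left in the `running` state, and a dashboard only needs to read this one table.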
that sounds interesting
I am happy to give you a demo over a call as well
If a customer doesn't want to use Playwright, they don't have to, given CDP, but most of our templates use Playwright.