25 points by molly_radstowe 6 hours ago | 6 comments
  • CqtGLRGcukpy 5 hours ago
    They also posted about this on Mastodon / Fedi: https://en.osm.town/@osm_tech/115968544599864782
  • molly_radstowe 6 hours ago
    #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks.
    • Bender 6 hours ago
      Looks like it is hosted at Equinix in NL? Or maybe just part of it? Is it behind a load balancer, maybe something like HAProxy? If so, were stick tables set up to limit rates by cookie, require people to be logged in on unique accounts, and limit anonymous access after so many requests? I know limiting anonymous access is not great, but it could be enabled only under high load, so that instead of the site going offline for everyone it would just be limited for anonymous users. Degradation vs. critical outage.
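The "degrade anonymous access under load" idea above can be sketched in a few lines. This is not HAProxy configuration and not anything OpenStreetMap is known to run; it is a minimal sliding-window limiter, and the class name `DegradingLimiter`, the `under_load` switch, and the default limits are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class DegradingLimiter:
    """Sliding-window limiter that throttles only anonymous clients, and only under load."""

    def __init__(self, anon_limit=3, window_s=10.0, clock=time.monotonic):
        self.anon_limit = anon_limit      # hypothetical: max anonymous hits per window
        self.window_s = window_s          # hypothetical: window length in seconds
        self.clock = clock                # injectable so the logic can be tested
        self.under_load = False           # flipped by an operator or a load monitor
        self._hits = defaultdict(deque)   # client key -> timestamps of recent hits

    def allow(self, client_key, logged_in):
        # Logged-in users are never limited; nobody is limited in normal operation.
        if logged_in or not self.under_load:
            return True
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.anon_limit:
            return False
        hits.append(now)
        return True
```

With `under_load` left at `False` every request passes, so the limiter costs nothing until someone decides the site is in trouble. The client key could be a cookie or an address; which one works better against rotating residential proxies is an open question here.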

      On a separate note, have tcpdump captures been done on these excessive connections? Minus the IP, what do their SYN packets look like? Minus the IP, what do the corresponding log entries look like in the web server? Are they using HTTP/1.1 or HTTP/2? Are they missing any headers you would expect from a real person's browser, such as Sec-Fetch-Mode (cors, no-cors, navigate) or Accept-Language?

          tcpdump -p --dont-verify-checksums -i any -NNnnvvv -B32768 -c32 -s0 port 443 and 'tcp[13] == 2'
      
      Is there someone at OpenStreetMap that can answer these questions?
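The header question above could be checked against logged requests with something like the following. It is only a sketch: the function name `header_red_flags` and the choice of which headers count as "expected" are assumptions, and legitimate non-browser clients (map editors, API tools) would trip it too, so it is a signal to look at rather than a block rule.

```python
# Headers a mainstream browser normally sends; the exact set chosen here is an assumption.
EXPECTED_BROWSER_HEADERS = ("accept-language", "sec-fetch-mode", "sec-fetch-site", "user-agent")

# Values defined for Sec-Fetch-Mode by the Fetch Metadata specification.
KNOWN_FETCH_MODES = {"cors", "no-cors", "navigate", "same-origin", "websocket"}


def header_red_flags(headers):
    """Return a list of reasons a request does not look like it came from a browser."""
    # Header names are case-insensitive, so normalise before comparing.
    normalized = {name.lower(): value for name, value in headers.items()}
    flags = [f"missing {name}" for name in EXPECTED_BROWSER_HEADERS if name not in normalized]
    mode = normalized.get("sec-fetch-mode")
    if mode is not None and mode.strip().lower() not in KNOWN_FETCH_MODES:
        flags.append(f"unexpected sec-fetch-mode: {mode}")
    return flags
```

For example, `header_red_flags({"User-Agent": "curl/8"})` reports the three missing headers, while a request carrying all four with `Sec-Fetch-Mode: navigate` returns an empty list.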
      • KomoD 4 hours ago
        I think it could be worth trying to block them with TLS fingerprinting, or since they think it's residential proxies they are being hammered by, https://spur.us could be worth a try.
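For a sense of what TLS fingerprinting involves, here is a sketch of one well-known scheme, JA3, which hashes fields of the ClientHello. The comment above does not name a scheme, so picking JA3 is an assumption, as are the function names; the sketch also assumes the ClientHello has already been parsed into integer lists by something else.

```python
import hashlib


def is_grease(value):
    """True for GREASE placeholder values (0x0a0a, 0x1a1a, ... 0xfafa), which JA3 ignores."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input: five comma-separated fields, each a dash-joined list of decimals."""
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string; clients built on the same TLS stack tend to share it."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

The appeal is that the fingerprint stays the same while the proxy network rotates IPs. The catch is that some browsers now randomise extension order, so a match is a hint and not proof.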
    • direwolf20 6 hours ago
      More like hammered by Google and Apple so you'll use their apps instead.
      • petre 11 minutes ago
        Unlikely. The data is freely available for download from Geofabrik and other sources.
      • wiredpancake 3 hours ago
        [dead]
  • phillipseamore 6 hours ago
    The number of idiotic vibe-coded repos I've seen on GH lately that are doing things like crawling OSM for POI data is mind-boggling!
  • dzhiurgis 2 hours ago
    I'll ask a dumb question: if they are "open source", then why are they bothered by it? Is it the scraping itself? Isn't their data freely available for download?
    • zeeZ 2 hours ago
      Their data is freely available to download. There are weekly dumps of the entire planet and several sources for partial data. There's no need for most legitimate use cases to scrape their API.
    • wodenokoto 2 hours ago
      Someone has to pay for bandwidth. And that someone would like the bandwidth to go to human users.
  • solaris2007 an hour ago
    Make the data available through BitTorrent and IPFS. Redirect IPs that make excessive requests to a response only kilobytes in size: "use the torrents and IPFS".

    As an SRE, the only legitimate concern here could be the bandwidth costs. But QoS tuning should solve that too.

    Supposedly technical people crying out for a journalist to help them is super lame. Everything about this looks super lame.