The options for finding it are basically limitless. The best source is probably the Certificate Transparency project, as others suggested. But it does not end there; other things we do include internet crawling, domain bruteforcing against wildcard DNS, dangling-vhost identification, grabbing default certs from servers (connect to an IP on 443 and take the default cert), and many others.
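A rough sketch of two of those checks from the command line; example.com and 203.0.113.10 are placeholders:

  # Certificate Transparency: list names ever issued under a domain (crt.sh is one public interface)
  curl -s 'https://crt.sh/?q=%25.example.com&output=json' | jq -r '.[].name_value' | sort -u

  # Default cert: connect to a bare IP on 443 without SNI and see which names the server volunteers
  # (-ext needs OpenSSL 1.1.1+)
  openssl s_client -connect 203.0.113.10:443 </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -ext subjectAltName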
Security by obscurity does not work. You can not rely on "people won't find it". Once it's online, everyone can find it. No matter how you hide it.
Use it as the last thing you do, not the first. If I run SSH on, say, 42531 it will be found, absolutely... but 99%+ of automated scans will never see it, and that benefits me. That comes after all the sshd_config, PAM stuff, patching, misc hardening, etc. is done first.
That's a worn-out example, and just to make a point (I run on 22)... The benefit to me was that most skiddy scanners would never see it, and if I avoid the one actor out there looking to mass-exploit an unpublished 0-day, then, as the last thing I did, I may have bought some extra time, because they're going for 22.
Google had a "gotcha" moment when Microsoft responded basically with "yeah we didn't steal it from Google, you had telemetry enabled"
Total shitshow
Dozens of others will also find it.
Really, it's this simple today.
Finding all things about domains is one of the things that we do. And yes, it's very easy.
There are many services like subdomainfinder, e.g. dnsdumpster and merklemap. We built our own as well at https://search.reconwave.com/, but it's a side project and it does not pay our bills.
Especially, do not name your domains in a way that leaks MNPI! Like, if publicly traded companies A and B were discussing a merger or acquisition, do not register A-and-B.com, m'kay?
I don’t recall if anybody noticed before they went public, but as this thread shows, today it would be noticed for sure.
Disable direct IP access. Use wildcard certificates. Don't use guessable subdomains like www or mail.
I hope the last bit is not leaked somehow (?)
Btw, we need a "falsehoods programmers believe about URLs" ...
Although there is: https://www.netmeister.org/blog/urls.html
I think the section named "Pathname" is wrong. It describes the path of a URL as if every server were Apache serving static files with its default configuration. It should describe how the path is converted into an HTTP request.
For instance, the article states that "all of these go to the same place : https://example.org https://example.org/ https://example.org// https://example.org//////////////////". That's wrong. A web client sends a distinct HTTP request for each case, e.g. starting with `GET // HTTP/1.1`, so the server will receive distinct paths. The assertion of "going to the same place" makes no sense in the general case.
I.e.: if the host only responds to a specific Host header but is registered under a wildcard prefix, then drive-by attackers have no trivial way to guess the prefix.
I would never rely on this for security, but it does help cut down on the “spam” in the request logs so that I can focus on the real errors.
This works best for API endpoints not used by browsers or embedded into web pages.
It’s also my current preferred setup for Internet-facing non-production sites. Otherwise they get so much attack traffic that the real log entries might be less than 0.1% of the total.
We built a global reverse-DNS dataset solely from cert transparency logs. Our active scanning/bruteforcing runs only against assets owned by our customers.
In what way is what he’s describing not obscurity?
1. Encrypted data is not hidden. You still know that there is data, it's just in a form that you can't understand. Just as difficult higher-level math isn't "obscured" from a non-mathematician (who knows that it is math, but can't decode it), encrypted data is not obscured.
2. You could make the argument that the data is actually hidden, but the fact that data is there is not hidden. This is pointless pedantry, though. It is both contrary to the way that everybody uses the word and stretches the meaning of the word to the point that it's not useful. There is a common understanding of what "Security through obscurity" means ( https://en.wikipedia.org/wiki/Security_through_obscurity ) and interpreting it far beyond that is not useful. It simply breaks down communication into annoying semantic arguments. I enjoy semantic arguments, but not tedious, pedantic ones where one person just argues that a word isn't what everybody understands it to mean.
More specifically, it's about WHAT is being obscured. "Security through obscurity" is about trying to be secure by keeping the details or mechanisms of a system secret, not the data itself.
Port knocking isn't, I don't think.
But the phrase “security through obscurity” is an industry term that refers to keeping things secure purely by not letting people know they exist.
In contrast with encryption, where I can tell you exactly where the encrypted data is, but you can’t access it.
Security through obscurity is hiding a bicycle in a bush and hoping no one notices it, encryption is more like locking it to a bike rack with a very good lock.
The opposite of "bad security through obscurity" is using completely public and standard mechanisms/protocols/algorithms such as TLS, PGP or pin tumbler locks. The security then comes from the keys and other secrets, which are chosen from the space permitted by the mechanism with sufficient entropy or other desirable properties.
The line is drawn between obscuring the mechanism, which is designed to have measurable security properties (cryptographic strength, enumeration prevention, lock security pins), and obscuring the keys that are essentially just random hidden information.
Obscuring the mechanism provides some security as well, sure, but a public mechanism can be publicly verified to provide security based only on secret keys.
> To make so confused or opaque as to be difficult to perceive or understand
https://www.thefreedictionary.com/obfuscate
obscuring data is different, it’s about hiding it from view or minimising the likelihood of it being found.
> To make dim, indistinct, or impossible to see
https://www.thefreedictionary.com/obscure
they are two wholly different actions.
—
> Tiered access controls obscure who can do what in the system.
i’ve seen plenty of examples where an access control system explicitly says what role/tier is required. access control is for “trust” management (who do we trust with what).
> Encryption obscures data.
I don't think you understand what "security through obscurity" means. What encryption does is literally the opposite of obscure, in this context. It is out in the open and documented. And the same with the rest of your examples.
Which basically means it was always a shit saying, like most fancy quips were.
This is one of those false voyeur OS internet tenets designed to get people to publish their stuff.
Obscurity is a fine strategy: if you don't post your source, that's good. If you post your source, that's a risk.
The fact that you can't rely on that security measure is just a basic security tenet that applies to everything: don't rely on a single security measure; use redundant barriers.
Truth is, we don't know how the subdomain got leaked. Subdomains can be passwords, and a well-crafted subdomain should not leak; if it leaks, there is a reason.
I disagree. A subdomain is not secret in any way. There are many ways in which it is transmitted unencrypted. A couple:
- DNS resolution (multiple resolvers and authoritative servers)
- TLS SNI
- HTTP Host header
There are many middle boxes that could perform safety checks on behalf of the client, and drop it into a list to be rescanned.
- Virus scanners
- Firewalls
- Proxies
Agree.
But who said that all passwords or shibboleths should be encrypted in transit?
It can serve as a canary for someone snooping your traffic. Even if you encrypt it, you don't want people snooping.
To date, for the subdomains that I never publish, I haven't had anyone attempt to connect to them.
It's one of those redundant measures.
And it's also one of those risks that you take: you can maximize security by staying at home all day, but going out to take out the trash is a calculated risk you must take, or you risk overfocusing on security.
It's similar to port knocking. If you are encrypting it, it's counterproductive; it's a low-effort finishing touch, like a nice knot.
> Subdomains can be passwords and a well crafted subdomain should not leak, if it leaks there is a reason.
The problem with this theory is that DNS was never designed to be secret and private and even after DNS over HTTPS it's still not designed to be private for the servers. This means that getting to "well crafted" is an incredibly difficult task with hundreds of possible failure modes which need constant maintenance and attention—not only is it complicated to get right the first time, you have to reconfigure away the failure modes on every device or even on every use of the "password".
Here are just a few failure modes I can think of off the top of my head. Yes, these have mitigations, but it's a game of whack-a-mole and you really don't want to try it:
* Certificate transparency logs, as mentioned.
* A user of your "password" forgets that they didn't configure DNS over HTTPS on a new device and leaves a trail of logs through a dozen recursive DNS servers and ISPs.
* A user has DNS over HTTPS but doesn't point it at a server within your control. One foreign server having the password is better than dozens and their ISPs, but you don't have any control over that default DNS server nor how many different servers your clients will attempt to use.
* Browser history.
Just don't. Work with the grain, assume the subdomain is public and secure your site accordingly.
Something many people don't expect is that the IPv6 space is also tiny and trivial to scan, if you follow certain patterns.
For example, many server hosts give you a /48 or /64 subnet, and your server is at your prefix::1 by default. If they have a /24 and they give you a /48, someone only has to scan 2^24 addresses at that host to find all the ones using prefix::1.
AWS only allows routing a /80 to EC2 instances, which makes a huge difference.
It doesn't mean that we should rely on obscurity, but the entire space is not as tiny as IPv4 was.
But if you just choose a random address, you enjoy a bit more immunity from brute-force scanners here.
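For example, picking a random interface ID inside your /64 instead of leaving the default ::1 is a one-liner; the prefix below is a documentation placeholder:

  prefix="2001:db8:1234:5678"    # your delegated /64
  suffix=$(hexdump -n 8 -e '4/2 "%04x:"' /dev/urandom | sed 's/:$//')
  echo "${prefix}:${suffix}"     # e.g. 2001:db8:1234:5678:9f3a:02c1:77de:4b10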
Just try leaving a User Talk page message on Wikipedia, and good luck if the editor even notices, or anyone finds that talk page again, before the MediaWiki privacy measures are implemented.
> Subdomains can be passwords and a well crafted subdomain should not leak
Your comment is really odd to read; I'm not sure I understand you, but I'm sure you don't mean it like that. Just to reiterate the important points:
1. Do not rely on subdomains for security, subdomains can easily leak in innumerable ways including in ways outside of your control.
2. Security by obscurity must never be relied on for security but can be part of a larger defense in depth strategy.
---
https://cwe.mitre.org/data/definitions/656.html
> This reliance on "security through obscurity" can produce resultant weaknesses if an attacker is able to reverse engineer the inner workings of the mechanism. Note that obscurity can be one small part of defense in depth, since it can create more work for an attacker; however, it is a significant risk if used as the primary means of protection.
"The product uses a protection mechanism whose strength depends heavily on its obscurity, such that knowledge of its algorithms or key data is sufficient to defeat the mechanism."
If you can defeat the mechanism, that's not very impactful when it's one stage of a multi-round mechanism. Especially if breaching or crossing that perimeter alerts the admin!
Lots of uncreative blue teamers here
People consistently misuse the Swiss cheese security metaphor to justify putting multiple ineffective security barriers in place.
The holes in the cheese are supposed to represent unknown or very difficult to exploit flaws in your security layers, and that's why you ideally want multiple layers.
You can't just stack up multiple known to be broken layers and call something secure. The extra layers are inconvenient to users and readily bypassed by attackers by simply tackling them one at a time.
Security by obscurity is one such layer.
Even if you have tons and tons of layers of seasoning, you still don't put tomato sauce or whatever on it.
Security does not consist only of 100% or 99.99% effective mechanisms. There needs to be a flow of information and an inherent risk; if you are only designing absolute barriers, then you are rarely considering the actual surface of relevant user interactions. A life form consisting only of skin might be very secure, but it's practically useless.
The saying is "security by obscurity is not security" which is absolutely true.
If your security relies on the attacker not finding it or not knowing how it works, it's not actually secure.
Obscurity has its own value, of course. I strongly recommend running any service that's likely to be scanned regularly on non-standard ports wherever practical, simply to reduce the number of connection logs you need to sort through. Obscurity works for what it actually offers; that just has nothing to do with security. Unfortunately, it's hard in cases where a human is likely to want to type in your service address, because most user-facing services have little to no support for SRV records.
Two of the few services that do have widespread SRV support are SIP VoIP and Minecraft, and coincidentally the former is my day job while I've also run a personal Minecraft server for over a decade. I can say that the couple of systems I still have running public-facing SIP on port 5060 get scanned tens of thousands of times per hour while the ones running on non-standard ports get maybe one or two activations of fail2ban a month. Likewise my Minecraft server has never seen a single probe from anyone other than an actual player.
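For anyone unfamiliar, these are the SRV lookups those two clients make, which is what lets the actual service sit on any port (example.com is a placeholder):

  dig +short _minecraft._tcp.example.com SRV
  dig +short _sip._udp.example.com SRV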
Again, if your security relies on any one thing, it's a problem. A secure system needs redundant mechanisms.
Can you think of a single mechanism that if implemented would make a system secure? I think not.
Why not use letters and packages which is the literal metaphor these services were built on?
It's like relying on public header information to determine whether an incoming letter or package is legitimate.
If it says: To "Name LastName" or "Company", then it's probably legitimate. Of course it's no guarantee, but it filters the bulk of Nigerian Prince spam.
It gets you past the junk box, but you don't have to trust it with your life.
Nuance.
A great example is port knocking: it hides your open port from a random nmap scan, but would you leave it as the only mechanism preventing people from getting to your server? No. So does it make sense to have it? Well, maybe; it's a layer.
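A minimal client-side sketch of that layer, assuming a knockd-style daemon on the server and made-up ports:

  # fire the knock sequence (the closed ports only need to see the SYNs), then connect
  for p in 7000 8000 9000; do nc -z -w 1 example.com "$p"; done
  ssh admin@example.com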
Kerckhoffs' principle comes to my mind as well here.
So while I agree with you that obscurity is a fine strategy, you can never depend on it, ever.
Right, I'm arguing that this is a property of all security mechanisms. You can never depend on a single security mechanism. Obscurity is no different. You cannot depend only on encryption, you cannot depend only on air gaps, you cannot depend only on obscurity, you cannot depend only on firewalls, you cannot depend only on user permissions, you cannot depend only on legal deterrents, you cannot depend only on legal threats, etc..
Or in other words, if you place absolutely zero trust in it, consider it as good as broken by every single script kid, and publicly known, then yeah, it's fine.
But then, why are you investing time into it? Almost everybody that makes low-security barriers is relying on it.
Depends on the context and exposure. Sometimes a key under a rock is perfectly fine.
I used to work for a security company that REALLY oversold security risks to sell products.
The idea that someone was going to wardrive through your suburban neighborhood with a networked cluster of GPUs to crack your AES keys and run a MITM attack for web traffic is honestly pretty far fetched unless they are a nation-state actor.
One of my favorite patterns for sending large files around is to drop them in a public blob storage bucket with a type 4 guid as the name. No consumer needs to authenticate or sign in. They just need to know the resource name. After a period of time the files can be automatically expired to minimize the impact of URL sharing/stealing.
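A sketch of that pattern with the AWS CLI; the bucket name and lifecycle setup are assumptions, and any blob store with public objects and expiry rules works the same way:

  key=$(uuidgen)                                    # type 4 GUID as the object name
  aws s3 cp ./bigfile.zip "s3://my-public-drop/$key" --acl public-read
  echo "https://my-public-drop.s3.amazonaws.com/$key"
  # a lifecycle rule on the bucket expires objects after N days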
I suppose if it's encrypted, no. Like the pastebin service I run, it's encrypted at rest. It doesn't even touch disks, so I mean, that's a decent answer to mine own question.
you don't put an unauthenticated thing on a difficult-to-find subdomain and call it secure. but your nicely secured page is more secure if it's also very tedious to find. it's less of a low-hanging fruit.
as you also state, there always needs to be a leak. but the dns system is quite leaky, and often sources won't fix it or won't even admit it's broken by design.
strong passwords are also insecure if they leak, so you obscure them from prying eyes, securing them by obscurity.
The possibility that I'm adding this feature to something that would otherwise have been published on a public domain does not cross people's minds, so it is not thought of as an additional security measure, but as the removal of a security feature.
Similarly, it is assumed that there's an unauthenticated service or an authentication mechanism behind the subdomain. There may be a simple idempotent server running, such that there is no concern for abuse, but it may still be desirable to reduce the code executed for random spearphishing scanners that only have an IP.
This brings me again to the competitive economic take on the subject: people believe that this wisdom nugget they hold, "security by obscurity is not security", is a valuable tenet, and they bet on it and desperately try to find someone to use it on. You can tell when a meme is overvalued because they try to use it on you even if it doesn't fit; it means they are dying to actually apply it.
My bet is that "Security through obscurity" is undervalued, not as a rule or law, or a definite thing, but as a basic correlation: keep a low profile, and you'll be safer. If you want to get more sales, you will need to be a bit more open and transparent and that will expose you to more risk, same if you want transparency for ethical or regulation reasons. You will be less obscure and you will need to compensate with additional security mechanisms.
But it seems evident to me that if you don't publish your shit, you are going to have much less risk, and need to implement fewer security mechanisms for the same risk, compared to broadcasting your infrastructure and your business, duh.
Is my threat model a network of dumb nodes doing automatic port scanning? Tucking a system on an obscure IPv6 address and never sharing the address may work OK. Running some bespoke, unauthenticated SSH-over-Carrier-Pigeon (SoCP) tunnel may be fine. The adversaries in the model are pretty dumb, so intrusion detection is also easy.
But if the threat model includes any well-motivated, intelligent adversary (disgruntled peer, NSA, evil ex-boyfriend), it will probably just annoy them. And as a bonus, for my trouble, it will be harder to maintain going forward.
Even when considering high-sophistication attackers, and perhaps especially with regard to them, you may want to leave some breadcrumbs for them to access your info.
If the deep state wants my company's info, they can safely get it by subpoenaing my provider; I don't need to worry about them as a privacy attacker, as they have access to the information if needed.
If your approach to security is to add cryptography everywhere, make everything as secure as possible, and imagine that you are up against a nation-state adversary (or conversely, to add security until you satisfy a requirement commensurate with your adversary), then you are literally reducing one of the most important design requirements of your system to a single scalar that you attempt to maximize while not compromising other tradeoffs.
A straightforward lack of nuance. It's like having a tax strategy consisting of number go down, or pricing strategy of price go up, or cost strategy of cost go down, or risk strategy of no risk for me, etc...
Obscurity helps cut down on noise and low effort attacks and scans. It only helps as a security mechanism in that the remaining access/error logs are both fewer and more interesting.
However on more advanced levels, a more common error is to ignore the risks of open source and being public. If you don't publish your source code, you are massively safer, period.
I guess your view on the subject depends on whether you think you are ahead of the curve by taking the naive interpretation. It's like investing in the stock market based on your knowledge of supply and demand.
No it isn’t, it’s a push to get people to login protect whatever they want to keep to themselves.
It’s silly to say informing people that security through obscurity is a weak concept is trying to convince them to publish their stuff.
No one is saying that obfuscation should be the only layer. Your defense should never hinge on any single protection layer.
I set up a set of scripts to log all "uninvited activity" on a couple of my systems, from which I discovered a whole bunch of these scanner "security" companies. Personally, I treat them all as malicious.
There are also services that track Newly Registered Domains (NRDs).
Tangentially:
NRD lists are useful for DNS block lists since a large number of NRDs are used for short term scam sites.
My little, very amateur, project to block them can be found here: https://github.com/UninvitedActivity/UninvitedActivity
Edited to add: Direct link to the list of scanner IP addresses (although hasn't been updated in 8 months - crikey, I've been busy longer than I thought): https://github.com/UninvitedActivity/UninvitedActivity/blob/...
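A rough sketch of consuming such a list on a Linux box; the file and set names are made up:

  # load the scanner networks into an ipset and drop anything that matches
  ipset create scanners hash:net
  while read -r net; do ipset add scanners "$net"; done < scanner-ips.txt
  iptables -I INPUT -m set --match-set scanners src -j DROP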
In fact, the scanners are simply walking the IP address space and sending GET requests to any IP address they find. No DNS discovery needed.
My guess is OP is using a public DNS server that sells aggregated user requests. All it takes is one request from their machine to a public machine on the internet, and it’s now public knowledge.
$ host 209.216.230.207
207.230.216.209.in-addr.arpa domain name pointer news.ycombinator.com.
That only works if whoever controls the address range has put something like
74.231.187.81.in-addr.arpa. 3600 IN PTR ns2.nogoodnamesareleft.com.
in the zone file for that IPv4, but unless they've explicitly configured this, or are using a hosting service that does it without asking, it won't be what is happening. It isn't practical to do a reverse lookup from "normal" name-to-address records like
ns2.nogoodnamesareleft.com. IN A 81.187.231.74
(it is possible to build a partial reverse mapping by collecting a huge number of DNS query results, but not really practical unless you are someone like Google or Cloudflare running a popular resolution service).
To not appear on the radar is to not invite investigation; if they can't see the door they won't try to pry it open.
If you're already on their radar, or if they already know the door is there (even if they can't directly see it), then it's less effective.
Doing something like this can prevent you from showing up on Shodan.io which is used by many users/bots to find servers without running massive scans themselves.
If ports 80 or 443 are open and there's a web server fingerprint (Apache, nginx, caddy, etc) then they could use further tools to try to discover domain names etc.
[1] Turns out you can port-scan the entire internet in under 5 minutes: https://github.com/robertdavidgraham/masscan
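For a sense of scale, a single invocation along these lines covers one port across nearly all of IPv4 (rate and output file are arbitrary):

  # masscan wants an explicit exclusion before it will accept a range this large
  masscan 0.0.0.0/0 --exclude 255.255.255.255 -p443 --rate 100000 -oL results.txt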
I'm quite sure OP meant a virtual host only reachable with the correct Host: header.
You can often decloak servers behind Cloudflare because of this.
But OP's post already answered their question: someone scanned ipv4 space. And what they mean is that a server they point to via DNS is receiving requests, but DNS is a red herring.
If you're deploying a service behind a reverse proxy, it either must be only accessible from the reverse proxy via an internal network, or check the IP address of the reverse proxy. It absolutely must not trust X-Forwarded-For: headers from random IPs.
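One way to enforce that, sketched with ufw; the proxy address and backend port are assumptions:

  ufw default deny incoming
  ufw allow from 203.0.113.10 to any port 8080 proto tcp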
I have a DNS client that feeds into my passive DNS database by reading CT logs and then trying to resolve them.
The OP and all the people talking about certificates are making the same assumption: namely, that the scanning company discovered the DNS name for the server and tried to connect. When, in fact, they simply iterate through IP address blocks and make GET requests to any listening web servers they find.
> Web sites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for 138.68.161.203. The certificate is only valid for the following names: exhaust.lewiscollard.com, www.exhaust.lewiscollard.com
That said, while I think it's not the case here, using Cloudflare doesn't mean the underlying host is accessible, as even on the free tier you can use Cloudflare Tunnels, which I often do.
No, you make HTTP requests to an IP, not a domain. You convert the domain name to an IP in an earlier step (via a DNS query). You can connect to servers using their raw IPs and open ports all day if you like, which is what's happening here. Yes servers will (likely) reject the requests by looking at the host header, but they will still receive the request.
~5 billion scans in a few hours is nothing for a company with decent resources. OP: in case you didn't follow, they're literally trying every possible IPv4 address and seeing if something exists on standard ports at that address.
I believe it would be harder to find out your domain that way if you were using SNI and only forwarded/served requests that used the correct host. But if you aren't using SNI, your server is probably just responding to any TLS connect request with your subdomain's cert, which will reveal your hostname.
That it was in fact mentioned many hours earlier, in more than one top level comment.
It's very common for people to read only up to the point where they feel they can comment, then skip immediately to commenting. So, basically, no one read it.
1. Not using SNI, and all https requests just respond with the same cert. (Example, go to https://209.216.230.207/ and you'll get a certificate error. Go to the cert details and you'll see the common name is news.ycombinator.com).
2. http upgrades to https with a redirect to the hostname, not IP address. (Example, go to http://209.216.230.207/ and you get a 301 redirect to https://news.ycombinator.com)
I actually had a job once a few years ago where I was asked to hide a web service from crawlers and so I did some of these things to ensure no info leaked about the real vhost.
They sell you security but provide you with CVEs en masse.
https://www.cybersecuritydive.com/news/palo-alto-networks--h...
The only proper response to OP's question is to ask for clarification: is the subdomain pointing to a separate IP? Are the logs vhost-specific or not?
If you don't get the answers, all you can do is to assume, and both assumptions may end up being right or wrong (with varying probability, perhaps).
Subfinder uses different public and private sources to discover subdomains. Certificate Transparency logs are a great source, but it also has some other options.
Wildcard certs can hide the subdomains, but then your cert works on all subdomains. This could be an issue if the certs get compromised.
Usually there isn't sensitive information in subdomain names, but I suspect they often accidentally leak information about infrastructure setups. "vaultwarden.example.com" existing tells you someone is probably running a Vaultwarden instance, even if it's not publicly accessible.
The same kind of info can leak via dns records too, I think?
That's correct; "passive DNS" is sold by many large public DNS providers. They tell you (for a fee) what questions were asked and answered that meet your chosen criteria. So, e.g., maybe you're interested in what questions and answers matched A? something.internal.bigcorp.example in February 2025.
They won't tell you who asked (IP address, etc.) but they're great for discovering that even though it says 404 for you, bigcorp.famous-brand-hr.example is checked regularly by somebody, probably BigCorp employees who aren't on their VPN - suggesting very strongly that although BigCorp told Famous Brand HR not to list them as a client that is in fact the HR system used by BigCorp.
This way, you will force everyone to go through Cloudflare and utilize all those fancy bot blocking features they have.
In the context of what OP is asking this is not true. DNS zones aren't enumerable - the only way to reliably get the complete contents of the zone is to have the SOA server approve a zone transfer and send the zone file to you. You can ask if a record in that zone exists but as a random user you can't say "hand over all records in this zone". I'd imagine that tools like Cloudflare that need this kind of functionality perform a dictionary search since they get 90% of records when importing a domain but always seem to miss inconspicuously-named ones.
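For reference, the transfer request itself is a one-liner, and a properly configured server refuses it (names are placeholders):

  dig AXFR example.com @ns1.example.com
  # a locked-down server answers "Transfer failed." instead of dumping the zone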
> Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.
This is likely what's happening. If the bot isn't using SNI or sending a host header then they probably found the server by IP. The fact that there's a heretofore unknown DNS record pointing to it is of no consequence. *EDIT: Or the Cert Transparency log as others have mentioned, though this isn't DNS per se. I learn something new every day :o)
This is generally true, but if you watch authoritative-only DNS server logs for text strings matching ACL rejections, there are plenty of things out there that are fully automated crawlers attempting to do entire zone transfers.
There is a non-zero number of improperly configured authoritative DNS servers on the internet that will happily give away a zone transfer to anyone who asks for it; at least, apparently enough to be useful that somebody wrote crawlers for it. I would guess it's only a few percent of the servers that host zone files, but given the total size of the public internet, that's still a lot.
> NSEC3 was a “close but no cigar” solution to the problem. While it’s true that it made zone walking harder, it did not make it impossible. Zone walking with NSEC3 is still possible with a dictionary attack.
So, hardening it against enumerability is a question of inserting non-dictionary names.
It's basically the way to get all the DNS records a DNS server has. Interestingly, in some countries this is illegal and in others it's considered best practice.
Generally, enabled zone transfers are considered a misconfiguration and should be disabled.
We did research on this a few months back and found that 8% of all global name servers have it enabled.[0]
[0] - https://reconwave.com/blog/post/alarming-prevalence-of-zone-...
Configuring BIND as an authoritative server for a corporate domain when I was a wee lad is how I learned DNS. It was and still is bad practice to allow zone transfers without auth. If memory serves I locked it down between servers via key pairs.
The only thing I can think of that would let you do that would be a DNS zone transfer request, but those are almost always disallowed from most origin IPs.
https://www.domaintools.com/resources/blog/zone-walking-zone...
The way around this is to issue a wildcard for your root domain and use that. Your main domain is discoverable but your subs aren't.
There are other routes: leaky extensions, leaky DNS servers, bad internet security system utilities that phone home about traffic. Who knows?
Unless your IP address redirects to your subdomain —not unheard of— it's not somebody IP/port scanning. Webservers don't typically leak anything about the domains they serve for.
There are evidently technical/footprint implications of that convenience. Fortunately, I'm not really concerned with the subdomain being publicly known; was more curious how it become publicly known.
People buying such records do so for various reasons, for example to seed some crawler they've built.
Transparency logs are fine except if you have a wildcard cert (or no https, obviously).
IP scans are just this: scans for live ports. If you do not provide a Host header in your call, you get whatever default response was set up. This can be a default site, a 404, or anything else.
(Alright, some IP addresses, not all of them)
I also wonder if this is a potential footgun for eSNI deployments: If you add eSNI support to a server, you must remember to also make regular SNI mandatory - otherwise, an eavesdropper can just ask your server nicely for the domain that the eSNI encryption was trying to hide from it.
https://securitytrails.com/ also had my "secret" staging subdomain.
I made a catch-all certificate, so the subdomain didn't show up in CT logs.
It's still a secret to me how my subdomain ended up in their database.
1) CZDS/DNS record sharing program
2) CT Logs
3) Browser SCT audit
4) Browser telemetry
5) DNS logs
6) DPI
7) Antivirus/OS telemetry
8) Virus/Malware/Tracker
9) Brute forcing DNS records
10) DNSSEC
11) Server software with AutoTLS
12) Servers screaming their hostnames over any protocol/banner thing
13) Typing anything on the browser search bar
14) Posting it anywhere
And many other novel ways I can't think of right now. I have successfully hidden some of my subdomains in the past, but it definitely requires dedication. Simple silly mistakes can make all your efforts go to waste. Ask any red/blue teamer.
Want to hide something? Roll everything on your own.
Also note that your domains are live as soon as they're allocated (they exist). Whether a web server or anything else actually backs them is a different question entirely.
For "secret" subdomains, you'll want a wildcard certificate; that way only the wildcard shows up in the CT logs. Note that if you serve over IPv4, the underlying host will eventually be discovered anyway by brute-force host enumeration, and the domain can still be discovered using dictionary attacks / enumeration.
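A sketch of the wildcard route with certbot; the domain is a placeholder, and wildcards require the DNS-01 challenge:

  # only *.example.com (and the apex) ends up in the CT logs, not individual subdomains
  certbot certonly --manual --preferred-challenges dns -d '*.example.com' -d example.com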
Never touched Cloudflare so this is as far as I can help you.
Another option is wildcard certificates.
This obviously can't be the only protection. But if an attacker doesn't know about a service, or misses it during discovery, they can't attack it.
If "the internet fails to find the subdomain" when using non-standard practices and conventions then perhaps "following the internet's recommendations", e.g., use Cloudflare, etc., might be partially at cause for discoverability.
Would be surprised if Expanse scans more than a relatively small selection of common ports.
Some may find this more desirable than wildcard certificates and their drawbacks.
I’m surprised amazon offers the option to not log certificates. The whole idea is that every issued cert should get logged. That way, fraudulently-issued certs are either well documented in public logs- or at least not trusted by the browser.
Depending on the issuer to log all certs would never work. You can't rely on the untrusted entity to out themselves for you.
The security comes from the browser querying the log and warning you if the entry is missing. In that sense declining to log a cert is similar to self signing one. The browser will warn and users will need to accept. As long as the vast majority of sites don't do that then we maintain a sort of herd immunity because the warnings are unexpected by the end user.
Why?
https://www.cisa.gov/news-events/alerts/2021/10/08/nsa-relea... Direct: https://media.defense.gov/2021/Oct/07/2002869955/-1/-1/0/CSI...
But I'd say there's no issue if everything else is secured properly.
2) Are you using TLS? Unless you are using a wildcard cert, then the FQDN will have been published as part of the certificate transparency logs.
I imagine the certificate transparency log is the avenue, but local monitoring and reporting up as a new URL or domain to scan for malware seems similarly plausible.
https://pentest-tools.com/information-gathering/find-subdoma...
Based on this it sounds like you exposed your resource and advertised it for others. Reverse dns, get IP, scan IP.
Probably simpler: you exposed a resource publicly on IPv4, and if it exists, it'll be scanned. There are probably hundreds of companies scanning the entire 0.0.0.0/0 space at all times.
https://www.ghacks.net/2021/03/16/wonder-about-the-data-goog...
I have plenty of subdomains I don’t “advertise” (tell people about online) but “unlisted” is a weird thing to call those. Also I don’t see how it would matter at all when it comes to Google auth.
My guess is they blocked it based on the subdomain name itself. I made a "steamgames" subdomain to list Steam games I have extra copies of (from bundles) for friends to grab for free. Less than a day after I put it up I started getting Chrome scare pages. I switched it to "games" and there have been no issues.
The name "userfileupload" is far from non-obvious, so that would be my guess.
Could have been discovered from the SSL cert request for the subdomain.
Maybe you published the subdomain in a cert?
Snooped traffic is unlikely.
This is a good question: if you don't publish a subdomain, scanners should not reach it. If they do, there's a leak in your infra.
There are countless tools for subdomain enumeration. I personally use subfinder or amass when doing recon on bug bounty targets.
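Both are a one-liner to run against a domain you're authorized to test (example.com is a placeholder):

  subfinder -d example.com -silent
  amass enum -passive -d example.com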
Just knowing one "secret" (a subdomain in this case) shouldn't get you somewhere you shouldn't be.
In general you should always assume that any password has been (or could be) compromised. So in this case, more factors should be involved such as IP restricting for access, an additional login page, certificate validation, something...
One thing you could do is use a wildcard certificate, and then use a non-obvious subdomain from that. I actually have something similar - in my set up, all my web-traffic goes to haproxy frontends which forward traffic to the appropriate backend, and I was sick of setting up multiple new certificates for each new subdomain, so I just replaced them all with a single wildcard cert instead. This means that I'm not advertising each new subdomain on the CT list, and even though they all look nominally the same when visiting - same holding page on index and same /api handling, just one of the subdomains decodes an additional URL path that provides access to status monitoring.
Separately, that Palo Alto Networks company is a real pain. They connect to absolutely everything in their attempts to spam the internet. Frankly, I'm sick of even my mail servers being bombarded with HTTP requests on port 25 and the resultant log spam.
So my guess is reverse DNS
1. DNS Leaks or Wildcard Records

Wildcard DNS Entries: If your main domain (sampledomain.com) has a wildcard DNS record (e.g., *.sampledomain.com), any subdomain (including userfileupload.sampledomain.com) could be automatically resolved to your server's IP. Even if the main domain is inactive, the wildcard might expose the subdomain.

Exposed Subdomain DNS Records: If the subdomain's DNS records (e.g., A/CNAME records) are explicitly configured but not removed, bots could reverse-engineer them via DNS queries or IP scans.

Fix: Remove or restrict wildcard DNS entries and delete unused subdomain records from your DNS provider (e.g., Cloudflare).

2. Server IP Scanning

IP-Based Discovery: Bots like Expanse systematically scan IP addresses to identify active services. If your subdomain's server is listening on ports 80/443 (HTTP/HTTPS), bots may:

- Perform a port scan to detect open ports.
- Attempt common subdomains (e.g., userfileupload, upload, media) on the detected IP to guess valid domains.

Fix:

- Block unnecessary ports (e.g., close ports 80/443 if unused).
- Use a firewall (e.g., ufw or Cloudflare Firewall Rules) to reject requests from suspicious IPs.

3. Cloudflare's Default Behavior

Page Rules or Workers: If the subdomain is configured with Cloudflare Workers, default error pages, or caching rules, it might generate responses that bots can crawl. For example:

- A 404 Not Found page with a custom message could be indexed by search engines.
- Worker scripts might inadvertently expose endpoints (e.g., /_worker.js).

Fix:

- Delete unused subdomains from Cloudflare's DNS settings.
- Ensure Workers/routes are only enabled for intended domains.

4. Reverse DNS Lookup

IP-to-Domain Mapping: If your server's IP address is shared or part of a broader range, bots might reverse-resolve the IP to discover associated domains (e.g., via dig -x <IP>).

Fix:

- Use a dedicated IP address for sensitive subdomains.
- Contact your ISP to request removal from public IP databases.

5. Authentication Flaws

Presigned URLs in Error Messages: If the subdomain's server returns detailed error messages (e.g., 403 Forbidden) when accessed without authentication, bots might parse these messages to infer valid endpoints or credentials.

Fix:

- Customize error pages to show generic messages (e.g., "Access Denied").
- Log and block IPs attempting brute-force access.

How to Prevent Future Discoveries:

- Remove Unused DNS Records: Delete the subdomain from Cloudflare's DNS settings entirely.
- Disable Wildcards: Avoid *.sampledomain.com wildcards to limit exposure.
- Firewall Rules: Block IPs from scanners (e.g., Palo Alto Networks, Expanse) using Cloudflare's DDoS Protection or a firewall.
- Monitor Logs: Use tools like grep or Cloudflare logs to track access patterns and block suspicious IPs.
- Use Authentication: Require API keys, tokens, or OAuth for all subdomain requests.

Example Workflow for Debugging:

  # Check Cloudflare DNS records for the subdomain:
  dig userfileupload.sampledomain.com +trace

  # Inspect server logs for recent requests:
  grep -E "^ERROR|DENY" /var/log/nginx/access.log

  # Block Expanse IPs via Cloudflare Firewall:
  # 1. Go to Cloudflare > Firewall > Tools.
  # 2. Add a custom rule to block IPs (e.g., from scaninfo@paloaltonetworks.com).

By tightening DNS, server, and firewall configurations, you can minimize exposure of your internal subdomains to bots.