It "blocks profanity", but "shithead", "assfucker", etc. are allowed (not to mention obfuscating a restricted term even slightly, e.g. "sh1t")? Yes, the Scunthorpe problem exists, but you can do better, and should if you're expecting people to pay to wait 500ms.
Something that detects these sorts of things very well could actually be worth paying for, although it still would probably be better off as a library.
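To make that concrete, here is a minimal sketch of substring matching after undoing common character substitutions. The blocklist and substitution map are tiny, made-up examples for illustration and are not the service's data.

```python
# Hypothetical data for illustration only; a real list would be far larger.
BLOCKED_TERMS = {"shit", "fuck"}

# Assumed mapping of common look-alike characters back to letters.
LEET_MAP = str.maketrans(
    {"1": "i", "3": "e", "4": "a", "0": "o", "5": "s", "$": "s", "@": "a", "!": "i"}
)


def normalize(username: str) -> str:
    """Lowercase, undo common substitutions, and drop separators/digits."""
    lowered = username.lower().translate(LEET_MAP)
    return "".join(ch for ch in lowered if ch.isalpha())


def contains_profanity(username: str) -> bool:
    """Return True if any blocked term appears inside the normalized name."""
    name = normalize(username)
    return any(term in name for term in BLOCKED_TERMS)
```

Substring matching like this is exactly what reintroduces Scunthorpe-style false positives, so in practice it would need an allowlist or word-boundary heuristics on top.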
- need to improve categorisation (some are miscategorised, some categories don't make sense)
- better list; more subsets to block (fair and very true). This is an evolving list, so I'll keep adding to it (currently ~1.7 million records; will go to 2.5 million in the next few days)
- latency is a killer
Again, I said it in another comment too, I'm pretty happy with this (tears on the inside) because the problem at least is validated in some way.
I just need to do better in terms of solutioning; which, IMO, is doable.
This does feel like a real problem. The thing that concerns me (and likely other devs here) is that it adds an additional remote API dependency for a very core part of a system when a lot of people are trying to keep those dependencies to an absolute minimum. When your service goes down (not if), everyone who’s dependent on you will not be able to register new users, etc.
Is there any way you can offer this as a library instead? You deserve to get paid of course - maybe provide the library and initial data and charge for updates / premium checks, something like that.
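Until something like a library exists, one way a consumer might limit the blast radius of an outage is to fail open with a short timeout. This is only a sketch: `remote_check` is a hypothetical callable standing in for whatever client performs the lookup, and the 300ms default is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

_executor = ThreadPoolExecutor(max_workers=4)


def is_reserved_fail_open(
    username: str,
    remote_check: Callable[[str], bool],  # hypothetical lookup function
    timeout_s: float = 0.3,               # assumed latency budget
) -> bool:
    """Ask the remote service, but never let it block or break registration."""
    future = _executor.submit(remote_check, username)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Timeout, network error, or service outage: treat as "not reserved"
        # so sign-ups keep working; the name can be re-checked later.
        return False
```

Failing open trades accuracy for availability; a platform that would rather block sign-ups than risk a reserved name slipping through would return `True` in the `except` branch instead.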
As for the original concern though, here's some thoughts: You may just use it to flag (not act) in an async way. This way, you can just alert/monitor and decide later whether or not to take any actions while keeping the flow non-blocking. Another approach would be to run it against existing handles to see what opportunities exist (ex: premium usernames, impersonators etc.).
BUT, thanks again for the input. I'll definitely make this happen!
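The flag-but-don't-block flow described above could look roughly like this with `asyncio`. Everything here is a stand-in: `check_username` simulates the remote lookup, and `flagged` plays the role of a moderation queue.

```python
import asyncio

flagged: list[str] = []  # stand-in for a moderation/review queue


async def check_username(username: str) -> bool:
    """Hypothetical stand-in for the remote lookup."""
    await asyncio.sleep(0.01)  # simulate network latency
    return username in {"apple", "cocacola"}


async def flag_if_reserved(username: str) -> None:
    """Record the username for later review if the check says it is reserved."""
    if await check_username(username):
        flagged.append(username)


async def register(username: str, background: set) -> str:
    """Create the account right away; the check runs in the background."""
    task = asyncio.create_task(flag_if_reserved(username))
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return f"registered {username}"


async def main(usernames: list[str]) -> list[str]:
    background: set = set()
    results = [await register(name, background) for name in usernames]
    await asyncio.gather(*background)  # only so the demo waits for the checks
    return results
```

Registration returns without waiting on the lookup, so the 300ms (or an outage) never shows up in the sign-up path.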
As for realtaylorswift, I thought about that too. I think (and this is my personal opinion, obviously) most platforms wouldn't want to restrict this, because then it really becomes unmanageable. I could obviously be wrong though, and these checks could very easily be introduced to the API (i.e. detecting obvious username patterns). I'm totally open to adding that as an API parameter too.
Highly recommend you read this and similar posts: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...
I didn't mean to downplay the complexity of the challenge. I was mostly trying to convey that the specific cases being discussed should be something I could quickly solve and incorporate into the API.
You're right about ALL the different kinds of edge cases that exist though and really, I'm trying to have this API be the go-to solution for it. Clearly, it's still not there. But it will be. I'm now more sure than ever.
This is a big one for this kind of project, and I've never been sure how usernames for people named Kike should be handled.
{
"username": "bill_gates",
"isReserved": false,
"isDeleted": false,
"categories": []
}
what's the point of this thing...? And the definition of a "public figure" is absurdly broad and inconsistent. Some very common names are flagged as reserved for what are extremely minor celebrities at best (like an assistant coach of a college basketball team, or an actor with barely any formal credits, as examples; some other obscure athletes are marked as reserved while others are not).
As for the definition of specific categories (more specifically public figures), you're right. Currently, it's just me building this, so I had to decide where to draw the line. I just drew it around the entire earth, which I know is NOT the best approach, but that's the one I went with to ensure I cover all bases. Honestly, the API just tells you if and why a username could be deemed reserved/premium. What to do with this info is really up to the platforms that are consuming it. They could let it slide, do nothing, just flag and monitor, block etc.
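A consumer-side policy along those lines might be sketched like this. The response fields (`isReserved`, `categories`) come from the sample response earlier in the thread; the category names other than "company" and the policy table itself are assumptions for illustration.

```python
import json

# Hypothetical per-platform policy: what to do for each category.
POLICY = {"profanity": "block", "company": "flag", "public_figure": "allow"}

# Actions ordered from least to most severe.
SEVERITY = ["allow", "flag", "block"]


def decide(response_body: str) -> str:
    """Map an API response to the most severe action its categories call for."""
    data = json.loads(response_body)
    if not data.get("isReserved", False):
        return "allow"
    # Unknown categories default to "flag" so a human can take a look.
    actions = [POLICY.get(cat, "flag") for cat in data.get("categories", [])]
    return max(actions, key=SEVERITY.index, default="flag")
```

With a table like this, a platform that finds the public-figure category too broad can simply set it to "allow" and still act on the others.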
As other comments point out, lots of holes.
I think nobody should pay for that.
WDYT?
Do you expect / want this to be a business? This feels like the kind of thing where anybody big enough to pay for it will build it in house. And your pricing seems so cheap that even if you do win some it won't be enough.
Genuine curiosity but 300ms seems slow? Am I missing something? How big is the blacklist?
I'm a bit unsure about its future as a business but for now, hoping it becomes my first app with some paying users. I typically think small scale but you're right. I suppose most big companies already have an in-house way to deal with it.
Idea behind this was super charged because there wasn't a global reserve list already available for folks to access.
On the latency, I'll work on improving it. Currently, the list (not a blacklist :P) is about 1.7 million records. I expect it to grow to 2.5M in the next few days. I should probably stop using Cloudflare Workers, KV and D1 to instantly improve on that.
In its current state, I'd look at the API to check for reserved / premium names (or something that's profane).
If it makes sense contextually: imagine if you were building the next Twitter. I'm guessing you'd want to have a way to charge for premium names and, in turn, need a way to detect what's premium. For the most part, first and last names are pretty premium and people pay (they do!) for such usernames.
Edit: 300ms?!
1. latency: my original goal was to make it sub-10ms, but with checking for auth, cold starts, and the actual lookup, I couldn't get it to do better than 200-300ms. I need to improve this though and I will.
2. increased list size: currently, the lookup happens across 1.7 million records (will go up to 2.5 million in the next days/weeks) BUT I don't think that would ever cover ALL scenarios.
3. better categorisation
Same question, but for place names which seems completely innocuous?
Instead of us telling you why this is a bad idea, can you tell us why this is a good idea and what bugs we are shipping currently that this prevents?
As for bugs: what I see happening now is folks either have a static list (which is already bad; not a bug) or use pattern matching to avoid these (which isn't foolproof). Regex/pattern matching can only help in cases where we have "real" or "try" or "something" as a pre/postfix. It doesn't really identify more complex cases or a wide range of premium / reserved names. IMO, for this, we will need a dictionary of sorts, which is what I'm hoping to achieve with this API.
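Combining the two ideas (pattern matching for affixes, then a dictionary lookup) could be sketched as follows. The affix list and the two dictionary entries are made up for illustration; the real list and its category names are not shown in this thread.

```python
import re

# Hypothetical dictionary: normalized name -> categories.
RESERVED = {
    "taylorswift": ["public_figure"],
    "apple": ["brand"],
}

# Assumed affixes people add to get around an exact-match check.
AFFIXES = ("real", "try", "official")


def candidates(username: str) -> list[str]:
    """Return the normalized name plus variants with one affix stripped."""
    name = re.sub(r"[^a-z0-9]", "", username.lower())
    found = [name]
    for affix in AFFIXES:
        if name.startswith(affix) and len(name) > len(affix):
            found.append(name[len(affix):])
        if name.endswith(affix) and len(name) > len(affix):
            found.append(name[: -len(affix)])
    return found


def lookup(username: str) -> list[str]:
    """Return the categories of the first candidate found in the dictionary."""
    for candidate in candidates(username):
        if candidate in RESERVED:
            return RESERVED[candidate]
    return []
```

The patterns only widen the net; the dictionary still decides what counts as reserved, which is why list quality matters more than the regex.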
It's a giant manual list. I'm a human maintaining it. Just need to do better in terms of the API / deliverability side of things.
> Fair. I suppose most newer platforms may not think too much about it. So here's the pitch though: Imagine you're building the next Twitter (or, you know the platform has the potential to become the next Twitter). Knowing what we know now about social media platforms, where users are open to paying for premium usernames (ex: @apple, @cocacola, @media etc.), it would be nice to at least flag/know if there are folks trying to register these usernames. You could decide later / async what to do about it, but you'll at least have a way to flag. Similarly, you can also keep profanity or abusive words from seeping into the platform. You may want to restrict/block 'em outright.
How many people are trying to build the next twitter? I would guess it's approximately zero, so I think you'll need a wider target audience to generate meaningful revenue.
It's much easier for the next twitter to just institute a policy that says handles can be modified by the platform as needed and deal with the "problem" post hoc.
> As for bugs: what I see happening now is folks either have a static list (which is already bad; not a bug) or use pattern matching to avoid these (which isn't foolproof). Regex/pattern matching can only help in cases where we have "real" or "try" or "something" as a pre/postfix. It doesn't really identify more complex cases or a wide range of premium / reserved names. IMO, for this, we will need a dictionary of sorts, which is what I'm hoping to achieve with this API.
Based on what you've said, you're also using a static list, correct?
Long term, I suppose the actual value proposition is not that using a list is a bug, but you have the "best" list due to your scale and people can outsource managing their own version?
To me, the issue is that this isn't a solvable problem using your current approach because people are more creative than a list of banned strings and you're severely outnumbered at scale.
On the static list, yes. Me too. But I keep updating mine as well. For ex: on day 1, "apple" was just a dictionary word. On day 2, it was also classified as a brand. Also, every quarter, half-year or year, there are newer companies and public figures whose usernames become significant. Although it's manual for now, I intend to maintain this list for the long run.
As for a better, permanent solution: in another comment, I came across the idea of using an LLM/classifier for this (based on my understanding, that's not just asking OpenAI but building an LLM of my own), where I have the "best" source of truth and the LLM handles all variations. I think it actually is solvable to an extent now, though I'm not sure what the final solution looks like. I WILL SOLVE THIS THOUGH :D
You're signing up to play a game you can't win preemptively IMO.
As an aside, cocacola is also "available", despite being listed as an example of what you don't want to allow on the homepage and presumably would be flagged as a reserved brand name handle by this service.
As for @cocacola — that's on me. I've not yet gotten to the bottom half of the list of categories here: https://docs.username.dev/reference/categories (need to work on "government" and below). "company" is listed there and I suspect "cocacola" should be covered there.
In hindsight, I should've reserved the names that I'm showing in the flipping text of the hero title, but I didn't want to game the system or make it seem more reliable than it currently is. Which, again, I'm learning is not so reliable to begin with anyway.
PS. Love the passion around the topic here. One thing that I'm happy about is getting the problem validated. It's not in my head, I'm not the only one experiencing it, this is real. AND I WILL SOLVE IT :)
Full disclosure: I'm not a developer. I understand tech architectures well. Can code (have coded in JS pre-AI too) BUT will figure this out as I go along. Thanks and truly appreciate the input.
Edit note: added million next to 1.7. fml!
If the handle is taken by what seems like the same content/brand owner across FB, IG, reddit, X, etc. then that could add weight to a decision to reserve it (and be provided as useful context to your user as to why you recommend it be reserved), and if it's associated with something like hate speech or just crappy content someone who is doing brand research can know to look for alternatives.