This isn't something that's commonly known (even judging by comments here), but the publicly viewable metadata of every upload contains the uploader's IA account email address. So from a security perspective it's bad, but from a privacy perspective a lot of users probably weren't aware of this detail if they've uploaded anything.
If someone wants to upload and never be found out, then they need to use a throwaway address in any case, lest they be providing their "private" address to the administrators of the service without explicitly forbidding further disclosure. If I say something to Alice without demanding that Alice keep it from Bob, then I implicitly don't mind if Alice tells Bob what I said.
Even if your email is public information and even if what is uploaded is public information that doesn't imply that the email address behind the account that uploaded that information should be public.
With ChatGPT, this can be extended to create emails that look very personal - as if someone has followed all of your work and is genuinely interested in what you are up to - with extremely low effort. And people are already doing this, I already get emails like this today.
Should emails be private? I don't know. I personally consider them to be public because I know for a fact mine will eventually be public whether I like it or not. But I am aware AI is out there slurping up every public communication I've ever had, and is likely trying to manipulate me in various ways already today.
Quantity is a quality. Add that the AI can profile you and do a decent job spear phishing and you're talking about a sea change.
>and the real internal one
“Three can keep a secret, if two of them are dead.”
There is no such thing as an 'internal' email you communicate to other people outside your company with. It's just an email address. Someone at some point will leak it by accident or malice.
Sure, so personally I never use it to communicate with people outside. I also make sure it's never used to register for external licenses like Docker Desktop etc., as they subscribe me to their spam list and send the usual semi-personalized messages; but as far as I can tell, most of these bigger companies don't sell the addresses on (for good reason). Startups, however, will do what they want and will make sure to squeeze the last drop from the fact that such-and-such a person works at that company and does X.
shit, now i don't feel like sending e-mails to people i'm actually interested in
There are several ways to look at that.
The organization that I work for considers anything that ties two pieces of information about a person together as private information. That is to say that a person's name is not private and a phone number is not private, but connecting a phone number to a name is private. In one form or another, an email is frequently tied to a name (e.g. the email address is based on their name, or an account record includes both a name and an email address).
Another way is to consider how accessible the information is. There was a lot of information that was not considered private prior to the widespread adoption of the internet. One issue that I remember popping up in the early 1990s involved property (i.e. land) records. Historically, people had to go to a government office to access them, but they were publicly available. Since they were publicly available, some governments made them available online. Once they were online, the barriers to access (e.g. having to physically visit an office) were removed and the ability to abuse that information vastly increased. All of a sudden, people started treating as private something that had always been considered public information.
By contrast, truly unique email aliases aren't possible on common services like free Gmail*, only with self-hosting or certain paid email hosts, which makes them less feasible for many. So from a privacy perspective, while in an ideal world everyone would be able to freely create entirely unique per-account credentials, we're mostly stuck with the email implementation.
* One could create entirely separate accounts but it's high friction and IIRC the same phone number (now a requirement) can only be used for 2-3 accounts.
They're better at it than I am, and it means I don't have to fill up my free time maintaining another server.
> same phone number (now a requirement) can only be used for 2-3 accounts.
I've wondered about this. Every Android/ChromeOS device I've ever bought, I had a new Google account created for it (during setup, instead of using an existing account), and only a few actually had phone numbers (I don't generally use smartphones for telephony). Is "Google account" synonymous with "GMail account" these days?
I've had this idea for an experiment where I get such a device (without a simcard), and see how many times I can iterate the Initialize-Device-With-New-Google-Acct-PowerWash-Repeat cycle, and how many Gmail accounts I would have as a result.
Links to information would be appreciated, even/especially if it's a complex task to do this.
(I never put a lot of effort into this, because having the Google account be anonymous/fake-named was generally tolerable for my privacy standards)
but you're right, it does help!
I sadly don't think that's viable.
What might be, in our current world, would be having a mail server/client setup where you can generate random addresses for yourself like Wf1JJUBHLu@domain.com and never re-use an e-mail address, much like with passwords, while being able to see all of the incoming mail in the same place and respond with the corresponding accounts.
Then, when your address gets traded around, it'd be fairly obvious (with some basic bookkeeping, e.g. a text field with purpose/URL for why a certain address was created) who is to blame for it and blocking incoming traffic from somewhere would be trivial as well.
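A minimal sketch of that bookkeeping in Python, assuming a hypothetical in-memory `AliasBook` class (not an existing tool): each alias is a random, never-reused local part on your own domain, recorded with the purpose it was created for, so a leaked address can be traced and blocked. It uses lowercase letters and digits rather than the mixed-case example above, since many systems fold local-part case.

```python
import secrets
import string
from dataclasses import dataclass, field
from datetime import date

# Lowercase + digits only: many systems treat local parts case-insensitively.
ALPHABET = string.ascii_lowercase + string.digits


@dataclass
class AliasBook:
    """Hypothetical alias bookkeeping: one random address per purpose."""
    domain: str
    entries: dict = field(default_factory=dict)   # address -> (purpose, created)
    blocked: set = field(default_factory=set)

    def new_alias(self, purpose: str, length: int = 10) -> str:
        while True:
            local = "".join(secrets.choice(ALPHABET) for _ in range(length))
            address = f"{local}@{self.domain.lower()}"
            if address not in self.entries:        # never re-use an address
                self.entries[address] = (purpose, date.today())
                return address

    def who_leaked(self, recipient: str):
        """Why was this address created? None if we never issued it."""
        return self.entries.get(recipient.lower())

    def block(self, address: str) -> None:
        self.blocked.add(address.lower())

    def accepts(self, recipient: str) -> bool:
        recipient = recipient.lower()
        return recipient in self.entries and recipient not in self.blocked
```

When spam arrives at an alias, `who_leaked` tells you which signup it came from, and `block` stops accepting mail for it without touching any other alias.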
I do have a self-hosted mail server and there are commands to create new accounts pretty easily; I'd just need to figure out the configuration for collecting everything in one place, as well as maybe make a web UI for automating some of the bits. I wonder if there are any off-the-shelf solutions for this out there.
That way the SMTP server can reject all unknown user@ without accepting them in the first place - preventing spamming and some types of denial of service through resource starvation.
I also apply greylist based on a unique tuple (From, To, client IP address) so on first connection with that tuple valid SMTP clients need to re-deliver the email after a waiting period. Any subsequent delivers are accepted immediately.
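A rough sketch of both checks as a decision made at RCPT TO time, assuming a hypothetical `Greylist` class and a 300-second delay (the comment doesn't state a delay; in practice this logic usually lives in an MTA policy service). Unknown recipients get a permanent 5xx rejection, so no message body is ever accepted; a new (From, To, client IP) tuple gets a temporary 4xx until the sender retries after the delay, and is accepted immediately from then on.

```python
import time
from typing import Iterable, Optional


class Greylist:
    """Hypothetical RCPT-time policy: reject unknown users, greylist new tuples."""

    def __init__(self, known_recipients: Iterable[str], delay_seconds: int = 300):
        self.known = {r.lower() for r in known_recipients}
        self.delay = delay_seconds
        self.first_seen = {}   # (from, to, client_ip) -> time of first attempt
        self.passed = set()    # tuples that already retried successfully

    def check(self, sender: str, recipient: str, client_ip: str,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        recipient = recipient.lower()
        # Reject unknown user@ before DATA, so no body is ever accepted.
        if recipient not in self.known:
            return "550 5.1.1 unknown recipient"
        key = (sender.lower(), recipient, client_ip)
        if key in self.passed:
            return "250 OK"   # subsequent deliveries are accepted immediately
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            self.passed.add(key)
            return "250 OK"   # a well-behaved MTA retried after the wait
        return "451 4.7.1 greylisted, please retry later"
```

Spam tools that fire once and never retry simply never get past the 451.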
And the other way around as well. Send an email from an arbitrary <whatever>@domain email address.
At that point, you probably want to use whatever features one of the big providers use, like: https://proton.me/support/aliases-mail
Maybe even something that'd sit in front of a mail server that you yourself control, I wonder what the variety of options out there is.
Yes and no. Both of them. Like any powerful tool, email is going to be abused, as any alternative would be if one ever arrived. The services that allow creating dynamic email addresses do their job (until they're banned, which is why I'm not naming them), but using them isn't automatic and most people don't even know they exist. What if we upgraded the email protocols to reflect current privacy needs and modified existing mail servers so that they could create dynamic addresses when asked via a simple flag? Example: I want to subscribe to a service from company XYZ, but I'm not sure how much I can trust them. So when writing an email or filling in a web form, I can enable an option that creates a new address tied to the recipient I'm writing to, which works as a dedicated proxy for my real address: every mail I send to that recipient from my real address is actually sent from the new dynamic address, and all replies to the dynamic address are routed to my real one, with a header field that always contains either a memo from me (example: "signup with XYZ") or the original recipient (example: "info@xyz_trustuswerenotspammers_yeahsure.com"). This way one can immediately spot whoever sold the address to others and blacklist them. As I said, those services work well, but since they aren't built into mail servers and clients, their adoption is quite limited. I don't see why that function shouldn't be embedded in an upgraded email protocol, as the modification would be neither hard nor resource-intensive. I would, however, expect heavy resistance to its adoption, of course.
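To make the proposal concrete, here is a hedged sketch of the per-recipient proxy logic, assuming plain addresses without display names. `AliasProxy` and the `X-Alias-Memo` / `X-Alias-Leak` header names are hypothetical, not part of any existing protocol; the code only relies on Python's standard `email.message.EmailMessage`.

```python
import secrets
from email.message import EmailMessage
from typing import Optional


class AliasProxy:
    """Hypothetical per-correspondent proxy addresses for one real mailbox."""

    def __init__(self, real_address: str, relay_domain: str):
        self.real = real_address.lower()
        self.domain = relay_domain.lower()
        self.by_alias = {}   # alias -> (correspondent, memo)
        self.by_peer = {}    # correspondent -> alias

    def alias_for(self, correspondent: str, memo: str = "") -> str:
        peer = correspondent.lower()
        if peer not in self.by_peer:
            alias = f"{secrets.token_hex(5)}@{self.domain}"
            self.by_peer[peer] = alias
            # The memo defaults to the original recipient, as proposed.
            self.by_alias[alias] = (peer, memo or peer)
        return self.by_peer[peer]

    def outbound(self, msg: EmailMessage, memo: str = "") -> EmailMessage:
        """Written from the real address; the correspondent only sees the alias."""
        alias = self.alias_for(str(msg["To"]), memo)
        del msg["From"]
        msg["From"] = alias
        return msg

    def inbound(self, msg: EmailMessage) -> Optional[EmailMessage]:
        """Route mail for an alias back to the real address, annotated."""
        alias = str(msg["To"]).lower()
        if alias not in self.by_alias:
            return None                     # never issued: drop or reject
        peer, memo = self.by_alias[alias]
        msg["X-Alias-Memo"] = memo
        sender = str(msg["From"]).lower()
        if sender != peer:
            # Mail from anyone else means the alias was shared or sold.
            msg["X-Alias-Leak"] = f"issued to {peer}, received from {sender}"
        del msg["To"]
        msg["To"] = self.real
        return msg
```

The design choice is one alias per correspondent rather than per message, so replies keep working and a leak points at exactly one party.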
Buildings are analogous to domains, not email addresses.
GDPR is clear on this and there have been significant fines for revealing email addresses against the will of their owners (e.g. using cc instead of bcc). Not saying this is the ultimate wisdom, just a data point to consider.
I dunno. Should your personal phone number be private? Or your home address? Would you be okay if I knew it and shared it with a stranger? Or would you rather be asked permission to share it first?
Seems pretty cut and dried to me. Yeah, there's going to be someone out there (there always is) who doesn't care, but I'd wager the majority would be pretty ticked off if you gave those pieces of information out to a rando on the street.
But public vs private is a spectrum, not a binary true/false. My phone number is public because I get sales calls from various companies to it. It's annoying, but bearable. But there's a big gap between that and the New York Times putting my name, number and picture on the front page.
So your home address and phone number aren't private. But they're also not readily accessible unless someone is really dedicated to finding them, so they're not quite public either.
An email (or phone number, or address) is an identifier. Asking whether this identifier is public or private misses the important thing, which is the action that can be paired with the identifier.
So therefore, there's no universal answer to whether the identifier should be public or private. It's a case by case basis, when paired with an action.
For example, I don't want a shop to see me buying condoms, so shops shouldn't get my email address (or phone number).
Numbers were however tied to a property rather than individual personal phones in our pockets. When you think about it, mobile phone technology arrived quickly and caught everyone by surprise. Back in the 80s very few people thought we'd be carrying around "pocket TV phones" in such a short time.
An email address will be part of the XML in his uploads but also in his profile, which anyone can access by simply changing the URL from https://archive.org/details/@foobar to https://archive.org/download/foobar. So, in essence, one just needs to have a registered account, independently of any uploads made.
Theoretically, someone could scrape the pages and compile a list of exposed email addresses.
I laughed. Oh no! Anyways…
The people interested in identity theft are probably too busy figuring out what to do with all the SSNs they stole (not from this breach, but from the annual catastrophic breach of a credit bureau or government repository).
And the people who want your email probably already got it from one of the hundreds of other services you have to create an account for now.
I’m not really sure if there are circumstances where donating to the internet archive could be held against you and lead to persecution. Maybe in certain Luddite communities? The Amish? But then, how would they know…
He's moved on the next stage, but I was glad I was able to put his site back up.
It'll be a shame if IA goes down permanently, but we need a decentralized solution anyway.
Having a single mega organization in charge of our collective heritage isn't a good idea.
if anyone knows something like what I'm suggesting, I'd love to hear about it!
https://en.wikipedia.org/wiki/Cooperative_storage_cloud gives a few examples, like Filecoin.
1) wedding itself to crypto with Filecoin.
2) terrible performance due to architectural choices (basically: too much pointer-chasing, except every pointer was back out to the DHT).
3) No serious attempts to integrate with existing software distribution strategies.
I think it's still a good core idea.
Independently ran mirrors all over the world, along with snapshots.
Have the occasional fork or two. Say you're from a small town in Northern Illinois: if you have 2 TB of image archives from a defunct local newspaper, it might be good for photography forks even if it wouldn't make sense for the main archive.
The system that asks volunteers about their age, sex, location, and storage format details (the model, past use etc. can be used to predict the durability of a single storage) without sharing most of this data anywhere.
The downloaders are then algorithmically allocated pieces of the archive, for example such that there is at least a limited amount of overlap between the pieces, and two people in the same country won't provide redundancy for each other.
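One way the allocation rule could work, as a greedy sketch rather than a worked-out algorithm: each piece gets a fixed number of copies, no two copies go to volunteers in the same country, and volunteers with the most free capacity are filled first. The function name, the `replicas` parameter, and measuring capacity in pieces are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def allocate(pieces: List[str],
             volunteers: Dict[str, Tuple[str, int]],
             replicas: int = 3) -> Dict[str, List[str]]:
    """
    Hypothetical greedy allocator.
    volunteers maps name -> (country, capacity measured in pieces).
    Each piece goes to up to `replicas` volunteers, never two in one country,
    preferring whoever currently has the most free capacity.
    """
    free = {name: cap for name, (_country, cap) in volunteers.items()}
    plan = {}
    for piece in pieces:
        holders, countries = [], set()
        for name in sorted(free, key=lambda n: -free[n]):
            country = volunteers[name][0]
            if free[name] > 0 and country not in countries:
                holders.append(name)
                countries.add(country)
                free[name] -= 1
                if len(holders) == replicas:
                    break
        plan[piece] = holders   # fewer than `replicas` if capacity ran out
    return plan
```

A real allocator would also weigh the per-drive reliability estimates and the priority of each piece, which this sketch ignores.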
When a downloader verifies that they have completed the download by giving (unique, to prevent fake-download sabotage) SHA hashes of the data, the information that these pieces have been downloaded in this or that country, plus an estimate of the reliability of the storage, is added to a public database, for the algorithm to use in the future.
Each downloader is then issued a public/private key pair, so that they can give the hash of their download again once in a while, or just verify that the piece is still there. The reliability estimates (based on storage/hardware details) would be empirically calibrated against data about actual storage failures.
A public counter, estimating how well the archive is currently backed up via this scheme, could be displayed.
For copyright issues, it would be possible to encrypt some of the data, e.g. such that normally borrowable items become readable files only when X% of downloads are pieced together.
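The comment doesn't name a mechanism for "readable only when X% of the pieces are together". One standard option, swapped in here, is to encrypt the item with a key and split that key with Shamir's secret sharing, so any `threshold` of the `shares` recover it and fewer reveal nothing about it (e.g. 60 of 100 for "60%"). A minimal sketch over the prime field 2^127 - 1; the key must fit below that prime, the encryption step itself is omitted, and `pow(x, -1, m)` needs Python 3.8+.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1   # Mersenne prime; the key must be smaller than this


def split_secret(secret: int, threshold: int, shares: int) -> List[Tuple[int, int]]:
    """Shamir: random degree-(threshold-1) polynomial with f(0) = secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):    # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, shares + 1)]


def recover_secret(points: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0; needs at least `threshold` points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _yj) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```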
The scheme would be primarily based on existing designs and algorithms but work roughly as described above. I am not an expert in which compression, hashing and other algorithms should be used, and it needs lots of good work to determine how to avoid errors in the scientific part of estimating the reliability of the downloads, and generally to avoid a situation where it turns out that lots of data was lost when attempting to put the pieces back together again.
Remark (engineering): Empirically validating the correctness of the backup architecture's software by testing it on grids of real hard drives in single places will probably give safety against catastrophic failure. Even better would be to obtain a large number of old hard drives and SSDs kept in a single place for a long time, to validate that the software works over time.
Remark (integrity): That a downloader actually has the downloads can be verified efficiently by the IA server sending a small addition to the piece the downloader has, having the downloader hash the combination again, and requesting the new hash.
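A minimal sketch of that integrity check using SHA-256 from Python's `hashlib`, assuming the server still holds the piece (or has precomputed nonce/answer pairs for it). A fresh random nonce per audit means a volunteer can't answer from a stored hash after deleting the data; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    # Fresh per audit, so earlier answers can't simply be replayed.
    return secrets.token_bytes(16)


def respond(piece: bytes, nonce: bytes) -> str:
    # Volunteer side: hash the stored piece with the server's nonce appended.
    return hashlib.sha256(piece + nonce).hexdigest()


def verify(piece: bytes, nonce: bytes, answer: str) -> bool:
    # Server side: recompute and compare in constant time.
    expected = hashlib.sha256(piece + nonce).hexdigest()
    return hmac.compare_digest(expected, answer)
```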
Remark (redundancy): It may be possible to develop a social program that analyzes whether a volunteer in a certain place can provide more redundancy by buying themselves a hard drive or by supporting the acquisition of hard drives for volunteers who have proved themselves reliable elsewhere. This is speculative and the benefit may be lower than the risks.
Finally, instead of a "public database" it may be better to use a blockchain of some sort. Not a cryptocurrency, but a blockchain. This is because if the idea is to distribute copies over the world to ensure contingency in case IA's main architecture collapses, then the more parts of the distributed backup architecture (which must actually not be "the backup architecture" but "a scheme" that no everyday IA decisions rely upon, and that just exists out there) run on a "decentralized" blockchain network, the more reliable it will be.
My heuristic plausibility analysis:
0. The IA backup would not need to be constantly accessed or changed (this makes storage easier and cheaper, and prolongs the maximum age of the storage).
1. Not all of IA has to be backed up: a distributed backup that successfully recovers 10% of IA in a catastrophe is by all means a great success. Consequently, prioritization of what might or should be stored should probably be part of the algorithm that decides what volunteers download, and whatever existing "big" archives already store that overlaps with IA should be taken into account in this analysis.
2. I recall you estimated 30-40 M USD ballparks for a single copy: a properly led open-source project may be able to develop this for free, and a fairly compensated one could cost ~0.1% to 1% of that.
3. The Sia network https://siascan.com/ has space for 7 PB; it's for storage where one can download their own files at any time, and it has had very little publicity.
4. A 2 TB hard drive costs 50-100 USD, and 20 PB would be 10,000 people each buying one 2 TB hard drive, which by itself is possible. Hobbyists and organizations may be able to provide even larger capacities.
5. Most IT projects fail, but since lots of the technology already exists, we know what we are doing, and IA might be able to recruit above-average talent, we can conservatively give the groundwork development a 50% chance to succeed, or 45% without funding.
6. If the development succeeds, then there may already be around ~100 potential volunteers. I estimated that 0.1% of IA visitors may volunteer, plus 1% of Hacker News traffic if the project were mentioned there, plus growth over the first few years and traffic from elsewhere. Perhaps a 75% chance to get 10% of IA backed up by volunteers, given development succeeds.
7. If that much is backed up, there is perhaps a 5% chance of attaining 200 TB in the next few decades.
Conservatively, given that open-source development starts, one gets approx. a 33%-38% chance that a 10% backup is achieved and approx. 1-2% that 100% of what is now in the IA could be backed up. These are of course rather meaningless numbers, but it seems that, lacking the funding to build a complete backup, IA can best guarantee contingency by starting to build a distributed one. Perhaps this was needlessly many words for a simple proposal.
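The arithmetic behind points 4 to 7 and the conclusion can be reproduced in a few lines, using only the probabilities stated above, decimal units (1 PB = 1000 TB), and treating point 7's 5% as the chance of going from a 10% backup to a full one, as the conclusion does:

```python
TB = 10**12
PB = 10**15

# Point 4: 10,000 volunteers each buying one 2 TB drive.
capacity_pb = 10_000 * 2 * TB / PB


def chances(p_dev: float, p_ten_given_dev: float = 0.75,
            p_full_given_ten: float = 0.05):
    """Chain points 5-7: P(10% backed up), P(everything backed up)."""
    p_ten = p_dev * p_ten_given_dev
    return p_ten, p_ten * p_full_given_ten


for label, p_dev in (("without funding", 0.45), ("with funding", 0.50)):
    p_ten, p_full = chances(p_dev)
    print(f"{label}: {p_ten:.1%} for 10%, {p_full:.2%} for 100%")
print(f"raw capacity: {capacity_pb:g} PB")
```

The products are 0.3375 and 0.375 for the 10% target and about 0.017 and 0.019 for a full backup, which is where the 33%-38% and 1-2% figures come from.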
- X
---
Note: It's probable that at least the NSA has a private full IA backup.
https://www.friendlyelec.com/index.php?route=product/product...
(just an example, as it's way overkill for the task)
With copyright, as individuals we get to trade all of the wonderful stuff already made (and long paid for) for the flood of minute-old shit and sludge inundating us online constantly. It's a bad trade. Maybe copyright should stop encouraging creativity; the answer to how "artists" would get paid post-copyright might be "who cares, quit if you want."
We already have Herman's Head, we don't need any more crap.
That being said, please do not host content this way. P2P blows away the already thin privacy guarantees that the web provides. Anyone seeding the site gets the IP addresses of everyone on that site, and can trivially correlate that with other sites to build detailed dossiers on, if not individual people, at least households[0] of people. After all, that's how the MAFIAA[1] sent your ISP DMCA scare letters back in the 2000s P2P wars.
[0] IPv4 CGNAT would frustrate this level of tracking, but IPv6 still hands out a subnet per subscriber. Note that you can't use individual v6 addresses, because we realized very early on that the whole "put the MAC in the lower 64 bits of the address" thing was also a privacy nightmare, so IPv6 hosts rotate addresses every hour or so.
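To illustrate the footnote: rotating privacy addresses only change the lower 64 bits, so a tracker can simply key on the prefix instead. A small sketch with Python's standard `ipaddress` module; the /64 default is an assumption, since many ISPs delegate a /56 or /48 per subscriber, in which case you would truncate further.

```python
import ipaddress


def subscriber_key(addr: str, v6_prefix: int = 64) -> str:
    """Collapse an address to what identifies the household (sketch)."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # Temporary/privacy addresses only change the low 64 bits,
        # so the delegated prefix stays stable across rotations.
        return str(ipaddress.ip_network(f"{ip}/{v6_prefix}", strict=False))
    return str(ip)   # IPv4: behind CGNAT this may be shared by many subscribers
```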
[1] Music And Film Industry Association of America, a fictitious merger of the MPAA and RIAA in a hoax article
Isn't that exactly what WebTorrent is?
I would have absolutely no trouble downloading the latest Marvel movie, but if you are looking for some old Soviet movie, Iranian movie or even an old American movie, then you're out of luck. I've never seen more than 0 seeders on The Pirate Bay.
https://www.bleepingcomputer.com/news/security/internet-arch...
Do they? Why?
* Exceptions apply.
Troy isn't publicly sharing the credentials, and that's what's valuable, especially having "exclusive" access.
He blogged or tweeted about this at some point. Sadly, I can't find the link.
My unique-to-archive.org email address is not there yet.
EDIT: Should've read TFA more thoroughly, it says the breach happened before the 30th September. And I created my account around the 2nd October
I have checked addresses of mine that I know were in a hack, and sometimes they aren't there, while other times they are. I also wonder if they've started filtering out by domain, when they see a domain across multiple databases with unique addresses in each database exactly one time.
All you need is a domain and an email provider that allows catch-all addresses, both of which are easy and cheap.
Edit: even more fun with catch-all domains: then it’s company-name@spam.my.domain
Real estate agents can be pretty aggressive with emailing, but IME respect unsubscribes and don't seem to share/leak emails. I kind of wish I'd used an address per agent instead of per company to see what was happening better.
Non-company uses can also reveal issues. I had an address scraped from a flatmate finding site, and one apparently lifted from a relative's contact list somehow (I only have one I use for family, so that was a concern, but spam to it petered out quickly).
You can't sign up for a Samsung account with the name Samsung anywhere in your e-mail address. AliExpress is another offender; there my email is just spam@domain.
1. Buy a domain. About $10/year for a .com
2. Buy a /24 ipv4 block with good reputation (maybe like $10k)
3. Get a rack in a nearby datacenter, rack up a BGP-capable router and your servers for redundancy to run email. Takes about $30k initial setup costs if you buy all new, and about $5k initial setup costs if you cut corners and buy used. It'll be $2k/mo after that, so less than the cost of 1 $100 avocado toast per day, quite affordable.
4. Set up your mailserver of choice, such as dovecot + postfix. Enable either a catch-all address, or use recipient_delimiter. The former means "anything@domain.com" works, and the latter means "user-anything@domain.com" works (assuming your recipient_delimiter is '-'). I recommend using a real catch-all.
5. Set up your spam filtering; this is the hardest part. I have no guidance here.
6. Point your DNS over, set up SPF and DKIM records, test, and off you go! This should all take about 1 to 3 days if you know what you're doing.
7. Find out that some email will go to spam anyway because you're not using one of the big 4 email providers, but it can't be helped, and anyway no one uses email anymore.
And after that, for less than $30k/year, you have email with catch-all or subaddressing support. Nice and easy.
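Step 4's two options differ in which addresses end up in your mailbox. A simplified model of the routing decision (a hypothetical function, not Postfix's actual code), assuming a '-' delimiter split at its first occurrence:

```python
from typing import Optional, Set


def resolve(recipient: str, mailboxes: Set[str], delimiter: str = "-",
            catch_all: Optional[str] = None) -> Optional[str]:
    """Simplified model: which local mailbox gets mail for `recipient`?"""
    local = recipient.rpartition("@")[0].lower()
    if local in mailboxes:
        return local                      # exact match
    base, sep, _tag = local.partition(delimiter)
    if sep and base in mailboxes:
        return base                       # subaddressing: user-anything@ -> user
    return catch_all                      # catch-all mailbox, or None = reject
```

With subaddressing alone, "internet-archive@domain.com" is rejected unless a mailbox called "internet" exists; with a catch-all it is delivered, which is why the catch-all is the more flexible option.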
You can also pay Fastmail for email and use their "catchall" feature https://www.fastmail.help/hc/en-us/articles/1500000277942-Ca...
Or Google Apps also has a catchall feature.
Then, after you do this, you can simply give internet archive the email address "internet-archive@mydomain.com", or generate a random string. If you forget the email you used, you can search your email history for the first email they sent you, and check the To field.
Why do you need DC rack space and a /24 just to have your email?
Sure, you could pay fastmail $40/year for this, but that's not really the hacker news spirit, and no one on this site knows how to count as low as $40.
The real justifications you can give yourself:
Shared VPS hosting pretty much all bans email, AWS, DO, etc all have ToS that say "no email" as anti-spam measures.
Shared IP space will go straight to spam due to people having spammed on it in the past. Buy a /24 to ensure you don't go straight to spam.
Rack space ensures you actually own your email, at least more so than with other shared hosting, and owning your email is important.
Complete FUD.
Here is DO's acceptable use policy:
https://www.digitalocean.com/legal/acceptable-use-policy
You can see that they explicitly have policies for email hosts.
Here is a guide they host on how to setup a mail server:
https://www.digitalocean.com/community/tutorials/how-to-run-...
They forbid spamming, not all mail.
> Shared IP space will go straight to spam due to people having spammed on it in the past. Buy a /24 to ensure you don't go straight to spam.
I have had no problems with deliverability to Google from an IP on a shared block. I don't send marketing mails or any other kind of spam though. Microsoft blocks my IP but they are too small (outside businesses) for me to care to give them special snowflake treatment.
Deliverability of your own mails is also irrelevant for the original discussion about using unique email addresses for signing up to services - you don't need to be able to send at all for that.
costs around $12/year+domain
Note that I am speaking from personal experience here. I have been self-hosting email for over a decade, from the same IP, with (roughly) the same DNS records. Occasionally, for no reason, I will end up on the global spam list for Gmail, Outlook, or iCloud - never more than one at the same time, and never with a discernible reason. The best I can figure is that the IP is allocated to me by a hosting provider that occasionally sends out spam from its subnet (aka any hosting provider that doesn’t block smtp). I have also tried self-hosting a different mail server from a variety of residential IPs in different cities and countries, and ran into the same problem.
- have an iPhone/Mac w/ iCloud+
- go into settings
- add custom email
- get redirected to log in to Cloudflare
- buy/pick a domain for $12
- iCloud+ automatically sets up the MX records on the domain via Cloudflare
- enable catch-all emails in iCloud settings
- Done!
Takes about 10 minutes & iCloud provides the email hosting without any additional fees
2. Configure a catch-all forwarding address to your private GMail
Done.
All a service provider or malicious actor has to do is simply not include it when storing or publishing it to evade tracking.
Stripping it is not uncommon for services to prevent duplicate accounts.
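A sketch of the kind of normalization a service might apply (hypothetical, since each service does its own thing), which shows why a "+tag" subaddress can be reduced to your real address while a random catch-all address survives unchanged:

```python
def normalize(address: str, delimiter: str = "+") -> str:
    """What a service might store: drop the +tag and lowercase."""
    local, _, domain = address.rpartition("@")
    return f"{local.split(delimiter, 1)[0]}@{domain}".lower()
```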
How this specific instance unfolded, time will have to tell. The leak may have occurred in 2020 for all we know at this point.
When not used for extortion and for "status" in the hacking community, they share them with researchers (commonly HIBP) to warn people about a site's security and so that site is forced to fix things.
Definitely a strange dynamic.
$2a$
10$
Bho2e2ptPnFRJyJKIn5Bie
hIDiEwhjfMZFVRM9fRCarKXkemA3Pxu
ScottHelme
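The stored value above can be split mechanically, assuming the five pieces were one contiguous string as stored: bcrypt's modular crypt format is `$variant$cost$` followed by a 22-character salt and a 31-character hash (60 characters in total), so anything after that is extra. A small parser using only Python's `re`:

```python
import re

BCRYPT = re.compile(
    r"^\$(?P<variant>2[abxy]?)\$(?P<cost>\d{2})\$"
    r"(?P<salt>[./A-Za-z0-9]{22})(?P<hash>[./A-Za-z0-9]{31})(?P<extra>.*)$"
)


def split_bcrypt(stored: str) -> dict:
    """Split a stored bcrypt string into its fields plus any trailing extra."""
    m = BCRYPT.match(stored)
    if not m:
        raise ValueError("not a bcrypt string")
    parts = m.groupdict()
    parts["cost"] = int(parts["cost"])
    parts["iterations"] = 2 ** parts["cost"]   # cost is a log2 work factor
    return parts
```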
2a = bcrypt, 10 = 2^10 rounds, Bho2e2ptPnFRJyJKIn5Bie is the 22-character salt, hIDiEwhjfMZFVRM9fRCarKXkemA3Pxu is the 31-character hash value, and then there's ScottHelme. Best guess is that the archive.org folks just appended the user name to the stored hash. Maybe once upon a time they didn't have a username column in their table and this was a creative way of adding it.
> Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!
What a nice guy.
What is evil is the way that he's ensured that the predators in the dataset will never face any consequences by making the data available to HaveIBeenPwned, making it trivial for predators to protect themselves (the method through which this is possible intentionally left as an exercise for the reader), and making the data available to a news website for...some reason, but it's bound to ensure that the vulnerability will be patched out quickly and no one else will be able to access the data.
I find it much more likely that this hacker who sought out a website for uncensored AI erotica isn't actually a good guy, and might even have something to hide within the dataset. Hopefully, I'm wrong and we'll see more of this.
I don't know what the best practice is for keeping our personal data safe anymore.
Exactly that, yes! Various services like icloud or proton offer "hide-my-email" addresses, or you can use any email service and just leverage a dedicated email aliasing service like SimpleLogin (paid but cheaper).
This way your email addresses are always random, and since these are shared services, the fact that it's random doesn't identify you either. In proton's / simplelogin's case, you can even set the display name used and email first, so from the outside it's not going to appear as strange, or have any real limitations.
If you think about it, modern email services don't really allow for easily testing if an email address is valid or not, so pretty much the only way your email is ever found out is if you share it on. So never share it on. Always share an alias instead. With automated systems, you may even want to rotate it every so often, so that if there's a leak, you can identify not just who leaked, but also roughly when.
Fixed identifiers, like an email address, are terrible, as their lifetime is always significantly longer than whatever context they're being used in.
(No, this official looking email from my bank is fake since it was sent to Grocery@my.domain …)
Yes! Just get a domain and have every email to it go to you. Mine is something like “@super-secure-no-viruses.email”
I guess internet security is not as bad these days. :)
Sometimes with friendly / attempt-at-humorous error messages it’s difficult to tell
Obv an attacker's ability to insert a message does imply a breach beyond a DoS. But I am pretty confident that message was not from the IA.
Submitted URL was https://archive.org/.
Is there any link between them and the real attack or are they just unrelated people claiming credit for it?
Update: Subdomain seems to be returning normal responses again now.
https://sourcegraph.com/github.com/polyfillpolyfill/polyfill...
Seems like they self hosted that service
https://archive.org/metadata/naturally_a_girl/metadata
One way or another, there was going to be someone who would take loads of emails with a username attached to it. A bit intrigued by how the hacker compromised the database and got the passwords.
This honestly seems like a bit of a design flaw.
Already there are two new users just for this.
BTW, for the current account details, I changed the password to another random string generated by my password manager, and also deleted the masked email address and generated another one, so going forward this sort of thing isn't that much of an issue for me.
I found this reddit thread from /r/DataHoarder about backing up the internet archive particularly interesting, given the circumstances
1: https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/
Not that they want to, but I think Wikipedia could fund this using their current donations if they wanted. Hell, I almost wonder if one of the big storage providers would do it for free if they could do it in their staging environment so they get real traffic. It would be less good than real backups, but extra copies are still extra copies even if they're unreliable.
A good portion of the text on Wikipedia relies on Wayback Machine links to remain verifiable. If they lose that, I guess the editors might have to comb every page for information which would need to be either resourced or deleted.
You might be able to back up a significant portion of the unique data in IA if you limited it to text files. I think they probably have the highest information to file size ratio.
It’s also probably the most likely to already be backed up, though. Interesting issue; you might also get somewhere by cutting the 50TB up into 10GB torrents (or 100GB or whatever, something reasonable for a consumer hard drive) and maybe adding a script that checks the torrent swarm stats to recommend a torrent to download.
Something where I run it, tell it I want to let it use 600GB, and it hands me torrent files for the least seeded 600GB. Maybe a super basic web UI so people can see how well backed up it is?
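A minimal sketch of the "give me the least-seeded 600GB" idea. It assumes the per-chunk seeder counts have already been fetched somehow (how a real tool would scrape trackers or the DHT is left out), and `Chunk` and `pick_least_seeded` are hypothetical names. Greedy selection is just one simple choice, not an optimal packing. At 10GB per chunk, 50TB works out to roughly 5,000 torrents.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One hypothetical torrent chunk of the archive, with its last known swarm stats."""
    name: str
    size_gb: float
    seeders: int


def pick_least_seeded(chunks: list[Chunk], budget_gb: float) -> list[Chunk]:
    """Greedily pick the least-seeded chunks that fit in the volunteer's budget.

    Chunks are visited in order of seeder count (ties broken by name so the
    result is deterministic). A chunk that doesn't fit is skipped, but smaller
    chunks later in the order may still be picked.
    """
    chosen: list[Chunk] = []
    remaining = budget_gb
    for chunk in sorted(chunks, key=lambda c: (c.seeders, c.name)):
        if chunk.size_gb <= remaining:
            chosen.append(chunk)
            remaining -= chunk.size_gb
    return chosen
```

For example, given chunks `ia-0001` (3 seeders), `ia-0002` (0) and `ia-0003` (1) at 10GB each, a 20GB budget returns `ia-0002` and `ia-0003`. A status page could simply show how many chunks sit at zero or one seeder.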
Unsure if people would sign on or not; I probably would. I’ve got 10 or so TB of NFS I’m not using I could chuck at it. I would guess there are other data hoarders out there who would do the same, but only if it were somewhat easy. I’m probably not going to volunteer to do an hour of rtorrent cleanup a week to make sure I’m backing up the right things.
This is a great question, and a state of the art kind of thing.
HDDs are sold with a warranty that covers a lifetime read/write volume and a number of power cycles, usually along with an environmental operating envelope. The read/write limit relates to the quality/space of the platter; the power-cycle limit usually reflects the actuator and read/write head being reseated and wearing out. The environment is the same as for all other devices in a DC.
Most folks replace drives when they die (reads/writes stall or return garbage), or when the warranty runs out. Some will pay for a warranty exception, and some will just use the drive outside of warranty. How you use the drive, what environment it's in, etc. changes how far you can push things.
I'd say anywhere from 4-8 years, depending on how it's used. In many cases it can be cheaper to run your fleet in a worse environment (thus spending less power on HVAC) and replace devices more frequently.
is for sure not true, that would be crazypants
I have no other explanation. At some point, having too many nested loops and variables caused segmentation faults, whereas simpler code functioned without error. I needed certain things performed, and it only functioned in main.
I remember for a long time (I'm talking 20-ish years back here), every hard drive I bought had double or more the capacity of every drive I'd ever bought previously combined. My first ever 40MB (yes, megabyte) drive got upgraded to an 80MB one, that got updated to a 250MB one, then a 750MB, and then a whopping 2GB drive (how would I _ever_ fill that up???) - and so on. That's slowed down some, but I'm currently starting to think about upgrading my 8TB drives (RAID 1 pair) to 20TB drives when the prices start to drop a bit more.
Do people really replace their drives when the warranty runs out? Hard drive manufacturers won't provide data recovery on drives that fail under warranty[1]. It makes more economic sense to just run a drive until it dies. You'll end up paying the price for a new drive either way, but less often if you ignore the warranty expiring.
1: I discovered this myself when a Seagate drive containing some important data failed under warranty. If you're foolish enough to send them a failed drive with data you need recovered (like I was), all they'll do is throw it in the bin and send you a replacement drive.
1.71% a year failure rate if you care for the hardware as much as they do.
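To put that rate in perspective, a small sketch of the arithmetic, assuming the 1.71%/year figure applies as a constant, independent annual rate (a simplification; real drives tend to fail more often early and late in life). At that rate a 1,000-drive fleet expects about 17 failures a year, and a single drive has roughly an 8% chance of failing within five years. The function names are hypothetical.

```python
AFR = 0.0171  # the 1.71%/year annual failure rate quoted above


def survival_probability(afr: float, years: float) -> float:
    """Probability that one drive is still alive after `years`.

    Assumes the same independent failure rate every year, which is a
    simplification of real drive behaviour.
    """
    return (1.0 - afr) ** years


def expected_failures_per_year(afr: float, drives: int) -> float:
    """Expected number of failed drives per year in a fleet of `drives`."""
    return afr * drives
```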
So the question becomes more like "how long does an average hard drive last while powered down and still reliably be able to power back up and be read?".
I'm fairly sure that is a lot longer than the single-digit years that'd be the probable answer to your question.
I wonder if there are useful guidelines for long-term storage of powered-down hard drives? My gut feel is the major failure modes would be electrolytic capacitor failure, bearings sticking as the lubrication ages, and obsolescence of the interfaces. I wonder how hard it'd be to find hardware that'd read my Mac SCSI hard drives from 25 years ago?
Easy… that original Mac is sitting in my basement and it worked like a charm last time it was powered on 4 years ago.
They are cheaper per GiB, and last significantly longer.
You'd have to spend a lot more, because with that many drives, you need redundancy now.
I think with that many drives, you'd be losing them constantly, and I suppose you wouldn't know which ones until later (assuming you're doing an offline backup; if you aren't, you have to factor in power costs).
hard-drive price: $0.014/GB
B2 price (12*6/1024): $0.070/GB/year
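The same comparison as a worked calculation. The `12*6/1024` formula implies B2 at $6 per TB per month with 1024 GB per TB; the constants and function names below are illustrative. On raw capacity alone, the drives cost about as much as 2.4 months of B2 fees, but the sketch deliberately ignores redundancy, power, enclosures and replacements, which is exactly where the self-hosted option gets expensive.

```python
HDD_PRICE_PER_GB = 0.014      # one-off drive purchase price quoted above, $/GB
B2_PRICE_PER_TB_MONTH = 6.0   # implied by the "12*6/1024" formula above, $/TB/month
GB_PER_TB = 1024              # the formula above uses binary units


def b2_price_per_gb_year(per_tb_month: float = B2_PRICE_PER_TB_MONTH) -> float:
    """Convert a $/TB/month cloud price into $/GB/year."""
    return per_tb_month * 12 / GB_PER_TB


def breakeven_months(hdd_per_gb: float = HDD_PRICE_PER_GB) -> float:
    """Months of B2 storage that cost as much as buying the same raw HDD capacity once.

    Deliberately naive: ignores redundancy, power, enclosures, labour and drive
    replacement, all of which make self-hosting cost more than this suggests.
    """
    return hdd_per_gb / b2_price_per_gb_year() * 12
```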
They have their own backups which I think is good enough for now unless someone plans on donating a few hundred million.
From my own personal experience doing distributed archiving with no relation to Archive.org, Filecoin/IPFS's UX isn't quite there yet. They still don't let you serve data to the network from a normal filesystem; you have to let their system ingest all of your stuff, so you either end up double-storing data or have to give in to everything being stored as inscrutable binary blobs.
That's why I still haven't integrated ArchiveBox with IPFS/Filecoin/Storj, let my data live in a normal filesystem dammit!
I don't understand this part. What data would you have to give them? Why can't it just live next to your stuff on your OS' filesystem?
For Filecoin, if you want fast access, you do need to keep a second hot plaintext copy, as well as the sealed Filecoin copy. But that works for the backup case for IA, because the hot copy would be served from the archive's existing infrastructure (and/or a distributed IPFS hot cache) -- you'd just use Filecoin for the proven safe backup.
The project to back up IA to Filecoin is still ongoing. The IA dashboard that shows the current state is (perhaps predictably) down at the moment, but it crossed the 1PiB line last year[2], and they've been optimising the onboarding flow recently.
[1] https://docs.ipfs.tech/reference/kubo/cli/#ipfs-add
[2] https://blog.archive.org/2023/10/20/celebrating-1-petabyte-o...
(Disclosure: I work at the Filecoin Foundation/Filecoin Foundation for the Decentralized Web, which partners with the Archive on this project, as well as supporting other Internet Archive backup projects.)
I appreciate your effort and I hope the project continues.
I found this, not sure if it's still up-to-date:
◉ PHP's default implementation of bcrypt uses 10 rounds.
◉ Python's bcrypt library uses 12 rounds by default.
◉ Node.js's bcrypt library uses 10 rounds by default.
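The "rounds" in that list are bcrypt's cost factor, which is a log2 exponent: cost 10 means 2^10 iterations of the expensive key setup, and cost 12 is four times the work of cost 10. A minimal sketch (hypothetical helper names) that reads the cost out of a stored hash in the usual `$2b$12$...` format and compares work factors. It does not hash or verify anything; for that you'd use a real bcrypt library.

```python
import re

# bcrypt hashes use the modular crypt format: "$2b$" + two-digit cost + "$"
# followed by 53 characters (22 of salt, 31 of hash) in bcrypt's base64 alphabet.
BCRYPT_RE = re.compile(r"^\$2[abxy]?\$(\d{2})\$[./A-Za-z0-9]{53}$")


def bcrypt_cost(hashed: str) -> int:
    """Return the cost factor stored in a bcrypt hash string (no verification)."""
    match = BCRYPT_RE.match(hashed)
    if match is None:
        raise ValueError("not a bcrypt hash")
    return int(match.group(1))


def relative_work(cost: int, baseline: int = 10) -> float:
    """Work needed at `cost` relative to `baseline`; each +1 doubles it."""
    return 2.0 ** (cost - baseline)
```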
See also: https://gist.github.com/Chick3nman/32e662a5bb63bc4f51b847bb4...
bcrypt passwords are very slow to crack.
Bit of a shame the emails contain an ad for a password manager, saying there are two easy steps to become more secure: Step 1: use our password manager (fair enough), "Step 2: Enable 2 factor authentication and store the codes inside your [password manager]". Ehh, now it's back to 1 factor, or am I missing something?
Edit: according to https://www.bleepingcomputer.com/news/security/internet-arch... (via https://news.ycombinator.com/item?id=41793669), Troy Hunt / HIBP already received and verified this "three days ago" as of yesterday 6pm AoE
If you protect your password manager with a yubikey or any other hardware key, then your 2FA inside your password manager is quite secure and convenient. But this is very individual, what your threat model is and how secure you want/need to be.
> even if they got your password, if they don't have access to your password manager they can't login.
Wouldn't the same argument apply to a non-2FA password? What's the difference between a randomly generated 2FA secret and a randomly generated password here?
But doesn't a DB compromise mean that the attacker would have the TOTP seed as well? It could only help your account security elsewhere, but wouldn't avoiding password reuse already keep the IA leak from hurting you elsewhere?
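Why a leaked TOTP seed is as bad as a leaked password: the code is computed from nothing but the shared seed and the clock. Below is a stdlib-only sketch of RFC 6238 TOTP (built on RFC 4226 HOTP) with the common defaults of SHA-1, 30-second steps and 6 digits. It is for illustration, not a vetted implementation, and says nothing about how IA or any password manager actually stores seeds.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 of the 8-byte big-endian counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte choose where to read
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP whose counter is the number of `step`-second periods since the Unix epoch."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)
```

Anyone holding the seed, whether it was read out of a vault or out of a server's database, can call `totp(seed)` and get the same code as your authenticator app.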
Note I'm quoting HIBP's advice from the email they've sent me! I'm absolutely not recommending to store one's 2FA secrets in the same place as the password!
Even if one uses 2FA for the password manager, it stops proving "something you have" in addition to something you know and you're one unlock away from malware vacuuming it all up. The point of 2FA is to be on a separate device you need to have on hand
Of course, the same logic goes for a password manager in the first place, but password reuse is a big enough problem that (for most people's threat model) it seems to be a net positive. 2FA tokens don't have that reuse issue
In fact, the Wayback Machine and the book archives are responding more quickly than they did for me a week ago, when I showed the Archive to the students in an online class I teach. I gave the students a homework assignment that involves accessing some old books at the Archive. That assignment is due in about 12 hours, and I was just getting ready to e-mail the students about the outage when I saw that the site is working again.
What info does archive.org have on people? Is this info scraped from other websites and stored in the archive.org database? Or is this info related to personal archive.org accounts (as I said I don't recall making an account)?
Now I'll have to dig through my IA account and remember if I donated to them directly via credit card (and if they stored it), or if it was through PayPal.
> Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!
But is this an official message from the company? It sounds odd and unprofessional, especially the "See 31 million of you on HIBP!" part, which jokingly refers to a huge privacy issue for users. Could it also be that the site was hacked, with hackers posting that message in addition to the data breach and DDoS attack?
>>>
Let me share more on the chronology of this:
30 Sep: Someone sends me the breach, but I'm travelling and didn't realise the significance
5 Oct: I get a chance to look at it - whoa!
6 Oct: I get in contact with someone at IA and send the data, advising it's our goal to load within 72 hours
7 Oct: They confirm and I ask for a disclosure notice
8 Oct: I follow up on the disclosure notice and advise we'll load tomorrow
9 Oct: They get defaced and DDoS'd, right as the data is loading into HIBP
The timing on the last point seems to be entirely coincidental. There may also be multiple parties involved; when we're talking breach + defacement + DDoS, it's clearly not just one attack.
<<<
It could also be that the attacker has compromised IA communication channels and timed it for maximum dramatic effect and confusion.
This was coordinated. Several archive services were hit around the same date, over a span of one week. DDoSecrets was the first to be hacked, as far as I can tell.
Here's a list of suspects, I guess: https://en.m.wikipedia.org/wiki/List_of_material_published_b...
IA is an incredibly valuable resource, but let’s not put them on a pedestal.
But I agree, no need to put them on a pedestal. Nobody is perfect.
Wasn't the issue precisely that they removed that limitation and then never added it again?
https://www.bleepingcomputer.com/news/security/internet-arch...
Edit: I had only seen the one post on X in which responsibility for the attack was claimed when I made this comment, but looking at the account further they do make many politically motivated comments.
With this new insight my comment now seems unnecessarily dismissive because it's not completely unreasonable to suspect false flag attacks when political motivations are being broadcast. To be clear I'm not making any assumptions for this specific case one way or the other, but I am acknowledging that the political speech presented by the attackers does add some merit to your suspicion.
The consolation is that I used a randomly generated unique password. I tried to reset my credentials and look for any 2FA options, but the site is overloaded and throwing 504s.
Even if we assume folks are using up-to-date browsers (and many aren't!), a compromised site could deliver payloads to browsers ranging from zero-days to phishing content to browser extension compromises (esp. for crypto wallets etc.), that might be delivered differently to different viewers. We don't want to amplify the spread of an attack, especially to our community!
That's also why the site guidelines (https://news.ycombinator.com/newsguidelines.html) are nowhere near as long as they would be if we tried to include all the important things. Better a shorter list that people can actually read.
I hope that doesn't come across as dismissive—I do see your point!
(I still haven't forgiven Sony for the album on CD I bought with a rootkit on it...)
The bad old days before music companies just gave up and started selling un-DRMd mp3 files, and then Spotify solved THAT problem for them.
Curious to see if they go after archive.is next.
The crazy rise of conspiracism in our society in general, combined with the fact that Israel really is doing some nasty stuff (but not controlling everything you don't like), combined with the latent antisemitism in most conspiracism.
And I say this as a strong supporter of and activist on Palestinian rights and liberation. Free Palestine. (But there is no good reason to think Israel is behind an IA hack. Or the fact that your mail came late, or anything else except what they're actually doing, which is bad enough. Call your senators and tell them to vote for Bernie's JRD resolutions.)
That is, paying over 100k at the lower end of the range for a software engineer with 3 years of experience.
For context someone making less than $105k is classified as "low income" in San Francisco. https://www.sfgate.com/local/article/under-100k-low-income-s...
[0] https://www.hcd.ca.gov/sites/default/files/docs/grants-and-f...
Does this mean you get benefits (like free housing, healthcare, and money to buy food with) if you earn less than 105k/year? Or what does the low-income threshold mean here?
Thanks for clarifying your intent.
They aren't predicting the future, they are reporting on an ongoing event.
I can very much endorse this. Error bars or rough confidence indicators are missing far too often, even from sites reporting on e.g. benchmark values of hardware they've been testing... such professional organisations, yet such basic omissions.
- I have a catch-all set up to forward all emails to a specific user on my mail server
- able to set up ad hoc email addresses for each online service (e.g., iarch@example.com)
- able to claim example.com in haveibeenpwned
Now I get breach emails from HIBP for the whole domain. Unfortunately, I was exposed in this IA breach.
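A minimal sketch of how per-service addresses on a catch-all let you attribute a breach: keep a record of which local part you gave to which service, then map each address in a breach notice back to it. The `aliases` registry and `leaked_services` helper are hypothetical; `example.com` and `iarch` come from the setup described above.

```python
from typing import Optional


def leaked_services(
    breached: list[str], aliases: dict[str, str], domain: str = "example.com"
) -> dict[str, Optional[str]]:
    """Map each breached address on `domain` to the service it was handed out to.

    Addresses on other domains are ignored. Unknown local parts map to None,
    which on a catch-all usually means a guessed or scraped address.
    """
    suffix = "@" + domain.lower()
    result: dict[str, Optional[str]] = {}
    for addr in breached:
        addr = addr.strip().lower()
        if not addr.endswith(suffix):
            continue
        local = addr[: -len(suffix)]
        result[addr] = aliases.get(local)
    return result
```

With `aliases = {"iarch": "Internet Archive"}`, a notice listing `IArch@example.com` points straight at the Internet Archive, even before the company itself has noticed.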
If you need free, you need free.
But if you can pay, you want to pay a vendor whose scale is such that you mean something to them while still being mature enough to rely on.
This applies to pretty much everything, not just email.
With Google and Apple, your service needs are overhead, and with Google in particular, your value is entirely in them being able to monitor as much as they legally can about your activity.
With Fastmail, Protonmail, etc, you are a customer already and they're invested in making you a bigger happy customer in the future. They have staff that will service your support tickets, you represent profit on their books, and the services they offer you are generally designed for your scale more precisely.
[1] https://www.cloudflare.com/en-ca/developer-platform/email-ro...
So far as I can tell, Cloudflare seems to still be in the early stages of enshittification [1], and while I as a business customer am probably going to be taken for a ride later than most customers, I'm also small fry, so I'm guessing at some point in the next 5 years, some of the "for free" features like zero trust / tunnels are going to become prohibitively expensive for me.
[1] https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys
I assume Cloudflare will enshittify because too many of its services are free or too cheap to make sense, so my guess is they're trying to achieve massive market capture and dependency so they can later start squeezing customers for way more money.
I prefer more transparent cost structures, like what I get through Migadu for example.
I don’t want these massive entities (Google, MS, CF) controlling my data.
The rest of Apple's email landscape sucks. It is pretty poor at managing spam, the client is terrible, and it doesn't sync rules between the desktop app, iCloud email, and iPhone.
I hate email in general. It's getting to be a 1-in-100 scenario for anything of value, and likely worse if I knew about all the emails that were deleted before I saw them.
The error message was very clear: hide-my-email was not permitted.
I was just trying to check for available service appointments near me and didn’t want the spam. But I guess sending spam is very very important to Toyota.
Cashier: "What's your email?"
Me: "walmart@somedomain.com"
Cashier: "No I meant YOUR email address."
Me: "Yeah walmart@somedomain.com"
Cashier: "Oh do you work for Walmart???"
Me: "No see I set up my email so... oh nevermind, 420BLAZEIT@GMAIL.COM"
I think if you are at the level of catch-alls and your own domain(s) then you tell the cashier "no thanks!"
The advantages are numerous: tracking who leaked my data (many times before the company even noticed it), easier to spot spam (20 years ago spam filters were a lot less sophisticated), minimize credential stuffing (before Pwd Managers became the norm), etc.
I'd be worried if 1) I hadn't seen many versions of similarly creative extortion emails over the years, and 2) they hadn't used some obvious "donotspamCompanyThatWasHacked@mydomain".
Sadly, I can see how this may trick some people into sending money to scammers.
On only one occasion in ~20 years, someone refused to do business with me because they thought I was impersonating them and told me I was being disrespectful by using their brand as my email; even after I explained how it works, they weren't happy.
But better than giving them an iCloud “hide my email” generated addy ;)
Fun fact! Troy actually got this database back on Sep. 30th.
We need not one but many internet archives. Just one and we will repeat the outcome of the Library of Alexandria.
"Goodwill and donations" will never be robust against an entire industry that makes profit off of artificial digital scarcity.
Does IA store anything sensitive for any users? Physical addresses, credit cards, etc.?
"Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!"
Maybe they managed to convince some critical service like an SSL cert provider that they were the owners of the subdomain? I don't know; that still wouldn't explain access to the user and password database.
I hope that this event makes some forward-thinking benevolent rich folks step up, or leads to an alternative solution.
Is it safe to assume the hacker want to erase the evidence?
Forcing the service offline also means they want to prevent people from archiving evidence for the next however-many hours. Combined with the language spoken in that video, are they planning some online disinformation campaign?
----
Edit: some more info about this group: https://old.reddit.com/r/technology/comments/1g0kupb/hacktiv...
----
This group claims to be pro-Palestinian, but it's entirely based in Russia.
https://therecord.media/middle-east-financial-institution-6-...
>SN_BLACKMETA has operated its Telegram channel since November 2023, boasting of DDoS incidents and cyberattacks on infrastructure in Israel, the Palestinian Territories and elsewhere. While all of the group’s messages focus on the Palestinian Territories and perceived opponents to Palestine, many of its posts are written in Russian.
>The group’s account on X also shows that it was created by someone in Staraya, a town in Novgorod Oblast, Russia. The account’s initial language was also set to Russian.
>The researchers added that analysis of timestamps and activity patterns showed possible evidence that the actors within the group are operating in a timezone “close to Moscow Standard Time (MSK, UTC+3) or other Middle Eastern or Eastern European time zones (UTC+2 to UTC+4).”
~~Attacks include pro palestine sites and groups, so~~ take that "pro palestine" with a grain of salt.
EDIT: edited for clarity on what is actually in the article and not in outside anonymous sources. If you want to read more, [there's a clearer report on one of their attacks and their usual targets.](https://www.radware.com/security/threat-advisories-and-attac...)
How is someone stupid enough to post this? Warrant for the account's IP is probably already issued. I don't know how many proxies the guy is behind, but it's playing with fire.
Also at some point the account of a malicious hacker has to be banned right?
>Also at some point the account of a malicious hacker has to be banned right?
You can try asking Musk about it.
On one hand, I love IA
On the other hand…I’m in a long thread with their support right now on removing old snapshots of a social media account I have. Creeps are actively using the old snapshots to dox me and send me death threats using my PII.
It’s incredibly frustrating and IA keeps insisting they cannot do anything about it.
A small part of me hoped IA didn’t recover from today because I knew my info would be finally deleted :/
You probably can do this, OP.
Sucks to hear you are getting doxxed still
It's a perfectly reasonable opinion to wish for retention of old sources of knowledge without retaining pages containing personal information of non-public people, or sensitive non-newsworthy information about anyone at all.
Not downplaying or excusing; just adding context that IA aren't the only ones and it's difficult to prevent (since the cause can be well outside of the individual's control).
Not that I'd cheer for the loss of IA, but it'd probably be nice if they took down PII on request.
It has less to do with what you say or how you say it, but with who you are.
Ah yes, known arm of the US military-industrial complex, The Internet Archive
In any case, the IA was in some cases the only public host of important documents about Palestinian history, which are currently inaccessible, to say nothing about how important the Wayback Machine has been over the past year.
https://www.google.com/search?client=safari&rls=en&q=zionist...
So, just to play devil's advocate: since Zionism is being critically received all across the Internet, it is more likely that IA was attacked in order to censor those materials, and that a sockpuppet was then created to shift the blame to pro-Palestinian voices. That would make no sense on its face, since pro-Palestinian voices would want IA to stay up so that embarrassing Zionist material was made more available, but such is the nature of agitprop campaigns during wartime: through subterfuge and obfuscation, deny your enemy the materials it requires to continue its campaigns, and also deny them the ability to identify the cause of that material going missing, or, at the very least, obfuscate the actors responsible for denying it, using sockpuppetry...
If there are "pro-Palestinian" materials at the IA, I would imagine them being based on materials collected over the past year documenting the genocide, war crimes, and crimes against humanity being committed against them.
There is a definite effort to censor any and all reporting of Israeli crimes against humanity on the Internet; IA was probably a last refuge for those collecting this material.
BTW I'm a non-Zionist and strongly opposed to the occupation, etc. So please don't make any assumptions that I'm a hasbarist coming at you with their usual stuff. The depressingly tragic fact of this conflict is that there are legions of assholes and extremely naive, easily manipulated people on all sides.
What is more likely is that these pro-Palestinian hacktivists are once more engaging in misplaced activism, targeting those they perceive as tied to Israel, regardless of whether those targets have any direct connection. Just see the boycott movements... they're boycotting Gal Gadot, McDonald's, and Starbucks.
Yes, but they should be.
What do you consider worse? The genocide of the people of Gaza and the occupation? Or that Zionism is now a bad word?
That said, this just seems to me like the attackers are trying to come up with some justification after the fact to explain why they would go after something as universally beloved as the Internet Archive. Actual pro-Palestine activists are not happy, eg (strong language): https://x.com/Aldanmarki/status/1844155616199413969
Missed the bus - Russia.
Stubbed my toe - FFS why is it always Russia?
Not excusing it, Russia, China and Iran do make my honeypot's top ten list every month. But then again so do the US, UK and France....
But they are democracies, not some kind of real-life Sacha Baron Cohen sketch.
Ah the only conspiracy theory we’re encouraged to believe. Wouldn’t that be convenient. A perpetual enemy far away that’s responsible for all of our failures, infiltrating and puppeteering western democracies on the other side of the world. Even the Russian propaganda machine loves this narrative – it makes them seem powerful and dangerous. Not like a corrupt and broken former empire sending off their young to the meat grinder for a bit of loot and territorial ambitions from a lost era.
IA is one of the go-to examples for that. Is it good to make every book ever written freely downloadable (as they were trying with their library project a while back), or is that bad? You and I might think the answer is obvious. We might even agree on it. But we would occupy a rather different world if even a supermajority agreed on that question, in either direction.
We completely agree about the perpetrator. My point was that if that is the case, it would imply that IA's enemies were going beyond lawsuits.
A special place in Hell…
Now, it depends what the "it" is referring to here, but so far all I've heard is about an alert() message saying the usernames will be sent to a breach alerting site. If they're doing it just for the heck of it, it's still costing a lot of people a lot of time that they could have spent doing better things, but I'd reserve special places in hell for the people who do plan this out carefully and make malicious demands
Hacking the Internet Archive and only placing an alert with a provocative message, I could see my teenage self do that. My judgment of the character is going to depend on what it turns out they've actually done
Of course, my grown up self (or late teen also, as I've done responsible disclosures back then as well) would rather have seen them do a coordinated vulnerability disclosure, but alas, I just meant to remark upon the "special place in hell" for not having a plan or motive bit
*Edit:* wait, I just saw in the article (I opened the thread before the link was changed) that this quote refers to a DDoS, not the alert() message that the thread was initially about
> the site was experiencing a DDoS attack, posting on Mastodon that “According to their twitter, they’re doing it just to do it.
That's indeed just destructive and not related to (hacker) curiosity...
If there's a call you wouldn't make unless it was free, the infrastructure isn't at capacity, and you're not acting otherwise in a detrimental fashion to other users of the infrastructure-- there's no harm to that organization.
Toying with the system, learning how it works and finding what you can make it do: there's a certain art to it, and I'd encourage anyone to at least tinker with the systems they own (and everything else within reason and ethics), but there are two sides to nearly everything.
Still awful, but nowhere near as awful as the former.
There are so many other possible targets that would even get positive reactions from people. The only ones who might be happy about TIA being down are maybe some big corporations that want to control and sell the information being freely preserved there.
The action is reprehensible either way, but if this is truly just an old-fashioned Anonymous attack with no ulterior motive beyond just being bad that's honestly kind of refreshing.
> Make public data available, protect private data.
Think of it more along the lines of you having a blinding hatred of mosquitos, and then they keep getting sent to you, and at the same time you're a very powerful, capable individual who can deal with hordes of mosquitos in fantastically wicked ways.
They stated on twitter because IA is controlled by "the US" and is "pro Israel".
could also just be RU larping under another flag. They have done this in the past with groups like Anonymous Sudan.
See their Twitter https://x.com/Sn_darkmeta
could also just be RU larping under another flag.
This is... the most obvious false flag I've ever seen
This is why we can't have nice things.
I mean... would it be better if the hackers had asked for money or did it to protest global warming or something?
Perpetrators without motive cannot be negotiated with, punishment may not be a strong deterrent, and rehabilitation is a lot harder. Economic crimes, crimes of passion, or crimes resulting from addiction can have a path to rehabilitation, and recidivism can be addressed by tackling the underlying issue, like poverty, addiction, etc. Even solving crimes without motive can be harder, as there are fewer assumptions we can make about the perpetrator.
The baker was a terrorist, so we killed the candlestick maker's family.
https://en.wikipedia.org/wiki/HaKirya
What’s the permissible distance in a three mile wide strip of land among the most densely populated in the world?
Maybe you should do something about it?
This Twitter account is suspicious and odd. I don't think anyone doing this is stupid enough to actually believe that they're doing it to "help Palestine." Seems like a job by Israel or supporting countries pretending to be supporters of Palestine.