screen.studio is macOS screen recording software that checks for updates every five minutes. Somehow, that alone is NOT the bug described in this post. The /other/ bug described in this blog is: their software also downloaded a 250MB update file every five minutes.
The software developers there consider all of this normal except the actual download, which cost them $8000 in bandwidth fees.
To recap: Screen recording software. Checks for updates every five (5) minutes. That's 12 times an hour.
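For scale: 12 downloads an hour at 250 MB each is about 3 GB per hour, roughly 72 GB per day, for every idle machine left running.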
I choose software based on how much I trust the judgement of the developers. Please consider if this feels like reasonable judgement to you.
There are plenty of shitty ISPs out there who would charge $$ per gigabyte after you hit a relatively small monthly cap. Even worse if you're using a mobile hotspot.
I would be mortified if my bug cost someone a few hundred bucks in overages overnight.
How on earth is a screen recording app 250 megabytes
I work with developers in SCA/SBOM and there are countless devs that seem to work by #include 'everything'. You see crap where they include a misspelled package name and then fix it by including the right package, but not removing the wrong one!
Unless it was absolutely critical that the server have as small a footprint as humanly possible, and it was absolutely guaranteed it would never need to be included in the future, of course. However, that first constraint is the main one.
> Someone imports a single method from the RIGHT package
and hundreds of megabytes come in for what might be one simple function.
> How on earth is a screen recording app 250 megabytes
How on earth is a screen recording app on an OS where the API to record the screen is built directly into the OS 250 megabytes?
It is extremely irresponsible to assume that your customers have infinite cheap bandwidth. In a previous life I worked with customers with remote sites (think mines or oil rigs in the middle of nowhere) where something like this would have cost them thousands of dollars per hour per computer per site.
Judging by the price of monitor stands, I wouldn't be surprised for Apple to make such assumptions.
I've read on HN that a lot of people have 10Gb Ethernet at home. /s
Any effort to use their brain shall be drastically punished. /s
For me that would also be wrong if I cannot disable it in the configuration. I do not want to extend startup time.
It's a whole new world out there.
517M ─┬ Screen Studio.app 100%
517M └─┬ Contents 100%
284M ├─┬ Resources 55%
150M │ ├── app.asar 29%
133M │ └─┬ app.asar.unpacked 26%
117M │ ├─┬ bin 23%
39M │ │ ├── ffmpeg-darwin-arm64 8%
26M │ │ ├── deep-filter-arm64 5%
11M │ │ ├─┬ prod 2%
10.0M │ │ │ └── polyrecorder-prod 2%
11M │ │ ├─┬ beta 2%
10.0M │ │ │ └── polyrecorder-beta 2%
10.0M │ │ ├── hide-icons 2%
9.9M │ │ ├─┬ discovery 2%
8.9M │ │ │ └── polyrecorder 2%
5.6M │ │ └── macos-wallpaper 1%
16M │ └─┬ node_modules 3%
10M │ ├─┬ hide-desktop-icons 2%
10.0M │ │ └─┬ scripts 2%
10.0M │ │ └── HideIcons 2%
5.7M │ └─┬ wallpaper 1%
5.7M │ └─┬ source 1%
5.6M │ └── macos-wallpaper 1%
232M └─┬ Frameworks 45%
231M └─┬ Electron Framework.framework 45%
231M └─┬ Versions 45%
231M └─┬ A 45%
147M ├── Electron Framework 29%
57M ├─┬ Resources 11%
10.0M │ ├── icudtl.dat 2%
5.5M │ └── resources.pak 1%
24M └─┬ Libraries 5%
15M ├── libvk_swiftshader.dylib 3%
6.8M └── libGLESv2.dylib 1%
So yes, it’s insane, but easy to see where the size comes from.
Regardless, that’s absolutely irrelevant to the point that this app’s size is explained by Chromium’s (and thus Electron’s) size.
Also, webapps are just great nowadays; most OSes support installing PWAs fairly decently, no?
ffs
For example, on Linux, it uses WebKitGTK as the browser engine, which doesn't render the same way Chrome does (which is the web view used on Windows), so multi-platform support is not totally seamless.
Using something like Servo as a lightweight, platform-independent web view seems like the way forward, but it's not ready yet.
I suspect the real reason electron got used here is that ChatGPT/Copilot/whatever has almost no Tauri example code in the training set, so for some developers it effectively doesn't exist.
I imagine if you stick to desktop the situation is less awful but still
Found this a few months ago: https://gifcap.dev/
Screen recording straight from a regular browser window, though it creates GIFs instead of video files. Links to a git repo so you can set it up locally.
I would say no, and some are actively moving away from PWA support even if they had it before.
Plus, electron et al let you hook into native system APIs whereas a PWA cannot, AFAIK.
Yes, even in metropolitan areas in developed countries in 2025.
1.5 megabits/s is still common, but Starlink is taking over.
Apparently such service is still somehow available; I found https://www.dialup4less.com with a web search. Sounds more like a novelty at this point. But "real" internet service still just doesn't work as well as it's supposed to in some places.
In point of fact, I can fairly reliably download at that rate (for example I can usually watch streaming 1080p video with only occasional interruptions). The best case has been over 20Mbit/s. (This might also be partly due to my wifi; even with a "high gain" dongle I suspect the building construction, physical location of computer vs router etc. causes issues.)
They are building features right now. There are a lot of bugs which Microsoft will never fix, or fixes only after years (double clicks registered on single mouse clicks, clicking "x" to close a window also closing the window underneath, GUI elements rendered black because the monitor isn't recognized, etc.).
Those packets consume bandwidth and device utilization too, but that is a flat fee, whereas log traffic is billed per GB, so we investigated where the unexpected growth came from.
A while ago I did some rough calculations with the numbers Microsoft used to brag about their telemetry, and it came out to around 10+ data points collected per minute, though probably sent at a lower frequency.
I also remember them bragging about how many million seconds Windows 10 users used Edge and how many pictures they viewed in the Photo app. I regret not having saved that article back then as it seems they realized how bad that looks and deleted it.
Even if it is made to CIA/GRU/chinese state security ? /s
Plenty of things (like the PlayStation's telemetry endpoint, for one of many examples) just keep phoning home continually if they can't connect.
The few hours a month of PlayStation uptime show 20K DNS lookups for the telemetry domain alone.
The server can return a backoff override, so it can tell the client how often or how quickly to retry.
It’s nice to have in case some bug causes increased load somewhere, you can flip a value on the server and relieve pressure from the system.
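As a rough sketch of that idea (the endpoint and the retryAfterSeconds field are made up for illustration, not taken from the article):

    // Client honors a server-supplied backoff instead of a hard-coded interval.
    const DEFAULT_CHECK_INTERVAL_MS = 24 * 60 * 60 * 1000; // once a day

    async function checkForUpdates(): Promise<void> {
      let nextDelayMs = DEFAULT_CHECK_INTERVAL_MS;
      try {
        const res = await fetch("https://updates.example.com/latest");
        const info = await res.json();
        if (typeof info.retryAfterSeconds === "number") {
          // The server is asking clients to slow down (e.g. during an incident).
          nextDelayMs = Math.max(nextDelayMs, info.retryAfterSeconds * 1000);
        }
        // ...compare info.version against the running version, prompt the user, etc.
      } catch {
        // Offline or server error: just wait for the next scheduled check.
      }
      setTimeout(checkForUpdates, nextDelayMs);
    }

Flipping retryAfterSeconds to something large on the server then relieves pressure from every client that picks it up.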
> Add special signals you can change on your server, which the app will understand, such as a forced update that will install without asking the user.
I don't like that part either.
Turns out Adobe's update service on Windows reads (and I guess also writes) about 130MB of data from disk every few seconds. My disk was 90%+ full, so the usual slowdown related to this was occurring, slowing disk I/O to around 80MB/s.
Disabled the service and the issues disappeared. I bought a new laptop since, but the whole thing struck me as such an unnecessary thing to do.
I mean, why was that service reading/writing so much?
"We will stop filling your drives with unwanted windows 14 update files to you once you agree the windows 12 and 13 eulas and promise to never ever disconnect from the internet again."
So yes it should only be once a day (and staggered), but on the other hand it's a pretty low-priority issue in the grand scheme of things.
Much more importantly, it should ask before downloading rather than auto-download. Automatic downloads are the bane of video calls...
In this case, that means an update should have been sent by some kind of web socket or other notification technology.
Today no OS or software that I'm aware of does that.
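For illustration only, a push-based check could look roughly like this (the server URL and message shape are made up, this is not how Screen Studio works, and it assumes the 'ws' npm package):

    import WebSocket from "ws"; // assumption: the 'ws' npm package is available

    // Instead of polling, keep one idle connection open and let the server
    // announce a new version only when one actually exists.
    function connect(): void {
      const socket = new WebSocket("wss://updates.example.com/notify");

      socket.on("message", (data) => {
        const msg = JSON.parse(data.toString());
        if (msg.type === "new-version") {
          // Only now ask the user whether they want to download msg.version.
          console.log(`Update ${msg.version} is available`);
        }
      });

      // Reconnect after a delay rather than in a tight loop.
      socket.on("close", () => setTimeout(connect, 60_000));
    }

    connect();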
Keeping a TCP socket open is not free and not really desirable.
Your app can also be ready to receive notifications even when the app isn't running - using zero RAM. Inetd on Linux allows similar stuff (although no ability to handle ip changes or traverse NAT makes it fairly useless in the consumer world).
This stuff is important because polling dominates power use when idle - especially network polling which generally requires hundreds of milliseconds of system awakeness to handle tens of network packet arrivals simply for a basic http request.
Did you know, a typical android phone, if all polling is disabled, has a battery life of 45 days?
It's actually required by the qualification process for lots of carriers. The built in apps have pretty much no polling for this reason.
During the qualification test, it's actually connected to both LTE and WiFi, but not actually transferring any data.
They cheat a little - the phone is not signed into a Google account, which makes pretty much all Google apps go idle.
Just poll every launch or 24 hours and move on.
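A minimal sketch of that policy, persisting the last-check time so a restart doesn't trigger another check (the file location and names are illustrative):

    import { promises as fs } from "fs";

    const STAMP_FILE = "/tmp/last-update-check"; // illustrative location
    const ONE_DAY_MS = 24 * 60 * 60 * 1000;

    // Returns true at most once per 24 hours, no matter how often it's called.
    async function shouldCheckForUpdates(): Promise<boolean> {
      try {
        const last = Number(await fs.readFile(STAMP_FILE, "utf8"));
        if (Date.now() - last < ONE_DAY_MS) return false; // checked recently
      } catch {
        // No stamp yet: first launch, go ahead and check.
      }
      await fs.writeFile(STAMP_FILE, String(Date.now()), "utf8");
      return true;
    }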
The tone might be somewhat charged, but this seems like a fair criticism. I can’t imagine many pieces of software that would need to check for updates quite that often. Once a day seems more than enough, outside of the possibility of some critical, all consuming RCE. Or maybe once an hour, if you want to be on the safe side.
I think a lot of people are upset with software that they run on their machines doing things that aren’t sensible.
For example, if I wrote a program that allows you to pick files to process (maybe some front end for ffmpeg or something like that) and decided to keep an index of your entire file system and rebuild it frequently just to add faster search functionality, many people would find that to be wasteful both in regards to CPU, RAM and I/O, alongside privacy/security, although others might not care or even know why their system is suddenly slow.
Why not just follow every Mac app under the sun and prompt if there's an update when the app is launched and download only if the user accepts?
No, it doesn't mean that.
The auto-updater introduced a series of bad outcomes.
- Downloading update without consent, causing traffic for client.
- Not only that, but the download keeps repeating every 5 minutes? You did at least detect whether the user is on a metered connection, right...?
- A bug where the update popup interrupts the user's flow
- A popup is a bad thing in itself to do to your users. I think it is OK to show one when closing the app, and let the rest be done in the background.
- Some people actually pay attention to the outgoing connections apps make, and even a simple update check every 5 minutes is excessive. Why even do it while the app is running? Check on startup and ask on close. Again, some complexity: assume you may not be on a network, do it in the background, and don't bother retrying much.
- Additional complexity for app that caused all of the above. And it came with a price tag to developer.
Wouldn't the App Store be the perfect way to handle updates in this case and offload the complexity?
Thinking of it, the discussed do-it-yourself update checking is so stupid that malice and/or other serious bugs should be assumed.
Going back to the blog post and re-reading it with this possibility in mind is quite a trip.
> It turns out thousands of our users had the app running in the background, even though they were not using it or checking it for weeks (!). It meant thousands of users had auto-updater constantly running and downloading the new version file (250MB) over and over again every 5 minutes
This could easily have been data exfiltration from client computers instead, and few (besides the guy whose internet contract got cancelled for heavy traffic) would have noticed.
Screen Studio has 32k followers; let's say 6% are end users, so about 2000 users at $229. That's roughly $458k in revenue, and the 30% cut would be about $137k in App Store fees.
I am going to say writing your own app update script is a wash time-wise, as getting your app published is not trivial, especially for an app that requires as many permissions as Screen Studio.
If you’re a small shop or solo dev, it is real hard to justify going native on three platforms when electron gives it for (near) free. And outside of HN, no one seems to blink at a 250MB bundle.
There are alternatives like Tauri that use the system browser and allow substantially smaller bundles, but they’re not nearly as mature as Electron, and you will get cross platform UI bugs (some of which vary by user’s OS version!) from the lack of standardization.
I’d actually seen this project before because the author did a nice write up on using React portal to portal into electron windows[1], which is something I decided to do in my app.
I’d just assumed his was a cross platform project.
1: https://pietrasiak.com/creating-multi-window-electron-apps-u...
Please, many people connect to the internet via a mobile phone hotspot, at least occasionally.
This bug would likely cause you to go through your entire monthly data in a few hours or less.
You should probably not roll your own auto-updater.
If you do, checking every 5 minutes for updates is waaaay too often (and likely hurts battery life by triggering the radio).
And triggering a download without a user-prompt also feels hostile to me.
The app size compounds the problem here, but the core issue is bad choices around auto-updating.
Except like 1 or maybe 2 billion people with slow or expensive internet.
I can remember when I would have to leave a 250MB download running overnight.
Before that, I can remember when it would have filled my primary hard drive more than six times over.
... Why can't the app-specific code just get plugged into a common, reusable Electron client?
Tauri is an alternative framework that uses whatever web view the OS provides, saving ~200 MB of bundle size. On Mac that’s a (likely outdated) version of Safari. On Windows it’ll be Edge. Not sure what Linux uses; I’d guess it varies by distro.
The promise of Electron (and it’s an amazing value prop) is that your HTML/JS UI will always look and work the same as in your dev environment, no matter what OS the host is running.
I don’t have the time or inclination to test my app on the most recent 3 releases of the most popular operating systems every time I change something in the view layer. With Electron, I trade bundle size for not having to do so.
I do think alternatives like Tauri are compelling for simple apps with limited UI, or where a few UI glitches are acceptable (e.g. an internal app). Or for teams that can support the QA burden.
FWIW the transitive dependencies of the NixOS ffmpeg add up to 764MB, but measuring the full closure of dynamic dependencies always comes out much larger than a static link, and that calculation will include more than just the shared libraries.
Also note that the app includes an ffmpeg that is 39MB uncompressed.
1: https://johnvansickle.com/ffmpeg/ (based on the arm64 build, since TFA is an arm64 app).
That was a thing I thought was missing from this writeup. Ideally you only roll out the update to a small percentage of users. You then check to see if anything broke (no idea how long to wait, 1 day?). Then you increase the percentage a little more (say, 1% to 5%) and wait a day again and check. Finally you update everyone (who has updates on).
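One common way to do the percentage gate is to hash a stable per-install ID into a bucket and compare it against a rollout percentage served alongside the update manifest. A rough sketch (the manifest fields are made up):

    import { createHash } from "crypto";

    // Map a stable install ID deterministically to a bucket in [0, 100).
    function rolloutBucket(installId: string, version: string): number {
      const digest = createHash("sha256").update(`${installId}:${version}`).digest();
      return digest.readUInt32BE(0) % 100;
    }

    // The server's manifest carries e.g. { version: "2.3.0", rolloutPercent: 5 }.
    function isEligibleForUpdate(
      installId: string,
      manifest: { version: string; rolloutPercent: number }
    ): boolean {
      return rolloutBucket(installId, manifest.version) < manifest.rolloutPercent;
    }

Bumping rolloutPercent on the server then widens the rollout without shipping anything new to clients.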
But then the HN crowd would complain "why use an app store? that's gate keeping, apple could remove your app any day, just give me a download link, and so on..."
You literally can't win.
Once a day would surely be sufficient.
Data centers are big and scary, and nobody wanted to run their own. The hypothetical cost savings of firing half the IT department was too good to pass up.
AWS even offered some credits to get started, first hit's free.
Next thing you know your AWS spend is out of control. It just keeps growing and growing and growing. Instead of writing better software, which might slow down development, just spend more money.
Ultimately in most cases it's cheaper in the short term to give AWS more money.
A part of me wants to do a $5 VPS challenge: how many users can you serve with $5 per month? Maybe you'd actually need to understand what your server is doing?
I'm talking nonsense, I know.
But, on the AWS marketplace I can click a button, a line item is added to our bill, and infosec are happy because it’s got the AWS checkmark beside it. Doesn’t matter what it costs, as long it goes through the catalog.
That’s why big companies use AWS.
At my last job, I worked for a vc backed startup. I reached out to our fund, and they put us in touch with AWS, who gave us $100k in credits after a courtesy phone call.
That’s why startups use AWS
Except this is unironically a great value proposition.
OpenStack has been around 15 years powering this idea at scale for huge organizations, including Wal-Mart, Verizon, Blizzard and more.
Don't forget the Java + Kafka consultants telling you to deploy your complicated "micro-service" to AWS, so you end up spending tens of millions on their "enterprise optimized compliant best practice™" solution and needing to raise money every 6 months instead of saving costs as you scale up.
Instead, you spin up more VMs and pods to "solve" the scaling issue, which loses you even more money.
It is a perpetual scam.
Correction--many have years of inexperience. Plenty of people that do things like this have "7 years designing cloud-native APIs".
Weekly or monthly would be sufficient. I'd also like "able to be disabled manually, permanently" as an option, too.
There are only a few applications with exposed attack surface (i.e. accept incoming requests from the network) and a large enough install base to cause "massive damage all of the Internet". A desktop screen recorder app has no business being constructed in a manner that's "wormable", nor an install base that would result in significant replication.
The software that we need the "average user" to update is stuff like operating systems. OS "manufacturers" have that mostly covered for desktop OS's now.
Microsoft, even though their Customers were hit with the "SQL Slammer" worm, doesn't force automatic updates for the SQL Server. Likewise, they restrict forcing updates only to mainstream desktop OS SKUs. Their server, embedded, and "Enterprise" OS SKUs can be configured to never update.
Well, they might need to rush out a fix for a bug that could be harmful to the user if they don't get it quickly.
For example, a bug that causes them to download 250MB every 5 minutes.
Good way of showing adoption and growth.
Nobody under any circumstances needs usage stats with 5 minute resolution. And certainly not a screen recorder.
Websites get this data pretty much by default and they don't need consent for it.
Screen Studio can collect basic usage data to help us improve the app, but you can opt out of it during the first launch. You can also opt out at any time in the app settings.
Their users do not care about their screen recording studio anywhere near as much as the devs who wrote it do.
Once a month is probably plenty.
Personally, I disable auto-update on everything wherever possible, because the likelihood of annoying changes is much greater than welcome changes for almost all software I use, in my experience.
1) Emergency update for remote exploit fixes only
2) Regular updates
The emergency update can show a popup, but only once. It should explain the security risk. But allow user to decline, as you should never interrupt work in progress. After decline leave an always visible small warning banner in the app until approved.
The regular update should never pop up, only show a very mild update reminder that is NOT always visible, instead behind a menu that is frequently used. Do not show notification badges; they frustrate people with an inbox-zero type of condition.
This is the most user friendly way of suggesting manual updates.
You have to understand: if a user has 30 pieces of software each updating monthly, they have to update something every day of the month. That is not a good overall user experience.
That's not a user issue tho, it's a "packaging and distribution of updates" issue which coincidentally has been solved on other OSes using a package manager.
If the update interval had been a day or more, they probably wouldn't have noticed it even after one month, the way they did with the 5-minute check interval.
How the times have changed ..
The "send and receive" button is seared into my brain
I was in Spain at the time, and at first you had to connect to the Internet through a phone number in France.
Did you guys have something like that?
In the BBS days it was much worse, though: it was mostly long-distance calls to someone elsewhere in the country, and they usually only had a couple of connections available, like five or so.
Ah, another thing is that they adopted the same model as mobile phones, so at least we could pre-pay for the calls, and when we ran out of credit that was it: no surprise bills, even if frustrating.
It is sort of fun (for $8,000) as it was “just” a screenshotter, but imagine this with a banking app or any other heavily installed app.
All cloud providers should have alerts for excessive use of network by default. And they should ask developers if they really want to turn alerts off.
I remember a Mapbox app that cost much more, just because the provider charged by the month… and there was a big dispute over whose fault it was…
This could have easily been avoided by prompting the user for an update, not silently downloading it in the background... over and over.
The number of times I have caught junior or even experienced devs writing potential PII leaks is absolutely wild. It's just crazy easy in most systems to open yourself up to potential legal issues.
The context where it makes the most sense is accepting code from strangers in a low-trust environment.
The alternative to trying to prevent mistakes is making it easy to find and correct them. Run CI on code after it’s been merged and send out emails if it’s failed. At the end of a day produce a summary of changes and review them asynchronously. Use QA, test environments, etc.
This feels like a strange sense of priorities which would be satirised in a New Yorker/Far Side single-panel comic: “Sure, my mistake brought down the business and killed a dozen people, but I’m not sure you appreciate how fast I did it”.
Code should be correct and efficient. Monkeys banging their heads against a keyboard may produce code fast, but it will be brittle and you’ll have to pay the cost for it later. Of course, too many people view “later” as “when I’m no longer here and it’s no longer my problem”, which is why most of the world’s software feels like it’s held together with spit.
Thanks for taking my experience and comment seriously and challenging your preconceptions.
> Code should be correct and efficient.
When it ships to customers. The goal is to find the bugs before then. Having a stable branch can be accomplished in many ways besides gating each merge with a review.
Do you have any studies to show how effective synchronous code review is in preventing mistakes? If they are such a good idea why not do 2 or 3?
I apologise if my comment read as mean. I wanted to make the joke and it may have overshadowed the point.
> Do you have any studies to show how effective synchronous code review is in preventing mistakes?
I could’ve been clearer. I’m not advocating for code reviews, I’m advocating for not placing “velocity” so high on the list of priorities.
> If they are such a good idea why not do 2 or 3?
This argument doesn‘t really make sense, though. You’ve probably heard the expression “measure twice, cut once”—you don’t keep measuring over and over, you do it just enough to ensure it’s right.
Well my comment is against synchronous code reviews. So we are not in disagreement.
> you do it just enough to ensure it’s right.
I agree. Each layer of review etc is a cost and has benefits. You want to pick an appropriate level.
I'm honestly curious what your experience level is? I've never known a developer with more than a couple years of experience who valued velocity over financial stability.
The purpose of such a review is a deliberate bottleneck in the earlier stage of development to stop it becoming a much larger bottleneck further down the line. Blocking one PR is a lot cheaper than blocking an entire release, and having a human in the loop there can ensure the change is in alignment in terms of architecture and engineering practices.
CI/CD isn’t the only way to do it but shifting left is generally beneficial even with the most archaic processes.
You’re taking a more extreme position than the one I’m stating. You can review every day or every hour if you want.
> a deliberate bottleneck in the earlier stage
Wouldn’t it be better if we could catch bugs AND avoid the bottleneck? That’s the vision. People with good intentions may disagree about how to accomplish that.
Like it or not you still have to stop what you’re doing to identify a bug and then fix it, which takes time away from planned feature work. You’re not optimising anything, you’re just adding fragility to the process.
As I said before, an issue localised to a PR in review blocks one person. An issue that has spread to staging or prod blocks the entire team.
Yes, they kill your velocity. However, the velocity of a team can be massively increased by shipping small things a lot more often.
Stable branches that sit around for weeks are the real velocity killer, and make things a lot more risky on deployment.
This is the same point three times, and I don't agree with it. This is like saying tests kill velocity, there's nothing high velocity about introducing bugs to your code base.
Everything introduces context switching, there's nothing special about code reviews that makes it worse than answering emails, but I'm not going to ignore an important email because of "context switching."
Everyone makes mistakes, code reviews are a way to catch those. They can also spread out the knowledge of the code base to multiple people. This is really important at small companies.
CI is great, but I have yet to see a good CI tool that catches the things I do.
No it isn’t. Fake work, synchronization, and context switching are all separate problems.
> code reviews are a way to catch those
I said you can do reviews - but there is no reason to stop work to do them.
Why not require two or three reviews if they are so helpful at finding mistakes?
I agree everyone makes mistakes - that’s why I would design a process around fixing mistakes, not screening for perfection.
How many times have you gone back to address review comments and introduced a regression because you no longer have the context in your head?
Places do? A lot of open-source projects have the concept of dual reviews, and a lot of code bases have CODEOWNERS to ensure the people with the right context review the code, so you could have 5-10 reviewers on a large PR.
For safety-critical software, e.g. ASIL-D, you will absolutely have a minimum of 2 reviewers. And that’s just for the development branch. Merging to a release branch requires additional sign-offs from the release manager, safety manager, and QA.
By design the process slows down “velocity”, but it definitely increases code quality and reduces bugs.
Context switching is a problem because it...kills velocity. Fake work is a problem because it kills velocity. You're saying it's time that could be better spent elsewhere, but trying to make it sound wider. I disagree with the premise.
Synchronization is a new word, unrelated to what you originally wrote.
> How many times have you gone back to address review comments and introduced a regression because you no longer have the context in your head?
Never? I am not unable to code in a branch after a few days away from it. If I were, I would want reviews for sure! Maybe you have had reviews where people are suggesting large, unnecessary structural changes, which I agree would be a waste of time. We're just looking for bug fixes and acceptably readable code. I wouldn't want reviewers opining on a new architecture they read about that morning.
I believe you can figure it out.
> Never?
Ok well I’m trying to talk to people who have that problem. Because I and my team do.
Diminishing returns, of course. I have worked places where two reviews were required and it was not especially more burdensome than one, though.
I catch so many major errors in code review ~every day that it's bizarre to me that someone is advocating for zero code review.
The website makes it seem like it's a one person shop.
If you're not confident you can review a piece of code you wrote and spot a potentially disastrous bug like the one in OP, write more tests.
At some scale such careless mistakes are going to create real effects for all users of the internet through congestion as well.
If this was not a $8000 mistake but was somehow covered by a free tier or other plan from Google Cloud, would they still have considered it a serious bug and fixed it as promptly?
How many such poor designs are out there generating traffic and draining common resources.
Just amazed. Yeah, suggesting 'write code carefully' as if that'll fix it is a rookie mistake.
So so frustrating when developers treat user machines like their test bed!
After I shipped a bug the Director of Engineering told me I should "test better" (by clicking around the app). This was about 1 step away from "just don't write bugs" IMO.
TBH, that was well done for what it was but really called for automation and lacked unit-testing.
I wish I could teach everything I learned the hard way at that job
Avoidable, unfortunate, but the cost of slowing down development progress by, say, 10% is much higher.
But agree that senior gatekeepers should know by heart some places where review needs to be extra careful. Like security pitfalls, exponential backoff in error handling, and yeah, probably this.
I doubt there’s a CEO. Despite the use of “we”, pretty sure this is one guy building the app. All the copyright notices and social media go back to one person.
> But agree that senior gatekeepers should know by heart some places where review needs to be extra careful. Like security pitfalls, exponential backoff in error handling, and yeah, probably this.
The lesson here is that much better use of automated tests (the app likely has no tests at all) and proper use of basic testing principles like TDD would prevent such junior-level, embarrassing bugs from creeping into paid production software.
That is the difference between a $100 problem vs a $200M problem.
See the case of Knight Capital [0] who lost $460M, due to a horrific deploy.
[0] https://www.henricodolfing.com/2019/06/project-failure-case-...
Although, after such a fuck up, I would be tempted to make a pre-release check that tests the compiled binary, not any unit test or whatever. Use LD_PRELOAD to hook the system timing functions (a quick google shows that libfaketime[0] exists, but I've never used it), launch the real program and speed up time to make sure it doesn't try to download more than once.
Then it's a unit test that looks too obvious to exist until you read the ticket mentioned in the comment above it
No need for monkey patching or hooking or preload
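With Jest's fake timers, for instance, a test along these lines would have caught the tight loop (startUpdateScheduler and its injectable download function are hypothetical stand-ins for the real updater):

    // Hypothetical test: advance 24 hours of fake time and assert the updater
    // downloaded at most once instead of once per tick of the loop.
    jest.useFakeTimers();

    test("does not re-download the update in a tight loop", () => {
      const download = jest.fn();
      startUpdateScheduler({ download }); // hypothetical scheduler under test

      jest.advanceTimersByTime(24 * 60 * 60 * 1000); // simulate a full day

      expect(download.mock.calls.length).toBeLessThanOrEqual(1);
    });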
But before that you add a couple checkmarks to the manual pre-release test list: "1 hour soak test" and "check network transfer meters before and after, expect under 50 MB used in 1 hour (see bug #6969)"
In Linux they're under /sys/class/net I think
From TFA: "Write your auto-updater code very carefully. Actually, write any code that has the potential to generate costs carefully." So the focus is on code that "generate[s] costs". I think this is a common delusion programmers have; that some code is inherently unrelated to security (or cost), so they can get lazy with it. I see it like gun safety. You have to always treat a gun like it's loaded, not because it always is (although sometimes it may be loaded when you don't expect it), but because it teaches you to always be careful, so you don't absent-mindedly fall back into bad habits when you handle a loaded one.
Telling people to write code carefully sounds simplistic but I believe for some people it's genuinely the right advice.
I don't get the impression they did any testing at all.
We used Sparkle, https://sparkle-project.org/, to do our updates. IMO, it was a poor choice to "roll their own" updater.
Our application was very complicated and shipped with Mono... And it was only about ~10MB. The Windows version of our application was ~2MB and included both 32-bit and 64-bit binaries. WTF are they doing shipping a 250MB screen recorder?
So, IMO, they didn't learn their lesson. The whole article makes them look foolish.
250 MB is just the download DMG, the app itself is almost 550 MB. It’s an Electron app.
Who would be foolish enough to download that?
Electron.
> So, IMO, they didn't learn their lesson. The whole article makes them look foolish.
The lesson is to do better testing and write automated tests and don't roll your own updater.
It's just tricky, basically one fat edge case, and a critical part of your recovery plan in case of serious bugs in your app.
(This bug isn't the only problem with their home-grown updater. Checking every 5 min is just insane. Kinda tells me they aren't thinking much about it.)
Especially for a Mac-only application where Sparkle (https://sparkle-project.org/) has been around for almost two decades now and has been widely used across all sorts of projects to the point that it's a de facto standard. I'd be willing to bet that almost every single Mac "power user" on the planet has at least one application using Sparkle installed and most have a few.
You can use whatever you want outside of the App Store - most will use Sparkle to handle updates https://sparkle-project.org/. I presume Windows is similar.
The fact that that is what the system package manager is, is why I said Apple "knows best". You can pick from dozens of system package managers hooked up to hundreds, if not thousands, of different repos on Linux.
If the file contains invalid JS (a syntax error, or features too new for IE on Win7/8), or if it's >1MB (the Chromium-based browsers & Electron limit), and the file is configured system-wide, then EVERY APP which uses wininet starts flooding the server with requests over and over, almost in an endless loop (missing/short error caching).
Over the years, this resulted in DDoSing my own server and blackholing its IP at the BGP level (happened 10+ times), and after switching to public IPFS gateways to serve the files, the Pinata IPFS gateway blocked the entire country, and on the IPFS.io gateway the files were in the top 2 requests for weeks (impacting the operational budget of the gateway).
All of the above happens with tight per-IP per-minute request limits and other measures to conserve the bandwidth. It's used by 500 000+ users daily. My web server is a $20/mo VPS with unmetered traffic, and thanks to this, I was never in the situation as the OP :)
The author seemed to enjoy calculating the massive bandwidth numbers, but didn’t stop to question whether 5 minutes was a totally ridiculous interval.
Good on them. Most companies would cap their responsibility at a refund of their own service's fees, which is understandable as you can't really predict costs incurred by those using your service, but this is going above and beyond and it's great to see.
I understand the reasoning, but that makes it feel a bit too close to a C&C server for my liking. If the update server ever gets compromised, I imagine this could increase the damage done drastically.
On one hand it's good that the author owns up to it, and they worked with their users to provide remedies. But so many things aren't adding up. Why does your screen recorder need to check for updates every 5 minutes? Once a day is more than enough.
This screams "We don't do QA, we just ship".
And pay Apple their 30% cut on your revenue? No thanks.
> the other one being not spending thousands in accidental data transfer when you do auto updates wrong.
Or just actually write proper automated tests for basic features first, before a large refactor, to prevent issues like this from being introduced again?
While I respect the author's honesty about this mistake, the main takeaway here is never mentioned: just write proper automated tests, as the impression this post gives is that there aren't any.
It was good enough for netflix etc.
*I* don't want applications to be able to update themselves. Look at the malware Zoom, for example.
It's funny that people don't like telemetry, but at the same time they're ok with regular software update checks + installs.
What's really scary here is the lack of consent. If I want to record videos I don't necessarily have an extra 250MB to spend (many users effectively pay by the gig) every time the developer feels like updating.
So why not ?
> One of our users, who lived in a house, had their internet provider cancel their contract due to enormous traffic generated during a month. It was extremely problematic as there was no other internet provider available around.
so, ¯\_(ツ)_/¯
The user had to pinky promise not to do it again, and the service was ultimately restored.
Aside from the 8k bill nothing happened to them.
This article feels more like a self name and shame. I wouldn't trust these people to run code on my computer
This is still bad. I was really hoping the bug would have been something like "I put a 5 minute check in for devs to be able to wait and check and test a periodic update check, and forgot to revert it". That's what I expected, really.
That way I guess you get the caching of the DNS network for free, it uses basically one packet each way, encryption is still possible, and it can reduce the traffic greatly if a big org is running a thousand instances on the same network
I think it was written in Go. Might have been Syncthing
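The rough idea is to publish the latest version in a DNS TXT record, so the check rides on DNS caching and a single round trip. A sketch of what the client side could look like (the record name is made up; the project the commenter remembers was reportedly written in Go):

    import { promises as dns } from "dns";

    // Resolve a TXT record like "version=2.3.0" and compare it with the running
    // version. Intermediate resolvers cache the answer, so most checks from a
    // large network never reach the origin at all.
    async function latestVersionFromDns(): Promise<string | undefined> {
      const records = await dns.resolveTxt("latest.updates.example.com");
      for (const chunks of records) {
        const match = chunks.join("").match(/^version=(.+)$/);
        if (match) return match[1];
      }
      return undefined;
    }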
Seriously this alone makes me question everything about this app.
https://en.m.wikipedia.org/wiki/Knight_Capital_Group#2012_st...
$440M USD
The url specifically asks Wikipedia to serve the mobile site.
The title should have been: "how a single line of code cost our users probably more than $8000"
> Write your auto-updater code very carefully.
You have to be soooo careful with this stuff. Especially because your auto-updater code can brick your auto-updater.
It looks like they didn't do any testing of their auto update code at all, otherwise they would have caught it immediately.
For those interested in this topic, and how other industries (e.g. Airline industry) deal with learning from or preventing failure: Sidney Dekker is the authority in this domain. Things like Restorative Just Culture, or Field guide to understanding human error could one day apply to our industry as well: https://sidneydekker.com/books.
I'll stick with open source. It may not be perfect, but at least I can improve it when it's doing something silly like checking for updates every 5 minutes.
The relevance is that instead of checking for a change every 5 minutes, the delay wasn't working at all, so the check ran as fast as possible in a tight loop. This was between a server and a blob storage account, so there was no network bottleneck to slow things down either.
It turns out that if you read a few megabytes 1,000 times per second all day, every day, those fractions of a cent per request are going to add up!
Novel dark pattern: You unchecked "Let us collect user data" but left "Automatically Update" checked... gotcha bitch!
What might be fun is figuring out all the ways this bug could have been avoided.
Another way to avoid this problem would have been using a form of “content addressable storage”. For those who are new, this is just a fancy way of saying make sure to store/distribute the hash (ex. Sha256) of what you’re distributing and store it on disk in a way that content can be effectively deduplicated by name.
It’s probably not so easy as to make it a rule, but most of the time, an update download should probably do this
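A rough sketch of that dedupe-by-hash idea (the cache directory and names are made up; the key point is that re-checking the same release never re-downloads it):

    import { createHash } from "crypto";
    import { promises as fs } from "fs";
    import * as path from "path";

    const CACHE_DIR = "/tmp/update-cache"; // illustrative location

    // Download the update only if a file with the expected hash isn't cached yet,
    // and verify the hash before ever handing the file to the installer.
    async function fetchUpdateIfNeeded(url: string, expectedSha256: string): Promise<string> {
      const target = path.join(CACHE_DIR, expectedSha256);
      try {
        await fs.access(target);
        return target; // already downloaded, even across repeated checks
      } catch {
        // Not cached yet, fall through to download.
      }
      const data = Buffer.from(await (await fetch(url)).arrayBuffer());
      const actual = createHash("sha256").update(data).digest("hex");
      if (actual !== expectedSha256) throw new Error("hash mismatch, refusing to install");
      await fs.mkdir(CACHE_DIR, { recursive: true });
      await fs.writeFile(target, data);
      return target;
    }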
The most obvious one is setting up billing alerts.
Past a certain level of complexity, you're often better off focusing on mitigation than trying to avoid every instance of a certain kind of error.
This is back in the Rails days, before they switched to Scala.
I heard that there was a fail-whale no one could solve related to Twitter's identity service. IIRC, it was called "Gizmoduck."
The engineer who built it had left.
They brought him in for half a day of work to solve the P0.
*Supposedly*, he got paid ~$50K for that day of work.
Simultaneously outrageous but also reasonable if you've seen the inside of big tech. The ROI is worth it.
That is all.
Disclaimer: don't know if it's true, but the story is cool.
If just some JavaScript files change, you don't need to redownload the entire Chromium blob.
Electron really messed up a few things in this world
Yes, a single line of code is in the stack trace every time a bug happens. Why does every headline have to push this clickbait?
All errors occur at a single line in the program - and every single line is interconnected to the rest of the program, so it's an irrelevant statement.
I think that is the essence of what is wrong with cloud costs: defaulting to the possibility of rapid scaling for everyone, while in reality 99% have quite predictable costs month over month.
You want to spread out update rollouts in case of a catastrophic problem. The absolute minimum should be once a day at a random time of day, preferably roll out updates over multiple days.
In the grand scheme of things, $8k is not much money for a business, right? Like we can be pretty sure nobody at Google said “a-ha, if we don’t notify the users, we will be able sneak $8k out of their wallets at a time.” I think it is more likely that they don’t really care that much about this market, other than generally creating an environment where their products are well known.
Curious where the high-water mark is across all HNers (:
Our team had a bug that cost us about $120k over a week.
Another bug running on a large system had an unmeasurable cost. (Could be $K, could be $M.)
Databricks is happy to have us as a customer.
Seems like a great idea, surely nothing can go wrong with that which will lead to another blog post in the near future
Looking at the summary section, I'm not convinced these guys learned the right lesson yet.
Nothing has been learned in this post, and it cost him $8,000 because of inadequate testing.
It's best to save everyone by writing tests that prevent a $100 issue on your machine from becoming a costly $10M+ problem in production as the product scales after it has launched.
This won't be the last time; this is what 'vibe coding' doesn't consider, and it will introduce more issues like this.
Well, you should hire a contractor to set up the console for you.
"Designed for MacOS", aah don't worry, you will have the money from apes back in the no time. :)
I’m sorry, but it’s exactly cases like these that should be covered by some kind of test, especially when diving into a refactor. Admittedly it’s nice to hear people share their mistakes and horror stories; I would get some stick for this at work.
A giant ship’s engine failed. The ship’s owners tried one ‘professional’ after another but none of them could figure out how to fix the broken engine.
Then they brought in a man who had been fixing ships since he was young. He carried a large bag of tools with him and when he arrived immediately went to work. He inspected the engine very carefully, top to bottom.
Two of the ship’s owners were there watching this man, hoping he would know what to do. After looking things over, the old man reached into his bag and pulled out a small hammer. He gently tapped something. Instantly, the engine lurched into life. He carefully put his hammer away and the engine was fixed!!!
A week later, the owners received an invoice from the old man for $10,000.
What?! the owners exclaimed. “He hardly did anything..!!!”.
So they wrote to the man; “Please send us an itemised invoice.”
The man sent an invoice that read:
Tapping with a hammer………………….. $2.00
Knowing where to tap…………………….. $9,998.00
Ummm, no. Even after this they haven't learned. Check for updates on app load and prompt the user to download/update.
$229 per year on a closed source product and this is the level of quality you can expect.
You can have all the respect for users in the world, but if you write downright hazardous code then you're only doing them a disservice. What happened to all the metered internet plans you blasted for 3 months? Are you going to make those users whole?
Learning from and owning your mistake is great and all, but you shouldn't be proud or gloating about this in any way, shape, or form. It is a very awkward and disrespectful flex on your customers.
Good thing this was not Shopify/Duolingo/MSFT, else the news would be how AI saved us $8k by fixing dangerous code and why AI will improve software quality.