https://www.federalregister.gov/documents/2023/06/09/2023-12...
On the surface a product like managed git repos would seem relatively straightforward to deal with, but these same regulated firms are also under tremendous scrutiny for access management, change management, SDLC, and so on. They also have a huge software footprint, which only increases the impact.
Self-hosting is obviously the traditional answer, but that's not exactly a simple one either.
Just an interesting problem.
Of all the many dependencies on cloud services, git is by far the last I’d worry overly much about.
In particular, I've moved CI for a large repository between different CI systems. It was anything but trivial: you want to believe "it's just a YAML file that runs commands, right? Translate the format, right?" but it's really not; differences in how CI systems map commands to machines, external integrations (e.g., in this case, into Actions' artifacts and output systems), etc. all make it more complicated.
But GitHub Actions are somewhat portable: there’s the standalone act [0] runner, and the Forgejo/Gitea Actions (e.g. on Codeberg [1]) that use act under the hood and are pretty much a drop-in replacement – they even use GitHub-hosted actions transparently. It might not be a 100% compatible standard, but it’s pretty nice. It would be nice for others to follow their lead!
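For anyone who hasn't tried it: act spins up the workflows from .github/workflows in local Docker containers, so a rough portability smoke test is just a couple of commands (invocation from memory, job name hypothetical; check act's README for specifics):

    cd my-repo        # any repo with .github/workflows/
    act -l            # list the jobs act can see
    act push          # run the workflow(s) triggered by a push event
    act -j build      # or run a single named job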
I think unless you've been burned by having to move CI providers before, it's easy to lean in. I had to change from Travis many years ago because of pricing changes.
Anything you do in CI should be possible outside of CI, at least by some subset of users.
I don't think it's malice. I just think it's pretty uncommon for anyone to intentionally back out of a structural tech decision, so it gets forgotten about and remains un-battle-tested. That, or the timeline is longer than SaaS has been around.
GitHub actions is even worse, it seems like it was designed from the ground up to create lock in.
Nix helps a bit on the bootstrapping and dependency management problem, but won't save you from writing a script that is too tightly coupled to its runtime environment.
This is obviously more difficult in the Github actions ecosystem, but I have mostly used Gitlab CI so far. My CI pipelines mostly look like this:
image: ubuntu:24.04
before_script:
  - apt-get update && apt-get install -y ...
script:
  - ./ci/build-project.sh
after_script:
  - ./ci/upload-build-artifacts.sh
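The referenced scripts are plain shell with no CI-specific assumptions, so the same build runs on a laptop or in any other CI system. A hypothetical ci/build-project.sh (contents illustrative, not from the actual pipeline):

    #!/usr/bin/env bash
    # Illustrative sketch: keep everything buildable outside CI.
    set -euo pipefail
    make clean
    make all
    make check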
And it doesn't protect you from a "forced exit" either. GitHub could terminate your contract, or change the terms of the license in a way that you found unacceptable, or even go out of business, and being self-hosted would leave you in no better position than if you had used the cloud with external backups. You can somewhat mitigate this risk by self-hosting an open source solution, so that in the worst-case scenario you can fork the project and maintain it yourself, but there is still the risk that the project could be abandoned, or have its license changed in future versions.
To be clear, I'm not saying that you shouldn't self host and SaaS is always better. But it isn't a magic bullet that solves these problems.
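For the repos themselves, an "external backup" is cheap: a periodic mirror is enough to survive a forced exit. A minimal sketch (URLs hypothetical):

    # one-time: take a full mirror of every ref
    git clone --mirror git@github.com:example-org/example-repo.git
    # from cron or a scheduled job: refresh the mirror
    cd example-repo.git && git remote update --prune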
Of course we have a small and mostly unchanging number of users, don't have to deal with DDoS attacks, and can schedule the fairly-infrequent updates during maintenance windows that are convenient for us (since we don't need 100% availability outside of US working hours).
I don't have the metrics in front of me, but I would say we've easily exceeded github.com's uptime in the last 12 months.
If that’s really the case, run another GitHub instance then. Not all tens of thousands of users need access to the same codebases. In the kind of environment described someone would want identity boundaries established around each project anyway…
On GHES you can use https://github.com/actions/actions-sync/ to pull the actions you want down to your local GHES instance, turn off the ability to automatically use actions from github.com via GitHub Connect, and use the list of actions you sync locally as your whitelist.
My employer did this for years. It worked very well. Once a day, pull each action that we had whitelisted into GHES and the runners would use those instead of the actions on github.com.
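The invocation is roughly the following (flags approximate and token/URL hypothetical; check the actions-sync README before relying on them):

    actions-sync sync \
      --cache-dir /tmp/actions-cache \
      --destination-url https://ghes.example.com \
      --destination-token "$GHES_TOKEN" \
      --repo-name actions/checkout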
Hm not really. I manage the GHES instance at my employer and we have 15k active users. We haven't needed to scale horizontally, yet.
GHES is amazingly reliable. Every outage we have ever had has been self-inflicted; either we were too cheap to give it the resources it needed to handle the amount of users who were using it, or we tried to outsmart the recommended and supported procedures by doing things in a non-supported way.
Along the way we have learned to never deviate from the supported ways to do things, and to keep user API quota as small as possible (the team which managed this service prior to my team would increase quota per user anytime anyone asked, which was a capital-M Mistake.)
Rock-solid stability, for a company with 300+ microservices, 10+ big environments, 50+ microenvironments, who knows how many Jenkins pipelines (more than 900, I’ll tell you that). We deployed several times a day, each service on average had 3 weekly deployments.
As a company, I think GitHub (public) should do better, much better, given this is happening more frequently as of late, but if big companies (even medium ones) don’t have their own package caches, they are all in for a ride.
At a previous startup we had GitHub + GitHub Actions, and we were on AWS. We set up an OCI image cache. Sure, if GitHub went down we could not deploy new stuff, but at least it wouldn't take us down. If we really needed the pipelines, I suppose we could have set up some backup CLI or AWS CodePipeline (eww) workflows.
Either you go all in, or you'd better not do it at all.
Kinda eliminates all those pennies saved (in theory) for outsourcing to "the cloud" if you have to duplicate your infra.
Hybrid has always seemed the most optimal approach, but there's always someone in charge who thinks spending money on safety nets is just wasted money.
If you actually care about uptime, then a real demo with usage is likely the better approach: switch over to your "backup" on a regular basis and make sure it works 100% as expected.
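One cheap way to keep the "backup" honest is to make every push hit both hosts, so the mirror is exercised constantly rather than only during a drill. A sketch with hypothetical URLs:

    # add BOTH push URLs; the first --add replaces the default push target
    git remote set-url --add --push origin git@github.com:example-org/repo.git
    git remote set-url --add --push origin git@backup.example.com:example-org/repo.git
    git push    # now updates GitHub and the backup in one go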
So really what I'm asking is "how strict are these audits really?"
Internal audits are always subject to gaps, but if the stated issue is correct "a load balancer config change gone pear shaped" an audit wouldn't have caught that necessarily.
Unless the audit wants to test their change control, deployment methods, and redundancy.
Are they changing all of their load balancers all at once? Seems non-optimal. Maybe change only one at a time, or a small batch.
Are they propagating load balancer changes from a canary to production without vetting that it's good?
Or did they vet it and they were wrong - some difference in their canary or analysis had a short coming?
And even if all of that was A-OK why did a mistake (and we all make mistakes) not get reverted quickly?
Were there insufficient internal controls to revert small mistakes and keep them from becoming site wide outages? And so on.
I suspect these kinds of discussions are happening. Or, maybe not. Who knows?
It's a 3rd party, and even if your whole organization's life depends on it you only know what they tell you.
Welcome to "the cloud".
If you self host as an ALTERNATIVE to the 3rd party, you have all of the same problems - more, in fact, because now you know about them - while the 3rd party can make all these claims you can't verify until they fall over with a "load balancer misconfig" story you also can't verify.
If you self-host redundantly to a 3rd party you have no special benefit (it does the same thing) AND the additional cost of a redundant infrastructure.
Why not just have redundant 3rd parties (so-called "multi-cloud") if you can't or won't trust your 3rd party?
It's like saying "I can reduce the risk of my rental car failing by owning my own car", assuming the car you keep undriven in your garage doesn't have a dead battery, an empty tank, or flat tires, and doesn't prove to be unusable for hauling.
The "cloud" was touted as the fix for all that nuisance in self-hosting. Magically Jeff's bit barn would work to five 9s of uptime, and you could sit back and write your code, unshackled to infra. Until Jeff's bit barn went tits up.
I say the "cloud" is just another guys data center behind an API.
You wanna cloud experience? Put an API in front of your own servers and burn a $100 bill.
you shift it from a problem of software reliability to a problem of physical infrastructure. at some point in the chain somebody has to do that, but i'd prefer the person doing that was somebody with DEEP experience in it that could give you some nice confident assurances.
>at some point in the chain somebody has to do that, but i'd prefer the person doing that was somebody with DEEP experience in it that could give you some nice confident assurances.
Yes. This is part of the reason why services like git are being moved outside the datacenter. Most of the product offerings on the market don't scale well, have terrible reliability and are still very expensive to run.
Depending on your size your requirements may be much lower and easier to manage. Github has to be all things to all people and that comes with complexity and that can make things more fragile.
I mean seriously people, a "cloud" is just someone elses' data center.
Am I missing something?
Same Datacenter. Same reliability infrastructure wise - power, earthquakes, tsunami, typhoons, black plague, monkey pox, etc.
But their virtual offerings are much less reliable than a standalone system, by a lot (they guarantee to refund you the 25 cents for your instance if it goes down, not the value of the service interruption or its cost. lol! Read that TOS)
What's the solution to their inherent unreliability? Redundancy at more cost.
Well, hey, you can rent two colocation facilities if you really need redundancy across geographic regions. And maybe you can just use your colo as a source for a CDN that is geographically diverse (for latency, not hurricanes).
Geographic diversity and HA is beyond most people? If you're the kind of business that needs that, you can hire the exact same people that Amazon Pip'd and fired because they didn't hit some arbitrary ticket metrics, to scale your business.
e.g. https://www.forbes.com/sites/lucianapaulise/2022/10/27/amazo...
When I'm trying to explain to people what it's like to work on this kind of software, I like to use an analogy: it's as though I have my own personal brick, or group of bricks, in the great pyramids of Egypt, just a tiny piece of a stupefyingly, inconceivably larger whole -- and when I twist my chisel and strike my block just so, at exactly the right (or rather the wrong) angle, I can shake the very foundations of Egypt.
Is the risk of your git repos higher than a chip shortage causing you to lose access to the infrastructure you need? So many factors to consider. A chip shortage doesn't seem that unlikely with geopolitics.
The list of scenarios you mitigate for seems like it could very easily be an arbitrary list of the scenarios a single person came up with.
The risk calculations are very primitive at the moment, I'm guessing they will be refined over time as industry feedback starts to resonate.
They are replacing everything with Github actions. I wonder what they are going to do when Github is down.
But you are right that I want reliable and easy-to-use services. And centralisation is often one way to go there.
As an interesting counterpoint: Git itself is decentralised and replaced centralised services like Subversion. And that made git easier to use, especially easier to get started with: no need for a server, no need to be online, just do `git init` in any old directory.
A GitHub-clone could be more decentralised, but they'd need to use that decentralisation to drive those other features that people actually care about day to day.
svn doesn’t require a server and there is no need to be online. It works perfectly fine over the file:// protocol.
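For reference, the serverless setup is just two commands (paths illustrative):

    svnadmin create ~/repos/project                          # repository on local disk, no daemon
    svn checkout file://$HOME/repos/project ~/work/project   # working copy over file://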
Was that always the case? I remember it being quite a hassle to set up (following tutorials online), but that was about 15 to 20 years ago or so.
It's similar to how databases allow you to begin parallel, concurrent, even contradictory transactions, and also guarantee a serialized, consistent database state, and rejection of invalid updates, at commit time. Both aspects are utterly important.
So far, having individual small countries seems to keep the centralisation at bay for longer than just having states in a federation.
(Look at Germany, Austria, Australia, the USA for examples of the latter. Interestingly, the UK is legally not made of federal states, but in practice they have granted more autonomy to eg Scotland over the years. And everyone knows that Scotland would secede and get away with it, if there was a power grab by London. In that sense, they are more federal than the US, where secession is very much verboten.)
Portable VCS is simple. Portable anything with the integration everyone expects (issues connects to source which connects to builds which connects to releases) is hard. Git being so open and portable means it isn't a moat.
Then it'll be up to the nerds who manage to cobble together their own distributed version of everything--even if it's a significantly reduced definition of everything.
Or, if it's a political scenario, it may depend on how well we can coordinate en masse without the cut connection. If we can exceed a certain threshold then we'll have removed the incentive to cut it in the first place.
Decentralization can be hidden from the user, it's an implementation detail.
There's literally a popular decentralized social network.
It's less about the tech, and more about the execution.
Historically we can look at LimeWire or PopcornTime as an example.
Both decentralized, both popular due to the ease-of-use.
No there isn't. Not a single one.
There are a few federated social networks, which is a fancy way of saying that they are centralized networks that have (or can have, in principle) more than one "center".
In practice, the overwhelming majority of users of such networks gravitate towards one or a handful of large providers. And many of those providers actually refuse to federate with other providers unless they follow an ever-growing list of politically-charged rules. This is just centralization with extra steps.
If you don't account for the benefit, it looks irrational, but this is true of absolutely anything
- A canonical name and place on the web;
- Access policy enforcement (who can commit and when);
- The whole pull request thing, with tags, issues, review, discussion, etc linked to it;
- Code review, linked to said policy enforcement;
- Issue tracking, even as basic as what GitHub offers;
- A trusted store for signing keys, so commits are verified;
- CI/CD triggers and runners;
- A page with releases, including binary releases, and a CDN allowing you to use the download links without fear.
This is way more than distributed version tracking. Actually the above is not even married to Git; it could be as valuable with Mercurial, or even Perforce.
This is a large product, actually a combination of many potentially self-contained products. It should not be compared to Git, but rather to Gitea or BitBucket. Not all of this can be reasonably decentralized, though quite a bit can.
hg always kept history though. Git has always encouraged squashes and rebases to keep a linear history, so that information was lost.
Interestingly, what GitHub mostly enforces is where your branches point to. Not who can make commits. That's mostly because of how git works, not because of any grand design on GitHub's part.
Out of the box, git does not offer that, and this does require a single point of enforcement.
Anyone can make any commit they want in git. That includes merge commits, too. GitHub mostly lets anyone push any commits they feel like, too. (What restrictions are there on pushing commits is mostly to deal with denial of service and people being a nuisance.)
Where the policing comes in is in giving rules for how these pointers (aka branches) can be mutated. OWNERS files, PR reviews, CI automation etc is all about controlling that mutation.
See also the new-ish merge queues[0], which really bring out that difference: the merge queue machinery makes the merge commit of your approved PR branch with 'main', runs the CI against that, and iff that passes, moves the pointer that is 'main' to point to the newly created commit.
It's exactly the same commit (with exactly the same hash), whether it passes the CI or not. The only difference is in whether it gets the official blessing of being pointed to by the official 'main'.
It really speaks to the design of git, that conceptually the only thing they need to lock down is who can mutate this very small amount of data, these handfuls of pointers. Everything else is (conceptually) immutable, and thus you don't need to care about who can eg make commits.
[0] Really a re-implementation of bors-ng.
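A hypothetical server-side pre-receive hook makes the point concrete: the commits themselves are never inspected, only whether this pusher may move a protected pointer. (Sketch only; GitHub's real enforcement is not implemented this way, and the pusher identity would come from whatever auth layer fronts the repo, not from git itself.)

    #!/usr/bin/env bash
    # Hypothetical pre-receive hook: police ref mutation, nothing else.
    ALLOWED_MAIN_PUSHERS="alice bob"            # who may move refs/heads/main
    while read -r oldrev newrev refname; do
        if [ "$refname" = "refs/heads/main" ]; then
            case " $ALLOWED_MAIN_PUSHERS " in
                *" $PUSHER "*) ;;               # $PUSHER set by the hosting layer (assumption)
                *) echo "refusing to move main" >&2; exit 1 ;;
            esac
        fi
    done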
I've used up 17h of CI time these two (slow) January weeks, for free, testing stuff across ~20 different OS/CPU combinations.
That's on just one "personal" project; a bigger dependency of that, of which I'm a maintainer, spends an order of magnitude more.
Can you (GP post, people complaining, not parent) blame us? Should we instead self host everything and beg for donations just to cover the costs?
As a professional software developer, I want tools that just work that I can rely on. GitHub 99.99% uptime is something I can rely on.
At 99.99%, the allowed downtime works out to roughly:
Daily: 8.6s
Weekly: 1m 0.48s
Monthly: 4m 21s
Quarterly: 13m 2.4s
Yearly: 52m 9.8s
If you assume that "uptime" means all tools are available
https://statusgator.com/services/github
this appears to be 45 minutes in just one day
Incident with Git Operations 30m Jan 14, 2025 9:01 AM Down
Incident with Git Operations 10m Jan 14, 2025 8:51 AM Down
Incident with Git Operations 5m Jan 14, 2025 8:46 AM Down
Not much margin to hit four 9's left for the rest of the year.
Their enterprise level SLA is only 99.9% (measured quarterly) and the remedy is a 10% credit, increasing to a 25% credit if they drop below 99%.
github basically shoves a webby frontend and workflows on top of someone else's work. That's all fine and good but github is not git.
As a professional IT consultant, I want tools too. I use lots of others' and I also create my own, and I generally insist on hosting my own. I'm also a fair carpenter, and work in several other trades. I have tools for all my trades and I look after all of them. That way I can guarantee quality - or at least reproducible and documented quality.
I'm sure you are fine with abdicating responsibility for stuff that you are not able to deal with - that's all good: your choice.
EDIT: Sorry, forgot to say: "Yay cloud"
A mailing list can go down and nothing would happen. The main point is to post patches to the maintainer. The mailing list is for a public record of things.
The only centralised thing is repo hosting on kernel.org. And that isn't the only official place; you can get the repo from googlesource or GitHub, so it isn't all that centralised either.
So Torvalds opted to "clone" the features of BitKeeper into an open source version he named 'git'.
That's the story I heard, no idea if it's true.
Source: A Git Story from https://blog.brachiosoft.com/en/posts/git/
Whenever you do a clone or an npm install or apt get or pip install, etc...
You choose github because your dependencies chose git
(And even among professionals, there's a big difference between Site Reliability Engineering and Software Engineering.)
Not your point, really, but fortunately, git is easily extensible. This in-repo issue tracker is surprisingly feature complete: https://github.com/git-bug/git-bug. Anyone else given it a whirl?
I believe Gitea has support for it, not sure to what extent.
Originally the plan was to PR the federation support to Gitea as well. I'm not sure if this is still the case, considering the rising tensions between the two projects and the fact that Forgejo is now a hard fork.
https://forgejo.org/faq/#is-there-a-roadmap-for-forgejo
I only use my Forgejo instance for myself currently so I haven't looked at the ActivityPub features of it before.
This is not true. The cheapest option is to not have services that require servers to maintain. Git continues to work if GitHub is down. So do shell scripts when CI is down. So why can’t we have an issue system where the underlying data is text files in a git branch?
I understand at scale you can pay people to optimize a process for the larger team, but there is a ton of unnecessary fragility before getting to that scale.
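A minimal sketch of that idea, using nothing but stock git (branch name and file layout made up for illustration):

    git switch --orphan issues                    # empty branch, no shared history needed
    mkdir -p open
    echo "CI fails on ubuntu-24.04 runners" > open/0001-ci-failure.md
    git add open
    git commit -m "issue: CI fails on ubuntu-24.04 runners"
    git push origin issues                        # syncs to any remote whenever one is reachable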
You don’t outsource things that prevent you from doing your core competency.
Building their software is - Github being down is currently preventing that for many companies.
AFAICT the internet was built on negativity.
Here's the 2nd post from a random USENET group I found:
https://www.usenetarchives.com/view.php?id=comp&mid=PDQ5ajZp...
Except if you have a release planned - but statistically, most don't at that time.
Problem is that people get comfortable with pushing to branch -> deploying in dev and testing from there.
Nothing is built into git to let it actually run decentralized: there's no server or protocol where someone can register, say, an identifying public key and then just have a network of peers all communicate updates to each other. It's even pretty damn unsafe to just run a repo through basic file-sync (I do this for myself with syncthing and bare repos in a special folder, which seems to work fine, but I'm hardly loading it up enough to chase down where it doesn't).
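For what it's worth, the file-sync setup mentioned above is roughly this (paths illustrative):

    git init --bare ~/Sync/repos/project.git      # bare repo inside the Syncthing-replicated folder
    cd ~/work/project
    git remote add sync ~/Sync/repos/project.git
    git push sync main                            # Syncthing then carries the objects to the other machines

The danger, presumably, is two machines writing into the bare repo between sync cycles, which file-level replication can't reconcile.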
Well there is https://github.com/git-bug/git-bug
Email and mailing lists?
Otherwise you can claim Facebook is distributed because you can email people links to Facebook pages.
That's the way the Linux kernel (the first Git repository) [1] and Git [2] itself manage their code. There's even a git send-email command that prepares the commits as patches and sends them using the correct template.
[1] Linux kernel, IIO subsystem: https://lore.kernel.org/linux-iio/
[2] Git mailing list: https://lore.kernel.org/git/
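The flow is basically two commands; the commit count and list address here are illustrative only:

    git format-patch -3 --cover-letter -o outgoing/         # last three commits as mail-ready patches
    git send-email --to=linux-iio@vger.kernel.org outgoing/*.patch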
Like, yes, it's true. Unlike a banana, turtles have 4 movement-enabling things, they use them to move mostly forward and backwards and not sideways, and other things can ride on them. It's probably more of a car. But it's not a car.
Git has no issue tracker. It's really not a controversial statement. The git community has common practices using something else to work around that, but if that's all you need to say "therefore git has X" then you can claim git has a CI framework because everyone and their dog uses GitHub. Which also has email integrations.
A bug tracker is just assorted communication. One can easily build it over email.
You're just indulging in hyperbole for the sake of it. Nobody said git has an issue tracker in it.
Yes they did. That's what this comment thread is about. https://news.ycombinator.com/item?id=42691624
(Unless you're splitting some really fine hairs about what "in it" means?)
Which makes it a claim that those tools are git's distributed bug tracker.
A bug tracker and an issue tracker are basically the same term. So that's a claim that git has an issue tracker.
So when you come along and say "Nobody said git has an issue tracker in it." you are either wrong, or you're saying the words "in it" completely change the meaning of the sentence.
If it's the latter, that is a very unhelpful way to communicate, and is definitely splitting hairs. And honestly it's a strawman too because the comment you replied to wasn't using the words "in it". They were saying that you shouldn't say "git has" email. Which is a direct reference to the ancestor comment's claim. It was not hyperbole.
I'm not splitting hairs anywhere. I'm saying that the ancestor comment has the same meaning as "git has an issue tracker". That's not splitting. It's the opposite of splitting.
git@github.com: Permission denied (publickey).
Either GitHub didn't know how to communicate, or they were not sure about the real impact. This is bad.
But yeah, also status pages seem to be under the domain of non-engineers who are too concerned with how things look, vs. conveying useful information in a timely manner, and ultimately, fail at both.
fatal: clone of 'https://github.com/libsdl-org/freetype.git' into submodule path '~/src/SDL/SDL_ttf/external/freetype' failed
...
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
...
fatal: clone of 'https://github.com/libsdl-org/harfbuzz.git' into submodule path '~/src/SDL/SDL_ttf/external/harfbuzz' failed
Failed to clone 'external/harfbuzz'. Retry scheduled
...
Failed to clone 'external/freetype' a second time, aborting
For a problem that's supposedly "fixed" that's a whole lot of errors...
Then went to github status and calmed down.
Developer "snow day".
Update: They promise >99.9% on a quarterly basis for enterprise customers - https://github.com/github/docs/blob/main/content%2Fsite-poli...
Rereading the SLA, it looks like GitHub can have each service feature (issues, pull requests, git operations) be down 0.1% and still not reimburse. In your head you might not account separately for each feature, but GitHub does.
The front-end is glacial nowadays and frequently has issues actually loading the page; Actions frequently has some kind of panic attack and breaks, or just grinds along at glacial speeds. The UX has gotten worse (why no merge-queue button?).
Navigating around takes so much time, it should probably have its own timesheet code.
"We've identified a cause of degraded git operations, which may affect other GitHub services that rely upon git. We're working to remediate."
For example, I doubt you would be able to easily merge any pull-requests or use the same CI/CD code for the same services without hacky solutions.
https://blog.gitea.com/welcome-to-gitea/
I was using Gitea for a long time, and then someone forked Gitea to create Forgejo. At this time, my installation of Gitea was already out of date a bit because I had previously been manually building and installing Gitea from source. Soon after Forgejo was created, it landed in FreeBSD ports and then it became available in the FreeBSD package manager.
So at this point, and having read a bit about Forgejo and seeing that Forgejo was maintained by people with connection to Codeberg, I thought “hey I need to migrate my current Gitea setup anyway. Either to Gitea installed from FreeBSD packages, or to something else. I might as well try Forgejo.”
And that’s how I ended up installing Forgejo and I’ve stuck with it since.
In a wonderful twist, we are relying on a couple modules served from GitHub!
If there's an error/timeout, they'll do a check of their status page so you don't get the standard 'error' but rather a 'don't worry, you're not doing anything wrong. It's borked, our bad' message.
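GitHub's status page is a standard Statuspage instance, so a client can make the same check itself; a rough sketch (endpoint per the usual Statuspage convention, so verify before relying on it):

    # "none" means all systems operational; anything else means "it's borked, our bad"
    curl -s https://www.githubstatus.com/api/v2/status.json | jq -r '.status.indicator'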
There are innumerable causes of this kind of failure that aren't rooted in cloud service provider shenanigans.
(Admittedly the duration of the outage does "feel" like an infra outage)
eats popcorn waiting for explanation of why they didn't catch it in non-prod