I never use Github Copilot. It does go down a lot, if their status page is to be believed, but I don't really care, because it going down doesn't bring down the rest of Github. I care about Github's uptime ignoring Copilot. Everyone's slice of what they care about is a little different, so the only correct way to talk about Github's uptime is to be precise, and probably to focus on the core stuff that tons of people care about and that's been struggling lately: core git operations, website functionality, API access, Actions, etc.
This is definitely true.
At the same time, none of the individual services has hit 3x9 uptime in the last 90 days [0], which is their Enterprise SLA [1] ...
> "Uptime" is the percentage of total possible minutes the applicable GitHub service was available in a given calendar quarter. GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.
[0]: https://mrshu.github.io/github-statuses/
[1]: https://github.com/customer-terms/github-online-services-sla
(may have edited to add links and stuff, can't remember, one of those days)
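To make the quoted SLA definition concrete, here is a minimal sketch of the arithmetic it implies. The 90-day quarter is an assumption (calendar quarters run 90 to 92 days), and the function names are made up for illustration; nothing here comes from GitHub's own tooling.

```python
# Sketch of the uptime formula in the quoted SLA text:
# uptime = available minutes / total possible minutes in a calendar quarter.
SLA_TARGET = 99.9  # percent, as quoted from the SLA


def quarter_minutes(days_in_quarter: int = 90) -> int:
    """Total possible minutes in a quarter (90 days is an assumption)."""
    return days_in_quarter * 24 * 60


def uptime_percent(down_minutes: float, days_in_quarter: int = 90) -> float:
    """Uptime percentage for a quarter with the given minutes of downtime."""
    total = quarter_minutes(days_in_quarter)
    return 100.0 * (total - down_minutes) / total


def downtime_budget_minutes(target_percent: float = SLA_TARGET,
                            days_in_quarter: int = 90) -> float:
    """Minutes of downtime a service can have and still meet the target."""
    return quarter_minutes(days_in_quarter) * (1 - target_percent / 100.0)
```

Under these assumptions a 99.9% target leaves roughly 130 minutes of downtime per quarter, so a handful of multi-hour incidents is enough to miss it.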
The linked document in my previous comment has more detail.
They're not even struggling to get their average to three 9s, they're struggling to get ANY service to three 9s. They're struggling to get many services to two 9s.
Copilot may be the least stable at one 9, but the services I would consider most critical (Git & Actions) are also at one 9.
On the other hand the baseline minimal Github Enterprise plan with no features (no Copilot, GHAS, etc.) runs a medium sized company $1m+ per annum, not including pay-per-use extras like CI minutes. As an individual I'm not the target audience for that invoice, but I can envisage whoever is wanting a couple of 9s to go with it. As a treat.
Why defend a company that clearly doesn't care about its customers and sees them as a money spigot to suck dry?
The five nines tech people usually are talking about is a fiction; the only place where the measure is really real is in networking, specifically service provider networking, otherwise it's often just various ways of cleverly slicing the data to keep the status screen green. A dead giveaway is a gander at the SLAs and all the ways the SLAs are basically worthless for almost everyone in the space.
See also all of the "1 hour response time" SLAs from open source wrapper companies. Yes, in one hour they will create a case and give you a case ID. But that's not how they describe it.
GHA can’t even be called Swiss cheese anymore, it’s so much worse than that. Major overhauls are needed. The best we’ve got is Immutable Releases which are opt in on a per-repository basis.
You can pin actions versions to their hash. Some might say this is a best practice for now. It looks like this, where the comment says where the hash is supposed to point.
Old --> uses: actions/checkout@v4
New --> uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
There is a tool to sweep through your repo and automate this: https://github.com/mheap/pin-github-action
Like: https://github.com/actions/checkout/tree/11bd71901bbe5b1630c...
So I'm pretty sure that for the same commit hash, I'll be executing the same content.
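The pinning convention shown above (a full 40-character commit hash after the `@`, with the tag in a trailing comment) can be checked mechanically. The following is a rough sketch, not one of the tools mentioned in this thread: it scans workflow text line by line with a regular expression instead of parsing YAML, so quoted values and unusual layouts are not handled, and the name `find_unpinned` is made up for illustration.

```python
import re

# Matches "uses: owner/repo[/path]@ref", optionally followed by "# comment".
# Local actions ("uses: ./path") have no "@" and are skipped on purpose.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)\s*(?:#\s*(.*))?$")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned(workflow_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, action, ref) for every ref that is not a full commit hash."""
    unpinned = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match is None:
            continue
        action, ref = match.group(1), match.group(2)
        if not FULL_SHA_RE.match(ref):
            unpinned.append((lineno, action, ref))
    return unpinned
```

Feeding it the text of a workflow file returns the lines that still point at a mutable tag or branch, which is a cheap pre-commit or CI check before reaching for a dedicated pinning tool.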
This article[0] gives a good overview of the challenges, and also has a link to a concrete attack where this was exploited.
[0]: https://nesbitt.io/2025/12/06/github-actions-package-manager...
TravisCI
Jenkins
scripts dir
Etc
The main desiderata with these kinds of action pinning tools are that they (1) leave a tag comment, (2) leave that comment in a format that Dependabot and/or Renovate understands for bumping purposes, and (3) actually put the full tag in the comment, rather than the cutesy short tag that GitHub encourages people to make mutable (v4.x.y instead of v4).
[1]: https://github.com/suzuki-shunsuke/pinact
Perhaps mixing the CI with the CD made that worse, because deployment and delivery usually have complexities of their own. Back in the day you'd probably use Jenkins for the delivery piece and the E2E nightlies, and use something more lightweight for running your tests and linters.
For that part I feel like all you need, really, is to be able to run a suite of well structured shell scripts. Maybe if you're in git you follow its hooks convention to execute scripts in a directory named after the repo event or something. Forget about creating reusable 'actions' which depend on running untrusted code.
Provide some baked in utilities to help with reporting status, caching, saving junit files and what have you.
The only thing that remains is setting up a base image with all your tooling in it. Docker does that, and is probably the only bit where you'd have to accept relying on untrusted third parties, unless you can scan them and store your own cached version of it.
I make it sound simpler than it is but for some reason we accepted distributed YAML-based balls of mud for the system that is critical to deploying our code, that has unsupervised access to almost everything. And people are now hooking AI agents into it.
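As a rough illustration of the "directory of shell scripts named after the repo event" idea described above, here is a minimal sketch. The layout (one directory per event name, `.sh` files run in name order) and the function name `run_event_scripts` are assumptions for illustration, not an existing tool's convention.

```python
import subprocess
from pathlib import Path


def run_event_scripts(root: Path, event: str) -> int:
    """Run every *.sh file in root/<event>/ in name order.

    Returns 0 if all scripts succeed (or none exist), otherwise the exit
    code of the first failing script; later scripts are not run.
    """
    event_dir = root / event
    if not event_dir.is_dir():
        return 0  # nothing configured for this event
    for script in sorted(event_dir.glob("*.sh")):
        print(f"--> {script.name}", flush=True)
        # Scripts are run via "sh", so they do not need the executable bit.
        result = subprocess.run(["sh", str(script)])
        if result.returncode != 0:
            return result.returncode
    return 0
```

A runner could call something like `run_event_scripts(Path("ci"), "push")` and use the return value as the job status; reporting, caching, and the base image are left out entirely.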
These reusable actions are nothing but a convenience feature. This discussion isn't much different than any other supply chain, dependency, or packaging system vulnerability such as NPM, etc.
One slight disclaimer here is the ability of someone to run their own updated copy of an action when making a PR. Which could be used to exfil secrets. This one is NOT related to being dependent on unverified actions though.
(re-reading this came across as more harsh than I intended.. my bad on that. But am I missing something or is this the same issue that every open-source user-submitted package repository runs in to?)
[1] https://app.radicle.xyz/nodes/radicle.dpc.pw/rad%3Az2tDzYbAX...
> For us, availability is job #1, and this migration ensures GitHub remains the fast, reliable platform developers depend on
That went about as well as everyone thought back then.
Does anyone else remember back in ~2014-2015 sometime, when half the community was screaming at GitHub to "please be faster at adding more features"? I wish we could get back to platforms (or OSes for that matter) focusing on reliability and stability. Seems those days are long gone.
We have since switched to a self-hosted Forgejo instance. Unsurprisingly the search works.
The improvements to PR review have been nice though
I dunno, probably the worst UX downgrade so far. Almost no PRs are "fully available" on page load; they require additional clicks and scrolling to "unlock" all the context, which kind of sucks.
Used to be you loaded the PR diff and you actually saw the full diff, except really large files. You could do CTRL+F and search for stuff, you didn't need to click to expand even small files. Reviewing medium/large PRs is just borderline obnoxious today on GH.
They have somehow found the worst possible amount of context for doing review. I tend to pull everything down to VS Code if I want to have any confidence these days.
That's only a valid sentiment if you only use the big players. Both of those have medium/smaller competitors that have shown (for decades) that they are extremely boring, therefore stable.
I'm at a much smaller outfit now so we have more freedom but I'd dread to think the arguments I would've had at the 4000+ employee companies I was at before.
(Note that "is this company financially viable in the long term future" is an important part of stability. Doesn't matter how rock solid the software is if the startup's bankrupt by the end of next year.)
It's just that everybody is using 100 tools and dependencies which themselves depend on 50 others to be working.
And then on top of all that, their traffic is probably skyrocketing like mad because of everyone else using AI coders. Look at popular projects -- a few minutes after an issue is filed they have sometimes 10+ patches submitted. All generating PRs and forks and all the things.
That can't be easy on their servers.
I do not envy their reliability team (but having been through this myself, if you're reading this GitHub team, feel free to reach out!).
1-4 incidents per month compared to about 1 daily.
Like they are down to one 9 of availability and very, very close to losing even that (90.2x%).
This also fits my personal experience more closely than the 99.900-99.989 range the article indicates...
Though honestly, 99.9% means 8.76h of downtime a year. If we say no more than 20min of downtime per 3 hours (sliding window), no more than 1h a day, and >50% of the downtime falling in (localized) off-working hours (e.g. night, Sat, Sun), then 99.9% is something you can work with. Sure, it would sometimes be slightly annoying, but it should not cause any real issues.
On the other hand 90.21%... That is 857.6h, or 35.73 days, of outage a year. Probably still fine if for each location the working hour availability is 99.95% and the previous constraints are there. But uh, wtf, that just isn't right for a company of that size.
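The downtime figures above are easy to sanity-check. A small sketch, assuming a 365-day year (8760 hours) and a made-up helper name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100.0)


for pct in (99.9, 90.21):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available: {hours:.1f} h/year, about {hours / 24:.1f} days")
```

With these assumptions 99.9% works out to about 8.76 hours a year, while 90.21% works out to about 857.6 hours, which is roughly 35.7 days rather than 35.7 hours.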
That’s… one 9 of reliability. You could argue the title understates the problem.
> You don't need every single service to be online in order to use GitHub.
Well that’s how they want you to use it, so it’s an epic failure in their intended use story. Another way to put this is “if you use more GitHub features, your overall reliability goes down significantly and unpredictably”.
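The point about reliability dropping as you depend on more features can be sketched numerically. This assumes each service fails independently, which is a simplification (real outages are often correlated), and the percentages in the usage note are hypothetical, not measured GitHub numbers.

```python
from math import prod


def combined_availability(availabilities_percent: list[float]) -> float:
    """Availability of a workflow that needs every listed service up at once.

    Assumes independent failures, so the fractions simply multiply.
    """
    return 100.0 * prod(a / 100.0 for a in availabilities_percent)
```

For example, a workflow that needs three services, each hypothetically at 99%, ends up at about 97% under this model, i.e. noticeably worse than any single service it depends on.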
Look, I have never been obsessed with nines for most types of services. But the cloud service providers certainly were using it as major selling/bragging points until it got boring and old because of LLMs. Same with security. And GitHub is so upstream that downstream effects can propagate and cascade quite seriously.
These days it is very common that something like opening the diff view of a trivial PR takes 15-30 seconds to load. Sure, it will eventually load after a long wait or an F5, but it is still negatively impacting my productivity.
It seems that the same metric is about an order of magnitude worse than before.
https://github.com/customer-terms/github-online-services-sla
> GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.
... and none of the individual services have hit 99.9% uptime in the last 90 days according to this site. 0_o
https://docs.github.com/en/enterprise-cloud@latest/organizat...
the pages got slower, rendering became a nightmare.
then they introduced GitHub actions (half baked) - again very unreliable
then they introduced Copilot - again not very reliable
it's easy to see why availability has gone down the drain.
are they still on the Rails monolith? they speak about it less these days.
¹ Glossing over the "what they're getting in return" part.
² https://www.warpbuild.com/
People on lobsters a month ago were congratulating Github on achieving a single nine of uptime.[1]
I make jokes about putting all our eggs in one basket under the guise of “nobody got fired for buying x; but there are sure a lot of unemployed people”- but I think there’s an insidious conversation that always used to erupt:
“Hey, take it easy on them, it’s super hard to do ops at this scale”.
Which lands hard on my ears when the normal argument in favour of centralising everything is that “you can’t hope to run things as good as they do, since there’s economies of scale”.
These two things can’t be true simultaneously.. this is the evidence.
[0]: https://mrshu.github.io/github-statuses/
[1]: https://lobste.rs/s/00edzp/missing_github_status_page#c_3cxe...
Sure they can. Perhaps a useful example of something like this would be to consider cryptography. Crypto is ridiculously complex and difficult to do correctly. Most individual developers have no hope of producing good cryptographic code on the same scale and dependability of the big crypto libraries and organizations. At the same time these central libraries and organizations have bugs, mistakes and weaknesses that can and do cause big problems for people. None of that changes the fact that for most developers “rolling your own crypto” is a bad idea.
I’d go so far as to say that there are more crypto libraries than there are “default” options for SaaS Git VCS (GitLab and GitHub are the mainstay in companies, and maybe Azure DevOps if you hate your staff; nobody sensible is using Bitbucket), but for TLS implementations there’s Rustls, GnuTLS, BoringSSL, LibreSSL, wolfSSL, NSS, and AWS-LC that come to mind immediately.
But then their status page isn't really trustworthy anymore, and a lot of the issues I have been running into seem to be temporary, partial, localized failures: sometimes temporarily slow to the point of unusability, sometimes temporarily serving an outdated (by >30min) main/head, etc.
So those won't even show up in these statistics.
I find it hard to believe that an Azure migration would be that detrimental to performance, especially with no doubt "unlimited credit" to play with?
You can provision Linux machines easily on Azure and... that's all you need? Or is the thinking that without bare metal NVMe MySQL it can't cope (which is a bit of a different problem tbf).
We wouldn’t couple so much if we knew reliability would be this low. It will influence future decisions.
A migration like this is a monumental undertaking, to the point where the only sensible way to do it is probably to not do it. I fully expect even worse reliability over the next few years before it gets better.
The real problem today IMO is that Microsoft waited so long to drop the charade that they now felt like they had to rip off the band-aid. From what I've heard the transition hasn't gone very smoothly at all, and they've mostly been given tight deadlines with little to no help from Microsoft counterparts.
Then there's Azure DevOps (formerly known as Visual Studio Team System), dead on the ocean floor.
Although given how badly GitHub seems to be doing, perhaps it's better to be ignored.
There's clearly one small team that works on it. There are pros and cons to that.
It hasn't even got an obnoxious Copilot button yet for example, but on the other hand it was only relatively recently you could properly edit comments in markdown.
If the client has existing AzDo Pipelines then I'd suggest keeping them there.
When I saw his interview: https://thenewstack.io/github-ceo-on-why-well-still-need-hum... I thought "oh, there is some semblance of sanity at Microsoft".
This was after seeing those ridiculous PRs where microsoft engineers patiently deconstructed AI slop PRs they were forced to deal with on the open source repos they maintained.
When he was gone a few months later and github was folded into microsoft's org chart the writing was firmly on the wall.
Also of note is that the Microsoft org chart always showed GitHub in that structure, while the org chart available to GitHub stopped at their CEO. It's not that they were finally rolled into Microsoft's org chart so much as they lifted the veil and stopped pretending.
Nonetheless it looks like he was both willing and able to push back on a good deal of the AI stupidity raining down from above and then he was removed and then, well, this...
I understand how appealing it is to build an AI coding agent and all that, but shouldn't they, above everything else, make sure they remain THE platform for code distribution, collaboration and the like? And it doesn't need to be humans, that can be agents as well.
They should serve the AI agent world first and foremost. Cause if they don't pull that off, and don't pull off building one of the best coding agents, which so far they haven't, there isn't much left.
There's so many new features needed in this new world. Really unclear why we hear so little about it, while maintainers smack the alarm bell that they're drowning in slop.
More recently:
Addressing GitHub's recent availability issues
https://github.blog/news-insights/company-news/addressing-gi...
(with a smattering of submissions here the last few weeks but no discussion)
That's the reason you hear the complaints: they're from people who no longer want to be using this product but have no choice.
Because Microsoft doesn't need to innovate or even provide good service to keep the flies glued, they do what they've been doing: focus all their resources on making the glue stickier rather than focusing on making people want to stay even if they had an option to leave.
Codespaces specifically is quite good for agent heavy teams. Launch a full stack runtime for PRs that are agent owned.
> keep hearing that Github is terrible
I do not doubt people are having issues and I'm sure there have been outages and problems, but none that have affected my work for weeks. GH is many things to many teams and my sense is that some parts of it are currently less stable than others. But the overall package is still quite good and delivers a lot of value, IMO.
There is a bit of an echo chamber effect with GH to some degree.
2026-02-27T10:11:51.1425380Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2026-02-27T10:11:56.2331271Z ##[error]The operation was canceled.
I had to disable the workflows.
GitHub support's response has been:
“We recommend reviewing the specific job step this occurs at to identify any areas where you can lessen parallel operations and CPU/memory consumption at one time.”
That plus other various issues makes me start to think about alternatives, and it would have never occurred to me one year back.
[0] https://github.com/Barre/ZeroFS/actions/runs/22480743922/job...
Once we got the email that they were going to charge for self-hosted runners that was the final nail in the coffin for us. They walked it back but we've lost faith entirely in the platform and vision.
Just don’t like the slop that’s getting us there.
Gitlab
Bitbucket
Sourceforge
Forgejo
Codeberg
Radicle
Launchpad
Owned by companies that help the US Federal Government illegally spy on their own citizens and murder children overseas:
Github
This sounded crazy in 2020 when I said that in [0]. Now it doesn't in 2026 and many have realized how unreliable GitHub has become.
If there was a prediction market on the next time GitHub would have at least one major outage per week, you would be making a lot of money since it appears that AI chatbots such as Tay.ai, Zoe and Copilot are somewhat in charge of wrecking the platform.
Any other platform wouldn't tolerate such outages.