There’s this whole creation myth of how Git came to be that kind of paints Linus as some prophet reading from golden tablets written by the CS gods themselves.
Granted, this particular narrative in the blog post does humanise things a bit more, remembering the stumbling steps, how Linus never intended for git itself to be the UI, how there wasn’t even a git commit command in the beginning, but it still paints the whole thing in somewhat romantic tones, as if the blob-tree-commit-ref data structure were the perfect representation of data.
One particular aspect that often gets left out of this creation myth, especially by the author (a GitHub cofounder), is that Mercurial had a prominent role. It was created by Olivia Mackall, another kernel hacker, at the same time as git, for the same purpose as git. Olivia offered Mercurial to Linus, but Linus didn’t look upon it with favour, and stuck to his guns. Unlike git, Mercurial had a UI at the very start. Its UI was very similar to Subversion, which at the time was the dominant VCS, so Mercurial always aimed for familiarity without sacrificing user flexibility. In the beginning, both VCSes had mind share, and even today, the mindshare of Mercurial lives on in hg itself as well as in worthy git successors such as jujutsu.
And the git data structure isn’t the only thing that could have ever possibly worked. It falls apart for large files. There are workarounds and things you can patch on top, but there are also completely different data structures that would be appropriate for larger bits of data.
Git isn’t just plain wonderful, and in my view, it’s not inevitable either. I still look forward to a world beyond git, whether jujutsu or whatever else may come.
Here's one of the first threads where Matt (Olivia) introduces the project and benchmarks, but it seems like the list finds it unremarkable enough comparatively to not dig into it much:
https://lore.kernel.org/git/Pine.LNX.4.58.0504251859550.1890...
I agree that the UI is generally better and some decisions were arguably better (changeset evolution, which came much later, is pretty amazing) but I have a hard time agreeing that hg influenced Git in some fundamental way.
For the deadnaming comment, it wasn't out of disrespect, but when referring to an email chain, it could otherwise be confusing if you're not aware of her transition.
I wasn't sponsoring hg-git, I wrote it. I also wrote the original Subversion bridge for GitHub, which was actually recently deprecated.
https://github.blog/news-insights/product-news/sunsetting-su...
I assumed it was innocent. But the norm when naming a married woman or another person who changed their name is to call them their current name and append the clarifying information. Not vice versa. Jane Jones née Smith. Olivia (then Matt).
Is this not a case where it is justified, given that she at that time was named Matt, and it's crucial information to understand the mail thread linked to? I certainly would not understand at all without that context.
If you can avoid the need to disambiguate, you do that too. The name really is dead. You shouldn't use it if at all possible.
> One particular aspect that often gets left out of this creation myth, especially by the author of Github is that Mercurial had a prominent role
I'm not sure where you're getting your facts from.
Linus never cared about hg, but lots of people that cared about git at one point would also be at least familiar with some notions from hg.
I think the reason git then was successful was because it is a small, practical, and very efficient no-nonsense tool written in C. This made it much more appealing to many than the alternatives written in C++ or Python.
Human factors matter, as much as programmers like to pretend they don't.
For “backed by” read “initially written by”.
I don't particularly remember Linus making any push for git to be generally popular. While he was more than happy for other projects to use it and be his testing resource, his main concern was making something that matched his requirements for Linux maintenance. BitKeeper was tried and worked well¹, but there were significant licensing issues that caused heated discussion amongst some of the big kernel contributors (which boiled over into flame-wars more than once or twice), and those were getting worse rather than going away².
A key reason for Linus trying what he did with Git, rather than using one of the other open options that started around the same time or slightly before, was that branching and merging source trees as large as Linux could be rather inefficient in the others — this was important for the way Linux development was being managed.
Of course most other projects don't have the same needs as Linux, but git usually wasn't bad for them either and being used by the kernel's management did give it momentum from two directions: those working on the kernel also working on other projects and using it there too (spreading it out from within), and people further out thinking “well, if they use it, it must be worth trying (or trying first)” so it “won” some headspace by being the first DVCS people tried³, and they didn't try others like mercurial or fossil because git worked well (or well enough) so they just didn't get around to trying the others⁴ that would have worked just as well for them.
----
[1] Most people looking back seem to think/imply that BK was a flash in the pan, but Linus used it for a full couple of years.
[2] A significant problem that caused the separation, rather than it being because BK was technically deficient in some way for the Linux project, was people reverse engineering the protocol to get access to certain metadata that would otherwise have required using the paid version to see, which the BK owners were not at all happy about.
[3] So yes, human factors, but less directly related to one particular human that is Linus, more the project he was famous for.
[4] That sounds a lot more dismissive than I intended. Of course many did try multiple and found they preferred git, as well as those who did the same but went with one of the others because they were a better match for the needs of that person/project.
Outside of giving one of the highest visibility tech talks in history, at Google (back when Google was the mega hip FAANG), declaring Subversion (the then leading SCM) brain dead?
Marketing works in many different ways, as does signaling. Geeks wear suits, too, their suits just aren't composed of suit jackets and suit pants, they're composed of t-shirts and jeans.
I remember that more as berating the incumbent leader in non-distributed VCSs, than promoting a specific DVCS, and that git wasn't mature at that point (the move from BK had not happened). Though maybe my remembered timeline is muddled, do you have further reference to that talk so I can verify details?
Well, I guess I might as well play ChatGPT :-)
https://en.wikipedia.org/wiki/BitKeeper
https://en.wikipedia.org/wiki/Git
https://sandeep.ramgolam.com/blog/linus-torvalds-talks-about... -> https://www.youtube.com/watch?v=4XpnKHJAok8
Nobody was really happy with the VCS situation in 2005. Most people were still using CVS, or something commercial. SVN did exist (it had only just reached version 1.0 in 2004), but platforms like SourceForge still only offered CVS hosting. SVN was considered to be a more refined CVS, but it wasn't that much better and still shared all the same fundamental flaws stemming from its centralised nature.
On the other hand, "distributed" was a hot new buzzword in 2005. The recent success of Bittorrent (especially its hot new DHT feature) and other file sharing platforms had pushed the concept mainstream.
Even if it hadn't been for the BitKeeper incident, I do think we would have seen something pop up by 2008 at the latest. It might not have caught on as fast as git did, but you must remember that the thing that shot git to popularity was GitHub, not the Linux kernel.
The amazing flexibility of git appears to intimidate a lot of people, and many coders don't seem to build up a good mental model of what is going on. I've run a couple of git tutorials for dev teams, and the main feedback I get is "I had no idea git was so straightforward".
Linus absolutely had a couple of brilliant insights:
1. Content-addressable storage for the source tree.
2. Files do not matter: https://gist.github.com/borekb/3a548596ffd27ad6d948854751756...
At that time, I was using SVN and experimenting with Hg and Bazaar. Both were too "magical" for me, with unclear rules for merging, branching, rebasing.
Then came git. I read its description "source code trees, identified by their hashes, with file content movement deduced from diffs", and it immediately clicked. It's such an easy mental model, and you can immediately understand what operations mean.
I wish weekly for explicit renames.
> At that time, I was using SVN and experimenting with Hg and Bazaar. Both were too "magical" for me, with unclear rules for merging, branching, rebasing.
I have no idea what you mean.
> It's such an easy mental model, and you can immediately understand what operations mean.
Many people clearly disagree.
You can do that in git. `git mv` stores a hint in the commit that a file has been moved.
You just don't _have_ to do it.
No. It appears in git status but is not committed. And it disappears from git status if the file is modified enough.
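Easy to see for yourself (a sketch; file names made up):

  git mv old.c new.c && git commit -m "rename"
  git show --no-renames --stat   # stored as a delete plus an add
  git show -M --stat             # the rename is reconstructed by similarity detection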
git is just a tool to do stuff. Its name (chosen by that Finnish bloke) is remarkably apt - it's for gits!
It's not Mercurial, nor GitHub, nor is it anything else. It's git.
It wasn't invented for you or you or even you. It was a hack to do a job: sort out control of the Linux kernel source when BitKeeper went off the rails as far as the Linux kernel devs were concerned.
It seems to have worked out rather well.
Can you talk a little bit about this? My assumption was that the only way to deal with large files properly was to go back to centralised VCS, I'd be interested to hear what different data structures could obviate the issue.
When I first used Git I thought YES! This is it. This is the one. The model was so compelling, the speed phenomenal.
I never again used anything else unless forced -- typically Subversion, mostly for inertia reasons.
What?
> Git isn’t just plain wonderful, and in my view, it’s not inevitable either.
I mean, the proof is in the pudding. So why did we end up with Git? Was it just dumb luck? Maybe. But I was there at the start for both Git and Mercurial (as I comment elsewhere in this post). I used them both equally at first, and as a Python aficionado should've gravitated to Mercurial.
But I like to understand how tools work, and I personally found Mercurial harder to understand, slower to use, and much less flexible. It was great for certain workflows, but if those workflows didn't match what you wanted to do, it was rigid (I can't really expound on this; it's been more than a decade). Surprisingly (as I was coding almost entirely in Python at the time), I also found it harder to contribute to than Git.
Now, I'm just one random guy, but here we are, with the not plain wonderful stupid (but extremely fast) directory content manager.
It's a relief to hear someone else say something like this; it's so rare to find anything but praise for Mercurial in threads like these.
It was similar for me: In the early/mid 2010s I tried both git and mercurial after having only subversion experience, and found something with how mercurial handled branches extremely confusing (don't remember what, it's been so long). On the other hand, I found git very intuitive and have never had issues with it.
In fact, now that I've used the term "evolution", Terran life/DNA functions much the same way. Adaptability trumps perfection every time.
I think too many folks at the time thought that full immutability was what folks wanted and got hung up on that. Turns out that almost everyone wanted to hide their mistakes, badly structured commits, and typos out of the box.
It didn't help that mercurial was slower as well.
I'm good with this. In my over 25 years of professional experience, having used cvs, svn, perforce, and git, it's almost always a mistake keeping non-source files in the VCS. Digital assets and giant data files are nearly always better off being served from artifact repositories or CDN systems (including in-house flavors of these). I've worked at EA Sports and Rockstar Games and the number of times dev teams went backwards in versions with digital assets can be counted on the fingers of a single hand.
My last CAD file was 40GiB, and that wasn't a large one.
The idea that all sources are text means that art is never a source, and that many engineering disciplines are excluded.
There's a reason Perforce dominates in games and automotive, and it's not because people love Perforce.
I think the key issue is actually how to sensibly diff and merge these other formats. Levenshtein-distance-based diffing is good enough for many text-based formats (like typical program code), but there is scope for so much better. Perhaps progress will come from designing file formats (including binary formats) specifically with "diffability" in mind -- similar to the way that, say, Java was designed with IDE support in mind.
----
[0] for the purposes of change tracking and merging
[1] Stares aggressively at SSIS for its nasty package file format² and habit of saving parts of it in different orders apparently randomly so updating the text of an annotation can completely rearrange the saved file
[2] far from the only crime committed by SSIS I know, but one occasionally irritating enough to mention
Diffoscope does something similar: it diffs the sorted content first, and if there are no changes at that level, it reports that and shows the unsorted diffs.
Possibly, though I might be concerned that the format has ordering oddities that it is unexpectedly sensitive to. Unlikely, but given how many other oddities DTS/SSIS has collected over the years I'd not be surprised!
Also, we weren't using Git in DayJob at the time we were actively developing with SSIS (maybe VSTS had an equivalent we could have used?), and we are now acting to remove the last vestiges of it from our workflows rather than spending time making it work better with them!
Remembering the real history matters, because preserving history is valuable by itself, but I'm also really glad that VCS is for most people completely solved: there's nothing besides Git you have to pay attention to; you learn it once and use it your whole career.
It's not built-in to git itself, but I remember seeing demos where git could be configured to use an external tool to do a visual diff any time git tried to show a diff of image files.
> and moved file tracking.
Check out -C and -M in the help for git log and blame. Git's move tracking is a bit weirder than others (it reconstructs moves/copies from history rather than recording them at commit), but I've found it more powerful than others because you don't need to remember a special "move" or "copy" command, plus it can track combining two files in a way others can't.
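For example (path made up):

  git log --follow -M -- src/foo.c   # follow the file across renames
  git blame -M -C -C src/foo.c       # each extra -C widens the search for moved/copied lines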
From what I hear most current new developers never really learn git, they learn a couple features of some git GUI.
And it's understandable, you're really understating what learning git (one of the messiest and worst documented pieces of software ever) well entails.
I find it a disgrace that we're stuck at git, actually.
If I had to actually use git on the CLI every day, I would probably complain a lot, but it's a pretty good experience when you're using Git Cola and GitHub.
It would be nice if it had native discussions, issues, and wikis like Fossil does; having all of that decentralized seems like a good idea, though.
Sun Microsystems (RIP) back then went with Mercurial instead of Git mainly because Mercurial had better support for file renames than Git did, but at Sun we used a rebase workflow with Mercurial even though Mercurial didn't have a rebase command. Sun had been using a rebase workflow since 1992. Rebase with Mercurial was wonky, but we were used to wonky workflows with Teamware anyways. Going with Mercurial was a mistake. Idk what Oracle does now internally, but I bet they use Git. Illumos uses Git.
A part of me thinks that there was an aversion among Sun users to anything Linux-related.
It wasn't that. It really was just about file renaming.
And that is partly why Mercurial lost. They insisted on being opinionated about workflows being merge-based. Git is not opinionated. Git lets you use merge workflows if you like that, and rebase workflows if you like that, and none of this judgment about devs editing local history -- how dare the VCS tell me what to do with my local history?!
One bigger difference I can think of is that Mercurial has permanently named branches (the branch name is written in the commit), whereas in git branches are just named pointers. Mercurial got bookmarks in 2008 as an extension, and they were added to the core in 2011. If you used unnamed branches and bookmarks, you could use Mercurial exactly like git. But git was published in 2005.
Another is git's staging area. You can get pretty much the same functionality by repeatedly using `hg commit --amend`, but again, in git the defaults steer you towards the staging approach, while in Mercurial you have to specifically search for a way to get it to work this way.
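Roughly, the two defaults side by side (a sketch; file name made up):

  # git: build the commit up front in the index
  git add -p file.c
  git commit -m "first slice"
  # hg: commit first, then keep folding further work into it
  hg commit -m "first slice"
  hg commit --amend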
Based on the fact that the ecosystem torpedoed an entire major version of the language, and that there are a bazillion competing and incompatible dep managers, it seems that bet turned out well
Is it not? What are the alternatives?
The numbers I've seen say git has about 87-93% of the version control market share. That's just one of many reasons I think it is safe to say most professional developers disagree with you. I can understand someone preferring Perforce for their workflow (and yes, I have used it before). But saying Git is only "for hobby side projects" is just ridiculous. It has obviously proven its value for professional development work, even if it doesn't fit your personal taste.
In my career, I've used Svn, Git and something I think was called VSS. Git has definitely caused fewer problems, and it's also been easy to teach to newbies. And I think the best feature of Git is that people really really benefit from being taught the Git models and data structures (even bootcamp juniors on their first job), because suddenly they go from a magic-incantation perspective to a problem-solving perspective. I've never experienced any other software which has such a powerful mental model.
That of course doesn't mean that Mercurial is not better; I've never used it. It might be that Mercurial would have all the advantages of git and then some. But if that were so, I think it would be hard to say that Git is at a local maximum.
Hmm, maybe Microsoft Visual Source Safe? I remember that. It was notorious for multiple reasons:
* Defaulted to requiring users to exclusively 'check out' files before modifying them. Meaning that if one person had checked out a file, no one else could edit that file until it was checked in again.
* Had a nasty habit of occasionally corrupting the database.
* Was rumored to be rarely or not at all used within Microsoft.
* Was so slow as to be nearly unusable if you weren't on the same LAN as the server. Not that a lot of people were working remotely back then (i.e. using a dial-up connection), but for those who were it was really quite bad.
The number of guides proclaiming the ease of Git is evidence that Git is not easy. Things that are actually easy don't involve countless arguments about how easy they are.
I can teach an artist or designer who has never heard of version control how to use Perforce in 10 minutes. They’ll run into corner cases, but they’ll probably never lose work or get “into a bad state”.
Unless you're in ML, in which case it's a minimum of the loss function, not the utility function...
I think you meant "worse" for that first "better."
I hate to be that guy, but you should spend some time with jj. I thought the same, but jj takes this model, refines it, and gives you more power with fewer primitives. If you feel this way about git, but give it an honest try, I feel like you'd appreciate it.
Or maybe not. Different people are different :)
I have encountered Perforce, Mercurial, and git professionally throughout my career. Considering the prominence of git in the market, it must be obvious that git does some combination of things right. I myself have found git to be solid where the other salient choices have had drawbacks.
The use of git is so widespread that it is hardly a local minimum.
(And the fact that Mercurial supports history editing _now_ is irrelevant, that ship has long sailed.)
VHS won because it was cheaper and could record longer. Fidelity was similar at the recording speeds people used in practice.
It’s very likely that most if not all of the software stack you’re using to post your comment is managed with git.
A whole generation of programmers have only ever known Git and GitHub. They assume that since it is the standard it must be good. This is a fallacy.
Bad things can become popular and become entrenched even when better things exist. Replacing Git today would require something not just a little better but radically better. Git was always worse than Mercurial. It won because of GitHub. If MercurialHub had been invented instead we’d all be using that and would be much happier. Alas.
hard disagree. Git was always way better than Mercurial.
> It won because of GitHub.
I and most of the developers I have worked with over the years all used git for many years before ever even trying GitHub. GitHub obviously has helped adoption, but I'm not convinced that git would not have won even if GitHub had never existed.
How is Git better than Mercurial in any way nevermind "always way better". Serious question.
I'd possibly accept "the original Mercurial implementation was Python which was slow". And perhaps that's why Git won rather than GitHub. But I don't think so.
I don’t object to saying we can do better than git. But saying git “is for hobby side projects” is ridiculous. It’s fine for serious projects.
Git sucks for serious projects. It’s certainly what many people use. But I don’t think “we can do better” is a strong enough statement. Git is bad and sucks. It’s functional but bad. We can do much much better than Git.
I like to rant about Git because we will never do better unless people demand it. If people think it’s good enough then we’ll never get something better. That makes me sad.
MercurialHub was invented. It’s called Bitbucket, it was founded around the same time as GitHub, and it started out with Mercurial. People wanted Git, so Bitbucket was forced to switch to Git.
No. If Bitbucket competed with MercurialHub then MercurialHub would still have won and we’d all be much happier today.
Look, I get that you hate Git, but people had the choice of using Mercurial or Git – on the same platform even – and they overwhelmingly chose Git. You claiming that Mercurial would win in a fair fight ignores the fact that that fight did actually happen and Mercurial lost.
GitHub won. Not Git. IMHO.
> If MercurialHub had been invented instead we’d all be using that
This existed. We aren’t.
I showed it to a couple of software entrepreneurs (Wild Tangent and Chromium), but they had no interest in it.
I never did anything else with it, and so it goes.
---
Consider that any D app is completely specified by a list of .module files and the tools necessary to compile them. Assign a unique GUID to each unique .module file. Then, an app is specified by a list of .module GUIDs. Each app is also assigned a GUID.
On the client's machine is stored a pool of already downloaded .module files. When a new app is downloaded, what is actually downloaded is just a GUID. The client checks whether that GUID is an already-built app in the pool; if so, it's done. If not, the client requests the manifest for the GUID, a manifest being a list of .module GUIDs. Each GUID in the manifest is checked against the client pool, and any that are not found are downloaded and added to the pool.
Once the client has all the .module files for the GUIDs that make up an app, they can all be compiled, linked, and the result cached in the pool.
Thus, if an app is updated, only the changed .module files ever need to get downloaded. This can be taken a step further and a changed .module file can be represented as a diff from a previous .module.
Since .module files are tokenized source, two source files that differ only in comments and whitespace will have identical .module files.
There will be a master pool of .module files on WT's server. When an app is ready to release, it is "checked in" to the master pool by assigning GUIDs to its .module files. This master pool is what is consulted by the client when requesting .module files by GUID.
The D "VM" compiler, linker, engine, etc., can also be identified by GUIDs. This way, if an app is developed with a particular combination of tools, it can specify the GUIDs for them in the manifest. Hence the client will automatically download "VM" updates to get the exact tools needed to duplicate the app exactly.
https://edolstra.github.io/pubs/nspfssd-lisa2004-final.pdf
Another possibly related idea is the language Unison:
Check it out. The whitepaper's a fairly digestible read, too, and may get you excited about the whole concept (which is VERY different from how things are normally done, but ends up giving you guarantees)
Of course we didn't have them when the white paper was written, so that's fair but technology has moved on.
Consider three packages, A, B, and C. B has two versions, A and C have one.
- A-1.0.0 depends on B-2.0.0 and C-1.0.0.
- C-1.0.0 depends on B-1.0.0.
If A gets a path to a file in B-2.0.0 and wants to share it with C (for example, C might provide binaries it can run on files, or C might be a daemon), it needs C to be in a mount namespace with B-2.0.0. However, without Nix-store-like directory structure, mounting B-2.0.0's files will overwrite B-1.0.0's, so C may fail to start or misbehave.
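Concretely, the hash-prefixed store paths are what let both versions of B stay visible side by side (the hash prefixes below are illustrative placeholders):

  /nix/store/aaaa...-B-1.0.0/lib/libB.so   # what C-1.0.0 was built against
  /nix/store/bbbb...-B-2.0.0/lib/libB.so   # what A-1.0.0 links against
  # both exist at once, so A can hand C a path into B-2.0.0 without anything
  # shadowing C's own B-1.0.0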
Namespaces don’t track transitive dependencies, guarantee reproducible builds, enable rollback, or let you deploy an exact closure elsewhere. They’re sandboxing tools—not package management or infra-as-code.
If anything, the two are complementary. You can use Nix to build a system with a precise closure, and namespaces to sandbox it further. But calling namespaces a "more complete solution" is like calling syscall filtering a replacement for source control.
Also, minor historical nit: most namespaces existed by the late 2000s; Nix’s whitepaper was written after that. So the premise isn’t even chronologically correct.
A reference from 1989:
https://books.google.com/books?id=CbsaONN5y1IC&pg=PP75#v=one...
And in any case you had a specific requirement above ("Given a collection of files, but not the git repo they're from, and libgit, I can't say if those files match a git tag hash"), and in fact this can be done!
I'm at a loss. You keep saying something can't be done, but it can, and it's not even hard.
At the end of the day none of us want "exactly this hash" we want "latest". Exact hashes and other reproducibility are things which are useful when debugging or providing traceability - valuable but also not the human side of the equation.
The GUID can certainly be a hash.
It can’t be, because a GUID is supposed to be globally unique. The point is, it needs to instead be the hash of the content.
This can’t be an afterthought.
Theoretically speaking, UUIDs carry a semantic guarantee that each generated identifier is unique across all systems, times, and contexts, whereas cryptographic hashes are deterministic functions (i.e. they produce the same output for the same input); there is no inherent randomness or timestamping unless you deliberately inject it, the way ioquake3 forks did with their GUID.
UUIDv4 has 122 usable random bits, so two given UUIDs collide with probability 1 in 2^122 (roughly 2^61 collision resistance under the birthday bound), whereas SHA-512 and BLAKE2b have 512-bit outputs, giving about 2^256 collision resistance, likewise bounded by the birthday problem.
In any case, SHA-256, SHA-512, and BLAKE2b (cryptographic hashes) are unique in practice, meaning they are extremely unlikely to collide, even more so than UUIDv4, despite UUIDv4 being non-deterministic while cryptographic hashes are deterministic.
Of course, you should still know when to use cryptographic hashes vs. UUIDs. UUIDs are good for database primary keys, identifying users globally, tracking events, and the like; verifying file content, deduplicating data by content, and tamper detection are the job of a cryptographic hash.
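The distinction is easy to see from a shell:

  uuidgen                    # fresh randomness: a different value on every call
  uuidgen
  echo "hello" | sha256sum   # a pure function of the content: same digest on any machine, any time
  echo "hello" | sha256sum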
But to cut to the chase: GUIDs (Globally Unique Identifiers) are also known as UUIDs (Universally Unique Identifiers), so they are the same!
I hope this answers OP's (kbolino) question. He was right, GUIDs are the same as UUIDs. Parent confused GUIDs with cryptographic hashes, most likely.
---
FWIW, collision resistance (i.e. birthday bound) is not improved by post-quantum algorithms. It remains inherently limited by 2^{n/2}, no matter what, as long as they use hashing.
---
TL;DR: GUIDs (Globally Unique Identifiers) are also known as UUIDs (Universally Unique Identifiers), so they are the same, i.e. GUIDs and UUIDs are NOT different!
Tremulous (ioquake3 fork) had GUIDs from qkeys.
https://icculus.org/pipermail/quake3/2006-April/000951.html
You can see how qkeys are generated, and essentially a GUID is:
Cvar_Get("cl_guid", Com_MD5File(QKEY_FILE, 0), CVAR_USERINFO | CVAR_ROM);
So, in this case, GUID is the MD5 hash of the generated qkey file. See "CL_GenerateQKey" for details.

> On startup, the client engine looks for a file called qkey. If it does not exist, 2KiB worth of random binary data is inserted into the qkey file. A MD5 digest is then made of the qkey file and it is inserted into the cl_guid cvar.
UUIDs have RFCs, GUIDs apparently do not, but AFAIK UUIDs are also named GUIDs, so...
https://en.wikipedia.org/wiki/Merkle_tree
Except that instead of a GUID, it's just a hash of the binary data itself, which ends up being more useful because it is a natural key and doesn't require storing a separate mapping
https://softwaremill.com/trying-out-unison-part-1-code-as-ha...
This is currently done in a haphazard way, not particularly organized.
I started using git around 2007 or so because the company I worked for at the time used ClearCase, without a doubt the most painful version manager I have ever used (especially running it from a Linux workstation). So I wrote a few scripts that would let me mirror a directory into a git repo, do all my committing in git, then replay those commits back to ClearCase.
I can't recall how Git came to my attention in the first place, but by late 2008 I was contributing patches to Git itself. Junio was a kind but exacting maintainer, and I learned a lot about contributing to open source from his stewardship. I even attended one of the early GitTogethers.
As far as I can recall, I've never really struggled with git. I think that's because I like to dissect how things work, and under the covers git is quite simple. So I never had too much trouble with its terribly baroque CLI.
At my next job, I was at a startup that was building upon a fork of Chromium. At the time, Chromium was using subversion. But at this startup, we were using git, and I was responsible for keeping our git mirror up-to-date. I also had the terrible tedious job of rebasing our fork with Chromium's upstream changes. But boy did I get good at resolving merge conflicts.
Git may be the CLI I've used most consistently for nearly two decades. I'm disappointed that GitHub became the main code-review tool for Git, but I'll never be disappointed that Git beat out Mercurial, which I always found overly rigid and was never able to adapt it to my workflow.
Ah, ClearCase! The biggest pain was in your wallet! I saw the prices my company paid per-seat for that privilege -- yikes!
ClearCase is a terrible version control system I wouldn't wish on my worst enemy, but it did have some good points that git still doesn't have. Large binary file support, configuration records, winkin, views.
With various big companies going towards giant monorepos and the local git repo just being a view into the super-centralized repo, I think they will re-invent parts of ClearCase.
I’ve never used other source control options besides git, and I sometimes wonder if I ever will!
I guess I started software dev at a magic moment pre-git but after SVN was basically everywhere, but it felt even more like it had been around forever vs the upstart git.
Version control systems where you didn't have shallow branches (and thus each "branch" took a full copy / the disk space of all files) were awful.
Version control systems which would corrupt their databases (here's to you, Visual SourceSafe) were awful.
Subversion managed to do better on all those issues, but it still didn't adequately solve distributed working issues.
It also didn't help that people often configured SVN to run with the option to add global locks back in, because they didn't understand the benefit of letting two people edit the same file at the same time.
I have a soft spot for SVN. It was a lot better than it got credit for, but git very much took the wind out of its sails by solving distributed (and critically, disconnected/offline) workflows just enough better that developers could overlook the much worse UX, which remains bad to this day.
I think it was more that they were afraid that a merge might some day be non-trivial. Amazing how that fear goes away once you've actually had the experience.
(I had to check because of this thread. SVN and Git initial releases were apparently about 4 and a half years apart. I think it was probably about 6 years between the time I first used SVN and the time I first used Git.)
Doing `ci -l` on a file is better and faster than `cp fstab fstab.$(date +%Y%m%d.%H%M%S)`
Wikipedia tells me the initial release of Subversion was in late 2000, and for git it was 2005 - but although those were kinda just smack in the middle of my first years online, learning to code, starting with FLOSS work, and so on - I think those years were pretty important with the shift to the WWW and then web 2.0.
I basically don't remember a world without SVN, but that's probably because I just missed the cutoff and projects and companies were migrating from CVS from 2002 on or so, because the model was very similar and while it wasn't drop in, it made sense.
For git I want to say it took just a little longer, and the decentralized model was so different that people were hesitant, and before github in 2009 (I know it was founded in 2008, but my user id is below 50000 and it felt very much new and not at all widespread in non-rails circles before that) I would have called it a bit niche, actually - so it's more like a 7year span. But of course I was living in my bubble of university, and working for 2 small companies and as a freelancer in that time. I think bigger FLOSS projects only started migrating in droves after 2010/2011. But of course my timeline could be just as wrong :D
Subversion was so awful that it had to be replaced ASAP.
Subversion was basically a better CVS. My recollection is that plenty of people were more than happy to switch to CVS or Subversion (even on Windows) if it meant they could escape from something as legitimately awful as VSS. Whereas the switch from Subversion to Git or Mercurial had more to do with the additional powers of the newer tools than the problems of the older ones.
Small remark:
> As far as I can tell, this is the first time the phrase “rebase” was used in version control
ClearCase (which I had the displeasure of using) has been using the term "rebase" as well. Googling "clearcase rebase before:2005" finds [0] from 1999.
(by the way, a ClearCase rebase was literally taking up to half an hour on the codebase I was working on - in 2012; instant git rebases blew my mind).
[0] https://public.dhe.ibm.com/software/rational/docs/documentat...
Famous last words: "We'll do it the right way later!"
FWIW, I just found out you can sign commits using ssh keys. Due to how pinentry + gnupg + git has issues on OpenBSD with commit signing, I just moved to signing via ssh. I had a workaround, but it was a real hack, now no issues!
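For anyone curious, the setup is roughly this (Git 2.34 or newer; the key path is just an example):

  git config --global gpg.format ssh
  git config --global user.signingkey ~/.ssh/id_ed25519.pub
  git commit -S -m "signed with an ssh key"   # or set commit.gpgsign=true to sign everything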
20 years, wow seems like yesterday I moved my work items from cvs to git. I miss one item in cvs ($Id$), but I learned to do without it.
A couple of differences:
- it's possible to specify signing keys in a file inside the repository, and configure git to verify on merge (https://github.com/wiktor-k/ssh-signing/; see the config sketch after this list). I'm using that for my dot config repo to make sure I'm pulling only stuff I committed on my machines.
- SSH has TPM key support via PKCS11 or external agents, this makes it possible to easily roll out hardware backed keys
- SSH signatures have context separation, that is it's not possible to take your SSH commit signature and repurpose it (unlike OpenPGP)
- due to SSH keys being small the policy file is also small and readable, compare https://github.com/openssh/openssh-portable/blob/master/.git... with equivalent OpenPGP https://gitlab.com/sequoia-pgp/sequoia/-/blob/main/openpgp-p...
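For the first point, the verification side looks roughly like this (the file name and principal are made up; the file uses OpenSSH's allowed_signers format):

  git config gpg.ssh.allowedSignersFile .allowed_signers
  # .allowed_signers: one "principal key" per line, e.g.
  #   alice@example.com ssh-ed25519 AAAAC3...
  git log --show-signature
  git merge --verify-signatures some-branch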
Professionally, I went from nothing to RCS then to CVS then to git.
In all cases I was the one who insisted on using some kind of source code control in the group I worked with.
Not being an admin, I set up RCS on the server, then later found some other group that allowed us to use their CVS instance. Then when M/S bought github the company got religion and purchased a contract for git.
Getting people to use any kind of SC was a nightmare, this was at a fortune 500 company. When I left, a new hire saw the benefit of SC and took over for me :)
In the old days, losing source happened a lot, and I did not want that to happen when I was working at that company.
However, I don't think you would want to use the SHA, since that's somewhat meaningless to read. You would probably want to expand ID to `git describe SHA` so it's more like `v1.0.1-4-ga691733dc`, so you can see something more similar to a version number.
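Something along those lines (reusing the SHA from the example above):

  # git's ident attribute expands $Id$, but only to the blob SHA:
  echo '*.c ident' >> .gitattributes
  # a human-friendlier label for a commit:
  git describe --tags a691733dc   # -> v1.0.1-4-ga691733dc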
It also is a testament to the backwards compatibility of Git that even after 17 years, most of the contents of that book are still relevant.
This is actually the part I would be interested in, coming from a GitHub cofounder.
Also, in a PR, I find that people just switch to Files Changed, disregarding the sequence of the commits involved.
This intentional de-emphasis of the importance of commit messages and the individual commits leads to lower quality of the git history of the codebase.
I never understood the "git cli sucks" thing until I used jj. The thing is, git's great, but it was also grown, over time, and that means that there's some amount of incoherence.
Furthermore, it's a leaky abstraction, that is, some commands only make sense if you grok the underlying model. See the perennial complaints about how 'git checkout' does more than one thing. It doesn't. But only if you understand the underlying model. If you think about it from a workflow perspective, it feels inconsistent. Hence why newer commands (like git switch) speak to the workflow, not to the model.
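The classic example (branch and file names made up):

  git checkout my-branch         # "switch branches" (moves HEAD)
  git checkout -- path/to/file   # "throw away my edits" (restores from the index)
  # one operation on the model (write a tree-ish into the working area), but two
  # very different-feeling workflows; the newer commands split them up:
  git switch my-branch
  git restore path/to/file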
Furthermore, some features just feel tacked on. Take stashing, for example. These are pseudo-commits, that exist outside of your real commit graph. As a feature, it doesn't feel well integrated into the rest of git.
Rebasing is continually re-applying `git am`. This is elegant in a UNIXy way, but is annoying in a usability way. It's slow, because it goes through the filesystem to do its job. It forces you to deal with conflicts right away, because git has no way of modelling conflicts in its data model.
Basically, git's underlying model is great, but not perfect, and its CLI was grown, not designed. As such, it has weird rough edges. But that doesn't mean it's a bad tool. It's a pretty darn good one. But you can say that it is while acknowledging that it does have shortcomings.
Take for example the "index" which is actually a useful thing with a bad name. Most tutorials start by explaining that the index is a staging area on which you craft your commit. Then why is it called index and not staging area? Incredibly bad name right there from the get go. If you ask what the word "index" means in computer science, people usually think of indices into an array, or something like a search index that enables faster searching. Git's index doesn't do any of that.
And git's model leaks so much implementation detail that many people mistake these for essential concepts; there are people who would tell you any version control system that doesn't have the "index" is not worth using because they don't allow one to craft beautiful commits. That's patently false as shown by jj and hg. This useful concept with a bad name becomes one amorphous thing that people cannot see past.
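For what it's worth, the thing itself is simple enough whatever you call it (path made up):

  git add -p src/foo.c    # stage only the hunks you want in the next commit
  git diff --cached       # review exactly what the commit will contain
  git commit -m "one focused change"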
Nitpick, but that's not a distributed workflow. It's distributed because anyone can run patch locally and serve the results themselves. There were well known alternative git branches back then, like the "mm" tree run by Andrew Morton.
The distributed nature of git is one of the most confusing for people who have been raised on centralised systems. The fact that your "master" is different to my "master" is something people have difficulty with. I don't think it's a huge mental leap, but too many people just start using Gitlab etc. without anyone telling them.
Shortly afterwards, Larry told everyone he wasn't going to keep giving BK away for free, so Linus went off for a weekend and wrote a crappy "content manager" called git, on top of which perhaps he thought someone might write a proper VC system.
and here we are.
a side note was someone hacking the BitKeeper-CVS "mirror" (linear-ish approximation of the BK DAG) with probably the cleverest backdoor I'll ever see: https://blog.citp.princeton.edu/2013/10/09/the-linux-backdoo...
see if you can spot the small edit that made this a backdoor:
if ((options == (__WCLONE|__WALL)) && (current->uid = 0)) retval = -EINVAL;
help
The room erupted with applause and laughter.
A bk client was hacked by the audience in 2 minutes.
It was the most devastating take down I’ve ever seen of the attacks. Linus later said the “git” wasn’t Tridgell at all, but in fact Linus himself.
I think that was the only lca Linus missed for a few years either side.
Initially it was jarring to not get a different working directory for each branch, but I soon got used to it. Working in the same directory for multiple branches means that untracked files stay around - can be helpful for things like IDE workspace configuration, which is specific to me and the project, but not the branch.
You can of course have multiple clones of the repository - even clones of clones - but pushing/pulling branches from one to another is a lot more work than just checking out a branch in a different worktree.
My general working practice now is to keep release versions in their own worktree, and using the default worktree (where the .git directory lives) for development on the main branch. That means I don't need to keep resyncing up my external dependencies (node_modules, for example) when switching between working on different releases. But I can see a good overview of my branches, and everything on the remote, from any worktree.
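For anyone who hasn't tried worktrees, roughly (paths and branch name made up):

  git worktree add ../myproj-release-2.1 release/2.1
  git worktree list
  git worktree remove ../myproj-release-2.1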
Because Github was better than Bitbucket? Or maybe because of the influence of kernel devs?
Mercurial on windows was "download tortoisehg, use it", whereas git didn't have a good GUI and was full of footguns about line endings and case-insensitivity of branch names and the like.
Nowadays I use sublime merge on Windows and Linux alike and it's fine. Which solves the GUI issue, though the line ending issue is the same as it's always been (it's fine if you remember to just set it to "don't change line endings" globally but you have to remember to do that), and I'm not sure about case insensitivity of branch names.
Pretty sure Mercurial handles arbitrary filenames as UTF-8 encoded bytestrings, whether there was a problem with this in the past I can't recall, but would be very surprised if there was now.
Edit: does seem there at least used to be issues around this:
https://stackoverflow.com/questions/7256708/mercurial-proble...
though google does show at least some results for similar issues with git
Github was more popular than Bitbucket, so git unfortunately won.
And it isn't as if I haven't used RCS, SCCS, CVS, Clearcase, TFS, Subversion, Mercurial before having to deal with Git.
Once you transition your mental model from working branch with a staging area to working revision that is continuously tracking changes, it's very hard to want to go back.
In both cases, it's just metadata that tooling can extract.
Edit: then again, I've dealt with user error with the fragile semantics of trailers, so perhaps a header is just more robust?
But ignore all that: the actual _outcome_ we want is that it is just really nice to run 'jj gerrit send' and not think about anything else, and that you can pull changes back in (TBD) just as easily. I was not ever going to be happy with some solution that was like, "Do some weird git push to a special remote after you fix up all your commits or add some script to do it." That's what people do now, and it's not good enough. People hate that shit and rail at you about it. They will make a million reasons up why they hate it; it doesn't matter though. It should work out of the box and do what you expect. The current design does that now, and moving to use change-id headers will make that functionality more seamless for our users, easier to implement for us, and hopefully it will be useful to others, as well.
In the grand scheme it's a small detail, I guess. But small details matter to us.
While you're around, do you know why Jujutsu created its own change-id format (the reverse hex), rather than use hashes (like Git & Gerrit)?
Jujutsu mostly doesn't care about the real "format" of a ChangeId, though. It's "really" just an arbitrary Vec<u8> that the backend itself has to define and describe a little bit; the example backend has a 64-byte change ID, for example.[1] To the extent the reverse hex format matters, it's mostly used in the template language for rendering things to the user. But you could also extend that with other render methods too.
[1] https://github.com/jj-vcs/jj/blob/5dc9da3c2b8f502b4f93ab336b...
Signed-off-by: Alice <alice@example.com>
Signed-off-by: Bob <bob@example.com>

is totally fine, but

Change-id: wwyzlyyp
Change-id: sopnqzkx

is not. I've also heard of issues with people copy/pasting commit messages and including bits of trailers they shouldn't have, I believe.
ignore, misread the above
Been using it on top of git, collaborating with people via Github repos for ~11 mos now. I'm more efficient than I was in git, and it's a smoother experience. Every once in a while I'll hit something that I have to dig into, but the Discord is great for help. I don't ever want to go back to git.
And yes, jj on top of git in colocated repos (https://jj-vcs.github.io/jj/v0.27.0/git-compatibility/#co-lo...).
If you set explicit bookmark/branch names when pushing to git remotes, no one can tell you use jj.
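E.g. something like (bookmark name made up):

  jj bookmark create my-feature -r @-
  jj git push --bookmark my-feature   # shows up as an ordinary branch on the git remote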
> How long did it take you to become proficient?
As with anything, it varies: I've heard some folks say "a few hours" and I've had friends who have bounced off two or three times before it clicks.
Personally, I did some reading about it, didn't really stick. One Saturday morning I woke up early and decided to give it a real try, and I was mostly good by the end of the day; swore off git entirely a week later.
> I assume the organization uses git and you use jujitsu locally, as a layer on top?
This is basically 100% of usage outside of Google, yeah. The git backend is the only open source one I'm aware of. Eventually that will change...
1. jj operates on revisions. Changes to revisions are tracked automatically whenever you run jj CLI
2. revisions are mutable and created before starting working on a change (unlike immutable commits, created after you are done)
3. you are not working on a "working directory" that has to be "committed", you are just editing the latest revision
everything just clicks and feels very natural and gets out of the way. Want to create a new revision, whether it's a merge, a new branch, or even inserting a revision between some other revisions? That's jj new. Want to move/reorder revisions? That's jj rebase. Want to squash or split revisions? That's jj squash and jj split respectively. A much more user-friendly conflict resolution workflow is a really nice bonus (although, given that jj does rebasing automatically, it's more of a requirement)
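A minimal session to give a flavour (messages made up; the working copy is snapshotted automatically, so there's no add/commit step):

  jj new main -m "start a change"   # new working-copy revision on top of main
  # ...edit files...
  jj describe -m "better message"   # reword the current change at any time
  jj new                            # start the next change on top of it
  jj squash                         # or fold the working copy back into its parent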
One notable workflow difference is the absence of branches in the git sense, and getting used to mainly referring to individual revisions, but after understanding the things above, such a workflow makes perfect sense.
And yes, I use jj locally with remote git repos.
I think you need to check your history. In the early days, before closed/proprietary software, source code was often shared between programmers.
Let's look at text editors. They did not begin "closed source" - but companies have a team of people to help SELL their products, even if they are inferior to what's already out there.
Basically, once computers matured there was an opportunity to make a buck. Companies started selling their products for a fee. I would not be surprised if the source code was included before someone realised people can just pay for the executable. More money can be made by excluding the source code so new updates can also be for a fee.
(Let's not talk about "End User Licence Agreements" in this post, OK)
The "dominance" of closed source is really about companies with money controlling the status quo, with lawyers and sales teams knowing how to push it in favour.
Companies like Micro$oft today have soo much money they dictate the direction our computers systems are going. They push it in a direction that favours them. They have a hand controlling the flow, like other big companies having a hand trying to change to stream for their intent and purposes.
This is why -- whether you love him or hate him -- I have much respect for people like Richard Stallman or others like Linus Torvalds.. to name a few!
You want to talk about "innovation"?? What do you think these "closed source innovations" are built with? Software is created using a programming language such as Python, C++, C, Javascript, etc... the VAST MAJORITY being free to use, under some sort of Open Source community!
Let's also look at large companies in general.. many of which are not innovating... they are just purchasing smaller companies that are trying to do new things... closed source software or not.
Lastly, let's also be honest that innovation is not created out of thin air -- everything is inspired by something previous, whether a failed experiment or something already successful. New ideas come about through more failures, and further ideas follow until, eventually, we have something successful!
Linux may be inspired by other Operating Systems, and those were inspired by other things, etc. Innovation is progressive. The point I am making is: if any company found ANY opportunity to build a closed-source version to shut down a popular Open Source equivalent... THEY WOULD DO IT!