[1] https://stackoverflow.com/questions/55998614/merge-made-by-r...
> Jujutsu keeps track of conflicts as first-class objects in its model; they are first-class in the same way commits are, while alternatives like Git simply think of conflicts as textual diffs. While not as rigorous as systems like Darcs (which is based on a formalized theory of patches, as opposed to snapshots), the effect is that many forms of conflict resolution can be performed and propagated automatically.
Ends up being circular if the author used LLM help for this writeup though there are no obvious signs of that.
Maybe that's obvious to most people, but it was a bit surprising to see it myself. It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.
The article doesn't contain any LLM output. I use LLMs to ask for advice on coding conventions (especially in rust, since I'm bad at it), and sometimes as part of research (zstd was suggested by chatgpt along with comparisons to similar algorithms).
blocking openai ips did wonders for the ambient noise levels in my apartment. they're not the only ones obviously, but they're they only ones i had to block to stay sane
One of the funniest things I've started to notice from Gemini in particular is that in random situations, it talks with english with an agreeable affect that I can only describe as.. Indian? I've never noticed such a thing leak through before. There must be a ton of people in India who are generating new datasets for training.
Great argument for not using AI-assisted tools to write blog posts (especially if you DO use these tools). I wonder how much we're taking for granted in these early phases before it starts to eat itself.
For others, I highly recommend Git from the Bottom Up[1]. It is a very well-written piece on internal data structures and does a great job of demystifying the opaque git commands that most beginners blindly follow. Best thing you'll learn in 20ish minutes.
I think theoratically, Git delta-compression is still a lot more optimized for smaller repos. But for bigger repos where sharding storaged is required, path-based delta dictionary compression does much better. Git recently (in the last 1 year) got something called "path-walk" which is fairly similar though.
Hmm, dont be so hard on yourself!
proceeds to call ls from rust
Ok nevermind, although I dont think rust is the issue here.
(Tony I'm joking, thanks for the article)
And this way of versionning can be reused in other fields, as soon as have some kind of graph of data that can be modified independently but read all together then it makes sense.
P.S. Didn't know that plain '@' can be used instead of HEAD, but I guess it makes sense since you can omit both left and right parts of the expressions separated by '@'
That's a weird thing to put so close to the start. Compression is about the least interesting aspect of Git's design.
Some reading from 2021: https://jolynch.github.io/posts/use_fast_data_algorithms/
It is really hard to describe how slow sha256 is. Go sha256 some big files. Do you think it's disk IO that's making it take so long? It's not, you have a super fast SSD. It's sha256 that's slow.
Furthermore, if your input files are large enough that parallelizing across multiple cores makes sense, then it's generally better to change your data model to eliminate the existence of the large inputs altogether.
For example, Git is somewhat primitive in that every file is a single object. In retrospect it would have been smarter to decompose large files into chunks using a Content Defined Chunking (CDC) algorithm, and model large files as a manifest of chunks. That way you get better deduplication. The resulting chunks can then be hashed in parallel, using a single-threaded algorithm.
How about using sqlite for this? Then you wouldn't need to parse anything, just read/update tables. Fast indexing out of the box, too.
[0] https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...
I also use Fossil for lots of weird things. I created a forum game using Fossil’s ticket and forum features because it’s so easy to spin up and for my friends to sign in to.
At work we ended up using Fossil in production to manage configuration and deployment in a highly locked down customer environment where its ability to run as a single static binary, talk over HTTP without external dependencies, etc. was essential. It was a poor man’s deployment tool, but it performed admirably.
Fossil even works well as a blogging platform.
[0]: https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md
I really enjoy how local-first it is, as someone who sometimes work without internet connection. That the data around "work" is part of the SCM as well, not just the code, makes a lot of sense to me at a high-level, and many times I wish git worked the same...
But yeah, fossil is interesting, and it's a crying shame its not more well known, for the exact reasons you point out.
It isn't though, Fossil integrates all the data around the code too in the "repository", so issues, wiki, documentation, notes and so on are all together, not like in git where most commonly you have those things on another platform, or you use something like `git notes` which has maybe 10% of the features of the respective Fossil feature.
It might be useful to scan through the list of features of Fossil and dig into it, because it does a lot more than you seem to think :) https://fossil-scm.org/home/doc/trunk/www/index.wiki
It is very easy to self host.
Not having staging is awkward at first but works well once you get used to it.
I prefer it for personal projects. In think its better for small teams if people are willing to adjust but have not had enough opportunities to try it.
I know this is only meant to be an educational project, but please avoid yaml (especially for anything generated). It may be a superset of json, but that should strongly suggest that json is enough.
I am aware I'm making a decade old complaint now, but we already have such an absurd mess with every tool that decided to prefer yaml (docker/k8s, swagger, etc.) and it never got any better. Let's not make that mistake again.
People just learned to cope or avoid yaml where they can, and luckily these are such widely used tools that we have plenty of boilerplate examples to cheat from. A new tool lacking docs or examples that only accepts yaml would be anywhere from mildly frustrating to borderline unusable.
"What's inside .git ?" - https://prakharpratyush.com/blog/7/