https://alchemists.io/articles/git_trailers
These are key-value structured data that can be included in a commit when it is created. Some systems use them to attach metadata. For example, Gerrit uses a trailer to attach its Change-Id.
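To make the key-value shape concrete, here is a minimal Python sketch that pulls trailers out of a commit message. It assumes the trailer block is the last paragraph and that every line in it looks like `Key: value`; git's own parsing is more lenient than this. The function name, the sample message, and the Change-Id value are made up for illustration.

```python
import re

# Simplified trailer pattern: a token (letters, digits, hyphens), a colon, a value.
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the last paragraph of a commit message.

    Assumption: the trailer block is the final paragraph and every line in it
    matches "Key: value". This is a sketch, not git's actual parser.
    """
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return []  # a lone subject line is never treated as trailers here
    matches = [TRAILER_RE.match(line) for line in paragraphs[-1].splitlines()]
    if not all(matches):
        return []  # the last paragraph is ordinary prose
    return [(m.group(1), m.group(2)) for m in matches]


# Hypothetical commit message with two trailers.
message = """Fix off-by-one in pagination

The last page was dropped when the total was a multiple of the page size.

Reviewed-by: A. Reviewer <reviewer@example.com>
Change-Id: I0123456789abcdef0123456789abcdef01234567
"""

trailers = parse_trailers(message)
```

Because the result is a list of pairs rather than a dict, repeated keys (several `Reviewed-by` lines, say) are preserved in order.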
https://www.postgresql.org/docs/17/sql-comment.html
This allows you to attach text to various database objects in PostgreSQL.
I wish PostgreSQL had a feature that was more like structured key-value database object metadata that could be edited.
Compare:
https://github.com/jchester/spc-kit/blob/eb2de71d815b0057e20...
To:
https://github.com/jchester/spc-kit/blob/main/sql/02-spc-int...
Basically the original rendering makes me look incompetent to a casual skimmer. Plus tools like JetBrains IDEs can suss out what comments belong to what DDL anyway.
The COMMENT feature isn't even a good choice for a VIEW, PROCEDURE, or FUNCTION, each of which already supports comments inline in the object definition on the server. No, the main benefits are adding comments to objects that DON'T retain them, like a TABLE, COLUMN, CONSTRAINT, ROLE, etc.
So in file 02-… you have your “create schema”, “create view” and so on. And then in file 03-… you have only the “comment on” statements that go with the things from file 02. And then file 04-… contains “create schema” and “create view” and so on, and file 05-… has the “comment on” statements for file 04-….
And in addition you could then add dash dash comments in 02 and 04 referring to files 03 and 05. And in file 03 and 05 at the top mention that these are valid SQL for PostgreSQL and that GitHub has trouble rendering them properly.
It’s a bit messy of course, but that’s why I say it’s a possible workaround rather than a great solution. Could be worth considering and trying, anyway.
I don't know why they mandate it to be the last trailer unless it's for regex reasons
It seems git trailers would now be the better place to put that information.
Regarding change ids: I wish git itself had those, as then also the tooling would understand them. Identifying commits by their commit messages is fragile, in particular when you may update those for an MR. While commit id truly identifies the commit in a unique way, it is not a useful identifier when the same changes could be moved on top of some other commit.
edit: Oh it looks like they are actually part of the commit, whereas notes aren't, so it wouldn't be a good replacement for my use.
Projects for which mutable changes are a unit of work are working on standardising that: https://lore.kernel.org/git/CAESOdVAspxUJKGAA58i0tvks4ZOfoGf...
They don't need git support, but it might eventually become first-class.
- That way, tests will be skipped when the contents of the commit are the same, while remaining insensitive to things like changes to the commit message or squashes.
- But they'll re-run in situations like reordering commits (for just the reordered range, and then the cache will work again for any unchanged commits after that). I think that's important because notes will follow the commits around as they're rewritten, even if the logical contents are now different due to reordering? Amending a commit or squashing two non-adjacent commits may also have unexpected behavior if it merges the notes from both sides and fails to invalidate the cache?
- This is how my `git test` command works https://github.com/arxanas/git-branchless/wiki/Command:-git-...
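A rough Python sketch of the caching idea in the bullets above, assuming the cache key is the commit's tree ID (what `git rev-parse <commit>^{tree}` prints). The class, its in-memory dict, and the placeholder tree IDs are hypothetical and are not how `git test` is actually implemented.

```python
from typing import Callable


class ResultCache:
    """In-memory cache of test outcomes keyed by tree ID (illustrative only)."""

    def __init__(self) -> None:
        self._results: dict[str, bool] = {}
        self.runs = 0  # how many times the tests were actually executed

    def run(self, tree_id: str, run_tests: Callable[[], bool]) -> bool:
        # Only execute the tests when this exact tree has not been seen before.
        if tree_id not in self._results:
            self.runs += 1
            self._results[tree_id] = run_tests()
        return self._results[tree_id]


cache = ResultCache()
cache.run("tree-aaa", lambda: True)  # first time: the tests run
cache.run("tree-aaa", lambda: True)  # reworded commit, same tree: cache hit
cache.run("tree-bbb", lambda: True)  # reordered commit, new tree: tests run again
```

Keying on the tree rather than the commit ID is what makes the cache insensitive to message edits while still invalidating when a reorder changes the contents at that point in history.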
---
I've also seen use-cases that might prefer to use/add other things to the cache key:
- The commit message: my most recent workflow involves embedding certain test commands in the message, so I actually do want to re-run the tests when the test commands change.
- The patch ID: if you specifically don't want to re-run tests when you rebase/merge with the main branch, or otherwise reorder commits.
Unfortunately, I don't have a good solution for those at present.
Of course, if the notes mechanism didn't exist, then I could have just used a local file. But it's nice to see the messages in the git log.
But yeah, both kinds of keys would be useful for this purpose, depending on the exact needs.
I'm a big fan of conventional commits, and trailers seem like a better way of adding such metadata.
Is adding them manually to the commit message functionally equivalent to using the `--trailer` flag?
Yes. The flag is perfect for scripts but it's exactly equivalent to adding the text manually.
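As a sketch of what "adding the text manually" amounts to, the Python function below appends a `Key: value` line to a message, starting a new final paragraph unless the message already ends in a trailer block. The function name and the sample trailer are hypothetical, and the trailer detection is deliberately simpler than git's.

```python
import re

# Simplified check for a trailer-looking line: "Token: value".
TRAILER_LINE = re.compile(r"^[A-Za-z0-9-]+: ")


def add_trailer(message: str, key: str, value: str) -> str:
    """Append a trailer the way you would by hand in an editor (a sketch)."""
    body = message.rstrip("\n")
    paragraphs = body.split("\n\n")
    last_lines = paragraphs[-1].splitlines()
    ends_with_trailers = len(paragraphs) > 1 and all(
        TRAILER_LINE.match(line) for line in last_lines
    )
    # Join an existing trailer block, or open a new paragraph for the first one.
    separator = "\n" if ends_with_trailers else "\n\n"
    return f"{body}{separator}{key}: {value}\n"


once = add_trailer("Fix typo\n\nCorrects the spelling in the README.\n", "Issue", "ABC-123")
twice = add_trailer(once, "Reviewed-by", "A. Reviewer <reviewer@example.com>")
```

The resulting text is just part of the commit message, which is why typing it by hand gives the same result as a script that generates it.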
I mainly find them helpful for sticking to atomic commits. If a change doesn't align with the commit type, or it touches too many parts of the codebase, that means it should be in a separate commit.
Is there anything equivalent -- that handles tracking changes over commits etc better than GH -- that is more actively developed and friendly for integration with GH? I hate GH's code review tools with the heat of 10,000 suns.
To be honest, though, I find it easiest to create several branches with Jujutsu and then manually chain the MRs. That’s what glab does under the hood with glab stack commands. Looking forward to the code review tools in a future version.
For GitHub, though, I think Graphite is the best tool I’ve looked at so far, but I use GitLab at work so I’m not the best judge of GitHub tools for lack of experience using them at scale.
1. "Please change this"
2. <I change it, and force-push the change [cuz I don't like a messy git history]>
3. Comment keeps association with the original line and/or its new replacement.
Gerrit has no problem w/ this flow. GH and GL both can't do it.
GH wants to force you to put a pile of "fix" commits in and then either do a merge commit (eww) or squash the whole thing into one commit (not ideal in some cases).
This is based on what I remember (haven’t used gerrit in a while), so it may not be accurate.
I used gerrit in my previous job and miss using it. Would definitely prefer it over GitHub which is more popular (and convenient of course, can’t deny that).
I’d note that it works that way presently, but the teams behind git, gerrit, jj-vcs, and a couple of other relevant stakeholders have an email thread going in which, from what I understand, they discuss standardizing on the approach taken by jj-vcs:
https://lore.kernel.org/git/CAESOdVAspxUJKGAA58i0tvks4ZOfoGf...
The best way to track meta history is to have it baked into the VCS, so here Mercurial is king, and heptapod (a friendly fork of Gitlab meant to support Mercurial repos and concepts) apparently does a good job at it since it's used for Mercurial's own development (after they transitioned from mailing lists to Gerrit? to phabricator to Heptapod)
I think the problem is exacerbated by the fact that issue trackers follow fashion; and it’s more common that you are using the flavor of the week; and that flavor isn’t close to feature complete; and new features get added at a glacial pace.
I suppose this is a long-winded way of stating how annoyed I am with branch names derived from Linear tickets' titles for tracking purposes, and I wish I could use some other form of metadata to associate commits with an issue, so that I could have more meaningful PR titles (it is enforced that I must use the Linear branch name as the title).
Though I’ll admit that it’s an issue of a size that’s more appropriate to gripe about on the internet than try to change.
Being able to use them with `git log` formats is pretty cool.
Lack of support was a big problem.
What happens on interactive rebases, e.g. if I squash multiple commits into a single one?
I see the same problem with attaching notes to blobs and trees: It's not doing what you might think it does: It feels like it would attach metadata to a file or directory in the repo, but it really attaches the metadata to some specific content:
E.g. if you have a blob that encodes the string "Hello world!" and attach the note to it, would git associate that note with all files that have that content?
Also, if you change one file to "Hello, world!", would it lose the notes?
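The questions above come down to how blob IDs are computed. A minimal Python sketch, assuming the default SHA-1 object format: a blob's ID is the hash of a short header plus the file's bytes, with no path or filename involved. The variable names are just for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID git assigns to a blob with this content (SHA-1 repos)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


readme = git_blob_id(b"Hello world!")
copy_elsewhere = git_blob_id(b"Hello world!")  # a different path, same bytes
edited = git_blob_id(b"Hello, world!")         # one comma added

same_object = readme == copy_elsewhere  # identical content is the same blob
different_object = readme != edited     # edited content is a different blob
```

So a note keyed on that blob ID would apply to every file with those exact bytes, and the edited file would be a different object that the note does not point at.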
This is configurable. By default, amend and rebase operations will copy them forward. See git-config(1) under `notes.rewrite`.
> Set it to refs/notes/commits to enable rewriting for the default commit notes.
Why wouldn’t that at least be the default? Why is rewriting off by default in the first place?
The Acked-By and mailing list discussion link examples don't seem to be good examples. Both of these are likely already known when the commit is made. And a git commit message can basically have unlimited length, so you could very well copy all the discussions about the commit that happened on a forge into the commit message itself.
One use case I think might be a better example is to add a git note to a commit that has later been reverted.
Maybe I’m weird that way. I’ve had too many coworkers who don’t really even look at annotations to remind themselves why this code was written in the first place. They will just yolo and hope nobody ties the problems back to them. But once you’ve dealt with an irate customer who waited impatiently for a bug to be fixed, and only to have the bug be reintroduced a short time later, you may become more circumspect about bug fixes.
There’s often a refactor needed to fix multiple bugs at once. Often a refactor can open up new feature opportunities or performance improvements.
Discussion regarding a commit (i.e., review) and acknowledgment of a commit cannot happen before the commit has been made.
> One use case I think might be a better example is to add a git note to a commit that has later been reverted.
Commit messages are better for this use case. When you git blame a file, it shows the latest changes for that file. If a commit reverts changes from another commit, the newer commit that reverts the older commit will show up in the blame.
It can't happen before the commit on a feature branch, but it can happen before merging the commit back to the main development branch. Given that a rebase or merge commit is already frequently necessary to integrate changes from a feature branch after review is finished, I don't see why this type of info couldn't be added (or even required to exist) before merging.
Yes, you're supposed to avoid moving later commits across a rebase... but the reason you're supposed to avoid that is because git is so bad at it.
Where I agree with your take partially is that the UX for all of this in git is not great, and that ends up meaning that most people don't actually use git in this way. If the process of manually structuring the commits to clean up history on a feature branch were more intuitive, then I'd predict the issues of history-destroying rebases to essentially be moot; everyone would just present the commits for review exactly as we'd want them before review, and then we'd fast-forward merge into the main development branch without any issue. The problem is that doing this sort of restructuring after the fact isn't actually easy to learn how to do because of the poor ergonomics of the git CLI, so it's both tedious and error-prone for almost everyone. My perspective is that most of the concern around messing with history in git comes from being burnt by changes that were not intended by the one making them, and that workflows that avoid it (like merge commits) are essentially a compromise to avoid that risk by accepting a certain amount of cruft in the history of the main development branch. I don't blame anyone for using them, since the problems that make the rebase workflow harder are very real, but I don't think that the fact that rebase changes history is the real issue as much as it provides a mechanism for the actual underlying issues to manifest.
Git is a decentralized version control system. It is effectively a blockchain like Bitcoin but without the consensus mechanism, and just like transactions are final on the Bitcoin network, pushed commits are final in git. You can agree to use one branch over another, but if you are using git as designed (i.e. as a decentralized system), it can be confusing. Merge commits are how you can resolve that confusion.
It is a fundamental flaw. Either git needs to work better at using the history in all of its warty and real glory (for example offering a separate mutable presentation-layer commit-log in front of the immutable data-layer commit-log), or it needs to provide better automation and mapping concepts that allow you to handle incoming code that has a different history from the current branch.
The git UI however is notoriously terrible, so your complaint about presentation is probably justified, but git itself offers facilities to keep a clean history without clobbering branches and without changing the fundamentals.
For example, you can decide to make your clean branches only out of merge commits, so if you look only at the clean branches you will have the nice history you expect, but each commit will have a reference to a parallel ugly branch so you can see everything if you want to, without rewriting. To avoid conflicts polluting your clean branch, first merge the clean branch into the ugly branch, resolve the conflicts and do everything you need to do to stay up to date, then merge back into the clean branch, with a nice merge commit message. The clean branches' merge commits are your "presentation layer".
It won't be mutable, but mutable anything is a problem in a distributed system like git that gives you availability but as per the CAP theorem, not consistency.
Like if I have branch X and branch Y, and X is 1 commit ahead of Y, but I alter the comment of a commit in Y, now X is one or more commits behind Y and will not recognize identical code-changes as identical.
It gets worse if you squash a commit, where you start getting conflicts during merges and rebases even though the code-changes are the same.
I understand why these problems happen, and the ways to prevent it (don't rebase anything pushed), but it still underscores the fact that git doesn't properly accommodate its own love of rewriting history.
If git had a proper synced graph of history-rewriting actions (commit A was rewritten into commit B) then it would be able to provide better responses when doing a merge or reflist across rewritten branches.
That's a huge usability problem with git. Even something as simple as a "rebase on merge" or "squash on merge" automation makes it impractical to push your topic-branch and keep working locally on that same branch while your topic-branch is being reviewed and tested, because git doesn't retain any concept of the fact that "this block of commits has been transformed into that block of commits and so you can consider them one-and-the-same".
I'm the git SME at my office and I deeply resent the amount of time I have to spend training juniors and students around git's jagged edges.
Ideally git should have proper in-repo objects that reify the relationship between rebased/squashed/amended/etc commits and their predecessors, and expose it so that when you ask things like "hey is commit X in ref Y?" it could say "no, but there is a modified version of that commit".
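To illustrate the kind of lookup being wished for, here is a small Python sketch. It is entirely hypothetical: git keeps no such shared record, so the `rewrites` mapping (old commit to its replacement) and the short commit names are invented for the example.

```python
from typing import Optional


def find_in_ref(
    commit: str, ref_commits: set[str], rewrites: dict[str, str]
) -> tuple[str, Optional[str]]:
    """Answer "is this commit in the ref?" while following recorded rewrites."""
    if commit in ref_commits:
        return ("present", commit)
    seen = {commit}
    current = commit
    # Walk the chain of rewrites (amend, rebase, squash) until it ends.
    while current in rewrites:
        current = rewrites[current]
        if current in seen:
            break  # defensive: never loop forever on a cyclic record
        seen.add(current)
        if current in ref_commits:
            return ("rewritten", current)
    return ("absent", None)


# Hypothetical history: a1 was amended into a2, then a2 and b1 were squashed into a3.
rewrites = {"a1": "a2", "a2": "a3", "b1": "a3"}
main = {"a3", "c1"}

answer = find_in_ref("a1", main, rewrites)
```

Here `answer` distinguishes "no, but there is a modified version of that commit" from a plain "no", which is the response the comment is asking for.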
There’s also a fun loophole where you can edit other people’s commits when doing a merge and attribute bugs to someone else. I caught someone doing this once (they were terrible at git) on account of I was the one who reviewed the code that got changed, and I specifically looked for that class of bug before approving it. Git blame and the commit history no longer agreed and I was able to show what happened.
Commits are actually snapshots of the entire repository, not just diffs, so even if the diff is the same, if the base is different, it is not the same commit. And when you rebase, all the old commits will stay there until you run the garbage collector, and only if they don't have a head.
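A minimal Python sketch of why that is, assuming the default SHA-1 object format and a simplified commit with at most one parent and no signature. The identity line and messages are placeholders; the tree ID is git's well-known empty tree, used only as a stand-in.

```python
import hashlib
from typing import Optional

IDENTITY = "A U Thor <author@example.com> 0 +0000"  # placeholder author/committer


def git_commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a simplified commit object: tree, optional parent, identity, message."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {IDENTITY}", f"committer {IDENTITY}", "", message]
    body = "\n".join(lines).encode("utf-8")
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # the empty tree, as a stand-in

root = git_commit_id(tree, None, "Initial commit\n")
reworded_root = git_commit_id(tree, None, "Initial commit (reworded)\n")

# Same snapshot and same message, but a different parent gives a different ID.
child = git_commit_id(tree, root, "Add feature\n")
rebased_child = git_commit_id(tree, reworded_root, "Add feature\n")
```

Since the parent ID is part of what gets hashed, rewording one commit changes the ID of every commit built on top of it, even though none of the snapshots changed.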
Most of the rest of us do not work this way, but they still do. The rest of us also only have to deal with three way merges most of the time, instead of octopus merges. Though I jokingly call, “fixing an incorrect three way merge” a “five way merge” because you end up doing a star shaped pattern of diffs to re-resolve the code to retain the intents of all three versions. A to merge, B to merge, merge to HEAD~, A to HEAD~ and B to HEAD~
> Here is a plea for all forges: make code review metadata available offline, inside git.
I think this will fall on deaf ears as far as commercial forges like GitHub go, since as you yourself observe:
> But much of the value of git repos ends up locked into forges, like GitHub.
For-profit enterprises are not generally excited about commoditising their own value-add. This is not a jab at GitHub -- I think GitHub do everything right (offer a great service, a very generous free tier, and make it possible to extract all your data via API if you want to shift providers). It's just the nature of any commercial operation.
Most of the time, the commit message is 10+ lines while the change itself is -1/+1.
We use GitHub for repo hosting and a separate issue tracker to coordinate changes. It bothers me a lot that GitHub UI doesn't render markdown for commit messages. We all write really detailed and nicely formatted commit messages, and had to work out a commit message sync so the issue tracker can display related commit messages in full Markdown glory.
I think that's fine. Unix philosophy is to focus on one thing and do that well.
I vaguely recall dismissing Notes as a solution to my problems. I may be recollecting some of this wrong, but IIRC the problem with Notes is that they aren’t batteries included. It’s easier to cajole devs into using new tools if the setup is simple and it doesn’t complicate their workflow. Notes fails this litmus test. Set it on by default and make it come down with pull and up with push instead of a separate activity.
It’s hard to explain to them that things like “mise en place” aren’t OCD but table stakes for sophisticated activities.
So, while I understand that some tools do too much, I don't think that is the case here.
Shameless plug: I recently did a webinar on how the pickaxe options are better than `git-blame`, which you can find here: https://nofluffjuststuff.com/webinar/142/level_up_your_git_g... (Note: It requires you to provide an email address).
(They stopped tracking these changes a few years ago, probably because the pace of changes to Apache OpenOffice slowed down to a trickle, and there's no longer much to be gained by cherry-picking these few changes.)
Whatever is needed goes into commit message and referencing tickets in separate system is a feature not a bug - because JIRA or any other system is used to communicate with non developers. Like business analysts don't get access to code or repositories at all for example or support people don't get access to the repositories and code.
Yeah, I can see how one could write a front end to make the notes visible/editable by non-developers, but it still does not make any sense, because BAs, support, and others don't care about specific commits, and a single feature might fit into one commit but most likely does not. Even more fun is when you have multiple repos and your feature touches a couple of services; then git notes are quite useless, because you really need a reference to an outside system.
Yes, but isn't it insane? What is the benefit from treating your own product as a black box? Yet that's mainstream. Sometimes I have the analyst (not on my team, but from a team we share a monorepo with) asking me questions that can be answered literally with a line of code. And she's a technical kind, knows SQL and such. And we write very idiomatic, high level code. But still, culture cannot change itself until it dies due to inherent inefficiency.
We had a technical guy once where I worked who wanted to force the sales guys to write documentation and requirements in LaTeX and store them in Git. I feel bad for the guy, as he was laughed out by the sales guys and did not understand why, because those are such great tools…
If you have manual testers they have their own set of tools and most of them don’t use or have working knowledge of Git.
For code reviews everyone is using PRs, and it is not because they don’t know about Git notes but because no one is doing reviews per commit and there is no support for discussion and the other tools that are baked into forges' PR flows.
You can always use git notes for yourself, but it is, as I called it, a gimmick feature. I can make a bet you will use it for a week, maybe a couple of weeks, and then just stop, because in the description it sounds good but in practice no one is using it.
I'll take that bet.
https://gitlab.com/gitlab-org/gitlab/-/issues/15029
You have to log in to read it unfortunately, but any gitlab.com account should work.
> This feature request is being closed as our current focus isn't in this area.
But I have this in my IRC logs:
< _jwilk> TIL git-notes rewriting doesn't work properly when doing amend within rebase. :/
One thing that would be really useful though - if you could somehow use git notes to tell Git not to download some blobs by default that would be great. It would solve the "someone added 100 MB of binary files to this project 5 years ago and then deleted them" problem.
That would be too useful though, so I won't hold my breath.
This was useful when migrating a piece of functionality into its own repo and you want to preserve history. Adding these forced version tags into commits would be quite messy in the new repo where you switch to a new versioning scheme.
I did <insert research notes> and found no other places in the code base where this needs to be fixed.
And as the cover letter for a single patch (if needed/not covered by the commit message). And also like a commit message for the iterations on the patches. So for a patch series that goes through three versions, the note may say what updates were done in versions 2 and 3.
And other than that I use notes for:
- Private notes on how I’ve manually tested the commit
- Link to CI
- A localized changelog for customers (who are not technical)
https://news.ycombinator.com/item?id=43971620
And less recently:
I have an issue tracker file that can be added to a project. While it's technically plain text, the interface for the file ensures that a format is used, and the format ensures that changes reflect only a single ticket.
Just as long as no one edits the file using a different program, it will work just fine.
Don't think anyone uses it, though.
I don't know if rationale is something better suited for the git commit log, or tagged by code function to an external "rationale" system of record.
Overall it felt elegant, and needed no maintenance after setting it up, but honestly it was never used. I think the need to look back in time was rarer than expected, and git notes being hidden by default didn’t help for awareness.
That way when I need to cherry-pick that commit, or do something similar (bump again), I can search for the hash of the commit I'm looking at to find what might be missing.
UI is worse than git-notes but no need for additional setup to sync them.
And if not rebasing, `--fixup` is bad for this, since it includes only the commit message, not the hash.
> I just blogged about the new git-notes functionality over at the [Pro Git blog](dead link)
The link is archived at https://web.archive.org/web/20100828155504/http://progit.org...
Appending information to the commit itself creates a new commit and all the commits that are based on the commit will also have to change consequently.
Git notes would be ideal for annotating commits that contain commit hashes used as breadcrumbs to inform the developer (usually me months later) about context around previous work. These hashes might have changed due to a rebase or from using disk space optimization tools that rewrite history like these:
https://rtyley.github.io/bfg-repo-cleaner/
https://github.com/rtyley/bfg-repo-cleaner
https://github.com/newren/git-filter-repo
https://github.com/tiavision/GitRewrite
See also:
https://stackoverflow.com/questions/5613345/how-to-shrink-th...
https://stackoverflow.com/questions/1398919/make-git-consume...
https://stackoverflow.com/questions/2116778/reduce-git-repos...
https://stackoverflow.com/questions/3119850/is-there-a-way-t...
https://stackoverflow.com/questions/38789265/git-delete-some...
https://stackoverflow.com/questions/16057391/git-free-disk-s...
https://stackoverflow.com/questions/31423525/how-to-reduce-d...
https://stackoverflow.com/questions/16854425/compact-reposit...
https://stackoverflow.com/questions/13999191/trimming-huge-g...
https://stackoverflow.com/questions/4515580/how-do-i-remove-...
These methods are all uniquely terrible in various ways. Most likely user error on my part. I need this technique:
1. Choose a range of commit hashes (or hashes before a commit) and remove them. This can be useful when splitting repos, for example on projects that started as backend+frontend where the frontend is being forked off in a new repo and the older backend portion needs to be removed from it for security/privacy.
2. Rebase all branches (including those that crossed the deleted portion) to preserve their structure but start/end as recently as possible. Optionally discard branches that were created and merged entirely within the deleted portion, unless they're the trunk of other branches that merge after the deleted portion.
3. Search for old commit hashes in commit messages and update them to the new hashes while rebasing.
4. Bonus points for updating stashes (or other git features) having any commit hashes in their names. Also for importing/exporting a list of important commit hashes for use in project management, such as updating hashes in comments on kanban boards like Jira.
5. More bonus points for searching for large files (such as app.js or other build artifacts) so that they can be stripped from commits in branches, preferably not on a main trunk like master.
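Step 3 of the list above is the most mechanical part, so here is a rough Python sketch of it. It assumes you already have a complete old-to-new mapping of full commit hashes from the rewrite; the function name, the sample hashes, and the rule of leaving ambiguous or unknown abbreviations untouched are choices made for this example, not the behavior of any particular tool.

```python
import re

# Runs of 7 to 40 lowercase hex digits that stand alone as a word.
HEX_RUN = re.compile(r"\b[0-9a-f]{7,40}\b")


def update_hash_references(message: str, old_to_new: dict[str, str]) -> str:
    """Rewrite commit hashes (full or abbreviated) mentioned in a message."""

    def replace(match: re.Match) -> str:
        ref = match.group(0)
        # An abbreviation refers to whichever old hash it is a prefix of.
        candidates = {new for old, new in old_to_new.items() if old.startswith(ref)}
        if len(candidates) != 1:
            return ref  # unknown or ambiguous: leave the text untouched
        return candidates.pop()[: len(ref)]  # keep the original abbreviation length

    return HEX_RUN.sub(replace, message)


# Hypothetical mapping produced by a history rewrite.
old = "1111111" + "a" * 33
new = "2222222" + "b" * 33
mapping = {old: new}

updated = update_hash_references("Follow-up to 1111111; see also deadbeef.", mapping)
```

Keeping the abbreviation length means the rewritten message reads the same as before, only pointing at the commit's new ID.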
If you followed this far, I could also use a technique that rebases merged branches so that they form a series of D shapes instead of overlapping B shapes (this is useful during git bisect). Ideally this would happen automatically or be enforced via rules on sites like GitHub and GitLab. I always rebase my branches before merging, but others can't be bothered.
https://www.atlassian.com/git/tutorials/merging-vs-rebasing
Where I'm going with this: I git reset and git cherry-pick constantly in my own branches before merging, so that each branch has a clean work history like the trunk. I think of this as quantum committing, because I keep exploring the problem space until I find a solution that collapses (merges) into the history.
The problem is that git GUIs are inadequate for this work. I need to be able to cut/copy/paste commits, drag and drop them for reordering, etc. It should also derive the commit diff needed to make a commit match a branch (or folder) rather than throwing a conflict in my face, so that it operates more like Apple's Time Machine. If I had this app, I could simply select all commits that I wanted to delete, it would ask me "this rewrites history, are you sure?", and then delete them and do the right thing for affected branches. It would also have infinite undo powered by git reflog.
The idea being that commit hashes should not take priority - it's all about the information. We should never be trapped by the state of the repo, because that creates anxiety.
So we're missing a tool to orchestrate git the way that Kubernetes orchestrates Docker.