Asking people to fit a meaningful description of the change into 50 characters is silly, and it's IMO the reason why so many of them just write "fix bug" and call it a day.
Someone else has posted the Google guide for CL (change list) messages, but let me boost the signal: https://google.github.io/eng-practices/review/developer/cl-d...
This is, I believe, still the best guide out there. When I'm coaching juniors, I recommend this guide over the opinionated and outdated git "best practices", and I think the results are much better.
If a commit is sufficiently complex the long form could be 600 characters and the short form 200.
Can you give me an example of a commit where the "short, focused summary" can only be usefully-expressed in 200+ characters?
Notably, all of the "good" examples in https://google.github.io/eng-practices/review/developer/cl-d... have first lines under 72 characters.
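If a team does decide to enforce a subject-length limit, git's `commit-msg` hook can do it. The hook receives the path of the file holding the proposed message as its only argument, and a non-zero exit aborts the commit. Here's a minimal Python sketch, assuming a 72-character limit and the default `#` comment character. Save it as `.git/hooks/commit-msg` and make it executable:

```python
#!/usr/bin/env python3
import sys

MAX_SUBJECT = 72  # assumed limit; 50 is the other commonly cited number


def check_subject(message, limit=MAX_SUBJECT):
    """Return a list of problems with the message's subject line."""
    # Skip git's comment lines (assumes the default '#' comment character).
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    # Drop leading blank lines; the subject is the first remaining line.
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines:
        return ["empty commit message"]
    problems = []
    if len(lines[0]) > limit:
        problems.append(f"subject is {len(lines[0])} chars, limit is {limit}")
    # Convention: a blank line separates the subject from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line between subject and body")
    return problems


def main(path):
    with open(path, encoding="utf-8") as f:
        problems = check_subject(f.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


# git invokes the hook with the message file path as argv[1].
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

The limit is a single constant, so the 50-vs-72-vs-200 argument in this thread comes down to changing one number.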
Restrictions lead devs to write useless messages like 'fixed a bug' rather than messages that are slightly verbose but actually useful.
Most messages won't be 200 chars. But I'd rather have 200 chars of useful text than 72 chars of useless text.
The real world is full of average devs who are average writers under time pressure to deliver code. Expecting them to deliver above-average commit messages like Google's examples is a pipe dream.
I think they should be (or at least "decent").
Should this reasoning be applied to other skills involved in software engineering? If someone never writes tests, or over-architects everything, or has terrible people skills, or never updates documentation, or doesn't bring attention to blockers, or constantly forgets to update the issue tracker, or doesn't follow the local code conventions, or works on random tasks and ignores high-priority ones, etc etc etc the solution isn't usually "don't ask them to do a thing they're bad at", it's "help them get better".
The important question is whether "skimmable commit messages" is a thing you care about enough to advocate for and teach people about. Maybe you don't, and that's fine.
> But I'd rather have 200 chars of useful text than 72 chars of useless text.
I completely agree with this. I just don't think those are the only two options.
> Expecting them to deliver above-average commit messages like Google's examples is a pipe dream.
This thread started with praise for that Google guide and I assumed you were of similar opinion given your reply. That's why I kept referring to it.
Almost daily, I use commit messages and history as part of understanding why a decision was made, why a seemingly obvious alternative wasn’t chosen, etc. Seeing the commit title on every line and hovering to see the full message has become a core editor feature for me.
It’s kind of like testing, the more I do it, the more I want to do it because the value is so consistently reinforced.
There’s nothing like being able to track down exactly why a decision was made 6 years ago in a part of the code base you are struggling to understand written by someone who left before you joined the team.
I don't understand why so many engineers are like this.
The other issue with tagging JIRA tickets is that junior developers will believe that to be enough (you can just read the ticket) and won't understand why they need to describe the change from a technical angle when it's already described in JIRA.
To fail to do so is a gigantic missed opportunity in my opinion. You never know when you will need it.
Similarly, with AI it is fairly simple to have, e.g., a pre-merge check that validates the commit msg is somewhat useful. This could be implemented, for example, with GitHub org-level checks that must run in a PR.
You can always generate a new commit message (or summary, alternative summary, etc) down the road with AI. You can never replace your mind being in the thick of a changeset.
These days, lots of my commit messages are drafted by AI after having chatted at length about the requirements. If the commit message is wrong or incomplete, I'll revise it by hand or maybe guide the AI in the right direction. That tends to be a much more useful and comprehensive description of the commit's intent than what I would naturally find worthwhile to write on my own.
OP's approach is interesting as well, at least in principle, and if it works well it might be the next best option in the absence of a chat log. It should just make sure to focus on extracting the "why" more than describing the "what".
I take issue with that statement. There's nothing "natural" about documentation. You're not "naturally disposed" to writing a certain level of documentation. It's a skill and a discipline. If you don't think it's worthwhile to write documentation, that's not a "natural failing". You're making a judgment, and any missing documentation is an error in judgment.
If a given project has time/budget to prioritize consistent rigorous documentation, of course it should consider doing so. AI's ability to reduce the cost of doing so is a good thing.
AI writing code and commit messages becomes a loop divorced from human reasoning. Future AIs will need to read the commit history to understand the evolution of the code, and if they're reading poor summaries from other AIs it's just polluting the context window.
Commit messages are documentation for humans and machines.
Writing commit messages is one of these mundane chores I’d gladly delegate to LLMs which are very very good at this kind of thing.
I mean, if you really know your code, you know it; there isn't much value in reinforcing it in your head one more time by writing comprehensive commit messages. It's a waste of time, imho.
This context should at the very least be linked.
But even if that were true, reading a two-line explanation is very obviously more time-efficient than reviewing a whole commit diff.
Super common case: you've got a subtle bug causing unexpected values in data. You know from the db or logs that it started on 2025-03-02. You check the deployment(s) of that day and there are ~20 of them.
You can either quickly read 20 lines in the log and make a good guess at which one is likely related, or go through a round of re-reviewing 20 multi-file pull requests and reverse-engineering the context from the code.
Almost all commits live in tandem with some large feature or change being made. The reason for absolutely all of them is the same: build the thing.
How do you expect someone to know what “the current task” was when they’re tracking down a bug 2 years down the line?
Most AI-generated commit messages and PR descriptions are much too verbose and add no information that couldn't be parsed from the code directly. Most of the time I'd rather read 2 sentences written by a human than a wall of text full of redundant information.
You don't have to if you don't want to, but if you think "this commit message is just a summary of the changes made", you'll never write a useful commit message.
Anyhow, ADRs are good, but they're for architectural decisions, and not every decision is at that level.
In general, if there's a better place to store explanations, do use it, but often, in many projects, commit messages are the least bad place; and it's enormously better to write there than nowhere at all.
I know the code...when I write it. But 2 weeks later all the context is gone, and that's just _for me_. For my colleagues who also have to be in that code, they don't even start with context.
I mean, do what works for you, but understand that the bulk of the work this applies to is in >1-person shops with code bases too big to fit in one's head at all, much less for more than a day or so.
This was a delayed project running out of budget, and everything had to be completed within a few months. However, management in its infinite wisdom also did a complete source code/issue management platform change.
The person who did the repo migration went with the default settings. (I believe this was the case. I forget the details. I would also only half blame the person because everything was rushed)
Everything up to that point was committed with "Made by so and so bot".
This was way before Google was allegedly committing "well over 30%" of its code by AI. I witnessed the true pioneers in this space.
> For the love of clean code history, let's remember we're not monkeys banging on keyboards; we're educated, civilized human beings.
This is just a completely braindead commit. Without looking at the code, I could easily spend 5 minutes making sense of the commit message, intrigued that something interesting or important is happening. The message is massively over the top; it has way more text than actual code changes. It wastes time.
I am not against AI as a helper in various places. But if possible, it should be an opt-in tool if deemed useful. If someone wants a summary of a non-trivial commit, that can be useful. Even better if the committer writes about the intentions and reasons for the commit, so an AI could match those with the actual code. Don't reiterate what's happening in a patch. Give the meta that isn't there or is less obvious. Please.
> The full path specification in `go build` was redundant given the context of how Go modules are structured. Simplifying the instructions improves clarity and reduces confusion for new users or contributors.
The explanation doesn't seem quite right. The module mentioned in that command was moved to the project root, in such a way that the command no longer needs to specify a path. So the full path specification wasn't redundant; the updated version of it became redundant.
And all of this was done in a single commit. Better (disclaimer, I have no experience using Go. Actual Go developers probably don't even need to be told this much):
Use github.com/arpxspace/smartcommit namespace
`go.mod` is updated with an explicit package namespace, and import
statements adjusted to match.
and then: Rename main module
`cmd/smartcommit.go` is moved to `main.go`, and the README adjusted to
match. (Using this name allows omitting the main module path from the
build command.)
and then: Simplify prose in README.md
(No explanation required for the last one.)
There's no need to justify that your changes are "in accordance with best practices", tell a story about "ongoing efforts" (unless you actually have other recent commits that you want to group together like that conceptually), etc. Commit messages are for other developers. Another developer who reads, in effect, "this change was made in the hopes that YOU will have an easier time contributing to the project"... is going to feel patronized.
But making fine-grained commits with short messages will help in the long run. No amount of prose in commit messages can actually organize the commits. Meanwhile, the AI's summary completely ignored a change I would recommend splitting out into a separate (third) commit.
> There's no need to justify that your changes are "in accordance with best practices", tell a story about "ongoing efforts" (unless you actually have other recent commits that you want to group together like that conceptually), etc.
LLMs are very prone to generalisation and marketing language like this. Despite being sycophants, they are also trained to speak as if they constantly have to justify and persuade.
This is called out in the Wikipedia meta page I linked to in another comment. They're great red flags to look out for in any writing; humans, myself included, often use this kind of lazy construction!
If I needed this LLM output, which I probably don't, but let's say I do, I can just generate this myself. It's disrespectful to others to generate LLM output for them. Just send them the prompt. We all have access to these tools. It's like sending someone a pre-masticated apple. I have my own teeth, thank you.
You don’t need 98% of commit messages ever again.
Yes, when you do need that other 2%, it's most likely for important reasons, but usually not important enough to justify mulling over all the rest.
If you make a change to your codebase, normally you know what you want to achieve and why (otherwise... what are you even doing?). A commit message is just putting that in writing... that only takes a few seconds, often less than it takes to write the code.
So it's just a good habit to have. It forces you to think more about the changes you do & why, so it makes you a better software developer. Creating any new habit always takes some energy initially, but it's worth it.
https://google.github.io/eng-practices/review/developer/cl-d...
https://zulip.readthedocs.io/en/latest/contributing/commit-d...
It's funny that you have to pre-prompt with: You are an expert software developer.
I also found I had to hold llama3.1's hand more than gpt4o's, but I suppose that's a given since it's a much smaller model.
That might indicate an underlying problem that can’t be fixed with AI. ;)
It baffles me how people think git log reading like a novella is a good idea. It’s not, because a cherrypick or merge can/will/should overwrite whatever clever garbage you put in there with its own message. Are you going to summarize Q4’s battle with scaling in a paragraph on the sprint merge to master? No. No you aren’t.
So, while I’m against commit messages saying “fixed”, obviously, I’m equally against a “conversation” in git.
If you have engineers on your team who are anal about commit messages, ask them to be as anal about features and teach them to use git diff.
I like putting a minimal description in the message itself too so it's easier to skim the log, though.
Nope. Waste of bytes in my commit message header that are better done by git trailers.
Otherwise, I love the idea of the tool. I personally try to answer “why does this commit exist?” when I create commits.
I don't spend a lot of time on trying to come up with scopes etc, I just make sure that my commit does one thing that fits the label.
A few reasons to care about CCs:
- The first few characters of a commit message tell you immediately the type of change you should expect. This tells you part of the "what" at a glance. If you're looking for a bug fix, for example, you can safely ignore any other type of commit.
- Thinking about the type of change you're committing helps you create atomic commits. Anything that is not strictly related should go in a separate commit. Hopefully you already know why you should care about this.
- A conventional commit message also often includes the change scope. This is a handy way to indicate the subsystem that was changed, which is also useful for filtering, searching, aggregating, etc.
- They help with writing change logs. I'm a strong proponent of the idea that change logs shouldn't be just autogenerated dumps of commit messages, but carefully edited for the intended audience, and CCs can help with grouping changes by type or scope. These days LLMs do a decent job at generating this type of changelog (even though it should still be manually reviewed and tweaked), and the additional metadata provided by CCs helps them make it more accurate.
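As an illustration of the grouping point, here is a rough Python sketch that buckets Conventional Commits subject lines into changelog sections as a first draft for a human to edit. The section titles and the type-to-section mapping are hypothetical, and the regex only looks at the `type(scope)!: description` header, not at the body or footers:

```python
import re
from collections import defaultdict

# <type>(<scope>)!: <description>, with the scope and "!" optional.
HEADER = re.compile(r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?!?: (?P<desc>.+)$")

# Hypothetical mapping from commit type to changelog section title.
SECTIONS = {"feat": "Features", "fix": "Bug fixes"}


def group_for_changelog(subjects):
    """Bucket commit subjects into changelog sections as a first draft."""
    groups = defaultdict(list)
    for subject in subjects:
        m = HEADER.match(subject)
        if m is None:
            # Non-conventional subjects are kept for a human to sort out.
            groups["Other"].append(subject)
            continue
        section = SECTIONS.get(m["type"], "Other")
        prefix = f"{m['scope']}: " if m["scope"] else ""
        groups[section].append(prefix + m["desc"])
    return dict(groups)
```

For example, `["feat(parser): support tabs", "fix: handle empty input", "Update README"]` yields a Features entry `parser: support tabs`, a Bug fixes entry `handle empty input`, and leaves `Update README` under Other.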
> The first few characters of a commit message tell you immediately the type of change you should expect.
1. Why do I care about this particular classification of "type" of change?
2. "The first few characters" of the message aren't actually what I necessarily see first, anyway.
> If you're looking for a bug fix, for example, you can safely ignore any other type of commit.
1. If I'm looking for a bug fix, I'm using tools like git blame and git bisect.
2. How often do bugs actually get fixed by a single commit, that has that bug fix as their sole purpose, and which is recognized as a bug fix at the time of writing? I'm guessing it's much lower than one would naively expect.
3. If I'm looking for a bug fix, I'm looking for the fix for a specific bug, which is probably most recognizable by some bug tracker issue ID. (And if not, it's most searchable by figuring out an ID and looking that up.) So I'm scanning lines for a # symbol and a number, which I would definitely not expect to be at the start of the line.
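That kind of scan is easy to script. A rough Python sketch, assuming GitHub-style `#123` references in one-line log output such as `git log --oneline` (JIRA-style keys like `PROJ-123` would need a different pattern):

```python
import re


def lines_mentioning(log_lines, issue):
    """Return the log lines that reference issue number `issue` as #N."""
    # (?<![\w#]) and \b keep #482 from matching inside #4821 or abc#482.
    pattern = re.compile(rf"(?<![\w#])#{issue}\b")
    return [line for line in log_lines if pattern.search(line)]
```

The lookbehind and word boundary matter because a plain substring search for `#482` would also hit `#4821`.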
> Thinking about the type of change you're committing helps you create atomic commits. Anything that is not strictly related should go in a separate commit. Hopefully you already know why you should care about this.
Yes. And I do this by thinking about a verb that naturally belongs at the beginning of the sentence (fragment) describing the commit. "Bugfix", "feature", and "enhancement" aren't actions.
The discipline of organizing commits is orthogonal to the discipline of labeling them.
> A conventional commit message also often includes the change scope.
One that is thoughtfully written by hand will naturally include the scope of the change any time that this concept is meaningful.
You may find no use for this information, or find that it clutters the subject line, but I've addressed both arguments in my previous replies, and won't repeat them here.
> If I'm looking for a bug fix [...]
I mentioned a bug fix as an example, so I'm not sure why you're so focused on that.
CCs introduce a convention to how type and scope are specified. You're not required to use any of the proposed types, and should certainly come up with scopes that make sense for your code base. How you use that information is up to you, and I mentioned several ways that I've personally found it useful.
> The discipline of organizing commits is orthogonal to the discipline of labeling them.
I don't follow. You organize things by grouping them according to some criteria, and labels are required to uniquely identify those groups.
> One that is thoughtfully written by hand will naturally include the scope of the change any time that this concept is meaningful.
Ehm, sure, but CCs provide a framework that standardizes the way information is presented in a well structured repository. A scope is something most repos already use; i.e. it's common to see `<scope>: ...` in repos that don't follow CCs. Attaching a type of commit is another bit of information that further groups commits, and the notation `<type>(<scope>): ...` is just the suggested convention. Which, BTW, naturally came from the way some projects were already structuring their commits.
Did you consider it was an example in their comment as well? Fewer features than bug fixes are single commits for example.
> You're not required to use any of the proposed types
> the notation `<type>(<scope>): ...` is just the suggested convention
Commits MUST be prefixed with a type, which consists of a noun, feat, fix, etc., followed by the OPTIONAL scope, OPTIONAL !, and REQUIRED terminal colon and space.
The type feat MUST be used when a commit adds a new feature to your application or library.
The type fix MUST be used when a commit represents a bug fix for your application.
A scope MAY be provided after a type. A scope MUST consist of a noun describing a section of the codebase surrounded by parenthesis, e.g., fix(parser):[1]
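The quoted header rules map fairly directly onto a regular expression. A rough Python sketch that covers only those rules (type, OPTIONAL parenthesized scope, OPTIONAL `!`, REQUIRED colon and space). It ignores the rest of the specification, such as body and footer rules, and it assumes a scope contains no whitespace or parentheses:

```python
import re

# <type>[(<scope>)][!]: <description>
CC_HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"          # type: a noun such as feat or fix
    r"(?:\((?P<scope>[^()\s]+)\))?"  # OPTIONAL scope in parentheses
    r"(?P<breaking>!)?"              # OPTIONAL ! marking a breaking change
    r": (?P<description>\S.*)$"      # REQUIRED colon and space, then text
)


def parse_header(subject):
    """Return the header's parts as a dict, or None if it isn't conventional."""
    m = CC_HEADER.match(subject)
    if m is None:
        return None
    parts = m.groupdict()
    parts["breaking"] = parts["breaking"] is not None
    return parts
```

A subject like `Fix parser crash` returns `None`, which is where a hook or CI check would reject the commit if a team chose to enforce the convention.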
Regardless, blindly following any proposed software development convention or practice is rarely a good idea. It's smarter to get informed, pick and choose practices that make sense to you, and adapt them to your specific workflow.
If you don't find what the Conventional Commits specification proposes valuable, that's fine, but my argument is that it's shortsighted and a mistake. Cheers!
It gets in the way because it takes pride of place at the start of the commit message. If you want to put it at the end, that might be less annoying, but I've never seen anyone do that.
I think the main point is that I almost never arrive at a commit by exhaustively reading through a list of commit messages. It's almost always through some other kind of link like git blame or a GitHub issue/PR.
It's also just really low value information. Imagine if every commit message started with how many lines of code it changed. That would be pretty annoying right?
Outside of that, I agree that it's silly to put it in the summary and seems to be a symptom of people writing crap commit messages.
If all you ever write or look at is the summary then obviously it needs to go in there or it'll never be seen.
Feature and fix are change set types. Not commit types. There is no difference if you squash. There is if you do not.
Many projects which do not use Conventional Commits include scope in commit messages.
Structured change type and scope help to write change logs. An issue tracker is a more useful place for this information in my view.
This metadata could also be added via trailers, but most Git UIs don't show them prominently, or at all. So prefixing the subject is still the way to go.
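For comparison, a trailer is a `Key: value` line in the final paragraph of the message, the same block that `git interpret-trailers --parse` reads and `git log --format='%(trailers)'` prints. Here is a simplified Python sketch. It assumes every line of that paragraph must be a trailer and that there are no folded continuation lines; git's own parser has extra rules. The `Type` and `Scope` keys in the example are hypothetical, not a git convention:

```python
import re

TRAILER = re.compile(r"^(?P<key>[A-Za-z0-9-]+): (?P<value>.+)$")


def parse_trailers(message):
    """Return (key, value) pairs from the message's last paragraph,
    but only if every line in that paragraph looks like a trailer."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a lone subject line has no trailer block
    lines = paragraphs[-1].strip().splitlines()
    matches = [TRAILER.match(line) for line in lines]
    if not all(matches):
        return []
    return [(m["key"], m["value"]) for m in matches]
```

With `Type: fix` and `Scope: config` placed alongside `Signed-off-by:` in the final paragraph, all three come back as pairs. As the comment says, though, few UIs would display them.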
Honest to God, this would discourage frequent commits.
Which will lead to a lot of work not being committed.
Thoughtful messages are for PRs.
(On a good day, that is. Though even on a bad day I don't let "wip" commits into the main branch.)
Obligatory "show me the prompt" https://news.ycombinator.com/item?id=39374249
It generates a whole lot of text that makes me none the wiser as to why you wanted to do any of those changes. It feels like a robot trying to justify the changes post hoc. Which it of course is, so that's understandable.
Don't take this comment as rudeness BTW. It's cool that you're making a fun little tool. I'm assuming you care about writing more useful commit messages, so I thought I'd give you some feedback on that part.
The problem with this is that it still biases people towards including useless fluff. I'd almost rather have no commit message whatsoever (so I at least know there's nothing of value there) than have to read through paragraphs of text to determine that there was nothing useful to read. I'd much rather have a terse one-line summary that includes the gist of the intent of the change than a bunch of waffle.
(I'd rather have 1-2 paragraphs of a well-written, accurate description of the content than any of that, but AI unfortunately isn't capable of that).
The developer now has to choose "do I spend the time to make this commit message better?" or just skim it and say "yeah that's good enough."
This has nothing to do with self-documenting code. On the contrary, in fact, if I have to resort to checking git history that means the code is not self documenting.
From that description I thought it was going to generate the message directly from the diff and the message not being what you thought it would be would be a signal that the code isn't self documenting.
However, I feel like your approach here is a little backwards. By getting the AI to come up with the commit messages, you're actually removing the chance for the human, you, to practise and improve.
I'm a real fan of Kahneman's "thinking fast" and "thinking slow" paradigm. By asking the human to review and approve the commit message, you're allowing them to "think fast", instead of doing the challenging, deliberative "thinking slow" of actually writing what you mean.
While getting the LLM to ask you questions about what you did and why is better than just one-shotting the commit message from the diff, it still lets you reply "reactively" and instinctually, using your "fast" gut thinking, instead of engaging the slower attentive processes required to write from scratch.
Now there are a couple of other posters here critiquing the commit messages in this repo's history. I think that's fair, but by your own admission you are learning, and this is a small and new project! Probably most commits should be along the lines of "getting a thing working", not essays about the intricacies of character encoding:
https://dhwthompson.com/2019/my-favourite-git-commit
But the commits we can see are already demonstrating some of the pitfalls of LLM generated language.
From a recent commit,
"This update enhances user interaction by explicitly addressing scenarios with large diffs, directing users towards feasible actions and maintaining workflow continuity."
This comes after a detailed breakdown of the diff. It is too vague to stand alone without the preceding detail (e.g. 40k character limit) but also doesn't explain them. Why 40k characters? Why any limit at all? Words like "enhances" and "feasible" are filler - be concrete instead.
This article on wiki has fantastic advice about ways that LLM writing fails, more along the lines of what I've just pointed out:
https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Writing well is hard, never "effortless" as your readme advertises. Sadly, good results have to come from hundreds of hours of hard and uncomfortable work. Truth is rare and precious and difficult to come by, and even when we glimpse it, turning it into words is a whole nother story. I hope you can continue to develop this tool to help you learn and train your own writing, rather than avoid it.
As best I can tell, lots of your commits seem to be including several unrelated changes.
This means commit messages become longer as they have to explain more things, and they also end up explaining the diff so that you can fit more on one page.
I'd suggest getting in the habit of making coherent commits with one change each. Some changes will be trivial, and the diff will be self explanatory. Then you can save your writing effort for commits that are challenging.
On the other hand, if I'm wrong and many changes do have to get bundled, then the commit message would be a good place to explain why.
I wrote more on the "primitives" and what I think of as the "physics" of commits here: https://crabmusket.net/2024/thoughts-on-git-commits-branches...
So let me link to my favorite author of consistently excellent commit messages, Jeff King on the git project itself:
https://github.com/git/git/commits?author=peff
To pick just one, here's a well explained single-line code change. It's subtle, so besides the excellent commit messages, he also adds a comment and a couple tests:
https://github.com/git/git/commit/1940a02dc1122d15706a7051ee...
Another example with an even greater ratio of explanation (10 paragraphs) to code (partial line change):
https://github.com/git/git/commit/8f32a5a6c050766bfa2827869e...
I'm not a fan of that commit as a commit, although it would make a great start for a blog post. The explanation of how the issue was tracked down is not helpful in understanding what the issue is. On the other hand, while the author describes finding (and replacing) a non-ASCII character masquerading as a space, it would have been more interesting to know what character it was.
I agree that explaining the "why" is useful, but in this case I don't think it deserves much more detail than "ensure the file uses only ASCII characters to avoid a text encoding error while running tests in this specific manner". (I guess I can also see the argument for showing the error message for later searches, too....)
More on the subject: https://mtlynch.io/no-longer-my-favorite-git-commit/
> develop this tool to help you learn and train your own writing, rather than avoid it
Will be striving for this for sure.
By "backwards" I meant to suggest, have the LLM critique a commit message you wrote. Have it point out vague language, weasel words, generalisation and marketing terms.
The wiki article is good advice for writing in general, not limited to LLMs.
I like the implementation, and how it asks you questions to get you to answer why a change was made, instead of making things up, or simply regurgitating what the code does.
I still wouldn't trust it to be accurate and would have to review it, and I personally dislike the default "LLM style", and I wouldn't want to read these messages or subject other people to them, so I won't be using your tool, but thanks for building and sharing.
but in practice it's not a huge problem, IDE shows you the commit history of a specific file so bisecting changes is easy, there's only a few entries with the roughly correct date modifying the file you're looking for :D
"low quality code/commit messages" hasn't really slowed me down so far and probably won't in the future either
I mean, look at this abomination: https://github.com/denys-olleik/alternative-accounting/commi...