>> An eternal war rages between team “git log should be clean” vs. team “git log should have an accurate history.”
Having a clean commit history makes it easier to use commands like git blame and git bisect. I'm not convinced those who want to have an accurate history really understand what they want. For example, if after every single keystroke I saved the file, added it to the index and ran git commit -F <(date), then I certainly could have a comprehensive history detailing every single keystroke I made as I edited the code, but the history would not be very useful when looking back on it.
>> Team Clean
>> Methods: git merge --ff-only or git rebase && git merge (extreme clean freaks add the --squash option)
>> Pros: Linear history, git log is easy to read, git revert requires no thought.
>> Cons: You’re erasing history—you can no longer tell if two commits were written together on a single feature branch.
Could the con not be addressed by using --no-ff even if it's a fast-forward merge? A merge commit in that situation points to the same tree as the HEAD commit of the branch that was merged, but the merge commit also has the information about the base commit and the head commit of the branch, so you know what commits were in the branch even after it's merged.
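A minimal sketch of that situation, run in a throwaway repository. The branch name `feature`, the file name and the commit messages are made up for illustration. It should show that the `--no-ff` merge commit has two parents, shares its tree with the branch head, and still lets you list the commits that came from the branch.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main          # name the unborn branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Base commit"

# Two commits on a feature branch; main does not move in the meantime,
# so the merge below could have been a fast-forward.
git checkout -q -b feature
echo one >> file.txt && git commit -q -am "Feature: step one"
echo two >> file.txt && git commit -q -am "Feature: step two"

git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature'" feature

merge_tree=$(git rev-parse 'HEAD^{tree}')
feature_tree=$(git rev-parse 'feature^{tree}')
parent_count=$(git show -s --format=%P HEAD | wc -w)
# Commits reachable from the second parent but not the first: the branch's own commits.
branch_commits=$(git rev-list --count 'HEAD^1..HEAD^2')

echo "same tree: $([ "$merge_tree" = "$feature_tree" ] && echo yes || echo no)"
echo "parents: $parent_count, commits from branch: $branch_commits"
```

`git log HEAD^1..HEAD^2` on any such merge commit lists the branch's commits the same way.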
A lot of this stuff is written from the point of view of people pushing changes to a central repository. Git was actually designed from the point of view of people pulling changes between multiple forks and doing that at an enormous scale.
Team clean and team accurate both win here. If your git branch has an ugly history, you'll have a hard time convincing others to pull those ugly changes. They'll throw it back at you and tell you to clean that up using the tools that are available to you. That's why it's called a pull request and not a push demand. You are asking others to pull your changes and merge them.
Open source projects are a lot better at this than corporate projects, where it is common for people to huddle around a central repository where everyone has write access, like they are still using Subversion 20 years ago.
With Linux, the only person who does not have to worry about people upstream from him is Linus Torvalds. Everybody else is pulling changes from others all the time, and they are paranoid about not rewriting history. You only ever rebase your own changes and only before you share them with anyone else. Linus Torvalds runs a complex network of people from which lots of patches emerge (git patches, that is). If it's not clean, it won't get in. If it's not accurate, it won't get in. And they all pull changes from upstream. So if upstream no longer merges cleanly because you rewrote history on your fork, you have a problem. Simple solution: don't do that.
The Linux commit history is an endless sequence of squashed merge commits (and some small commits). Some of them big, some of them small. It's both clean and accurate. And this simplifies the process of pulling those changes for the many forks. torvalds/linux actually has no branches, just an insane amount of forks (45k, and that's just on GitHub), which of course is the same thing in git. Some of those forks are where maintenance releases are created for older versions of Linux.
Because of the way Linux development is structured, it's highly unlikely for a maintainer to ever merge a fast-forwardable branch into their tree, but conflict-less (empty?) merges sure do happen. Linux doesn't try to keep its history linear in any way, so merges represent the actual graph of merged branches.
> It is valuable history: it is archaeological evidence of what has been attempted before.
Except that is not readily accessible. When looking at a particular section of code, you could either run git blame, or run git log and filter it to only list commits that updated that file.
Either way, you're not going to easily be able to find what previous implementations were tried, or why they were not used compared to the current implementation.
But if you make a clean commit with the working implementation and include an explanation of what other approaches were tried and why they weren't used, then someone could use git blame to get the information pretty easily.
It sure is valuable, which is why it's retained in your reflog. You don't have to push it out anywhere though (or at least not somewhere where it pretends to be part of the project).
My reflog can sometimes be useful for myself, but I can't imagine anyone ever taking time to go through someone else's (either literal or in form of commits) messy reflog. If that's your argument, then no, it's not exactly what I'd call "valuable".
My argument was that the only cases where going through unfiltered chain of WIP commits is somewhat valuable are well served by browsing through your reflog. I don't see any value in peeking at other people's unfinished stuff (unless doing something like pair programming or teaching). Any decisions that are worth noting go into commit messages anyway. That "archaeological evidence" has, in general, no reason to leave the local machine at all.
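As a rough illustration of the reflog point, here is a sketch in a throwaway repository (file name and commit messages are invented). Two WIP commits are collapsed into one clean commit with `git reset --soft`; the old WIP tip drops out of the branch history but should still be findable locally through the reflog.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo v0 > notes.txt
git add notes.txt
git commit -q -m "Initial"

# Messy checkpoints made while working.
echo v1 >> notes.txt && git commit -q -am "wip"
echo v2 >> notes.txt && git commit -q -am "fix"
wip_tip=$(git rev-parse HEAD)

# Collapse both checkpoints into one clean commit.
git reset -q --soft HEAD~2
git commit -q -m "Add v1 and v2 sections to notes"

visible=$(git rev-list --count HEAD)                       # commits in the branch history
in_reflog=$(git log -g --format=%H HEAD | grep -c "$wip_tip")

echo "commits in history: $visible"
echo "old WIP tip found in reflog: $in_reflog time(s)"
```

Nothing here leaves the local machine: the reflog is not pushed, which is the point being made above.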
Maybe it does, but couldn’t the developer write more descriptively?
The argument is a non sequitur too: It does not follow that a git merge strategy that keeps all commits is at fault for a culture of writing bad commit messages.
Presenting that argument as a rationale for avoiding straight merges while preferring squashing and/or rewriting history is the strawman.
> Maybe it does, but couldn’t the developer write more descriptively?
Commonly while you're fixing things up there's nothing of worth to be descriptive about. "made test x pass" is a useful checkpoint during development, it's not a useful step to conserve forever.
> The argument is a non sequitur too: It does not follow that a git merge strategy that keeps all commits is at fault for a culture of writing bad commit messages.
I see your accusation of strawmanning was just projection. Thank you for the insight, have a nice day.
”Made test x pass” is still a bad commit message, as it says nothing about what changes were made to the codebase — which are always made when ”fixing things up”!
I did not intend for my comment to be taken as an accusation, as that would imply intention on your part. I simply meant to point out the logical fallacy in statements I read online, i.e. ”because someone on the Internet was wrong”.
Trying to gaslight me with personal accusations of psychological projection, however, is crossing a line. Do not do that to other people, it is violence.
You do not know what I think or feel, only what I write.
Spending excessive time authoring high quality commit messages for low value commits [during development] is a waste of time. In general it will end up costing you in pure velocity, and reduce how much can get done in a day.
You do know that temporary commits made locally that usually should never leave your development machine are not the ones that you push out to review and merge, right?
That works when your developer laptop can actually run all the tests. If you're writing code which runs on something like AIX then generally you need to push code to CI in order to get it tested at all. And even for stuff which should test correctly both locally and in CI, a lot of time gets burned yak shaving the CI configuration, which can't be done locally. With something like Windows there are often differences between the configuration of the CI builders/testers and what you're running in some kind of virtual environment locally.
The idyllic world of "when it passes on my machine it always passes in CI" doesn't exist once you start doing sufficiently complicated testing and have a sufficiently large test matrix.
And as you pull in dependencies (different versions of the underlying language/framework, different versions of upstream or downstream deps that you need to test against) everything gets messy, and even if you could reproduce the full matrix on your laptop it would take a day to run to completion.
> That works when your developer laptop can actually run all the tests.
That's completely orthogonal. A remote branch that nobody looks at except you and your CI service is also part of what I'd consider "developer machine", you're just not necessarily sitting right in front of it. The important part is: these are not the commits that go anywhere further in the development process, and you certainly don't push them out to review. There's absolutely no reason for anyone to look at my "aaaa" and "fix" commits full of commented out code that serve as nothing more than an "undo" feature for what happened in my brain.
Because why not? It's an undo feature for my brain, and a way to trigger remote CI process. Using tools to help with thinking is not "a problem", and even printf debugging has its perfectly valid uses.
My temporary commits can be as messy as it's reasonable for them to remain helpful, because they're not influencing the review in any way.
"Should" is a funny word. I push funky commits to development branches often, because I always try to sync my work to the git host at the end of the workday in case I wake up to a non-functioning laptop the next morning. Better to have a risk of ugly commits in a WIP branch than risk losing work.
> Temporary commits .. are not the ones you push out to review, right?
Fully agree with this; I am a clean freak and groom the entire branch differential into sets of atomic or related commits before submitting a merge request.
> I push funky commits to development branches often, because I always try to sync my work to the git host at the end of the workday
I do that too, and also to get CI artifacts - but I consider that an "implementation detail" that doesn't affect any other part of the development process ;)
How often does someone go through your code commit-wise? What is the business value of grooming commits, if the reviewer only looks at the branch diff (as they very well should, lest they miss something in the big picture)?
They are valuable when using git blame to see the line in the context of the change it was part of and the associated commit message. That information could help avoid introducing a regression for example.
I go through individual commits during review quite regularly. Sometimes it's enough to look at the diff of the whole branch, but sometimes it's much more manageable to look at individual commits for meaningful review.
No, there's no reason to preserve commit messages you used during development.
When I am developing, I make many tiny commits with an automatically generated title ('Modify util/files.py') each time my tests pass, or really, when I do anything of value. (I use `git-infer`: https://github.com/rec/gitz/blob/master/git-infer)
This makes it impossible for me to lose work, and acts like a coarse-grained undo for me, where I can quickly move back and forth between spots that the tests worked if I decide I'm going the wrong way, or create a new branch, move back a bit, and make some changes and compare.
_Before anyone sees this code_ I rebase it down to a logical sequence of extremely carefully named and organized commits. (The word "manicured" has been used more than once.)
As I go through code review, I make tiny commits and at the end, rebase them into my carefully-named commits.
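One way this kind of workflow can be done with stock git is `git commit --fixup` plus `git rebase -i --autosquash`. This is only a sketch under that assumption, not necessarily how the commenter does it; the file names and messages are invented, and `GIT_SEQUENCE_EDITOR=true` simply accepts the generated todo list so the script runs unattended.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "# project" > README
git add README && git commit -q -m "Initial commit"

# Two carefully named commits that are out for review.
echo "load v1" > load.txt
git add load.txt && git commit -q -m "Add load helper"
echo "save v1" > save.txt
git add save.txt && git commit -q -m "Add save helper"

# Review feedback touches the first helper: record it as a tiny fixup commit
# that targets "Add load helper" (HEAD~1 at this point).
echo "load v2" > load.txt
git commit -q -a --fixup=HEAD~1

# Fold every fixup into the commit it targets.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --format=%s
```

After the rebase the history should be back to three commits, with the review fix absorbed into "Add load helper".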
I create at least five commit IDs for each final commit I created. No one wants to see these.
I spend considerable time organizing everything so just the information you need to see is in the final commits. All the information should be there.
It helps in terms of maintenance. For example, when changing some code, you could look at the commits that affected the lines of code you're going to modify. With well-crafted commit messages, you may be able to avoid introducing a regression based on information in those past commit messages.
If you keep all the intermediary commits, it's pushing more process onto the dev in the middle of coding. Some people may do this well, but more often than not I do just see "WIP (test broken)" etc.
With a squash/rebase, the developer has to write an accurate commit message once & they're doing it when they've fully understood the solution they've written.
It doesn’t seem logical to me that someone working on solving an intellectual puzzle would do so without having a consistent internal narrative of what they are doing.
If a developer can’t fit their change description on one line or doesn’t even know where to start describing the changes, chances are they could commit much more often.
The way I see it, a commit message like ”WIP (test broken)” is unprofessionalism bordering on self-sabotage.
> it's pushing more process onto the dev in the middle of coding
Uhm, how exactly? Splitting & merging things into sensible, atomic commits with proper commit messages is usually what you do right before pushing it out for someone else to see, definitely not "in the middle of coding".
Just always do a merge commit into your main branch, and you have the best of both worlds, don't you?
- A straight history of merge commits you can revert or bisect.
- Handy pointers to all the branches you merged, in case you ever need them (I never have, but I do use merge commits)
At some point I may learn to let go and go full squash-and-rebase, which is the same thing just without the handy pointers to the dev branches (which, if I ever needed them, I'm sure there's a git command to fish them out of the repo). But for now I'm OK
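For what it's worth, here is a small sketch of what "a straight history of merge commits you can revert" can look like. The branch and file names are hypothetical. It merges two branches with `--no-ff`, reads main with `--first-parent`, and undoes the first feature with `git revert -m 1`.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt && git commit -q -m "Base"

# feature-a: two commits, merged with an explicit merge commit.
git checkout -q -b feature-a
echo a1 > a.txt && git add a.txt && git commit -q -m "A: part 1"
echo a2 >> a.txt && git commit -q -am "A: part 2"
git checkout -q main
git merge -q --no-ff -m "Merge feature-a" feature-a

# feature-b: one commit, also merged with a merge commit.
git checkout -q -b feature-b
echo b1 > b.txt && git add b.txt && git commit -q -m "B: part 1"
git checkout -q main
git merge -q --no-ff -m "Merge feature-b" feature-b

# The mainline view: one entry per merged branch, plus the base commit.
git log --first-parent --format=%s
mainline_count=$(git rev-list --first-parent --count HEAD)

# Undo all of feature-a in one step by reverting its merge commit
# relative to its first (mainline) parent.
git revert -m 1 --no-edit HEAD~1 >/dev/null
```

After the revert, `a.txt` should be gone while `b.txt` is untouched. The same `--first-parent` flag also works with `git bisect start` in reasonably recent git versions, if you want to bisect over merges only.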
Merge commits add negative value to your commit history.
They are a memory of a commit ID and of a branch that no longer exist. They add complexity and empty information to your project for no useful value.
Also, for writing git tools, life is much easier if the commits form a tree. But a merge commit has _two_ parents. If you have even one merge commit, your commits are no longer a tree.
> Handy pointers to all the branches you merged, in case you ever need them (I never have
TOO MUCH INFORMATION.
No, get rid of it. Extract any information you might need into the code review or the branch.
I mean, that commit ID will probably float around your repo for the next six months, so it isn't like it's gone immediately.
Don't. There's no cost to retaining it and it's super useful to have it when needed. Are you trying to tell me that you never use tools like bisect or blame, and that you find a commit message justifying a whole MR more informative than messages for the single logical, atomic changes that are then grouped into an MR?
(unless you mean the exact pointer to the exact branch you worked on, in which case - sure, there's no need to keep it, rebase that branch while merging and let original commit IDs go, they're useless)
In a ff-only merge workflow with merge commits, all you get from a merge commit is grouping some commits together with some commit message. There's no cost to it since you can still easily consider the whole repo to be like a tree, and it makes browsing through commit logs easier for humans without making tools like bisect less useful, like squashing does.
Reading threads like this one, I get an impression that people who want to squash things away and remove information to "keep history clean" are simply not very proficient in using git as a tool that helps with development.
> Handy pointers to all the branches you merged, in case you ever need them
The one thing that GitHub does when it generates the merge commit (which it does even for fast-forward merges) is that it adds a link to the PR associated with that commit. It's a handy way to go back to the PR and see the review of the branch before it was merged.
For projects that use the email patch workflow like the Linux kernel or git itself, the merge commit message contains the branch name which can be used to find the mailing list discussion. In the git project, you can find the "what's cooking" emails to find the branch name and references to the message-id of the patch series cover letter.
I have absolutely no interest in "accurate history".
WHY. Do people really want to see my minor spelling mistakes and goofs that got corrected in review?
It's worth spending the extra time to have a completely clean and perfect log, if only for being able to read your commit history like a novel, but also for using `bisect` to find regressions and behavior changes.
---
I have gone to extremes. At some point this summer, I realized that running the tests with one collection of flags led to breakage, and this breakage had been there for almost two months, and it was important that this flag work. (Yes, it was also a testing/CI issue, and we fixed that.)
I found the issue with `git bisect` and then patched that change two months ago and rebased everything - a couple of hundred commits.
We all feel it is of key importance that all the tests pass at each commit ID, so no one had any issues with that and it really didn't take very long.
To rebase without changing modification dates, use
`git -c rebase.instructionFormat='%s%nexec GIT_COMMITTER_DATE="%cD" git commit --amend --no-edit' rebase -i`
>> Methods: git merge --ff-only or git rebase && git merge (extreme clean freaks add the --squash option)
>> Pros: Linear history, git log is easy to read, git revert requires no thought.
Does it require no thought because it's fundamentally impossible? If you're doing --ff without squashing, it's going to be hard to figure out which commits you'll have to revert, I think. All histories get merged into a single stream, after all.
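A sketch of what reverting a fast-forwarded branch can look like, assuming you still know where main pointed before the merge. Here that commit is kept in a shell variable; locally the reflog may also have it, but nothing in the pushed history records it. Branch and file names are invented.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt && git commit -q -m "Base"

git checkout -q -b feature
echo 1 > f1.txt && git add f1.txt && git commit -q -m "Feature part 1"
echo 2 > f2.txt && git add f2.txt && git commit -q -m "Feature part 2"

git checkout -q main
before=$(git rev-parse HEAD)        # where main was before the merge
git merge -q --ff-only feature      # no merge commit is created

# Revert every commit the fast-forward brought in, newest first.
# This creates one revert commit per original commit.
git revert --no-edit "$before..HEAD" >/dev/null

git log --format=%s
```

Without `$before` (or a merge commit to point at), you are left reading the log and guessing where the branch started, which is the difficulty raised above.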
Merge commits provide the best of both worlds. Too bad GitHub and friends refuse to render history with --first-parent. So we’re stuck with inane squash commits.