* Re: Mistakes in commit log messages [not found] ` <CANh=_JEO4-E79dPCLc3cRLi7=ftAzc+H1FC46eck1vJN3TD3Sg@mail.gmail.com> @ 2023-04-11 6:02 ` Eli Zaretskii 2023-04-11 14:01 ` Alan Mackenzie 0 siblings, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2023-04-11 6:02 UTC (permalink / raw) To: Jim Porter; +Cc: emacs-devel > From: Jim Porter <jporterbugs@gmail.com> > Date: Mon, 10 Apr 2023 14:52:15 -0700 > Cc: Alan Mackenzie <acm@muc.de>, philipk@posteo.net, luangruo@yahoo.com > > > On Mon, Apr 10, 2023 at 10:18 AM Jim Porter <jporterbugs@gmail.com> wrote: > > I looked into doing this, and I think it'd be possible to extend the > > existing commit-msg hook (in build-aux/git-hooks) to do this, at least > > using gawk. I don't really know awk though, so I'm sure my solution > > would be clumsy and probably gawk-specific. I wonder if we could make > > the hooks use Emacs Lisp... > > If someone could figure out how to disable this code on non-gawk awks, > I think the attached diff should do the trick. Any thoughts? I think a solution that doesn't use Gawk-specific features would be preferable, since no one said the mistakes are private only to users of GNU/Linux and MS-Windows, where Gawk is basically the only Awk. For the other readers of emacs-devel: this came from a private email I wrote to several of our active contributors telling them that their commit log messages included a substantial number of mistakes in file names mentioned in the log message. The admin/authors.el program discovered those mistakes while trying to generate attributions for who did what in Emacs (the etc/AUTHORS file). Someone suggested to augment our commit hooks to avoid such mistakes, at least those of them that can be easily detected by a simple script. The script suggested by Jim is below: > diff --git a/build-aux/git-hooks/commit-msg b/build-aux/git-hooks/commit-msg > index d0578bcfb46..cdc99f4b399 100755 > --- a/build-aux/git-hooks/commit-msg > +++ b/build-aux/git-hooks/commit-msg > @@ -45,6 +45,7 @@ at_sign= > > # Check the log entry. > exec $awk -v at_sign="$at_sign" -v cent_sign="$cent_sign" -v file="$1" ' > + @load "filefuncs" > BEGIN { > # These regular expressions assume traditional Unix unibyte behavior. > # They are needed for old or broken versions of awk, e.g., > @@ -129,6 +130,18 @@ at_sign= > status = 1 > } > > + /^* / { > + # Check that any filenames mentioned in the commit message > + # actually exist. Currently, this only prints a warning to > + # prevent potential issues with false positives. > + if(match($2, "[^:/][^:]*")) { > + FILE = substr($2, RSTART, RLENGTH) > + if(stat(FILE, type) < 0) { > + printf("Warning: file '\''%s'\'' in commit message not found\n", FILE) > + } > + } > + } > + > $0 ~ unsafe_gnu_url { > needs_rewriting = 1 > } ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 6:02 ` Mistakes in commit log messages Eli Zaretskii @ 2023-04-11 14:01 ` Alan Mackenzie 2023-04-11 14:57 ` Eli Zaretskii 2023-04-11 18:31 ` Jim Porter 0 siblings, 2 replies; 31+ messages in thread From: Alan Mackenzie @ 2023-04-11 14:01 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Jim Porter, emacs-devel Hello, Eli and Jim. On Tue, Apr 11, 2023 at 09:02:05 +0300, Eli Zaretskii wrote: > > From: Jim Porter <jporterbugs@gmail.com> > > Date: Mon, 10 Apr 2023 14:52:15 -0700 > > Cc: Alan Mackenzie <acm@muc.de>, philipk@posteo.net, luangruo@yahoo.com > > On Mon, Apr 10, 2023 at 10:18 AM Jim Porter <jporterbugs@gmail.com> wrote: > > > I looked into doing this, and I think it'd be possible to extend the > > > existing commit-msg hook (in build-aux/git-hooks) to do this, at least > > > using gawk. I don't really know awk though, so I'm sure my solution > > > would be clumsy and probably gawk-specific. I wonder if we could make > > > the hooks use Emacs Lisp... > > If someone could figure out how to disable this code on non-gawk awks, > > I think the attached diff should do the trick. Any thoughts? > I think a solution that doesn't use Gawk-specific features would be > preferable, since no one said the mistakes are private only to users > of GNU/Linux and MS-Windows, where Gawk is basically the only Awk. > For the other readers of emacs-devel: this came from a private email I > wrote to several of our active contributors telling them that their > commit log messages included a substantial number of mistakes in file > names mentioned in the log message. The admin/authors.el program > discovered those mistakes while trying to generate attributions for > who did what in Emacs (the etc/AUTHORS file). Someone suggested to > augment our commit hooks to avoid such mistakes, at least those of > them that can be easily detected by a simple script. > The script suggested by Jim is below: > > diff --git a/build-aux/git-hooks/commit-msg b/build-aux/git-hooks/commit-msg > > index d0578bcfb46..cdc99f4b399 100755 > > --- a/build-aux/git-hooks/commit-msg > > +++ b/build-aux/git-hooks/commit-msg > > @@ -45,6 +45,7 @@ at_sign= > > # Check the log entry. > > exec $awk -v at_sign="$at_sign" -v cent_sign="$cent_sign" -v file="$1" ' > > + @load "filefuncs" > > BEGIN { > > # These regular expressions assume traditional Unix unibyte behavior. > > # They are needed for old or broken versions of awk, e.g., > > @@ -129,6 +130,18 @@ at_sign= > > status = 1 > > } > > + /^* / { > > + # Check that any filenames mentioned in the commit message > > + # actually exist. Currently, this only prints a warning to > > + # prevent potential issues with false positives. > > + if(match($2, "[^:/][^:]*")) { > > + FILE = substr($2, RSTART, RLENGTH) > > + if(stat(FILE, type) < 0) { > > + printf("Warning: file '\''%s'\'' in commit message not found\n", FILE) > > + } > > + } > > + } > > + > > $0 ~ unsafe_gnu_url { > > needs_rewriting = 1 > > } After having to ask on the help-gawk mailing list how to do it, I've got a suggestion that uses only AWK, and checks for the existence of each file in a "* foo..." line by attempting to read the first line from it. It also reports an error if there are no such lines (it is possible the contributor forgot to include the "* " in his file lines). --- commit-msg 2023-01-15 15:01:05.006074916 +0000 +++ commit-msg.acm 2023-04-11 13:59:18.517300896 +0000 @@ -138,11 +138,24 @@ status = 1 } + /^\* [a-zA-Z0-9_.~#-]/ { + nfiles++ + if ((rc = (getline x < $2)) < 0) { + status = 1 + print "File " $2 " cannot be read: [" ERRNO "]" + } + close($2) + } + END { if (nlines == 0) { print "Empty commit message" status = 1 } + if (!nfiles) { + print "No file lines in commit message" + status = 1 + } if (status == 0 && needs_rewriting) { for (i = 1; i <= NR; i++) { line = input[i] -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 14:01 ` Alan Mackenzie @ 2023-04-11 14:57 ` Eli Zaretskii 2023-04-11 17:20 ` Alan Mackenzie 2023-04-11 18:31 ` Jim Porter 1 sibling, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2023-04-11 14:57 UTC (permalink / raw) To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel > Date: Tue, 11 Apr 2023 14:01:48 +0000 > Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > + /^\* [a-zA-Z0-9_.~#-]/ { > + nfiles++ > + if ((rc = (getline x < $2)) < 0) { > + status = 1 > + print "File " $2 " cannot be read: [" ERRNO "]" The error message should explicitly mention an error in the file name or its leading directories, at least if ERRNO indicates the file does not exist. "File FOO cannot be read is too "technical" and doesn't explain what we are checking. Also, I'm not sure what is CWD when this script runs. Suppose I invoke "git commit" from a subdirectory of the tree root -- does this find files in that case? > + if (!nfiles) { > + print "No file lines in commit message" > + status = 1 This was never an error before, so shouldn't be an error now. Log messages saying just Fix last change are quite common, and I don't want to disallow them. In any case, I suggest that you (and maybe others) try running with this for some time, to make sure there are no regressions. Thanks. P.S. Is 'getline' available in non-GNU Awk's? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 14:57 ` Eli Zaretskii @ 2023-04-11 17:20 ` Alan Mackenzie 2023-04-11 18:00 ` Eli Zaretskii 0 siblings, 1 reply; 31+ messages in thread From: Alan Mackenzie @ 2023-04-11 17:20 UTC (permalink / raw) To: Eli Zaretskii; +Cc: jporterbugs, emacs-devel Hello, Eli. On Tue, Apr 11, 2023 at 17:57:35 +0300, Eli Zaretskii wrote: > > Date: Tue, 11 Apr 2023 14:01:48 +0000 > > Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org > > From: Alan Mackenzie <acm@muc.de> > > + /^\* [a-zA-Z0-9_.~#-]/ { > > + nfiles++ > > + if ((rc = (getline x < $2)) < 0) { > > + status = 1 > > + print "File " $2 " cannot be read: [" ERRNO "]" > The error message should explicitly mention an error in the file name > or its leading directories, at least if ERRNO indicates the file does > not exist. "File FOO cannot be read is too "technical" and doesn't > explain what we are checking. You mean something like: print "Error in file or directory name " $2 " [" ERRNO "]" ? I'm beginning to get a bit concerned about what happens after a file has been removed from git, or renamed. At least "cannot be read: " is honest about what caused the error in the script. Maybe I could add another line suggesting how to bypass the checks in commit-msg. What do you think? > Also, I'm not sure what is CWD when this script runs. Suppose I > invoke "git commit" from a subdirectory of the tree root -- does this > find files in that case? The current working directory is always the main working directory (see man githooks(1)) unless it's a bare commit. > > + if (!nfiles) { > > + print "No file lines in commit message" > > + status = 1 > This was never an error before, so shouldn't be an error now. Log > messages saying just > Fix last change > are quite common, and I don't want to disallow them. Ah, OK. It's just I've been known to neglect the "* " at times. I'll remove that bit, and submit a revised patch soon. > In any case, I suggest that you (and maybe others) try running with > this for some time, to make sure there are no regressions. Will do. > Thanks. > P.S. Is 'getline' available in non-GNU Awk's? The gawk manual implies strongly it is available in POSIX AWKs. It might even explicitly say so somewhere. So I strongly suspect the answer to your question is yes, but perhaps not in every AWK. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 17:20 ` Alan Mackenzie @ 2023-04-11 18:00 ` Eli Zaretskii 0 siblings, 0 replies; 31+ messages in thread From: Eli Zaretskii @ 2023-04-11 18:00 UTC (permalink / raw) To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel > Date: Tue, 11 Apr 2023 17:20:13 +0000 > Cc: jporterbugs@gmail.com, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > > + /^\* [a-zA-Z0-9_.~#-]/ { > > > + nfiles++ > > > + if ((rc = (getline x < $2)) < 0) { > > > + status = 1 > > > + print "File " $2 " cannot be read: [" ERRNO "]" > > > The error message should explicitly mention an error in the file name > > or its leading directories, at least if ERRNO indicates the file does > > not exist. "File FOO cannot be read is too "technical" and doesn't > > explain what we are checking. > > You mean something like: > > print "Error in file or directory name " $2 " [" ERRNO "]" > > ? I mean something like "File " $2 " does not exist; did you make a mistake in its name?" > I'm beginning to get a bit concerned about what happens after a file > has been removed from git, or renamed. This needs to be detected and not flagged as an error. > Maybe I could add another line suggesting how to bypass the checks > in commit-msg. What do you think? No, I think we shouldn't make it easy for people to bypass the test, if we want this to be useful. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 14:01 ` Alan Mackenzie 2023-04-11 14:57 ` Eli Zaretskii @ 2023-04-11 18:31 ` Jim Porter 2023-04-11 18:45 ` Eli Zaretskii 2023-04-12 9:32 ` Alan Mackenzie 1 sibling, 2 replies; 31+ messages in thread From: Jim Porter @ 2023-04-11 18:31 UTC (permalink / raw) To: Alan Mackenzie, Eli Zaretskii; +Cc: emacs-devel On 4/11/2023 7:01 AM, Alan Mackenzie wrote: > > + /^\* [a-zA-Z0-9_.~#-]/ { > + nfiles++ > + if ((rc = (getline x < $2)) < 0) { > + status = 1 > + print "File " $2 " cannot be read: [" ERRNO "]" > + } > + close($2) > + } One thing to be careful of here (which is why I did the 'match'/'substr' dance in my patch) is that we need to be able to handle lines like this correctly: * some/file.el: New file. Normally, awk would split that so that $2 is "some/file.el:", which isn't right. Maybe there's a better method than my patch though. I'm definitely not an awk expert (this is the first awk code I've ever written). Also, using 'getline' will work for checking files, but not directories. I'm not sure this ever comes up in practice, but it might occur once in a while if we import a large package into the Emacs tree. We could do a bit of feature-checking and upgrade to using 'stat' if we have a newish gawk. That way, gawk users would get proper checks for this (rare) case. (This might not be strictly necessary.) Finally, I think it would make sense to have this be a purely advisory warning for now so that we could check it into the Emacs tree soon-ish. As Eli suggests, we can try running with this hook locally, but since I've already added some Lisp code to generate the changelog for my workflow, I'm probably never going to trigger this myself. If this check is just a warning, I think we could be a little more aggressive in merging this, since it shouldn't break anyone's workflows. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 18:31 ` Jim Porter @ 2023-04-11 18:45 ` Eli Zaretskii 2023-04-11 19:27 ` Jim Porter 2023-04-12 9:32 ` Alan Mackenzie 1 sibling, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2023-04-11 18:45 UTC (permalink / raw) To: Jim Porter; +Cc: acm, emacs-devel > Date: Tue, 11 Apr 2023 11:31:55 -0700 > Cc: emacs-devel@gnu.org > From: Jim Porter <jporterbugs@gmail.com> > > Finally, I think it would make sense to have this be a purely advisory > warning for now so that we could check it into the Emacs tree soon-ish. How will this help, given that some people use Magit to commit and some use VC, not the Git commands from the shell? Are we sure the warnings will be seen, let alone acted upon? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 18:45 ` Eli Zaretskii @ 2023-04-11 19:27 ` Jim Porter 2023-04-11 19:36 ` Eli Zaretskii 0 siblings, 1 reply; 31+ messages in thread From: Jim Porter @ 2023-04-11 19:27 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel On 4/11/2023 11:45 AM, Eli Zaretskii wrote: >> Date: Tue, 11 Apr 2023 11:31:55 -0700 >> Cc: emacs-devel@gnu.org >> From: Jim Porter <jporterbugs@gmail.com> >> >> Finally, I think it would make sense to have this be a purely advisory >> warning for now so that we could check it into the Emacs tree soon-ish. > > How will this help, given that some people use Magit to commit and > some use VC, not the Git commands from the shell? Are we sure the > warnings will be seen, let alone acted upon? I don't know. If VC and Magit don't show messages from Git hooks, they probably should. Those messages are there for a reason, after all. However, warnings that only some users see would hopefully still help us get feedback on whether the validation works properly. Some miscellaneous thoughts on this: One issue with the current implementation (mentioned elsewhere in this thread) is that it doesn't work for deleting a file from the repo. Fixing this in the commit-msg hook is tricky. We can get the list of changed files via "git diff --name-only" (and then we could compare our commit message against that list), but there's a problem: I don't know of a good way to detect when the user is amending a commit[1]. For a normal commit, you'd get the changed files via something like "git diff --staged --name-only", but for an amended commit, you'd want to add "HEAD^" to that command. Another option might be to do this as a pre-push hook. By then, the commits are finalized, so we can more easily compare the commit message and its diff. This is also nice because when the commit-msg hook errors out, it throws away the commit message; quite annoying if you just made a small typo! However, it does mean that committers would need to be comfortable with amending their commit messages if they hit this error. (We could also check this in a post-commit hook so that committers are alerted *before* they try to push, but post-commit hooks can't abort the commit: it's already done! Maybe we could have a post-commit hook that warns the committer, and a pre-push hook that actually errors out.) [1] The only way I've seen to do this is to look at the arguments from the parent (Git) process that called the hook: <https://stackoverflow.com/questions/19387073/how-to-detect-commit-amend-by-pre-commit-hook>. I'm not sure how reliably that works, especially on platforms like MS-Windows... ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 19:27 ` Jim Porter @ 2023-04-11 19:36 ` Eli Zaretskii 2023-04-12 0:20 ` Jim Porter 2023-04-12 9:41 ` Alan Mackenzie 0 siblings, 2 replies; 31+ messages in thread From: Eli Zaretskii @ 2023-04-11 19:36 UTC (permalink / raw) To: Jim Porter; +Cc: acm, emacs-devel > Date: Tue, 11 Apr 2023 12:27:40 -0700 > Cc: acm@muc.de, emacs-devel@gnu.org > From: Jim Porter <jporterbugs@gmail.com> > > One issue with the current implementation (mentioned elsewhere in this > thread) is that it doesn't work for deleting a file from the repo. > Fixing this in the commit-msg hook is tricky. We can get the list of > changed files via "git diff --name-only" (and then we could compare our > commit message against that list), but there's a problem: I don't know > of a good way to detect when the user is amending a commit[1]. For a > normal commit, you'd get the changed files via something like "git diff > --staged --name-only", but for an amended commit, you'd want to add > "HEAD^" to that command. If this will cry wolf on every removal or rename of a file, then this cure is worse than the disease, IMNSHO. I'd prefer to waste a few hours of my time when preparing the tarball than risk annoying everyone with such false positives. > [1] The only way I've seen to do this is to look at the arguments from > the parent (Git) process that called the hook: > <https://stackoverflow.com/questions/19387073/how-to-detect-commit-amend-by-pre-commit-hook>. > I'm not sure how reliably that works, especially on platforms like > MS-Windows... The 'ps' part won't work on Windows. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 19:36 ` Eli Zaretskii @ 2023-04-12 0:20 ` Jim Porter 2023-04-13 6:18 ` Jim Porter 2023-04-12 9:41 ` Alan Mackenzie 1 sibling, 1 reply; 31+ messages in thread From: Jim Porter @ 2023-04-12 0:20 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel On 4/11/2023 12:36 PM, Eli Zaretskii wrote: >> Date: Tue, 11 Apr 2023 12:27:40 -0700 >> Cc: acm@muc.de, emacs-devel@gnu.org >> From: Jim Porter <jporterbugs@gmail.com> >> >> One issue with the current implementation (mentioned elsewhere in this >> thread) is that it doesn't work for deleting a file from the repo. >> Fixing this in the commit-msg hook is tricky. We can get the list of >> changed files via "git diff --name-only" (and then we could compare our >> commit message against that list), but there's a problem: I don't know >> of a good way to detect when the user is amending a commit[1]. For a >> normal commit, you'd get the changed files via something like "git diff >> --staged --name-only", but for an amended commit, you'd want to add >> "HEAD^" to that command. > > If this will cry wolf on every removal or rename of a file, then this > cure is worse than the disease, IMNSHO. I'd prefer to waste a few > hours of my time when preparing the tarball than risk annoying > everyone with such false positives. Yeah, I think doing this in a different hook would be less error-prone. Below is a demonstration of a post-commit hook that would warn the user about files listed in the commit message that aren't in the diff. The only potential for error I see is when creating or deleting entire directories, but that should be fixable by adding the parent directories of touched files into the 'CHANGES' variable. Since this is a post-commit hook, it's only advisory, and can't error out. However, we could do something similar for a pre-push hook, except we'd iterate over each commit that's about to be pushed. A pre-push *could* error out, thus preventing users from pushing any badly-formatted commits upstream. If we did something like this, I think both post-commit and pre-push hooks would be useful. The former gives committers immediate feedback on their commit message (if they read the output), while the latter is what ensures that the upstream commit messages are properly-formatted. -------------------------------------------------- #!/bin/sh git log -1 --format=%B | awk ' BEGIN { # Collect all the files touched in the last commit. while (("git log -1 --name-status --format=" | getline) > 0) { for (i = 2; i <= NF; i++) { CHANGES[$i] = 1 } } } /^\* / { if (match($2, "[^:/][^:]*")) { FILE = substr($2, RSTART, RLENGTH) if (! (FILE in CHANGES)) { printf("File %s listed in commit message, but not in diff\n", FILE) status = 1 } } } END { if (status != 0) { print "Bad commit; please see the file '\''CONTRIBUTE'\''" } } ' ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-12 0:20 ` Jim Porter @ 2023-04-13 6:18 ` Jim Porter 2023-04-13 6:49 ` Eli Zaretskii 0 siblings, 1 reply; 31+ messages in thread From: Jim Porter @ 2023-04-13 6:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1488 bytes --] On 4/11/2023 5:20 PM, Jim Porter wrote: > If we did something like this, I think both post-commit and pre-push > hooks would be useful. The former gives committers immediate feedback on > their commit message (if they read the output), while the latter is what > ensures that the upstream commit messages are properly-formatted. Attached is a mostly-working pair of hooks (post-commit and pre-push) to check this. There's still a bit more work to do though: 1. Clean up the output a bit 2. Make it work on non-gawk awks Both of those should be pretty easy though. Here's an example of the error if you try to push: -------------------------------------------------- In commit 89f2dbb773: My commit message File lisp/bad.el listed in commit message, but not in diff Push aborted; please see the file 'CONTRIBUTE' error: failed to push some refs to 'jporter@git.sv.gnu.org:/srv/git/emacs.git' -------------------------------------------------- For the post-commit hook, we just pass the current commit SHA to our awk script, which checks that SHA to make sure its commit message is correct. For the pre-push hook, we need to compare the local ref that we're pushing to the remote ref, and then check every commit between the two (the Git documentation explains a bit about how the hook works[1]). Assuming the general strategy looks ok, I'll finish this up and add more documentation for what's going on. [1] https://git-scm.com/docs/githooks#_pre_push [-- Attachment #2: 0001-WIP-Add-Git-hooks-to-check-filenames-listed-in-the-c.patch --] [-- Type: text/plain, Size: 4068 bytes --] From ac0e2bbcc99b07ea5185041f5a066c36cc1e42ac Mon Sep 17 00:00:00 2001 From: Jim Porter <jporterbugs@gmail.com> Date: Wed, 12 Apr 2023 23:03:31 -0700 Subject: [PATCH] [WIP] Add Git hooks to check filenames listed in the commit message * build-aux/git-hooks/commit-msg-files.awk: * build-aux/git-hooks/post-commit: * build-aux/git-hooks/pre-push: New files. --- build-aux/git-hooks/commit-msg-files.awk | 39 +++++++++++++++++++ build-aux/git-hooks/post-commit | 19 +++++++++ build-aux/git-hooks/pre-push | 49 ++++++++++++++++++++++++ 3 files changed, 107 insertions(+) create mode 100644 build-aux/git-hooks/commit-msg-files.awk create mode 100644 build-aux/git-hooks/post-commit create mode 100644 build-aux/git-hooks/pre-push diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk new file mode 100644 index 00000000000..1dd89dcdea0 --- /dev/null +++ b/build-aux/git-hooks/commit-msg-files.awk @@ -0,0 +1,39 @@ +function get_commit_changes(commit_sha, changes, i, j, len, bits, filename) { + # Collect all the files touched in the specified commit. + while ((("git log -1 --name-status --format= " commit_sha) | getline) > 0) { + for (i = 2; i <= NF; i++) { + len = split($i, bits, "/") + for (j = 1; j <= len; j++) { + if (j == 1) + filename = bits[j] + else + filename = filename "/" bits[j] + changes[filename] = 1 + } + } + } +} + +function check_commit_msg_files(commit_sha, changes, good, msg, filename) { + get_commit_changes(commit_sha, changes) + good = 1 + + while ((("git log -1 --format=%B " commit_sha) | getline) > 0) { + if (! msg) + msg = $0 + + if (/^\* / && match($2, "[^:/][^:]*")) { + filename = substr($2, RSTART, RLENGTH) + if (! (filename in changes)) { + if (good) { + printf("In commit %s: %s\n", substr(commit_sha, 1, 10), msg) + } + printf(" File %s listed in commit message, but not in diff\n", + filename) + good = 0 + } + } + } + + return good +} diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit new file mode 100644 index 00000000000..35c902d8270 --- /dev/null +++ b/build-aux/git-hooks/post-commit @@ -0,0 +1,19 @@ +#!/bin/sh + +git rev-parse HEAD | awk ' + # FIXME: Do this a POSIX-compatible way. + @include ".git/hooks/commit-msg-files.awk" + + /^[a-z0-9]{40}$/ { + if (! check_commit_msg_files($0)) { + status = 1 + } + } + + END { + if (status != 0) { + print "Bad commit message; please see the file '\''CONTRIBUTE'\''" + } + exit status + } +' diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push new file mode 100644 index 00000000000..3329bf59e1d --- /dev/null +++ b/build-aux/git-hooks/pre-push @@ -0,0 +1,49 @@ +#!/bin/sh + +awk -v origin_name="$1" ' + @include ".git/hooks/commit-msg-files.awk" + + # If the local SHA is all zeroes, ignore it. + $2 ~ /^0{40}$/ { + next + } + + $2 ~ /^[a-z0-9]{40}$/ { + newref = $2 + + # If the remote SHA is all zeroes, go backwards until we find a SHA on + # an origin branch. + if ($4 ~ /^0{40}$/) { + back = 0 + while ((("git branch -r -l '\''" origin_name "/*'\'' --contains " \ + newref "~" back) | getline) == 0) { + back++ + } + + ("git rev-parse " newref "~" back) | getline oldref + if (!(oldref ~ /^[a-z0-9]{40}$/)) { + # The SHA is misformatted?! + exit 2 + } + } else if ($4 ~ /^[a-z0-9]{40}$/) { + oldref = $4 + } else { + # The remote SHA is misformatted?! + exit 2 + } + + # Iterate over every SHA after oldref, up to (and including) newref. + while ((("git rev-list --reverse " oldref ".." newref) | getline) > 0) { + if (! check_commit_msg_files($0)) { + status = 1 + } + } + } + + END { + if (status != 0) { + print "Push aborted; please see the file '\''CONTRIBUTE'\''" + } + exit status + } +' -- 2.25.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-13 6:18 ` Jim Porter @ 2023-04-13 6:49 ` Eli Zaretskii 2023-04-13 7:47 ` Robert Pluim 2023-04-15 3:41 ` Jim Porter 0 siblings, 2 replies; 31+ messages in thread From: Eli Zaretskii @ 2023-04-13 6:49 UTC (permalink / raw) To: Jim Porter; +Cc: acm, emacs-devel > Date: Wed, 12 Apr 2023 23:18:37 -0700 > From: Jim Porter <jporterbugs@gmail.com> > Cc: acm@muc.de, emacs-devel@gnu.org > > On 4/11/2023 5:20 PM, Jim Porter wrote: > > If we did something like this, I think both post-commit and pre-push > > hooks would be useful. The former gives committers immediate feedback on > > their commit message (if they read the output), while the latter is what > > ensures that the upstream commit messages are properly-formatted. > > Attached is a mostly-working pair of hooks (post-commit and pre-push) to > check this. There's still a bit more work to do though: > > 1. Clean up the output a bit > 2. Make it work on non-gawk awks Please be sure to test the part that finds the file names on all the log messages in the repository. The things people do in logs will sometimes surprise you. In particular, a '*' after leading whitespace doesn't necessarily flag a file name, see, for example, the following commits: 92d75e5c53241ac76e8fdcb6fc66ade68354687c 0bd96806ef1a0f0d2d3f48cdb1204b7e393ab036 eff42dc0af741cc56c52d7d9577d29fc16f9f665 b5f70c239e87e5f38fd70181ef75cd28a43a8b41 Also, it looks like your script doesn't recognize file names in a line that starts with a semi-colon, as in this commit: commit 74ddfe811f980122816ba831bea18ca18afedb85 Author: Eli Zaretskii <eliz@gnu.org> AuthorDate: Sat Apr 8 19:14:15 2023 +0300 Commit: Eli Zaretskii <eliz@gnu.org> CommitDate: Sat Apr 8 19:14:15 2023 +0300 ; * doc/misc/calc.texi (Rewrites Tutorial): Fix a typo (bug#62658). > In commit 89f2dbb773: My commit message > File lisp/bad.el listed in commit message, but not in diff What about the opposite: a file is mentioned in the diff, but not in the commit message? Or maybe that is allowed, and we shouldn't block it? Thanks. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-13 6:49 ` Eli Zaretskii @ 2023-04-13 7:47 ` Robert Pluim 2023-04-15 3:41 ` Jim Porter 1 sibling, 0 replies; 31+ messages in thread From: Robert Pluim @ 2023-04-13 7:47 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Jim Porter, acm, emacs-devel >>>>> On Thu, 13 Apr 2023 09:49:23 +0300, Eli Zaretskii <eliz@gnu.org> said: Eli> What about the opposite: a file is mentioned in the diff, but not in Eli> the commit message? Or maybe that is allowed, and we shouldn't block Eli> it? From CONTRIBUTE: - There is no need to mention files such as NEWS and MAINTAINERS, or to indicate regeneration of files such as 'lib/gnulib.mk', in the ChangeLog entry. "There is no need" means you don't have to, but you can if you want to. Iʼll admit I tend to the prescriptive side of things, and think all changed files should be mentioned in the commit message. Robert -- ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-13 6:49 ` Eli Zaretskii 2023-04-13 7:47 ` Robert Pluim @ 2023-04-15 3:41 ` Jim Porter 2023-04-15 5:45 ` Jim Porter 2023-04-15 7:08 ` Eli Zaretskii 1 sibling, 2 replies; 31+ messages in thread From: Jim Porter @ 2023-04-15 3:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel [-- Attachment #1: Type: text/plain, Size: 2858 bytes --] Ok, I think this patch should work, though I'll continue testing it locally before I merge it. We could also merge this patch sooner if we temporarily made the pre-push hook advisory (i.e. doesn't block a push). That way, others can try these hooks out without it blocking anyone's work. Then when we're happy with it, we can make the pre-push hook error out on bad commit messages. On 4/12/2023 11:49 PM, Eli Zaretskii wrote: > Please be sure to test the part that finds the file names on all the > log messages in the repository. The things people do in logs will > sometimes surprise you. In particular, a '*' after leading whitespace > doesn't necessarily flag a file name, see, for example, the following > commits: > > 92d75e5c53241ac76e8fdcb6fc66ade68354687c This works without errors with my latest patch. (Though it's not smart enough to recognize the "src/comp.c" as a file to check, since there's no leading "* ".) I think that's probably ok; it'd be hard to detect cases like that reliably. > 0bd96806ef1a0f0d2d3f48cdb1204b7e393ab036 This fails, correctly I think. The first line of the commit message is "* Rename `comp--typeof-builtin-types'", which I don't think we should allow (going forward, at least). Since it's in the first line, we *could* treat that specially, but I'm not sure it's worth the complexity. Committers can just avoid starting lines with "*" (for example, by using " *" instead). > eff42dc0af741cc56c52d7d9577d29fc16f9f665 This also works without errors. An indented "*" is ok, and not checked by these hooks. > b5f70c239e87e5f38fd70181ef75cd28a43a8b41 This fails for two reasons: first, there's a line like this: "* buffer-match-p and match-buffers", which is recognized as a list of file names. That looks like auto-fill-mode (or something similar) adding a "* " where it shouldn't. That's happened to me before, so I'd be glad for the hook to catch this. It also fails because of this line: "* lisp/window.el (display-buffer-assq-regexp): Mention what happens". That's correct too, since "lisp/window.el" wasn't changed in this commit. If you apply my patch, you can also test out other commits via "echo COMMIT-SHA | awk -f build-aux/git-hooks/commit-msg-files.awk". (I'll do this locally as well.) > Also, it looks like your script doesn't recognize file names in a line > that starts with a semi-colon, as in this commit: I fixed this case, though as far as I can tell, authors.el doesn't look at lines like this. (I could be wrong, since I just read over that code briefly.) > What about the opposite: a file is mentioned in the diff, but not in > the commit message? Or maybe that is allowed, and we shouldn't block > it? I think that's ok. Robert Pluim mentions a couple of cases, and we probably also want to allow commit messages like: "; Fix last change". [-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --] [-- Type: text/plain, Size: 8210 bytes --] From f849fa082f0d7c8c3d472120e91e155b8700c65c Mon Sep 17 00:00:00 2001 From: Jim Porter <jporterbugs@gmail.com> Date: Wed, 12 Apr 2023 23:03:31 -0700 Subject: [PATCH] Add Git hooks to check filenames listed in the commit message * build-aux/git-hooks/commit-msg-files.awk: * build-aux/git-hooks/post-commit: * build-aux/git-hooks/pre-push: New files... * autogen.sh: ... add them. --- autogen.sh | 2 +- build-aux/git-hooks/commit-msg-files.awk | 91 ++++++++++++++++++++++++ build-aux/git-hooks/post-commit | 29 ++++++++ build-aux/git-hooks/pre-push | 66 +++++++++++++++++ 4 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 build-aux/git-hooks/commit-msg-files.awk create mode 100755 build-aux/git-hooks/post-commit create mode 100755 build-aux/git-hooks/pre-push diff --git a/autogen.sh b/autogen.sh index af4c2ad14df..71d7ac89abf 100755 --- a/autogen.sh +++ b/autogen.sh @@ -340,7 +340,7 @@ hooks= tailored_hooks= sample_hooks= -for hook in commit-msg pre-commit prepare-commit-msg; do +for hook in commit-msg pre-commit prepare-commit-msg post-commit pre-push commit-msg-files.awk; do cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 || tailored_hooks="$tailored_hooks $hook" done diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk new file mode 100644 index 00000000000..c1edccdf3e5 --- /dev/null +++ b/build-aux/git-hooks/commit-msg-files.awk @@ -0,0 +1,91 @@ +# Check the file list of GNU Emacs change log entries for each commit SHA. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +function get_commit_changes(commit_sha, changes, i, j, len, bits, filename) { + # Collect all the files touched in the specified commit. + while ((("git log -1 --name-status --format= " commit_sha) | getline) > 0) { + for (i = 2; i <= NF; i++) { + len = split($i, bits, "/") + for (j = 1; j <= len; j++) { + if (j == 1) + filename = bits[j] + else + filename = filename "/" bits[j] + changes[filename] = 1 + } + } + } +} + +function check_commit_msg_files(commit_sha, verbose, changes, good, msg, \ + filenames_str, filenames, i) { + get_commit_changes(commit_sha, changes) + good = 1 + + while ((("git log -1 --format=%B " commit_sha) | getline) > 0) { + if (verbose && ! msg) + msg = $0 + + # Find lines that reference files. We look at any line starting + # with "*" (possibly prefixed by "; ") where the file part starts + # with an alphanumeric character. The file part ends if we + # encounter any of the following characters: [ ( < { : + if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, "[[:alnum:]][^[(<{:]*")) { + # There might be multiple files listed on this line, separated + # by a comma and/or space. Iterate over each of them. + split(substr($0, RSTART, RLENGTH), filenames, "[[:blank:],][[:blank:]]*") + for (i in filenames) { + if (length(filenames[i]) && ! (filenames[i] in changes)) { + if (good) { + # Print a header describing the error. + if (verbose) + printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg) + printf("Files listed in commit message, but not in diff:\n") + } + printf(" %s\n", filenames[i]) + good = 0 + } + } + } + } + + return good +} + +BEGIN { + if (reason == "pre-push") + verbose = 1 +} + +/^[a-z0-9]{40}$/ { + if (! check_commit_msg_files($0, verbose)) { + status = 1 + } +} + +END { + if (status != 0) { + if (reason == "pre-push") + error_msg = "Push aborted" + else + error_msg = "Bad commit message" + printf("%s; please see the file 'CONTRIBUTE'\n", error_msg) + } + exit status +} diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit new file mode 100755 index 00000000000..4c30ec76e02 --- /dev/null +++ b/build-aux/git-hooks/post-commit @@ -0,0 +1,29 @@ +#!/bin/sh +# Check the file list of GNU Emacs change log entries after committing. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +# Prefer gawk if available, as it handles NUL bytes properly. +if type gawk >/dev/null 2>&1; then + awk="gawk" +else + awk="awk" +fi + +git rev-parse HEAD | $awk -v reason=post-commit \ + -f .git/hooks/commit-msg-files.awk diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push new file mode 100755 index 00000000000..136f84e6691 --- /dev/null +++ b/build-aux/git-hooks/pre-push @@ -0,0 +1,66 @@ +#!/bin/sh +# Check the file list of GNU Emacs change log entries before pushing. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +# Prefer gawk if available, as it handles NUL bytes properly. +if type gawk >/dev/null 2>&1; then + awk="gawk" +else + awk="awk" +fi + +# Standard input receives lines of the form: +# <local ref> SP <local name> SP <remote ref> SP <remote name> LF +$awk -v origin_name="$1" ' + # If the local SHA is all zeroes, ignore it. + $2 ~ /^0{40}$/ { + next + } + + $2 ~ /^[a-z0-9]{40}$/ { + newref = $2 + # If the remote SHA is all zeroes, this is a new object to be + # pushed (likely a branch). Go backwards until we find a SHA on + # an origin branch. + if ($4 ~ /^0{40}$/) { + back = 0 + while ((("git branch -r -l '\''" origin_name "/*'\'' --contains " \ + newref "~" back) | getline) == 0) { + + # Only look back at most 1000 commits, just in case... + if (back++ > 1000) + break; + } + + ("git rev-parse " newref "~" back) | getline oldref + if (!(oldref ~ /^[a-z0-9]{40}$/)) { + # The SHA is misformatted! Skip this line. + next + } + } else if ($4 ~ /^[a-z0-9]{40}$/) { + oldref = $4 + } else { + # The SHA is misformatted! Skip this line. + next + } + + # Print every SHA after oldref, up to (and including) newref. + system("git rev-list --reverse " oldref ".." newref) + } +' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk -- 2.25.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 3:41 ` Jim Porter @ 2023-04-15 5:45 ` Jim Porter 2023-04-15 7:15 ` Eli Zaretskii 2023-04-15 20:54 ` Jim Porter 2023-04-15 7:08 ` Eli Zaretskii 1 sibling, 2 replies; 31+ messages in thread From: Jim Porter @ 2023-04-15 5:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1870 bytes --] On 4/14/2023 8:41 PM, Jim Porter wrote: > Ok, I think this patch should work, though I'll continue testing it > locally before I merge it. We could also merge this patch sooner if we > temporarily made the pre-push hook advisory (i.e. doesn't block a push). > That way, others can try these hooks out without it blocking anyone's > work. Then when we're happy with it, we can make the pre-push hook error > out on bad commit messages. Here are a couple small fixes after testing against the 2000 latest commits (though I haven't examined *all* the errors in detail). I fixed the hooks so that they actually close the pipes we use, and also made a regexp a bit more strict. Previously, it split file names by a comma and/or a space, but there are a few files with commas in their names, e.g. "test/lisp/net/eudc-resources/dc=gnu,dc=org.ldif". Now it handles those properly. Testing this, there are a bunch of commits where the first line is of the form "* Do something". Maybe the hooks should ignore cases like that, or we could just disallow that going forward. I don't have a strong opinion. There's also one more commit I'm not quite sure what to do about: 0a95a81d8d36722ccf030a6194ecd953fc257a59. It has this in the commit message: This fixes bug #61144. If the space around the * is "symmetric" we leave foo * bar unfontified, a multiplication operation. If it is "asymmetric" we fontify it as a pointer declaration. The second line of this excerpt is treated like a file entry. Maybe our hook could allow that if it were clever enough, or maybe this is a rare enough occurrence that we should just have committers reformat the message slightly. Almost all the other errors are due to commits from packages that were imported into the Emacs repo and their files moved around during the import (e.g. Eglot, use-package). [-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --] [-- Type: text/plain, Size: 8386 bytes --] From 6e2a73dfba7c0b1a7b4000f9305b666911f4a171 Mon Sep 17 00:00:00 2001 From: Jim Porter <jporterbugs@gmail.com> Date: Wed, 12 Apr 2023 23:03:31 -0700 Subject: [PATCH] Add Git hooks to check filenames listed in the commit message * build-aux/git-hooks/commit-msg-files.awk: * build-aux/git-hooks/post-commit: * build-aux/git-hooks/pre-push: New files... * autogen.sh: ... add them. --- autogen.sh | 3 +- build-aux/git-hooks/commit-msg-files.awk | 96 ++++++++++++++++++++++++ build-aux/git-hooks/post-commit | 29 +++++++ build-aux/git-hooks/pre-push | 70 +++++++++++++++++ 4 files changed, 197 insertions(+), 1 deletion(-) create mode 100644 build-aux/git-hooks/commit-msg-files.awk create mode 100755 build-aux/git-hooks/post-commit create mode 100755 build-aux/git-hooks/pre-push diff --git a/autogen.sh b/autogen.sh index af4c2ad14df..6127e7b24f4 100755 --- a/autogen.sh +++ b/autogen.sh @@ -340,7 +340,8 @@ hooks= tailored_hooks= sample_hooks= -for hook in commit-msg pre-commit prepare-commit-msg; do +for hook in commit-msg pre-commit prepare-commit-msg post-commit \ + pre-push commit-msg-files.awk; do cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 || tailored_hooks="$tailored_hooks $hook" done diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk new file mode 100644 index 00000000000..da066b83bdd --- /dev/null +++ b/build-aux/git-hooks/commit-msg-files.awk @@ -0,0 +1,96 @@ +# Check the file list of GNU Emacs change log entries for each commit SHA. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +function get_commit_changes(commit_sha, changes, cmd, i, j, len, \ + bits, filename) { + # Collect all the files touched in the specified commit. + cmd = ("git log -1 --name-status --format= " commit_sha) + while ((cmd | getline) > 0) { + for (i = 2; i <= NF; i++) { + len = split($i, bits, "/") + for (j = 1; j <= len; j++) { + if (j == 1) + filename = bits[j] + else + filename = filename "/" bits[j] + changes[filename] = 1 + } + } + } + close(cmd) +} + +function check_commit_msg_files(commit_sha, verbose, changes, good, \ + cmd, msg, filenames_str, filenames, i) { + get_commit_changes(commit_sha, changes) + good = 1 + + cmd = ("git log -1 --format=%B " commit_sha) + while ((cmd | getline) > 0) { + if (verbose && ! msg) + msg = $0 + + # Find lines that reference files. We look at any line starting + # with "*" (possibly prefixed by "; ") where the file part starts + # with an alphanumeric character. The file part ends if we + # encounter any of the following characters: [ ( < { : + if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, "[[:alnum:]][^[(<{:]*")) { + # There might be multiple files listed on this line, separated + # by spaces (and possibly a comma). Iterate over each of them. + split(substr($0, RSTART, RLENGTH), filenames, ",?[[:blank:]]+") + for (i in filenames) { + if (length(filenames[i]) && ! (filenames[i] in changes)) { + if (good) { + # Print a header describing the error. + if (verbose) + printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg) + printf("Files listed in commit message, but not in diff:\n") + } + printf(" %s\n", filenames[i]) + good = 0 + } + } + } + } + close(cmd) + + return good +} + +BEGIN { + if (reason == "pre-push") + verbose = 1 +} + +/^[a-z0-9]{40}$/ { + if (! check_commit_msg_files($0, verbose)) { + status = 1 + } +} + +END { + if (status != 0) { + if (reason == "pre-push") + error_msg = "Push aborted" + else + error_msg = "Bad commit message" + printf("%s; please see the file 'CONTRIBUTE'\n", error_msg) + } + exit status +} diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit new file mode 100755 index 00000000000..4c30ec76e02 --- /dev/null +++ b/build-aux/git-hooks/post-commit @@ -0,0 +1,29 @@ +#!/bin/sh +# Check the file list of GNU Emacs change log entries after committing. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +# Prefer gawk if available, as it handles NUL bytes properly. +if type gawk >/dev/null 2>&1; then + awk="gawk" +else + awk="awk" +fi + +git rev-parse HEAD | $awk -v reason=post-commit \ + -f .git/hooks/commit-msg-files.awk diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push new file mode 100755 index 00000000000..b0185a97b28 --- /dev/null +++ b/build-aux/git-hooks/pre-push @@ -0,0 +1,70 @@ +#!/bin/sh +# Check the file list of GNU Emacs change log entries before pushing. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +# Prefer gawk if available, as it handles NUL bytes properly. +if type gawk >/dev/null 2>&1; then + awk="gawk" +else + awk="awk" +fi + +# Standard input receives lines of the form: +# <local ref> SP <local name> SP <remote ref> SP <remote name> LF +$awk -v origin_name="$1" ' + # If the local SHA is all zeroes, ignore it. + $2 ~ /^0{40}$/ { + next + } + + $2 ~ /^[a-z0-9]{40}$/ { + newref = $2 + # If the remote SHA is all zeroes, this is a new object to be + # pushed (likely a branch). Go backwards until we find a SHA on + # an origin branch. + if ($4 ~ /^0{40}$/) { + back = 0 + cmd = ("git branch -r -l '\''" origin_name "/*'\'' --contains " \ + newref "~" back) + while ((cmd | getline) == 0) { + + # Only look back at most 1000 commits, just in case... + if (back++ > 1000) + break; + } + close(cmd) + + cmd = ("git rev-parse " newref "~" back) + cmd | getline oldref + if (!(oldref ~ /^[a-z0-9]{40}$/)) { + # The SHA is misformatted! Skip this line. + next + } + close(cmd) + } else if ($4 ~ /^[a-z0-9]{40}$/) { + oldref = $4 + } else { + # The SHA is misformatted! Skip this line. + next + } + + # Print every SHA after oldref, up to (and including) newref. + system("git rev-list --reverse " oldref ".." newref) + } +' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk -- 2.25.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 5:45 ` Jim Porter @ 2023-04-15 7:15 ` Eli Zaretskii 2023-04-15 10:44 ` Alan Mackenzie 2023-04-15 20:54 ` Jim Porter 1 sibling, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2023-04-15 7:15 UTC (permalink / raw) To: Jim Porter; +Cc: acm, emacs-devel > Date: Fri, 14 Apr 2023 22:45:44 -0700 > From: Jim Porter <jporterbugs@gmail.com> > Cc: acm@muc.de, emacs-devel@gnu.org > > There's also one more commit I'm not quite sure what to do about: > 0a95a81d8d36722ccf030a6194ecd953fc257a59. It has this in the commit message: > > This fixes bug #61144. If the space around the * is "symmetric" we > leave foo > * bar unfontified, a multiplication operation. If it is > "asymmetric" we > fontify it as a pointer declaration. > > The second line of this excerpt is treated like a file entry. Maybe our > hook could allow that if it were clever enough, or maybe this is a rare > enough occurrence that we should just have committers reformat the > message slightly. The latter, IMO. There's absolutely no reason for a log message to have a '*' as the first non-whitespace character of a line. Fixing that is easy. Btw, Alan, this is a good example of your log messages, which are filled using a much larger value of fill-column that the default. The resulting ChangeLog entries appear ugly and stand out from the rest. Would you please adjust your customizations to use a smaller fill-column value, like at most 76, when writing commit log messages? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 7:15 ` Eli Zaretskii @ 2023-04-15 10:44 ` Alan Mackenzie 2023-04-15 11:00 ` Eli Zaretskii 2023-04-21 22:16 ` Filipp Gunbin 0 siblings, 2 replies; 31+ messages in thread From: Alan Mackenzie @ 2023-04-15 10:44 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Jim Porter, emacs-devel Hello, Eli. On Sat, Apr 15, 2023 at 10:15:19 +0300, Eli Zaretskii wrote: > > Date: Fri, 14 Apr 2023 22:45:44 -0700 > > From: Jim Porter <jporterbugs@gmail.com> > > Cc: acm@muc.de, emacs-devel@gnu.org > > There's also one more commit I'm not quite sure what to do about: > > 0a95a81d8d36722ccf030a6194ecd953fc257a59. It has this in the commit message: > > This fixes bug #61144. If the space around the * is "symmetric" we > > leave foo > > * bar unfontified, a multiplication operation. If it is > > "asymmetric" we > > fontify it as a pointer declaration. > > The second line of this excerpt is treated like a file entry. Maybe our > > hook could allow that if it were clever enough, or maybe this is a rare > > enough occurrence that we should just have committers reformat the > > message slightly. > The latter, IMO. There's absolutely no reason for a log message to > have a '*' as the first non-whitespace character of a line. Fixing > that is easy. It can happen easily as a result of filling. I think it would be good if one of Jim's scripts flagged it up as an error. What is less easy than it might be, is correcting an erroneous commit message and submitting the commit again using this correction. The erroneous message remains in ..../.git/COMMIT_EDITMSG but I'm not sure how to get it back into one's editor at the next commit attempt. Maybe the error message could give some help, here. > Btw, Alan, this is a good example of your log messages, which are > filled using a much larger value of fill-column that the default. The > resulting ChangeLog entries appear ugly and stand out from the rest. > Would you please adjust your customizations to use a smaller > fill-column value, like at most 76, when writing commit log messages? I've been using a fill-column of 78, to match the maximum allowed width. I'll try cutting that down to 63, as recommended in CONTRIBUTE. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 10:44 ` Alan Mackenzie @ 2023-04-15 11:00 ` Eli Zaretskii 2023-04-21 22:16 ` Filipp Gunbin 1 sibling, 0 replies; 31+ messages in thread From: Eli Zaretskii @ 2023-04-15 11:00 UTC (permalink / raw) To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel > Date: Sat, 15 Apr 2023 10:44:42 +0000 > Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > Btw, Alan, this is a good example of your log messages, which are > > filled using a much larger value of fill-column that the default. The > > resulting ChangeLog entries appear ugly and stand out from the rest. > > Would you please adjust your customizations to use a smaller > > fill-column value, like at most 76, when writing commit log messages? > > I've been using a fill-column of 78, to match the maximum allowed width. When we produce ChangeLog from the Git logs, the messages are indented on tab stop, so 78 is way too large. Even just "git log" will show you that 78 is too large, since Git indents the log messages by 4 spaces. > I'll try cutting that down to 63, as recommended in CONTRIBUTE. Thank you. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 10:44 ` Alan Mackenzie 2023-04-15 11:00 ` Eli Zaretskii @ 2023-04-21 22:16 ` Filipp Gunbin 1 sibling, 0 replies; 31+ messages in thread From: Filipp Gunbin @ 2023-04-21 22:16 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, Jim Porter, emacs-devel Hi Alan, On 15/04/2023 10:44 +0000, Alan Mackenzie wrote: [...] > What is less easy than it might be, is correcting an erroneous commit > message and submitting the commit again using this correction. The > erroneous message remains in ..../.git/COMMIT_EDITMSG but I'm not sure > how to get it back into one's editor at the next commit attempt. > Maybe the error message could give some help, here. [...] *vc-log* remembers previous commit messages, and they can be brought back by M-p and M-n. Or are you talking about something different here? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 5:45 ` Jim Porter 2023-04-15 7:15 ` Eli Zaretskii @ 2023-04-15 20:54 ` Jim Porter 2023-04-15 21:23 ` Jim Porter 2023-04-21 4:59 ` Jim Porter 1 sibling, 2 replies; 31+ messages in thread From: Jim Porter @ 2023-04-15 20:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel [-- Attachment #1: Type: text/plain, Size: 3743 bytes --] On 4/14/2023 10:45 PM, Jim Porter wrote: > Here are a couple small fixes after testing against the 2000 latest > commits (though I haven't examined *all* the errors in detail). Ok, I've now gone over the 5000 latest commits in some closer detail. Here are the main classes of error I see: 1. Commit message lines starting with "*" that aren't file entries. 2. Commit messages from Eglot's import (these aren't *really* a problem, and I think authors.el is smart enough to handle them; if something like this happened in future, we could just use "git push --no-verify" to ignore this hook). 3. Obviously-wrong typos of file names (including files listed in the commit message that exist, but aren't in the diff). I think erroring out on all three of these cases makes sense. Number 2 isn't truly an error, but it would be hard to detect that reliably in the hook (at least not without adding a list of exceptions), and it's rare enough that we can just use "--no-verify" as needed (or fix up the commit messages during the import). There are a few cases that are more interesting though: A. Wildcards in filenames ------------------------- 626f2f7441, 623db40dd1, 4b5414abbc, c32212bf96, f4bfe7834a, 23c1ee6989, 7e185bc9ba, 36474a1e49, 5fb8b20fa3, c45bfd3c4a, fcc7373b28, 8e9835c673, 952550258d, d1a7d16f8e, dc083ebc4e, 5f74397490, 0f85f2c0e5, 49bad2a0a6, 5a77517e7d, 24b9515da0, ae3904bb5d, 228d9d615d, 8461cfc8fc, e402887d5d, ab7dddea90, 9e4f11a163, 6342264ef7 These cases use wildcards in the file names. I think we should avoid doing this, for the same reason the GNU ChangeLog standards say to avoid doing it for function names: "If you mention the names of the modified functions or variables, it’s important to name them in full. ... Subsequent maintainers will often search for a function name to find all the change log entries that pertain to it; if you abbreviate the name, they won’t find it when they search."[1] Therefore, I didn't do anything about these in the latest revision of the hook: they're still errors. For committers that don't want to list every file explicitly, they could just add an entry for the directory, *without* any wildcards. B. File entry for a directory, using a trailing "/" --------------------------------------------------- 9b80fe55f9, 28d0654943, d9c94e93b7, 7d0dc31833, 16975078b4, 07235678a4, c55f4055dd, 9451ea0a05, f342b7c969 I think this is perfectly ok, so I've fixed the hook to allow this (see attached). C. Line-wrapped file entries ---------------------------- 8675f4136c, 0e25a39e69, 7c79eea51d, 335a5fd173, 31f8ae53be These are cases where file entries extend multiple lines. In the first example, a single file is split onto two lines, which I don't think is feasible to identify. In the other cases, the entries are formatted like so: * path/to/file.txt, path/to/other/file.txt: Change something. I've partially fixed this so that it can properly parse the first line, but handling the second line is tricky, so I didn't bother. It would be better if committers used this format anyway: * path/to/file.txt: * path/to/other/file.txt: Change something. D. Merge commit with file entries (and no diff) ----------------------------------------------- 289b457cac This is an odd case: it's a merge commit that mentions a bunch of file changes, but there's no diff in this commit. I'm not really sure what to do about this one, but maybe we shouldn't add file entries for cases like this under normal circumstances. If there's an exception to that rule, we can always push a commit like this with "--no-verify". [1] https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs [-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --] [-- Type: text/plain, Size: 8987 bytes --] From db23241d5edd61acca14899c804eb39e13385879 Mon Sep 17 00:00:00 2001 From: Jim Porter <jporterbugs@gmail.com> Date: Wed, 12 Apr 2023 23:03:31 -0700 Subject: [PATCH] Add Git hooks to check filenames listed in the commit message * build-aux/git-hooks/commit-msg-files.awk: * build-aux/git-hooks/post-commit: * build-aux/git-hooks/pre-push: New files... * autogen.sh: ... add them. --- autogen.sh | 3 +- build-aux/git-hooks/commit-msg-files.awk | 113 +++++++++++++++++++++++ build-aux/git-hooks/post-commit | 29 ++++++ build-aux/git-hooks/pre-push | 70 ++++++++++++++ 4 files changed, 214 insertions(+), 1 deletion(-) create mode 100644 build-aux/git-hooks/commit-msg-files.awk create mode 100755 build-aux/git-hooks/post-commit create mode 100755 build-aux/git-hooks/pre-push diff --git a/autogen.sh b/autogen.sh index af4c2ad14df..6127e7b24f4 100755 --- a/autogen.sh +++ b/autogen.sh @@ -340,7 +340,8 @@ hooks= tailored_hooks= sample_hooks= -for hook in commit-msg pre-commit prepare-commit-msg; do +for hook in commit-msg pre-commit prepare-commit-msg post-commit \ + pre-push commit-msg-files.awk; do cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 || tailored_hooks="$tailored_hooks $hook" done diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk new file mode 100644 index 00000000000..4566905657d --- /dev/null +++ b/build-aux/git-hooks/commit-msg-files.awk @@ -0,0 +1,113 @@ +# Check the file list of GNU Emacs change log entries for each commit SHA. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +### Commentary: + +# This script accepts a list of (unabbreviated) Git commit SHAs, and +# will then iterate over them to check that any files mentioned in the +# commit message are actually present in the commit's diff. If not, +# it will print out the incorrect file names and return 1. + +# You can also pass "-v reason=pre-push", which will add more-verbose +# output, indicate the abbreviated commit SHA and first line of the +# commit message for any improper commits. + +### Code: + +function get_commit_changes(commit_sha, changes, cmd, i, j, len, \ + bits, filename) { + # Collect all the files touched in the specified commit. + cmd = ("git log -1 --name-status --format= " commit_sha) + while ((cmd | getline) > 0) { + for (i = 2; i <= NF; i++) { + len = split($i, bits, "/") + for (j = 1; j <= len; j++) { + if (j == 1) + filename = bits[j] + else + filename = filename "/" bits[j] + changes[filename] = 1 + } + } + } + close(cmd) +} + +function check_commit_msg_files(commit_sha, verbose, changes, good, \ + cmd, msg, filenames_str, filenames, i) { + get_commit_changes(commit_sha, changes) + good = 1 + + cmd = ("git log -1 --format=%B " commit_sha) + while ((cmd | getline) > 0) { + if (verbose && ! msg) + msg = $0 + + # Find lines that reference files. We look at any line starting + # with "*" (possibly prefixed by "; ") where the file part starts + # with an alphanumeric character. The file part ends if we + # encounter any of the following characters: [ ( < { : + if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, /[[:alnum:]][^[(<{:]*/)) { + # There might be multiple files listed on this line, separated + # by spaces (and possibly a comma). Iterate over each of them. + split(substr($0, RSTART, RLENGTH), filenames, ",?([[:blank:]]+|$)") + + for (i in filenames) { + # Remove trailing slashes from any directory entries. + sub(/\/$/, "", filenames[i]) + + if (length(filenames[i]) && ! (filenames[i] in changes)) { + if (good) { + # Print a header describing the error. + if (verbose) + printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg) + printf("Files listed in commit message, but not in diff:\n") + } + printf(" %s\n", filenames[i]) + good = 0 + } + } + } + } + close(cmd) + + return good +} + +BEGIN { + if (reason == "pre-push") + verbose = 1 +} + +/^[a-z0-9]{40}$/ { + if (! check_commit_msg_files($0, verbose)) { + status = 1 + } +} + +END { + if (status != 0) { + if (reason == "pre-push") + error_msg = "Push aborted" + else + error_msg = "Bad commit message" + printf("%s; please see the file 'CONTRIBUTE'\n", error_msg) + } + exit status +} diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit new file mode 100755 index 00000000000..4c30ec76e02 --- /dev/null +++ b/build-aux/git-hooks/post-commit @@ -0,0 +1,29 @@ +#!/bin/sh +# Check the file list of GNU Emacs change log entries after committing. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +# Prefer gawk if available, as it handles NUL bytes properly. +if type gawk >/dev/null 2>&1; then + awk="gawk" +else + awk="awk" +fi + +git rev-parse HEAD | $awk -v reason=post-commit \ + -f .git/hooks/commit-msg-files.awk diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push new file mode 100755 index 00000000000..b0185a97b28 --- /dev/null +++ b/build-aux/git-hooks/pre-push @@ -0,0 +1,70 @@ +#!/bin/sh +# Check the file list of GNU Emacs change log entries before pushing. + +# Copyright 2023 Free Software Foundation, Inc. + +# This file is part of GNU Emacs. + +# GNU Emacs is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# GNU Emacs is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +# Prefer gawk if available, as it handles NUL bytes properly. +if type gawk >/dev/null 2>&1; then + awk="gawk" +else + awk="awk" +fi + +# Standard input receives lines of the form: +# <local ref> SP <local name> SP <remote ref> SP <remote name> LF +$awk -v origin_name="$1" ' + # If the local SHA is all zeroes, ignore it. + $2 ~ /^0{40}$/ { + next + } + + $2 ~ /^[a-z0-9]{40}$/ { + newref = $2 + # If the remote SHA is all zeroes, this is a new object to be + # pushed (likely a branch). Go backwards until we find a SHA on + # an origin branch. + if ($4 ~ /^0{40}$/) { + back = 0 + cmd = ("git branch -r -l '\''" origin_name "/*'\'' --contains " \ + newref "~" back) + while ((cmd | getline) == 0) { + + # Only look back at most 1000 commits, just in case... + if (back++ > 1000) + break; + } + close(cmd) + + cmd = ("git rev-parse " newref "~" back) + cmd | getline oldref + if (!(oldref ~ /^[a-z0-9]{40}$/)) { + # The SHA is misformatted! Skip this line. + next + } + close(cmd) + } else if ($4 ~ /^[a-z0-9]{40}$/) { + oldref = $4 + } else { + # The SHA is misformatted! Skip this line. + next + } + + # Print every SHA after oldref, up to (and including) newref. + system("git rev-list --reverse " oldref ".." newref) + } +' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk -- 2.25.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 20:54 ` Jim Porter @ 2023-04-15 21:23 ` Jim Porter 2023-04-16 5:43 ` Eli Zaretskii 2023-04-21 4:59 ` Jim Porter 1 sibling, 1 reply; 31+ messages in thread From: Jim Porter @ 2023-04-15 21:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel On 4/15/2023 1:54 PM, Jim Porter wrote: > On 4/14/2023 10:45 PM, Jim Porter wrote: >> Here are a couple small fixes after testing against the 2000 latest >> commits (though I haven't examined *all* the errors in detail). > > Ok, I've now gone over the 5000 latest commits in some closer detail. Oh, and just to give an idea of how common incorrect commit messages are, my hook says that out of the last 5000 commits, 866 have a problem, excluding the Eglot commits. If I also ignore lines starting with "*" that have no "/" in them (these are almost always due to an unnecessary leading "*" on the first line of the commit), we get down to 196 bad commit messages, or 3.9% of commits. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 21:23 ` Jim Porter @ 2023-04-16 5:43 ` Eli Zaretskii 2023-04-16 20:06 ` Jim Porter 0 siblings, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2023-04-16 5:43 UTC (permalink / raw) To: Jim Porter; +Cc: acm, emacs-devel > Date: Sat, 15 Apr 2023 14:23:02 -0700 > From: Jim Porter <jporterbugs@gmail.com> > Cc: acm@muc.de, emacs-devel@gnu.org > > Oh, and just to give an idea of how common incorrect commit messages > are, my hook says that out of the last 5000 commits, 866 have a problem, > excluding the Eglot commits. > > If I also ignore lines starting with "*" that have no "/" in them (these > are almost always due to an unnecessary leading "*" on the first line of > the commit), we get down to 196 bad commit messages, or 3.9% of commits. And that's _a_lot_, no matter how you count them. Which should give people here some idea why I raised this issue in the first place: I needed to fix all those cases by hand while working on the pretest tarball. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-16 5:43 ` Eli Zaretskii @ 2023-04-16 20:06 ` Jim Porter 2023-04-16 20:19 ` Michael Albinus 2023-04-17 2:22 ` Eli Zaretskii 0 siblings, 2 replies; 31+ messages in thread From: Jim Porter @ 2023-04-16 20:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel On 4/15/2023 10:43 PM, Eli Zaretskii wrote: > And that's _a_lot_, no matter how you count them. Which should give > people here some idea why I raised this issue in the first place: I > needed to fix all those cases by hand while working on the pretest > tarball. Yeah, it's quite a bit more than I expected. Overall, I think the hooks are working pretty well now, and all the false positives should (hopefully) be resolved. Assuming there are no further issues, I'll merge them in a few days. Worst-case scenario, committers would just need to pass "--no-verify" to skip the hooks (or uninstall them temporarily), and then file a bug about any problems they find with the hooks. Another, more-general thing we could consider is to add a job on EMBA that tries to run through the automatable steps of making pretest tarballs, just to see if they complete without errors. Then we could set that to run, say, once a week, so we could identify problems earlier and fix them (e.g. by adding new Git hooks, or just sending a message to emacs-devel about it). ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-16 20:06 ` Jim Porter @ 2023-04-16 20:19 ` Michael Albinus 2023-04-17 2:22 ` Eli Zaretskii 1 sibling, 0 replies; 31+ messages in thread From: Michael Albinus @ 2023-04-16 20:19 UTC (permalink / raw) To: Jim Porter; +Cc: Eli Zaretskii, acm, emacs-devel Jim Porter <jporterbugs@gmail.com> writes: Hi Jim, > Another, more-general thing we could consider is to add a job on EMBA > that tries to run through the automatable steps of making pretest > tarballs, just to see if they complete without errors. Then we could > set that to run, say, once a week, so we could identify problems > earlier and fix them (e.g. by adding new Git hooks, or just sending a > message to emacs-devel about it). Patches welcome :-) Best regards, Michael. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-16 20:06 ` Jim Porter 2023-04-16 20:19 ` Michael Albinus @ 2023-04-17 2:22 ` Eli Zaretskii 2023-04-17 7:28 ` Michael Albinus 1 sibling, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2023-04-17 2:22 UTC (permalink / raw) To: Jim Porter; +Cc: acm, emacs-devel > Date: Sun, 16 Apr 2023 13:06:58 -0700 > Cc: acm@muc.de, emacs-devel@gnu.org > From: Jim Porter <jporterbugs@gmail.com> > > Another, more-general thing we could consider is to add a job on EMBA > that tries to run through the automatable steps of making pretest > tarballs, just to see if they complete without errors. Then we could set > that to run, say, once a week, so we could identify problems earlier and > fix them (e.g. by adding new Git hooks, or just sending a message to > emacs-devel about it). That'd be too much, I think. We only need to do these jobs once in a blue moon, so having it bother us once a week would be an unnecessary annoyance, I think. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-17 2:22 ` Eli Zaretskii @ 2023-04-17 7:28 ` Michael Albinus 0 siblings, 0 replies; 31+ messages in thread From: Michael Albinus @ 2023-04-17 7:28 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Jim Porter, acm, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: Hi Eli, >> Another, more-general thing we could consider is to add a job on EMBA >> that tries to run through the automatable steps of making pretest >> tarballs, just to see if they complete without errors. Then we could set >> that to run, say, once a week, so we could identify problems earlier and >> fix them (e.g. by adding new Git hooks, or just sending a message to >> emacs-devel about it). > > That'd be too much, I think. We only need to do these jobs once in a > blue moon, so having it bother us once a week would be an unnecessary > annoyance, I think. It's up to us. We could define another schedule, how often to run. We could run it only when a given tag (specified by a regexp) has been applied in git. We could mark to run this job only manually. A combination of this. Whatever. Best regards, Michael. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 20:54 ` Jim Porter 2023-04-15 21:23 ` Jim Porter @ 2023-04-21 4:59 ` Jim Porter 1 sibling, 0 replies; 31+ messages in thread From: Jim Porter @ 2023-04-21 4:59 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, emacs-devel On 4/15/2023 1:54 PM, Jim Porter wrote: > On 4/14/2023 10:45 PM, Jim Porter wrote: >> Here are a couple small fixes after testing against the 2000 latest >> commits (though I haven't examined *all* the errors in detail). > > Ok, I've now gone over the 5000 latest commits in some closer detail. I've now merged this patch as commit 4416262f59f and sent a separate announcement to the list so that (hopefully) everyone will see it. I'm reasonably sure I've ironed out all the bugs, but just let me know if I missed something. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-15 3:41 ` Jim Porter 2023-04-15 5:45 ` Jim Porter @ 2023-04-15 7:08 ` Eli Zaretskii 1 sibling, 0 replies; 31+ messages in thread From: Eli Zaretskii @ 2023-04-15 7:08 UTC (permalink / raw) To: Jim Porter; +Cc: acm, emacs-devel > Date: Fri, 14 Apr 2023 20:41:49 -0700 > Cc: acm@muc.de, emacs-devel@gnu.org > From: Jim Porter <jporterbugs@gmail.com> > > > Also, it looks like your script doesn't recognize file names in a line > > that starts with a semi-colon, as in this commit: > > I fixed this case, though as far as I can tell, authors.el doesn't look > at lines like this. (I could be wrong, since I just read over that code > briefly.) authors.el ignores such lines because they don't appear in ChangeLog files: they are ignored when generating ChangeLog from the Git log. (authors.el works on ChangeLog* files, not on the Git logs.) But that doesn't mean we don't want the file names in those logs to be correct; this script we are discussing is not just for the benefit of producing AUTHORS with less manual work, it is about our commit logs being cleaner. Thanks. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 19:36 ` Eli Zaretskii 2023-04-12 0:20 ` Jim Porter @ 2023-04-12 9:41 ` Alan Mackenzie 2023-04-12 10:14 ` Eli Zaretskii 1 sibling, 1 reply; 31+ messages in thread From: Alan Mackenzie @ 2023-04-12 9:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Jim Porter, emacs-devel Hello, Eli. On Tue, Apr 11, 2023 at 22:36:48 +0300, Eli Zaretskii wrote: > > Date: Tue, 11 Apr 2023 12:27:40 -0700 > > Cc: acm@muc.de, emacs-devel@gnu.org > > From: Jim Porter <jporterbugs@gmail.com> > > One issue with the current implementation (mentioned elsewhere in this > > thread) is that it doesn't work for deleting a file from the repo. > > Fixing this in the commit-msg hook is tricky. We can get the list of > > changed files via "git diff --name-only" (and then we could compare our > > commit message against that list), but there's a problem: I don't know > > of a good way to detect when the user is amending a commit[1]. For a > > normal commit, you'd get the changed files via something like "git diff > > --staged --name-only", but for an amended commit, you'd want to add > > "HEAD^" to that command. > If this will cry wolf on every removal or rename of a file, then this > cure is worse than the disease, IMNSHO. I'd prefer to waste a few > hours of my time when preparing the tarball than risk annoying > everyone with such false positives. OK, so checking the existence of files isn't a good way to tackle the problem. Jim's got some good ideas for an alternative. Coming back to the other part of the problem, what exactly happened when you ran `authors'? When I tried it yesterday, apart from me not having the locale en_US.UTF-8, it ran to completion, producing the output and a list of missing files together with their "authors". What am I missing? [ .... ] -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-12 9:41 ` Alan Mackenzie @ 2023-04-12 10:14 ` Eli Zaretskii 0 siblings, 0 replies; 31+ messages in thread From: Eli Zaretskii @ 2023-04-12 10:14 UTC (permalink / raw) To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel > Date: Wed, 12 Apr 2023 09:41:29 +0000 > Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > Coming back to the other part of the problem, what exactly happened when > you ran `authors'? When I tried it yesterday, apart from me not having > the locale en_US.UTF-8, it ran to completion, producing the output and a > list of missing files together with their "authors". What am I missing? You are missing the changes to the databases of authors.el and to the ChangeLog.4 file as it was originally generated from Git logs, which I made shortly prior to the pretest. Of course "M-x authors" runs cleanly now: that was why I needed to do all that work back then. AFAIR, most of your mistakes were corrected in ChangeLog.4 itself, before it was committed for the first time, so there's no easy way of seeing what I fixed there. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Mistakes in commit log messages 2023-04-11 18:31 ` Jim Porter 2023-04-11 18:45 ` Eli Zaretskii @ 2023-04-12 9:32 ` Alan Mackenzie 1 sibling, 0 replies; 31+ messages in thread From: Alan Mackenzie @ 2023-04-12 9:32 UTC (permalink / raw) To: Jim Porter; +Cc: Eli Zaretskii, emacs-devel Hello, Jim. On Tue, Apr 11, 2023 at 11:31:55 -0700, Jim Porter wrote: > On 4/11/2023 7:01 AM, Alan Mackenzie wrote: > > > > + /^\* [a-zA-Z0-9_.~#-]/ { > > + nfiles++ > > + if ((rc = (getline x < $2)) < 0) { > > + status = 1 > > + print "File " $2 " cannot be read: [" ERRNO "]" > > + } > > + close($2) > > + } > One thing to be careful of here (which is why I did the 'match'/'substr' > dance in my patch) is that we need to be able to handle lines like this > correctly: > * some/file.el: New file. Yes, you're right, thanks. These little things are always tripping me up. [ .... ] > Also, using 'getline' will work for checking files, but not directories. > I'm not sure this ever comes up in practice, but it might occur once in > a while if we import a large package into the Emacs tree. We could do a > bit of feature-checking and upgrade to using 'stat' if we have a newish > gawk. That way, gawk users would get proper checks for this (rare) case. > (This might not be strictly necessary.) This, too. > Finally, I think it would make sense to have this be a purely advisory > warning for now so that we could check it into the Emacs tree soon-ish. > As Eli suggests, we can try running with this hook locally, but since > I've already added some Lisp code to generate the changelog for my > workflow, I'm probably never going to trigger this myself. If this check > is just a warning, I think we could be a little more aggressive in > merging this, since it shouldn't break anyone's workflows. I think Eli's already decided the idea of checking for the existence of files isn't a good one, because of the problem of deleting and renaming files. But also, I think having the script just warn is not useful. Git purposefully makes it impossible to correct mistakes in a commit message, so we need to make sure the things are as correct as possible to begin with. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2023-04-21 22:16 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <835ya5m4p0.fsf@gnu.org> [not found] ` <ZDPkykCsW3i30UR9@ACM> [not found] ` <83v8i4arzt.fsf@gnu.org> [not found] ` <CANh=_JF0CEPDsWZSuyy9ymByma2LxcypP90O3-LQ+KhoJ8cqvg@mail.gmail.com> [not found] ` <CANh=_JEO4-E79dPCLc3cRLi7=ftAzc+H1FC46eck1vJN3TD3Sg@mail.gmail.com> 2023-04-11 6:02 ` Mistakes in commit log messages Eli Zaretskii 2023-04-11 14:01 ` Alan Mackenzie 2023-04-11 14:57 ` Eli Zaretskii 2023-04-11 17:20 ` Alan Mackenzie 2023-04-11 18:00 ` Eli Zaretskii 2023-04-11 18:31 ` Jim Porter 2023-04-11 18:45 ` Eli Zaretskii 2023-04-11 19:27 ` Jim Porter 2023-04-11 19:36 ` Eli Zaretskii 2023-04-12 0:20 ` Jim Porter 2023-04-13 6:18 ` Jim Porter 2023-04-13 6:49 ` Eli Zaretskii 2023-04-13 7:47 ` Robert Pluim 2023-04-15 3:41 ` Jim Porter 2023-04-15 5:45 ` Jim Porter 2023-04-15 7:15 ` Eli Zaretskii 2023-04-15 10:44 ` Alan Mackenzie 2023-04-15 11:00 ` Eli Zaretskii 2023-04-21 22:16 ` Filipp Gunbin 2023-04-15 20:54 ` Jim Porter 2023-04-15 21:23 ` Jim Porter 2023-04-16 5:43 ` Eli Zaretskii 2023-04-16 20:06 ` Jim Porter 2023-04-16 20:19 ` Michael Albinus 2023-04-17 2:22 ` Eli Zaretskii 2023-04-17 7:28 ` Michael Albinus 2023-04-21 4:59 ` Jim Porter 2023-04-15 7:08 ` Eli Zaretskii 2023-04-12 9:41 ` Alan Mackenzie 2023-04-12 10:14 ` Eli Zaretskii 2023-04-12 9:32 ` Alan Mackenzie
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).