unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Re: Mistakes in commit log messages
       [not found]       ` <CANh=_JEO4-E79dPCLc3cRLi7=ftAzc+H1FC46eck1vJN3TD3Sg@mail.gmail.com>
@ 2023-04-11  6:02         ` Eli Zaretskii
  2023-04-11 14:01           ` Alan Mackenzie
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-11  6:02 UTC (permalink / raw)
  To: Jim Porter; +Cc: emacs-devel

> From: Jim Porter <jporterbugs@gmail.com>
> Date: Mon, 10 Apr 2023 14:52:15 -0700
> Cc: Alan Mackenzie <acm@muc.de>, philipk@posteo.net, luangruo@yahoo.com
> 
> 
> On Mon, Apr 10, 2023 at 10:18 AM Jim Porter <jporterbugs@gmail.com> wrote:
> > I looked into doing this, and I think it'd be possible to extend the
> > existing commit-msg hook (in build-aux/git-hooks) to do this, at least
> > using gawk. I don't really know awk though, so I'm sure my solution
> > would be clumsy and probably gawk-specific. I wonder if we could make
> > the hooks use Emacs Lisp...
> 
> If someone could figure out how to disable this code on non-gawk awks,
> I think the attached diff should do the trick. Any thoughts?

I think a solution that doesn't use Gawk-specific features would be
preferable, since no one said the mistakes are private only to users
of GNU/Linux and MS-Windows, where Gawk is basically the only Awk.

For the other readers of emacs-devel: this came from a private email I
wrote to several of our active contributors telling them that their
commit log messages included a substantial number of mistakes in file
names mentioned in the log message.  The admin/authors.el program
discovered those mistakes while trying to generate attributions for
who did what in Emacs (the etc/AUTHORS file).  Someone suggested to
augment our commit hooks to avoid such mistakes, at least those of
them that can be easily detected by a simple script.

The script suggested by Jim is below:

> diff --git a/build-aux/git-hooks/commit-msg b/build-aux/git-hooks/commit-msg
> index d0578bcfb46..cdc99f4b399 100755
> --- a/build-aux/git-hooks/commit-msg
> +++ b/build-aux/git-hooks/commit-msg
> @@ -45,6 +45,7 @@ at_sign=
>  
>  # Check the log entry.
>  exec $awk -v at_sign="$at_sign" -v cent_sign="$cent_sign" -v file="$1" '
> +  @load "filefuncs"
>    BEGIN {
>      # These regular expressions assume traditional Unix unibyte behavior.
>      # They are needed for old or broken versions of awk, e.g.,
> @@ -129,6 +130,18 @@ at_sign=
>      status = 1
>    }
>  
> +  /^* / {
> +    # Check that any filenames mentioned in the commit message
> +    # actually exist.  Currently, this only prints a warning to
> +    # prevent potential issues with false positives.
> +    if(match($2, "[^:/][^:]*")) {
> +      FILE = substr($2, RSTART, RLENGTH)
> +      if(stat(FILE, type) < 0) {
> +        printf("Warning: file '\''%s'\'' in commit message not found\n", FILE)
> +      }
> +    }
> +  }
> +
>    $0 ~ unsafe_gnu_url {
>      needs_rewriting = 1
>    }



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11  6:02         ` Mistakes in commit log messages Eli Zaretskii
@ 2023-04-11 14:01           ` Alan Mackenzie
  2023-04-11 14:57             ` Eli Zaretskii
  2023-04-11 18:31             ` Jim Porter
  0 siblings, 2 replies; 31+ messages in thread
From: Alan Mackenzie @ 2023-04-11 14:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Jim Porter, emacs-devel

Hello, Eli and Jim.

On Tue, Apr 11, 2023 at 09:02:05 +0300, Eli Zaretskii wrote:
> > From: Jim Porter <jporterbugs@gmail.com>
> > Date: Mon, 10 Apr 2023 14:52:15 -0700
> > Cc: Alan Mackenzie <acm@muc.de>, philipk@posteo.net, luangruo@yahoo.com


> > On Mon, Apr 10, 2023 at 10:18 AM Jim Porter <jporterbugs@gmail.com> wrote:
> > > I looked into doing this, and I think it'd be possible to extend the
> > > existing commit-msg hook (in build-aux/git-hooks) to do this, at least
> > > using gawk. I don't really know awk though, so I'm sure my solution
> > > would be clumsy and probably gawk-specific. I wonder if we could make
> > > the hooks use Emacs Lisp...

> > If someone could figure out how to disable this code on non-gawk awks,
> > I think the attached diff should do the trick. Any thoughts?

> I think a solution that doesn't use Gawk-specific features would be
> preferable, since no one said the mistakes are private only to users
> of GNU/Linux and MS-Windows, where Gawk is basically the only Awk.

> For the other readers of emacs-devel: this came from a private email I
> wrote to several of our active contributors telling them that their
> commit log messages included a substantial number of mistakes in file
> names mentioned in the log message.  The admin/authors.el program
> discovered those mistakes while trying to generate attributions for
> who did what in Emacs (the etc/AUTHORS file).  Someone suggested to
> augment our commit hooks to avoid such mistakes, at least those of
> them that can be easily detected by a simple script.

> The script suggested by Jim is below:

> > diff --git a/build-aux/git-hooks/commit-msg b/build-aux/git-hooks/commit-msg
> > index d0578bcfb46..cdc99f4b399 100755
> > --- a/build-aux/git-hooks/commit-msg
> > +++ b/build-aux/git-hooks/commit-msg
> > @@ -45,6 +45,7 @@ at_sign=

> >  # Check the log entry.
> >  exec $awk -v at_sign="$at_sign" -v cent_sign="$cent_sign" -v file="$1" '
> > +  @load "filefuncs"
> >    BEGIN {
> >      # These regular expressions assume traditional Unix unibyte behavior.
> >      # They are needed for old or broken versions of awk, e.g.,
> > @@ -129,6 +130,18 @@ at_sign=
> >      status = 1
> >    }

> > +  /^* / {
> > +    # Check that any filenames mentioned in the commit message
> > +    # actually exist.  Currently, this only prints a warning to
> > +    # prevent potential issues with false positives.
> > +    if(match($2, "[^:/][^:]*")) {
> > +      FILE = substr($2, RSTART, RLENGTH)
> > +      if(stat(FILE, type) < 0) {
> > +        printf("Warning: file '\''%s'\'' in commit message not found\n", FILE)
> > +      }
> > +    }
> > +  }
> > +
> >    $0 ~ unsafe_gnu_url {
> >      needs_rewriting = 1
> >    }

After having to ask on the help-gawk mailing list how to do it, I've got
a suggestion that uses only AWK, and checks for the existence of each
file in a "* foo..." line by attempting to read the first line from it.
It also reports an error if there are no such lines (it is possible the
contributor forgot to include the "* " in his file lines).



--- commit-msg	2023-01-15 15:01:05.006074916 +0000
+++ commit-msg.acm	2023-04-11 13:59:18.517300896 +0000
@@ -138,11 +138,24 @@
     status = 1
   }
 
+  /^\* [a-zA-Z0-9_.~#-]/ {
+    nfiles++
+    if ((rc = (getline x < $2)) < 0) {
+      status = 1
+      print "File " $2 " cannot be read: [" ERRNO "]"
+    }
+    close($2)
+  }
+
   END {
     if (nlines == 0) {
       print "Empty commit message"
       status = 1
     }
+    if (!nfiles) {
+      print "No file lines in commit message"
+      status = 1
+    }
     if (status == 0 && needs_rewriting) {
       for (i = 1; i <= NR; i++) {
 	line = input[i]


-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 14:01           ` Alan Mackenzie
@ 2023-04-11 14:57             ` Eli Zaretskii
  2023-04-11 17:20               ` Alan Mackenzie
  2023-04-11 18:31             ` Jim Porter
  1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-11 14:57 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel

> Date: Tue, 11 Apr 2023 14:01:48 +0000
> Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> +  /^\* [a-zA-Z0-9_.~#-]/ {
> +    nfiles++
> +    if ((rc = (getline x < $2)) < 0) {
> +      status = 1
> +      print "File " $2 " cannot be read: [" ERRNO "]"

The error message should explicitly mention an error in the file name
or its leading directories, at least if ERRNO indicates the file does
not exist.  "File FOO cannot be read is too "technical" and doesn't
explain what we are checking.

Also, I'm not sure what is CWD when this script runs.  Suppose I
invoke "git commit" from a subdirectory of the tree root -- does this
find files in that case?

> +    if (!nfiles) {
> +      print "No file lines in commit message"
> +      status = 1

This was never an error before, so shouldn't be an error now.  Log
messages saying just

  Fix last change

are quite common, and I don't want to disallow them.

In any case, I suggest that you (and maybe others) try running with
this for some time, to make sure there are no regressions.

Thanks.

P.S. Is 'getline' available in non-GNU Awk's?



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 14:57             ` Eli Zaretskii
@ 2023-04-11 17:20               ` Alan Mackenzie
  2023-04-11 18:00                 ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Mackenzie @ 2023-04-11 17:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: jporterbugs, emacs-devel

Hello, Eli.

On Tue, Apr 11, 2023 at 17:57:35 +0300, Eli Zaretskii wrote:
> > Date: Tue, 11 Apr 2023 14:01:48 +0000
> > Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > +  /^\* [a-zA-Z0-9_.~#-]/ {
> > +    nfiles++
> > +    if ((rc = (getline x < $2)) < 0) {
> > +      status = 1
> > +      print "File " $2 " cannot be read: [" ERRNO "]"

> The error message should explicitly mention an error in the file name
> or its leading directories, at least if ERRNO indicates the file does
> not exist.  "File FOO cannot be read is too "technical" and doesn't
> explain what we are checking.

You mean something like:

    print "Error in file or directory name " $2 " [" ERRNO "]"

?  I'm beginning to get a bit concerned about what happens after a file
has been removed from git, or renamed.  At least "cannot be read: " is
honest about what caused the error in the script.  Maybe I could add
another line suggesting how to bypass the checks in commit-msg.  What do
you think?

> Also, I'm not sure what is CWD when this script runs.  Suppose I
> invoke "git commit" from a subdirectory of the tree root -- does this
> find files in that case?

The current working directory is always the main working directory (see
man githooks(1)) unless it's a bare commit.

> > +    if (!nfiles) {
> > +      print "No file lines in commit message"
> > +      status = 1

> This was never an error before, so shouldn't be an error now.  Log
> messages saying just

>   Fix last change

> are quite common, and I don't want to disallow them.

Ah, OK.  It's just I've been known to neglect the "* " at times.  I'll
remove that bit, and submit a revised patch soon.

> In any case, I suggest that you (and maybe others) try running with
> this for some time, to make sure there are no regressions.

Will do.

> Thanks.

> P.S. Is 'getline' available in non-GNU Awk's?

The gawk manual implies strongly it is available in POSIX AWKs.  It
might even explicitly say so somewhere.  So I strongly suspect the
answer to your question is yes, but perhaps not in every AWK.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 17:20               ` Alan Mackenzie
@ 2023-04-11 18:00                 ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-11 18:00 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel

> Date: Tue, 11 Apr 2023 17:20:13 +0000
> Cc: jporterbugs@gmail.com, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > > +  /^\* [a-zA-Z0-9_.~#-]/ {
> > > +    nfiles++
> > > +    if ((rc = (getline x < $2)) < 0) {
> > > +      status = 1
> > > +      print "File " $2 " cannot be read: [" ERRNO "]"
> 
> > The error message should explicitly mention an error in the file name
> > or its leading directories, at least if ERRNO indicates the file does
> > not exist.  "File FOO cannot be read is too "technical" and doesn't
> > explain what we are checking.
> 
> You mean something like:
> 
>     print "Error in file or directory name " $2 " [" ERRNO "]"
> 
> ?

I mean something like

  "File " $2 " does not exist; did you make a mistake in its name?"

> I'm beginning to get a bit concerned about what happens after a file
> has been removed from git, or renamed.

This needs to be detected and not flagged as an error.

> Maybe I could add another line suggesting how to bypass the checks
> in commit-msg.  What do you think?

No, I think we shouldn't make it easy for people to bypass the test,
if we want this to be useful.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 14:01           ` Alan Mackenzie
  2023-04-11 14:57             ` Eli Zaretskii
@ 2023-04-11 18:31             ` Jim Porter
  2023-04-11 18:45               ` Eli Zaretskii
  2023-04-12  9:32               ` Alan Mackenzie
  1 sibling, 2 replies; 31+ messages in thread
From: Jim Porter @ 2023-04-11 18:31 UTC (permalink / raw)
  To: Alan Mackenzie, Eli Zaretskii; +Cc: emacs-devel

On 4/11/2023 7:01 AM, Alan Mackenzie wrote:
>   
> +  /^\* [a-zA-Z0-9_.~#-]/ {
> +    nfiles++
> +    if ((rc = (getline x < $2)) < 0) {
> +      status = 1
> +      print "File " $2 " cannot be read: [" ERRNO "]"
> +    }
> +    close($2)
> +  }

One thing to be careful of here (which is why I did the 'match'/'substr' 
dance in my patch) is that we need to be able to handle lines like this 
correctly:

   * some/file.el: New file.

Normally, awk would split that so that $2 is "some/file.el:", which 
isn't right. Maybe there's a better method than my patch though. I'm 
definitely not an awk expert (this is the first awk code I've ever written).

Also, using 'getline' will work for checking files, but not directories. 
I'm not sure this ever comes up in practice, but it might occur once in 
a while if we import a large package into the Emacs tree. We could do a 
bit of feature-checking and upgrade to using 'stat' if we have a newish 
gawk. That way, gawk users would get proper checks for this (rare) case. 
(This might not be strictly necessary.)

Finally, I think it would make sense to have this be a purely advisory 
warning for now so that we could check it into the Emacs tree soon-ish. 
As Eli suggests, we can try running with this hook locally, but since 
I've already added some Lisp code to generate the changelog for my 
workflow, I'm probably never going to trigger this myself. If this check 
is just a warning, I think we could be a little more aggressive in 
merging this, since it shouldn't break anyone's workflows.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 18:31             ` Jim Porter
@ 2023-04-11 18:45               ` Eli Zaretskii
  2023-04-11 19:27                 ` Jim Porter
  2023-04-12  9:32               ` Alan Mackenzie
  1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-11 18:45 UTC (permalink / raw)
  To: Jim Porter; +Cc: acm, emacs-devel

> Date: Tue, 11 Apr 2023 11:31:55 -0700
> Cc: emacs-devel@gnu.org
> From: Jim Porter <jporterbugs@gmail.com>
> 
> Finally, I think it would make sense to have this be a purely advisory 
> warning for now so that we could check it into the Emacs tree soon-ish. 

How will this help, given that some people use Magit to commit and
some use VC, not the Git commands from the shell?  Are we sure the
warnings will be seen, let alone acted upon?



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 18:45               ` Eli Zaretskii
@ 2023-04-11 19:27                 ` Jim Porter
  2023-04-11 19:36                   ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Jim Porter @ 2023-04-11 19:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

On 4/11/2023 11:45 AM, Eli Zaretskii wrote:
>> Date: Tue, 11 Apr 2023 11:31:55 -0700
>> Cc: emacs-devel@gnu.org
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>> Finally, I think it would make sense to have this be a purely advisory
>> warning for now so that we could check it into the Emacs tree soon-ish.
> 
> How will this help, given that some people use Magit to commit and
> some use VC, not the Git commands from the shell?  Are we sure the
> warnings will be seen, let alone acted upon?

I don't know. If VC and Magit don't show messages from Git hooks, they 
probably should. Those messages are there for a reason, after all. 
However, warnings that only some users see would hopefully still help us 
get feedback on whether the validation works properly.

Some miscellaneous thoughts on this:

One issue with the current implementation (mentioned elsewhere in this 
thread) is that it doesn't work for deleting a file from the repo. 
Fixing this in the commit-msg hook is tricky. We can get the list of 
changed files via "git diff --name-only" (and then we could compare our 
commit message against that list), but there's a problem: I don't know 
of a good way to detect when the user is amending a commit[1]. For a 
normal commit, you'd get the changed files via something like "git diff 
--staged --name-only", but for an amended commit, you'd want to add 
"HEAD^" to that command.

Another option might be to do this as a pre-push hook. By then, the 
commits are finalized, so we can more easily compare the commit message 
and its diff. This is also nice because when the commit-msg hook errors 
out, it throws away the commit message; quite annoying if you just made 
a small typo! However, it does mean that committers would need to be 
comfortable with amending their commit messages if they hit this error.

(We could also check this in a post-commit hook so that committers are 
alerted *before* they try to push, but post-commit hooks can't abort the 
commit: it's already done! Maybe we could have a post-commit hook that 
warns the committer, and a pre-push hook that actually errors out.)

[1] The only way I've seen to do this is to look at the arguments from 
the parent (Git) process that called the hook: 
<https://stackoverflow.com/questions/19387073/how-to-detect-commit-amend-by-pre-commit-hook>. 
I'm not sure how reliably that works, especially on platforms like 
MS-Windows...



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 19:27                 ` Jim Porter
@ 2023-04-11 19:36                   ` Eli Zaretskii
  2023-04-12  0:20                     ` Jim Porter
  2023-04-12  9:41                     ` Alan Mackenzie
  0 siblings, 2 replies; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-11 19:36 UTC (permalink / raw)
  To: Jim Porter; +Cc: acm, emacs-devel

> Date: Tue, 11 Apr 2023 12:27:40 -0700
> Cc: acm@muc.de, emacs-devel@gnu.org
> From: Jim Porter <jporterbugs@gmail.com>
> 
> One issue with the current implementation (mentioned elsewhere in this 
> thread) is that it doesn't work for deleting a file from the repo. 
> Fixing this in the commit-msg hook is tricky. We can get the list of 
> changed files via "git diff --name-only" (and then we could compare our 
> commit message against that list), but there's a problem: I don't know 
> of a good way to detect when the user is amending a commit[1]. For a 
> normal commit, you'd get the changed files via something like "git diff 
> --staged --name-only", but for an amended commit, you'd want to add 
> "HEAD^" to that command.

If this will cry wolf on every removal or rename of a file, then this
cure is worse than the disease, IMNSHO.  I'd prefer to waste a few
hours of my time when preparing the tarball than risk annoying
everyone with such false positives.

> [1] The only way I've seen to do this is to look at the arguments from 
> the parent (Git) process that called the hook: 
> <https://stackoverflow.com/questions/19387073/how-to-detect-commit-amend-by-pre-commit-hook>. 
> I'm not sure how reliably that works, especially on platforms like 
> MS-Windows...

The 'ps' part won't work on Windows.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 19:36                   ` Eli Zaretskii
@ 2023-04-12  0:20                     ` Jim Porter
  2023-04-13  6:18                       ` Jim Porter
  2023-04-12  9:41                     ` Alan Mackenzie
  1 sibling, 1 reply; 31+ messages in thread
From: Jim Porter @ 2023-04-12  0:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

On 4/11/2023 12:36 PM, Eli Zaretskii wrote:
>> Date: Tue, 11 Apr 2023 12:27:40 -0700
>> Cc: acm@muc.de, emacs-devel@gnu.org
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>> One issue with the current implementation (mentioned elsewhere in this
>> thread) is that it doesn't work for deleting a file from the repo.
>> Fixing this in the commit-msg hook is tricky. We can get the list of
>> changed files via "git diff --name-only" (and then we could compare our
>> commit message against that list), but there's a problem: I don't know
>> of a good way to detect when the user is amending a commit[1]. For a
>> normal commit, you'd get the changed files via something like "git diff
>> --staged --name-only", but for an amended commit, you'd want to add
>> "HEAD^" to that command.
> 
> If this will cry wolf on every removal or rename of a file, then this
> cure is worse than the disease, IMNSHO.  I'd prefer to waste a few
> hours of my time when preparing the tarball than risk annoying
> everyone with such false positives.

Yeah, I think doing this in a different hook would be less error-prone. 
Below is a demonstration of a post-commit hook that would warn the user 
about files listed in the commit message that aren't in the diff. The 
only potential for error I see is when creating or deleting entire 
directories, but that should be fixable by adding the parent directories 
of touched files into the 'CHANGES' variable.

Since this is a post-commit hook, it's only advisory, and can't error 
out. However, we could do something similar for a pre-push hook, except 
we'd iterate over each commit that's about to be pushed. A pre-push 
*could* error out, thus preventing users from pushing any 
badly-formatted commits upstream.

If we did something like this, I think both post-commit and pre-push 
hooks would be useful. The former gives committers immediate feedback on 
their commit message (if they read the output), while the latter is what 
ensures that the upstream commit messages are properly-formatted.

--------------------------------------------------

#!/bin/sh

git log -1 --format=%B | awk '
BEGIN {
   # Collect all the files touched in the last commit.
   while (("git log -1 --name-status --format=" | getline) > 0) {
     for (i = 2; i <= NF; i++) {
       CHANGES[$i] = 1
     }
   }
}

/^\* / {
   if (match($2, "[^:/][^:]*")) {
     FILE = substr($2, RSTART, RLENGTH)
     if (! (FILE in CHANGES)) {
       printf("File %s listed in commit message, but not in diff\n", FILE)
       status = 1
     }
   }
}

END {
   if (status != 0) {
     print "Bad commit; please see the file '\''CONTRIBUTE'\''"
   }
}
'



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 18:31             ` Jim Porter
  2023-04-11 18:45               ` Eli Zaretskii
@ 2023-04-12  9:32               ` Alan Mackenzie
  1 sibling, 0 replies; 31+ messages in thread
From: Alan Mackenzie @ 2023-04-12  9:32 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, emacs-devel

Hello, Jim.

On Tue, Apr 11, 2023 at 11:31:55 -0700, Jim Porter wrote:
> On 4/11/2023 7:01 AM, Alan Mackenzie wrote:
> >   
> > +  /^\* [a-zA-Z0-9_.~#-]/ {
> > +    nfiles++
> > +    if ((rc = (getline x < $2)) < 0) {
> > +      status = 1
> > +      print "File " $2 " cannot be read: [" ERRNO "]"
> > +    }
> > +    close($2)
> > +  }

> One thing to be careful of here (which is why I did the 'match'/'substr' 
> dance in my patch) is that we need to be able to handle lines like this 
> correctly:

>    * some/file.el: New file.

Yes, you're right, thanks.  These little things are always tripping me
up.

[ .... ]

> Also, using 'getline' will work for checking files, but not directories. 
> I'm not sure this ever comes up in practice, but it might occur once in 
> a while if we import a large package into the Emacs tree. We could do a 
> bit of feature-checking and upgrade to using 'stat' if we have a newish 
> gawk. That way, gawk users would get proper checks for this (rare) case. 
> (This might not be strictly necessary.)

This, too.

> Finally, I think it would make sense to have this be a purely advisory 
> warning for now so that we could check it into the Emacs tree soon-ish. 
> As Eli suggests, we can try running with this hook locally, but since 
> I've already added some Lisp code to generate the changelog for my 
> workflow, I'm probably never going to trigger this myself. If this check 
> is just a warning, I think we could be a little more aggressive in 
> merging this, since it shouldn't break anyone's workflows.

I think Eli's already decided the idea of checking for the existence of
files isn't a good one, because of the problem of deleting and renaming
files.

But also, I think having the script just warn is not useful.  Git
purposefully makes it impossible to correct mistakes in a commit
message, so we need to make sure the things are as correct as possible
to begin with.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-11 19:36                   ` Eli Zaretskii
  2023-04-12  0:20                     ` Jim Porter
@ 2023-04-12  9:41                     ` Alan Mackenzie
  2023-04-12 10:14                       ` Eli Zaretskii
  1 sibling, 1 reply; 31+ messages in thread
From: Alan Mackenzie @ 2023-04-12  9:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Jim Porter, emacs-devel

Hello, Eli.

On Tue, Apr 11, 2023 at 22:36:48 +0300, Eli Zaretskii wrote:
> > Date: Tue, 11 Apr 2023 12:27:40 -0700
> > Cc: acm@muc.de, emacs-devel@gnu.org
> > From: Jim Porter <jporterbugs@gmail.com>

> > One issue with the current implementation (mentioned elsewhere in this 
> > thread) is that it doesn't work for deleting a file from the repo. 
> > Fixing this in the commit-msg hook is tricky. We can get the list of 
> > changed files via "git diff --name-only" (and then we could compare our 
> > commit message against that list), but there's a problem: I don't know 
> > of a good way to detect when the user is amending a commit[1]. For a 
> > normal commit, you'd get the changed files via something like "git diff 
> > --staged --name-only", but for an amended commit, you'd want to add 
> > "HEAD^" to that command.

> If this will cry wolf on every removal or rename of a file, then this
> cure is worse than the disease, IMNSHO.  I'd prefer to waste a few
> hours of my time when preparing the tarball than risk annoying
> everyone with such false positives.

OK, so checking the existence of files isn't a good way to tackle the
problem.  Jim's got some good ideas for an alternative.

Coming back to the other part of the problem, what exactly happened when
you ran `authors'?  When I tried it yesterday, apart from me not having
the locale en_US.UTF-8, it ran to completion, producing the output and a
list of missing files together with their "authors".  What am I missing?

[ .... ]

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-12  9:41                     ` Alan Mackenzie
@ 2023-04-12 10:14                       ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-12 10:14 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel

> Date: Wed, 12 Apr 2023 09:41:29 +0000
> Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> Coming back to the other part of the problem, what exactly happened when
> you ran `authors'?  When I tried it yesterday, apart from me not having
> the locale en_US.UTF-8, it ran to completion, producing the output and a
> list of missing files together with their "authors".  What am I missing?

You are missing the changes to the databases of authors.el and to the
ChangeLog.4 file as it was originally generated from Git logs, which I
made shortly prior to the pretest.  Of course "M-x authors" runs
cleanly now: that was why I needed to do all that work back then.

AFAIR, most of your mistakes were corrected in ChangeLog.4 itself,
before it was committed for the first time, so there's no easy way of
seeing what I fixed there.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-12  0:20                     ` Jim Porter
@ 2023-04-13  6:18                       ` Jim Porter
  2023-04-13  6:49                         ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Jim Porter @ 2023-04-13  6:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1488 bytes --]

On 4/11/2023 5:20 PM, Jim Porter wrote:
> If we did something like this, I think both post-commit and pre-push 
> hooks would be useful. The former gives committers immediate feedback on 
> their commit message (if they read the output), while the latter is what 
> ensures that the upstream commit messages are properly-formatted.

Attached is a mostly-working pair of hooks (post-commit and pre-push) to 
check this. There's still a bit more work to do though:

  1. Clean up the output a bit
  2. Make it work on non-gawk awks

Both of those should be pretty easy though. Here's an example of the 
error if you try to push:

--------------------------------------------------

   In commit 89f2dbb773: My commit message
     File lisp/bad.el listed in commit message, but not in diff
   Push aborted; please see the file 'CONTRIBUTE'
   error: failed to push some refs to 
'jporter@git.sv.gnu.org:/srv/git/emacs.git'

--------------------------------------------------

For the post-commit hook, we just pass the current commit SHA to our awk 
script, which checks that SHA to make sure its commit message is 
correct. For the pre-push hook, we need to compare the local ref that 
we're pushing to the remote ref, and then check every commit between the 
two (the Git documentation explains a bit about how the hook works[1]).

Assuming the general strategy looks ok, I'll finish this up and add more 
documentation for what's going on.

[1] https://git-scm.com/docs/githooks#_pre_push

[-- Attachment #2: 0001-WIP-Add-Git-hooks-to-check-filenames-listed-in-the-c.patch --]
[-- Type: text/plain, Size: 4068 bytes --]

From ac0e2bbcc99b07ea5185041f5a066c36cc1e42ac Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Wed, 12 Apr 2023 23:03:31 -0700
Subject: [PATCH] [WIP] Add Git hooks to check filenames listed in the commit
 message

* build-aux/git-hooks/commit-msg-files.awk:
* build-aux/git-hooks/post-commit:
* build-aux/git-hooks/pre-push: New files.
---
 build-aux/git-hooks/commit-msg-files.awk | 39 +++++++++++++++++++
 build-aux/git-hooks/post-commit          | 19 +++++++++
 build-aux/git-hooks/pre-push             | 49 ++++++++++++++++++++++++
 3 files changed, 107 insertions(+)
 create mode 100644 build-aux/git-hooks/commit-msg-files.awk
 create mode 100644 build-aux/git-hooks/post-commit
 create mode 100644 build-aux/git-hooks/pre-push

diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk
new file mode 100644
index 00000000000..1dd89dcdea0
--- /dev/null
+++ b/build-aux/git-hooks/commit-msg-files.awk
@@ -0,0 +1,39 @@
+function get_commit_changes(commit_sha, changes,    i, j, len, bits, filename) {
+  # Collect all the files touched in the specified commit.
+  while ((("git log -1 --name-status --format= " commit_sha) | getline) > 0) {
+    for (i = 2; i <= NF; i++) {
+      len = split($i, bits, "/")
+      for (j = 1; j <= len; j++) {
+        if (j == 1)
+          filename = bits[j]
+        else
+          filename = filename "/" bits[j]
+        changes[filename] = 1
+      }
+    }
+  }
+}
+
+function check_commit_msg_files(commit_sha,    changes, good, msg, filename) {
+  get_commit_changes(commit_sha, changes)
+  good = 1
+
+  while ((("git log -1 --format=%B " commit_sha) | getline) > 0) {
+    if (! msg)
+      msg = $0
+
+    if (/^\* / && match($2, "[^:/][^:]*")) {
+      filename = substr($2, RSTART, RLENGTH)
+      if (! (filename in changes)) {
+        if (good) {
+          printf("In commit %s: %s\n", substr(commit_sha, 1, 10), msg)
+        }
+        printf("  File %s listed in commit message, but not in diff\n",
+               filename)
+        good = 0
+      }
+    }
+  }
+
+  return good
+}
diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit
new file mode 100644
index 00000000000..35c902d8270
--- /dev/null
+++ b/build-aux/git-hooks/post-commit
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+git rev-parse HEAD | awk '
+  # FIXME: Do this a POSIX-compatible way.
+  @include ".git/hooks/commit-msg-files.awk"
+
+  /^[a-z0-9]{40}$/ {
+    if (! check_commit_msg_files($0)) {
+      status = 1
+    }
+  }
+
+  END {
+    if (status != 0) {
+      print "Bad commit message; please see the file '\''CONTRIBUTE'\''"
+    }
+    exit status
+  }
+'
diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push
new file mode 100644
index 00000000000..3329bf59e1d
--- /dev/null
+++ b/build-aux/git-hooks/pre-push
@@ -0,0 +1,49 @@
+#!/bin/sh
+
+awk -v origin_name="$1" '
+  @include ".git/hooks/commit-msg-files.awk"
+
+  # If the local SHA is all zeroes, ignore it.
+  $2 ~ /^0{40}$/ {
+    next
+  }
+
+  $2 ~ /^[a-z0-9]{40}$/ {
+    newref = $2
+
+    # If the remote SHA is all zeroes, go backwards until we find a SHA on
+    # an origin branch.
+    if ($4 ~ /^0{40}$/) {
+      back = 0
+      while ((("git branch -r -l '\''" origin_name "/*'\'' --contains " \
+               newref "~" back) | getline) == 0) {
+        back++
+      }
+
+      ("git rev-parse " newref "~" back) | getline oldref
+      if (!(oldref ~ /^[a-z0-9]{40}$/)) {
+        # The SHA is misformatted?!
+        exit 2
+      }
+    } else if ($4 ~ /^[a-z0-9]{40}$/)  {
+      oldref = $4
+    } else {
+      # The remote SHA is misformatted?!
+      exit 2
+    }
+
+    # Iterate over every SHA after oldref, up to (and including) newref.
+    while ((("git rev-list --reverse " oldref ".." newref) | getline) > 0) {
+      if (! check_commit_msg_files($0)) {
+        status = 1
+      }
+    }
+  }
+
+  END {
+    if (status != 0) {
+      print "Push aborted; please see the file '\''CONTRIBUTE'\''"
+    }
+    exit status
+  }
+'
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-13  6:18                       ` Jim Porter
@ 2023-04-13  6:49                         ` Eli Zaretskii
  2023-04-13  7:47                           ` Robert Pluim
  2023-04-15  3:41                           ` Jim Porter
  0 siblings, 2 replies; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-13  6:49 UTC (permalink / raw)
  To: Jim Porter; +Cc: acm, emacs-devel

> Date: Wed, 12 Apr 2023 23:18:37 -0700
> From: Jim Porter <jporterbugs@gmail.com>
> Cc: acm@muc.de, emacs-devel@gnu.org
> 
> On 4/11/2023 5:20 PM, Jim Porter wrote:
> > If we did something like this, I think both post-commit and pre-push 
> > hooks would be useful. The former gives committers immediate feedback on 
> > their commit message (if they read the output), while the latter is what 
> > ensures that the upstream commit messages are properly-formatted.
> 
> Attached is a mostly-working pair of hooks (post-commit and pre-push) to 
> check this. There's still a bit more work to do though:
> 
>   1. Clean up the output a bit
>   2. Make it work on non-gawk awks

Please be sure to test the part that finds the file names on all the
log messages in the repository.  The things people do in logs will
sometimes surprise you.  In particular, a '*' after leading whitespace
doesn't necessarily flag a file name, see, for example, the following
commits:

  92d75e5c53241ac76e8fdcb6fc66ade68354687c
  0bd96806ef1a0f0d2d3f48cdb1204b7e393ab036
  eff42dc0af741cc56c52d7d9577d29fc16f9f665
  b5f70c239e87e5f38fd70181ef75cd28a43a8b41

Also, it looks like your script doesn't recognize file names in a line
that starts with a semi-colon, as in this commit:

  commit 74ddfe811f980122816ba831bea18ca18afedb85
  Author:     Eli Zaretskii <eliz@gnu.org>
  AuthorDate: Sat Apr 8 19:14:15 2023 +0300
  Commit:     Eli Zaretskii <eliz@gnu.org>
  CommitDate: Sat Apr 8 19:14:15 2023 +0300

      ; * doc/misc/calc.texi (Rewrites Tutorial): Fix a typo (bug#62658).

>    In commit 89f2dbb773: My commit message
>      File lisp/bad.el listed in commit message, but not in diff

What about the opposite: a file is mentioned in the diff, but not in
the commit message?  Or maybe that is allowed, and we shouldn't block
it?

Thanks.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-13  6:49                         ` Eli Zaretskii
@ 2023-04-13  7:47                           ` Robert Pluim
  2023-04-15  3:41                           ` Jim Porter
  1 sibling, 0 replies; 31+ messages in thread
From: Robert Pluim @ 2023-04-13  7:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Jim Porter, acm, emacs-devel

>>>>> On Thu, 13 Apr 2023 09:49:23 +0300, Eli Zaretskii <eliz@gnu.org> said:

    Eli> What about the opposite: a file is mentioned in the diff, but not in
    Eli> the commit message?  Or maybe that is allowed, and we shouldn't block
    Eli> it?

From CONTRIBUTE:

    - There is no need to mention files such as NEWS and MAINTAINERS, or
      to indicate regeneration of files such as 'lib/gnulib.mk', in the
      ChangeLog entry.  "There is no need" means you don't have to, but
      you can if you want to.

Iʼll admit I tend to the prescriptive side of things, and think all
changed files should be mentioned in the commit message.

Robert
-- 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-13  6:49                         ` Eli Zaretskii
  2023-04-13  7:47                           ` Robert Pluim
@ 2023-04-15  3:41                           ` Jim Porter
  2023-04-15  5:45                             ` Jim Porter
  2023-04-15  7:08                             ` Eli Zaretskii
  1 sibling, 2 replies; 31+ messages in thread
From: Jim Porter @ 2023-04-15  3:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2858 bytes --]

Ok, I think this patch should work, though I'll continue testing it 
locally before I merge it. We could also merge this patch sooner if we 
temporarily made the pre-push hook advisory (i.e. doesn't block a push). 
That way, others can try these hooks out without it blocking anyone's 
work. Then when we're happy with it, we can make the pre-push hook error 
out on bad commit messages.

On 4/12/2023 11:49 PM, Eli Zaretskii wrote:
> Please be sure to test the part that finds the file names on all the
> log messages in the repository.  The things people do in logs will
> sometimes surprise you.  In particular, a '*' after leading whitespace
> doesn't necessarily flag a file name, see, for example, the following
> commits:
> 
>    92d75e5c53241ac76e8fdcb6fc66ade68354687c

This works without errors with my latest patch. (Though it's not smart 
enough to recognize the "src/comp.c" as a file to check, since there's 
no leading "* ".) I think that's probably ok; it'd be hard to detect 
cases like that reliably.

>    0bd96806ef1a0f0d2d3f48cdb1204b7e393ab036

This fails, correctly I think. The first line of the commit message is 
"* Rename `comp--typeof-builtin-types'", which I don't think we should 
allow (going forward, at least). Since it's in the first line, we 
*could* treat that specially, but I'm not sure it's worth the 
complexity. Committers can just avoid starting lines with "*" (for 
example, by using " *" instead).

>    eff42dc0af741cc56c52d7d9577d29fc16f9f665

This also works without errors. An indented "*" is ok, and not checked 
by these hooks.

>    b5f70c239e87e5f38fd70181ef75cd28a43a8b41

This fails for two reasons: first, there's a line like this: "* 
buffer-match-p and match-buffers", which is recognized as a list of file 
names. That looks like auto-fill-mode (or something similar) adding a "* 
" where it shouldn't. That's happened to me before, so I'd be glad for 
the hook to catch this.

It also fails because of this line: "* lisp/window.el 
(display-buffer-assq-regexp): Mention what happens". That's correct too, 
since "lisp/window.el" wasn't changed in this commit.

If you apply my patch, you can also test out other commits via "echo 
COMMIT-SHA | awk -f build-aux/git-hooks/commit-msg-files.awk". (I'll do 
this locally as well.)

> Also, it looks like your script doesn't recognize file names in a line
> that starts with a semi-colon, as in this commit:

I fixed this case, though as far as I can tell, authors.el doesn't look 
at lines like this. (I could be wrong, since I just read over that code 
briefly.)

> What about the opposite: a file is mentioned in the diff, but not in
> the commit message?  Or maybe that is allowed, and we shouldn't block
> it?

I think that's ok. Robert Pluim mentions a couple of cases, and we 
probably also want to allow commit messages like: "; Fix last change".

[-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --]
[-- Type: text/plain, Size: 8210 bytes --]

From f849fa082f0d7c8c3d472120e91e155b8700c65c Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Wed, 12 Apr 2023 23:03:31 -0700
Subject: [PATCH] Add Git hooks to check filenames listed in the commit message

* build-aux/git-hooks/commit-msg-files.awk:
* build-aux/git-hooks/post-commit:
* build-aux/git-hooks/pre-push: New files...
* autogen.sh: ... add them.
---
 autogen.sh                               |  2 +-
 build-aux/git-hooks/commit-msg-files.awk | 91 ++++++++++++++++++++++++
 build-aux/git-hooks/post-commit          | 29 ++++++++
 build-aux/git-hooks/pre-push             | 66 +++++++++++++++++
 4 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 build-aux/git-hooks/commit-msg-files.awk
 create mode 100755 build-aux/git-hooks/post-commit
 create mode 100755 build-aux/git-hooks/pre-push

diff --git a/autogen.sh b/autogen.sh
index af4c2ad14df..71d7ac89abf 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -340,7 +340,7 @@ hooks=
 tailored_hooks=
 sample_hooks=
 
-for hook in commit-msg pre-commit prepare-commit-msg; do
+for hook in commit-msg pre-commit prepare-commit-msg post-commit pre-push commit-msg-files.awk; do
     cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 ||
 	tailored_hooks="$tailored_hooks $hook"
 done
diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk
new file mode 100644
index 00000000000..c1edccdf3e5
--- /dev/null
+++ b/build-aux/git-hooks/commit-msg-files.awk
@@ -0,0 +1,91 @@
+# Check the file list of GNU Emacs change log entries for each commit SHA.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+function get_commit_changes(commit_sha, changes,    i, j, len, bits, filename) {
+  # Collect all the files touched in the specified commit.
+  while ((("git log -1 --name-status --format= " commit_sha) | getline) > 0) {
+    for (i = 2; i <= NF; i++) {
+      len = split($i, bits, "/")
+      for (j = 1; j <= len; j++) {
+        if (j == 1)
+          filename = bits[j]
+        else
+          filename = filename "/" bits[j]
+        changes[filename] = 1
+      }
+    }
+  }
+}
+
+function check_commit_msg_files(commit_sha, verbose,    changes, good, msg, \
+                                filenames_str, filenames, i) {
+  get_commit_changes(commit_sha, changes)
+  good = 1
+
+  while ((("git log -1 --format=%B " commit_sha) | getline) > 0) {
+    if (verbose && ! msg)
+      msg = $0
+
+    # Find lines that reference files.  We look at any line starting
+    # with "*" (possibly prefixed by "; ") where the file part starts
+    # with an alphanumeric character.  The file part ends if we
+    # encounter any of the following characters: [ ( < { :
+    if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, "[[:alnum:]][^[(<{:]*")) {
+      # There might be multiple files listed on this line, separated
+      # by a comma and/or space.  Iterate over each of them.
+      split(substr($0, RSTART, RLENGTH), filenames, "[[:blank:],][[:blank:]]*")
+      for (i in filenames) {
+        if (length(filenames[i]) && ! (filenames[i] in changes)) {
+          if (good) {
+            # Print a header describing the error.
+            if (verbose)
+              printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg)
+            printf("Files listed in commit message, but not in diff:\n")
+          }
+          printf("  %s\n", filenames[i])
+          good = 0
+        }
+      }
+    }
+  }
+
+  return good
+}
+
+BEGIN {
+  if (reason == "pre-push")
+    verbose = 1
+}
+
+/^[a-z0-9]{40}$/ {
+  if (! check_commit_msg_files($0, verbose)) {
+    status = 1
+  }
+}
+
+END {
+  if (status != 0) {
+    if (reason == "pre-push")
+      error_msg = "Push aborted"
+    else
+      error_msg = "Bad commit message"
+    printf("%s; please see the file 'CONTRIBUTE'\n", error_msg)
+  }
+  exit status
+}
diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit
new file mode 100755
index 00000000000..4c30ec76e02
--- /dev/null
+++ b/build-aux/git-hooks/post-commit
@@ -0,0 +1,29 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries after committing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+git rev-parse HEAD | $awk -v reason=post-commit \
+                          -f .git/hooks/commit-msg-files.awk
diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push
new file mode 100755
index 00000000000..136f84e6691
--- /dev/null
+++ b/build-aux/git-hooks/pre-push
@@ -0,0 +1,66 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries before pushing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+# Standard input receives lines of the form:
+#   <local ref> SP <local name> SP <remote ref> SP <remote name> LF
+$awk -v origin_name="$1" '
+  # If the local SHA is all zeroes, ignore it.
+  $2 ~ /^0{40}$/ {
+    next
+  }
+
+  $2 ~ /^[a-z0-9]{40}$/ {
+    newref = $2
+    # If the remote SHA is all zeroes, this is a new object to be
+    # pushed (likely a branch).  Go backwards until we find a SHA on
+    # an origin branch.
+    if ($4 ~ /^0{40}$/) {
+      back = 0
+      while ((("git branch -r -l '\''" origin_name "/*'\'' --contains " \
+               newref "~" back) | getline) == 0) {
+
+        # Only look back at most 1000 commits, just in case...
+        if (back++ > 1000)
+          break;
+      }
+
+      ("git rev-parse " newref "~" back) | getline oldref
+      if (!(oldref ~ /^[a-z0-9]{40}$/)) {
+        # The SHA is misformatted!  Skip this line.
+        next
+      }
+    } else if ($4 ~ /^[a-z0-9]{40}$/)  {
+      oldref = $4
+    } else {
+      # The SHA is misformatted!  Skip this line.
+      next
+    }
+
+    # Print every SHA after oldref, up to (and including) newref.
+    system("git rev-list --reverse " oldref ".." newref)
+  }
+' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15  3:41                           ` Jim Porter
@ 2023-04-15  5:45                             ` Jim Porter
  2023-04-15  7:15                               ` Eli Zaretskii
  2023-04-15 20:54                               ` Jim Porter
  2023-04-15  7:08                             ` Eli Zaretskii
  1 sibling, 2 replies; 31+ messages in thread
From: Jim Porter @ 2023-04-15  5:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1870 bytes --]

On 4/14/2023 8:41 PM, Jim Porter wrote:
> Ok, I think this patch should work, though I'll continue testing it 
> locally before I merge it. We could also merge this patch sooner if we 
> temporarily made the pre-push hook advisory (i.e. doesn't block a push). 
> That way, others can try these hooks out without it blocking anyone's 
> work. Then when we're happy with it, we can make the pre-push hook error 
> out on bad commit messages.

Here are a couple small fixes after testing against the 2000 latest 
commits (though I haven't examined *all* the errors in detail). I fixed 
the hooks so that they actually close the pipes we use, and also made a 
regexp a bit more strict. Previously, it split file names by a comma 
and/or a space, but there are a few files with commas in their names, 
e.g. "test/lisp/net/eudc-resources/dc=gnu,dc=org.ldif". Now it handles 
those properly.

Testing this, there are a bunch of commits where the first line is of 
the form "* Do something". Maybe the hooks should ignore cases like 
that, or we could just disallow that going forward. I don't have a 
strong opinion.

There's also one more commit I'm not quite sure what to do about: 
0a95a81d8d36722ccf030a6194ecd953fc257a59. It has this in the commit message:

     This fixes bug #61144.  If the space around the * is "symmetric" we 
leave foo
     * bar unfontified, a multiplication operation.  If it is 
"asymmetric" we
     fontify it as a pointer declaration.

The second line of this excerpt is treated like a file entry. Maybe our 
hook could allow that if it were clever enough, or maybe this is a rare 
enough occurrence that we should just have committers reformat the 
message slightly.

Almost all the other errors are due to commits from packages that were 
imported into the Emacs repo and their files moved around during the 
import (e.g. Eglot, use-package).

[-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --]
[-- Type: text/plain, Size: 8386 bytes --]

From 6e2a73dfba7c0b1a7b4000f9305b666911f4a171 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Wed, 12 Apr 2023 23:03:31 -0700
Subject: [PATCH] Add Git hooks to check filenames listed in the commit message

* build-aux/git-hooks/commit-msg-files.awk:
* build-aux/git-hooks/post-commit:
* build-aux/git-hooks/pre-push: New files...
* autogen.sh: ... add them.
---
 autogen.sh                               |  3 +-
 build-aux/git-hooks/commit-msg-files.awk | 96 ++++++++++++++++++++++++
 build-aux/git-hooks/post-commit          | 29 +++++++
 build-aux/git-hooks/pre-push             | 70 +++++++++++++++++
 4 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 build-aux/git-hooks/commit-msg-files.awk
 create mode 100755 build-aux/git-hooks/post-commit
 create mode 100755 build-aux/git-hooks/pre-push

diff --git a/autogen.sh b/autogen.sh
index af4c2ad14df..6127e7b24f4 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -340,7 +340,8 @@ hooks=
 tailored_hooks=
 sample_hooks=
 
-for hook in commit-msg pre-commit prepare-commit-msg; do
+for hook in commit-msg pre-commit prepare-commit-msg post-commit \
+            pre-push commit-msg-files.awk; do
     cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 ||
 	tailored_hooks="$tailored_hooks $hook"
 done
diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk
new file mode 100644
index 00000000000..da066b83bdd
--- /dev/null
+++ b/build-aux/git-hooks/commit-msg-files.awk
@@ -0,0 +1,96 @@
+# Check the file list of GNU Emacs change log entries for each commit SHA.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+function get_commit_changes(commit_sha, changes,    cmd, i, j, len, \
+                            bits, filename) {
+  # Collect all the files touched in the specified commit.
+  cmd = ("git log -1 --name-status --format= " commit_sha)
+  while ((cmd | getline) > 0) {
+    for (i = 2; i <= NF; i++) {
+      len = split($i, bits, "/")
+      for (j = 1; j <= len; j++) {
+        if (j == 1)
+          filename = bits[j]
+        else
+          filename = filename "/" bits[j]
+        changes[filename] = 1
+      }
+    }
+  }
+  close(cmd)
+}
+
+function check_commit_msg_files(commit_sha, verbose,    changes, good, \
+                                cmd, msg, filenames_str, filenames, i) {
+  get_commit_changes(commit_sha, changes)
+  good = 1
+
+  cmd = ("git log -1 --format=%B " commit_sha)
+  while ((cmd | getline) > 0) {
+    if (verbose && ! msg)
+      msg = $0
+
+    # Find lines that reference files.  We look at any line starting
+    # with "*" (possibly prefixed by "; ") where the file part starts
+    # with an alphanumeric character.  The file part ends if we
+    # encounter any of the following characters: [ ( < { :
+    if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, "[[:alnum:]][^[(<{:]*")) {
+      # There might be multiple files listed on this line, separated
+      # by spaces (and possibly a comma).  Iterate over each of them.
+      split(substr($0, RSTART, RLENGTH), filenames, ",?[[:blank:]]+")
+      for (i in filenames) {
+        if (length(filenames[i]) && ! (filenames[i] in changes)) {
+          if (good) {
+            # Print a header describing the error.
+            if (verbose)
+              printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg)
+            printf("Files listed in commit message, but not in diff:\n")
+          }
+          printf("  %s\n", filenames[i])
+          good = 0
+        }
+      }
+    }
+  }
+  close(cmd)
+
+  return good
+}
+
+BEGIN {
+  if (reason == "pre-push")
+    verbose = 1
+}
+
+/^[a-z0-9]{40}$/ {
+  if (! check_commit_msg_files($0, verbose)) {
+    status = 1
+  }
+}
+
+END {
+  if (status != 0) {
+    if (reason == "pre-push")
+      error_msg = "Push aborted"
+    else
+      error_msg = "Bad commit message"
+    printf("%s; please see the file 'CONTRIBUTE'\n", error_msg)
+  }
+  exit status
+}
diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit
new file mode 100755
index 00000000000..4c30ec76e02
--- /dev/null
+++ b/build-aux/git-hooks/post-commit
@@ -0,0 +1,29 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries after committing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+git rev-parse HEAD | $awk -v reason=post-commit \
+                          -f .git/hooks/commit-msg-files.awk
diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push
new file mode 100755
index 00000000000..b0185a97b28
--- /dev/null
+++ b/build-aux/git-hooks/pre-push
@@ -0,0 +1,70 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries before pushing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+# Standard input receives lines of the form:
+#   <local ref> SP <local name> SP <remote ref> SP <remote name> LF
+$awk -v origin_name="$1" '
+  # If the local SHA is all zeroes, ignore it.
+  $2 ~ /^0{40}$/ {
+    next
+  }
+
+  $2 ~ /^[a-z0-9]{40}$/ {
+    newref = $2
+    # If the remote SHA is all zeroes, this is a new object to be
+    # pushed (likely a branch).  Go backwards until we find a SHA on
+    # an origin branch.
+    if ($4 ~ /^0{40}$/) {
+      back = 0
+      cmd = ("git branch -r -l '\''" origin_name "/*'\'' --contains " \
+             newref "~" back)
+      while ((cmd | getline) == 0) {
+
+        # Only look back at most 1000 commits, just in case...
+        if (back++ > 1000)
+          break;
+      }
+      close(cmd)
+
+      cmd = ("git rev-parse " newref "~" back)
+      cmd | getline oldref
+      if (!(oldref ~ /^[a-z0-9]{40}$/)) {
+        # The SHA is misformatted!  Skip this line.
+        next
+      }
+      close(cmd)
+    } else if ($4 ~ /^[a-z0-9]{40}$/)  {
+      oldref = $4
+    } else {
+      # The SHA is misformatted!  Skip this line.
+      next
+    }
+
+    # Print every SHA after oldref, up to (and including) newref.
+    system("git rev-list --reverse " oldref ".." newref)
+  }
+' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15  3:41                           ` Jim Porter
  2023-04-15  5:45                             ` Jim Porter
@ 2023-04-15  7:08                             ` Eli Zaretskii
  1 sibling, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-15  7:08 UTC (permalink / raw)
  To: Jim Porter; +Cc: acm, emacs-devel

> Date: Fri, 14 Apr 2023 20:41:49 -0700
> Cc: acm@muc.de, emacs-devel@gnu.org
> From: Jim Porter <jporterbugs@gmail.com>
> 
> > Also, it looks like your script doesn't recognize file names in a line
> > that starts with a semi-colon, as in this commit:
> 
> I fixed this case, though as far as I can tell, authors.el doesn't look 
> at lines like this. (I could be wrong, since I just read over that code 
> briefly.)

authors.el ignores such lines because they don't appear in ChangeLog
files: they are ignored when generating ChangeLog from the Git log.
(authors.el works on ChangeLog* files, not on the Git logs.)  But that
doesn't mean we don't want the file names in those logs to be correct;
this script we are discussing is not just for the benefit of producing
AUTHORS with less manual work, it is about our commit logs being
cleaner.

Thanks.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15  5:45                             ` Jim Porter
@ 2023-04-15  7:15                               ` Eli Zaretskii
  2023-04-15 10:44                                 ` Alan Mackenzie
  2023-04-15 20:54                               ` Jim Porter
  1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-15  7:15 UTC (permalink / raw)
  To: Jim Porter; +Cc: acm, emacs-devel

> Date: Fri, 14 Apr 2023 22:45:44 -0700
> From: Jim Porter <jporterbugs@gmail.com>
> Cc: acm@muc.de, emacs-devel@gnu.org
> 
> There's also one more commit I'm not quite sure what to do about: 
> 0a95a81d8d36722ccf030a6194ecd953fc257a59. It has this in the commit message:
> 
>      This fixes bug #61144.  If the space around the * is "symmetric" we 
> leave foo
>      * bar unfontified, a multiplication operation.  If it is 
> "asymmetric" we
>      fontify it as a pointer declaration.
> 
> The second line of this excerpt is treated like a file entry. Maybe our 
> hook could allow that if it were clever enough, or maybe this is a rare 
> enough occurrence that we should just have committers reformat the 
> message slightly.

The latter, IMO.  There's absolutely no reason for a log message to
have a '*' as the first non-whitespace character of a line.  Fixing
that is easy.

Btw, Alan, this is a good example of your log messages, which are
filled using a much larger value of fill-column that the default.  The
resulting ChangeLog entries appear ugly and stand out from the rest.
Would you please adjust your customizations to use a smaller
fill-column value, like at most 76, when writing commit log messages?



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15  7:15                               ` Eli Zaretskii
@ 2023-04-15 10:44                                 ` Alan Mackenzie
  2023-04-15 11:00                                   ` Eli Zaretskii
  2023-04-21 22:16                                   ` Filipp Gunbin
  0 siblings, 2 replies; 31+ messages in thread
From: Alan Mackenzie @ 2023-04-15 10:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Jim Porter, emacs-devel

Hello, Eli.

On Sat, Apr 15, 2023 at 10:15:19 +0300, Eli Zaretskii wrote:
> > Date: Fri, 14 Apr 2023 22:45:44 -0700
> > From: Jim Porter <jporterbugs@gmail.com>
> > Cc: acm@muc.de, emacs-devel@gnu.org

> > There's also one more commit I'm not quite sure what to do about: 
> > 0a95a81d8d36722ccf030a6194ecd953fc257a59. It has this in the commit message:

> >      This fixes bug #61144.  If the space around the * is "symmetric" we 
> > leave foo
> >      * bar unfontified, a multiplication operation.  If it is 
> > "asymmetric" we
> >      fontify it as a pointer declaration.

> > The second line of this excerpt is treated like a file entry. Maybe our 
> > hook could allow that if it were clever enough, or maybe this is a rare 
> > enough occurrence that we should just have committers reformat the 
> > message slightly.

> The latter, IMO.  There's absolutely no reason for a log message to
> have a '*' as the first non-whitespace character of a line.  Fixing
> that is easy.

It can happen easily as a result of filling.  I think it would be good if
one of Jim's scripts flagged it up as an error.  What is less easy than
it might be, is correcting an erroneous commit message and submitting the
commit again using this correction.  The erroneous message remains in
..../.git/COMMIT_EDITMSG but I'm not sure how to get it back into one's
editor at the next commit attempt.  Maybe the error message could give
some help, here.

> Btw, Alan, this is a good example of your log messages, which are
> filled using a much larger value of fill-column that the default.  The
> resulting ChangeLog entries appear ugly and stand out from the rest.
> Would you please adjust your customizations to use a smaller
> fill-column value, like at most 76, when writing commit log messages?

I've been using a fill-column of 78, to match the maximum allowed width.
I'll try cutting that down to 63, as recommended in CONTRIBUTE.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15 10:44                                 ` Alan Mackenzie
@ 2023-04-15 11:00                                   ` Eli Zaretskii
  2023-04-21 22:16                                   ` Filipp Gunbin
  1 sibling, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-15 11:00 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: jporterbugs, emacs-devel

> Date: Sat, 15 Apr 2023 10:44:42 +0000
> Cc: Jim Porter <jporterbugs@gmail.com>, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > Btw, Alan, this is a good example of your log messages, which are
> > filled using a much larger value of fill-column that the default.  The
> > resulting ChangeLog entries appear ugly and stand out from the rest.
> > Would you please adjust your customizations to use a smaller
> > fill-column value, like at most 76, when writing commit log messages?
> 
> I've been using a fill-column of 78, to match the maximum allowed width.

When we produce ChangeLog from the Git logs, the messages are indented
on tab stop, so 78 is way too large.  Even just "git log" will show
you that 78 is too large, since Git indents the log messages by 4
spaces.

> I'll try cutting that down to 63, as recommended in CONTRIBUTE.

Thank you.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15  5:45                             ` Jim Porter
  2023-04-15  7:15                               ` Eli Zaretskii
@ 2023-04-15 20:54                               ` Jim Porter
  2023-04-15 21:23                                 ` Jim Porter
  2023-04-21  4:59                                 ` Jim Porter
  1 sibling, 2 replies; 31+ messages in thread
From: Jim Porter @ 2023-04-15 20:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3743 bytes --]

On 4/14/2023 10:45 PM, Jim Porter wrote:
> Here are a couple small fixes after testing against the 2000 latest 
> commits (though I haven't examined *all* the errors in detail).

Ok, I've now gone over the 5000 latest commits in some closer detail. 
Here are the main classes of error I see:

1. Commit message lines starting with "*" that aren't file entries.

2. Commit messages from Eglot's import (these aren't *really* a problem, 
and I think authors.el is smart enough to handle them; if something like 
this happened in future, we could just use "git push --no-verify" to 
ignore this hook).

3. Obviously-wrong typos of file names (including files listed in the 
commit message that exist, but aren't in the diff).

I think erroring out on all three of these cases makes sense. Number 2 
isn't truly an error, but it would be hard to detect that reliably in 
the hook (at least not without adding a list of exceptions), and it's 
rare enough that we can just use "--no-verify" as needed (or fix up the 
commit messages during the import).

There are a few cases that are more interesting though:

A. Wildcards in filenames
-------------------------

626f2f7441, 623db40dd1, 4b5414abbc, c32212bf96, f4bfe7834a, 23c1ee6989, 
7e185bc9ba, 36474a1e49, 5fb8b20fa3, c45bfd3c4a, fcc7373b28, 8e9835c673, 
952550258d, d1a7d16f8e, dc083ebc4e, 5f74397490, 0f85f2c0e5, 49bad2a0a6, 
5a77517e7d, 24b9515da0, ae3904bb5d, 228d9d615d, 8461cfc8fc, e402887d5d, 
ab7dddea90, 9e4f11a163, 6342264ef7

These cases use wildcards in the file names. I think we should avoid 
doing this, for the same reason the GNU ChangeLog standards say to avoid 
doing it for function names:

"If you mention the names of the modified functions or variables, it’s 
important to name them in full. ...  Subsequent maintainers will often 
search for a function name to find all the change log entries that 
pertain to it; if you abbreviate the name, they won’t find it when they 
search."[1]

Therefore, I didn't do anything about these in the latest revision of 
the hook: they're still errors. For committers that don't want to list 
every file explicitly, they could just add an entry for the directory, 
*without* any wildcards.

B. File entry for a directory, using a trailing "/"
---------------------------------------------------

9b80fe55f9, 28d0654943, d9c94e93b7, 7d0dc31833, 16975078b4, 07235678a4, 
c55f4055dd, 9451ea0a05, f342b7c969

I think this is perfectly ok, so I've fixed the hook to allow this (see 
attached).

C. Line-wrapped file entries
----------------------------

8675f4136c, 0e25a39e69, 7c79eea51d, 335a5fd173, 31f8ae53be

These are cases where file entries extend multiple lines. In the first 
example, a single file is split onto two lines, which I don't think is 
feasible to identify. In the other cases, the entries are formatted like so:

     * path/to/file.txt,
     path/to/other/file.txt: Change something.

I've partially fixed this so that it can properly parse the first line, 
but handling the second line is tricky, so I didn't bother. It would be 
better if committers used this format anyway:

     * path/to/file.txt:
     * path/to/other/file.txt: Change something.

D. Merge commit with file entries (and no diff)
-----------------------------------------------

289b457cac

This is an odd case: it's a merge commit that mentions a bunch of file 
changes, but there's no diff in this commit. I'm not really sure what to 
do about this one, but maybe we shouldn't add file entries for cases 
like this under normal circumstances. If there's an exception to that 
rule, we can always push a commit like this with "--no-verify".



[1] 
https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs

[-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --]
[-- Type: text/plain, Size: 8987 bytes --]

From db23241d5edd61acca14899c804eb39e13385879 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Wed, 12 Apr 2023 23:03:31 -0700
Subject: [PATCH] Add Git hooks to check filenames listed in the commit message

* build-aux/git-hooks/commit-msg-files.awk:
* build-aux/git-hooks/post-commit:
* build-aux/git-hooks/pre-push: New files...
* autogen.sh: ... add them.
---
 autogen.sh                               |   3 +-
 build-aux/git-hooks/commit-msg-files.awk | 113 +++++++++++++++++++++++
 build-aux/git-hooks/post-commit          |  29 ++++++
 build-aux/git-hooks/pre-push             |  70 ++++++++++++++
 4 files changed, 214 insertions(+), 1 deletion(-)
 create mode 100644 build-aux/git-hooks/commit-msg-files.awk
 create mode 100755 build-aux/git-hooks/post-commit
 create mode 100755 build-aux/git-hooks/pre-push

diff --git a/autogen.sh b/autogen.sh
index af4c2ad14df..6127e7b24f4 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -340,7 +340,8 @@ hooks=
 tailored_hooks=
 sample_hooks=
 
-for hook in commit-msg pre-commit prepare-commit-msg; do
+for hook in commit-msg pre-commit prepare-commit-msg post-commit \
+            pre-push commit-msg-files.awk; do
     cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 ||
 	tailored_hooks="$tailored_hooks $hook"
 done
diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk
new file mode 100644
index 00000000000..4566905657d
--- /dev/null
+++ b/build-aux/git-hooks/commit-msg-files.awk
@@ -0,0 +1,113 @@
+# Check the file list of GNU Emacs change log entries for each commit SHA.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+### Commentary:
+
+# This script accepts a list of (unabbreviated) Git commit SHAs, and
+# will then iterate over them to check that any files mentioned in the
+# commit message are actually present in the commit's diff.  If not,
+# it will print out the incorrect file names and return 1.
+
+# You can also pass "-v reason=pre-push", which will add more-verbose
+# output, indicate the abbreviated commit SHA and first line of the
+# commit message for any improper commits.
+
+### Code:
+
+function get_commit_changes(commit_sha, changes,    cmd, i, j, len, \
+                            bits, filename) {
+  # Collect all the files touched in the specified commit.
+  cmd = ("git log -1 --name-status --format= " commit_sha)
+  while ((cmd | getline) > 0) {
+    for (i = 2; i <= NF; i++) {
+      len = split($i, bits, "/")
+      for (j = 1; j <= len; j++) {
+        if (j == 1)
+          filename = bits[j]
+        else
+          filename = filename "/" bits[j]
+        changes[filename] = 1
+      }
+    }
+  }
+  close(cmd)
+}
+
+function check_commit_msg_files(commit_sha, verbose,    changes, good, \
+                                cmd, msg, filenames_str, filenames, i) {
+  get_commit_changes(commit_sha, changes)
+  good = 1
+
+  cmd = ("git log -1 --format=%B " commit_sha)
+  while ((cmd | getline) > 0) {
+    if (verbose && ! msg)
+      msg = $0
+
+    # Find lines that reference files.  We look at any line starting
+    # with "*" (possibly prefixed by "; ") where the file part starts
+    # with an alphanumeric character.  The file part ends if we
+    # encounter any of the following characters: [ ( < { :
+    if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, /[[:alnum:]][^[(<{:]*/)) {
+      # There might be multiple files listed on this line, separated
+      # by spaces (and possibly a comma).  Iterate over each of them.
+      split(substr($0, RSTART, RLENGTH), filenames, ",?([[:blank:]]+|$)")
+
+      for (i in filenames) {
+        # Remove trailing slashes from any directory entries.
+        sub(/\/$/, "", filenames[i])
+
+        if (length(filenames[i]) && ! (filenames[i] in changes)) {
+          if (good) {
+            # Print a header describing the error.
+            if (verbose)
+              printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg)
+            printf("Files listed in commit message, but not in diff:\n")
+          }
+          printf("  %s\n", filenames[i])
+          good = 0
+        }
+      }
+    }
+  }
+  close(cmd)
+
+  return good
+}
+
+BEGIN {
+  if (reason == "pre-push")
+    verbose = 1
+}
+
+/^[a-z0-9]{40}$/ {
+  if (! check_commit_msg_files($0, verbose)) {
+    status = 1
+  }
+}
+
+END {
+  if (status != 0) {
+    if (reason == "pre-push")
+      error_msg = "Push aborted"
+    else
+      error_msg = "Bad commit message"
+    printf("%s; please see the file 'CONTRIBUTE'\n", error_msg)
+  }
+  exit status
+}
diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit
new file mode 100755
index 00000000000..4c30ec76e02
--- /dev/null
+++ b/build-aux/git-hooks/post-commit
@@ -0,0 +1,29 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries after committing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+git rev-parse HEAD | $awk -v reason=post-commit \
+                          -f .git/hooks/commit-msg-files.awk
diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push
new file mode 100755
index 00000000000..b0185a97b28
--- /dev/null
+++ b/build-aux/git-hooks/pre-push
@@ -0,0 +1,70 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries before pushing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+# Standard input receives lines of the form:
+#   <local ref> SP <local name> SP <remote ref> SP <remote name> LF
+$awk -v origin_name="$1" '
+  # If the local SHA is all zeroes, ignore it.
+  $2 ~ /^0{40}$/ {
+    next
+  }
+
+  $2 ~ /^[a-z0-9]{40}$/ {
+    newref = $2
+    # If the remote SHA is all zeroes, this is a new object to be
+    # pushed (likely a branch).  Go backwards until we find a SHA on
+    # an origin branch.
+    if ($4 ~ /^0{40}$/) {
+      back = 0
+      cmd = ("git branch -r -l '\''" origin_name "/*'\'' --contains " \
+             newref "~" back)
+      while ((cmd | getline) == 0) {
+
+        # Only look back at most 1000 commits, just in case...
+        if (back++ > 1000)
+          break;
+      }
+      close(cmd)
+
+      cmd = ("git rev-parse " newref "~" back)
+      cmd | getline oldref
+      if (!(oldref ~ /^[a-z0-9]{40}$/)) {
+        # The SHA is misformatted!  Skip this line.
+        next
+      }
+      close(cmd)
+    } else if ($4 ~ /^[a-z0-9]{40}$/)  {
+      oldref = $4
+    } else {
+      # The SHA is misformatted!  Skip this line.
+      next
+    }
+
+    # Print every SHA after oldref, up to (and including) newref.
+    system("git rev-list --reverse " oldref ".." newref)
+  }
+' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15 20:54                               ` Jim Porter
@ 2023-04-15 21:23                                 ` Jim Porter
  2023-04-16  5:43                                   ` Eli Zaretskii
  2023-04-21  4:59                                 ` Jim Porter
  1 sibling, 1 reply; 31+ messages in thread
From: Jim Porter @ 2023-04-15 21:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

On 4/15/2023 1:54 PM, Jim Porter wrote:
> On 4/14/2023 10:45 PM, Jim Porter wrote:
>> Here are a couple small fixes after testing against the 2000 latest 
>> commits (though I haven't examined *all* the errors in detail).
> 
> Ok, I've now gone over the 5000 latest commits in some closer detail.

Oh, and just to give an idea of how common incorrect commit messages 
are, my hook says that out of the last 5000 commits, 866 have a problem, 
excluding the Eglot commits.

If I also ignore lines starting with "*" that have no "/" in them (these 
are almost always due to an unnecessary leading "*" on the first line of 
the commit), we get down to 196 bad commit messages, or 3.9% of commits.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15 21:23                                 ` Jim Porter
@ 2023-04-16  5:43                                   ` Eli Zaretskii
  2023-04-16 20:06                                     ` Jim Porter
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-16  5:43 UTC (permalink / raw)
  To: Jim Porter; +Cc: acm, emacs-devel

> Date: Sat, 15 Apr 2023 14:23:02 -0700
> From: Jim Porter <jporterbugs@gmail.com>
> Cc: acm@muc.de, emacs-devel@gnu.org
> 
> Oh, and just to give an idea of how common incorrect commit messages 
> are, my hook says that out of the last 5000 commits, 866 have a problem, 
> excluding the Eglot commits.
> 
> If I also ignore lines starting with "*" that have no "/" in them (these 
> are almost always due to an unnecessary leading "*" on the first line of 
> the commit), we get down to 196 bad commit messages, or 3.9% of commits.

And that's _a_lot_, no matter how you count them.  Which should give
people here some idea why I raised this issue in the first place: I
needed to fix all those cases by hand while working on the pretest
tarball.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-16  5:43                                   ` Eli Zaretskii
@ 2023-04-16 20:06                                     ` Jim Porter
  2023-04-16 20:19                                       ` Michael Albinus
  2023-04-17  2:22                                       ` Eli Zaretskii
  0 siblings, 2 replies; 31+ messages in thread
From: Jim Porter @ 2023-04-16 20:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

On 4/15/2023 10:43 PM, Eli Zaretskii wrote:
> And that's _a_lot_, no matter how you count them.  Which should give
> people here some idea why I raised this issue in the first place: I
> needed to fix all those cases by hand while working on the pretest
> tarball.

Yeah, it's quite a bit more than I expected.

Overall, I think the hooks are working pretty well now, and all the 
false positives should (hopefully) be resolved. Assuming there are no 
further issues, I'll merge them in a few days. Worst-case scenario, 
committers would just need to pass "--no-verify" to skip the hooks (or 
uninstall them temporarily), and then file a bug about any problems they 
find with the hooks.

Another, more-general thing we could consider is to add a job on EMBA 
that tries to run through the automatable steps of making pretest 
tarballs, just to see if they complete without errors. Then we could set 
that to run, say, once a week, so we could identify problems earlier and 
fix them (e.g. by adding new Git hooks, or just sending a message to 
emacs-devel about it).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-16 20:06                                     ` Jim Porter
@ 2023-04-16 20:19                                       ` Michael Albinus
  2023-04-17  2:22                                       ` Eli Zaretskii
  1 sibling, 0 replies; 31+ messages in thread
From: Michael Albinus @ 2023-04-16 20:19 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, acm, emacs-devel

Jim Porter <jporterbugs@gmail.com> writes:

Hi Jim,

> Another, more-general thing we could consider is to add a job on EMBA
> that tries to run through the automatable steps of making pretest
> tarballs, just to see if they complete without errors. Then we could
> set that to run, say, once a week, so we could identify problems
> earlier and fix them (e.g. by adding new Git hooks, or just sending a
> message to emacs-devel about it).

Patches welcome :-)

Best regards, Michael.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-16 20:06                                     ` Jim Porter
  2023-04-16 20:19                                       ` Michael Albinus
@ 2023-04-17  2:22                                       ` Eli Zaretskii
  2023-04-17  7:28                                         ` Michael Albinus
  1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2023-04-17  2:22 UTC (permalink / raw)
  To: Jim Porter; +Cc: acm, emacs-devel

> Date: Sun, 16 Apr 2023 13:06:58 -0700
> Cc: acm@muc.de, emacs-devel@gnu.org
> From: Jim Porter <jporterbugs@gmail.com>
> 
> Another, more-general thing we could consider is to add a job on EMBA 
> that tries to run through the automatable steps of making pretest 
> tarballs, just to see if they complete without errors. Then we could set 
> that to run, say, once a week, so we could identify problems earlier and 
> fix them (e.g. by adding new Git hooks, or just sending a message to 
> emacs-devel about it).

That'd be too much, I think.  We only need to do these jobs once in a
blue moon, so having it bother us once a week would be an unnecessary
annoyance, I think.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-17  2:22                                       ` Eli Zaretskii
@ 2023-04-17  7:28                                         ` Michael Albinus
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Albinus @ 2023-04-17  7:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Jim Porter, acm, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

Hi Eli,

>> Another, more-general thing we could consider is to add a job on EMBA
>> that tries to run through the automatable steps of making pretest
>> tarballs, just to see if they complete without errors. Then we could set
>> that to run, say, once a week, so we could identify problems earlier and
>> fix them (e.g. by adding new Git hooks, or just sending a message to
>> emacs-devel about it).
>
> That'd be too much, I think.  We only need to do these jobs once in a
> blue moon, so having it bother us once a week would be an unnecessary
> annoyance, I think.

It's up to us. We could define another schedule, how often to run. We
could run it only when a given tag (specified by a regexp) has been
applied in git. We could mark to run this job only manually. A
combination of this. Whatever.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15 20:54                               ` Jim Porter
  2023-04-15 21:23                                 ` Jim Porter
@ 2023-04-21  4:59                                 ` Jim Porter
  1 sibling, 0 replies; 31+ messages in thread
From: Jim Porter @ 2023-04-21  4:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, emacs-devel

On 4/15/2023 1:54 PM, Jim Porter wrote:
> On 4/14/2023 10:45 PM, Jim Porter wrote:
>> Here are a couple small fixes after testing against the 2000 latest 
>> commits (though I haven't examined *all* the errors in detail).
> 
> Ok, I've now gone over the 5000 latest commits in some closer detail. 

I've now merged this patch as commit 4416262f59f and sent a separate 
announcement to the list so that (hopefully) everyone will see it. I'm 
reasonably sure I've ironed out all the bugs, but just let me know if I 
missed something.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Mistakes in commit log messages
  2023-04-15 10:44                                 ` Alan Mackenzie
  2023-04-15 11:00                                   ` Eli Zaretskii
@ 2023-04-21 22:16                                   ` Filipp Gunbin
  1 sibling, 0 replies; 31+ messages in thread
From: Filipp Gunbin @ 2023-04-21 22:16 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, Jim Porter, emacs-devel

Hi Alan,

On 15/04/2023 10:44 +0000, Alan Mackenzie wrote:

[...]

> What is less easy than it might be, is correcting an erroneous commit
> message and submitting the commit again using this correction.  The
> erroneous message remains in ..../.git/COMMIT_EDITMSG but I'm not sure
> how to get it back into one's editor at the next commit attempt.
> Maybe the error message could give some help, here.

[...]

*vc-log* remembers previous commit messages, and they can be brought
back by M-p and M-n.  Or are you talking about something different here?



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2023-04-21 22:16 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <835ya5m4p0.fsf@gnu.org>
     [not found] ` <ZDPkykCsW3i30UR9@ACM>
     [not found]   ` <83v8i4arzt.fsf@gnu.org>
     [not found]     ` <CANh=_JF0CEPDsWZSuyy9ymByma2LxcypP90O3-LQ+KhoJ8cqvg@mail.gmail.com>
     [not found]       ` <CANh=_JEO4-E79dPCLc3cRLi7=ftAzc+H1FC46eck1vJN3TD3Sg@mail.gmail.com>
2023-04-11  6:02         ` Mistakes in commit log messages Eli Zaretskii
2023-04-11 14:01           ` Alan Mackenzie
2023-04-11 14:57             ` Eli Zaretskii
2023-04-11 17:20               ` Alan Mackenzie
2023-04-11 18:00                 ` Eli Zaretskii
2023-04-11 18:31             ` Jim Porter
2023-04-11 18:45               ` Eli Zaretskii
2023-04-11 19:27                 ` Jim Porter
2023-04-11 19:36                   ` Eli Zaretskii
2023-04-12  0:20                     ` Jim Porter
2023-04-13  6:18                       ` Jim Porter
2023-04-13  6:49                         ` Eli Zaretskii
2023-04-13  7:47                           ` Robert Pluim
2023-04-15  3:41                           ` Jim Porter
2023-04-15  5:45                             ` Jim Porter
2023-04-15  7:15                               ` Eli Zaretskii
2023-04-15 10:44                                 ` Alan Mackenzie
2023-04-15 11:00                                   ` Eli Zaretskii
2023-04-21 22:16                                   ` Filipp Gunbin
2023-04-15 20:54                               ` Jim Porter
2023-04-15 21:23                                 ` Jim Porter
2023-04-16  5:43                                   ` Eli Zaretskii
2023-04-16 20:06                                     ` Jim Porter
2023-04-16 20:19                                       ` Michael Albinus
2023-04-17  2:22                                       ` Eli Zaretskii
2023-04-17  7:28                                         ` Michael Albinus
2023-04-21  4:59                                 ` Jim Porter
2023-04-15  7:08                             ` Eli Zaretskii
2023-04-12  9:41                     ` Alan Mackenzie
2023-04-12 10:14                       ` Eli Zaretskii
2023-04-12  9:32               ` Alan Mackenzie

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).