unofficial mirror of emacs-devel@gnu.org 
From: Jim Porter <jporterbugs@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: acm@muc.de, emacs-devel@gnu.org
Subject: Re: Mistakes in commit log messages
Date: Fri, 14 Apr 2023 20:41:49 -0700
Message-ID: <5adfbf6f-fbcb-f4e8-3662-48bd5eb6a269@gmail.com>
In-Reply-To: <838rew5lak.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2858 bytes --]

OK, I think this patch should work, though I'll continue testing it
locally before I merge it. We could also merge it sooner if we
temporarily made the pre-push hook advisory (i.e. it wouldn't block a
push). That way, others could try these hooks out without them blocking
anyone's work. Then, when we're happy with them, we can make the
pre-push hook error out on bad commit messages.
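
One way the advisory mode could look, as a rough sketch only: wrap the
check in a function that reports a failure but always returns success,
so Git never sees a non-zero exit status from the hook. The function
name "advisory" and the warning text are hypothetical and not part of
the attached patch.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): run a check command, warn on
# failure, but always return 0 so the surrounding hook never blocks.
advisory () {
  if "$@"; then
    return 0
  fi
  echo "warning: commit message check failed (advisory only)" >&2
  return 0
}

# Demonstration with a check that always fails: the status is still 0.
advisory false
echo "exit status: $?"
```

In a real hook, the final pipeline would be passed to the wrapper (or
followed by "|| true") until the check is made mandatory.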

On 4/12/2023 11:49 PM, Eli Zaretskii wrote:
> Please be sure to test the part that finds the file names on all the
> log messages in the repository.  The things people do in logs will
> sometimes surprise you.  In particular, a '*' after leading whitespace
> doesn't necessarily flag a file name, see, for example, the following
> commits:
> 
>    92d75e5c53241ac76e8fdcb6fc66ade68354687c

This works without errors with my latest patch. (Though it isn't smart
enough to recognize "src/comp.c" as a file to check, since there's
no leading "* ".) I think that's probably ok; it'd be hard to detect
cases like that reliably.

>    0bd96806ef1a0f0d2d3f48cdb1204b7e393ab036

This fails, correctly I think. The first line of the commit message is 
"* Rename `comp--typeof-builtin-types'", which I don't think we should 
allow (going forward, at least). Since it's in the first line, we 
*could* treat that specially, but I'm not sure it's worth the 
complexity. Committers can just avoid starting lines with "*" (for 
example, by using " *" instead).

>    eff42dc0af741cc56c52d7d9577d29fc16f9f665

This also works without errors. An indented "*" is ok, and not checked 
by these hooks.
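
To make the rule concrete, here is a minimal sketch of which lines get
treated as file entries. It uses the same pattern as the awk script in
the attached patch, except that "[[:alnum:]]" is written as
"[A-Za-z0-9]" (an assumption made here so the example also runs on awk
implementations without POSIX character classes). The function name
"is_file_line" is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the line would be scanned for file
# names, i.e. it starts with "*" (optionally preceded by "; ") followed
# by whitespace and an alphanumeric character.
is_file_line () {
  printf '%s\n' "$1" | awk '
    /^(; )?\*[ \t]+[A-Za-z0-9]/ { found = 1 }
    END { exit !found }'
}

for line in '* lisp/window.el (foo): Doc fix' \
            ' * indented bullet' \
            '; * etc/NEWS: Fix typo'
do
  if is_file_line "$line"; then
    echo "checked: $line"
  else
    echo "ignored: $line"
  fi
done
```

The first and third lines are checked; the indented one is ignored. A
summary line such as "* Rename ..." is also checked, which is why the
second commit above fails.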

>    b5f70c239e87e5f38fd70181ef75cd28a43a8b41

This fails for two reasons: first, there's a line like this: "* 
buffer-match-p and match-buffers", which is recognized as a list of file 
names. That looks like auto-fill-mode (or something similar) adding a "* 
" where it shouldn't. That's happened to me before, so I'd be glad for 
the hook to catch this.

It also fails because of this line: "* lisp/window.el 
(display-buffer-assq-regexp): Mention what happens". That's correct too, 
since "lisp/window.el" wasn't changed in this commit.
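
Both failures come from how a matching line is turned into names. The
sketch below mirrors that step from the awk script in the patch: take
the text from the first alphanumeric character up to the first of
"[ ( < { :", then split it on commas and blanks. As an assumption for
portability, "[A-Za-z0-9]" and "[ \t]" stand in for the POSIX classes
the patch uses, and the function name "list_files" is hypothetical. It
only prints the candidate names; comparing them against the diff is
left out.

```shell
#!/bin/sh
# Hypothetical helper: print each name that a commit-message line would
# contribute to the "files listed in commit message" check.
list_files () {
  printf '%s\n' "$1" | awk '
    /^(; )?\*[ \t]+[A-Za-z0-9]/ && match($0, /[A-Za-z0-9][^[(<{:]*/) {
      # Several names may share a line, separated by commas or blanks.
      n = split(substr($0, RSTART, RLENGTH), names, /[ \t,][ \t]*/)
      for (i = 1; i <= n; i++)
        if (length(names[i]))
          print names[i]
    }'
}

list_files '* lisp/window.el (display-buffer-assq-regexp): Mention what happens'
list_files '* buffer-match-p and match-buffers'
```

The first call yields only "lisp/window.el"; the second yields
"buffer-match-p", "and" and "match-buffers", none of which is a file in
the diff, so each one would be reported.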

If you apply my patch, you can also test out other commits via "echo 
COMMIT-SHA | awk -f build-aux/git-hooks/commit-msg-files.awk". (I'll do 
this locally as well.)

> Also, it looks like your script doesn't recognize file names in a line
> that starts with a semi-colon, as in this commit:

I fixed this case, though as far as I can tell, authors.el doesn't look 
at lines like this. (I could be wrong, since I just read over that code 
briefly.)

> What about the opposite: a file is mentioned in the diff, but not in
> the commit message?  Or maybe that is allowed, and we shouldn't block
> it?

I think that's ok. Robert Pluim mentions a couple of cases, and we 
probably also want to allow commit messages like: "; Fix last change".

[-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --]
[-- Type: text/plain, Size: 8210 bytes --]

From f849fa082f0d7c8c3d472120e91e155b8700c65c Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Wed, 12 Apr 2023 23:03:31 -0700
Subject: [PATCH] Add Git hooks to check filenames listed in the commit message

* build-aux/git-hooks/commit-msg-files.awk:
* build-aux/git-hooks/post-commit:
* build-aux/git-hooks/pre-push: New files...
* autogen.sh: ... add them.
---
 autogen.sh                               |  2 +-
 build-aux/git-hooks/commit-msg-files.awk | 91 ++++++++++++++++++++++++
 build-aux/git-hooks/post-commit          | 29 ++++++++
 build-aux/git-hooks/pre-push             | 66 +++++++++++++++++
 4 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 build-aux/git-hooks/commit-msg-files.awk
 create mode 100755 build-aux/git-hooks/post-commit
 create mode 100755 build-aux/git-hooks/pre-push

diff --git a/autogen.sh b/autogen.sh
index af4c2ad14df..71d7ac89abf 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -340,7 +340,7 @@ hooks=
 tailored_hooks=
 sample_hooks=
 
-for hook in commit-msg pre-commit prepare-commit-msg; do
+for hook in commit-msg pre-commit prepare-commit-msg post-commit pre-push commit-msg-files.awk; do
     cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 ||
 	tailored_hooks="$tailored_hooks $hook"
 done
diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk
new file mode 100644
index 00000000000..c1edccdf3e5
--- /dev/null
+++ b/build-aux/git-hooks/commit-msg-files.awk
@@ -0,0 +1,91 @@
+# Check the file list of GNU Emacs change log entries for each commit SHA.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+function get_commit_changes(commit_sha, changes,    i, j, len, bits, filename) {
+  # Collect all the files touched in the specified commit.
+  while ((("git log -1 --name-status --format= " commit_sha) | getline) > 0) {
+    for (i = 2; i <= NF; i++) {
+      len = split($i, bits, "/")
+      for (j = 1; j <= len; j++) {
+        if (j == 1)
+          filename = bits[j]
+        else
+          filename = filename "/" bits[j]
+        changes[filename] = 1
+      }
+    }
+  }
+}
+
+function check_commit_msg_files(commit_sha, verbose,    changes, good, msg, \
+                                filenames_str, filenames, i) {
+  get_commit_changes(commit_sha, changes)
+  good = 1
+
+  while ((("git log -1 --format=%B " commit_sha) | getline) > 0) {
+    if (verbose && ! msg)
+      msg = $0
+
+    # Find lines that reference files.  We look at any line starting
+    # with "*" (possibly prefixed by "; ") where the file part starts
+    # with an alphanumeric character.  The file part ends if we
+    # encounter any of the following characters: [ ( < { :
+    if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, "[[:alnum:]][^[(<{:]*")) {
+      # There might be multiple files listed on this line, separated
+      # by a comma and/or space.  Iterate over each of them.
+      split(substr($0, RSTART, RLENGTH), filenames, "[[:blank:],][[:blank:]]*")
+      for (i in filenames) {
+        if (length(filenames[i]) && ! (filenames[i] in changes)) {
+          if (good) {
+            # Print a header describing the error.
+            if (verbose)
+              printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg)
+            printf("Files listed in commit message, but not in diff:\n")
+          }
+          printf("  %s\n", filenames[i])
+          good = 0
+        }
+      }
+    }
+  }
+
+  return good
+}
+
+BEGIN {
+  if (reason == "pre-push")
+    verbose = 1
+}
+
+/^[a-z0-9]{40}$/ {
+  if (! check_commit_msg_files($0, verbose)) {
+    status = 1
+  }
+}
+
+END {
+  if (status != 0) {
+    if (reason == "pre-push")
+      error_msg = "Push aborted"
+    else
+      error_msg = "Bad commit message"
+    printf("%s; please see the file 'CONTRIBUTE'\n", error_msg)
+  }
+  exit status
+}
diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit
new file mode 100755
index 00000000000..4c30ec76e02
--- /dev/null
+++ b/build-aux/git-hooks/post-commit
@@ -0,0 +1,29 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries after committing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+git rev-parse HEAD | $awk -v reason=post-commit \
+                          -f .git/hooks/commit-msg-files.awk
diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push
new file mode 100755
index 00000000000..136f84e6691
--- /dev/null
+++ b/build-aux/git-hooks/pre-push
@@ -0,0 +1,66 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries before pushing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+# Standard input receives lines of the form:
+#   <local ref> SP <local SHA> SP <remote ref> SP <remote SHA> LF
+$awk -v origin_name="$1" '
+  # If the local SHA is all zeroes, ignore it.
+  $2 ~ /^0{40}$/ {
+    next
+  }
+
+  $2 ~ /^[a-z0-9]{40}$/ {
+    newref = $2
+    # If the remote SHA is all zeroes, this is a new object to be
+    # pushed (likely a branch).  Go backwards until we find a SHA on
+    # an origin branch.
+    if ($4 ~ /^0{40}$/) {
+      back = 0
+      while ((("git branch -r -l '\''" origin_name "/*'\'' --contains " \
+               newref "~" back) | getline) == 0) {
+
+        # Only look back at most 1000 commits, just in case...
+        if (back++ > 1000)
+          break;
+      }
+
+      ("git rev-parse " newref "~" back) | getline oldref
+      if (!(oldref ~ /^[a-z0-9]{40}$/)) {
+        # The SHA is misformatted!  Skip this line.
+        next
+      }
+    } else if ($4 ~ /^[a-z0-9]{40}$/)  {
+      oldref = $4
+    } else {
+      # The SHA is misformatted!  Skip this line.
+      next
+    }
+
+    # Print every SHA after oldref, up to (and including) newref.
+    system("git rev-list --reverse " oldref ".." newref)
+  }
+' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk
-- 
2.25.1

