unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Jim Porter <jporterbugs@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: acm@muc.de, emacs-devel@gnu.org
Subject: Re: Mistakes in commit log messages
Date: Sat, 15 Apr 2023 13:54:05 -0700	[thread overview]
Message-ID: <b2b335ef-837c-f1e0-8ae2-d433e849c6e7@gmail.com> (raw)
In-Reply-To: <9c8826af-a160-d36f-7577-2bc5ec0c2004@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3743 bytes --]

On 4/14/2023 10:45 PM, Jim Porter wrote:
> Here are a couple small fixes after testing against the 2000 latest 
> commits (though I haven't examined *all* the errors in detail).

Ok, I've now gone over the 5000 latest commits in some closer detail. 
Here are the main classes of error I see:

1. Commit message lines starting with "*" that aren't file entries.

2. Commit messages from Eglot's import (these aren't *really* a problem, 
and I think authors.el is smart enough to handle them; if something like 
this happened in future, we could just use "git push --no-verify" to 
ignore this hook).

3. Obviously-wrong typos of file names (including files listed in the 
commit message that exist, but aren't in the diff).

I think erroring out on all three of these cases makes sense. Number 2 
isn't truly an error, but it would be hard to detect that reliably in 
the hook (at least not without adding a list of exceptions), and it's 
rare enough that we can just use "--no-verify" as needed (or fix up the 
commit messages during the import).

There are a few cases that are more interesting though:

A. Wildcards in filenames
-------------------------

626f2f7441, 623db40dd1, 4b5414abbc, c32212bf96, f4bfe7834a, 23c1ee6989, 
7e185bc9ba, 36474a1e49, 5fb8b20fa3, c45bfd3c4a, fcc7373b28, 8e9835c673, 
952550258d, d1a7d16f8e, dc083ebc4e, 5f74397490, 0f85f2c0e5, 49bad2a0a6, 
5a77517e7d, 24b9515da0, ae3904bb5d, 228d9d615d, 8461cfc8fc, e402887d5d, 
ab7dddea90, 9e4f11a163, 6342264ef7

These cases use wildcards in the file names. I think we should avoid 
doing this, for the same reason the GNU ChangeLog standards say to avoid 
doing it for function names:

"If you mention the names of the modified functions or variables, it’s 
important to name them in full. ...  Subsequent maintainers will often 
search for a function name to find all the change log entries that 
pertain to it; if you abbreviate the name, they won’t find it when they 
search."[1]

Therefore, I didn't do anything about these in the latest revision of 
the hook: they're still errors. For committers that don't want to list 
every file explicitly, they could just add an entry for the directory, 
*without* any wildcards.

B. File entry for a directory, using a trailing "/"
---------------------------------------------------

9b80fe55f9, 28d0654943, d9c94e93b7, 7d0dc31833, 16975078b4, 07235678a4, 
c55f4055dd, 9451ea0a05, f342b7c969

I think this is perfectly ok, so I've fixed the hook to allow this (see 
attached).

C. Line-wrapped file entries
----------------------------

8675f4136c, 0e25a39e69, 7c79eea51d, 335a5fd173, 31f8ae53be

These are cases where file entries extend multiple lines. In the first 
example, a single file is split onto two lines, which I don't think is 
feasible to identify. In the other cases, the entries are formatted like so:

     * path/to/file.txt,
     path/to/other/file.txt: Change something.

I've partially fixed this so that it can properly parse the first line, 
but handling the second line is tricky, so I didn't bother. It would be 
better if committers used this format anyway:

     * path/to/file.txt:
     * path/to/other/file.txt: Change something.

D. Merge commit with file entries (and no diff)
-----------------------------------------------

289b457cac

This is an odd case: it's a merge commit that mentions a bunch of file 
changes, but there's no diff in this commit. I'm not really sure what to 
do about this one, but maybe we shouldn't add file entries for cases 
like this under normal circumstances. If there's an exception to that 
rule, we can always push a commit like this with "--no-verify".



[1] 
https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs

[-- Attachment #2: 0001-Add-Git-hooks-to-check-filenames-listed-in-the-commi.patch --]
[-- Type: text/plain, Size: 8987 bytes --]

From db23241d5edd61acca14899c804eb39e13385879 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Wed, 12 Apr 2023 23:03:31 -0700
Subject: [PATCH] Add Git hooks to check filenames listed in the commit message

* build-aux/git-hooks/commit-msg-files.awk:
* build-aux/git-hooks/post-commit:
* build-aux/git-hooks/pre-push: New files...
* autogen.sh: ... add them.
---
 autogen.sh                               |   3 +-
 build-aux/git-hooks/commit-msg-files.awk | 113 +++++++++++++++++++++++
 build-aux/git-hooks/post-commit          |  29 ++++++
 build-aux/git-hooks/pre-push             |  70 ++++++++++++++
 4 files changed, 214 insertions(+), 1 deletion(-)
 create mode 100644 build-aux/git-hooks/commit-msg-files.awk
 create mode 100755 build-aux/git-hooks/post-commit
 create mode 100755 build-aux/git-hooks/pre-push

diff --git a/autogen.sh b/autogen.sh
index af4c2ad14df..6127e7b24f4 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -340,7 +340,8 @@ hooks=
 tailored_hooks=
 sample_hooks=
 
-for hook in commit-msg pre-commit prepare-commit-msg; do
+for hook in commit-msg pre-commit prepare-commit-msg post-commit \
+            pre-push commit-msg-files.awk; do
     cmp -- build-aux/git-hooks/$hook "$hooks/$hook" >/dev/null 2>&1 ||
 	tailored_hooks="$tailored_hooks $hook"
 done
diff --git a/build-aux/git-hooks/commit-msg-files.awk b/build-aux/git-hooks/commit-msg-files.awk
new file mode 100644
index 00000000000..4566905657d
--- /dev/null
+++ b/build-aux/git-hooks/commit-msg-files.awk
@@ -0,0 +1,113 @@
+# Check the file list of GNU Emacs change log entries for each commit SHA.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+### Commentary:
+
+# This script accepts a list of (unabbreviated) Git commit SHAs, and
+# will then iterate over them to check that any files mentioned in the
+# commit message are actually present in the commit's diff.  If not,
+# it will print out the incorrect file names and return 1.
+
+# You can also pass "-v reason=pre-push", which will add more-verbose
+# output, indicate the abbreviated commit SHA and first line of the
+# commit message for any improper commits.
+
+### Code:
+
+function get_commit_changes(commit_sha, changes,    cmd, i, j, len, \
+                            bits, filename) {
+  # Collect all the files touched in the specified commit.
+  cmd = ("git log -1 --name-status --format= " commit_sha)
+  while ((cmd | getline) > 0) {
+    for (i = 2; i <= NF; i++) {
+      len = split($i, bits, "/")
+      for (j = 1; j <= len; j++) {
+        if (j == 1)
+          filename = bits[j]
+        else
+          filename = filename "/" bits[j]
+        changes[filename] = 1
+      }
+    }
+  }
+  close(cmd)
+}
+
+function check_commit_msg_files(commit_sha, verbose,    changes, good, \
+                                cmd, msg, filenames_str, filenames, i) {
+  get_commit_changes(commit_sha, changes)
+  good = 1
+
+  cmd = ("git log -1 --format=%B " commit_sha)
+  while ((cmd | getline) > 0) {
+    if (verbose && ! msg)
+      msg = $0
+
+    # Find lines that reference files.  We look at any line starting
+    # with "*" (possibly prefixed by "; ") where the file part starts
+    # with an alphanumeric character.  The file part ends if we
+    # encounter any of the following characters: [ ( < { :
+    if (/^(; )?\*[ \t]+[[:alnum:]]/ && match($0, /[[:alnum:]][^[(<{:]*/)) {
+      # There might be multiple files listed on this line, separated
+      # by spaces (and possibly a comma).  Iterate over each of them.
+      split(substr($0, RSTART, RLENGTH), filenames, ",?([[:blank:]]+|$)")
+
+      for (i in filenames) {
+        # Remove trailing slashes from any directory entries.
+        sub(/\/$/, "", filenames[i])
+
+        if (length(filenames[i]) && ! (filenames[i] in changes)) {
+          if (good) {
+            # Print a header describing the error.
+            if (verbose)
+              printf("In commit %s \"%s\"...\n", substr(commit_sha, 1, 10), msg)
+            printf("Files listed in commit message, but not in diff:\n")
+          }
+          printf("  %s\n", filenames[i])
+          good = 0
+        }
+      }
+    }
+  }
+  close(cmd)
+
+  return good
+}
+
+BEGIN {
+  if (reason == "pre-push")
+    verbose = 1
+}
+
+/^[a-z0-9]{40}$/ {
+  if (! check_commit_msg_files($0, verbose)) {
+    status = 1
+  }
+}
+
+END {
+  if (status != 0) {
+    if (reason == "pre-push")
+      error_msg = "Push aborted"
+    else
+      error_msg = "Bad commit message"
+    printf("%s; please see the file 'CONTRIBUTE'\n", error_msg)
+  }
+  exit status
+}
diff --git a/build-aux/git-hooks/post-commit b/build-aux/git-hooks/post-commit
new file mode 100755
index 00000000000..4c30ec76e02
--- /dev/null
+++ b/build-aux/git-hooks/post-commit
@@ -0,0 +1,29 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries after committing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+git rev-parse HEAD | $awk -v reason=post-commit \
+                          -f .git/hooks/commit-msg-files.awk
diff --git a/build-aux/git-hooks/pre-push b/build-aux/git-hooks/pre-push
new file mode 100755
index 00000000000..b0185a97b28
--- /dev/null
+++ b/build-aux/git-hooks/pre-push
@@ -0,0 +1,70 @@
+#!/bin/sh
+# Check the file list of GNU Emacs change log entries before pushing.
+
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This file is part of GNU Emacs.
+
+# GNU Emacs is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU Emacs is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+# Prefer gawk if available, as it handles NUL bytes properly.
+if type gawk >/dev/null 2>&1; then
+  awk="gawk"
+else
+  awk="awk"
+fi
+
+# Standard input receives lines of the form:
+#   <local ref> SP <local name> SP <remote ref> SP <remote name> LF
+$awk -v origin_name="$1" '
+  # If the local SHA is all zeroes, ignore it.
+  $2 ~ /^0{40}$/ {
+    next
+  }
+
+  $2 ~ /^[a-z0-9]{40}$/ {
+    newref = $2
+    # If the remote SHA is all zeroes, this is a new object to be
+    # pushed (likely a branch).  Go backwards until we find a SHA on
+    # an origin branch.
+    if ($4 ~ /^0{40}$/) {
+      back = 0
+      cmd = ("git branch -r -l '\''" origin_name "/*'\'' --contains " \
+             newref "~" back)
+      while ((cmd | getline) == 0) {
+
+        # Only look back at most 1000 commits, just in case...
+        if (back++ > 1000)
+          break;
+      }
+      close(cmd)
+
+      cmd = ("git rev-parse " newref "~" back)
+      cmd | getline oldref
+      if (!(oldref ~ /^[a-z0-9]{40}$/)) {
+        # The SHA is misformatted!  Skip this line.
+        next
+      }
+      close(cmd)
+    } else if ($4 ~ /^[a-z0-9]{40}$/)  {
+      oldref = $4
+    } else {
+      # The SHA is misformatted!  Skip this line.
+      next
+    }
+
+    # Print every SHA after oldref, up to (and including) newref.
+    system("git rev-list --reverse " oldref ".." newref)
+  }
+' | $awk -v reason=pre-push -f .git/hooks/commit-msg-files.awk
-- 
2.25.1


  parent reply	other threads:[~2023-04-15 20:54 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <835ya5m4p0.fsf@gnu.org>
     [not found] ` <ZDPkykCsW3i30UR9@ACM>
     [not found]   ` <83v8i4arzt.fsf@gnu.org>
     [not found]     ` <CANh=_JF0CEPDsWZSuyy9ymByma2LxcypP90O3-LQ+KhoJ8cqvg@mail.gmail.com>
     [not found]       ` <CANh=_JEO4-E79dPCLc3cRLi7=ftAzc+H1FC46eck1vJN3TD3Sg@mail.gmail.com>
2023-04-11  6:02         ` Mistakes in commit log messages Eli Zaretskii
2023-04-11 14:01           ` Alan Mackenzie
2023-04-11 14:57             ` Eli Zaretskii
2023-04-11 17:20               ` Alan Mackenzie
2023-04-11 18:00                 ` Eli Zaretskii
2023-04-11 18:31             ` Jim Porter
2023-04-11 18:45               ` Eli Zaretskii
2023-04-11 19:27                 ` Jim Porter
2023-04-11 19:36                   ` Eli Zaretskii
2023-04-12  0:20                     ` Jim Porter
2023-04-13  6:18                       ` Jim Porter
2023-04-13  6:49                         ` Eli Zaretskii
2023-04-13  7:47                           ` Robert Pluim
2023-04-15  3:41                           ` Jim Porter
2023-04-15  5:45                             ` Jim Porter
2023-04-15  7:15                               ` Eli Zaretskii
2023-04-15 10:44                                 ` Alan Mackenzie
2023-04-15 11:00                                   ` Eli Zaretskii
2023-04-21 22:16                                   ` Filipp Gunbin
2023-04-15 20:54                               ` Jim Porter [this message]
2023-04-15 21:23                                 ` Jim Porter
2023-04-16  5:43                                   ` Eli Zaretskii
2023-04-16 20:06                                     ` Jim Porter
2023-04-16 20:19                                       ` Michael Albinus
2023-04-17  2:22                                       ` Eli Zaretskii
2023-04-17  7:28                                         ` Michael Albinus
2023-04-21  4:59                                 ` Jim Porter
2023-04-15  7:08                             ` Eli Zaretskii
2023-04-12  9:41                     ` Alan Mackenzie
2023-04-12 10:14                       ` Eli Zaretskii
2023-04-12  9:32               ` Alan Mackenzie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b2b335ef-837c-f1e0-8ae2-d433e849c6e7@gmail.com \
    --to=jporterbugs@gmail.com \
    --cc=acm@muc.de \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).