From: Eli Zaretskii <eliz@gnu.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: emacs-devel@gnu.org
Subject: Re: commit-msg hook
Date: Tue, 14 Apr 2015 18:08:48 +0300
Message-ID: <83pp76bvcv.fsf@gnu.org>
In-Reply-To: <552C32F7.5010206@cs.ucla.edu>

> Date: Mon, 13 Apr 2015 14:19:51 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org
> 
> On 04/13/2015 01:18 PM, Eli Zaretskii wrote:
> > Gawk has the --characters-as-bytes option since v4.0.0, which should
> > countermand that, I think.
> 
> Sure, although the code should work even with plain POSIX awk, as there 
> should be no need to assume such a GNU extension when bootstrapping.  
> That is, the script could support either:
> 
> 1. POSIX awk with multibyte OS support, with proper UTF-8 checking from 
> OS libraries; or
> 
> 2. GNU awk 4 (2012) or later, with nearly-as-good UTF-8 checking 
> hand-coded into the script; or
> 
> 3. Traditional awk without UTF-8 checking.
> 
> Currently the script supports (1) and (3) but someone could add support 
> for (2).

How about the following change?  It improves on (3), and worked for me
both on MS-Windows and on GNU/Linux.

--- ./.git/hooks/commit-msg.~5~	2015-04-12 19:11:27.481125000 +0300
+++ ./.git/hooks/commit-msg	2015-04-14 11:11:02.000000000 +0300
@@ -45,10 +45,13 @@
   BEGIN {
     # These regular expressions assume traditional Unix unibyte behavior.
     # They are needed for old or broken versions of awk, e.g.,
-    # mawk 1.3.3 (1996), or gawk on MSYS (2015).
+    # mawk 1.3.3 (1996), or gawk on MSYS (2015), and/or for systems that
+    # cannot use UTF-8 as the codeset for the locale.
     space = "[ \f\n\r\t\v]"
     non_space = "[^ \f\n\r\t\v]"
-    non_print = "[\1-\37\177]"
+    # The non_print below rejects control characters and surrogates
+    # UTF-8 for: 0x01-0x1f 0x7f   0x80-0x9f    0xd800-0xdbff     0xdc00-0xdfff
+    non_print = "[\1-\37\177]|\302[\200-\237]|\355([\240-\257]|[\260-\277])[\200-\277]"
 
     # Prefer POSIX regular expressions if available, as they do a
     # better job of checking.  Similarly, prefer POSIX negated
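
For readers who want to check which byte sequences the new non_print
value rejects without depending on a particular awk or locale, here is
a rough Python sketch of the same pattern. It is only an illustrative
translation, not part of the hook: the names NON_PRINT and
has_non_print are made up for this example, the octal escapes are
rewritten as hex, and the two surrogate ranges are merged into one
bracket expression. It assumes the input is otherwise valid UTF-8.

```python
import re

# Byte-level translation of the patched non_print pattern.
# Assumption: hypothetical helper for illustration, not the hook itself.
NON_PRINT = re.compile(
    rb"[\x01-\x1f\x7f]"              # C0 controls 0x01-0x1f and DEL 0x7f
    rb"|\xc2[\x80-\x9f]"             # UTF-8 for C1 controls U+0080..U+009F
    rb"|\xed[\xa0-\xbf][\x80-\xbf]"  # UTF-8 for surrogates U+D800..U+DFFF
)


def has_non_print(data: bytes) -> bool:
    """Return True if data contains a byte sequence the pattern rejects."""
    return NON_PRINT.search(data) is not None


if __name__ == "__main__":
    samples = [
        "plain ASCII line".encode("utf-8"),
        "caf\u00e9".encode("utf-8"),                     # ordinary non-ASCII
        "\u0085".encode("utf-8"),                        # C1 control (NEL)
        "\ud800".encode("utf-8", "surrogatepass"),       # lone high surrogate
    ]
    for sample in samples:
        print(sample, has_non_print(sample))
```

Matching on bytes mirrors the unibyte assumption stated in the script's
comment: 0xc2 and 0xed can only be lead bytes in valid UTF-8, so the
alternatives should not match in the middle of another character.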


