From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: commit-msg hook
Date: Tue, 14 Apr 2015 18:08:48 +0300
Message-ID: <83pp76bvcv.fsf@gnu.org>
References: <83y4m0tgac.fsf@gnu.org> <55288A2D.5030809@cs.ucla.edu>
 <83d23bruui.fsf@gnu.org> <83a8yff0pl.fsf@gnu.org>
 <5529406D.4040304@cs.ucla.edu> <83sic6ehzc.fsf@gnu.org>
 <55297F68.4080103@cs.ucla.edu> <83lhhwc9lv.fsf@gnu.org>
 <552C0CEF.8070403@cs.ucla.edu> <83fv83dbo2.fsf@gnu.org>
 <552C32F7.5010206@cs.ucla.edu>
In-reply-to: <552C32F7.5010206@cs.ucla.edu>
Reply-To: Eli Zaretskii
To: Paul Eggert
Cc: emacs-devel@gnu.org

> Date: Mon, 13 Apr 2015 14:19:51 -0700
> From: Paul Eggert
> CC: emacs-devel@gnu.org
>
> On 04/13/2015 01:18 PM, Eli Zaretskii wrote:
> > Gawk has the --characters-as-bytes option since v4.0.0, which should
> > countermand that, I think.
>
> Sure, although the code should work even plain POSIX awk, as there
> should be no need to assume such a GNU extension when bootstrapping.
> That is, the script could support either:
>
> 1. POSIX awk with multibyte OS support, with proper UTF-8 checking from
>    OS libraries; or
>
> 2. GNU awk 4 (2012) or later, with nearly-as-good UTF-8 checking
>    hand-coded into the script; or
>
> 3. Traditional awk without UTF-8 checking.
>
> Currently the script supports (1) and (3) but someone could add support
> for (2).

How about the following change?  It improves on (3), and worked for me
both on MS-Windows and on GNU/Linux.

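To make the byte ranges in the proposed non_print pattern easier to check, here is a minimal sketch in Python. It is not part of the hook and is only an illustration: the names NON_PRINT and has_non_print are made up, and the pattern is a hand transcription of the awk string in the patch, with octal escapes rewritten as hex (\302 is 0xC2, \355 is 0xED, and so on). It assumes the input is one line of a commit message as raw bytes, the way a unibyte awk would see it.

```python
import re

# Hand transcription (an assumption, not the hook's code) of the patched
# awk pattern, with octal escapes converted to hex byte escapes.
NON_PRINT = re.compile(
    rb"[\x01-\x1f\x7f]"                                  # 0x01-0x1f and 0x7f
    rb"|\xc2[\x80-\x9f]"                                 # UTF-8 for U+0080..U+009F
    rb"|\xed(?:[\xa0-\xaf]|[\xb0-\xbf])[\x80-\xbf]"      # UTF-8 for U+D800..U+DFFF
)


def has_non_print(line: bytes) -> bool:
    """Return True if the line contains a byte sequence the pattern rejects."""
    return NON_PRINT.search(line) is not None


if __name__ == "__main__":
    samples = {
        "plain ASCII": b"Fix bug in foo.c",
        "e-acute (U+00E9)": "\u00e9".encode("utf-8"),
        "NEL (U+0085)": "\u0085".encode("utf-8"),
        "no-break space (U+00A0)": "\u00a0".encode("utf-8"),
        "U+D7FF": "\ud7ff".encode("utf-8"),
        # Lone surrogates need "surrogatepass" to be encoded at all.
        "surrogate U+D800": "\ud800".encode("utf-8", "surrogatepass"),
        "surrogate U+DFFF": "\udfff".encode("utf-8", "surrogatepass"),
    }
    for name, data in samples.items():
        print(name, "rejected" if has_non_print(data) else "accepted")
```

Only the NEL and the two surrogate samples should be reported as rejected. Note that the first alternative also covers tab (0x09), so a line containing a tab is rejected by this pattern as well, same as with the original "[\1-\37\177]".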
--- ./.git/hooks/commit-msg.~5~ 2015-04-12 19:11:27.481125000 +0300
+++ ./.git/hooks/commit-msg 2015-04-14 11:11:02.000000000 +0300
@@ -45,10 +45,13 @@
   BEGIN {
     # These regular expressions assume traditional Unix unibyte behavior.
     # They are needed for old or broken versions of awk, e.g.,
-    # mawk 1.3.3 (1996), or gawk on MSYS (2015).
+    # mawk 1.3.3 (1996), or gawk on MSYS (2015), and/or for systems that
+    # cannot use UTF-8 as the codeset for the locale.
     space = "[ \f\n\r\t\v]"
     non_space = "[^ \f\n\r\t\v]"
-    non_print = "[\1-\37\177]"
+    # The non_print below rejects control characters and surrogates
+    # UTF-8 for: 0x01-0x1f 0x7f 0x80-0x9f 0xd800-0xdbff 0xdc00-0xdfff
+    non_print = "[\1-\37\177]|\302[\200-\237]|\355([\240-\257]|[\260-\277])[\200-\277]"
 
     # Prefer POSIX regular expressions if available, as they do a
     # better job of checking.  Similarly, prefer POSIX negated