From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: commit-msg hook
Date: Tue, 14 Apr 2015 21:01:45 +0300
Message-ID: <83618ybncm.fsf@gnu.org>
References: <83y4m0tgac.fsf@gnu.org> <55288A2D.5030809@cs.ucla.edu>
 <83d23bruui.fsf@gnu.org> <83a8yff0pl.fsf@gnu.org>
 <5529406D.4040304@cs.ucla.edu> <83sic6ehzc.fsf@gnu.org>
 <55297F68.4080103@cs.ucla.edu> <83lhhwc9lv.fsf@gnu.org>
 <552C0CEF.8070403@cs.ucla.edu> <83fv83dbo2.fsf@gnu.org>
 <552C32F7.5010206@cs.ucla.edu> <83pp76bvcv.fsf@gnu.org>
 <552D47D8.5020302@cs.ucla.edu> <838udubprd.fsf@gnu.org>
 <552D519D.1000907@cs.ucla.edu>
In-reply-to: <552D519D.1000907@cs.ucla.edu>
Reply-To: Eli Zaretskii
To: Paul Eggert
Cc: emacs-devel@gnu.org

> Date: Tue, 14 Apr 2015 10:42:53 -0700
> From: Paul Eggert
> CC: emacs-devel@gnu.org
>
> How about this idea? Before falling back to the unibyte regular
> expressions in awk, set LC_ALL='C' in the environment. This should work
> well enough, as in practice all environments where the C locale is
> multibyte have working UTF-8 so they won't need to fall back to unibyte
> anyway.

You mean, like below?

--- ./.git/hooks/commit-msg.~5~	2015-04-12 19:11:27.481125000 +0300
+++ ./.git/hooks/commit-msg	2015-04-14 21:01:14.481125000 +0300
@@ -37,6 +37,8 @@
   at_sign=`LC_ALL=en_US.UTF-8 $awk "$print_at_sign" /dev/null`
   if test "$at_sign" = @; then
     LC_ALL=en_US.UTF-8; export LC_ALL
+  else
+    LC_ALL=C; export LC_ALL
   fi
 fi
 
@@ -45,10 +47,13 @@
   BEGIN {
     # These regular expressions assume traditional Unix unibyte behavior.
     # They are needed for old or broken versions of awk, e.g.,
-    # mawk 1.3.3 (1996), or gawk on MSYS (2015).
+    # mawk 1.3.3 (1996), or gawk on MSYS (2015), and/or for systems that
+    # cannot use UTF-8 as the codeset for the locale.
     space = "[ \f\n\r\t\v]"
     non_space = "[^ \f\n\r\t\v]"
-    non_print = "[\1-\37\177]"
+    # The non_print below rejects control characters and surrogates
+    # UTF-8 for: 0x01-0x1f 0x7f 0x80-0x9f 0xd800-0xdbff 0xdc00-0xdfff
+    non_print = "[\1-\37\177]|\302[\200-\237]|\355([\240-\257]|[\260-\277])[\200-\277]"
 
     # Prefer POSIX regular expressions if available, as they do a
     # better job of checking. Similarly, prefer POSIX negated