From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: commit-msg hook
Date: Tue, 14 Apr 2015 10:42:53 -0700
Organization: UCLA Computer Science Department
Message-ID: <552D519D.1000907@cs.ucla.edu>
In-Reply-To: <838udubprd.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

On 04/14/2015 10:09 AM, Eli Zaretskii wrote:
>> This sort of thing should work in a unibyte environment, but it needs to
>> be used only after testing that we actually are in a unibyte environment.
> I thought that's what all the tests with cent_sign and at_sign do,
> don't they?

No, they test for something more specific, namely, whether we're in a UTF-8 locale. Not every multibyte locale uses UTF-8.

> what bad things could happen if this regular expression is
> used in a multibyte environment?

I suppose it could cause the script to print "Unprintable character in commit message" even though all the message's characters are actually printable.

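For readers who haven't looked at the cent_sign/at_sign tests, here is a minimal sketch of that kind of check, assuming a POSIX sh and an awk that honors the locale (some awk implementations work on bytes regardless of locale, and then the test simply fails). cent_sign and at_sign follow the names used in this thread; print_at_sign and awk_is_utf8 are hypothetical, and this is not the hook's actual code. Treat a pass as a heuristic for "UTF-8 locale", and note that a failure does not mean the locale is unibyte, which is the distinction above.

```sh
#!/bin/sh
# Sketch only: print_at_sign and awk_is_utf8 are hypothetical names.

# U+00A2 CENT SIGN is two bytes in UTF-8: octal 302 242.
cent_sign=$(printf '\302\242')

# Print everything after the first *character* of "<cent sign>@".
# An awk that decodes UTF-8 sees the cent sign as one character and
# prints "@"; a byte-oriented awk prints the byte 242 followed by "@".
print_at_sign='BEGIN { print substr("'"$cent_sign"'@", 2) }'

# Succeeds if awk, in the current locale, decodes the cent sign as a
# single character (heuristic for a UTF-8 locale).
awk_is_utf8() {
  at_sign=$(awk "$print_at_sign" </dev/null 2>/dev/null)
  test "$at_sign" = @
}

if awk_is_utf8; then
  echo "awk appears to be using a UTF-8 locale"
else
  echo "awk is not using a UTF-8 locale (the locale may still be multibyte)"
fi
```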
How about this idea? Before falling back to the unibyte regular expressions in awk, set LC_ALL='C' in the environment. This should work well enough, as in practice all environments where the C locale is multibyte have working UTF-8, so they won't need to fall back to unibyte anyway.
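The LC_ALL='C' fallback proposed here could look roughly like the following minimal sketch, again assuming a POSIX sh and awk. check_message and both regular expressions are hypothetical, not taken from the real hook. When the UTF-8 test fails, the sketch exports LC_ALL=C so that awk treats every byte as one character, then flags only ASCII control characters other than tab; bytes 0200-0377, including every byte of a UTF-8 sequence, are accepted instead of being reported as unprintable.

```sh
#!/bin/sh
# Sketch of the LC_ALL=C fallback. check_message and both regexes are
# hypothetical; they are not copied from the real commit-msg hook.

# Same UTF-8 heuristic: U+00A2 CENT SIGN is octal 302 242 in UTF-8.
cent_sign=$(printf '\302\242')
print_at_sign='BEGIN { print substr("'"$cent_sign"'@", 2) }'

# Usage: check_message FILE
# Prints a diagnostic and returns nonzero if FILE contains an
# unprintable character.
check_message() {
  at_sign=$(awk "$print_at_sign" </dev/null 2>/dev/null)
  if test "$at_sign" = @; then
    # awk decodes UTF-8 here, so let the locale define "printable".
    non_print='[^[:print:][:space:]]'
  else
    # Fall back to the unibyte C locale, where each byte is one
    # character. Flag ASCII control characters except tab; bytes
    # 0200-0377 pass. Note: LC_ALL=C stays exported afterwards.
    LC_ALL=C
    export LC_ALL
    non_print='[\001-\010\013-\037\177]'
  fi
  awk -v non_print="$non_print" '
    BEGIN { status = 0 }
    $0 ~ non_print {
      print "Unprintable character in commit message"
      status = 1
      exit          # jump to END after the first offending line
    }
    END { exit status }
  ' "$1"
}
```

In a commit-msg hook, git passes the message file as $1, so a call such as `check_message "$1" || exit 1` would reject the commit. Because sh functions have no local variables, the exported LC_ALL=C also applies to any later awk invocations in the script, which is consistent with the idea of switching to the C locale before all the unibyte fallbacks.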