From mboxrd@z Thu Jan 1 00:00:00 1970
From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Re: make check fails if no en_US.iso88591 locale
Date: Thu, 10 Sep 2009 17:33:02 +0200
Message-ID: <8763bqu85t.fsf@gnu.org>
References: <87pra1djys.fsf@arudy.ossau.uklinux.net>
	<322965.9784.qm@web37906.mail.mud.yahoo.com>
	<873a6v7pjr.fsf@arudy.ossau.uklinux.net>
	<1252550169.24639.66.camel@localhost.localdomain>
	<87r5uf8574.fsf@gnu.org>
	<1252586696.24639.116.camel@localhost.localdomain>
To: guile-devel@gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=

Mike Gran writes:

> I could fix the test by testing only characters 0 to 127 in a C locale
> if a Latin-1 locale can't be found.

Yes, that'd be nice.

> I can also fix the test by using the 'setbinary' function

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (help setbinary)
`setbinary' is a primitive procedure in the (guile) module.

 -- Scheme Procedure: setbinary
     Sets the encoding for the current input, output, and error ports to
     ISO-8859-1.  That character encoding allows ports to operate on
     binary data.

     It also sets the default encoding for newly created ports to
     ISO-8859-1.

     The previous default encoding for new ports is returned
--8<---------------cut here---------------end--------------->8---

It seems to do a lot of things, which aren't clear from the name.  ;-)

What can be done about it?  At least it should be renamed, to
`set-port-binary-mode!' or similar.

Then it'd be nice if that functionality could be split in several
functions, some operating on a per-port basis.  After all, one can
already do:

  (for-each (lambda (p)
              (set-port-encoding! p "ISO-8859-1"))
            (list (current-input-port) (current-output-port)
                  (current-error-port)))

So we just lack:

  ;; encoding for newly created ports
  (set-default-port-encoding! "ISO-8859-1")

With that `setbinary' can be implemented in Scheme.

> to force the encodings on stdin and stdout to a default value that
> will pass through binary data, instead of calling 'setlocale'.

Hmm, I think I'd still prefer `setlocale'.

regexec(3) doesn't say anything about the string encoding.  Do libc
implementations actually expect plain ASCII or Latin-1?  Or do they
adapt to the current locale's encoding?

> I looked in the POSIX spec on Regex for specific advice using 128-255 in
> regex in the C locale.  I didn't see anything offhand.  The spec does
> spend a lot of time talking about the interaction between the locale and
> regular expressions.  I get the impression from the spec that using
> regex on 128-255 in the C locale is an unexpected use of regular
> expressions.

http://www.opengroup.org/onlinepubs/9699919799/functions/regexec.html
reads:

  If, when regexec() is called, the locale is different from when the
  regular expression was compiled, the result is undefined.

It makes me think that, if a process runs with a UTF-8 locale and passes
raw UTF-8 bytes to regcomp(3) and regexec(3), it may work.

Hmm, the program below, with UTF-8-encoded source, works both with a
Latin-1 and a UTF-8 locale:

--=-=-=
Content-Type: text/x-csrc; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

#include <stdlib.h>
#include <regex.h>
#include <locale.h>

int
main (int argc, char *argv[])
{
  regex_t rx;
  regmatch_t match;

  setlocale (LC_ALL, "fr_FR.utf8");

  regcomp (&rx, "ça", REG_EXTENDED);

  return regexec (&rx, "ça va ?", 1, &match, 0) == 0
    ? EXIT_SUCCESS : EXIT_FAILURE;
}

--=-=-=

Do you think it would work to just leave `regexp.test' as it is in 1.8?

Thanks,
Ludo'.

--=-=-=--
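
The test program above ignores the return values of setlocale(3) and
regcomp(3), so on a system where `fr_FR.utf8' is not installed it
silently runs in the default "C" locale.  A minimal sketch of a stricter
check follows.  The helper name `match_in_locale' and its return codes
are hypothetical, chosen for illustration only; they are not part of
Guile, its test suite, or libc.  It assumes a libc (such as glibc) whose
setlocale returns NULL for a locale that is not installed.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if PATTERN matches SUBJECT under LOCALE, 0 if it does not,
   -1 if LOCALE cannot be selected, -2 if PATTERN fails to compile.  */
static int
match_in_locale (const char *locale, const char *pattern,
                 const char *subject)
{
  regex_t rx;
  int result;

  assert (pattern != NULL && subject != NULL);

  /* On failure setlocale returns NULL and leaves the current locale
     unchanged, so report it instead of carrying on.  */
  if (setlocale (LC_ALL, locale) == NULL)
    return -1;

  /* Compile and execute under the same locale: POSIX leaves the result
     undefined if the locale changes between regcomp and regexec.  */
  if (regcomp (&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -2;

  result = (regexec (&rx, subject, 0, NULL, 0) == 0) ? 1 : 0;
  regfree (&rx);
  return result;
}
```

A caller could try `match_in_locale ("fr_FR.utf8", "ça", "ça va ?")'
first and, on -1, fall back to an ASCII-only check in the "C" locale,
which is the kind of fallback discussed earlier in this thread.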