From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Re: make check fails if no en_US.iso88591 locale
Date: Wed, 09 Sep 2009 19:36:09 -0700
Message-ID: <1252550169.24639.66.camel@localhost.localdomain>
References: <87pra1djys.fsf@arudy.ossau.uklinux.net> <322965.9784.qm@web37906.mail.mud.yahoo.com> <873a6v7pjr.fsf@arudy.ossau.uklinux.net>
To: Neil Jerram
Cc: Guile Devel
In-Reply-To: <873a6v7pjr.fsf@arudy.ossau.uklinux.net>
List-Id: "Developers list for Guile, the GNU extensibility library"

On Wed, 2009-09-09 at 22:53 +0100, Neil Jerram wrote:
> > It is important.  This is one of the problems with the whole Unicode
> > effort.  There is no Unicode-capable regex library.
> > The regexp.test tries matching all bytes from 0 to 255, and it uses
> > scm_to_locale_string to prep the string for dispatch to the libc
> > regex calls and scm_from_locale_string to send them back. [...]
>
> Thanks for explaining; I think I understand now.  So then Ludovic's
> suggestion of with-latin1-locale should work, shouldn't it?

Yeah.  I went with that idea.

> > This regex library actually can be used with arbitrary Unicode data
> > but it takes extra care.  UTF-8 can be used as the locale, and, then
> > regular expression must be written keeping in mind that each
> > non-ASCII character is really a multibyte string.
>
> Can you give an example of what that ("keeping in mind...") means?  Is
> it being careful with repetition counts (as in "[a-z]{3}"), for
> example?

I'm not much of a regex guy, but here are a couple of examples.

First, one that sort of works as expected.

guile> (string-match "sé" "José")
==> #("José" (2 . 5))

The regex properly matches the word, but the match struct (2 . 5)
refers to the bytes of the string, not the characters of the string.

Here's one that doesn't work as expected.

guile> (string-match "[:lower:]" "Hi, mom")
==> #("Hi, mom" (5 . 6))
guile> (string-match "[:lower:]" "Hí, móm")
==> #f

Once you add accents on the vowels, nothing matches.

Thanks,

Mike
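
The two results above can be reproduced outside Guile. What follows is
a minimal sketch in Python, not the Guile or libc code path: Python's
re module stands in for the libc regex calls, and the explicit UTF-8
encoding step stands in for what scm_to_locale_string would do under a
UTF-8 locale. Both substitutions, and the helper names char_span and
byte_span, are assumptions made for illustration. The sketch shows the
byte offsets (2 . 5) next to the character offsets for "José". It also
suggests one possible reading of the second example: written without
the outer brackets, "[:lower:]" is an ordinary bracket expression over
the literal characters ":", "l", "o", "w", "e" and "r", which would
explain why the unaccented match lands on the "o" at (5 . 6) rather
than on the "i". The POSIX class spelling is "[[:lower:]]"; how that
form behaves on accented text under a UTF-8 locale is not tested here.

```python
import re


def char_span(pattern: str, text: str):
    """Search on Unicode code points; offsets count characters."""
    m = re.search(pattern, text)
    return m.span() if m else None


def byte_span(pattern: str, text: str, encoding: str = "utf-8"):
    """Search on encoded bytes; offsets count bytes.

    Assumption: this approximates what a byte-oriented regex engine
    sees after the string has been converted to a UTF-8 locale string.
    """
    m = re.search(pattern.encode(encoding), text.encode(encoding))
    return m.span() if m else None


# "é" is one character but two bytes in UTF-8, so the end offset differs.
print(char_span("sé", "José"))  # (2, 4)
print(byte_span("sé", "José"))  # (2, 5)

# Without the outer brackets this is just the set {':', 'l', 'o', 'w', 'e', 'r'}.
# It finds the literal "o" in "mom", and nothing once the "o" becomes "ó".
print(char_span("[:lower:]", "Hi, mom"))  # (5, 6)
print(char_span("[:lower:]", "Hí, móm"))  # None
```

The byte-oriented search is the one that agrees with the (2 . 5) shown
above, so code that slices the original string by those offsets would
need to work on the encoded bytes rather than on characters.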