From: dsmich@roadrunner.com
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Thu, 01 Sep 2022 19:34:17 +0000
Message-ID: <58cf2a302a753608ba9b978ebace5f13ef0fae70@webmail>
To: Jean Abou Samra
Cc: 57507@debbugs.gnu.org

Also remember that Guile uses the system C library regex routines, and those operate on C strings, not Guile strings.

(Sorry for the top post; too tired to fight with this web editor.)

-Dale

-----------------------------------------

From: "Jean Abou Samra"
To: 57507@debbugs.gnu.org
Cc:
Sent: Wednesday August 31 2022 12:55:13PM
Subject: bug#57507: Regular expression matching depends on locale encoding

Regular expressions do funky things with Unicode if a non-Unicode-aware
locale is set. Yet, they're purely string operations, so I don't think
it's expected that they depend on the locale encoding.



$ LC_ALL=C guile3.0
GNU Guile 3.0.7
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (use-modules (ice-9 regex))
scheme@(guile-user)> (match:substring (string-match "\u203f" "\u3091"))
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
In procedure make-regexp: Invalid preceding regular expression

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
scheme@(guile-user) [1]> ,q
scheme@(guile-user)> (match:substring (string-match "[\u203f]" "\u3091"))
$1 = "\u3091"
scheme@(guile-user)>
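
To make the point about C strings concrete, here is a minimal C sketch of what the system regex routines might be seeing in the session above. It rests on an assumption the thread does not confirm: that under LC_ALL=C Guile replaces characters it cannot encode with "?" when converting a Scheme string to a C string, so "\u203f" would reach regcomp() as "?" and "[\u203f]" as "[?]". The helper names try_compile and match_length are made up for illustration; regcomp, regexec, regerror and regfree are the standard POSIX <regex.h> calls.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as a POSIX extended regex.
   Returns regcomp's status (0 on success) and prints the C library's
   own error message on failure. */
static int try_compile(const char *pattern)
{
    regex_t re;
    int status = regcomp(&re, pattern, REG_EXTENDED);
    if (status != 0) {
        char msg[128];
        /* A null regex_t pointer is allowed when only the code is known. */
        regerror(status, NULL, msg, sizeof msg);
        fprintf(stderr, "regcomp(\"%s\"): %s\n", pattern, msg);
        return status;
    }
    regfree(&re);
    return 0;
}

/* Hypothetical helper: length in bytes of the first match of `pattern`
   in `subject`, or -1 if the pattern does not compile or does not match. */
static long match_length(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 1, m, 0) == 0)
        len = (long)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}
```

With glibc, try_compile("?") fails with "Invalid preceding regular expression", the same text as in the transcript, while match_length("[?]", "?") returns 1, which would line up with the second call matching the whole one-character subject. Other C libraries may word the error differently.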


