From: Mark H Weaver
To: Abdulrahman Semrie
Cc: 36251@debbugs.gnu.org
Subject: bug#36251: Regex library doesn't recognize ']' in a character class
Date: Tue, 18 Jun 2019 07:08:06 -0400
Message-ID: <87r27rywum.fsf@netris.org>
In-Reply-To: (Abdulrahman Semrie's message of "Sun, 16 Jun 2019 20:16:29 +0300")

Hi,

Abdulrahman Semrie writes:

> I am using the pattern [\\[\\]a-zA-Z]+ to match a string with left or
> right bracket in it. However, the string-match function doesn’t match
> the ‘]’ character.
> To demonstrate with an example, try the following
> function:
>
> (string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
>
> The result for the above function should have been a match structure
> with Text[ab] matched. However, the string-match returns #f which is
> incorrect. To test if the pattern I am using was right, I tried on
> regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is
> the link that demonstrates that it works.

It turns out that there are several flavors of regular expressions in
common use, with different features and syntax. The link you provided
is using PCRE (PHP) regular expressions (see the "flavor" pane on the
left), and there are three other supported flavors on that web site.

Guile's (ice-9 regex) module provides a simpler flavor of regexps known
as "POSIX extended regular expressions", implemented as a thin wrapper
around your system's POSIX regular expression library ('regcomp' and
'regexec').

The web site you referenced does not appear to support POSIX extended
regular expressions, but here are some links about them:

  https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
  https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04

One of the notable differences is that in POSIX extended regular
expressions, character classes do not support backslash escapes, but
instead use a more ad-hoc approach, as described in the links above.

     Regards,
       Mark
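
To make the difference concrete, here is a minimal sketch in C that
calls 'regcomp' and 'regexec' directly with REG_EXTENDED. The helper
name ere_first_match is hypothetical (it is not part of Guile or of
POSIX), and I am assuming a system with a conforming <regex.h>. It
relies on the POSIX rule that a ']' placed first in a bracket
expression is taken literally, whereas a backslash inside brackets is
just an ordinary character.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only: compile `pattern` as a POSIX
 * extended regular expression, search `text`, and copy the first match
 * into `out`.  Returns 1 on a match, 0 on no match, -1 on error (bad
 * pattern, or `out` too small to hold the match).
 */
int ere_first_match(const char *pattern, const char *text,
                    char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[1];
    int result;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    /* REG_EXTENDED selects POSIX extended regular expression syntax. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, 1, m, 0) == 0) {
        /* m[0] holds the byte offsets of the whole match. */
        size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);
        if (len >= out_size) {
            result = -1;
        } else {
            memcpy(out, text + m[0].rm_so, len);
            out[len] = '\0';
            result = 1;
        }
    } else {
        result = 0;
    }

    regfree(&re);
    return result;
}
```

With this helper, the pattern "[][a-zA-Z]+" applied to "Text[ab]"
should match the whole string, while the backslash form (written
"[\\[\\]a-zA-Z]+" as a C string) should not match it at all: the
bracket expression closes at the first ']', leaving a set that contains
only '\' and '[', followed by the literal text "a-zA-Z" and one or
more ']'. Since Guile's string-match is a thin wrapper around the same
library, I would expect (string-match "[][a-zA-Z]+" "Text[ab]") to
behave the same way, though I have only sketched the C side here.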