* regexp character classes not supported?
From: Limbo Peng @ 2012-12-28 16:39 UTC
To: guile-user
Hi,
I'm confused by the result of string-match:
(string-match "[0-9]+" "abc123zzz") ;; this works, giving result:
#("abc123zzz" (3 . 6))
(string-match "\\d+" "abc123zzz") ;; this doesn't work, giving result: #f
Why isn't the "\\d+" syntax (character classes) supported?
-Limbo Peng
* Re: regexp character classes not supported?
From: Mark H Weaver @ 2012-12-28 17:22 UTC
To: Limbo Peng; +Cc: guile-user
Limbo Peng <iwinux@gmail.com> writes:
> I'm confused by the result of string-match:
>
> (string-match "[0-9]+" "abc123zzz") ;; this works, giving result:
> #("abc123zzz" (3 . 6))
> (string-match "\\d+" "abc123zzz") ;; this doesn't work, giving result: #f
>
> Why isn't the "\\d+" syntax (character classes) supported?
Regular expression syntax is not standardized, and there are several
different variants. The "\d" syntax for character classes is a
non-standard perl extension, and is not supported by Guile.
Guile supports the POSIX regexp syntax, whose character classes look
like this:
(string-match "[[:digit:]]+" "abc123zzz")
=> #("abc123zzz" (3 . 6))
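The match object returned by string-match can be queried with the accessors from Guile's (ice-9 regex) module. A short sketch, assuming Guile 2.0 or later, using POSIX bracket classes in place of the Perl shorthands \d, \s and \w:

```scheme
(use-modules (ice-9 regex))

;; POSIX bracket classes standing in for Perl's \d, \s and \w.
(define digits (string-match "[[:digit:]]+" "abc123zzz"))
(define blank  (string-match "[[:space:]]+" "foo \t bar"))
(define word   (string-match "[[:alnum:]_]+" "--snake_case42--"))

;; match:substring returns the text of the whole match;
;; match:start and match:end return its character offsets.
(display (match:substring digits)) (newline)                         ; prints 123
(display (list (match:start digits) (match:end digits))) (newline)   ; prints (3 6)
(write (match:substring blank)) (newline)
(display (match:substring word)) (newline)                           ; prints snake_case42
```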
For more information see:
http://www.gnu.org/software/guile/manual/html_node/Regular-Expressions.html
http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html
http://www.gnu.org/software/emacs/manual/html_node/elisp/Char-Classes.html
Mark
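For patterns written with Perl habits, a small rewrite step can map the common shorthands onto POSIX bracket expressions before calling string-match. The helper perl-shorthand->posix below is hypothetical and not part of Guile. It is a minimal sketch that only knows \d, \D, \w, \W, \s and \S, and it does not handle escapes that already sit inside a [...] expression.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper (not a Guile API): rewrite a few Perl-style
;; shorthand escapes into POSIX bracket expressions.  Any other
;; backslash escape is copied through unchanged.  Escapes inside an
;; existing [...] expression are NOT handled correctly.
(define (perl-shorthand->posix pattern)
  (let loop ((chars (string->list pattern)) (out '()))
    (cond
     ((null? chars) (apply string-append (reverse out)))
     ((and (char=? (car chars) #\\) (pair? (cdr chars)))
      (let* ((c   (cadr chars))
             (rep (case c
                    ((#\d) "[[:digit:]]")
                    ((#\D) "[^[:digit:]]")
                    ((#\w) "[[:alnum:]_]")
                    ((#\W) "[^[:alnum:]_]")
                    ((#\s) "[[:space:]]")
                    ((#\S) "[^[:space:]]")
                    (else  (string #\\ c)))))
        (loop (cddr chars) (cons rep out))))
     (else (loop (cdr chars) (cons (string (car chars)) out))))))

;; The pattern from the original question, translated first.
(define m (string-match (perl-shorthand->posix "\\d+") "abc123zzz"))
(display (match:substring m)) (newline)   ; prints 123
```

Translating up front keeps Guile's fast C matcher in play; the trade-off is that the translation only covers the listed shorthands.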
* Re: regexp character classes not supported?
From: Limbo Peng @ 2012-12-28 17:41 UTC
To: Mark H Weaver; +Cc: guile-user
On Sat, Dec 29, 2012 at 1:22 AM, Mark H Weaver <mhw@netris.org> wrote:
> Regular expression syntax is not standardized, and there are several
> different variants. The "\d" syntax for character classes is a
> non-standard perl extension, and is not supported by Guile.
>
Thx...seems that I've been taking such syntax for granted for a long time :(
BTW: I've found a module called pregexp which looks more powerful - does it
work well in Guile?
* Re: regexp character classes not supported?
From: Mark H Weaver @ 2012-12-28 19:15 UTC
To: Limbo Peng; +Cc: guile-user
Limbo Peng <iwinux@gmail.com> writes:
> On Sat, Dec 29, 2012 at 1:22 AM, Mark H Weaver <mhw@netris.org> wrote:
>
> Regular expression syntax is not standardized, and there are several
> different variants. The "\d" syntax for character classes is a
> non-standard perl extension, and is not supported by Guile.
>
> Thx...seems that I've been taking such syntax for granted for a long
> time :(
>
> BTW: I've found a module called pregexp which looks more powerful -
> does it work well in Guile?
First, I should note that any regexp matcher written in Scheme will be
much slower than the one built into Guile (which is written in C).
There are some additional problems with pregexp. It does not appear to
be written with Unicode in mind, and is also written in such a way that
it will probably perform quite poorly on future versions of Guile.
If you'd like a more advanced regexp library, and don't mind the fact
that it will be much slower than Guile's built-in regexps, I recommend
that you look at Alex Shinn's irregex package:
http://synthcode.com/scheme/irregex/
Irregex is written with Unicode in mind, and supports not only
perl-style regexps, but also Olin Shivers' SRE (Scheme Regular
Expression) syntax, which is a far superior notation for complex
or dynamically-constructed regexps.
http://www.ccs.neu.edu/home/shivers/papers/sre.txt
Regards,
Mark
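To illustrate why an s-expression notation suits dynamically constructed regexps, here is a toy translator from a tiny SRE-like subset to POSIX regexp strings. It is a hypothetical sketch, not irregex's API and not the full SRE grammar: it knows only string literals, the class names digit, alpha and space, and the forms seq, or, *, + and ?. Because patterns are ordinary lists they can be assembled with quasiquote, and literal text is escaped with Guile's regexp-quote instead of by hand.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: translate a small SRE-like subset into a POSIX
;; extended regexp string.  This is NOT how irregex is implemented.
(define (sre->posix sre)
  (define (group parts suffix)
    (string-append "(" (apply string-append parts) ")" suffix))
  (cond
   ((string? sre) (regexp-quote sre))      ; literal text, specials escaped
   ((eq? sre 'digit) "[[:digit:]]")
   ((eq? sre 'alpha) "[[:alpha:]]")
   ((eq? sre 'space) "[[:space:]]")
   ((pair? sre)
    (let ((parts (map sre->posix (cdr sre))))
      (case (car sre)
        ((seq) (apply string-append parts))
        ((or)  (string-append "(" (string-join parts "|") ")"))
        ((*)   (group parts "*"))
        ((+)   (group parts "+"))
        ((?)   (group parts "?"))
        (else  (error "unsupported SRE form:" sre)))))
   (else (error "unsupported SRE:" sre))))

;; Building a pattern from data: the prefix is supplied at run time
;; and needs no manual escaping.
(define (version-pattern prefix)
  `(seq ,prefix (+ digit) (? (seq "." (+ digit)))))

(define rx (sre->posix (version-pattern "v")))
(define m  (string-match rx "release v2.10 is out"))
(display rx) (newline)
(display (match:substring m)) (newline)   ; prints v2.10
```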
* Re: regexp character classes not supported?
From: Limbo Peng @ 2012-12-29 2:24 UTC
Cc: guile-user
On Sat, Dec 29, 2012 at 3:15 AM, Mark H Weaver <mhw@netris.org> wrote:
> There are some additional problems with pregexp. It does not appear to
> be written with Unicode in mind, and is also written in such a way that
> it will probably perform quite poorly on future versions of Guile.
>
Thanks for pointing out the potential issues :)
> If you'd like a more advanced regexp library, and don't mind the fact
> that it will be much slower than Guile's built-in regexps, I recommend
> that you look at Alex Shinn's irregex package:
>
> http://synthcode.com/scheme/irregex/
>
I'll take a look at this.
* Re: regexp character classes not supported?
From: Ian Price @ 2013-01-06 10:47 UTC
To: Mark H Weaver; +Cc: Limbo Peng, guile-user
Mark H Weaver <mhw@netris.org> writes:
> If you'd like a more advanced regexp library, and don't mind the fact
> that it will be much slower than Guile's built-in regexps, I recommend
> that you look at Alex Shinn's irregex package:
>
> http://synthcode.com/scheme/irregex/
It is a very nice library, and guildhall users can obtain it with

    guild install wak-irregex

and require it with

    (use-modules (wak irregex))
> Irregex is written with Unicode in mind, and supports not only
> perl-style regexps, but also Olin Shivers' SRE (Scheme Regular
> Expression) syntax, which is a far superior notation for complex
> or dynamically-constructed regexps.
+1
--
Ian Price -- shift-reset.com
"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"
* Re: regexp character classes not supported?
From: Klaus Schilling @ 2013-01-06 12:02 UTC
To: ianprice90; +Cc: guile-user, iwinux
From: Ian Price <ianprice90@googlemail.com>
Subject: Re: regexp character classes not supported?
Date: Sun, 06 Jan 2013 10:47:11 +0000
> Mark H Weaver <mhw@netris.org> writes:
>
> > If you'd like a more advanced regexp library, and don't mind the fact
> > that it will be much slower than Guile's built-in regexps, I recommend
> > that you look at Alex Shinn's irregex package:
> >
> > http://synthcode.com/scheme/irregex/
>
> It is a very nice library, and guildhall users can obtain it with
>
> guild install wak-irregex
>
> and require it with (use-modules (wak irregex))
>
> > Irregex is written with Unicode in mind, and supports not only
> > perl-style regexps, but also Olin Shivers' SRE (Scheme Regular
> > Expression) syntax, which is a far superior notation for complex
> > or dynamically-constructed regexps.
does irregex support SNOBOL-style patterns?
Klaus Schilling
* Re: regexp character classes not supported?
From: Ian Price @ 2013-01-08 12:32 UTC
To: Klaus Schilling; +Cc: guile-user
Klaus Schilling <schilling.klaus@web.de> writes:
> does irregex support SNOBOL-style patterns?
I have not used SNOBOL, so I can't really say for sure whether irregex's
functionality subsumes it. The documentation itself does not refer to
SNOBOL at all.
--
Ian Price -- shift-reset.com
* Re: regexp character classes not supported?
From: Klaus Schilling @ 2013-01-14 22:41 UTC
To: ianprice90; +Cc: guile-user
From: Ian Price <ianprice90@googlemail.com>
Subject: Re: regexp character classes not supported?
Date: Tue, 08 Jan 2013 12:32:02 +0000
> Klaus Schilling <schilling.klaus@web.de> writes:
>
> > does irregex support SNOBOL-style patterns?
>
> I have not used SNOBOL, so I can't really say for sure if irregexes
> functionality subsumes it. The documentation itself does not refer to
> SNOBOL at all.
>
SNOBOL's patterns are said to be equivalent to recursive-descent parsers
with backtracking, or to context-free grammars.
Klaus Schilling
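The claim above can be made concrete. A backtracking recursive-descent matcher can recognize context-free languages such as balanced parentheses, which no single POSIX regexp can express. The following is a minimal sketch in plain Guile Scheme; every name in it (lit, seq, alt, epsilon, balanced, full-match?) is hypothetical, and it is neither SNOBOL's nor irregex's API. A pattern here is a procedure taking the subject string, a start position and a continuation k; it calls k with each end position at which it can match, and when k returns #f the matcher backtracks into the next alternative.

```scheme
;; Match the literal string S at POS, then continue with K.
(define (lit s)
  (lambda (str pos k)
    (let ((end (+ pos (string-length s))))
      (and (<= end (string-length str))
           (string=? s (substring str pos end))
           (k end)))))

;; Match each pattern in turn, threading the end position along.
(define (seq . pats)
  (lambda (str pos k)
    (let loop ((pats pats) (pos pos))
      (if (null? pats)
          (k pos)
          ((car pats) str pos (lambda (p) (loop (cdr pats) p)))))))

;; Try each alternative; a #f result from K means "backtrack".
(define (alt . pats)
  (lambda (str pos k)
    (let loop ((pats pats))
      (and (pair? pats)
           (or ((car pats) str pos k)
               (loop (cdr pats)))))))

;; Match the empty string.
(define epsilon (lambda (str pos k) (k pos)))

;; balanced ::= "(" balanced ")" balanced | empty
;; The self-reference is what takes this beyond regular languages.
(define (balanced str pos k)
  ((alt (seq (lit "(") balanced (lit ")") balanced)
        epsilon)
   str pos k))

;; Succeed only if PAT can consume the whole of STR.
(define (full-match? pat str)
  (pat str 0 (lambda (end) (= end (string-length str)))))

(for-each (lambda (s)
            (write s) (display " => ")
            (display (full-match? balanced s)) (newline))
          '("(()())" "(()" ""))
```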
* Re: regexp character classes not supported?
From: Marco Maggi @ 2012-12-30 11:38 UTC
To: Limbo Peng; +Cc: guile-user
Limbo Peng wrote:
> On Sat, Dec 29, 2012 at 1:22 AM, Mark H Weaver <mhw@netris.org> wrote:
> Regular expression syntax is not standardized, and there are several
> different variants. The "\d" syntax for character classes is a
> non-standard perl extension, and is not supported by Guile.
> Thx...seems that I've been taking such syntax for granted for a long
> time :(
Shameless plug: if you do not mind installing stuff, you
can try the regexp library re2[1] (written in C++) through
its C wrapper CRE2[2][3].
[1] <http://code.google.com/p/re2/>
[2] <http://github.com/marcomaggi/cre2/>
[3] <http://code.google.com/p/cre2/downloads/list>
Here is a Guile program making use of it through the
foreign function interface (sorry for the R6RS code, it
also needs to be polished here and there):
;; guile-cre2.sps --
;;
;; Show off CRE2 with Guile.

#!r6rs
(import (rnrs)
        (system foreign)
        (ice-9 format))

(define-syntax begin0
  ;; Evaluate ?expr0, then ?expr ..., and return the values of ?expr0.
  (syntax-rules ()
    ((_ ?expr0 ?expr ...)
     (call-with-values
         (lambda () ?expr0)
       (lambda args ?expr ... (apply values args))))))

(define-syntax unwind-protect
  ;;Not general, but enough.
  (syntax-rules ()
    ((_ ?body ?cleanup0 ?cleanup ...)
     (let ((cleanup (lambda () ?cleanup0 ?cleanup ...)))
       (with-exception-handler
           (lambda (E) (cleanup) (raise E))
         (lambda () (begin0 ?body (cleanup))))))))

(define (main)
  (let* ((ptn     "(ciao) (hello)")
         (ptn.str (string->pointer ptn))
         ;; Length in characters; equals the byte length only for ASCII.
         (ptn.len (string-length ptn))
         (opt     (cre2_opt_new))
         (rex     (cre2_new ptn.str ptn.len opt)))
    (unwind-protect
        (let* ((txt     "ciao hello")
               (txt.str (string->pointer txt))
               (txt.len (string-length txt))
               ;; One slot for the whole match plus one per group.
               (nmatch  3)
               (matches (make-cre2_string_t nmatch))
               (ranges  (make-cre2_range_t nmatch)))
          (let ((rv (cre2_match rex
                                txt.str txt.len 0 txt.len
                                CRE2_UNANCHORED matches nmatch)))
            (when (positive? rv)
              ;; Convert the (pointer, length) results into offsets.
              (cre2_strings_to_ranges txt.str ranges matches nmatch)
              (let ((R (parse-cre2_range_t ranges nmatch)))
                (print "Full match: ~s\n"
                       (substring txt (list-ref R 0) (list-ref R 1)))
                (print "1st submatch: ~s\n"
                       (substring txt (list-ref R 2) (list-ref R 3)))
                (print "2nd submatch: ~s\n"
                       (substring txt (list-ref R 4) (list-ref R 5)))))))
      (cre2_delete rex)
      (cre2_opt_delete opt))))

(define cre2
  (dynamic-link "libcre2.so"))

(define cre2_new
  (let* ((ptr     (dynamic-func "cre2_new" cre2))
         (callout (pointer->procedure '* ptr (list '* int '*))))
    (lambda (ptn.str ptn.len options)
      (callout ptn.str ptn.len options))))

(define cre2_delete
  (let* ((ptr     (dynamic-func "cre2_delete" cre2))
         (callout (pointer->procedure void ptr (list '*))))
    (lambda (rex)
      (callout rex))))

(define cre2_opt_new
  (let* ((ptr     (dynamic-func "cre2_opt_new" cre2))
         (callout (pointer->procedure '* ptr '())))
    (lambda ()
      (callout))))

(define cre2_opt_delete
  (let* ((ptr     (dynamic-func "cre2_opt_delete" cre2))
         (callout (pointer->procedure void ptr (list '*))))
    (lambda (options)
      (callout options))))

(define cre2_match
  (let* ((ptr     (dynamic-func "cre2_match" cre2))
         (callout (pointer->procedure
                   int ptr (list '* '* int
                                 int int int '* int))))
    (lambda (rex txt.str txt.len txt.start txt.end anchor match nmatch)
      (callout rex
               txt.str txt.len txt.start txt.end
               anchor match nmatch))))

(define cre2_strings_to_ranges
  (let* ((ptr     (dynamic-func "cre2_strings_to_ranges" cre2))
         (callout (pointer->procedure
                   void ptr (list '* '* '* int))))
    (lambda (txt.str ranges strings nmatch)
      (callout txt.str ranges strings nmatch))))

;; Mirrors CRE2_UNANCHORED from the cre2_anchor_t enum in cre2.h.
(define CRE2_UNANCHORED 1)

;; Allocate NMATCH cre2_string_t structs {const char *data; int length;}
;; laid out back to back.
(define (make-cre2_string_t nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list '* int) T))
       (V '() (append (list %null-pointer 0) V)))
      ((= i nmatch)
       (make-c-struct T V))))

;; Allocate NMATCH cre2_range_t structs {long start; long past;}.
(define (make-cre2_range_t nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list long long) T))
       (V '() (append '(0 0) V)))
      ((= i nmatch)
       (make-c-struct T V))))

(define (parse-cre2_string_t S nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list '* int) T)))
      ((= i nmatch)
       (parse-c-struct S T))))

(define (parse-cre2_range_t S nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list long long) T)))
      ((= i nmatch)
       (parse-c-struct S T))))

(define (print template . args)
  (apply format (current-output-port) template args))

(main)

;;; end of file
--
Marco Maggi
Thread overview: 10+ messages
2012-12-28 16:39 regexp character classes not supported? Limbo Peng
2012-12-28 17:22 ` Mark H Weaver
2012-12-28 17:41 ` Limbo Peng
2012-12-28 19:15 ` Mark H Weaver
2012-12-29 2:24 ` Limbo Peng
2013-01-06 10:47 ` Ian Price
2013-01-06 12:02 ` Klaus Schilling
2013-01-08 12:32 ` Ian Price
2013-01-14 22:41 ` Klaus Schilling
2012-12-30 11:38 ` Marco Maggi