unofficial mirror of guile-user@gnu.org 
* regexp character classes not supported?
@ 2012-12-28 16:39 Limbo Peng
  2012-12-28 17:22 ` Mark H Weaver
  0 siblings, 1 reply; 10+ messages in thread
From: Limbo Peng @ 2012-12-28 16:39 UTC (permalink / raw)
  To: guile-user

Hi,

I'm confused by the result of string-match:

(string-match "[0-9]+" "abc123zzz") ;; this works, giving result: #("abc123zzz" (3 . 6))
(string-match "\\d+" "abc123zzz") ;; this doesn't work, giving result: #f

Why isn't the "\\d+" syntax (character classes) supported?

-Limbo Peng


* Re: regexp character classes not supported?
  2012-12-28 16:39 regexp character classes not supported? Limbo Peng
@ 2012-12-28 17:22 ` Mark H Weaver
  2012-12-28 17:41   ` Limbo Peng
  0 siblings, 1 reply; 10+ messages in thread
From: Mark H Weaver @ 2012-12-28 17:22 UTC (permalink / raw)
  To: Limbo Peng; +Cc: guile-user

Limbo Peng <iwinux@gmail.com> writes:

> I'm confused by the result of string-match:
>
> (string-match "[0-9]+" "abc123zzz") ;; this works, giving result: #("abc123zzz" (3 . 6))
> (string-match "\\d+" "abc123zzz") ;; this doesn't work, giving result: #f
>
> Why isn't the "\\d+" syntax (character classes) supported?

Regular expression syntax is not standardized, and there are several
different variants.  The "\d" syntax for character classes is a
non-standard perl extension, and is not supported by Guile.

Guile supports the POSIX regexp syntax, whose character classes look
like this:

  (string-match "[[:digit:]]+" "abc123zzz")
  => #("abc123zzz" (3 . 6))

For more information see:

  http://www.gnu.org/software/guile/manual/html_node/Regular-Expressions.html
  http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html
  http://www.gnu.org/software/emacs/manual/html_node/elisp/Char-Classes.html
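
A minimal sketch of the POSIX classes in use, assuming only the
(ice-9 regex) module that ships with Guile.  The helper first-number is
a hypothetical name made up for this illustration, not part of Guile:

```scheme
(use-modules (ice-9 regex))

;; A few POSIX bracket classes and the Perl shorthand they stand in for:
;;   [[:digit:]]  ~ \d     [[:space:]]  ~ \s     [[:alnum:]_]  ~ \w
(define m (string-match "[[:digit:]]+" "abc123zzz"))

(match:substring m)   ; the matched text, "123"
(match:start m)       ; 3
(match:end m)         ; 6

;; string-match returns #f when nothing matches, so guard before
;; calling the match accessors.  (first-number is a hypothetical helper.)
(define (first-number str)
  (let ((m (string-match "[[:digit:]]+" str)))
    (and m (string->number (match:substring m)))))

(first-number "abc123zzz")   ; 123
(first-number "no digits")   ; #f
```

Note that the class name goes inside a bracket expression, hence the
doubled brackets; "[:digit:]+" on its own would be read as an ordinary
bracket expression over the characters ':', 'd', 'i', 'g' and 't'.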

     Mark




* Re: regexp character classes not supported?
  2012-12-28 17:22 ` Mark H Weaver
@ 2012-12-28 17:41   ` Limbo Peng
  2012-12-28 19:15     ` Mark H Weaver
  2012-12-30 11:38     ` Marco Maggi
  0 siblings, 2 replies; 10+ messages in thread
From: Limbo Peng @ 2012-12-28 17:41 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

On Sat, Dec 29, 2012 at 1:22 AM, Mark H Weaver <mhw@netris.org> wrote:

> Regular expression syntax is not standardized, and there are several
> different variants.  The "\d" syntax for character classes is a
> non-standard perl extension, and is not supported by Guile.
>

Thx...seems that I've been taking such syntax for granted for a long time
:(

BTW: I've found a module called pregexp which looks more powerful - does it
work well in Guile?


* Re: regexp character classes not supported?
  2012-12-28 17:41   ` Limbo Peng
@ 2012-12-28 19:15     ` Mark H Weaver
  2012-12-29  2:24       ` Limbo Peng
  2013-01-06 10:47       ` Ian Price
  2012-12-30 11:38     ` Marco Maggi
  1 sibling, 2 replies; 10+ messages in thread
From: Mark H Weaver @ 2012-12-28 19:15 UTC (permalink / raw)
  To: Limbo Peng; +Cc: guile-user

Limbo Peng <iwinux@gmail.com> writes:

> On Sat, Dec 29, 2012 at 1:22 AM, Mark H Weaver <mhw@netris.org> wrote:
>
>     Regular expression syntax is not standardized, and there are
>     several
>     different variants.  The "\d" syntax for character classes is a
>     non-standard perl extension, and is not supported by Guile.
>     
>
> Thx...seems that I've been taking such syntax for granted for a long
> time :( 
>
> BTW: I've found a module called pregexp which looks more powerful -
> does it work well in Guile?

First, I should note that any regexp matcher written in Scheme will be
much slower than the one built into Guile (which is written in C).

There are some additional problems with pregexp.  It does not appear to
be written with Unicode in mind, and is also written in such a way that
it will probably perform quite poorly on future versions of Guile.

If you'd like a more advanced regexp library, and don't mind the fact
that it will be much slower than Guile's built-in regexps, I recommend
that you look at Alex Shinn's irregex package:

  http://synthcode.com/scheme/irregex/

Irregex is written with Unicode in mind, and supports not only
perl-style regexps, but also Olin Shivers' SRE (Scheme Regular
Expression) syntax, which is a far superior notation for complex
or dynamically-constructed regexps.

  http://www.ccs.neu.edu/home/shivers/papers/sre.txt

    Regards,
      Mark




* Re: regexp character classes not supported?
  2012-12-28 19:15     ` Mark H Weaver
@ 2012-12-29  2:24       ` Limbo Peng
  2013-01-06 10:47       ` Ian Price
  1 sibling, 0 replies; 10+ messages in thread
From: Limbo Peng @ 2012-12-29  2:24 UTC (permalink / raw)
  Cc: guile-user

On Sat, Dec 29, 2012 at 3:15 AM, Mark H Weaver <mhw@netris.org> wrote:

> There are some additional problems with pregexp.  It does not appear to
> be written with Unicode in mind, and is also written in such a way that
> it will probably perform quite poorly on future versions of Guile.
>

Thanks for pointing out the potential issues :)


> If you'd like a more advanced regexp library, and don't mind the fact
> that it will be much slower than Guile's built-in regexps, I recommend
> that you look at Alex Shinn's irregex package:
>
>   http://synthcode.com/scheme/irregex/
>

I'll take a look at this.


* Re: regexp character classes not supported?
  2012-12-28 17:41   ` Limbo Peng
  2012-12-28 19:15     ` Mark H Weaver
@ 2012-12-30 11:38     ` Marco Maggi
  1 sibling, 0 replies; 10+ messages in thread
From: Marco Maggi @ 2012-12-30 11:38 UTC (permalink / raw)
  To: Limbo Peng; +Cc: guile-user

Limbo Peng wrote:

> On Sat, Dec 29, 2012 at 1:22 AM, Mark H Weaver <mhw@netris.org> wrote:

>     Regular expression syntax is not standardized, and there are
>     several
>     different variants.  The "\d" syntax for character classes is a
>     non-standard perl extension, and is not supported by Guile.


> Thx...seems that I've been taking such syntax for granted for a long
> time :(

  Shameless plug: if you do not mind installing stuff, you
can try the regexp library re2[1] (written in C++) through
its C wrapper CRE2[2][3].

[1] <http://code.google.com/p/re2/>
[2] <http://github.com/marcomaggi/cre2/>
[3] <http://code.google.com/p/cre2/downloads/list>

  Here is a Guile program making use of it through the
foreign functions interface (sorry for the R6RS code, it
also needs to be polished here and there):

;; guile-cre2.sps --
;;
;; Show off CRE2 with Guile.

#!r6rs
(import (rnrs)
  (system foreign)
  (ice-9 format))

(define-syntax begin0
  (syntax-rules ()
    ((_ ?expr0 ?expr ...)
     (call-with-values
	 (lambda () ?expr0)
       (lambda args ?expr ... (apply values args))))))

(define-syntax unwind-protect
  ;;Not general, but enough.
  (syntax-rules ()
    ((_ ?body ?cleanup0 ?cleanup ...)
     (let ((cleanup (lambda () ?cleanup0 ?cleanup ...)))
       (with-exception-handler
	   (lambda (E) (cleanup) (raise E))
	 (lambda () (begin0 ?body (cleanup))))))))

(define (main)
  (let* ((ptn     "(ciao) (hello)")
	 (ptn.str (string->pointer ptn))
	 (ptn.len (string-length ptn))
	 (opt     (cre2_opt_new))
	 (rex     (cre2_new ptn.str ptn.len opt)))
    (unwind-protect
	(let* ((txt     "ciao hello")
	       (txt.str	(string->pointer txt))
	       (txt.len (string-length txt))
	       (nmatch	3)
	       (matches	(make-cre2_string_t nmatch))
	       (ranges	(make-cre2_range_t  nmatch)))
	  (let ((rv (cre2_match rex
				txt.str txt.len 0 txt.len
				CRE2_UNANCHORED matches nmatch)))
	    (when (positive? rv)
	      (cre2_strings_to_ranges txt.str ranges matches nmatch)
	      (let ((R (parse-cre2_range_t ranges nmatch)))
		(print "Full match: ~s\n"
		       (substring txt (list-ref R 0) (list-ref R 1)))
		(print "1st submatch: ~s\n"
		       (substring txt (list-ref R 2) (list-ref R 3)))
		(print "2nd submatch: ~s\n"
		       (substring txt (list-ref R 4) (list-ref R 5)))
		))))
      (cre2_delete rex)
      (cre2_opt_delete opt))))

(define cre2
  (dynamic-link "libcre2.so"))

(define cre2_new
  (let* ((ptr     (dynamic-func "cre2_new" cre2))
	 (callout (pointer->procedure '* ptr (list '* int '*))))
    (lambda (ptn.str ptn.len options)
      (callout ptn.str ptn.len options))))

(define cre2_delete
  (let* ((ptr     (dynamic-func "cre2_delete" cre2))
	 (callout (pointer->procedure void ptr (list '*))))
    (lambda (rex)
      (callout rex))))

(define cre2_opt_new
  (let* ((ptr     (dynamic-func "cre2_opt_new" cre2))
	 (callout (pointer->procedure '* ptr '())))
    (lambda ()
      (callout))))

(define cre2_opt_delete
  (let* ((ptr     (dynamic-func "cre2_opt_delete" cre2))
	 (callout (pointer->procedure void ptr (list '*))))
    (lambda (options)
      (callout options))))

(define cre2_match
  (let* ((ptr     (dynamic-func "cre2_match" cre2))
	 (callout (pointer->procedure
		   int ptr (list '* '* int
				 int int int '* int))))
    (lambda (rex txt.str txt.len txt.start txt.end anchor match nmatch)
      (callout rex
	       txt.str txt.len txt.start txt.end
	       anchor match nmatch))))

(define cre2_strings_to_ranges
  (let* ((ptr     (dynamic-func "cre2_strings_to_ranges" cre2))
	 (callout (pointer->procedure
		   void ptr (list '* '* '* int))))
    (lambda (txt.str ranges strings nmatch)
      (callout txt.str ranges strings nmatch))))

(define CRE2_UNANCHORED 1)

(define (make-cre2_string_t nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list '* int) T))
       (V '() (append (list %null-pointer 0) V)))
      ((= i nmatch)
       (make-c-struct T V))))

(define (make-cre2_range_t nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list long long) T))
       (V '() (append '(0 0) V)))
      ((= i nmatch)
       (make-c-struct T V))))

(define (parse-cre2_string_t S nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list '* int) T)))
      ((= i nmatch)
       (parse-c-struct S T))))

(define (parse-cre2_range_t S nmatch)
  (do ((i 0 (+ 1 i))
       (T '() (append (list long long) T)))
      ((= i nmatch)
       (parse-c-struct S T))))

(define (print template . args)
  (apply format (current-output-port) template args))

(main)

;;; end of file
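
For readers without libcre2 installed, the same dynamic-func plus
pointer->procedure pattern can be tried against the C library alone.
This is only a sketch: the names libc and c-strlen are made up for the
illustration, and it assumes strlen is visible in the running process,
as it normally is on GNU/Linux:

```scheme
(use-modules (system foreign))

;; (dynamic-link) with no argument gives a handle on the symbols already
;; loaded into the running process, which includes the C library.
(define libc (dynamic-link))

;; C prototype: size_t strlen(const char *s);
(define c-strlen
  (pointer->procedure size_t
                      (dynamic-func "strlen" libc)
                      (list '*)))

;; string->pointer copies the Scheme string into a NUL-terminated C string.
(c-strlen (string->pointer "ciao hello"))   ; 10
```

One thing this makes easy to check: string-length counts characters
while the C side counts bytes, so the two agree for ASCII text such as
"ciao hello" but can differ for non-ASCII input.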

-- 
Marco Maggi




* Re: regexp character classes not supported?
  2012-12-28 19:15     ` Mark H Weaver
  2012-12-29  2:24       ` Limbo Peng
@ 2013-01-06 10:47       ` Ian Price
  2013-01-06 12:02         ` Klaus Schilling
  1 sibling, 1 reply; 10+ messages in thread
From: Ian Price @ 2013-01-06 10:47 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Limbo Peng, guile-user

Mark H Weaver <mhw@netris.org> writes:

> If you'd like a more advanced regexp library, and don't mind the fact
> that it will be much slower than Guile's built-in regexps, I recommend
> that you look at Alex Shinn's irregex package:
>
>   http://synthcode.com/scheme/irregex/

It is a very nice library, and guildhall users can obtain it with

  guild install wak-irregex

and require it with (use-modules (wak irregex))

> Irregex is written with Unicode in mind, and supports not only
> perl-style regexps, but also Olin Shivers' SRE (Scheme Regular
> Expression) syntax, which is a far superior notation for complex
> or dynamically-constructed regexps.
+1
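
A small sketch of both notations side by side, assuming the
(wak irregex) module from the guildhall package above is installed and
exports the usual irregex procedures (irregex-search and
irregex-match-substring); m1 and m2 are just names for the illustration:

```scheme
(use-modules (wak irregex))

;; Perl-style string syntax: the \d shorthand is understood here.
(define m1 (irregex-search "\\d+" "abc123zzz"))
(irregex-match-substring m1)   ; "123"

;; The same pattern written as an SRE: one or more numeric characters.
(define m2 (irregex-search '(+ numeric) "abc123zzz"))
(irregex-match-substring m2)   ; "123"
```

Because an SRE is ordinary list data, it can be assembled with
quasiquote or list operations instead of string concatenation, which is
where it helps most for dynamically constructed patterns.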

-- 
Ian Price -- shift-reset.com

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"




* Re: regexp character classes not supported?
  2013-01-06 10:47       ` Ian Price
@ 2013-01-06 12:02         ` Klaus Schilling
  2013-01-08 12:32           ` Ian Price
  0 siblings, 1 reply; 10+ messages in thread
From: Klaus Schilling @ 2013-01-06 12:02 UTC (permalink / raw)
  To: ianprice90; +Cc: guile-user, iwinux

From: Ian Price <ianprice90@googlemail.com>
Subject: Re: regexp character classes not supported?
Date: Sun, 06 Jan 2013 10:47:11 +0000

> Mark H Weaver <mhw@netris.org> writes:
> 
> > If you'd like a more advanced regexp library, and don't mind the fact
> > that it will be much slower than Guile's built-in regexps, I recommend
> > that you look at Alex Shinn's irregex package:
> >
> >   http://synthcode.com/scheme/irregex/
> 
> It is a very nice library, and guildhall users can obtain it with
> 
>   guild install wak-irregex
> 
> and require it with (use-modules (wak irregex))
> 
> > Irregex is written with Unicode in mind, and supports not only
> > perl-style regexps, but also Olin Shivers' SRE (Scheme Regular
> > Expression) syntax, which is a far superior notation for complex
> > or dynamically-constructed regexps.

does irregex support SNOBOL-style patterns?

Klaus Schilling




* Re: regexp character classes not supported?
  2013-01-06 12:02         ` Klaus Schilling
@ 2013-01-08 12:32           ` Ian Price
  2013-01-14 22:41             ` Klaus Schilling
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Price @ 2013-01-08 12:32 UTC (permalink / raw)
  To: Klaus Schilling; +Cc: guile-user

Klaus Schilling <schilling.klaus@web.de> writes:

> does irregex support SNOBOL-style patterns?

I have not used SNOBOL, so I can't really say for sure whether irregex's
functionality subsumes it. The documentation itself does not refer to
SNOBOL at all.

-- 
Ian Price -- shift-reset.com

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"




* Re: regexp character classes not supported?
  2013-01-08 12:32           ` Ian Price
@ 2013-01-14 22:41             ` Klaus Schilling
  0 siblings, 0 replies; 10+ messages in thread
From: Klaus Schilling @ 2013-01-14 22:41 UTC (permalink / raw)
  To: ianprice90; +Cc: guile-user

From: Ian Price <ianprice90@googlemail.com>
Subject: Re: regexp character classes not supported?
Date: Tue, 08 Jan 2013 12:32:02 +0000

> Klaus Schilling <schilling.klaus@web.de> writes:
> 
> > does irregex support SNOBOL-style patterns?
> 
> I have not used SNOBOL, so I can't really say for sure whether irregex's
> functionality subsumes it. The documentation itself does not refer to
> SNOBOL at all.
> 

SNOBOL's patterns are said to be equivalent to recursive-descent parsers
with backtracking, or to context-free grammars.

Klaus Schilling


