unofficial mirror of guile-user@gnu.org 
* regex-case
@ 2016-02-06 19:13 Matt Wette
  2016-02-06 19:23 ` regex-case Matt Wette
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-06 19:13 UTC
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

I have always missed the ease Perl provides for throwing a string at a list of regular expressions. I have thought it would be nice if the (ice-9 regex) module provided something comparable, so I started work on a macro "regex-case". Code attached.
Comments on syntax appreciated. — Matt

=== test ================
(define str "foo")

 (regex-case str
   (("^([a-z]+)\\(([0-9]+)\\)$" v i)
    (list v i))
   (("^([a-z]+)$" v)
    (list v "1")))
=>
("foo" "1")


=== syntax ==============
(regex-case <string> 
 ((<pattern> <var> <var> …) <body>)
 ((<pattern> <var> <var> …) <body>)
 (else <body>))

Where <pattern> is a string form of a regular expression, <var> … are variables that are bound to the matched subexpressions, and <body> is a list of expressions.  The return is the last expression of the matched case.

=== expansion ===========
The example shown above expands to:
(let ((t-292 (make-regexp "^([a-z]+)\\(([0-9]+)\\)$"))
      (t-293 (make-regexp "^([a-z]+)$")))
  (cond ((regexp-exec t-292 str)
         =>
         (lambda (m)
           (let ((v (match:substring m 1))
                 (i (match:substring m 2)))
             (list v i))))
        ((regexp-exec t-293 str)
         =>
         (lambda (m)
           (let ((v (match:substring m 1))) (list v "1"))))))

I was thinking the above expansion has some chance (if it lives in the regex module?) to memoize the make-regexp part during optimization.  

If not, a macro could be written to generate a match function that memoizes the make-regexp part.
(define regex-matcher foo ((<pattern> …) 
=> 
(define (let ((t-123 (make-regex <pattern>)) …) (lambda (str) (cond ((regexp-exec t-123 str) ...



[-- Attachment #2: regex-case.scm --]
[-- Type: application/octet-stream, Size: 1511 bytes --]

;; v160206b - M.Wette

;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.

(use-modules (ice-9 pretty-print))
(use-modules (ice-9 regex))

;; helper macro for regex-case
;; (rx-let m (v ...) exp ...) => (let ((v (match:substring m 1)) ...) exp ...)
(define-syntax rx-let
  (lambda (x)
    (syntax-case x ()
      ((_ m (v ...) exp ...)
       (with-syntax (((i ...)		; fold (v ...) to (1 ...)
		      (let f ((il '()) (n 1) (vl #'(v ...)))
			(if (null? vl) (reverse il)
			    (f (cons n il) (1+ n) (cdr vl))))))
	 #'(let ((v (match:substring m i)) ...) exp ...))))))

;; @example
;; (regex-case str
;;  (("([a-z]+)" v) `(lower ,v))
;;  (("([A-Z]+)" v) `(upper ,v))
;;  (else (error "yuck")))
;; @end example
(define-syntax regex-case
  (lambda (x)
    (syntax-case x (else)
      ((_ str ((pat v ...) exp ...) ...)
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
	 #'(let ((id (make-regexp pat)) ...)
	     (cond
	      ((regexp-exec id str) =>
	       (lambda (m) (rx-let m (v ...) exp ...)))
	      ...))
	 ))
      ;; todo: pattern with "else"
      )))

(define str "foo")
(write
 (regex-case str
   (("^([a-z]+)\\(([0-9]+)\\)$" v i)
    (list v i))
   (("^([a-z]+)$" v)
    (list v "1"))
   )
 )
(newline)

;; --- last line ---


* Re: regex-case
  2016-02-06 19:13 regex-case Matt Wette
@ 2016-02-06 19:23 ` Matt Wette
  2016-02-06 19:49 ` regex-case Marko Rauhamaa
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-06 19:23 UTC
  To: guile-user


> On Feb 6, 2016, at 11:13 AM, Matt Wette <matthew.wette@verizon.net> wrote:
> If not, a macro could be written to generate a match function that memoizes the make-regexp part.
> (define regex-matcher foo ((<pattern> …) 
> => 
> (define (let ((t-123 (make-regex <pattern>)) …) (lambda (str) (cond ((regexp-exec t-123 str) ...

oops.  Should read:

(define-regex-matcher foo ((<pattern> …)
=>
(define foo 
   (let ((t-123 (make-regexp <pattern>)) …) 
      (lambda (str) 
        (cond ((regexp-exec t-123 str) … (else <body>))))) 
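
The corrected shape above could be fleshed out roughly as follows. This is only a sketch under assumptions: define-regex-matcher is the hypothetical name used in this message, the clause syntax is borrowed from regex-case, and a trailing else clause is assumed to be required. The patterns are compiled once, when the definition is evaluated, and every call to the resulting procedure reuses them.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical define-regex-matcher: compiles each pattern once, when the
;; definition is evaluated, and binds NAME to a one-argument procedure.
(define-syntax define-regex-matcher
  (lambda (x)
    (syntax-case x (else)
      ((_ name ((pat v ...) exp ...) ... (else else-exp ...))
       (with-syntax (((id ...) (generate-temporaries #'(pat ...)))
                     ;; One list of group indices (1 2 ...) per clause.
                     (((i ...) ...)
                      (map (lambda (vs) (iota (length vs) 1))
                           #'((v ...) ...))))
         #'(define name
             (let ((id (make-regexp pat)) ...)
               (lambda (str)
                 (cond
                  ((regexp-exec id str) =>
                   (lambda (m)
                     (let ((v (match:substring m i)) ...)
                       exp ...)))
                  ...
                  (else else-exp ...))))))))))

;; parse-call is an illustrative name only.
(define-regex-matcher parse-call
  (("^([a-z]+)\\(([0-9]+)\\)$" v i) (list v i))
  (("^([a-z]+)$" v) (list v "1"))
  (else #f))

(parse-call "foo(3)")   ; => ("foo" "3")
(parse-call "foo")      ; => ("foo" "1")
(parse-call "123")      ; => #f
```

Because the let wraps the lambda, make-regexp runs once per definition rather than once per call, which is the memoization asked about in the first message.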






* Re: regex-case
  2016-02-06 19:13 regex-case Matt Wette
  2016-02-06 19:23 ` regex-case Matt Wette
@ 2016-02-06 19:49 ` Marko Rauhamaa
  2016-02-06 22:42   ` regex-case Matt Wette
  2016-02-06 22:10 ` regex-case Matt Wette
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Marko Rauhamaa @ 2016-02-06 19:49 UTC
  To: Matt Wette; +Cc: guile-user

Matt Wette <matthew.wette@verizon.net>:

> Comments on syntax appreciated. — Matt
>
> === test ================
> (define str "foo")
>
>  (regex-case str
>    (("^([a-z]+)\\(([0-9]+)\\)$" v i)
>     (list v i))
>    (("^([a-z]+)$" v)
>     (list v "1")))
> =>
> ("foo" "1")
>
>
> === syntax ==============
> (regex-case <string> 
>  ((<pattern> <var> <var> …) <body>)
>  ((<pattern> <var> <var> …) <body>)
>  (else <body>))

Seems like a great idea, especially since the compilation of the regular
expression can be done at compile-time.

Only two additions would be needed to make it better:

 [1] Python's named substrings: (?P<name>...)
     (<URL: https://docs.python.org/3/library/re.html?highlight=regex#regular-expression-syntax>)

 [2] Seamless constant string concatenation as in C:

     #define PREFIX "..."
     #define MIDDLE "..."
     #define SUFFIX "..."
     ...
     {
         int status = regcomp(&reg, PREFIX MIDDLE SUFFIX, 0);
     }

Now, I understand [1] is not in your hands, but named substrings are
essential in the understandability and maintainability of regular
expression code.

You might be able to do something about [2]. Without that capacity,
regular expressions might turn into kilometer-long lines or annoying
(string-concatenate) calls.

> I was thinking the above expansion has some chance (if it lives in the
> regex module?) to memoize the make-regexp part during optimization.

That would be crucial, I'm thinking.


Marko




* Re: regex-case
  2016-02-06 19:13 regex-case Matt Wette
  2016-02-06 19:23 ` regex-case Matt Wette
  2016-02-06 19:49 ` regex-case Marko Rauhamaa
@ 2016-02-06 22:10 ` Matt Wette
  2016-02-08 14:29 ` regex-case Ludovic Courtès
  2016-02-11  1:19 ` regex-case Matt Wette
  4 siblings, 0 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-06 22:10 UTC
  To: Matthew Wette; +Cc: guile-user

[-- Attachment #1: Type: text/plain, Size: 1778 bytes --]


> On Feb 6, 2016, at 11:13 AM, Matt Wette <matthew.wette@verizon.net> wrote:
> 
> I have always missed the ease Perl provides for throwing a string at a list of regular expressions. I have thought it would be nice if the (ice-9 regex) module provided something comparable, so I started work on a macro "regex-case". Code attached.
> Comments on syntax appreciated. — Matt

I have added the else case and cleaned up the fold in rx-let.  New code attached, and echoed partial here:

;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.

(define-syntax rx-let
  (lambda (x)
    (syntax-case x ()
      ((_ m (v ...) exp ...)
       (with-syntax (((i ...)
		      (let f ((n 1) (vl #'(v ...))) ; fold (v ...) to (1 ...)
			(if (null? vl) '() (cons n (f (1+ n) (cdr vl)))))))
	 #'(let ((v (match:substring m i)) ...) exp ...))))))

(define-syntax regex-case
  (lambda (x)
    (syntax-case x (else)
      ((_ str ((pat v ...) exp ...) ...)
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
	 #'(let ((id (make-regexp pat)) ...)
	     (cond
	      ((regexp-exec id str) =>
	       (lambda (m) (rx-let m (v ...) exp ...)))
	      ...))))
      ((_ str ((pat v ...) exp ...) ... (else else-exp ...))
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
	 #'(let ((id (make-regexp pat)) ...)
	     (cond
	      ((regexp-exec id str) =>
	       (lambda (m) (rx-let m (v ...) exp ...)))
	      ...
	     (else else-exp ...)))))
      )))

[-- Attachment #2: regex-case.scm --]
[-- Type: application/octet-stream, Size: 1817 bytes --]

;; v160206c - M.Wette

;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.

(use-modules (ice-9 pretty-print))
(use-modules (ice-9 regex))

;; helper macro for regex-case
;; (rx-let m (v ...) exp ...) => (let ((v (match:substring m 1)) ...) exp ...)
(define-syntax rx-let
  (lambda (x)
    (syntax-case x ()
      ((_ m (v ...) exp ...)
       (with-syntax (((i ...)
		      (let f ((n 1) (vl #'(v ...))) ; fold (v ...) to (1 ...)
			(if (null? vl) '() (cons n (f (1+ n) (cdr vl)))))))
	 #'(let ((v (match:substring m i)) ...) exp ...))))))

;; @example
;; (regex-case str
;;  (("([a-z]+)" v) `(lower ,v))
;;  (("([A-Z]+)" v) `(upper ,v))
;;  (else (error "yuck")))
;; @end example
(define-syntax regex-case
  (lambda (x)
    (syntax-case x (else)
      ((_ str ((pat v ...) exp ...) ...)
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
	 #'(let ((id (make-regexp pat)) ...)
	     (cond
	      ((regexp-exec id str) =>
	       (lambda (m) (rx-let m (v ...) exp ...)))
	      ...))))
      ;; clauses followed by a trailing "else" clause
      ((_ str ((pat v ...) exp ...) ... (else else-exp ...))
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
	 #'(let ((id (make-regexp pat)) ...)
	     (cond
	      ((regexp-exec id str) =>
	       (lambda (m) (rx-let m (v ...) exp ...)))
	      ...
	     (else else-exp ...)))))
      )))

(define str "foo(3)")
(write
 (regex-case str
   (("^([a-z]+)\\(([0-9]+)\\)$" v i)
    (list v i))
   (("^([a-z]+)$" v)
    (list v "1"))
   (else
    (error "not found"))
   )
 )
(newline)

;; --- last line ---


* Re: regex-case
  2016-02-06 19:49 ` regex-case Marko Rauhamaa
@ 2016-02-06 22:42   ` Matt Wette
  2016-02-07  8:15     ` regex-case Marko Rauhamaa
  0 siblings, 1 reply; 8+ messages in thread
From: Matt Wette @ 2016-02-06 22:42 UTC
  To: guile-user


> On Feb 6, 2016, at 11:49 AM, Marko Rauhamaa <marko@pacujo.net> wrote:

> Only two additions would be needed to make it better:
> 
> [1] Python's named substrings: (?P<name>...)
>     (<URL: https://docs.python.org/3/library/re.html?highlight=regex#regular-expression-syntax>)
> 
> [2] Seamless constant string concatenation as in C:
> 
>     #define PREFIX "..."
>     #define MIDDLE "..."
>     #define SUFFIX "..."
>     ...
>     {
>         int status = regcomp(&reg, PREFIX MIDDLE SUFFIX, 0);
>     }
> 

[1] will be tough IMO because it is not supported by the underlying regexp library used by Guile.

[2] may be possible if it is supported by the Guile regexp library.  But I’m not sure there is a clean way to do this, given that syntax-case bindings are lexical.
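
For the narrow case where every piece is a literal string, one possible approach is a small macro that joins the pieces during expansion. The sketch below is an assumption-laden illustration: rx-append is a hypothetical helper, not part of (ice-9 regex), and it does not solve the named-constant case (PREFIX, MIDDLE, SUFFIX), which runs into the lexical-binding problem mentioned above.

```scheme
(use-modules (ice-9 regex))

;; rx-append is a hypothetical helper, not part of (ice-9 regex).
;; It joins string literals while the macro is expanded, so the
;; expansion contains a single constant pattern string.
(define-syntax rx-append
  (lambda (x)
    (syntax-case x ()
      ((k s ...)
       ;; Fender: accept literal strings only.
       (and-map string? (syntax->datum #'(s ...)))
       (datum->syntax #'k
                      (apply string-append (syntax->datum #'(s ...))))))))

(define call-rx
  (make-regexp (rx-append "^([a-z]+)"        ; name
                          "\\(([0-9]+)\\)"   ; argument in parentheses
                          "$")))

(define m (regexp-exec call-rx "foo(3)"))
(list (match:substring m 1) (match:substring m 2))   ; => ("foo" "3")
```

Anything other than a string literal fails the fender and is rejected as a syntax error, so a long pattern can be split across lines without a run-time concatenation call.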

Matt






* Re: regex-case
  2016-02-06 22:42   ` regex-case Matt Wette
@ 2016-02-07  8:15     ` Marko Rauhamaa
  0 siblings, 0 replies; 8+ messages in thread
From: Marko Rauhamaa @ 2016-02-07  8:15 UTC
  To: Matt Wette; +Cc: guile-user

Matt Wette <matthew.wette@verizon.net>:

> [2] may be possible if it is supported by the Guile regexp library.
> But I’m not sure there is a clean way to do this, given that
> syntax-case bindings are lexical.

Additionally, you have a problem with the different regexp flags
(newline semantics, case-sensitivity etc).

Maybe it's best to keep your original idea and keep it simple.


Marko




* Re: regex-case
  2016-02-06 19:13 regex-case Matt Wette
                   ` (2 preceding siblings ...)
  2016-02-06 22:10 ` regex-case Matt Wette
@ 2016-02-08 14:29 ` Ludovic Courtès
  2016-02-11  1:19 ` regex-case Matt Wette
  4 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2016-02-08 14:29 UTC
  To: guile-user

Matt Wette <matthew.wette@verizon.net> skribis:

>  (regex-case str
>    (("^([a-z]+)\\(([0-9]+)\\)$" v i)
>     (list v i))
>    (("^([a-z]+)$" v)
>     (list v "1")))

Sounds useful and convenient!

> (let ((t-292 (make-regexp "^([a-z]+)\\(([0-9]+)\\)$"))
>       (t-293 (make-regexp "^([a-z]+)$")))
>   (cond ((regexp-exec t-292 str)
>          =>
>          (lambda (m)
>            (let ((v (match:substring m 1))
>                  (i (match:substring m 2)))
>              (list v i))))
>         ((regexp-exec t-293 str)
>          =>
>          (lambda (m)
>            (let ((v (match:substring m 1))) (list v "1"))))))

When the ‘else’ clause is missing, I think it would be best to throw an
error like ‘match’ does—it’s rarely helpful to return #unspecified in
those cases.
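
A minimal sketch of that behaviour, assuming a variant macro under the hypothetical name regex-case/strict (the name, the error message, and the use of the generic error procedure are all illustrative choices, not something from the posted code): when no else clause is given, the macro adds one that raises an error.

```scheme
(use-modules (ice-9 regex))

;; regex-case/strict is a hypothetical name for this variant.
(define-syntax regex-case/strict
  (lambda (x)
    (syntax-case x (else)
      ;; Explicit else clause: same expansion shape as the posted regex-case,
      ;; except that the string expression is evaluated only once.
      ((_ str ((pat v ...) exp ...) ... (else else-exp ...))
       (with-syntax (((id ...) (generate-temporaries #'(pat ...)))
                     ;; One list of group indices (1 2 ...) per clause.
                     (((i ...) ...)
                      (map (lambda (vs) (iota (length vs) 1))
                           #'((v ...) ...))))
         #'(let ((s str)
                 (id (make-regexp pat)) ...)
             (cond
              ((regexp-exec id s) =>
               (lambda (m)
                 (let ((v (match:substring m i)) ...)
                   exp ...)))
              ...
              (else else-exp ...)))))
      ;; No else clause: add one that raises an error.
      ((_ str ((pat v ...) exp ...) ...)
       #'(let ((s str))
           (regex-case/strict s
             ((pat v ...) exp ...) ...
             (else (error "regex-case: no matching pattern" s))))))))

(regex-case/strict "foo"
  (("^([a-z]+)$" v) (list v "1")))            ; => ("foo" "1")

;; No clause matches "123", so the generated else clause raises an error;
;; catch is used here only to keep the example running.
(catch #t
  (lambda ()
    (regex-case/strict "123"
      (("^([a-z]+)$" v) (list v "1"))))
  (lambda (key . args) 'no-match))            ; => no-match
```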

Ludo’.





* Re: regex-case
  2016-02-06 19:13 regex-case Matt Wette
                   ` (3 preceding siblings ...)
  2016-02-08 14:29 ` regex-case Ludovic Courtès
@ 2016-02-11  1:19 ` Matt Wette
  4 siblings, 0 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-11  1:19 UTC
  To: guile-user


> On Feb 6, 2016, at 11:13 AM, Matt Wette <matthew.wette@verizon.net> wrote:
> 
> I have always missed the ease provided by Perl in throwing a string at a list of regular expressions. I have thought it would be nice if the (ice-9 regex) module provided something comparable, so I started work on a macro "regex-case". Code attached.
> Comments on syntax appreciated. — Matt
> 

> I was thinking the above expansion has some chance (if it lives in the regex module?) to memoize the make-regexp part during optimization.  

I am going to try to optimize by using eval-when and narrowing the syntax to use only constant strings for the case items.

I will post update if I can get it working.
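
One way the narrowing to constant strings might look is sketched below. It is an assumption, not the planned implementation: literal-regexp is a hypothetical helper, and it only illustrates the restriction itself. A syntax-case fender rejects anything that is not a string literal, and the pattern is test-compiled during expansion so that a malformed regular expression is reported early. It does not memoize the compiled regexp across calls.

```scheme
;; literal-regexp is a hypothetical helper, not part of Guile.
(define-syntax literal-regexp
  (lambda (x)
    (syntax-case x ()
      ((_ pat)
       ;; Fender: only a literal string is accepted as the pattern.
       (string? (syntax->datum #'pat))
       (begin
         ;; Compile once during expansion purely as a check; an invalid
         ;; pattern then fails when the file is compiled or loaded.
         (make-regexp (syntax->datum #'pat))
         #'(make-regexp pat))))))

(define lower-rx (literal-regexp "^([a-z]+)$"))
(regexp? lower-rx)   ; => #t
```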

Matt



