* regex-case
@ 2016-02-06 19:13 Matt Wette
2016-02-06 19:23 ` regex-case Matt Wette
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-06 19:13 UTC (permalink / raw)
To: guile-user
[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]
I have always missed the ease provided by Perl in throwing a string at a list of regular expressions. I have thought it would be nice if the (ice-9 regex) module would provide something comparable. So I started work on a macro “regex-case”. Code attached.
Comments on syntax appreciated. — Matt
=== test ================
(define str "foo")
(regex-case str
  (("^([a-z]+)\\(([0-9]+)\\)$" v i)
   (list v i))
  (("^([a-z]+)$" v)
   (list v "1")))
=>
("foo" "1")
=== syntax ==============
(regex-case <string>
  ((<pattern> <var> <var> …) <body>)
  ((<pattern> <var> <var> …) <body>)
  (else <body>))
Where <pattern> is a regular expression given as a string, <var> … are variables bound to the matched subexpressions, and <body> is a sequence of expressions. The result is the value of the last expression in the matched clause.
=== expansion ===========
The example shown above expands to:
(let ((t-292 (make-regexp "^([a-z]+)\\(([0-9]+)\\)$"))
      (t-293 (make-regexp "^([a-z]+)$")))
  (cond ((regexp-exec t-292 str)
         =>
         (lambda (m)
           (let ((v (match:substring m 1))
                 (i (match:substring m 2)))
             (list v i))))
        ((regexp-exec t-293 str)
         =>
         (lambda (m)
           (let ((v (match:substring m 1))) (list v "1"))))))
I was thinking the above expansion has some chance (if it lives in the regex module?) to memoize the make-regexp part during optimization.
If not, a macro could be written to generate a match function that memoizes the make-regexp part.
(define regex-matcher foo ((<pattern> …)
=>
(define (let ((t-123 (make-regex <pattern>)) …) (lambda (str) (cond ((regexp-exec t-123 str) ...
[-- Attachment #2: regex-case.scm --]
[-- Type: application/octet-stream, Size: 1511 bytes --]
;; v160206b - M.Wette
;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.
(use-modules (ice-9 pretty-print))
(use-modules (ice-9 regex))
;; helper macro for regex-case
;; (rx-let m (v ...) exp ...) => (let ((v (match:substring m 1)) ...) exp ...)
(define-syntax rx-let
  (lambda (x)
    (syntax-case x ()
      ((_ m (v ...) exp ...)
       (with-syntax (((i ...) ; fold (v ...) to (1 ...)
                      (let f ((il '()) (n 1) (vl #'(v ...)))
                        (if (null? vl) (reverse il)
                            (f (cons n il) (1+ n) (cdr vl))))))
         #'(let ((v (match:substring m i)) ...) exp ...))))))
;; @example
;; (regex-case str
;; (("([a-z]+)" v) `(lower ,v))
;; (("([A-Z]+)" v) `(upper ,v))
;; (else (error "yuck")))
;; @end example
(define-syntax regex-case
  (lambda (x)
    (syntax-case x (else)
      ((_ str ((pat v ...) exp ...) ...)
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
         #'(let ((id (make-regexp pat)) ...)
             (cond
              ((regexp-exec id str) =>
               (lambda (m) (rx-let m (v ...) exp ...)))
              ...))))
      ;; todo: pattern with "else"
      )))
(define str "foo")
(write
 (regex-case str
   (("^([a-z]+)\\(([0-9]+)\\)$" v i)
    (list v i))
   (("^([a-z]+)$" v)
    (list v "1"))))
(newline)
;; --- last line ---
[-- Attachment #3: Type: text/plain, Size: 3 bytes --]
* Re: regex-case
2016-02-06 19:13 regex-case Matt Wette
@ 2016-02-06 19:23 ` Matt Wette
2016-02-06 19:49 ` regex-case Marko Rauhamaa
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-06 19:23 UTC (permalink / raw)
To: guile-user
> On Feb 6, 2016, at 11:13 AM, Matt Wette <matthew.wette@verizon.net> wrote:
> If not a macro could be written to generate a match function which can memoize the make-regexp part.
> (define regex-matcher foo ((<pattern> …)
> =>
> (define (let ((t-123 (make-regex <pattern>)) …) (lambda (str) (cond ((regexp-exec t-123 str) ...
oops. Should read:
(define-regex-matcher foo ((<pattern> …)
=>
(define foo
(let ((t-123 (make-regexp <pattern>)) …)
(lambda (str)
(cond ((regexp-exec t-123 str) … (else <body>)))))
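A minimal sketch of how the corrected expansion above might be written as a macro, so that each pattern is compiled once when the matcher is defined rather than on every call. The helper bind-groups and the example matcher parse-call are hypothetical names introduced only for illustration; the clause syntax is assumed to be the same as in regex-case, with a mandatory trailing else clause.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: bind each variable to the next match group,
;; starting at group N, then evaluate the body.
(define-syntax bind-groups
  (syntax-rules ()
    ((_ m n () exp ...) (let () exp ...))
    ((_ m n (v rest ...) exp ...)
     (let ((v (match:substring m n)))
       (bind-groups m (+ n 1) (rest ...) exp ...)))))

;; Sketch of define-regex-matcher: NAME becomes a one-argument procedure.
(define-syntax define-regex-matcher
  (lambda (x)
    (syntax-case x (else)
      ((_ name ((pat v ...) exp ...) ... (else else-exp ...))
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
         #'(define name
             ;; The regexps are compiled here, once, when NAME is defined.
             (let ((id (make-regexp pat)) ...)
               (lambda (str)
                 (cond
                  ((regexp-exec id str) =>
                   (lambda (m) (bind-groups m 1 (v ...) exp ...)))
                  ...
                  (else else-exp ...))))))))))

;; Usage: parse-call is a hypothetical matcher built from the thread's patterns.
(define-regex-matcher parse-call
  (("^([a-z]+)\\(([0-9]+)\\)$" v i) (list v i))
  (("^([a-z]+)$" v) (list v "1"))
  (else #f))

(parse-call "foo(3)")   ; => ("foo" "3")
(parse-call "foo")      ; => ("foo" "1")
```

Because the compiled regexps live in the closure, repeated calls to the matcher do not call make-regexp again, which is the memoization effect discussed above.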
* Re: regex-case
2016-02-06 19:13 regex-case Matt Wette
2016-02-06 19:23 ` regex-case Matt Wette
@ 2016-02-06 19:49 ` Marko Rauhamaa
2016-02-06 22:42 ` regex-case Matt Wette
2016-02-06 22:10 ` regex-case Matt Wette
` (2 subsequent siblings)
4 siblings, 1 reply; 8+ messages in thread
From: Marko Rauhamaa @ 2016-02-06 19:49 UTC (permalink / raw)
To: Matt Wette; +Cc: guile-user
Matt Wette <matthew.wette@verizon.net>:
> Comments on syntax appreciated. — Matt
>
> === test ================
> (define str "foo")
>
> (regex-case str
> (("^([a-z]+)\\(([0-9]+)\\)$" v i)
> (list v i))
> (("^([a-z]+)$" v)
> (list v "1")))
> =>
> ("foo" "1")
>
>
> === syntax ==============
> (regex-case <string>
> ((<pattern> <var> <var> …) <body>)
> ((<pattern> <var> <var> …) <body>)
> (else <body>))
Seems like a great idea, especially since the compilation of the regular
expression can be done at compile-time.
Only two additions would be needed to make it better:
[1] Python's named substrings: (?P<name>...)
(<URL: https://docs.python.org/3/library/re.html?highlight=regex#regular-expression-syntax>)
[2] Seamless constant string concatenation as in C:
#define PREFIX "..."
#define MIDDLE "..."
#define SUFFIX "..."
...
{
int status = regcomp(&reg, PREFIX MIDDLE SUFFIX, 0);
}
Now, I understand [1] is not in your hands, but named substrings are
essential in the understandability and maintainability of regular
expression code.
You might be able to do something about [2]. Without that capacity,
regular expressions might turn into kilometer-long lines or annoying
(string-concatenate) calls.
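For comparison, a small sketch of what point [2] looks like in Guile today, using plain string-append on named fragments before compiling. The fragment names prefix, middle and suffix are illustrative, borrowed from the C example. Since the macro as posted hands its pattern form to make-regexp unchanged, an expression like this should also be usable in the pattern position of a clause, though that is an assumption about the posted code rather than a documented feature.

```scheme
(use-modules (ice-9 regex))

;; Illustrative pattern fragments (names are assumptions, not from any library).
(define prefix "^([a-z]+)")
(define middle "\\(([0-9]+)\\)")
(define suffix "$")

;; Concatenate the fragments at run time, then compile once.
(define rx (make-regexp (string-append prefix middle suffix)))

(define m (regexp-exec rx "foo(3)"))
(define groups (list (match:substring m 1) (match:substring m 2)))
groups   ; => ("foo" "3")
```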
> I was thinking the above expansion has some chance (if it lives in the
> regex module?) to memoize the make-regexp part during optimization.
That would be crucial, I'm thinking.
Marko
* Re: regex-case
2016-02-06 19:49 ` regex-case Marko Rauhamaa
@ 2016-02-06 22:42 ` Matt Wette
2016-02-07 8:15 ` regex-case Marko Rauhamaa
0 siblings, 1 reply; 8+ messages in thread
From: Matt Wette @ 2016-02-06 22:42 UTC (permalink / raw)
To: guile-user
> On Feb 6, 2016, at 11:49 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Only two additions would be needed to make it better:
>
> [1] Python's named substrings: (?P<name>...)
> (<URL: https://docs.python.org/3/library/re.html?highlight=regex#regular-expression-syntax>)
>
> [2] Seamless constant string concatenation as in C:
>
> #define PREFIX "..."
> #define MIDDLE "..."
> #define SUFFIX "..."
> ...
> {
> int status = regcomp(&reg, PREFIX MIDDLE SUFFIX, 0);
> }
>
[1] will be tough IMO because it is not supported by the underlying regexp library used by Guile.
[2] may be possible if it is supported by the Guile regexp library. But I’m not sure there is a clean way to do this, given that syntax-case bindings are lexical.
Matt
* Re: regex-case
2016-02-06 22:42 ` regex-case Matt Wette
@ 2016-02-07 8:15 ` Marko Rauhamaa
0 siblings, 0 replies; 8+ messages in thread
From: Marko Rauhamaa @ 2016-02-07 8:15 UTC (permalink / raw)
To: Matt Wette; +Cc: guile-user
Matt Wette <matthew.wette@verizon.net>:
> [2] may be possible if it is supported by the Guile regexp library.
> But I’m not sure there is a clean way to do this, given that
> syntax-case bindings are lexical.
Additionally, you have a problem with the different regexp flags
(newline semantics, case-sensitivity etc).
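To make the flag issue concrete, here is a short sketch of how make-regexp accepts flags such as regexp/icase and regexp/newline. The variable names are illustrative. A regex-case clause as posted has no place to pass such flags, which is the complication in question.

```scheme
(use-modules (ice-9 regex))

;; The same pattern compiled with different flags.
(define rx-default (make-regexp "^foo$"))
(define rx-icase   (make-regexp "^foo$" regexp/icase))
(define rx-newline (make-regexp "^foo$" regexp/newline))

(regexp-exec rx-default "FOO")        ; #f: matching is case-sensitive by default
(regexp-exec rx-icase   "FOO")        ; a match object
(regexp-exec rx-default "bar\nfoo")   ; #f: ^ anchors only at the start of the string
(regexp-exec rx-newline "bar\nfoo")   ; a match object: ^ also matches after a newline
```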
Maybe it's best to keep your original idea and keep it simple.
Marko
* Re: regex-case
2016-02-06 19:13 regex-case Matt Wette
2016-02-06 19:23 ` regex-case Matt Wette
2016-02-06 19:49 ` regex-case Marko Rauhamaa
@ 2016-02-06 22:10 ` Matt Wette
2016-02-08 14:29 ` regex-case Ludovic Courtès
2016-02-11 1:19 ` regex-case Matt Wette
4 siblings, 0 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-06 22:10 UTC (permalink / raw)
To: Matthew Wette; +Cc: guile-user
[-- Attachment #1: Type: text/plain, Size: 1778 bytes --]
> On Feb 6, 2016, at 11:13 AM, Matt Wette <matthew.wette@verizon.net> wrote:
>
> I have always missed the ease provided by Perl in throwing a string at a list of regular expressions. I have thought it would be nice if the (ice-9 regex) module would provide something comparable. So I started work on a macro “regex-case”. Code attached.
> Comments on syntax appreciated. — Matt
I have added the else case and cleaned up the fold in rx-let. New code is attached and partially echoed here:
;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.
(define-syntax rx-let
  (lambda (x)
    (syntax-case x ()
      ((_ m (v ...) exp ...)
       (with-syntax (((i ...)
                      (let f ((n 1) (vl #'(v ...))) ; fold (v ...) to (1 ...)
                        (if (null? vl) '() (cons n (f (1+ n) (cdr vl)))))))
         #'(let ((v (match:substring m i)) ...) exp ...))))))
(define-syntax regex-case
  (lambda (x)
    (syntax-case x (else)
      ((_ str ((pat v ...) exp ...) ...)
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
         #'(let ((id (make-regexp pat)) ...)
             (cond
              ((regexp-exec id str) =>
               (lambda (m) (rx-let m (v ...) exp ...)))
              ...))))
      ((_ str ((pat v ...) exp ...) ... (else else-exp ...))
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
         #'(let ((id (make-regexp pat)) ...)
             (cond
              ((regexp-exec id str) =>
               (lambda (m) (rx-let m (v ...) exp ...)))
              ...
              (else else-exp ...)))))
      )))
[-- Attachment #2: regex-case.scm --]
[-- Type: application/octet-stream, Size: 1817 bytes --]
;; v160206c - M.Wette
;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.
(use-modules (ice-9 pretty-print))
(use-modules (ice-9 regex))
;; helper macro for regex-case
;; (rx-let m (v ...) exp ...) => (let ((v (match:substring m 1)) ...) exp ...)
(define-syntax rx-let
  (lambda (x)
    (syntax-case x ()
      ((_ m (v ...) exp ...)
       (with-syntax (((i ...)
                      (let f ((n 1) (vl #'(v ...))) ; fold (v ...) to (1 ...)
                        (if (null? vl) '() (cons n (f (1+ n) (cdr vl)))))))
         #'(let ((v (match:substring m i)) ...) exp ...))))))
;; @example
;; (regex-case str
;; (("([a-z]+)" v) `(lower ,v))
;; (("([A-Z]+)" v) `(upper ,v))
;; (else (error "yuck")))
;; @end example
(define-syntax regex-case
  (lambda (x)
    (syntax-case x (else)
      ((_ str ((pat v ...) exp ...) ...)
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
         #'(let ((id (make-regexp pat)) ...)
             (cond
              ((regexp-exec id str) =>
               (lambda (m) (rx-let m (v ...) exp ...)))
              ...))))
      ;; same as above, with a trailing "else" clause
      ((_ str ((pat v ...) exp ...) ... (else else-exp ...))
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
         #'(let ((id (make-regexp pat)) ...)
             (cond
              ((regexp-exec id str) =>
               (lambda (m) (rx-let m (v ...) exp ...)))
              ...
              (else else-exp ...)))))
      )))
(define str "foo(3)")
(write
 (regex-case str
   (("^([a-z]+)\\(([0-9]+)\\)$" v i)
    (list v i))
   (("^([a-z]+)$" v)
    (list v "1"))
   (else
    (error "not found"))))
(newline)
;; --- last line ---
* Re: regex-case
2016-02-06 19:13 regex-case Matt Wette
` (2 preceding siblings ...)
2016-02-06 22:10 ` regex-case Matt Wette
@ 2016-02-08 14:29 ` Ludovic Courtès
2016-02-11 1:19 ` regex-case Matt Wette
4 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2016-02-08 14:29 UTC (permalink / raw)
To: guile-user
Matt Wette <matthew.wette@verizon.net> skribis:
> (regex-case str
> (("^([a-z]+)\\(([0-9]+)\\)$" v i)
> (list v i))
> (("^([a-z]+)$" v)
> (list v "1")))
Sounds useful and convenient!
> (let ((t-292 (make-regexp "^([a-z]+)\\(([0-9]+)\\)$"))
> (t-293 (make-regexp "^([a-z]+)$")))
> (cond ((regexp-exec t-292 str)
> =>
> (lambda (m)
> (let ((v (match:substring m 1))
> (i (match:substring m 2)))
> (list v i))))
> ((regexp-exec t-293 str)
> =>
> (lambda (m)
> (let ((v (match:substring m 1))) (list v "1"))))))
When the ‘else’ clause is missing, I think it would be best to throw an
error like ‘match’ does—it’s rarely helpful to return #unspecified in
those cases.
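A rough sketch of what that could look like, assuming a variant macro (hypothetically named regex-case/strict here) that appends its own failing else clause. The helper bind-groups is likewise a hypothetical stand-in for rx-let, and the error message text is made up for illustration.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: bind variables to match groups N, N+1, ...
(define-syntax bind-groups
  (syntax-rules ()
    ((_ m n () exp ...) (let () exp ...))
    ((_ m n (v rest ...) exp ...)
     (let ((v (match:substring m n)))
       (bind-groups m (+ n 1) (rest ...) exp ...)))))

;; Sketch: like regex-case without an else clause, but a failed
;; match signals an error instead of returning an unspecified value.
(define-syntax regex-case/strict
  (lambda (x)
    (syntax-case x ()
      ((_ str ((pat v ...) exp ...) ...)
       (with-syntax (((id ...) (generate-temporaries #'(pat ...))))
         #'(let ((s str)                      ; evaluate the subject only once
                 (id (make-regexp pat)) ...)
             (cond
              ((regexp-exec id s) =>
               (lambda (m) (bind-groups m 1 (v ...) exp ...)))
              ...
              (else (error "regex-case: no matching pattern" s)))))))))

(regex-case/strict "foo"
  (("^([a-z]+)$" v) (list v "1")))   ; => ("foo" "1")
```

Calling it with a string that matches no clause, for example "123" with the single clause above, raises an error rather than falling through silently.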
Ludo’.
* Re: regex-case
2016-02-06 19:13 regex-case Matt Wette
` (3 preceding siblings ...)
2016-02-08 14:29 ` regex-case Ludovic Courtès
@ 2016-02-11 1:19 ` Matt Wette
4 siblings, 0 replies; 8+ messages in thread
From: Matt Wette @ 2016-02-11 1:19 UTC (permalink / raw)
To: guile-user
> On Feb 6, 2016, at 11:13 AM, Matt Wette <matthew.wette@verizon.net> wrote:
>
> I have always missed the ease provided by Perl in throwing a string at a list of regular expressions. I have thought it would be nice if the (ice-9 regex) module would provide something comparable. So I started work on a macro “regex-case”. Code attached.
> Comments on syntax appreciated. — Matt
>
> I was thinking the above expansion has some chance (if it lives in the regex module?) to memoize the make-regexp part during optimization.
I am going to try to optimize by using eval-when and narrowing the syntax to use only constant strings for the case items.
I will post an update if I can get it working.
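As a rough illustration of the "constant strings only" narrowing, here is a sketch of a macro that accepts a pattern only when it is a literal string, using a helper defined inside eval-when so that it is available at expansion time. The names constant-pattern? and const-regexp are hypothetical. This shows the syntax restriction only; it does not by itself cache the compiled regexp across calls.

```scheme
(use-modules (ice-9 regex))

;; Make the helper available when macros are expanded, as well as at
;; load and eval time.
(eval-when (expand load eval)
  ;; Hypothetical helper: true when the syntax object is a literal string.
  (define (constant-pattern? stx)
    (string? (syntax->datum stx))))

;; Sketch: expands to make-regexp only for literal string patterns;
;; any other form fails the fender and is rejected at expansion time.
(define-syntax const-regexp
  (lambda (x)
    (syntax-case x ()
      ((_ pat)
       (constant-pattern? #'pat)
       #'(make-regexp pat)))))

(define m (regexp-exec (const-regexp "^([a-z]+)$") "foo"))
(match:substring m 1)   ; => "foo"
```

A call such as (const-regexp (string-append "a" "b")) would be refused by the expander, which is the trade-off of this restriction against the concatenation idea raised earlier in the thread.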
Matt