From: Tom Lord <lord@emf.net>
Cc: guile-user@gnu.org
Subject: Re: Stupid module and pregexp questions
Date: Fri, 24 Oct 2003 15:30:05 -0700 (PDT)
Message-ID: <200310242230.PAA03910@morrowfield.regexps.com>
In-Reply-To: <E1AD9VA-0007oD-00@surf.glug.org> (message from Thien-Thi Nguyen on Fri, 24 Oct 2003 23:29:44 +0200)
> From: Thien-Thi Nguyen <ttn@surf.glug.org>
> From: Tom Lord <lord@emf.net>
> Date: Tue, 29 Apr 2003 23:31:41 -0700 (PDT)
Wow, challenging my memory, eh?
> Heck, they were actively excised -- apparently by virtue of some
> (sorry, folks) misguided reasoning about the cleanliness of their
> semantics.
That was a comment I made about the removal of shared substrings.
I was delighted when 1.6.4 gave me:
guile> make-shared-substring
#<primitive-procedure make-shared-substring>
though perturbed that the source code says:
#if SCM_DEBUG_DEPRECATED == 0
[...]
SCM_DEFINE (scm_make_shared_substring, "make-shared-substring", 1, 2, 0,
I can try to write up a "case for shared substrings" if that would be
helpful.
> in guile 1.4.1.96 you can do `(use-modules (lang librgx))' to try out
> librx re-integration. it's even documented to some extent in the manual.
> below is some work-in-progress flex envy slated for 1.4.2 based on rx...
Yikes. I'm scared to ask what version of Rx you are using. You
_should_ (really) be using the latest and greatest in libhackerlab,
which is not currently in release. However, by shocking
coincidence, I was just today semi-preparing to set up a savannah
libhackerlab project and get it back out there (separately from arch,
in which it happens to be included).
A nice side effect of that: systas (which I'm not planning on
re-releasing anytime soon but which is trivially available in my
public archives) has a nice libsystas binding for the latest and
greatest rx. It'd probably take like 2hrs at most to port it to
guile.
Nifty code sample from systas:
(define-public sans-leading-blanks
(structured-regexp->procedure `(^ (* ([] blank))) :pick-spec '>))
Defines a procedure that takes a string, compares it to the given
regexp, and returns a shared substring of that string. The
`pick-spec' says _which_ shared substring to return. ">" means,
return the shared substring that begins at the first character after
the match.
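
For readers without systas at hand, roughly the same behaviour can be sketched in stock Guile with the `(ice-9 regex)` module. This is only an approximation under stated assumptions: it uses a POSIX pattern string rather than a structured regexp, and `match:suffix` returns a fresh copy of the text, not a shared substring. The definition is an illustrative stand-in, not the systas implementation.

```scheme
(use-modules (ice-9 regex))

;; Illustrative stand-in for the systas definition above (an assumption,
;; not the systas code).  "^[[:blank:]]*" always matches, possibly the
;; empty string, so string-match never returns #f here.
(define (sans-leading-blanks str)
  ;; match:suffix yields the text after the match, which is what the
  ;; `>' pick-spec selects.  Unlike systas, this is a newly allocated
  ;; string and does not share storage with STR.
  (match:suffix (string-match "^[[:blank:]]*" str)))

(display (sans-leading-blanks "  \t hello "))
(newline)
```

The copy is the main difference from the systas version: with shared substrings the result would alias the original string's storage.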
-t
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Structured Regular Expressions
;;; [with apologies to Olin Shivers]
;;;
;;;
;;; A structured regexp is a recursively defined list structure.
;;; The general form is:
;;;
;;; structured-regexp := (<operator> <parameter> ...)
;;; parameter := <integer>
;;; | <character>
;;; | <string>
;;; | <keyword>
;;; | <structured-regexp>
;;;
;;; The valid operators are:
;;;
;;; operator := const ; a string constant
;;; | any ; any character
;;; | [] ; character set
;;; | [^] ; negated character set
;;; | ^ ; start anchor
;;; | $ ; end anchor
;;; | ? ; optional sub-expression
;;; | * ; repeated sub-expression
;;; | + ; non-empty, repeated sub-expression
;;; | {} ; a counted sub-expression
;;; | = ; parenthesized subexpression
;;; | & ; sub-expression concatenation
;;; | | ; alternative sub-expressions
;;; | @ ; parenthesized subexpression back-reference
;;; | / ; the "cut" operator
;;; | ! ; the symbolically labeled "cut" operator
;;;
;;; As a short-hand, some structured regexps can be abbreviated:
;;;
;;; (const "string") == "string"
;;; (* any) == *.
;;; (^ ($ subexp)) == (^$ subexp)
;;;
;;; Each operator has its own syntax, so the precise syntax of a structured
;;; regexp is:
;;;
;;; structured-regexp := (const <string>)
;;; | ([] <character-set-element> ...)
;;; | ([^] <character-set-element> ...)
;;; | (^ <structured-regexp> ...)
;;; | ($ <structured-regexp> ...)
;;; | (? <structured-regexp> ...)
;;; | (* <structured-regexp> ...)
;;; | (+ <structured-regexp> ...)
;;; | ({} <integer> <integer> <structured-regexp> ...)
;;; | (& <structured-regexp> ...)
;;; | (| <structured-regexp> ...)
;;; | (= [<subexpression-label>] <structured-regexp> ...)
;;; | (@ <subexpression-label>)
;;; | (/ <integer>)
;;; | (! [<cut-label>] <structured-regexp> ...)
;;;
;;; character-set-element := string
;;; | character
;;; | (character . character) ; a range of characters
;;; | <character-set> ; see the `(standard char-set-lib)' module
;;;
;;; subexpression-label := <keyword> ; (a keyword)
;;; cut-label := <keyword> ; (a keyword)
;;;
;;; A `pick-spec' specifies values to be returned from `regexec' or a
;;; procedure returned by `regexec-function'. It has the form:
;;;
;;; pick-spec := #f ; return #t if a match is found, #f otherwise
;;;
;;; | #t ; return #f or a list `(before match after)'
;;; ; that is the partition of the string implied
;;; ; by a successful match
;;;
;;; | <recursive-pick-spec>
;;;
;;;
;;; A `recursive-pick-spec' is:
;;;
;;; recursive-pick-spec := <rps-elt> ; return only the value implied by `rps-elt'
;;; | (<rps-elt> ...) ; return a list of values implied by
;;; ; the list of `rps-elt's.
;;;
;;; An `rps-elt' is:
;;;
;;; rps-elt := <part> ; return the indicated part of the string
;;; ; (see below)
;;;
;;; | (<start-point> <end-point>) ; return the substring starting
;;; ; at `<start-point>' and ending immediately
;;; ; before `<end-point>' (see below)
;;;
;;;
;;; | state-label ; return the state label of the DFA ending
;;; ; state. If the match terminated at a `cut'
;;; ; operator (`/' in sre notation), this is
;;; ; the integer argument to that operator.
;;;
;;; | ? ; the keyword of the terminating cut label or #f
;;;
;;; | <keyword> ; return the keyword literally. This is useful
;;; ; for labeling elements in a `recursive-pick-spec'
;;; ; which is a list.
;;;
;;; A `part' indicates the entire match, a parenthesized
;;; subexpression, or the substring that precedes a match, or the
;;; substring that follows a match:
;;;
;;; part := 0 ; the entire match
;;;
;;; | <n> ; (an integer) the `nth' parenthesized subexpression
;;;
;;; | (@ <keyword>) ; the subexpression labeled by `<keyword>'
;;;
;;; | < ; (the symbol '<') the substring preceding the match
;;;
;;; | > ; (the symbol '>') the substring following the match
;;;
;;; A `point' indicates a specific position within the string. There
;;; are two kinds of `point': a `start-point' and an `end-point' that together
;;; specify a substring of the string:
;;;
;;; start-point := <part> ; the beginning of the indicated match part.
;;; | <any-point> ; (see below)
;;;
;;; end-point := <part> ; the end of the indicated match part.
;;; | <any-point> ; (see below)
;;;
;;; any-point := (<part> 0) ; the beginning of the indicated match part
;;; | (<part> 1) ; the end of the indicated match part
;;;
;;;
;;; An example pick spec that returns a list of substrings of the original string:
;;;
;;; (0 ; the entire match
;;;
;;; (< 0) ; from the start of the string to the end of the match
;;;
;;; (2 >) ; from the start of subexpression 2 to the end of the string
;;;
;;; (@ :username) ; the subexpression labeled `:username'
;;;
;;; ((@ :username) ; from the start of the subexpression labeled `:username'
;;; (@ :directory)) ; ... to the end of the subexpression labeled `:directory'
;;;
;;; ((2 1) ; from the end of subexpression 2
;;; ((@ :directory) 0))) ; ... to the beginning of the subexpression labeled :directory
;;;
;;;
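
To make the pick-spec rules above concrete, here is a minimal sketch of an interpreter for a subset of them, written against the match objects of stock Guile's `(ice-9 regex)` module and not against Rx. The names `pick`, `pick-elt`, `point-offset` and `part-offset` are hypothetical. Assumptions: labeled subexpressions `(@ <keyword>)`, `state-label`, `?` and literal keywords are left out; a top-level list is always read as a list of `rps-elt's; and the results are copies, not shared substrings.

```scheme
(use-modules (ice-9 regex))

;; Offset of the start (END? = #f) or end (END? = #t) of PART.
;; PART is 0 (whole match), N (Nth subexpression), or the symbol < or >.
(define (part-offset m part end?)
  (cond ((eq? part '<) (if end? (match:start m 0) 0))
        ((eq? part '>) (if end?
                           (string-length (match:string m))
                           (match:end m 0)))
        ((integer? part) (if end? (match:end m part) (match:start m part)))
        (else (error "unsupported part" part))))

;; A point is a bare part (use DEFAULT-END?) or an any-point
;; (<part> 0) / (<part> 1) naming the part's beginning or end.
(define (point-offset m point default-end?)
  (if (pair? point)
      (part-offset m (car point) (= 1 (cadr point)))
      (part-offset m point default-end?)))

;; An rps-elt is a bare part or a (<start-point> <end-point>) pair.
(define (pick-elt m elt)
  (let ((str (match:string m)))
    (if (pair? elt)
        (substring str
                   (point-offset m (car elt) #f)
                   (point-offset m (cadr elt) #t))
        (substring str (part-offset m elt #f) (part-offset m elt #t)))))

;; M is a match object, or #f when the match failed.
(define (pick m spec)
  (cond ((not m) #f)
        ((eq? spec #f) #t)
        ((eq? spec #t) (map (lambda (e) (pick-elt m e)) '(< 0 >)))
        ((pair? spec) (map (lambda (e) (pick-elt m e)) spec))
        (else (pick-elt m spec))))

(define m (string-match "([a-z]+)@([a-z]+)" "mail: tom@emf now"))
(write (pick m '(0 (< 0) (2 >) ((1 1) (2 0)))))
(newline)
;; writes ("tom@emf" "mail: tom@emf" "emf now" "@")
```

With the same match, `(pick m #t)` gives the `(before match after)` partition, here `("mail: " "tom@emf" " now")`.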