From: Tom Lord
Newsgroups: gmane.lisp.guile.user
Subject: Re: Stupid module and pregexp questions
Date: Fri, 24 Oct 2003 15:30:05 -0700 (PDT)
Message-ID: <200310242230.PAA03910@morrowfield.regexps.com>
References: <877k9eobcv.fsf@raven.i.defaultvalue.org> <20030430064840.GC22895@www> <200304300631.XAA05547@morrowfield.regexps.com>
To: ttn@glug.org
Cc: guile-user@gnu.org
In-reply-to: (message from Thien-Thi Nguyen on Fri, 24 Oct 2003 23:29:44 +0200)

    > From: Thien-Thi Nguyen

    >    From: Tom Lord
    >    Date: Tue, 29 Apr 2003 23:31:41 -0700 (PDT)

Wow, challenging my memory, eh?

    >    Heck, they were actively excised -- apparently by virtue of some
    >    (sorry, folks) misguided reasoning about the cleanliness of their
    >    semantics.

That was a comment I made about the removal of shared substrings.

I was delighted when 1.6.4 gave me:

    guile> make-shared-substring
    #

though perturbed that the source code says:

    #if SCM_DEBUG_DEPRECATED == 0
    [...]
    SCM_DEFINE (scm_make_shared_substring, "make-shared-substring", 1, 2, 0,

I can try to write up a "case for shared substrings" if that would be
helpful.

    > in guile 1.4.1.96 you can do `(use-modules (lang librgx))' to try out
    > librx re-integration.  it's even documented to some extent in the manual.
    > below is some work-in-progress flex envy slated for 1.4.2 based on rx...

Yikes.  I'm scared to ask what version of Rx you are using.

You _should_ (really) be using the latest and greatest in
libhackerlab, which is not currently in release.  However, by shocking
coincidence, I was just today semi-preparing to set-up a savannah
libhackerlab project and get it back out there (separately from arch,
in which it happens to be included).

A nice side effect of that: systas (which I'm not planning on
re-releasing anytime soon but which is trivially available in my
public archives) has a nice libsystas binding for the latest and
greatest rx.  It'd probably take like 2hrs at most to port it to
guile.

Nifty code sample from systas:

    (define-public sans-leading-blanks
      (structured-regexp->procedure `(^ (* ([] blank))) :pick-spec '>))

Defines a procedure that takes a string, compares it to the given
regexp, and returns a shared substring of that string.  The
`pick-spec' says _which_ shared substring to return.  ">" means,
return the shared substring that begins at the first character after
the match.

-t

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Structured Regular Expressions [with apologies to Olin Shivers]
;;;
;;;
;;; A structured regexp is a recursively defined list structure.
;;; The general form is:
;;;
;;;     structured-regexp := ( ...)
;;;     parameter :=
;;;               |
;;;               |
;;;               |
;;;               |
;;;
;;; The valid operators are:
;;;
;;;     operator := const   ; a string constant
;;;              | any      ; any character
;;;              | []       ; character set
;;;              | [^]      ; negated character set
;;;              | ^        ; start anchor
;;;              | $        ; end anchor
;;;              | ?        ; optional sub-expression
;;;              | *        ; repeated sub-expression
;;;              | +        ; non-empty, repeated sub-expression
;;;              | {}       ; a counted sub-expression
;;;              | =        ; parenthesized subexpression
;;;              | &        ; sub-expression concatenation
;;;              | |        ; alternative sub-expressions
;;;              | @        ; parenthesized subexpression back-reference
;;;              | /        ; the "cut" operator
;;;              | !        ; the symbolically labeled "cut" operator
;;;
;;; As a short-hand, some structured regexps can be abbreviated:
;;;
;;;     (const "string")  == "string"
;;;     (* any)           == *.
;;;     (^ ($ subexp))    == (^$ subexp)
;;;
;;; Each operator has its own syntax, so the precise syntax of a structured
;;; regexp is:
;;;
;;;     structured-regexp := (const )
;;;                       | ([] ...)
;;;                       | ([^] ...)
;;;                       | (^ ...)
;;;                       | ($ ...)
;;;                       | (? ...)
;;;                       | (* ...)
;;;                       | (+ ...)
;;;                       | ({} ...)
;;;                       | (& ...)
;;;                       | (| ...)
;;;                       | (= [] ...)
;;;                       | (@ )
;;;                       | (/ )
;;;                       | (! [] ...)
;;;
;;;     character-set-element := string
;;;                           | character
;;;                           | (character . character)  ; a range of characters
;;;                           |   ; see the `(standard char-set-lib)' module
;;;
;;;     subexpression-label := ; (a keyword)
;;;     cut-label := ; (a keyword)
;;;
;;; A `pick-spec' specifies values to be returned from `regexec' or a
;;; procedure returned by `regexec-function'.  It has the form:
;;;
;;;     pick-spec := #f   ; return #t if a match is found, #f otherwise
;;;
;;;               | #t    ; return #f or a list `(before match after)'
;;;                       ; that is the partition of the string implied
;;;                       ; by a successful match
;;;
;;;               |
;;;
;;;
;;; A `recursive-pick-spec' is:
;;;
;;;     recursive-pick-spec := ; return only the value implied by `rps-elt'
;;;                         | ( ...)  ; return a list of values implied by
;;;                                   ; the list of `rps-elt's.
;;;
;;; An `rps-elt' is:
;;;
;;;     rps-elt := ; return the indicated part of the string
;;;                ; (see below)
;;;
;;;             | ( )  ; return the substring starting
;;;                    ; at `' and ending immediately
;;;                    ; before `' (see below)
;;;
;;;
;;;             | state-label  ; return the state label of the DFA ending
;;;                            ; state.  If the match terminated at a `cut'
;;;                            ; operator (`/' in sre notation), this is
;;;                            ; the integer argument to that operator.
;;;
;;;             | ?  ; the keyword of the terminating cut label or #f
;;;
;;;             | ; return the keyword literally.  This is useful
;;;               ; for labeling elements in a `recursive-pick-spec'
;;;               ; which is a list.
;;;
;;; A `part' indicates the entire match, a parenthesized
;;; subexpression, or the substring that precedes a match, or the
;;; substring that follows a match:
;;;
;;;     part := 0  ; the entire match
;;;
;;;          | ; (an integer) the `nth' parenthesized subexpression
;;;
;;;          | (@ )  ; the subexpression labeled by `'
;;;
;;;          | <  ; (the symbol '<') the substring preceding the match
;;;
;;;          | >  ; (the symbol '>') the substring following the match
;;;
;;; A `point' indicates a specific position within the string.  There
;;; are two kinds of `point': a `start-point' and an `end-point' that together
;;; specify a substring of the string:
;;;
;;;     start-point := ; the beginning of the indicated match part.
;;;                 | ; (see below)
;;;
;;;     end-point := ; the end of the indicated match part.
;;;               | ; (see below)
;;;
;;;     any-point := ( 0)  ; the beginning of the indicated match part
;;;               | ( 1)  ; the end of the indicated match part
;;;
;;;
;;; An example pick spec that returns a list of substrings of the original string:
;;;
;;;     (0                      ; the entire match
;;;
;;;      (< 0)                  ; from the start of the string to the end of the match
;;;
;;;      (2 >)                  ; from the start of subexpression 2 to the end of the string
;;;
;;;      (@ :username)          ; the subexpression labeled `:username'
;;;
;;;      ((@ :username)         ; from the start of the subexpression labeled `:username'
;;;       (@ :directory))       ; ... to the end of the subexpression labeled `:directory'
;;;
;;;      ((2 1)                 ; from the end of subexpression 2
;;;       ((@ :directory) 0)))  ; ... to the beginning of the subexpression labeled :directory
;;;
;;;
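
For readers without systas at hand, the pick-spec idea can be roughly approximated in stock Guile. The sketch below is not the systas or Rx API. It assumes Guile's POSIX regexps from `(ice-9 regex)` stand in for Rx, and `substring/shared` stands in for `make-shared-substring`. The helper `pick` is hypothetical and handles only the `#f`, `#t`, `<`, `>`, and integer cases described above. Returning `#f` for every spec when there is no match is also an assumption.

```scheme
(use-modules (ice-9 regex))

;; Rough Guile counterpart of the systas `sans-leading-blanks' example.
;; Assumption: POSIX regexps via (ice-9 regex) stand in for Rx.
(define (sans-leading-blanks str)
  ;; "^[ \t]*" also matches the empty string, so `m' is never #f here.
  (let ((m (string-match "^[ \t]*" str)))
    ;; Like the '> pick: everything after the match, sharing storage
    ;; with `str' where the implementation allows it.
    (substring/shared str (match:end m))))

;; Hypothetical helper (neither a systas nor a Guile procedure): applies
;; a small subset of the pick-spec idea to a Guile match object.
(define (pick m spec)
  (cond ((not m) #f)                      ; no match (assumed: always #f)
        ((eq? spec #f) #t)                ; #f => only report success
        ((eq? spec #t)                    ; #t => (before match after)
         (list (match:prefix m) (match:substring m) (match:suffix m)))
        ((eq? spec '<) (match:prefix m))  ; text preceding the match
        ((eq? spec '>) (match:suffix m))  ; text following the match
        ((integer? spec)                  ; 0 or the nth subexpression
         (match:substring m spec))
        (else (error "unsupported pick-spec:" spec))))

(sans-leading-blanks "   hello")                 ; => "hello"
(pick (string-match "[0-9]+" "abc123def") #t)    ; => ("abc" "123" "def")
```

Point pairs such as `(< 0)` and labeled subexpressions are left out on purpose; they would need the start and end offsets (`match:start`, `match:end`) rather than whole parts.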