From: Tom Lord <lord@emf.net>
Cc: guile-user@gnu.org
Subject: Re: Stupid module and pregexp questions
Date: Fri, 24 Oct 2003 15:30:05 -0700 (PDT)
Message-ID: <200310242230.PAA03910@morrowfield.regexps.com>
In-Reply-To: <E1AD9VA-0007oD-00@surf.glug.org> (message from Thien-Thi Nguyen on Fri, 24 Oct 2003 23:29:44 +0200)



    > From: Thien-Thi Nguyen <ttn@surf.glug.org>

    >    From: Tom Lord <lord@emf.net>
    >    Date: Tue, 29 Apr 2003 23:31:41 -0700 (PDT)

Wow, challenging my memory, eh?

    >    Heck, they were actively excised -- apparently by virtue of some
    >    (sorry, folks) misguided reasoning about the cleanliness of their
    >    semantics.

That was a comment I made about the removal of shared substrings.

I was delighted when Guile 1.6.4 gave me:

  guile> make-shared-substring
  #<primitive-procedure make-shared-substring>

though perturbed that the source code says:

  #if SCM_DEBUG_DEPRECATED == 0
  [...]
  SCM_DEFINE (scm_make_shared_substring, "make-shared-substring", 1, 2, 0,

I can try to write up a "case for shared substrings" if that would be
helpful.
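A shared substring is a window onto its parent string rather than a copy: it shares the parent's storage, so taking one copies no characters. The `SCM_DEFINE' line quoted above gives the primitive one required and two optional arguments, i.e. (make-shared-substring str [start [end]]). What follows is only a minimal model of those semantics in portable Scheme, not Guile's implementation; the vector-based `view' representation and the names `make-view', `view-ref', `view-length' and `view->string' are hypothetical.

```scheme
;; Hypothetical model of a shared substring (not Guile's primitive):
;; a window #(parent start end) onto a parent string.  Creating a
;; view copies no characters.
(define (make-view str . range)
  ;; Optional START and END, mirroring (str [start [end]]).
  (let* ((start (if (pair? range) (car range) 0))
         (end (if (and (pair? range) (pair? (cdr range)))
                  (cadr range)
                  (string-length str))))
    (vector str start end)))

(define (view-length v)
  (- (vector-ref v 2) (vector-ref v 1)))

;; Character K of the view is character (start + K) of the parent.
(define (view-ref v k)
  (string-ref (vector-ref v 0) (+ (vector-ref v 1) k)))

;; Copy out only when an ordinary string is actually needed.
(define (view->string v)
  (substring (vector-ref v 0) (vector-ref v 1) (vector-ref v 2)))

;; Because the view shares the parent's storage, a mutation of the
;; parent is visible through the view.
(define parent (string-copy "hello world"))
(define w (make-view parent 6))
(string-set! parent 6 #\W)
(display (view->string w))   ; prints World
(newline)
```

The point of the design is that `make-view' is constant-time regardless of the substring's length; the cost is that the view and its parent are no longer independent objects.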

    > in guile 1.4.1.96 you can do `(use-modules (lang librgx))' to try out
    > librx re-integration.  it's even documented to some extent in the manual.
    > below is some work-in-progress flex envy slated for 1.4.2 based on rx...

Yikes.  I'm scared to ask what version of Rx you are using.  You
_should_ (really) be using the latest and greatest in libhackerlab,
which is not currently in release.  However, by shocking
coincidence, I was just today semi-preparing to set up a Savannah
libhackerlab project and get it back out there (separately from arch,
in which it happens to be included).

A nice side effect of that: systas (which I'm not planning on
re-releasing anytime soon, but which is trivially available in my
public archives) has a nice libsystas binding for the latest and
greatest Rx.  It'd probably take two hours at most to port it to
Guile.

Nifty code sample from systas:

  (define-public sans-leading-blanks
    (structured-regexp->procedure `(^ (* ([] blank))) :pick-spec '>))


This defines a procedure that takes a string, matches it against the
given regexp, and returns a shared substring of that string.  The
`pick-spec' says _which_ shared substring to return: `>' means return
the shared substring that begins at the first character after the
match.
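To make the `>' pick-spec concrete, here is a hand-written equivalent in plain Scheme with no Rx involved. It is a sketch under two assumptions: that `blank' means space or tab (as in the POSIX [:blank:] class), and that the match for (^ (* ([] blank))) is simply the run of leading blanks. It builds the `(before match after)' partition and picks the part after the match. The helper names are hypothetical, and unlike the systas procedure this returns a fresh copy rather than a shared substring.

```scheme
;; Assumption: `blank' means space or tab, as in POSIX [:blank:].
(define (blank? c)
  (or (char=? c #\space) (char=? c #\tab)))

;; Hypothetical helper: the (before match after) partition for the
;; regexp (^ (* ([] blank))).  The match is anchored at index 0 and
;; covers every leading blank, so BEFORE is always empty.
(define (leading-blanks-partition s)
  (let ((n (string-length s)))
    (let loop ((i 0))
      (if (and (< i n) (blank? (string-ref s i)))
          (loop (+ i 1))
          (list (substring s 0 0)        ; <  before the match
                (substring s 0 i)        ; 0  the match itself
                (substring s i n))))))   ; >  after the match

;; Picking `>' means taking the third element of the partition.
(define (sans-leading-blanks/sketch s)
  (caddr (leading-blanks-partition s)))

(display (sans-leading-blanks/sketch "  \tindented"))   ; prints indented
(newline)
```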

-t

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;  Structured Regular Expressions

[with apologies to Olin Shivers]

;;; 
;;; 
;;; A structured regexp is a recursively defined list structure.
;;; The general form is:
;;;
;;;	structured-regexp := (<operator> <parameter> ...)
;;;	parameter	  := <integer>
;;;			  |  <character>
;;;			  |  <string>
;;;			  |  <keyword>
;;;			  |  <structured-regexp>
;;;
;;; The valid operators are:
;;;
;;;	operator	  := const	; a string constant
;;;			  |  any	; any character
;;;			  |  []		; character set
;;;			  |  [^]	; negated character set
;;;			  |  ^		; start anchor
;;;			  |  $		; end anchor
;;;			  |  ?		; optional sub-expression
;;;			  |  *		; repeated sub-expression
;;;			  |  +		; non-empty, repeated sub-expression
;;;			  |  {}		; a counted sub-expression
;;;			  |  =		; parenthesized subexpression
;;;			  |  &		; sub-expression concatenation
;;;			  |  |		; alternative sub-expressions
;;;			  |  @		; parenthesized subexpression back-reference
;;;			  |  /		; the "cut" operator
;;; 			  | !		; the symbolically labeled "cut" operator
;;;
;;; As a short-hand, some structured regexps can be abbreviated:
;;;
;;; 	(const "string") == "string"
;;;	(* any)		 == *.
;;;	(^ ($ subexp))	 == (^$ subexp)
;;; 
;;; Each operator has its own syntax, so the precise syntax of a structured
;;; regexp is:
;;;
;;;	structured-regexp :=	(const <string>)
;;;			  |	([] <character-set-element> ...)
;;;			  |	([^] <character-set-element> ...)
;;;			  |	(^ <structured-regexp> ...)
;;;			  |	($ <structured-regexp> ...)
;;;			  |	(? <structured-regexp> ...)
;;;			  |	(* <structured-regexp> ...)
;;;			  |	(+ <structured-regexp> ...)
;;;			  |	({} <integer> <integer> <structured-regexp> ...)
;;;			  |	(& <structured-regexp> ...)
;;;			  |	(| <structured-regexp> ...)
;;;			  |	(= [<subexpression-label>] <structured-regexp> ...)
;;;			  |	(@ <subexpression-label>)
;;;			  |	(/ <integer>)
;;;			  |	(! [<cut-label>] <structured-regexp> ...)
;;;
;;;	character-set-element	:=	string
;;;				|	character
;;;				|	(character . character)	; a range of characters
;;;				|	<character-set> ; see the `(standard char-set-lib)' module
;;;
;;; 	subexpression-label	:=	<keyword> ; (a keyword)
;;; 	cut-label		:=	<keyword> ; (a keyword)
;;;
;;; A `pick-spec' specifies values to be returned from `regexec' or a
;;; procedure returned by `regexec-function'.  It has the form:
;;; 
;;; 	pick-spec	:=	#f	; return #t if a match is found, #f otherwise
;;; 
;;;			|	#t	; return #f or a list `(before match after)'
;;; 					; that is the partition of the string implied
;;;					; by a successful match
;;; 
;;; 			|	<recursive-pick-spec>
;;; 
;;; 
;;; A `recursive-pick-spec' is:
;;; 
;;; 	recursive-pick-spec :=	<rps-elt>	; return only the value implied by `rps-elt'
;;; 			    |	(<rps-elt> ...) ; return a list of values implied by 
;;; 						; the list of `rps-elt's.
;;;
;;; An `rps-elt' is:
;;; 
;;; 	rps-elt		:=	<part>	; return the indicated part of the string
;;;					; (see below)
;;; 
;;; 			|	(<start-point> <end-point>) ; return the substring starting
;;;					; at  `<start-point>' and ending immediately
;;;					; before `<end-point>' (see below)
;;; 
;;; 
;;; 			|	state-label ; return the state label of the DFA ending
;;;					; state.  If the match terminated at a `cut'
;;;					; operator (`/' in sre notation), this is
;;; 					; the integer argument to that operator.
;;; 
;;; 			|	?	; the keyword of the terminating cut label or #f
;;; 
;;; 			|	<keyword> ; return the keyword literally.  This is useful
;;;					; for labeling elements in a `recursive-pick-spec'
;;;					; which is a list.
;;; 
;;; A `part' indicates the entire match, a parenthesized
;;; subexpression, the substring that precedes a match, or the
;;; substring that follows a match:
;;; 
;;; 	part		:=	0	; the entire match
;;; 
;;; 			|	<n>	; (an integer) the `nth' parenthesized subexpression
;;; 
;;; 			|	(@ <keyword>) ; the subexpression labeled by `<keyword>'
;;;
;;; 			|	<	; (the symbol '<') the substring preceding the match
;;; 
;;; 			|	>	; (the symbol '>') the substring following the match
;;; 
;;; A `point' indicates a specific position within the string.  There
;;; are two kinds of `point': a `start-point' and an `end-point' that together
;;; specify a substring of the string:
;;; 
;;; 	start-point	:=	<part>		; the beginning of the indicated match part.
;;; 			| 	<any-point> 	; (see below)
;;; 
;;; 	end-point	:=	<part>		; the end of the indicated match part.
;;; 			| 	<any-point> 	; (see below)
;;; 
;;; 	any-point	:=	(<part> 0)	; the beginning of the indicated match part
;;;			|	(<part> 1)	; the end of the indicated match part
;;; 
;;;
;;; An example pick spec that returns a list of substrings of the original string:
;;; 
;;; 	(0		; the entire match
;;; 
;;;	 (< 0)		; from the start of the string to the end of the match
;;; 
;;;	 (2 >)		; from the start of subexpression 2 to the end of the string
;;; 
;;;	 (@ :username)	; the subexpression labeled `:username'
;;; 
;;; 	 ((@ :username)	  ; from the start of the subexpression labeled `:username'
;;;	  (@ :directory)) ; ... to the end of the subexpression labeled `:directory'
;;; 			  
;;;	 ((2 1) 		; from the end of subexpression 2 
;;; 	  ((@ :directory) 0)))  ; ... to the beginning of the subexpression labeled :directory
;;;			       
;;; 	
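The grammar is easier to follow with a toy interpreter in hand. What follows is a minimal backtracking sketch, not Rx's DFA-based matcher, covering only strings, `const', `any', `[]', `[^]', `&', `|', `?', `*', `+', `^' and `$'. Counted expressions, parenthesized subexpressions, back-references, cuts and named character sets such as `blank' are left out. It returns the pick-spec #t result, `(before match after)' for the leftmost-longest match, or #f. The names `sre-match', `sre-partition', `set-op', `nset-op' and `alt-op' are hypothetical. Because a modern Guile reader treats `[' and `]' as parentheses (and may treat `|' specially), the sketch builds those operator symbols with `string->symbol'.

```scheme
;; Toy interpreter for a subset of structured regexps (hypothetical
;; names; not the Rx/libhackerlab implementation).

;; Operator symbols the Guile reader may not accept literally.
(define set-op (string->symbol "[]"))
(define nset-op (string->symbol "[^]"))
(define alt-op (string->symbol "|"))

;; A set element is a character, a string of characters, or (lo . hi).
(define (set-elt-has? elt c)
  (cond ((char? elt) (char=? elt c))
        ((string? elt) (and (memv c (string->list elt)) #t))
        ((pair? elt) (and (char<=? (car elt) c) (char<=? c (cdr elt))))
        (else (error "set element not covered by this sketch" elt))))

(define (in-set? elts c)
  (let loop ((es elts))
    (and (pair? es)
         (or (set-elt-has? (car es) c) (loop (cdr es))))))

;; Match the list RES in sequence from index I, calling K with each
;; possible end index.
(define (m-seq res s i k)
  (if (null? res)
      (k i)
      (m (car res) s i (lambda (j) (m-seq (cdr res) s j k)))))

;; Zero or more repetitions of RES; each extra iteration must consume
;; input, so bodies that can match "" cannot loop forever.
(define (m-star res s i k)
  (or (m-seq res s i (lambda (j) (and (> j i) (m-star res s j k))))
      (k i)))

(define (m re s i k)
  (let ((n (string-length s)))
    (cond
     ((string? re)                      ; "abc" abbreviates (const "abc")
      (let ((j (+ i (string-length re))))
        (and (<= j n) (string=? re (substring s i j)) (k j))))
     ((eq? re 'any) (and (< i n) (k (+ i 1))))
     ((not (pair? re)) (error "regexp not covered by this sketch" re))
     (else
      (let ((op (car re)) (args (cdr re)))
        (cond
         ((eq? op 'const) (m (car args) s i k))
         ((or (eq? op set-op) (eq? op nset-op))
          (and (< i n)
               (let ((hit (in-set? args (string-ref s i))))
                 (if (eq? op set-op) hit (not hit)))
               (k (+ i 1))))
         ((eq? op '&) (m-seq args s i k))
         ((eq? op alt-op)
          (let loop ((alts args))
            (and (pair? alts)
                 (or (m (car alts) s i k) (loop (cdr alts))))))
         ((eq? op '?) (or (m-seq args s i k) (k i)))
         ((eq? op '*) (m-star args s i k))
         ((eq? op '+) (m-seq args s i (lambda (j) (m-star args s j k))))
         ((eq? op '^) (and (= i 0) (m-seq args s i k)))
         ((eq? op '$) (m-seq args s i (lambda (j) (and (= j n) (k j)))))
         (else (error "operator not covered by this sketch" op))))))))

;; Leftmost-longest match as (start . end), or #f.  The continuation
;; records the longest end seen, then fails so every path is explored.
(define (sre-match re s)
  (let ((n (string-length s)))
    (let loop ((start 0))
      (and (<= start n)
           (let ((best #f))
             (m re s start (lambda (j)
                             (if (or (not best) (> j best)) (set! best j))
                             #f))
             (if best (cons start best) (loop (+ start 1))))))))

;; The pick-spec #t result: (before match after), or #f.
(define (sre-partition re s)
  (let ((span (sre-match re s)))
    (and span
         (list (substring s 0 (car span))
               (substring s (car span) (cdr span))
               (substring s (cdr span) (string-length s))))))

;; prints ("x" "abcbc" "y")
(write (sre-partition (list '& "a" (list '* 'any) "c") "xabcbcy"))
(newline)
```

Making the success continuation record the end position and then fail forces every alternative to be tried, which is what gives leftmost-longest rather than first-found results; a real implementation would avoid that exponential worst case.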


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user

