* Stupid module and pregexp questions
From: MJ Ray @ 2003-04-23 13:37 UTC

Hi,

Can (use-modules ...) take a file from the current directory?

How do I load pregexp support into Guile? (Is it supplied yet?)

Thanks in advance for any help,
-- 
MJR http://mjr.towers.org.uk/   IM: slef@jabber.at
    This is my home web site.   This for Jabber Messaging.
How's my writing? Let me know via any of my contact details.

_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user
* Re: Stupid module and pregexp questions
From: Paul Jarc @ 2003-04-23 14:56 UTC
Cc: guile-user

MJ Ray <markj@cloaked.freeserve.co.uk> wrote:
> Can (use-modules ...) take a file from the current directory?

I don't think so. But (load) can. You can also create a new module
with (make-module 1021 (list (resolve-interface '(guile)))) and load a
file into that module with read and eval.

paul
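The read-and-eval approach described in the message above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name load-into-fresh-module and the file helper-demo.scm are hypothetical, and the sketch assumes a Guile in which eval accepts a module as its environment argument.

```scheme
;; Hypothetical helper (not part of Guile): create a fresh module that
;; can see the core (guile) bindings, then read FILE form by form and
;; evaluate each form inside that module.
(define (load-into-fresh-module file)
  (let ((m (make-module 1021 (list (resolve-interface '(guile))))))
    (call-with-input-file file
      (lambda (port)
        (let loop ((form (read port)))
          (if (not (eof-object? form))
              (begin
                (eval form m)
                (loop (read port)))))))
    m))

;; Write a tiny source file so the sketch is self-contained.
(call-with-output-file "helper-demo.scm"
  (lambda (port)
    (write '(define (cube x) (* x x x)) port)
    (newline port)))

;; Load it and fetch the binding back out of the new module.
(define helper-module (load-into-fresh-module "helper-demo.scm"))
(define cube (module-ref helper-module 'cube))

(display (cube 3))
(newline)
```

The definitions stay inside the fresh module, so nothing from the loaded file leaks into the caller's namespace unless it is fetched explicitly with module-ref.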
* Re: Stupid module and pregexp questions
From: MJ Ray @ 2003-04-24 10:01 UTC

Paul Jarc <prj@po.cwru.edu> wrote:
> I don't think so. But (load) can. You can also create a new module
> with (make-module 1021 (list (resolve-interface '(guile)))) and load a
> file into that module with read and eval.

In a word: yuck.

I'm trying to write a Scheme program that supports a couple of
implementations from largely the same portable source code, but it would
be nice to use modules. It is very annoying for the code layout to be
dictated by one language, or to have to use the symlink kludge.

If anyone has tips about portable code, please share URLs ;-)

-- 
MJR
* Re: Stupid module and pregexp questions
From: Andreas Rottmann @ 2003-04-24 12:52 UTC
Cc: guile-user

MJ Ray <markj@cloaked.freeserve.co.uk> writes:

> Hi,
>
> Can (use-modules ...) take a file from the current directory?
>
IIRC, yes, if %load-path is set accordingly.

Regards, Andy
-- 
Andreas Rottmann | Rotty@ICQ | 118634484@ICQ | a.rottmann@gmx.at
http://www.8ung.at/rotty | GnuPG Key: http://www.8ung.at/rotty/gpg.asc
Fingerprint | DFB4 4EB4 78A4 5EEE 6219 F228 F92F CFC5 01FD 5B62

This reality is really just a fucked-up dream -- Papa Roach
* Re: Stupid module and pregexp questions
From: MJ Ray @ 2003-04-24 13:15 UTC
Cc: guile-user

On 2003-04-24 13:52:18 +0100 Andreas Rottmann <a.rottmann@gmx.at> wrote:

>> Can (use-modules ...) take a file from the current directory?
> IIRC, yes, if %load-path is set accordingly.

Please can you give me an example? I can only make it take from
subdirectories.

-- 
MJR
* Re: Stupid module and pregexp questions
From: Andreas Rottmann @ 2003-04-24 13:36 UTC
Cc: guile-user

MJ Ray <markj@cloaked.freeserve.co.uk> writes:

> On 2003-04-24 13:52:18 +0100 Andreas Rottmann <a.rottmann@gmx.at>
> wrote:
>
>>> Can (use-modules ...) take a file from the current directory?
>> IIRC, yes, if %load-path is set accordingly.
>
> Please can you give me an example? I can only make it take from
> subdirectories.
>
I just tried it out:

simple-math.scm:
-----------------------
(define-module (simple-math))

(define gcd
  (lambda (a b)
    (if (= a b)
        a
        (if (> a b)
            (gcd (- a b) b)
            (gcd a (- b a))))))
-------------------

~% ls *.scm
simple-math.scm
~% guile
guile> %load-path
("/home/andy/share/guile/site" "/usr/local/share/guile/site" "/usr/local/share/guile/1.7" "/usr/local/share/guile" ".")
guile> (gcd 21 6)
3

The same worked also with guile 1.6. Note the trailing "." in %load-path.

Andy
-- 
Andreas Rottmann
Packages should build-depend on what they should build-depend.
* Re: Stupid module and pregexp questions
From: Marius Vollmer @ 2003-04-24 16:58 UTC
Cc: MJ Ray

Andreas Rottmann <a.rottmann@gmx.at> writes:

> ~% ls *.scm
> simple-math.scm
> ~% guile
> guile> %load-path
> ("/home/andy/share/guile/site" "/usr/local/share/guile/site" "/usr/local/share/guile/1.7" "/usr/local/share/guile" ".")
> guile> (gcd 21 6)
> 3

Heh, there is a builtin 'gcd', of course. It seems you never loaded
the module.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3 331E FAF8 226A D5D4 E405
* Re: Stupid module and pregexp questions
From: Andreas Rottmann @ 2003-04-24 22:55 UTC
Cc: MJ Ray

Marius Vollmer <mvo@zagadka.de> writes:

> Andreas Rottmann <a.rottmann@gmx.at> writes:
>
>> ~% ls *.scm
>> simple-math.scm
>> ~% guile
>> guile> %load-path
>> ("/home/andy/share/guile/site" "/usr/local/share/guile/site" "/usr/local/share/guile/1.7" "/usr/local/share/guile" ".")
>> guile> (gcd 21 6)
>> 3
>
> Heh, there is a builtin 'gcd', of course. You don't load the module
> as it seems.
>
Only forgot to paste that, and :export. Should have chosen another
function (name), it seems ;-)

For completeness:

,----
| (define-module (simple-math)
|   :export (some-stupid-func))
|
| (define (some-stupid-func a) (* a a a a))
`----

Then load with `(use-modules (simple-math))'

Regards, Andy
-- 
Andreas Rottmann
Python is executable pseudocode, Perl is executable line-noise.
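Putting the pieces of this sub-thread together, a self-contained sketch might look like the following. Some assumptions to flag: the module file is written out by the script itself only so the example can run in an empty directory, the keyword is spelled #:export (the form used in the Guile manual) rather than :export, and the module is looked up at run time with resolve-interface and module-ref instead of use-modules, because a use-modules form in a script may be processed before the preceding lines have run. At an interactive prompt, typing (use-modules (simple-math)) after the set! line is the direct equivalent.

```scheme
;; Write the module file into the current directory so the sketch is
;; self-contained; normally simple-math.scm would already exist there.
(call-with-output-file "simple-math.scm"
  (lambda (port)
    (write '(define-module (simple-math) #:export (some-stupid-func)) port)
    (newline port)
    (write '(define (some-stupid-func a) (* a a a a)) port)
    (newline port)))

;; Make the current directory searchable, as with the "." entry in the
;; %load-path shown in the thread.
(set! %load-path (cons "." %load-path))

;; Run-time lookup of the exported binding. The module name is still a
;; bracketed list, (simple-math), never a bare symbol or a file name.
(define some-stupid-func
  (module-ref (resolve-interface '(simple-math)) 'some-stupid-func))

(display (some-stupid-func 2))
(newline)
```

Choosing a name that does not clash with a builtin, unlike gcd earlier in the thread, makes it obvious whether the module was really loaded.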
* Re: Stupid module and pregexp questions
From: MJ Ray @ 2003-04-24 17:58 UTC

Andreas Rottmann <a.rottmann@gmx.at> wrote:
> The same worked also with guile 1.6. Note the trailing "." in %load-path.

The need to still have brackets is what was escaping me, it seems.

Now, how do I load pregexp support in? I can't find it in the guile 1.4
distribution anywhere. Is it in 1.6?

MJR
* Re: Stupid module and pregexp questions
From: Rob Browning @ 2003-04-28 16:06 UTC
Cc: guile-user

MJ Ray <markj@cloaked.freeserve.co.uk> writes:

> Can (use-modules ...) take a file from the current directory?
>
> How do I load pregexp support into Guile? (Is it supplied yet?)

Interesting, I didn't know about that, but for my own purposes, I
wrote a simple test-interface to libpcre (use-modules (pcre))...

More generally, I've been planning (when I have time) to raise the
"regexp" issue. It was my perhaps incorrect impression that the
various regular expression libraries that we detect and use in
configure.in aren't necessarily all that standard from platform to
platform. If so, then it'd be nice to have a truly invariant regex
lib that we can rely on. Without that, it's hard to write portable
scripts.

If there is indeed a compatibility issue with the set of "standard"
regular expression libs configure might pick, then I wonder if
providing libpcre everywhere might be an alternative, or if perhaps we
could just copy the regular expression code from emacs.

Of course pregexp looks like another possible alternative, although
until we get a good compiler, or unless we compile it or
hand-translate it to C (via stalin somehow, hobbit, or similar), I
wouldn't expect it to be nearly as fast as the first two alternatives.

In the end, I'd just like to have a powerful regex lib whose syntax
and behavior is invariant across all the platforms on which I'm likely
to run guile.

Thoughts?

-- 
Rob Browning
rlb @defaultvalue.org, @linuxdevel.com, and @debian.org
Previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
* Re: Stupid module and pregexp questions
From: MJ Ray @ 2003-04-28 16:44 UTC
Cc: guile-user

On 2003-04-28 17:06:24 +0100 Rob Browning <rlb@defaultvalue.org> wrote:

> Interesting, I didn't know about that, but for my own purposes, I
> wrote a simple test-interface to libpcre (use-modules (pcre))...

At least bigloo and plt-scheme use pregexp. It would be very useful
if guile could offer a partially compatible interface, regardless of
the underlying implementation.

[...]
> In the end, I'd just like to have a powerful regex lib whose syntax
> and behavior is invariant across all the platforms on which I'm likely
> to run guile.
>
> Thoughts?

Good aim, but I'd like the more general: a powerful regex lib whose
syntax and behaviour is invariant across all the platforms on which I
run scheme.

Is that achievable?

-- 
MJR
* Re: Stupid module and pregexp questions
From: Rob Browning @ 2003-04-28 17:03 UTC
Cc: guile-user

MJ Ray <markj@cloaked.freeserve.co.uk> writes:

> At least bigloo and plt-scheme use pregexp. It would be very useful
> if guile could offer a partially compatible interface, regardless of
> the underlying implementation.

Well that sounds like a pretty good argument in favor of pregexp --
I'll look into it. Of course we can only include it in guile-core
(without a rewrite) if the author's amenable to the LGPL and copyright
assignment. If pregexp *is* suitable, I can probably handle improving
the performance. I'll ask the author.

> Good aim, but I'd like the more general: a powerful regex lib whose
> syntax and behaviour is invariant across all the platforms on which I
> run scheme.
>
> Is that achievable?

Should be. Ideally, that might be encouraged via a suitable SRFI.
Wonder if pregexp might be appropriate...

-- 
Rob Browning
* Re: Stupid module and pregexp questions
From: MJ Ray @ 2003-04-28 17:51 UTC
Cc: guile-user

On 2003-04-28 18:03:22 +0100 Rob Browning <rlb@defaultvalue.org> wrote:

> Well that sounds like a pretty good argument in favor of pregexp --
> I'll look in to it. Of course we can only include it in guile-core
> (without a rewrite) if the author's amenable to the LGPL and copyright
> assignment. If pregexp *is* suitable, I can probably handle improving
> the performance. I'll ask the author.

As I said, pregexp is included in PLT-Scheme, which is also LGPL. I
think the only new ground is copyright assignment.

[...]
> Should be. Ideally, that might be encouraged via a suitable SRFI.
> Wonder if pregexp might be appropriate...

Let's see what people say here for a bit and what the author's
response is, then cls, then SRFI?

-- 
MJR
* Re: Stupid module and pregexp questions
From: Rob Browning @ 2003-04-28 18:18 UTC
Cc: guile-user

MJ Ray <markj@cloaked.freeserve.co.uk> writes:

>> Should be. Ideally, that might be encouraged via a suitable SRFI.
>> Wonder if pregexp might be appropriate...
>
> Let's see what people say here for a bit and what the author's
> response is, then cls, then SRFI?

He's fine with the assignment if we need one. I'll play around with
the code.

-- 
Rob Browning
* Re: Stupid module and pregexp questions
From: Dr. Peter Ivanyi @ 2003-04-28 18:07 UTC

Rob Browning wrote:
>
> MJ Ray <markj@cloaked.freeserve.co.uk> writes:
>
> > At least bigloo and plt-scheme use pregexp. It would be very useful
> > if guile could offer a partially compatible interface, regardless of
> > the underlying implementation.
>
> Well that sounds like a pretty good argument in favor of pregexp --
> I'll look in to it. Of course we can only include it in guile-core

In this case can I also point out that the module definition is different
in guile compared to these (and maybe some other) scheme systems. This
really troubles me, since any of my scheme code must be preprocessed
according to which scheme system is running it.

Is this a good argument to change it or provide an alternative way,
consistent with other scheme systems? :-)

Peter Ivanyi
* Re: Stupid module and pregexp questions
From: MJ Ray @ 2003-04-29 18:38 UTC

Dr. Peter Ivanyi <peteri@carme.sect.mce.hw.ac.uk> wrote:
> In this case can I also point out that the module definition is different
> in guile compared to these (and maybe some other) scheme systems. This

If SRFI-7 could be extended to include some features to tell you what
platform and implementation you are running on, then I think this is
possible. Anyone got opinions on that? Any plans for SRFI-7 in guile?

For now, I'm using a bizarre concoction of many small files to emulate
what should happen.

-- 
MJR
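A related mechanism that already exists is cond-expand from SRFI-0, which is a different SRFI from the SRFI-7 asked about above and is named here only as a possible stand-in. The sketch below assumes the running implementation provides cond-expand and, in Guile's case, recognises the guile feature identifier. The variable regex-backend and the two symbols it is bound to are hypothetical names used purely for illustration.

```scheme
;; SRFI-0 cond-expand selects the first clause whose feature
;; identifier the running implementation claims to support.
;; `regex-backend' is a hypothetical name, not an existing API.
(cond-expand
  (guile
   ;; Chosen when the implementation advertises the `guile' feature.
   (define regex-backend 'guile-regex))
  (else
   ;; Fallback for any other implementation.
   (define regex-backend 'portable-pregexp)))

(display regex-backend)
(newline)
```

Each implementation-specific piece can then live in its own clause in a single shared source file, which may reduce the need for many small per-implementation files.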
* Re: Stupid module and pregexp questions
From: tomas @ 2003-04-28 17:53 UTC
Cc: MJ Ray

On Mon, Apr 28, 2003 at 11:06:24AM -0500, Rob Browning wrote:
> MJ Ray <markj@cloaked.freeserve.co.uk> writes:
>

[...]

[about letting configure find whatever regexp lib is there]

> If so, then it'd be nice to have a truly invariant regex
> lib that we can rely on. Without that, it's hard to write portable
> scripts.

Indeed.

[pregexp/pcre]

> In the end, I'd just like to have a powerful regex lib whose syntax
> and behavior is invariant across all the platforms on which I'm likely
> to run guile.
>
> Thoughts?

Call me conservative, what not. I'd think You'd Write A Regexp Lib In C (TM).

Apart from that, pregexp shows what a good Scheme interface to a
regexp library might look like. I mean: having an S-expression
syntax for regexps (and having the string variant just as a
convenient shorthand notation) gives you the power to automate the
construction of regexps.

What I have missed most is a streams-like interface: in comes a
stream of chars, out a stream of matches. Has anyone seen something
like this? Have a look at the contortions needed in Perl to do this.

These are my random ramblings. But you asked for input ;-)

Regards
-- tomas
* Re: Stupid module and pregexp questions
From: Rob Browning @ 2003-04-28 17:12 UTC
Cc: MJ Ray

tomas@fabula.de writes:

> Call me conservative, what not. I'd think You'd Write A Regexp Lib In C (TM).
>
> Apart from that, pregexp shows how a good Scheme interface to a
> regexp library might look like. I mean: having an S-expression
> syntax for regexps (and having the string variant just as a
> convenient shorthand notation) gives you the power to automated
> construction of regexps.

My initial impulse, if the upstream author is amenable, is to take the
existing source, and translate it to C (either by hand, or with some
automated help). Though without having read the source, I don't yet
know what the feasibility of that is.

> What I have missed most is a streams like interface: in comes a
> stream of chars, out a stream of matches. Has anyone seen something
> like this?

That would be nice.

-- 
Rob Browning
* Re: Stupid module and pregexp questions
From: MJ Ray @ 2003-04-28 17:55 UTC
Cc: Rob Browning

On 2003-04-28 18:53:42 +0100 tomas@fabula.de wrote:

> Call me conservative, what not. I'd think You'd Write A Regexp Lib In
> C (TM).

Why do you want to tie one hand behind your back like that? Scheme is
such a beautiful language that we should have a good, optimised, tight
core and then as much as possible of the remaining system written in
Scheme, possibly compiled down, maybe through C, for speed of loading
when necessary. Some things have to interface with the outside world
and may need to have C layers for that reason, but regular expressions
aren't one of them.

> This are my random ramblings. But you asked for input ;-)

Indeed. IANA Regex Library Implementor... but then, with pregexp, why
would I unless I have to? ;-)

-- 
MJR
* Low level things in C or Scheme [was Stupid module and pregexp questions]
From: tomas @ 2003-04-29 8:12 UTC
Cc: guile-user

On Mon, Apr 28, 2003 at 06:55:26AM -1100, MJ Ray wrote:
> On 2003-04-28 18:53:42 +0100 tomas@fabula.de wrote:
>
> >Call me conservative, what not. I'd think You'd Write A Regexp Lib In
> >C (TM).
>
> Why do you want to tie one hand behind your back like that? Scheme is
> such a beautiful language that we should have a good, optimised, tight
> core and then as much as possible of the remaining system written in
> Scheme, possibly compiled down, maybe through C, for speed of loading
> when necessary. Some things have to interface with the outside world
> and may need to have C layers for that reason, but regular expressions
> aren't one of them.

Uh, oh. I think we touch a Deep Philosophical Thing (TM) here :-)

In theory, Scheme can be as efficient as C (a sloppy way of expressing
that a Scheme program, given the right Scheme implementation, etc.).
Still, I think it comes at a price (in terms of a sophisticated Scheme
implementation, and it seems there are (practical) tradeoffs in the
trio code efficiency -- eval flexibility -- complexity of implementation).

Of course, if we can have our cake and eat it, I'm all for it, but I'm
comfortable with the idea of a layered system where you do the low-level
things in one language and the high-level things in another. It correlates
quite well with the layering of software and thus feels (to me) very
natural.

Sorry for the hand-waving :-)

Regards
-- tomas
* Re: Low level things in C or Scheme [was Stupid module and pregexp questions]
From: Thamer Al-Harbash @ 2003-04-29 17:35 UTC

On Tue, 29 Apr 2003 tomas@fabula.de wrote:

> Of course, if we can have our cake and eat it, I'm all for it, but I'm
> comfortable with the idea of a layered system where you do the low-level
> things in one language and the high-level things in another. It correlates
> quite well with the layering of software and thus feels (to me) very
> natural.

It's funny you should talk about layering. I've recently started
writing a project at work (or re-writing for the Nth time thanks
to changes being requested), and I chose doing the high level
work in guile just so I could say "ok done," and get back to more
important things.

The funny thing is, thanks to guile's seamless use of arbitrarily
big numbers (its numerical tower), I don't know if I *want* to do
my number crunching in C anymore. This project is slowly becoming
100% scheme as I remove the final bits of C from it.

I have not noticed any significant penalty in performance.

-- 
Thamer Al-Harbash http://www.whitefang.com/
(if (> pressure too-much-pressure) 'flame 'work)
* Re: Low level things in C or Scheme
From: Mikael Djurfeldt @ 2003-04-29 19:34 UTC
Cc: guile-user

Thamer Al-Harbash <tmh@whitefang.com> writes:

> It's funny you should talk about layering. I've recently started
> writing a project at work (or re-writing for the Nth time thanks
> to changes being requested), and I chose doing the high level
> work in guile just so I could say "ok done," and get back to more
> important things.
>
> The funny thing is, thanks to guile's seamless use of arbitrarily
> big numbers (its numerical tower), I don't know if I *want* to do
> my number crunching in C anymore. This project is slowly becoming
> 100% scheme as I remove the final bits of C from it.

I've been through something similar. I've written a neuron simulator
which I've used in my research for several years.

Originally, a lot of things were done with C++. Guile was only used as
a scripting language, gluing pieces together. Then, during the years,
I tended to do more and more on the Scheme level.

Now, I write almost everything in Scheme.

> I have not noticed any significant penalty in performance.

For me, there would be a penalty if the inner loops were on the Scheme
level. Now, however, I do most computations using my Matlab-like
GOOPS-based matrix library
(http://kvast.blakulla.net/mdj/matrix-1.2.0.tar.gz). So, even though
the algorithms are written in Guile Scheme, the overhead gets drowned
by the heavy crunching vector and matrix loops on the C level in the
library.

(BTW, to any of the people who have written to me regarding this
library and possible collaboration with other projects: I have to
apologize for not getting back to you quickly. I will do that soon.)

Best regards,
Mikael D.
* Re: Low level things in C or Scheme
From: Ken Anderson @ 2003-04-29 20:24 UTC
Cc: jscheme-user

We've had similar experience with JScheme, a Scheme in Java. Two
projects I know about are 60% and 80% Scheme. In the second case we
prototyped a key component in Scheme and eventually rewrote the guts of
it in Java, though we still use Scheme data structures such as pairs
inside. All the external control is done in Scheme.

At 09:34 PM 4/29/2003 +0200, Mikael Djurfeldt wrote:
>Thamer Al-Harbash <tmh@whitefang.com> writes:
>
>> It's funny you should talk about layering. I've recently started
>> writing a project at work (or re-writing for the Nth time thanks
>> to changes being requested), and I chose doing the high level
>> work in guile just so I could say "ok done," and get back to more
>> important things.
>>
>> The funny thing is, thanks to guile's seamless use of arbitrarily
>> big numbers (its numerical tower), I don't know if I *want* to do
>> my number crunching in C anymore. This project is slowly becoming
>> 100% scheme as I remove the final bits of C from it.
>
>I've been through something similar. I've written a neuron simulator
>which I've used in my research for several years.
>
>Originally, a lot of things was done with C++. Guile was only used as
>a scripting language, gluing pieces together. Then, during the years,
>I tended to do more and more on the Scheme level.
>
>Now, I write almost everything in Scheme.
>
>> I have not noticed any significant penalty in performance.
>
>For me, there would be a penalty if the inner loops were on the Scheme
>level. Now, however, I do most computations using my Matlab-like
>GOOPS-based matrix library
>(http://kvast.blakulla.net/mdj/matrix-1.2.0.tar.gz). So, even though
>the algorithms are written in Guile Scheme, the overhead gets drowned
>by the heavy crunching vector and matrix loops on the C level in the
>library.

Sounds nice. Do the xlisp-stat guys know about this?

>(BTW, if any of the people who have written to me regarding this
>library and possible collaboration with other projects, I have to
>apologize for not getting back to you quickly. I will do that soon.)
>
>Best regards,
>Mikael D.
* Re: Low level things in C or Scheme [was Stupid module and pregexp questions]
From: Robert Uhl @ 2003-04-30 4:27 UTC

Thamer Al-Harbash <tmh@whitefang.com> writes:
>
> The funny thing is, thanks to guile's seamless use of arbitrarily big
> numbers (its numerical tower), I don't know if I *want* to do my
> number crunching in C anymore.

No offense to any of the developers, but guile has more of a numerical
ash-heap than a numerical tower. I tried to fix parts of it
(particularly, number->string's ridiculous insistence on base-10 for
fractions), but had issues with the compile and could get no assistance.
Don't even get me started on how the fractions don't stay fractions.

For a good look at a good numerical tower, take a look at umb-scheme.
It does a decent job (although it does not render (sqrt (/ 16 25)) as
4/5 but rather as 0.8). It also can calculate up to at least (factorial
5547), while Guile gets a stack overflow.

Guile is very good at many things, but its numerical tower is not the
best.

> I have not noticed any significant penalty in performance.

I've been quite surprised by how fast Guile can be, and in fact how what
should have been optimisations in my Scheme can slow it down, it seems.
It's a rather remarkable beast, and one whose entrails I'd dearly like
to plumb. I _really_ want to add the ability for number->string to work
with non-decimal bases. Ten is such an _ugly_ base; twelve is much
better.

-- 
Robert Uhl <ruhl@4dv.net>
Cristo Resucitado! En Verdad Resucitado!
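The message above does not show which factorial definition overflowed the stack, so the following rests on an assumption: that it was the textbook version, in which the multiplication happens after the recursive call returns. An accumulator-passing version keeps the recursive call in tail position, which Scheme requires implementations to run without growing the stack, so the depth of the recursion stops being a limit. A minimal sketch:

```scheme
;; Accumulator-passing factorial. The call to `loop' is the last thing
;; done in each step (a tail call), so no stack frames pile up however
;; large n is; only the bignum result grows.
(define (factorial n)
  (let loop ((i n) (acc 1))
    (if (< i 2)
        acc
        (loop (- i 1) (* acc i)))))

(display (factorial 20))
(newline)
```

With this shape, the limit on something like (factorial 5547) is set by bignum arithmetic rather than by stack depth.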
* Re: Low level things in C or Scheme [was Stupid module and pregexp questions] 2003-04-30 4:27 ` Low level things in C or Scheme [was Stupid module and pregexp questions] Robert Uhl @ 2003-04-30 13:27 ` Thamer Al-Harbash 0 siblings, 0 replies; 64+ messages in thread
From: Thamer Al-Harbash @ 2003-04-30 13:27 UTC
Cc: guile-user

On Wed, 29 Apr 2003, Robert Uhl wrote:

> No offense to any of the developers, but guile has more of a numerical
> ash-heap than a numerical tower. I tried to fix parts of it
> (particularly, number->string's ridiculous insistence on base-10 for
> fractions),

Aah. My calculations are fairly simple. I just need to keep track
of bandwidth on some high speed links. Thus as long as I can add
up to enormous numbers in scheme and do some very basic divisions
it works.

Thanks for the heads up though. AFAIK, guile -current is using
GMP which will resolve this issue?

-- 
Thamer Al-Harbash http://www.whitefang.com/
(if (> pressure too-much-pressure) 'flame 'work)
* Re: Low level things in C or Scheme [was Stupid module and pregexp questions] 2003-04-29 17:35 ` Thamer Al-Harbash 2003-04-29 19:34 ` Low level things in C or Scheme Mikael Djurfeldt 2003-04-30 4:27 ` Low level things in C or Scheme [was Stupid module and pregexp questions] Robert Uhl @ 2003-04-30 6:39 ` tomas 2 siblings, 0 replies; 64+ messages in thread
From: tomas @ 2003-04-30 6:39 UTC
Cc: guile-user

On Tue, Apr 29, 2003 at 01:35:21PM -0400, Thamer Al-Harbash wrote:
> On Tue, 29 Apr 2003 tomas@fabula.de wrote:
>
> > Of course, if we can have our cake and eat it, I'm all for it, but I'm
> > comfortable with the idea of a layered system where you do the low-level
> > things in one language and the high-level things in another. It correlates
> > quite well with the layering of software and thus feels (to me) very
> > natural.
>
> It's funny you should talk about layering. I've recently started
> writing a project at work (or re-writing for the Nth time thanks
> to changes being requested), and I chose doing the high level
> work in guile just so I could say "ok done," and get back to more
> important things.
>
> The funny thing is, thanks to guile's seamless use of arbitrarily
> big numbers (its numerical tower), I don't know if I *want* to do
> my number crunching in C anymore. This project is slowly becoming
> 100% scheme as I remove the final bits of C from it.
>
> I have not noticed any significant penalty in performance.

That's good news -- and as MJ Ray and I discussed off list, writing
everything in Scheme makes the application much more hackable (remember
the hacktivation energy?).

I would just argue for considering well-defined ``library'' stuff, like
bignums, regexps, matrix algebra, what not, for implementation in a
``lower layer''. And then to design a good interface (since it'll be
more static, much care has to go into that). And then to reconsider.
And then maybe to do it.

Performance -- well, only if you are forced to :-)

Regards
-- tomas
* Re: Stupid module and pregexp questions 2003-04-28 16:06 ` Rob Browning 2003-04-28 16:44 ` MJ Ray 2003-04-28 17:53 ` tomas @ 2003-04-29 0:45 ` Robert Uhl 2003-04-29 22:06 ` MJ Ray 2 siblings, 1 reply; 64+ messages in thread
From: Robert Uhl @ 2003-04-29 0:45 UTC

Rob Browning <rlb@defaultvalue.org> writes:
>
> In the end, I'd just like to have a powerful regex lib whose syntax
> and behavior is invariant across all the platforms on which I'm likely
> to run guile.

That's useful for some things (when the developer writes the regexps),
but for others it's not so good, e.g. when the _user_ writes the
regexps. The user probably wants whatever he's locally used to...

And then, of course, there's the issue of speed. Regexps are used for
enough processing that IMHO they must be matched by compiled, not
interpreted, code or they risk being unacceptably slow. Of course, this
matters less for some domains than others :-)

-- 
Robert Uhl <ruhl@4dv.net>
Let the heavens be glad as is meet, and let the earth rejoice; and let
the whole world, visible and invisible, keep festival, for Christ, the
eternal gladness, is risen.
* Re: Stupid module and pregexp questions 2003-04-29 0:45 ` Stupid module and pregexp questions Robert Uhl @ 2003-04-29 22:06 ` MJ Ray 2003-04-29 23:21 ` Tom Lord 2003-04-30 4:38 ` Robert Uhl 0 siblings, 2 replies; 64+ messages in thread
From: MJ Ray @ 2003-04-29 22:06 UTC

Robert Uhl <ruhl@4dv.net> wrote:
> That's useful for some things (when the developer writes the regexps),
> but for others it's not so good, e.g. when the _user_ writes the
> regexps. The user probably wants whatever he's locally used to...

Basic, Extended, Perl, ... this probably is general, not just ours.

> And then, of course, there's the issue of speed. Regexps are used for
> enough processing that IMHO they must be matched by compiled, not
> interpreted, code or they risk being unacceptably slow. [...]

Compiled code is just interpreted code at a different level, surely?
A good optimisation will often beat dropping down levels, and scheme
allows easier optimisation while avoiding some typical errors. Do the
minimum directly in C, IMHO.

-- 
MJR http://mjr.towers.org.uk/ IM: slef@jabber.at
This is my home web site. This for Jabber Messaging.
How's my writing? Let me know via any of my contact details.
* Re: Stupid module and pregexp questions 2003-04-29 22:06 ` MJ Ray @ 2003-04-29 23:21 ` Tom Lord 2003-04-30 0:04 ` Ken Anderson ` (2 more replies) 2003-04-30 4:38 ` Robert Uhl 1 sibling, 3 replies; 64+ messages in thread
From: Tom Lord @ 2003-04-29 23:21 UTC
Cc: guile-user

    > And then, of course, there's the issue of speed. Regexps are used for
    > enough processing that IMHO they must be matched by compiled, not
    > interpreted, code or they risk being unacceptably slow. [...]

    Compiled code is just interpreted code at a different level,
    surely? A good optimisation will often beat dropping down
    levels, and scheme allows easier optimisation while avoiding
    some typical errors. Do the minimum directly in C, IMHO.

I have some experience in regexp implementation, so may I offer my
$0.02?

a) In popular use on unix, fairly low performance matchers dominate
   with the exception of apps like awk and grep. But when I say
   "fairly low performance" -- please don't misunderstand -- I'm
   oversimplifying. What really dominates are matchers that are
   pretty slow on patterns that are at all tricky, but that optimize
   some common cases like a constant-string pattern and patterns close
   to that. It mostly just doesn't occur to people to use regexps
   for problems that fall outside of those common cases.

b) A really fast and general matcher, like Rx (as in hackerlab C
   library, not as in the ancient fork on the GNU FTP site), opens a
   lot of doors. You can apply dynamically generated regexps to
   applications that were previously out of reach. A nice example
   might be to write parsers for a really rich wiki language.

   To my mind, opening the door to applications like that through the
   provision of an extra fancy regexp engine is a neat thing to do --
   and is a way Guile could differentiate itself from other
   languages. At the same time, it takes a lot of code and it's
   touchy to tune -- so it risks violating the KISS principle.

   And, oh yeah -- you'll want shared substrings to make things really
   hum along nicely (ahem :-).

Matchers of type (a) are butt simple to implement. (Hard to get right
if you want to nail all the details to be Posix conformant, but the
algorithms are pretty straightforward.) People have gotten a lot of
mileage by writing some in Java. I don't see any reason why a Scheme
version couldn't be competitive -- but really, you'd need it to be
compiled and to have at least a few relevant compiler optimizations.

Matchers of type (b) are hard to implement. Really hard. And the
competition among them boils down to O(1-3) instructions in critical
loops, and to fast paths through exceptional but, alas, important
non-common cases. There isn't a practical chance in hell of any
Scheme compiler competing in this space in the next 5 years (at
least). The compiler theory is arguably there -- but its practical
application is quite a ways off.

So whether you prefer (a) or (b), if you want competitive performance,
you're stuck doing some compiler hacking if you choose to use an
engine in Scheme. A serious but tractable amount of compiler hacking
for (a) (catch up to Java, at least), a researchy amount for (b).

So, "do the minimum in C": that means write the regexp engine in C,
for all practical purposes.

Say: suppose I implement an Emacs buffer-like string type either as a
gap buffer or, better, as a tree of some sort (perhaps a splay tree)
-- a good question for your regexp engine is "can it handle a string
that isn't contiguous in memory like that".

-t

(here's a snippet of what I do with mine -- the SRE-like expressions
get compiled down to an extended version of Posix extended regexp
syntax:)

(begin
  (define wiki-paragraph-rules
    ;; (type test separator)
    ;;
    ;; Test is a structured regexp to be compiled in a large `(| ...)'
    ;; of all of the test expressions. The leftmost-longest matching
    ;; test expression determines the type of the first paragraph in a
    ;; given string.
    ;;
    ;; `(separator string)' returns a list: `(paragraph remaining-string)',
    ;; separating the first paragraph from the rest of the string.
    ;;
    ;; The `test' expression and `separator' procedure can safely assume
    ;; that the string is not empty, and does not begin with any blank lines.
    ;;

    `((:form-feed         (& (* ([] blank)) "\f" (* ([] blank)))
                          ,(lambda (s) (one-line-separator s)))
      (:comment-line      (& "%%%"(* ([^] "\n")))
                          ,(lambda (s) (one-line-separator s)))
      (:rfc822ish         (& (* ([] blank)) "+++" (* ([] blank)))
                          ,(lambda (s) (ordinary-paragraph-separator s)))

      (:title             (& "!" (+ ([^] "\n")))
                          ,(lambda (s) (ordinary-paragraph-separator s)))

      (:card-boundary     "\f---"
                          ,(lambda (s) (one-line-separator s)))

      (:heading           (& (* ([] blank))
                             (+ "*")
                             ([] blank)
                             ([^] ")#*\n")
                             (* ([^] "\n")))
                          ,(lambda (s) (ordinary-paragraph-separator s)))

      (:menu              (& (* ([] blank))
                             "-*-*-"
                             (* ([] blank)))
                          ,(lambda (s) (one-line-separator s)))

      (:verbatim          (& (* ([] blank)) "<<<" (* ([] blank)))
                          ,(lambda (s) (verbatim-paragraph-separator s)))

      (:small-paragraph   (& (* ([] blank)) "(((" (* ([] blank)))
                          ,(lambda (s) (small-paragraph-separator s)))

      (:text-area         (& (* ([] blank)) "?<<<" (* ([^] #\nl)))
                          ,(lambda (s) (verbatim-paragraph-separator s)))

      (:one-line-verbatim (& "#" (* ([^] "\n")))
                          ,(lambda (s) (one-line-separator s)))

      (:separator-line    (& (* ([] blank)) "---" (* "-") (* ([] blank)))
                          ,(lambda (s) (one-line-separator s)))

[....]
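To make "type (a)" in the message above concrete: a backtracking matcher for a tiny regexp subset (literal characters, `.', `*', `^' and `$') fits in a few lines of Scheme. The sketch below follows the well-known recursive matcher from Kernighan and Pike's "The Practice of Programming"; it is not Rx, it is not what Guile ships, and it makes no attempt at Posix leftmost-longest semantics. The names `re-match?', `match-here' and `match-star' are made up for this example.

```scheme
;; Does C (a pattern character) accept the first character of TEXT?
(define (char-accepts? c text)
  (and (pair? text)
       (or (char=? c #\.) (char=? c (car text)))))

;; Match pattern R (a list of chars) at the very start of TEXT.
(define (match-here r text)
  (cond ((null? r) #t)
        ((and (pair? (cdr r)) (char=? (cadr r) #\*))
         (match-star (car r) (cddr r) text))
        ((and (char=? (car r) #\$) (null? (cdr r)))
         (null? text))
        ((char-accepts? (car r) text)
         (match-here (cdr r) (cdr text)))
        (else #f)))

;; Match zero or more of C followed by R, trying the shortest run first.
(define (match-star c r text)
  (cond ((match-here r text) #t)
        ((char-accepts? c text) (match-star c r (cdr text)))
        (else #f)))

;; Search for the pattern RE anywhere in STR.
(define (re-match? re str)
  (let ((r (string->list re))
        (text (string->list str)))
    (if (and (pair? r) (char=? (car r) #\^))
        (match-here (cdr r) text)
        (let loop ((text text))
          (cond ((match-here r text) #t)
                ((null? text) #f)
                (else (loop (cdr text))))))))
```

Here (re-match? "^ab*c$" "abbbc") is true and (re-match? "^b" "ab") is false. The worst case is exponential backtracking on tricky patterns, which is the weakness attributed to type (a) matchers above.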
* Re: Stupid module and pregexp questions 2003-04-29 23:21 ` Tom Lord @ 2003-04-30 0:04 ` Ken Anderson 2003-04-30 6:48 ` tomas 2003-05-05 5:11 ` Rob Browning 2 siblings, 0 replies; 64+ messages in thread
From: Ken Anderson @ 2003-04-30 0:04 UTC
Cc: markj

At 04:21 PM 4/29/2003 -0700, Tom Lord wrote:

>    > And then, of course, there's the issue of speed. Regexps are used for
>    > enough processing that IMHO they must be matched by compiled, not
>    > interpreted, code or they risk being unacceptably slow. [...]
>
>    Compiled code is just interpreted code at a different level,
>    surely? A good optimisation will often beat dropping down
>    levels, and scheme allows easier optimisation while avoiding
>    some typical errors. Do the minimum directly in C, IMHO.
>
>I have some experience in regexp implementation, so may I offer my
>$0.02?
>
>a) In popular use on unix, fairly low performance matchers dominate
>   with the exception of apps like awk and grep. But when I say
>   "fairly low performance" -- please don't misunderstand -- I'm
>   oversimplifying. What really dominates are matchers that are
>   pretty slow on patterns that are at all tricky, but that optimize
>   some common cases like a constant-string pattern and patterns close
>   to that. It mostly just doesn't occur to people to use regexps
>   for problems that fall outside of those common cases.
>
>
>b) A really fast and general matcher, like Rx (as in hackerlab C
>   library, not as in the ancient fork on the GNU FTP site), opens a
>   lot of doors. You can apply dynamically generated regexps to
>   applications that were previously out of reach. A nice example
>   might be to write parsers for a really rich wiki language.
>
>   To my mind, opening the door to applications like that through the
>   provision of an extra fancy regexp engine is a neat thing to do --
>   and is a way Guile could differentiate itself from other
>   languages. At the same time, it takes a lot of code and it's
>   touchy to tune -- so it risks violating the KISS principle.
>
>   And, oh yeah -- you'll want shared substrings to make things really
>   hum along nicely (ahem :-).
>
>
>Matchers of type (a) are butt simple to implement. (Hard to get right
>if you want to nail all the details to be Posix conformant, but the
>algorithms are pretty straightforward.) People have gotten a lot of
>mileage by writing some in Java. I don't see any reason why a Scheme
>version couldn't be competitive -- but really, you'd need it to be
>compiled and to have at least a few relevant compiler optimizations.
>
>Matchers of type (b) are hard to implement. Really hard. And the
>competition among them boils down to O(1-3) instructions in critical
>loops, and to fast paths through exceptional but, alas, important
>non-common cases. There isn't a practical chance in hell of any
>Scheme compiler competing in this space in the next 5 years (at
>least). The compiler theory is arguably there -- but its practical
>application is quite a ways off.
>
>So whether you prefer (a) or (b), if you want competitive performance,
>you're stuck doing some compiler hacking if you choose to use an
>engine in Scheme. A serious but tractable amount of compiler hacking
>for (a) (catch up to Java, at least), a researchy amount for (b).
>
>So, "do the minimum in C": that means write the regexp engine in C,
>for all practical purposes.

I think there are alternatives more people should know about. Henry
Baker has a "Pragmatic parsing" paper:
http://citeseer.nj.nec.com/henry91pragmatic.html
that suggests writing tokenizers in a direct style. In terms of regular
expressions, it's like writing the expression in a fully deterministic
way. This may sound bad at first, but I've written a tokenizer and its
Common Lisp compiler for CORBA IDL in a variation on this style and it
was smaller than Sun's input to LEX for the same task. It compiled into
gotos, not objects or tables, so I suspect it was relatively fast.

The domain of regex matching requires a fairly narrow subset of Scheme
so it might be possible to do fairly well, but it ultimately depends on
how good the compiler is. At least in CL, with some work, you could get
C-like code produced.

Another issue is when you should convert your regex from
nondeterministic form to deterministic form. I've seen Java regexes
that require 25 seconds to compile, so lazy compiling may pay off.

>Say: suppose I implement an Emacs buffer-like string type either as a
>gap buffer or, better, as a tree of some sort (perhaps a splay tree)
>-- a good question for your regexp engine is "can it handle a string
>that isn't contiguous in memory like that".
>
>-t
>
>(here's a snippet of what I do with mine -- the SRE-like expressions
>get compiled down to an extended version of Posix extended regexp
>syntax:)
>
>(begin
>  (define wiki-paragraph-rules
>    ;; (type test separator)
>    ;;
>    ;; Test is a structured regexp to be compiled in a large `(| ...)'
>    ;; of all of the test expressions. The leftmost-longest matching
>    ;; test expression determines the type of the first paragraph in a
>    ;; given string.
>    ;;
>    ;; `(separator string)' returns a list: `(paragraph remaining-string)',
>    ;; separating the first paragraph from the rest of the string.
>    ;;
>    ;; The `test' expression and `separator' procedure can safely assume
>    ;; that the string is not empty, and does not begin with any blank lines.
>    ;;
>
>    `((:form-feed         (& (* ([] blank)) "\f" (* ([] blank)))
>                          ,(lambda (s) (one-line-separator s)))
>      (:comment-line      (& "%%%"(* ([^] "\n")))
>                          ,(lambda (s) (one-line-separator s)))
>      (:rfc822ish         (& (* ([] blank)) "+++" (* ([] blank)))
>                          ,(lambda (s) (ordinary-paragraph-separator s)))
>
>      (:title             (& "!" (+ ([^] "\n")))
>                          ,(lambda (s) (ordinary-paragraph-separator s)))
>
>      (:card-boundary     "\f---"
>                          ,(lambda (s) (one-line-separator s)))
>
>      (:heading           (& (* ([] blank))
>                             (+ "*")
>                             ([] blank)
>                             ([^] ")#*\n")
>                             (* ([^] "\n")))
>                          ,(lambda (s) (ordinary-paragraph-separator s)))
>
>      (:menu              (& (* ([] blank))
>                             "-*-*-"
>                             (* ([] blank)))
>                          ,(lambda (s) (one-line-separator s)))
>
>      (:verbatim          (& (* ([] blank)) "<<<" (* ([] blank)))
>                          ,(lambda (s) (verbatim-paragraph-separator s)))
>
>      (:small-paragraph   (& (* ([] blank)) "(((" (* ([] blank)))
>                          ,(lambda (s) (small-paragraph-separator s)))
>
>      (:text-area         (& (* ([] blank)) "?<<<" (* ([^] #\nl)))
>                          ,(lambda (s) (verbatim-paragraph-separator s)))
>
>      (:one-line-verbatim (& "#" (* ([^] "\n")))
>                          ,(lambda (s) (one-line-separator s)))
>
>      (:separator-line    (& (* ([] blank)) "---" (* "-") (* ([] blank)))
>                          ,(lambda (s) (one-line-separator s)))
>
>[....]
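The "direct style" described in the reply above, where the tokenizer is written by hand as deterministic code with no regexp engine or table in between, might look roughly like this in Scheme. It is not taken from Baker's paper or from the CORBA IDL tokenizer mentioned above: the token set (integers, alphabetic identifiers, single-character punctuation) is a toy chosen for illustration, and the names `tokenize' and `scan-while' are invented.

```scheme
;; Return the index of the first character at or after I that fails PRED.
(define (scan-while pred str i)
  (if (and (< i (string-length str))
           (pred (string-ref str i)))
      (scan-while pred str (+ i 1))
      i))

;; Split STR into numbers, symbols and punctuation characters.
;; Each branch looks at one character and commits; nothing backtracks.
(define (tokenize str)
  (let loop ((i 0) (tokens '()))
    (if (>= i (string-length str))
        (reverse tokens)
        (let ((c (string-ref str i)))
          (cond ((char-whitespace? c)
                 (loop (+ i 1) tokens))
                ((char-numeric? c)
                 (let ((end (scan-while char-numeric? str i)))
                   (loop end
                         (cons (string->number (substring str i end))
                               tokens))))
                ((char-alphabetic? c)
                 (let ((end (scan-while char-alphabetic? str i)))
                   (loop end
                         (cons (string->symbol (substring str i end))
                               tokens))))
                (else
                 ;; Anything else is a one-character punctuation token.
                 (loop (+ i 1) (cons c tokens))))))))
```

For example, (tokenize "foo = 42;") yields the list (foo #\= 42 #\;). Every loop here is a tail call, so a compiler is free to turn the whole thing into jumps, which is the property credited above for the speed of the goto-based version.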
* Re: Stupid module and pregexp questions 2003-04-29 23:21 ` Tom Lord 2003-04-30 0:04 ` Ken Anderson @ 2003-04-30 6:48 ` tomas 2003-04-30 6:31 ` Tom Lord 2003-04-30 6:58 ` Thien-Thi Nguyen 2003-05-05 5:11 ` Rob Browning 2 siblings, 2 replies; 64+ messages in thread
From: tomas @ 2003-04-30 6:48 UTC
Cc: markj

On Tue, Apr 29, 2003 at 04:21:10PM -0700, Tom Lord wrote:

[...]

> I have some experience in regexp implementation, so may I offer my
> $0.02?

FWIW I do appreciate yours always :-)

[...]

> b) A really fast and general matcher, like Rx (as in hackerlab C
>    library, not as in the ancient fork on the GNU FTP site), opens a
>    lot of doors. You can apply dynamically generated regexps to
>    applications that were previously out of reach. A nice example
>    might be to write parsers for a really rich wiki language.
>
>    To my mind, opening the door to applications like that through the
>    provision of an extra fancy regexp engine is a neat thing to do --
>    and is a way Guile could differentiate itself from other
>    languages. At the same time, it takes a lot of code and it's
>    touchy to tune -- so it risks violating the KISS principle.

Two questions pop up:

 - Do you think that it's viable to build Rx into Guile? What about
   the licenses (as Guile is now LGPL)?

 - Do you think a pregexp-like interface to Rx is possible? Something
   along these lines would shorten the path towards a `regexp SRFI',
   right? Do you think it's desirable? (some on the list think not).

>    And, oh yeah -- you'll want shared substrings to make things really
>    hum along nicely (ahem :-).

Yes, I know. This issue was up on the list for quite a while. I'd be
a friend of shared substrings too (this would give more freedom on
string implementation), but since I don't contribute in this area I
just shut up :-)

Regards
-- tomas
* Re: Stupid module and pregexp questions 2003-04-30 6:48 ` tomas @ 2003-04-30 6:31 ` Tom Lord 2003-04-30 6:35 ` Tom Lord 2003-10-24 21:29 ` Thien-Thi Nguyen 2003-04-30 6:58 ` Thien-Thi Nguyen 1 sibling, 2 replies; 64+ messages in thread
From: Tom Lord @ 2003-04-30 6:31 UTC
Cc: markj

> Two questions pop up:

> - Do you think that it's viable to build Rx into Guile? What about
>   the licenses (as Guile is now LGPL)?

Rx is GPL, currently. I have a desperate need for cash. So that's
one route. If I independently overcame my desperate need for cash,
I'd certainly consider a Guile-friendly license agreement gratis if
there was serious interest in it.

But Rx ain't cheap -- it's a lot of code, and a lot of run-time
memory. Alas, it isn't a "no brainer" choice -- it's a genuine case
of "which trade-offs do you prefer?" So I don't mean to be saying
"obviously guile should be using Rx." I only mean to report my good
experiences with it in the limited context of systas scheme.

> - Do you think a pregexp-like interface to Rx is possible?

I'm not familiar with pregexp, specifically. Rx has a few layers.
It's pretty general. Its extended take on Posix syntax is pretty
flexible. I doubt there'd be any problem here.

> Something
> along these lines would shorten the path towards a `regexp SRFI',
> right? Do you think it's desirable? (some on the list think not).

Regexps are pretty freekin fundamental. I think a SRFI is a good
idea. Ironically, I think Olin's SRE's do 80% of the job :-)

>> And, oh yeah -- you'll want shared substrings to make
>> things really hum along nicely (ahem :-).

> Yes, I know. This issue was up on the list for quite a
> while. I'd be a friend of shared substrings too (this would
> give more freedom on string implementation), but since I
> don't contribute in this area I just shut up :-)

Contribute? Heck, they were actively excised -- apparently by virtue
of some (sorry, folks) misguided reasoning about the cleanliness of
their semantics. (As I recall: "Gosh, if you modify a shared
substring, you hose the containing string," (though, of course, that's
actually useful.))

-t
* Re: Stupid module and pregexp questions 2003-04-30 6:31 ` Tom Lord @ 2003-04-30 6:35 ` Tom Lord 2003-10-24 21:29 ` Thien-Thi Nguyen 1 sibling, 0 replies; 64+ messages in thread
From: Tom Lord @ 2003-04-30 6:35 UTC
Cc: tomas

> But Rx ain't cheap -- it's a lot of code, and a lot of
> run-time memory.

More precisely:

It's a decent amount of run-time memory if you're running hard
expressions and want optimal performance.

You can limit run-time allocation pretty low (a couple 10K) and get
_ok_ performance and great correctness (and good performance on very
simple expressions).

Basically, the space/time trade-off is reduced to a parameter that you
can tweak at run-time.

"whatever",
-t
* Re: Stupid module and pregexp questions 2003-04-30 6:31 ` Tom Lord 2003-04-30 6:35 ` Tom Lord @ 2003-10-24 21:29 ` Thien-Thi Nguyen 2003-10-24 22:30 ` Tom Lord 1 sibling, 1 reply; 64+ messages in thread
From: Thien-Thi Nguyen @ 2003-10-24 21:29 UTC
Cc: guile-user

   From: Tom Lord <lord@emf.net>
   Date: Tue, 29 Apr 2003 23:31:41 -0700 (PDT)

   Heck, they were actively excised -- apparently by virtue of some
   (sorry, folks) misguided reasoning about the cleanliness of their
   semantics.

in guile 1.4.1.96 you can do `(use-modules (lang librgx))' to try out
librx re-integration. it's even documented to some extent in the manual.
below is some work-in-progress flex envy slated for 1.4.2 based on rx...

thi

[cc trimmed]

[-- Attachment #2: lang-examples.tar.gz --]
[-- Type: application/x-gunzip, Size: 5351 bytes --]
* Re: Stupid module and pregexp questions 2003-10-24 21:29 ` Thien-Thi Nguyen @ 2003-10-24 22:30 ` Tom Lord 2003-10-26 18:38 ` Thien-Thi Nguyen 0 siblings, 1 reply; 64+ messages in thread
From: Tom Lord @ 2003-10-24 22:30 UTC
Cc: guile-user

> From: Thien-Thi Nguyen <ttn@surf.glug.org>

>    From: Tom Lord <lord@emf.net>
>    Date: Tue, 29 Apr 2003 23:31:41 -0700 (PDT)

Wow, challenging my memory, eh?

>    Heck, they were actively excised -- apparently by virtue of some
>    (sorry, folks) misguided reasoning about the cleanliness of their
>    semantics.

That was a comment I made about the removal of shared substrings.

I was delighted when 1.6.4 gave me:

    guile> make-shared-substring
    #<primitive-procedure make-shared-substring>

though perturbed that the source code says:

    #if SCM_DEBUG_DEPRECATED == 0
    [...]
    SCM_DEFINE (scm_make_shared_substring, "make-shared-substring", 1, 2, 0,

I can try to write up a "case for shared substrings" if that would be
helpful.

> in guile 1.4.1.96 you can do `(use-modules (lang librgx))' to try out
> librx re-integration. it's even documented to some extent in the manual.
> below is some work-in-progress flex envy slated for 1.4.2 based on rx...

Yikes. I'm scared to ask what version of Rx you are using. You
_should_ (really) be using the latest and greatest in libhackerlab,
which is not currently in release.

However, by shocking coincidence, I was just today semi-preparing to
set-up a savannah libhackerlab project and get it back out there
(separately from arch, in which it happens to be included).

A nice side effect of that: systas (which I'm not planning on
re-releasing anytime soon but which is trivially available in my
public archives) has a nice libsystas binding for the latest and
greatest rx. It'd probably take like 2hrs at most to port it to
guile.

Nifty code sample from systas:

    (define-public sans-leading-blanks
      (structured-regexp->procedure `(^ (* ([] blank))) :pick-spec '>))

Defines a procedure that takes a string, compares it to the given
regexp, and returns a shared substring of that string. The
`pick-spec' says _which_ shared substring to return. ">" means,
return the shared substring that begins at the first character after
the match.

-t

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Structured Regular Expressions [with apologies to Olin Shivers]
;;;
;;;
;;; A structured regexp is a recursively defined list structure.
;;; The general form is:
;;;
;;;   structured-regexp := (<operator> <parameter> ...)
;;;   parameter         := <integer>
;;;                      | <character>
;;;                      | <string>
;;;                      | <keyword>
;;;                      | <structured-regexp>
;;;
;;; The valid operators are:
;;;
;;;   operator := const   ; a string constant
;;;             | any     ; any character
;;;             | []      ; character set
;;;             | [^]     ; negated character set
;;;             | ^       ; start anchor
;;;             | $       ; end anchor
;;;             | ?       ; optional sub-expression
;;;             | *       ; repeated sub-expression
;;;             | +       ; non-empty, repeated sub-expression
;;;             | {}      ; a counted sub-expression
;;;             | =       ; parenthesized subexpression
;;;             | &       ; sub-expression concatenation
;;;             | |       ; alternative sub-expressions
;;;             | @       ; parenthesized subexpression back-reference
;;;             | /       ; the "cut" operator
;;;             | !       ; the symbolically labeled "cut" operator
;;;
;;; As a short-hand, some structured regexps can be abbreviated:
;;;
;;;   (const "string")  == "string"
;;;   (* any)           == *.
;;;   (^ ($ subexp))    == (^$ subexp)
;;;
;;; Each operator has its own syntax, so the precise syntax of a structured
;;; regexp is:
;;;
;;;   structured-regexp := (const <string>)
;;;                      | ([] <character-set-element> ...)
;;;                      | ([^] <character-set-element> ...)
;;;                      | (^ <structured-regexp> ...)
;;;                      | ($ <structured-regexp> ...)
;;;                      | (? <structured-regexp> ...)
;;;                      | (* <structured-regexp> ...)
;;;                      | (+ <structured-regexp> ...)
;;;                      | ({} <integer> <integer> <structured-regexp> ...)
;;;                      | (& <structured-regexp> ...)
;;;                      | (| <structured-regexp> ...)
;;;                      | (= [<subexpression-label>] <structured-regexp> ...)
;;;                      | (@ <subexpression-label>)
;;;                      | (/ <integer>)
;;;                      | (! [<cut-label>] <structured-regexp> ...)
;;;
;;;   character-set-element := string
;;;                          | character
;;;                          | (character . character)  ; a range of characters
;;;                          | <character-set>          ; see the `(standard char-set-lib)' module
;;;
;;;   subexpression-label := <keyword>   ; (a keyword)
;;;   cut-label           := <keyword>   ; (a keyword)
;;;
;;; A `pick-spec' specifies values to be returned from `regexec' or a
;;; procedure returned by `regexec-function'. It has the form:
;;;
;;;   pick-spec := #f   ; return #t if a match is found, #f otherwise
;;;
;;;              | #t   ; return #f or a list `(before match after)'
;;;                     ; that is the partition of the string implied
;;;                     ; by a successful match
;;;
;;;              | <recursive-pick-spec>
;;;
;;;
;;; A `recursive-pick-spec' is:
;;;
;;;   recursive-pick-spec := <rps-elt>        ; return only the value implied by `rps-elt'
;;;                        | (<rps-elt> ...)  ; return a list of values implied by
;;;                                           ; the list of `rps-elt's.
;;;
;;; An `rps-elt' is:
;;;
;;;   rps-elt := <part>   ; return the indicated part of the string
;;;                       ; (see below)
;;;
;;;            | (<start-point> <end-point>)  ; return the substring starting
;;;                       ; at `<start-point>' and ending immediately
;;;                       ; before `<end-point>' (see below)
;;;
;;;
;;;            | state-label  ; return the state label of the DFA ending
;;;                       ; state. If the match terminated at a `cut'
;;;                       ; operator (`/' in sre notation), this is
;;;                       ; the integer argument to that operator.
;;;
;;;            | ?        ; the keyword of the terminating cut label or #f
;;;
;;;            | <keyword>  ; return the keyword literally. This is useful
;;;                       ; for labeling elements in a `recursive-pick-spec'
;;;                       ; which is a list.
;;;
;;; A `part' indicates the entire match, a parenthesized
;;; subexpression, or the substring that precedes a match, or the
;;; substring that follows a match:
;;;
;;;   part := 0              ; the entire match
;;;
;;;         | <n>            ; (an integer) the `nth' parenthesized subexpression
;;;
;;;         | (@ <keyword>)  ; the subexpression labeled by `<keyword>'
;;;
;;;         | <              ; (the symbol '<') the substring preceding the match
;;;
;;;         | >              ; (the symbol '>') the substring following the match
;;;
;;; A `point' indicates a specific position within the string. There
;;; are two kinds of `point': a `start-point' and an `end-point' that together
;;; specify a substring of the string:
;;;
;;;   start-point := <part>       ; the beginning of the indicated match part.
;;;                | <any-point>  ; (see below)
;;;
;;;   end-point   := <part>       ; the end of the indicated match part.
;;;                | <any-point>  ; (see below)
;;;
;;;   any-point   := (<part> 0)   ; the beginning of the indicated match part
;;;                | (<part> 1)   ; the end of the indicated match part
;;;
;;;
;;; An example pick spec that returns a list of substrings of the original string:
;;;
;;;   (0                      ; the entire match
;;;
;;;    (< 0)                  ; from the start of the string to the end of the match
;;;
;;;    (2 >)                  ; from the start of subexpression 2 to the end of the string
;;;
;;;    (@ :username)          ; the subexpression labeled `:username'
;;;
;;;    ((@ :username)         ; from the start of the subexpression labeled `:username'
;;;     (@ :directory))       ; ... to the end of the subexpression labeled `:directory'
;;;
;;;    ((2 1)                 ; from the end of subexpression 2
;;;     ((@ :directory) 0)))  ; ... to the beginning of the subexpression labeled :directory
;;;
;;;
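For readers without systas at hand, two of the pick-specs described above can be roughly imitated with the Posix regexp interface in Guile's `(ice-9 regex)' module. This is only an approximation under stated assumptions: it needs a Guile built with regexp support, the patterns are Posix strings rather than structured regexps, and the results are freshly copied strings, not shared substrings. `partition-match' is a name invented for this sketch.

```scheme
(use-modules (ice-9 regex))

;; Rough analogue of the systas `sans-leading-blanks' example, i.e. the
;; pick-spec `>': return what follows the match. The pattern can match
;; the empty string, so `string-match' always succeeds here.
(define (sans-leading-blanks s)
  (match:suffix (string-match "^[ \t]*" s)))

;; Rough analogue of the pick-spec #t: return #f, or the list
;; (before match after) implied by a successful match.
(define (partition-match pattern s)
  (let ((m (string-match pattern s)))
    (and m
         (list (match:prefix m)
               (match:substring m)
               (match:suffix m)))))
```

For example, (sans-leading-blanks "   hello") returns "hello", and (partition-match "[0-9]+" "abc123def") returns ("abc" "123" "def"). What this cannot show is the point argued above: with shared substrings the pieces would alias the original string instead of copying it.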
* Re: Stupid module and pregexp questions 2003-10-24 22:30 ` Tom Lord @ 2003-10-26 18:38 ` Thien-Thi Nguyen 0 siblings, 0 replies; 64+ messages in thread From: Thien-Thi Nguyen @ 2003-10-26 18:38 UTC (permalink / raw) Cc: guile-user From: Tom Lord <lord@emf.net> Date: Fri, 24 Oct 2003 15:30:05 -0700 (PDT) I can try to write up a "case for shared substrings" if that would be helpful. that's not necessary, 1.4.1.x will never remove shared substrings. if there are design problems down the road, they will be solved keeping shared substrings. (your help at that time would be very welcome.) Yikes. I'm scared to ask what version of Rx you are using. You _should_ (really) be using the latest and greatest in libhackerlab, which is not currently in release. i think i'll stick w/ what i have for 1.4.2, and then look at incorporating the latest Rx later (first the framework, and then the implementation improvements). However, by shocking coincidence, I was just today semi-preparing to set-up a savannah libhackerlab project and get it back out there (separately from arch, in which it happens to be included). cool. A nice side effect of that: systas (which I'm not planning on re-releasing anytime soon but which is trivially available in my public archives) has a nice libsystas binding for the latest and greatest rx. It'd probably take like 2hrs at most to port it to guile. [sre example] do it, man. at some point after 1.4.2 i will take (this branch of) guile strict GPL and reap all the goodies out there. thi _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-04-30 6:48 ` tomas 2003-04-30 6:31 ` Tom Lord @ 2003-04-30 6:58 ` Thien-Thi Nguyen 2003-04-30 10:34 ` tomas 1 sibling, 1 reply; 64+ messages in thread From: Thien-Thi Nguyen @ 2003-04-30 6:58 UTC (permalink / raw) Cc: guile-user From: tomas@fabula.de Date: Wed, 30 Apr 2003 08:48:40 +0200 - Do you think that it's viable to build Rx into Guile? What about the licenses (as Guile is now LGPL)? see http://www.glug.org/alt/guile-1.4.1.93.tar.gz (1.4.2 precursor). included are the (lang *) scheme modules that make use of Rx. back "ports" from hackerlab Rx welcome. thi _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-04-30 6:58 ` Thien-Thi Nguyen @ 2003-04-30 10:34 ` tomas 2003-04-30 17:11 ` Tom Lord 2003-10-24 21:45 ` Thien-Thi Nguyen 0 siblings, 2 replies; 64+ messages in thread From: tomas @ 2003-04-30 10:34 UTC (permalink / raw) Cc: tomas On Wed, Apr 30, 2003 at 02:58:25AM -0400, Thien-Thi Nguyen wrote: > From: tomas@fabula.de > Date: Wed, 30 Apr 2003 08:48:40 +0200 > > - Do you think that it's viable to build Rx into Guile? What about > the licenses (as Guile is now LGPL)? > > see http://www.glug.org/alt/guile-1.4.1.93.tar.gz (1.4.2 precursor). > included are the (lang *) scheme modules that make use of Rx. back > "ports" from hackerlab Rx welcome. Uh, oh. What I'd like to see would be a `standard' regexp interface for Scheme (and may be a `standard' regexp implementation for Guile). This would boost all those portable programs which nowadays try to parse strange things the hard way. Regards -- tomas _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-04-30 10:34 ` tomas @ 2003-04-30 17:11 ` Tom Lord 2003-05-06 9:50 ` tomas 2003-10-24 21:45 ` Thien-Thi Nguyen 1 sibling, 1 reply; 64+ messages in thread From: Tom Lord @ 2003-04-30 17:11 UTC (permalink / raw) Cc: tomas

> Uh, oh. What I'd like to see would be a `standard' regexp
> interface for Scheme (and may be a `standard' regexp
> implementation for Guile).

> This would boost all those portable programs which nowadays
> try to parse strange things the hard way.

I realize it's a bit cliche but: the nice thing about regexp standards is that there are so many to choose from. Just to throw out some observations:

Posix regexps are pretty well specified, big subsets of them have the nice mathematical properties we expect of regular expressions, and they are _fairly_ friendly to DFA based matchers. There are a bunch of decent test suites for them and, as of last year, one huge test suite that combines all of those and adds some new tests. Alas, there are a lot of buggy implementations of them, and one apparently irreconcilable disagreement about interpreting the spec (is regexp concatenation left associative or right associative?).

I've always thought of Perl regexps as a kind of moving target and the first-found semantics are not so mathematically nice (so are they really "Schemey"? and does that matter). They are easier to implement but first-found doesn't mix too well with DFA techniques. There are lots of implementations of variations on Perl regexps. Lots of programmers are familiar with them. Some of their non-regular extensions (e.g., look-ahead) are apparently pretty convenient.

The Unicode consortium publishes a tech report that almost specifies a really minimalist, very clean regular expression language (particularly addressing how character sets should work given the size and complexity of Unicode). The regular expression language in the XML Schema standard is based on this.
It's a bit low on expressive power compared to Posix and Perl -- but the limitations make it a DFA-friendly regexp language (you can always see if a given string matches with a single pass over the string).

GNU Emacs has its own little language: roughly a Posix-like syntax and feature set, with a Perl-like semantics. Its chief virtue is a simple and long-lived implementation with syntax options and the option to do matching with strict Posix semantics. Its chief drawback is speed -- it's frustratingly too slow for some applications that would otherwise be natural in an extensible editor.

--------

I think using Perl or Emacs expressions as the primary type of regexp in Guile would be a mistake. Since they're mathematically icky, there's less you can do to process them automatically. Since they're mostly DFA-unfriendly, they don't scale too well to performance-intensive applications of non-trivial expressions. If someone needs Perl expressions (because, say, their application has to read in expressions from some data file that Perl apps might also read), can't they just add a trivial Guile binding for them?

Posix regexps would have some advantages. They can scale _pretty_ well (if you avoid submatch reporting, backrefs, and context operators). They're reasonably clean. You can do tricks like calling `grep' or `sed' as a subprocess with a Posix expression, then use that same expression internally. But also disadvantages: the backreference operator, context operators, and sub-match position reporting features are hard to implement correctly and are DFA-unfriendly (meaning that they can force backtracking to take place either in time (lots of string passes) or space (lots of ancillary data during a single string pass)).

As a historic note: my impression is that context, sub-match reporting and backrefs happened to be easy to add to an early NFA-based implementation.
I doubt that the early forms had the strict recursively-leftmost-longest semantics of the current standard. In other words, I think it's just an unfortunate accident that Posix regexps have these features, and strongly suspect that when those features were first standardized, there were exactly 0 correct implementations. The standard authors, I'm guessing, simply underestimated the difficulty and cost of getting them right as specified. If one had the freedom to start fresh and design regexp tools today, I don't think the result would look much like Posix regexps. ----- So what does that leave? Something really minimalist like the XML Schema regular expression language -- a language that's less expressive, but mathematically clean and very DFA-friendly. In a non-lisp language, that minimalism would be a problem. In something like Perl, it seems to me, the regexp language grows essentially to embed control structures that would be hard or at least tedious to express directly in regular Perl code. It's as if there's an embedded language in Perl just for writing pattern-matching "loops" that use more traditional regular expressions as primitive statements. But in lisp languages: well, that's part of what macros and higher-order functions are for -- building little languages. So my vague idea is: pick a truly minimalist, DFA-friendly pattern language. Leave out anchors ("^" and "$"), context operators (e.g. "match the empty string if followed by whitespace"), sub-match position reporting, and backreferences. Provide primitives like "return the length of the shortest/longest match starting here", "return #t if any match starts here". Add features like state-labeling (let users attach data to NFA states) and interfaces like "run the DFA for the next 10 characters of this string" and "return a list of all the NFA state labels of the current state". 
Next, invent little languages on top of that to take over the role of backrefs, sub-match reporting, context operators and so forth. Things like sub-match reporting should probably _not_ have the same semantics as in Posix -- but should be designed so that scans can avoid backtracking.

Finally, design a surface syntax that can compete with the Posix or Perl syntax, but that "compiles" fairly trivially into Scheme expressions over the minimalist expression language. In other words, even though the minimalist language would lack anchors or submatches, a user typing a pattern into a dialog box for a regexp-search or regexp-query-replace command should have _some_ access to anchoring and submatches without having to resort to typing s-exps.

Maybe that suggestion, to choose a minimalist, truly regular regular expression language -- then do the rest in scheme -- satisfies the spirit of "do as little as possible in C".

Another design dimension to consider: what are Guile's plans re: Unicode?

-t
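The primitives proposed in this message ("return the length of the shortest/longest match starting here", "return #t if any match starts here", labels attached to states) can be sketched with a tiny table-driven DFA. The Python below is only an illustration under stated assumptions: the DFA is built by hand for decimal numbers such as "42" or "3.14", the names `TRANSITIONS`, `LABELS` and `scan` are hypothetical, and nothing here reflects how Rx or any Guile binding is actually organised.

```python
# Hand-built DFA for decimal numbers.
# States: 0 = start, 1 = integer part, 2 = just saw ".", 3 = fraction part.
DIGITS = "0123456789"
TRANSITIONS = {
    0: {d: 1 for d in DIGITS},
    1: {**{d: 1 for d in DIGITS}, ".": 2},
    2: {d: 3 for d in DIGITS},
    3: {d: 3 for d in DIGITS},
}
# Accepting states carry a user-supplied label (the "state label" idea).
LABELS = {1: "integer", 3: "decimal"}

def scan(s, start=0, shortest=False):
    """One left-to-right pass from `start`, never revisiting a character.

    Returns (match_length, state_label) for the longest match (or the
    shortest one when shortest=True), or None if no match starts here.
    """
    state, best = 0, None
    for i in range(start, len(s)):
        state = TRANSITIONS[state].get(s[i])
        if state is None:                  # dead state: stop scanning
            break
        if state in LABELS:                # remember the latest accepting state
            best = (i + 1 - start, LABELS[state])
            if shortest:
                return best
    return best

print(scan("3.14abc"))                 # longest match starting at offset 0
print(scan("3.14abc", shortest=True))  # shortest match starting at offset 0
print(scan("abc") is not None)         # "does any match start here?"
```

Each call reads every character at most once, which is the property that makes this style of primitive cheap enough to build anchors, sub-match reporting and similar features on top of it in Scheme.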
* Re: Stupid module and pregexp questions 2003-04-30 17:11 ` Tom Lord @ 2003-05-06 9:50 ` tomas 2003-05-06 9:28 ` Tom Lord 0 siblings, 1 reply; 64+ messages in thread From: tomas @ 2003-05-06 9:50 UTC (permalink / raw) Cc: ttn On Wed, Apr 30, 2003 at 10:11:53AM -0700, Tom Lord wrote: > [about `standard' regexp implementation] > I realize it's a bit cliche but: the nice thing about regexp standards > is that there are so many to choose from. Just to throw out some > observations: [BTW. Thanks, Tom. You answer a question I posed to you off list] [Posix vs Perl vs Unicode cons vs Emacs regexps] [...] > Maybe that suggestion, to choose a minimalist, truly regular regular > expression language -- then do the rest in scheme -- satisfies the > spirit of "do as little as possible in C". Hm. Technically, the idea sounds quite attractive, in a way. I see several issues, though. - This leaves still the question open whether it'd be possible to have a regexp interface spec which could be fairly portable across Schemes. It might leave many things unspecified, but it would have to be powerful/specific enough that people dare to use it (when trying to write portable Scheme, that is). - If there is a possibility to provide a ``high level'' interface resembling more traditional regexp languages, I see no problem. It's this ``high level'' interface I was talking about (after all it seems pregexp does *everything* in Scheme). > Another design dimension to consider: what are Guile's plans re: > Unicode? Uh, oh. Regards -- tomas _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-05-06 9:50 ` tomas @ 2003-05-06 9:28 ` Tom Lord 2003-05-08 11:47 ` tomas 0 siblings, 1 reply; 64+ messages in thread From: Tom Lord @ 2003-05-06 9:28 UTC (permalink / raw) Cc: guile-user > From: tomas@fabula.de > > Maybe that suggestion, to choose a minimalist, truly regular regular > > expression language -- then do the rest in scheme -- satisfies the > > spirit of "do as little as possible in C". > Hm. Technically, the idea sounds quite attractive, in a way. I > see several issues, though. > - This leaves still the question open whether it'd be possible to > have a regexp interface spec which could be fairly portable > across Schemes. It might leave many things unspecified, but it > would have to be powerful/specific enough that people dare to > use it (when trying to write portable Scheme, that is). POSIX regexps are your friend, in this regard. A _subset_ of Posix regexps is a minimalist, truly regular, regular expression pattern language. So, to the (slightly problematic) extent that you can lay your hands on accurate Posix regexp engines, you can use such engines to implement the kind of Scheme regexp library I'm suggesting. > - If there is a possibility to provide a ``high level'' interface > resembling more traditional regexp languages, I see no problem. > It's this ``high level'' interface I was talking about (after all > it seems pregexp does *everything* in Scheme). It's really foolish, performance-wise, to do _all_ of a regexp engine in scheme until you can scan a string through a dfa table at <20 instructions per character. If some of the hard-core compilers are up to that, I'm impressed -- but I'm quite sure none of the interpreters are. The interpreters will be off by no less than 1, and I'd expect 2 or 3 orders of magnitude (powers of 10, here). > > Another design dimension to consider: what are Guile's plans re: > > Unicode? > Uh, oh. Tee hee. 
No point talking regexps there until you get characters and strings right. I've actually mapped a bunch of that stuff out: how to do strings at the C level and chars and strings at the C level. I'm starting to fear I'm getting too old to ever make it real, though. -t _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-05-06 9:28 ` Tom Lord @ 2003-05-08 11:47 ` tomas 0 siblings, 0 replies; 64+ messages in thread From: tomas @ 2003-05-08 11:47 UTC (permalink / raw) Cc: guile-user On Tue, May 06, 2003 at 02:28:21AM -0700, Tom Lord wrote: [I promised to come back after having a look at pregexp] > It's really foolish, performance-wise, to do _all_ of a regexp engine > in scheme until you can scan a string through a dfa table at <20 > instructions per character. If some of the hard-core compilers are > up to that, I'm impressed -- but I'm quite sure none of the > interpreters are. The interpreters will be off by no less than 1, > and I'd expect 2 or 3 orders of magnitude (powers of 10, here). After having a look at pregexp (in a way it's impressive: it implements a compiler/matcher for a massive, Perl-like regexp language in just over 29K of Scheme. And it's fairly readable, even for a Scheme novice like me), here's the results: - Yes, it is a classical backtracking implementation, Perl style. - It's completely done in Scheme, moving around with string-ref. - IMHO it doesn't stand a chance to compete, performance-wise with carefully written matchers (of the backtracking type: of course DFA ones are miles away, depending on input). Even with the best Scheme compilers available (I'm ready to bet my plush penguin on that ;-) Still, as an educational tool, and as a display of Scheme's expressive power, it's a jewel. I don't think it was written with efficiency in mind -- rather with clarity. Besides, it makes for a good proposal of how a regexp interface to Scheme[1] might look like. And to me, this seems to be the most important thing in this thread. ---------- [1] Of the backtracking type, that is. Regards -- tomas _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-04-30 10:34 ` tomas 2003-04-30 17:11 ` Tom Lord @ 2003-10-24 21:45 ` Thien-Thi Nguyen 2003-10-24 22:37 ` Tom Lord 2003-10-27 10:48 ` tomas 1 sibling, 2 replies; 64+ messages in thread From: Thien-Thi Nguyen @ 2003-10-24 21:45 UTC (permalink / raw) Cc: guile-user From: tomas@fabula.de Date: Wed, 30 Apr 2003 12:34:45 +0200 Uh, oh. What I'd like to see would be a `standard' regexp interface for Scheme (and may be a `standard' regexp implementation for Guile). supposedly scheme is more general than everything else so any (language) interface can be expressed in scheme. by twisted scotch-addled analogy i posit there exists some super general regular expression engine upon which any (regexp engine) interface can be built. maybe that would be librx, maybe not. This would boost all those portable programs which nowadays try to parse strange things the hard way. yes. a good interface is most definitely appreciated. a good architecture is one that has more than one good interface, although not all interfaces need be visible to everyone. that is the paradox: quality arises from perception but once crafted, no one need see the quality to appreciate its effect. as long as what seeds are eaten are shat out in equilibrium the system is sustainable and perhaps even beautiful. thi _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-10-24 21:45 ` Thien-Thi Nguyen @ 2003-10-24 22:37 ` Tom Lord 2003-10-26 18:47 ` Thien-Thi Nguyen 2003-10-27 10:48 ` tomas 1 sibling, 1 reply; 64+ messages in thread From: Tom Lord @ 2003-10-24 22:37 UTC (permalink / raw) Cc: guile-user, tomas

> From: Thien-Thi Nguyen <ttn@surf.glug.org>

> i posit there exists some super general
> regular expression engine upon which any (regexp engine)
> interface can be built. maybe that would be librx, maybe not.

Do you _really_ want to know? It's a _large_ topic but one that does admit very precise analysis.

> a good interface is most definitely appreciated. a good
> architecture is one that has more than one good interface,
> although not all interfaces need be visible to everyone.
> that is the paradox: quality arises from perception but once
> crafted, no one need see the quality to appreciate its effect.
> as long as what seeds are eaten are shat out in equilibrium
> the system is sustainable and perhaps even beautiful.

Fully agreed, Chauncy, but what do we do about text pattern languages that require backtracking (but don't obviously lack linear alternatives of comparable expressiveness (but are dirt simple to implement)) and what do we do about regular expression engines that think they are only about contiguous text?

-t
* Re: Stupid module and pregexp questions 2003-10-24 22:37 ` Tom Lord @ 2003-10-26 18:47 ` Thien-Thi Nguyen 0 siblings, 0 replies; 64+ messages in thread From: Thien-Thi Nguyen @ 2003-10-26 18:47 UTC (permalink / raw) Cc: guile-user

   From: Tom Lord <lord@emf.net>
   Date: Fri, 24 Oct 2003 15:37:29 -0700 (PDT)

   Do you _really_ want to know? It's a _large_ topic but one that
   does admit very precise analysis.

for me, to know is not as important as how i come to know (that is, the learning process). one of these days i'll finish writing the program that will automatically produce the "illustrated guide to regular expression engine internals" (as a specialization of the more general illustrated guide to digital logic specification and evaluation).

   text pattern languages that require backtracking (but don't
   obviously lack linear alternatives of comparable expressiveness
   (but are dirt simple to implement)) and what do we do about
   regular expression engines that think they are only about
   contiguous text?

if i understand you correctly, this illustrated guide will have to admit several fundamental regexp engines; there is no unifying architecture possible. (shrug.) that's fine, too. so it goes.

thi

[cc trimmed]
* Re: Stupid module and pregexp questions 2003-10-24 21:45 ` Thien-Thi Nguyen 2003-10-24 22:37 ` Tom Lord @ 2003-10-27 10:48 ` tomas 1 sibling, 0 replies; 64+ messages in thread From: tomas @ 2003-10-27 10:48 UTC (permalink / raw) Cc: guile-user, tomas On Fri, Oct 24, 2003 at 11:45:59PM +0200, Thien-Thi Nguyen wrote: > From: tomas@fabula.de > Date: Wed, 30 Apr 2003 12:34:45 +0200 > > Uh, oh. What I'd like to see would be a `standard' regexp > interface for Scheme (and may be a `standard' regexp > implementation for Guile). > > supposedly scheme is more general than everything else so any > (language) interface can be expressed in scheme. by twisted > scotch-addled analogy i posit there exists some super general > regular expression engine upon which any (regexp engine) > interface can be built. maybe that would be librx, maybe not. Hm. All our sins come back to haunt us ;-) I had a more humble target: just have an interface (for which we'd have to decide, among lots of other things, whether this interface forbids higher levels in the Chomsky hierarchy and enforces `true' regexps, as seems to be Tom's approach --and he knows a lot about regexps--) or not). Cheers -- tomas _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-04-29 23:21 ` Tom Lord 2003-04-30 0:04 ` Ken Anderson 2003-04-30 6:48 ` tomas @ 2003-05-05 5:11 ` Rob Browning 2003-05-05 6:18 ` Tom Lord 2 siblings, 1 reply; 64+ messages in thread From: Rob Browning @ 2003-05-05 5:11 UTC (permalink / raw) Cc: markj Tom Lord <lord@emf.net> writes: > I have some experience in regexp implementation, so may I offer my > $0.02? Much appreciated, in fact. My feeling is that having a sophisticated and potentially fast system like the one you have described could be quite valuable, but I think it may still be useful to many people to have built-in support for a well known regular expression syntax as well, perhaps perl (via libpcre), elisp, or POSIX. This would afford a standard syntax that many people are already familiar with, and one that's fast enough for a substantial number of jobs. With respect to which syntax we might choose, I don't really have a strong preference for one or the other, but my default inclination might be libpcre since it's the syntax that both perl and python support and since the author has offered to let us use the source. In any case, thanks for all the information. -- Rob Browning rlb @defaultvalue.org, @linuxdevel.com, and @debian.org Previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-05-05 5:11 ` Rob Browning @ 2003-05-05 6:18 ` Tom Lord 2003-05-05 7:47 ` Rob Browning 2003-10-24 22:26 ` Thien-Thi Nguyen 0 siblings, 2 replies; 64+ messages in thread From: Tom Lord @ 2003-05-05 6:18 UTC (permalink / raw) Cc: guile-user

>> I have some experience in regexp implementation, so may I offer my
>> $0.02?

> Much appreciated, in fact.

Thanks. Let me therefore use up my good karma and overstep my bounds:

> With respect to which syntax we might choose, I don't really have a
> strong preference for one or the other, but my default inclination
> might be libpcre since it's the syntax that both perl and python
> support

You can not lead by following. You can not hack without understanding. </Thien-Thi-mode>

(Ahem!) The distinction between PCRE and other matchers (posix matchers in general, Rx specifically) is not _syntactic_. It's semantic and has deep implications for implementation techniques and performance, in both short and long time frames. So, choices you make today, assuming that guile persists and spreads, have _long_ term consequences.

Now, to be sure, a little compatibility here and there can get people to make leaps from here to there. I could point, for example, to emacs' "posix-looking-at": it's a compatibility hack that's "off to the side", but there when you need it.

Guile-dialect regexp choices should be (imho) no less casual than, say, number-tower choices.

-t
* Re: Stupid module and pregexp questions 2003-05-05 6:18 ` Tom Lord @ 2003-05-05 7:47 ` Rob Browning 2003-05-05 17:33 ` Tom Lord 2003-10-24 22:26 ` Thien-Thi Nguyen 1 sibling, 1 reply; 64+ messages in thread From: Rob Browning @ 2003-05-05 7:47 UTC (permalink / raw) Cc: guile-user

Tom Lord <lord@emf.net> writes:

> (Ahem!) The distinction between PCRE and other matchers (posix
> matchers in general, Rx specifically) is not _syntactic_. It's
> semantic and has deep implications for implementation techniques and
> performance, in both short and long time frames. So, choices you make
> today, assuming that guile persists and spreads, have _long_ term
> consequences.

Sure, but a counter argument would be that just guaranteeing that we have elisp, or perl, or "well defined POSIX" (perhaps Rx[1]) regular expressions available (for example) doesn't say anything positive or negative about what *else* we might have available, and it does mean that anyone that is familiar and comfortable with whichever one we might pick can reach for guile more easily whenever they want to get something done (something they already know how to do). They can always learn the better thing later, once we have it.

Note that it's possible I'm trying to fix something that isn't broken here. If all (or nearly all) of the libs that guile might choose to link against for (ice-9 regex) on various platforms are consistent with each other, then I should perhaps withdraw my suggestions. I was just under the impression that they vary substantially, and wanted to have at least one familiar regex subsystem in the core that eliminated the variance.

Also, if one of the main things you're arguing is that perl and emacs-style regexes have extensions that we need to do without if we want good performance, then I'm not trying to argue against that assertion. I'm really just ruminating on the advisability of a well-defined, invariant, and reasonably familiar regex syntax for guile's core.
I'd probably be perfectly happy with a good POSIX implementation in the core, perhaps even with a subset of POSIX if dropping certain bits were somehow important... Thanks again [1] Of course, I completely understand if you don't feel you're in a position to make that available for inclusion ATM. -- Rob Browning rlb @defaultvalue.org, @linuxdevel.com, and @debian.org Previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Stupid module and pregexp questions 2003-05-05 7:47 ` Rob Browning @ 2003-05-05 17:33 ` Tom Lord 2003-05-05 19:37 ` Rob Browning 0 siblings, 1 reply; 64+ messages in thread
From: Tom Lord @ 2003-05-05 17:33 UTC
Cc: guile-user

> If all (or nearly all) of the libs that guile might choose to
> link against for (ice-9 regex) on various platforms are
> consistent with each other,

They are not.

> then I should perhaps withdraw my
> suggestions. I was just under the impression that they vary
> substantially, and wanted to have at least one familiar regex
> subsystem in the core that eliminated the variance.

You might want to ask RMS if he has a version of GNU regex that is reasonably correct these days. I recall that he did some work on that last year -- but that might have been just for the version in Emacs.

GNU regex has the virtues of being a very small implementation with standard Posix entry points, and the drawback (for _some_ apps) of using a backtracking-based implementation. That would be a net win for (ice-9 regex): (a correct) GNU regex (if you can scare one up) is small enough to distribute with Guile, and then those people who need a faster version need only find one that offers Posix entry points.

Be careful because I think that most versions of GNU regex in circulation (many slightly different versions, all labeled "0.12") have subtle bugs. In src/hackerlab/tests/rx-posix-tests (e.g., in arch-1.0pre18) there are some conformance tests for Posix matchers and a driver program "=generic-test-regex.c" that you can use to run them.

We were slightly talking at cross purposes: when I concluded that you should really consider starting with a minimalist truly-regular pattern language, I was thinking beyond just (ice-9 regex) -- for example, what would we want in a regex srfi?

Then, when people start talking about using Perl regexps to be "compatible" with Perl and Python: well, what's the point of that? Some of the biggest wins for using Scheme as the basis of an app-framework/extension/scripting language are the opportunities to provide a far superior programming environment based on a language that a fair number of implementors work on making quite fast.

> Also, if one of the main things you're arguing is that perl
> and emacs-style regexes have extensions that we need to do
> without if we want good performance, then I'm not trying to
> argue against that assertion.

That's not quite what I'm saying. It isn't about "extensions": it's about the basic semantics of regexp operators like | and *.

Some regexp languages (Emacs is one example, JavaScript's another) are defined in terms of backtracking implementations. They aren't "regular expression" engines in the theoretical CS sense at all. They don't provide equivalent DFAs even for very simple patterns like "else|elseif". You can't build a program like "lex" to statically compile their patterns for a linear scanner. You can't build a dynamic DFA engine like Rx. Those pattern languages simply don't scale to large problems (font lock mode, anyone?) and don't have nice simple algebraic properties for automatic manipulation of patterns (SREs, anyone?).

The extensions in those languages are a little bit relevant: they tend to be extensions that are easy to implement given a backtracking matcher as a starting point, and basically impossible to implement (efficiently) in DFA-based implementations. So the extensions compound the problem -- but the root of the problem is the semantics of very basic operators.

(Subexpression position reporting, anchors, and backreferences in Posix appear to have started out as similar backtracking-oriented extensions. When Posix was standardized, those extensions were given semantics wrongly presumed to be DFA-friendly. As a result, Posix regexps are in some sense the worst of both worlds: they force backtracking matchers to do nearly exhaustive searches and they force DFA-based implementations to add a complicated layer of weird-ass recursive backtracking.)

> I'm really just ruminating on the advisability of a
> well-defined, invariant, and reasonably familiar regex
> syntax for guile's core.

To my ears, that's different from just thinking about what backs up (ice-9 regex). That should be more at the level of "What would a good regex SRFI do?" And I stand by my answer there: none of the standard regexp languages are any good, though the XML Schema version comes closest. A pattern language that managed to be truly a regular expression language in the CS-theory sense is the only sane basis for something like an SRE approach.

> I'd probably be perfectly happy with a good POSIX
> implementation in the core, perhaps even with a subset of
> POSIX if dropping certain bits were somehow important...

As I said, ask about for a working GNU regex for (ice-9 regex). I think that's the only practical choice you (might) have. But I encourage you also to have higher ambitions and just point to my experience using Rx in systas. Having a dynamic DFA engine in there enabled a bunch of neat applications that would be simply out of reach in many environments.

-t
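A minimal sketch of the "else|elseif" point in the message above. It assumes Python's `re` module as a stand-in for a backtracking matcher with ordered alternation; Python is not one of the engines discussed in this thread, and the helper names `build_trie_dfa` and `longest_match` are hypothetical. The hand-built automaton is only a trie for a fixed word list, not a general regex compiler, but it shows the single left-to-right scan and the longest-match answer a lex-style scanner would give.

```python
import re

# Backtracking matcher with ordered alternation: the first alternative
# that succeeds wins, so "elseif" is never preferred over "else".
backtracking_result = re.match(r"else|elseif", "elseif").group(0)


def build_trie_dfa(words):
    """Build a DFA (shaped like a trie) accepting exactly the given words."""
    transitions = [{}]  # transitions[state][char] -> next state
    accepting = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def longest_match(transitions, accepting, text):
    """Scan text once from position 0; return the longest accepted prefix."""
    state = 0
    last_end = 0 if 0 in accepting else None
    for i, ch in enumerate(text):
        nxt = transitions[state].get(ch)
        if nxt is None:
            break  # dead state: no longer match is possible
        state = nxt
        if state in accepting:
            last_end = i + 1  # remember the longest match seen so far
    return None if last_end is None else text[:last_end]


transitions, accepting = build_trie_dfa(["else", "elseif"])
dfa_result = longest_match(transitions, accepting, "elseif")
```

Here `backtracking_result` is "else" while `dfa_result` is "elseif": the same pattern text denotes different matches under the two semantics, which is the variance the message is describing.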
* Re: Stupid module and pregexp questions 2003-05-05 17:33 ` Tom Lord @ 2003-05-05 19:37 ` Rob Browning 2003-05-05 20:19 ` Tom Lord 0 siblings, 1 reply; 64+ messages in thread
From: Rob Browning @ 2003-05-05 19:37 UTC
Cc: guile-user

Tom Lord <lord@emf.net> writes:

> We were slightly talking at cross purposes: when I concluded that you
> should really consider starting with a minimalist truly-regular
> pattern language, I was thinking beyond just (ice-9 regex) -- for
> example, what would we want in a regex srfi?
>
> ...
>
> To my ears, that's different from just thinking about what backs up
> (ice-9 regex). That should be more at the level of "What would a
> good regex SRFI do?" And I stand by my answer there: none of the
> standard regexp languages are any good, though the XML Schema version
> comes closest. A pattern language that managed to be truly a regular
> expression language in the CS-theory sense is the only sane basis
> for something like an SRE approach.

Right. I definitely see two issues here, a short term one and a long term one. I was focused on the former, i.e. how do we get (ice-9 regex) to have invariant behavior across platforms. I completely agree that this still leaves open the very interesting question, "what would a good regex SRFI do?".

With respect to (ice-9 regex), I'm inclined to agree with you. If we can include a good POSIX implementation, then that should fix the problems I've been asking about.

Thanks again for the very interesting comments.

--
Rob Browning
rlb @defaultvalue.org, @linuxdevel.com, and @debian.org
Previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
* Re: Stupid module and pregexp questions 2003-05-05 19:37 ` Rob Browning @ 2003-05-05 20:19 ` Tom Lord 0 siblings, 0 replies; 64+ messages in thread
From: Tom Lord @ 2003-05-05 20:19 UTC
Cc: guile-user

> Thanks again for the very interesting comments.

Hope they're useful. Along those lines:

> With respect to (ice-9 regex), I'm inclined to agree with you.
> If we can include a good POSIX implementation, then that
> should fix the problems I've been asking about.

State of the world, as I know it:

* GNU regex

  Most versions are buggy. RMS did some work on one fork (perhaps just the one in emacs) and I'm pretty sure he fixed all the Posix bugs. So, ask in that direction.

  Advantages: simple, small, LGPL
  Disadvantages: slow on expressions that cause backtracking
  Unknowns: Posix compliant? (guess bias towards "yes" for the latest from RMS)

* Henry Spencer's

  I haven't seen any more recent release than the one included in Tcl.

  Advantages: fast, Berkeley-ish license (GPL compatible), Unicode support
  Disadvantages: big and complicated. Some Posix bugs (at least ca. mid-2002)
  Unknowns: still maintained?

* Isamu Hasegawa's (Latest glibc?)

  Advantages: smart implementor, DFAish, in glibc (so perhaps gets beaten upon), LGPL
  Disadvantages: odd space requirements, big and complicated
  Unknowns: Posix conformance status (guess bias: "good") and performance (guess bias: "good for short strings")

* Tom Lord's (latest arch, src/hackerlab)

  Advantages: DFAish, fast, good correctness tests, Unicode in low-level engine (but not (yet) via Posix entry points), good growth path basis for "what should a regexp srfi do".
  Disadvantages: big and complicated, GPL (probably flexible on that), regcomp is mildly slow (compared to GNU regex, but regexec is fast), can be a fickle beast to tune (but conversely: flexibly tunable).
  Unknowns: maintained? (c.f., my so-called life :-)

* Others

  Don't bother, imho.

It's mostly the "big and complicated" on all but one of those that makes me suggest bundling a good fork of GNU regex, if you can get one.

-t
* Re: Stupid module and pregexp questions 2003-05-05 6:18 ` Tom Lord 2003-05-05 7:47 ` Rob Browning @ 2003-10-24 22:26 ` Thien-Thi Nguyen 2003-10-24 22:58 ` Tom Lord 1 sibling, 1 reply; 64+ messages in thread
From: Thien-Thi Nguyen @ 2003-10-24 22:26 UTC
Cc: guile-user

   From: Tom Lord <lord@emf.net>
   Date: Sun, 4 May 2003 23:18:08 -0700 (PDT)

   Guile-dialect regexp choices should be (imho) no less casual than,
   say, number-tower choices.

but a tower is an abstract model that is supported to varying degrees when it comes to compilation down to the physical layer. why are surface regexp syntaxes different? sure, there are non-dfa-friendly approaches, so provide non-dfa-requiring compilation for those. at the bottom it's all just fixed-width NANDs and NORs (logically speaking)...

thi

[cc trimmed]
* Re: Stupid module and pregexp questions 2003-10-24 22:26 ` Thien-Thi Nguyen @ 2003-10-24 22:58 ` Tom Lord 2003-10-26 19:02 ` Thien-Thi Nguyen ` (2 more replies) 0 siblings, 3 replies; 64+ messages in thread
From: Tom Lord @ 2003-10-24 22:58 UTC
Cc: guile-user

> From: Thien-Thi Nguyen <ttn@surf.glug.org>
>
>    From: Tom Lord <lord@emf.net>
>    Date: Sun, 4 May 2003 23:18:08 -0700 (PDT)
>
>    Guile-dialect regexp choices should be (imho) no less casual than,
>    say, number-tower choices.
>
> but a tower is an abstract model that is supported to varying degrees
> when it comes to compilation down to the physical layer.

That's equally true of regexps.

> why are surface regexp syntaxes different?

I believe I said they were similar, not different -- so I'm not sure why you ask me why they are different.

> sure, there are non-dfa-friendly approaches, so provide
> non-dfa-requiring compilation for those.

Although you're mostly posting ink-blots to the list today (or because of that) I'll say (loosely, but I think it could be tightened up):

The Chomsky hierarchy points to some platonic truth that has real and practical implications for programs. If you stick to the lower levels of the hierarchy, you get better performance guarantees than if you don't.

In the history of the design of programming languages, including regexp languages, we have on the one hand that abstract observation about performance along the Chomsky hierarchy, and on the other hand the historical accident of standard regexp languages ignoring that observation.

It's easy to tweak a backtracking "true regular expression" engine to add, for example, subexpression position capture and backreferencing. Perl regexps are an example of what happens if you plop yourself down on a sled at the top of that slippery slope.

The history is easy to understand: on dinky-little 1970s hardware (for example) a dynamic pattern matcher pretty much has to be implemented as a backtracking engine. If I give you source code to a backtracking engine, it's easy to make incremental tweaks to it which are very convenient -- but which also raise the nature of that engine on the scale of the Chomsky hierarchy.

The so-far (mostly) unasked question is whether we can achieve similar (practical, not theoretic) expressivity to those tweaked matchers _without_ climbing the Chomsky hierarchy: can we be fast and convenient at the same time?

> at the bottom it's all just fixed-width NANDs and NORs
> (logically speaking)...

What's your point? I don't see how that observation relates to choices in language or run-time system design. Are you off your meds?

"nothing really matters, to me" -- Queen (Bohemian Rhapsody)

-t
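To make the "climbing the Chomsky hierarchy" remark in the message above concrete, here is a small sketch. It again assumes Python's `re` module as a stand-in backtracking engine (not an engine from this thread), and `same_count_around_b` is a hypothetical helper name. A single backreference is enough to recognize the language a^n b a^n, which is a standard textbook example of a non-regular language, so no DFA can accept it.

```python
import re

# (a*)b\1 requires the text after "b" to repeat exactly what group 1
# captured before it, i.e. the same number of "a"s on both sides.
PATTERN = re.compile(r"(a*)b\1")


def same_count_around_b(text):
    """Return True when text has the form a^n b a^n for some n >= 0."""
    return PATTERN.fullmatch(text) is not None


balanced = same_count_around_b("aabaa")     # two "a"s on each side
unbalanced = same_count_around_b("aabaaa")  # two before, three after
```

The convenience is real: the pattern is one short line. The cost is that the matcher now has to remember an unbounded amount of captured text, which is exactly the kind of incremental tweak that takes the engine out of the regular-language tier.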
* Re: Stupid module and pregexp questions 2003-10-24 22:58 ` Tom Lord @ 2003-10-26 19:02 ` Thien-Thi Nguyen 2003-10-27 10:26 ` tomas 2003-10-27 14:19 ` Dale P. Smith 2 siblings, 0 replies; 64+ messages in thread
From: Thien-Thi Nguyen @ 2003-10-26 19:02 UTC
Cc: guile-user

   From: Tom Lord <lord@emf.net>
   Date: Fri, 24 Oct 2003 15:58:18 -0700 (PDT)

   I believe I said they were similar, not different -- so I'm not sure
   why you ask me why they are different.

i was confused. ignore.

   can we be fast and convenient at the same time?

what is your take on this question?

   > at the bottom it's all just fixed-width NANDs and NORs
   > (logically speaking)...

   What's your point? I don't see how that observation relates to
   choices in language or run-time system design.

it doesn't. let's just call it a Bill Richter moment.

   Are you off your meds?

probably. back to lurking now.

thi
* Re: Stupid module and pregexp questions 2003-10-24 22:58 ` Tom Lord 2003-10-26 19:02 ` Thien-Thi Nguyen @ 2003-10-27 10:26 ` tomas 2003-10-27 14:19 ` Dale P. Smith 2 siblings, 0 replies; 64+ messages in thread
From: tomas @ 2003-10-27 10:26 UTC
Cc: guile-user, ttn

On Fri, Oct 24, 2003 at 03:58:18PM -0700, Tom Lord wrote:

> [snip]
> The so-far (mostly) unasked question is whether we can achieve similar
> (practical, not theoretic) expressivity to those tweaked matchers
> _without_ climbing the chomsky hierarchy: can we be fast and
> convenient at the same time?

It's not unasked. I've asked it myself, in several variations. One variation I particularly like is whether there is a (practical) way to let the regexp compiler decide how far to climb the Chomsky ladder (and not to clutter/limit the `regexp language' or `the interface' with such performance considerations).

Thanks for your insightful posts.

-- tomas
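One way to picture the idea in the message above of letting the regexp compiler decide how far to climb: a rough sketch, not an existing Guile or Rx facility. Patterns are represented as a hypothetical tuple-based syntax tree, and the node names ("lit", "cat", "alt", "star", "group", "backref") and the function `choose_engine` are all assumptions made for illustration. The sketch treats a pattern as DFA-friendly when it uses only regular operators and falls back to a backtracking engine otherwise; it deliberately ignores the cost of reporting capture positions.

```python
# Operators whose recognized language stays regular. "group" only marks a
# subexpression; this sketch ignores the cost of reporting its position.
REGULAR_OPS = {"lit", "cat", "alt", "star", "group"}


def needs_backtracking(node):
    """Return True if the pattern tree uses any non-regular operator."""
    op, *args = node
    if op not in REGULAR_OPS:
        return True  # e.g. ("backref", 1)
    # Recurse into child nodes only; literal strings and ints are leaves.
    return any(needs_backtracking(a) for a in args if isinstance(a, tuple))


def choose_engine(node):
    """Pick the cheapest engine able to run the pattern."""
    return "backtracking" if needs_backtracking(node) else "dfa"


# else|elseif uses only regular operators.
keyword = ("alt", ("lit", "else"), ("lit", "elseif"))

# (a*)b\1 contains a backreference.
mirrored = ("cat", ("group", ("star", ("lit", "a"))), ("lit", "b"), ("backref", 1))

engines = (choose_engine(keyword), choose_engine(mirrored))
```

With this split the user writes one pattern language, and only patterns that actually use the expensive features pay for them.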
* Re: Stupid module and pregexp questions 2003-10-24 22:58 ` Tom Lord 2003-10-26 19:02 ` Thien-Thi Nguyen 2003-10-27 10:26 ` tomas @ 2003-10-27 14:19 ` Dale P. Smith 2003-10-27 14:54 ` rm 2 siblings, 1 reply; 64+ messages in thread
From: Dale P. Smith @ 2003-10-27 14:19 UTC

On Fri, 24 Oct 2003 15:58:18 -0700 (PDT) Tom Lord <lord@emf.net> wrote:

> The chomsky hierarchy points to some platonic truth that has real and
> practical implications for programs. If you stick to the lower
> levels of the hierarchy, you get better performance guarantees than if
> you don't.

Got any nice pointers to this chomsky hierarchy?

Thanks!
-Dale

--
Dale P. Smith
dsmith at actron dot com
* Re: Stupid module and pregexp questions 2003-10-27 14:19 ` Dale P. Smith @ 2003-10-27 14:54 ` rm 2003-10-28 0:57 ` Robert Marlow 0 siblings, 1 reply; 64+ messages in thread
From: rm @ 2003-10-27 14:54 UTC
Cc: guile-user

On Mon, Oct 27, 2003 at 09:19:56AM -0500, Dale P. Smith wrote:

> On Fri, 24 Oct 2003 15:58:18 -0700 (PDT)
> Tom Lord <lord@emf.net> wrote:
>
> > The chomsky hierarchy points to some platonic truth that has real and
> > practical implications for programs. If you stick to the lower
> > levels of the hierarchy, you get better performance guarantees than if
> > you don't.
>
> Got any nice pointers to this chomsky hierarchy?

Well, here are two introductory links:

  http://en.wikipedia.org/wiki/Noam_Chomsky

and esp:

  http://en.wikipedia.org/wiki/Chomsky_hierarchy

as well as:

  http://en.wikipedia.org/wiki/Regular_language

If you're looking for printed references -- any good compiler book should cover this hierarchy (i think both the Dragon book and Hopcroft/Ullman 'Introduction to Automata Theory' cover it).

hth Ralf Mattes
* Re: Stupid module and pregexp questions 2003-10-27 14:54 ` rm @ 2003-10-28 0:57 ` Robert Marlow 2003-10-28 1:59 ` Tom Lord 2003-10-28 2:05 ` lord 0 siblings, 2 replies; 64+ messages in thread
From: Robert Marlow @ 2003-10-28 0:57 UTC
Cc: guile-user, Dale P. Smith

Whoa! When I saw "Chomsky" I at first thought of the Anarchist Linguistics Professor Noam Chomsky. I didn't realise it actually WAS named after him. I only really knew of his Anarchist achievements (being an anarchist myself and all). Never thought I'd come across an application of his Linguistics research. That's made my day, thanks!

On Mon, 2003-10-27 at 22:54, rm@fabula.de wrote:

> On Mon, Oct 27, 2003 at 09:19:56AM -0500, Dale P. Smith wrote:
> > On Fri, 24 Oct 2003 15:58:18 -0700 (PDT)
> > Tom Lord <lord@emf.net> wrote:
> >
> > > The chomsky hierarchy points to some platonic truth that has real and
> > > practical implications for programs. If you stick to the lower
> > > levels of the hierarchy, you get better performance guarantees than if
> > > you don't.
> >
> > Got any nice pointers to this chomsky hierarchy?
>
> Well, here are two introductory links:
>
> http://en.wikipedia.org/wiki/Noam_Chomsky
>
> and esp:
>
> http://en.wikipedia.org/wiki/Chomsky_hierarchy
>
> as well as:
>
> http://en.wikipedia.org/wiki/Regular_language
>
> If you're looking for printed references -- any good compiler
> book should cover this hierarchy (i think both the Dragon book
> and Hopcroft/Ullman 'Introduction to Automata Theory' cover
> it).
>
> hth Ralf Mattes

--
Regards,
Robert Marlow
* Re: Stupid module and pregexp questions 2003-10-28 0:57 ` Robert Marlow @ 2003-10-28 1:59 ` Tom Lord 2003-10-29 9:36 ` Harri Haataja 2003-10-28 2:05 ` lord 1 sibling, 1 reply; 64+ messages in thread
From: Tom Lord @ 2003-10-28 1:59 UTC
Cc: guile-user, rm, dsmith

> From: Robert Marlow <bobstopper@australispro.com.au>
>
> Whoa! when I saw "Chomsky" I at first thought of the Anarchist
> Linguistics Professor Noam Chomsky. I didn't realise it actually WAS
> named after him. I only really knew of his Anarchist achievements (being
> an anarchist myself and all). never thought I'd come across an
> application of his Linguistics research. That's made my day, thanks!

What frustrates me is that web resources on this topic seem to be so thoroughly lacking. I actually did, I'm sorry to admit, spend about 2 hrs looking for a good link on Google that covered the topic.

The cited Wikipedia entries are fine, in their way, but if you don't already know what they're talking about they won't help. There's a gazillion instances of "lecture notes" from this or that college course that mostly just duplicate the Wikipedia entry. There's not much beyond that.

There's an on-line doc from CMU, aimed at high-school students in a summer science program, that is a fantastic one-pager (even for "grown ups") -- but that is incomplete and non-rigorous.

It's really disappointing that if someone asks "where can I learn the meaning of the Chomsky hierarchy" the answer has to be "go to the library or buy a book."

Gah!!!

-t
* Re: Stupid module and pregexp questions 2003-10-28 1:59 ` Tom Lord @ 2003-10-29 9:36 ` Harri Haataja 0 siblings, 0 replies; 64+ messages in thread
From: Harri Haataja @ 2003-10-29 9:36 UTC
Cc: guile-user

On Mon, Oct 27, 2003 at 05:59:02PM -0800, Tom Lord wrote:

> > From: Robert Marlow <bobstopper@australispro.com.au>
> > Whoa! when I saw "Chomsky" I at first thought of the Anarchist
> > Linguistics Professor Noam Chomsky. I didn't realise it actually WAS
> > named after him. I only really knew of his Anarchist achievements
> > (being an anarchist myself and all). never thought I'd come across
> > an application of his Linguistics research. That's made my day,
> > thanks!
>
> What frustrates me is that web resources on this topic seem to be so
> thoroughly lacking. I actually did, I'm sorry to admit, spend about
> 2 hrs looking for a good link on Google that covered the topic.
>
> The cited Wikipedia entries are fine, in their way, but if you don't
> already know what they're talking about they won't help. There's a
> gazillion instances of "lecture notes" from this or that college
> course that mostly just duplicate the Wikipedia entry. There's not
> much beyond that.

Well, Wikipedia is there to be made better. The easiest step is to complain about the problem on a matching talk page.

> There's an on-line doc from CMU, aimed at high-school students in a
> summer science program, that is a fantastic one-pager (even for "grown
> ups") -- but that is incomplete and non-rigorous.

.. or add links.

> It's really disappointing that if someone asks "where can I learn the
> meaning of the chomsky hierarchy" the answer has to be "go to the
> library or buy a book."

Indeed. (And someone already mentioned the poor state of public libraries wrt these kinds of topics.)
* Re: Stupid module and pregexp questions 2003-10-28 0:57 ` Robert Marlow 2003-10-28 1:59 ` Tom Lord @ 2003-10-28 2:05 ` lord [not found] ` <lord@morrowfield.regexps.com> 1 sibling, 1 reply; 64+ messages in thread
From: lord @ 2003-10-28 2:05 UTC
Cc: guile-user, rm, dsmith

me:

> It's really disappointing that if someone asks "where can I learn the
> meaning of the chomsky hierarchy" the answer has to be "go to the
> library or buy a book."

Not least because a _public_ library (at which "anyone" can obtain borrowing privileges) is unlikely to have any materials that cover this, while at an academic library likely to have these materials, borrowing privileges cost a lot of money.

Yes, folks, access to fundamental truths about the nature of the universe (the platonic universe of ideals, no less) -- even things so trivial they've been written down many, many times -- is a class-based privilege.

Everyone go on strike, please? Crash everything and then let's just work it out like civilized people can, shall we :-) I want a global, cultural "do-over".

"When I say stop, continue." -- Fripp

-t
[parent not found: <lord@morrowfield.regexps.com>]
* Re: Stupid module and pregexp questions [not found] ` <lord@morrowfield.regexps.com> @ 2003-10-28 2:23 ` Thien-Thi Nguyen 0 siblings, 0 replies; 64+ messages in thread
From: Thien-Thi Nguyen @ 2003-10-28 2:23 UTC
Cc: guile-user

   From: Tom Lord <lord@morrowfield.regexps.com>
   Date: Mon, 27 Oct 2003 18:05:22 -0800 (PST)

   access to fundamental truths about the nature of the universe (the
   platonic universe of ideals, no less) -- even things so trivial
   they've been written down many, many times -- is a class-based
   privilege.

this is why only the trickster class has any hope of Making Things Better. unfortunately, lots of those types descended into the banality that is marketing, and forgot about the future that bears down on not only them but also their purported targets. the result is that access is less the gating factor than simple constriction of desire into very limited/limiting slices of progress.

a floating cgi smiley buzzsawing numeric slave symbolism still has the potential to convey the truths underlying machine capability boundaries, but only once the humans consuming such an image become comfortable w/ their own limitations. da vinci and pacioli did not vie against each other; that balance is yet to be regained.

in the meantime, what is the practiced "freedom"? what is practiced "peace"? what is the practiced "strength"?

thi
* Re: Stupid module and pregexp questions 2003-04-29 22:06 ` MJ Ray 2003-04-29 23:21 ` Tom Lord @ 2003-04-30 4:38 ` Robert Uhl 1 sibling, 0 replies; 64+ messages in thread
From: Robert Uhl @ 2003-04-30 4:38 UTC

MJ Ray <markj@cloaked.freeserve.co.uk> writes:

> > That's useful for some things (when the developer writes the
> > regexps), but for others it's not so good, e.g. when the _user_
> > writes the regexps. The user probably wants whatever he's locally
> > used to...
>
> Basic, Extended, Perl, ... this probably is general, not just ours.

Granted. But it seems to me that introducing Yet Another Regexp Scheme is not the way to solve things. I'd use the native facility and let scripts be non-portable; hopefully the friction would cause the native libraries to be made more uniform. I'll admit to a touch of sunny optimism, though:-)

> > And then, of course, there's the issue of speed. Regexps are used
> > for enough processing that IMHO they must be matched by compiled,
> > not interpreted, code or they risk being unacceptably slow. [...]
>
> Compiled code is just interpreted code at a different level, surely?
> A good optimisation will often beat dropping down levels, and scheme
> allows easier optimisation while avoiding some typical errors. Do the
> minimum directly in C, IMHO.

Oh, of course. OTOH, one of the things I feel (and I could very well be quite incorrect) is that sometimes folks ignore the constant factors in O() issues. That is, it can very well be that O(n^2) is faster than O(n) when the latter is interpreted and the former compiled, for small n. N is often small, which is the problem.

Obviously the ideal is for all code to be compiled and run as close to the machine as possible, and obviously Scheme can do that. A guile compiler would go most of the way toward resolving my own objections. The overhead of using a compiled Lisp vs. a compiled C in this day and age is, in most cases, pretty close to negligible.

--
Robert Uhl <ruhl@4dv.net>
It is the day of Resurrection, let us be radiant for the feast, and let us embrace one another. Let us say: `Brethren,' even to them that hate us, let us forgive all things on the Resurrection, and thus let us cry out: Christ is risen from the dead, trampling down death by death, and on those in the tombs bestowing life.
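A worked illustration of the constant-factor remark in the message above. The per-step costs are invented for the example (an interpreted step assumed to cost 100 units, a compiled step 1 unit); they are not measurements of Guile or of any regex library, and the function names are hypothetical.

```python
INTERPRETED_STEP_COST = 100  # assumed cost units per step, not measured
COMPILED_STEP_COST = 1       # assumed cost units per step, not measured


def interpreted_linear(n):
    """Cost of an O(n) algorithm run by a slow interpreter."""
    return INTERPRETED_STEP_COST * n


def compiled_quadratic(n):
    """Cost of an O(n^2) algorithm run as compiled code."""
    return COMPILED_STEP_COST * n * n


def cheaper(n):
    """Name the cheaper option for input size n under these assumptions."""
    if compiled_quadratic(n) < interpreted_linear(n):
        return "compiled O(n^2)"
    return "interpreted O(n)"


# 1 * n^2 < 100 * n exactly when n < 100, so the break-even point is n = 100.
small_input = cheaper(10)
large_input = cheaper(1000)
```

Under these assumed constants the quadratic version wins for every input shorter than 100 items, which is the "N is often small" case; past that point the asymptotics take over.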