Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?

unofficial mirror of help-gnu-emacs@gnu.org

* Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
@ 2010-02-18  6:10 David Combs
  2010-02-18 11:46 ` Pascal J. Bourguignon
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: David Combs @ 2010-02-18  6:10 UTC (permalink / raw)
  To: help-gnu-emacs

Subj:  Perl, etc has these "?"-prefix modifiers/codes/whatever.  
       Precisely which does emacs have (and NOT have)?

Please, someone, make an ascii or html table or even plain text 
list of all these neat "new" non-standard ops that perl and
even php and ruby etc seem to have now, comparing them
to what Emacs has or doesn't have.

Also point to any .el-files that upgrade that stuff.

---

Friedl's book "Mastering Regular Expressions", both editions, 
covered Emacs re its regexp ability.  Very Nice!

Now there's A NEW REGEXP BOOK (also O'Reilly), a "COOKBOOK" on regexps,
which covers, with examples and pointed-out differences, a whole
bunch of systems: perl, php, ruby, ... <I forget, but there's
a bunch of them>, but does NOT (DAMNIT!) even MENTION emacs!

Note: the credits say thanks to Friedl (maybe also Ilya, I 
forget) for suggestions on the various drafts.

So you see a nifty regexp in that book, and you want to try
it in emacs --- what a bear, trying to convert it into
something that emacs understands.
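
A small illustration of that conversion, as a sketch only. The
doubled-word pattern is an illustrative example, not one taken from
either book, and the variable name is hypothetical. It assumes the
default syntax table, where spaces have whitespace syntax. Group
parentheses need a backslash in Emacs, \s- stands in for Perl's \s,
and every backslash is doubled again because the regexp is written
as a Lisp string.

```elisp
;; Perl:   \b(\w+)\s+\1\b
;; Emacs:  \b\(\w+\)\s-+\1\b   (written below as a Lisp string)
(defvar my-doubled-word-regexp "\\b\\(\\w+\\)\\s-+\\1\\b"
  "Hypothetical name: matches a word repeated after whitespace.")

(let ((text "this is is a test"))
  (when (string-match my-doubled-word-regexp text)
    (match-string 0 text)))
;; => "is is"
```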

What say, some guru in emacs regexps?  (Ilya?  Friedl? ...)

Thanks!

David

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-18  6:10 Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)? David Combs
@ 2010-02-18 11:46 ` Pascal J. Bourguignon
  2010-02-18 16:57   ` John Withers
       [not found]   ` <mailman.1450.1266512270.14305.help-gnu-emacs@gnu.org>
  2010-02-18 16:23 ` Tyler Smith
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Pascal J. Bourguignon @ 2010-02-18 11:46 UTC (permalink / raw)
  To: help-gnu-emacs

dkcombs@panix.com (David Combs) writes:

> Subj:  Perl, etc has these "?"-prefix modifiers/codes/whatever.  
>        Precisely which does emacs have (and NOT have)?
>
>
> Please, someone, make an ascii or html table or even plain text 
> list of all these neat "new" non-standard ops that perl and
> even php and ruby etc seem to have now, comparing them
> to what Emacs has or doesn't have.

emacs lisp has a lot of data types.  But in lisp, the types are not
associated with the variables, but with the values.  Therefore names
(symbols are used to name things in general in lisp) don't need to be
marked in any special way.

In lisps where there are both lexical bindings and dynamic bindings, such
as Common Lisp, there's a convention to distinguish dynamic variables
from lexical variables:

    - Dynamic variables are surrounded by stars:    *variable*

    - Lexical variables are surrounded by nothing:   variable

in addition:

    - Constant variables are surrounded by pluses:  +variable+

But in emacs lisp, there is only dynamic binding, so this convention is
not applied by emacs lisp programmers in general (but Common Lisp
programmers writing emacs lisp code tend to use it, so you may see it
applied for *global-variables* vs. local-variables).


Finally, in lisp, a name may have several meanings.  We often
distinguish the variable and the function meanings, and call lisps that
distinguish them "lisp-2", while lisps that don't (eg. scheme or LeLisp)
are called "lisp-1".  But there are a lot of other meanings a name may
take.  For example, in emacs lisp, the same name may be used for a variable,
a function, a tagbody tag, a catch tag (catch also takes other objects),
a block name, etc. And moreover, as a programmer, you can add new
meanings to a name by writing new functions and macros (so the
classifications should really be "lisp-infinity+1" vs. "lisp-infinity").

Anyways, the distinction of meaning of a name in lisp is not done by the
form of the name, but by the context in which it is found.

For example, in a function call, a name in first position is interpreted
as a function name, while the same name in any other position would be
interpreted as a variable name. In the case of a block name, the first
argument (second position in the block form) is interpreted as a block
name.

(require 'cl)  ; `block' and `return-from' come from the cl package
(defun f (f) (1+ f))
(let ((f 41))
   (block f
     (return-from f (f f))))  ; On this line, first f is a block name,
            ; second f is a function name, third f is a variable name.
--> 42




Here is a non-exhaustive example of the various meanings the name haha
may be attached to in lisp:

(require 'cl)
                   
(defmacro show (&rest exprs)
  `(progn
     ,@(mapcar (lambda (expr) `(insert (format "%60s = %S\n" ',expr ,expr))) exprs)))


(defvar haha 0)
(defun haha () 1)
(progn
  (let ((haha 3)) 
    (flet ((haha () 4))
      (block haha
        (catch 'haha
          (tagbody
             (if (zerop haha) (go haha))
             (print '(it was not zero))
             (show haha (symbol-value 'haha) 
                   (haha) (funcall (function haha)) (funcall 'haha))
             (throw 'haha nil)
           haha
             (print '(it was  zero))
             (show haha (symbol-value 'haha)
                   (haha) (funcall (function haha)) (funcall 'haha))
             (return-from haha t))))))
  (show haha (symbol-value 'haha)
        (haha) (funcall (function haha)) (funcall 'haha)))

(it was not zero)
                                                        haha = 3
                                 (symbol-value (quote haha)) = 3
                                                      (haha) = 4
                                   (funcall (function haha)) = 4
                                      (funcall (quote haha)) = 4
                                                        haha = 0
                                 (symbol-value (quote haha)) = 0
                                                      (haha) = 1
                                   (funcall (function haha)) = 1
                                      (funcall (quote haha)) = 1



(In Common Lisp, output would be different, because it has lexical
bindings as an additional meaning for names:

(IT WAS NOT ZERO) 
                                                        HAHA = 3
                                        (SYMBOL-VALUE 'HAHA) = 3
                                                      (HAHA) = 4
                                            (FUNCALL #'HAHA) = 4
                                             (FUNCALL 'HAHA) = 1
                                                        HAHA = 0
                                        (SYMBOL-VALUE 'HAHA) = 0
                                                      (HAHA) = 1
                                            (FUNCALL #'HAHA) = 1
                                             (FUNCALL 'HAHA) = 1
)

 

-- 
__Pascal Bourguignon__




* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-18 11:46 ` Pascal J. Bourguignon
@ 2010-02-18 16:57   ` John Withers
       [not found]   ` <mailman.1450.1266512270.14305.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 16+ messages in thread
From: John Withers @ 2010-02-18 16:57 UTC (permalink / raw)
  To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

On Thu, 2010-02-18 at 12:46 +0100, Pascal J. Bourguignon wrote:
> dkcombs@panix.com (David Combs) writes:
> 
> > Subj:  Perl, etc has these "?"-prefix modifiers/codes/whatever.  
> >        Precisely which does emacs have (and NOT have)?
> >
> >
> > Please, someone, make an ascii or html table or even plain text 
> > list of all these neat "new" non-standard ops that perl and
> > even php and ruby etc seem to have now, comparing them
> > to what Emacs has or doesn't have.
> 
> emacs lisp has a lot of data types.  But in lisp, the types are not
> associated with the variables, but with the values.  Therefore names
> (symbols are used to name things in general in lisp) don't need to be
> marked in any special way.

No, what he wants is for someone to go through and make a list of all
the perl lookahead/behind assertions for regular expressions, even
though the data is very easily found with a single google search and
comes down to pretty much if it has a (?<symbol> then emacs doesn't have
it, because the regexes in emacs haven't been touched since the
neolithic.
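
For what it is worth, a few constructs with a leading "?" do exist in
Emacs regexps. The sketch below assumes a reasonably recent GNU Emacs
and shows shy groups, explicitly numbered groups, and non-greedy
repetition. Lookahead (?=...) and (?!...), lookbehind (?<=...) and
(?<!...), named groups, and inline modifiers such as (?i) have no
direct equivalent in the regexp syntax itself.

```elisp
;; Regexps are written as Lisp strings, so each backslash is doubled.

;; Shy (non-capturing) group: Perl (?:ab)+ is \(?:ab\)+ in Emacs.
(string-match "\\(?:ab\\)+\\(c\\)" "ababc")
(match-string 1 "ababc")
;; => "c"   (the shy group did not take a group number)

;; Explicitly numbered group: \(?2:...\) stores its match as group 2.
(string-match "\\(?2:[0-9]+\\)-\\(?1:[a-z]+\\)" "42-foo")
(list (match-string 1 "42-foo") (match-string 2 "42-foo"))
;; => ("foo" "42")

;; Non-greedy repetition: *? +? ?? stop at the shortest match.
(string-match "<.+?>" "<a><b>")
(match-string 0 "<a><b>")
;; => "<a>"
```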

And finally he is looking for a code patch or pointers to where to look
for something like this patch you can find with a simple google search:
http://emacsbugs.donarmstrong.com/cgi/bugreport.cgi?msg=1;bug=5393

And while I am more than happy to take digs at the lack of basic google
searches and lazyweb requests, I do get the sentiment. The entire rest
of the world moved on to perl-style regular expressions a good decade
ago, and unlike many things about the world moving in a different
direction than emacs, in this case they have more functionality.
Lookahead and lookbehind assertions are useful.
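
A common workaround, sketched here with a hypothetical helper name, is
to split the pattern in two: search for the main regexp, then test the
text at point with `looking-at', which does not move point. This only
approximates Perl's (?!...), since the matcher cannot backtrack into
the first regexp when the check fails, and it assumes the first regexp
never matches the empty string.

```elisp
;; Hypothetical helper (not part of Emacs): collect every match of REGEXP
;; in STRING that is NOT immediately followed by text matching NOT-AFTER.
;; Roughly imitates Perl's REGEXP(?!NOT-AFTER) using two plain regexps.
;; Assumption: REGEXP never matches the empty string (else this loops).
(defun my-matches-not-followed-by (regexp not-after string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((case-fold-search nil)
          (found '()))
      (while (re-search-forward regexp nil t)
        ;; Point is now just after the match; `looking-at' checks the
        ;; following text without consuming it, like a lookahead.
        (unless (save-match-data (looking-at not-after))
          (push (match-string 0) found)))
      (nreverse found))))

(my-matches-not-followed-by "foo[0-9]" "bar" "foo1bar foo2baz foo3")
;; => ("foo2" "foo3")
```

`looking-back' can play the corresponding role for a lookbehind-style
check, though its documentation warns that it can be slow.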









[parent not found: <mailman.1450.1266512270.14305.help-gnu-emacs@gnu.org>]

* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
       [not found]   ` <mailman.1450.1266512270.14305.help-gnu-emacs@gnu.org>
@ 2010-02-18 19:02     ` Pascal J. Bourguignon
  2010-02-18 21:38       ` John Bokma
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Pascal J. Bourguignon @ 2010-02-18 19:02 UTC (permalink / raw)
  To: help-gnu-emacs

John Withers <grayarea@reddagger.org> writes:

> On Thu, 2010-02-18 at 12:46 +0100, Pascal J. Bourguignon wrote:
>> dkcombs@panix.com (David Combs) writes:
>> 
>> > Subj:  Perl, etc has these "?"-prefix modifiers/codes/whatever.  
>> >        Precisely which does emacs have (and NOT have)?
>> >
>> >
>> > Please, someone, make an ascii or html table or even plain text 
>> > list of all these neat "new" non-standard ops that perl and
>> > even php and ruby etc seem to have now, comparing them
>> > to what Emacs has or doesn't have.
>> 
>> emacs lisp has a lot of data types.  But in lisp, the types are not
>> associated with the variables, but with the values.  Therefore names
>> (symbols are used to name things in general in lisp) don't need to be
>> marked in any special way.
>
> No, what he wants is for someone to go through and make a list of all
> the perl lookahead/behind assertions for regular expressions, even
> though the data is very easily found with a single google search and
> comes down to pretty much if it has a (?<symbol> then emacs doesn't have
> it, because the regexes in emacs haven't been touched since the
> neolithic.
>
> And finally he is looking for a code patch or pointers to where to look
> for something like this patch you can find with a simple google search:
> http://emacsbugs.donarmstrong.com/cgi/bugreport.cgi?msg=1;bug=5393
>
> And while I am more than happy to take digs at the lack of basic google
> searches and lazyweb requests, I do get the sentiment. The entire rest
> of the world moved on to perl-style regular expressions a good decade
> ago, and unlike many things about the world moving in a different
> direction than emacs, in this case they have more functionality.
> Lookahead and lookbehind assertions are useful.

Ah, I thought he meant the $x @x #x whatever...

In the case of "regular expressions", when you add certain extensions,
they're not regular expressions at all, so, I will just cite Jamie
Zawinski:


    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions."  Now they have two problems. 



-- 
__Pascal Bourguignon__



* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-18 19:02     ` Pascal J. Bourguignon
@ 2010-02-18 21:38       ` John Bokma
  2010-02-18 21:42       ` John Withers
       [not found]       ` <mailman.1460.1266529372.14305.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 16+ messages in thread
From: John Bokma @ 2010-02-18 21:38 UTC (permalink / raw)
  To: help-gnu-emacs

pjb@informatimago.com (Pascal J. Bourguignon) writes:

>     Some people, when confronted with a problem, think "I know, I'll use
>     regular expressions."  Now they have two problems. 

   Some people, when confronted with a problem, think "I know a funny
   quote!" Now they have two problems.

-- 
John Bokma                                                               j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development



* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-18 19:02     ` Pascal J. Bourguignon
  2010-02-18 21:38       ` John Bokma
@ 2010-02-18 21:42       ` John Withers
       [not found]       ` <mailman.1460.1266529372.14305.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 16+ messages in thread
From: John Withers @ 2010-02-18 21:42 UTC (permalink / raw)
  To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

On Thu, 2010-02-18 at 20:02 +0100, Pascal J. Bourguignon wrote:
<snip>
> 
> In the case of "regular expressions", when you add certain extensions,
> they're not regular expressions at all, so, I will just cite Jamie
> Zawinski:
<snip quote we all know here>

While in the strict sense you are obviously correct, that really doesn't
matter when you are a programmer using a feature of your text editor. In
practice we aren't using them to strictly define regular languages in
some kind of formal language theory bakeoff. Well, I don't know that for
a fact. You seem like a pretty smart guy and that might be your hobby.
But in general, we are using them to get crap done.

It doesn't matter to me if for reasons of formal definition we rename a
modern regular expression engine as a MooCowPerlCrap engine in order not
to conflict with the formal definition. I still will argue that having a
MCPC engine would be a nice feature. Heck, emacs has a rich history of
using terms that no one in the wider, growing world gets as time goes on
anyway (I am talking to you, frames).

sed, grep, xemacs and pretty much the entire rest of the ecosystem
caught this idea quite some time ago, and it would be nice to have these
features in emacs. 

The quote you are pulling is from a discussion of exactly this issue, as
I am sure you are aware. But the funny thing here is that Jamie in the
last few years was using Perl extensively. He might not like it, but he
was using it:
http://regex.info/blog/2006-09-15/247#comment-3085

I don't disagree that regexes might be a pain or overused, but what I
don't get is the idea that if you are going to have them in the first
place, you don't add some pretty handy features that the rest of the
ecosystem has been using for decades now and won't degrade the base
features, if for some reason of formal purity you decide to use only
those.

I dunno, then again, I might just not be getting the emacs way. I have
only been using emacs a few years and my lisp skills aren't that strong,
and except for org-mode I use my emacs almost always in its tertiary
role as a programmer's text editor.

john withers


[parent not found: <mailman.1460.1266529372.14305.help-gnu-emacs@gnu.org>]

* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
       [not found]       ` <mailman.1460.1266529372.14305.help-gnu-emacs@gnu.org>
@ 2010-02-19  0:53         ` David Combs
  2010-02-19  1:06         ` Pascal J. Bourguignon
  1 sibling, 0 replies; 16+ messages in thread
From: David Combs @ 2010-02-19  0:53 UTC (permalink / raw)
  To: help-gnu-emacs

Yes, I was asking about Perl features in emacs' regexps.

Like all the operators with "?" prefixes.

Actually, I'm surprised someone hasn't already (years ago, even)
given some of that stuff a try.

Ilya, for instance -- he's such a whiz at implementing those
fancy regexp features for perl, making (I think) use-anywhere
libraries of them, and so on.

---

Just in passing, that Friedl book, "Mastering Regular Expressions",
is really worth having in the library.  (So you can dream of
what you could do so easily were they in emacs too!)

Thanks for all the comments!

David


* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
       [not found]       ` <mailman.1460.1266529372.14305.help-gnu-emacs@gnu.org>
  2010-02-19  0:53         ` David Combs
@ 2010-02-19  1:06         ` Pascal J. Bourguignon
  2010-02-19  2:36           ` John Withers
       [not found]           ` <mailman.1470.1266547034.14305.help-gnu-emacs@gnu.org>
  1 sibling, 2 replies; 16+ messages in thread
From: Pascal J. Bourguignon @ 2010-02-19  1:06 UTC (permalink / raw)
  To: help-gnu-emacs

John Withers <grayarea@reddagger.org> writes:

> On Thu, 2010-02-18 at 20:02 +0100, Pascal J. Bourguignon wrote:
> <snip>
>> 
>> In the case of "regular expressions", when you add certain extensions,
>> they're not regular expressions at all, so, I will just cite Jamie
>> Zawinski:
> <snip quote we all know here>
>
> While in the strict sense you are obviously correct, that really doesn't
> matter when you are a programmer using a feature of your text editor. In
> practice we aren't using them to strictly define regular languages in
> some kind of formal language theory bakeoff. Well, I don't know that for
> a fact. You seem like a pretty smart guy and that might be your hobby.
> But in general, we are using them to get crap done.
>
> It doesn't matter to me if for reasons of formal definition we rename a
> modern regular expression engine as a MooCowPerlCrap engine in order not
> to conflict with the formal definition. I still will argue that having a
> MCPC engine would be a nice feature. Heck, emacs has a rich history of
> using terms that no one in the wider, growing world gets as time goes on
> anyway (I am talking to you, frames).
>
> sed, grep, xemacs and pretty much the entire rest of the ecosystem
> caught this idea quite some time ago, and it would be nice to have these
> features in emacs. 
>
> The quote you are pulling is from a discussion of exactly this issue, as
> I am sure you are aware. But the funny thing here is that Jamie in the
> last few years was using Perl extensively. He might not like it, but he
> was using it:
> http://regex.info/blog/2006-09-15/247#comment-3085
>
> I don't disagree that regexes might be a pain or overused, but what I
> don't get is the idea that if you are going to have them in the first
> place, you don't add some pretty handy features that the rest of the
> ecosystem has been using for decades now and won't degrade the base
> features, if for some reason of formal purity you decide to use only
> those.
>
> I dunno, then again, I might just not be getting the emacs way. I have
> only been using emacs a few years and my lisp skills aren't that strong,
> and except for org-mode I use my emacs almost always in its tertiary
> role as a programmer's text editor.

One difficulty when you try to extend regular expressions is that the
time and space complexity of matching such an extended regular
expression easily becomes exponential.  In these cases, it may be easier
to write a parser than to try to force it through regular expressions,
both for the programmer's brain and for the CPU...

Otherwise, people will do anything they want to do, theory and
precedent notwithstanding.  This only demonstrates the lack of culture
of the newcomers.

-- 
__Pascal Bourguignon__



* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-19  1:06         ` Pascal J. Bourguignon
@ 2010-02-19  2:36           ` John Withers
       [not found]           ` <mailman.1470.1266547034.14305.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 16+ messages in thread
From: John Withers @ 2010-02-19  2:36 UTC (permalink / raw)
  To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

On Fri, 2010-02-19 at 02:06 +0100, Pascal J. Bourguignon wrote:

> 
> One difficulty when you try to extend regular expressions is that the
> time and space complexity of matching such an extended regular
> expression easily becomes exponential.  In these cases, it may be easier
> to write a parser than to try to force it through regular expressions,
> both for the programmer's brain and for the CPU...

Sure exponential backtracking can happen, you can write checks for
common cases and aborts, but let's say you don't. Who cares? I can write
things that go exponential for memory or clock ticks in any of the
languages I am even trivially familiar with.

> Otherwise, people will do anything they want to do, theory and
> precedent notwithstanding.  This only demonstrates the lack of culture
> of the newcomers.

Or it demonstrates the need to get things done. I can write a regex to
do a transform on 1000 text files in a directory and do the operation
before you have closed the last paren on your parser.
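
For scale, a batch transform of that kind is only a few lines of
Emacs Lisp. This is a minimal sketch with a hypothetical function
name, not something that ships with Emacs; it assumes the regexp never
matches the empty string and that overwriting the files in place is
acceptable.

```elisp
;; Hypothetical helper: replace every match of REGEXP with REPLACEMENT
;; in each regular file of DIRECTORY whose name matches FILE-REGEXP.
;; Returns the number of files that were changed.
;; Assumption: REGEXP never matches the empty string (else this loops).
(defun my-replace-in-directory (directory file-regexp regexp replacement)
  (let ((changed 0))
    (dolist (file (directory-files directory t file-regexp))
      (when (file-regular-p file)
        (with-temp-buffer
          (insert-file-contents file)
          (let ((case-fold-search nil)
                (modified nil))
            (while (re-search-forward regexp nil t)
              ;; Second argument t keeps the replacement's case as given;
              ;; \1 etc. in REPLACEMENT still refer to matched groups.
              (replace-match replacement t)
              (setq modified t))
            (when modified
              (write-region (point-min) (point-max) file nil 'silent)
              (setq changed (1+ changed)))))))
    changed))

;; Example call (paths are placeholders; this rewrites files in place):
;; (my-replace-in-directory "~/notes/" "\\.txt\\'" "colour" "color")
```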

But I do appreciate theoretical purity and those who have the expanses
of free time in which to cultivate it.

john withers


[parent not found: <mailman.1470.1266547034.14305.help-gnu-emacs@gnu.org>]

* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
       [not found]           ` <mailman.1470.1266547034.14305.help-gnu-emacs@gnu.org>
@ 2010-02-19  6:48             ` Tim X
  2010-02-20 21:14               ` John Withers
       [not found]               ` <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 16+ messages in thread
From: Tim X @ 2010-02-19  6:48 UTC (permalink / raw)
  To: help-gnu-emacs

John Withers <grayarea@reddagger.org> writes:

> On Fri, 2010-02-19 at 02:06 +0100, Pascal J. Bourguignon wrote:
>
>> 
>> One difficulty when you try to extend regular expressions is that the
>> time and space complexity of matching such an extended regular
>> expression easily becomes exponential.  In these cases, it may be easier
>> to write a parser than to try to force it through regular expressions,
>> both for the programmer's brain and for the CPU...
>
> Sure exponential backtracking can happen, you can write checks for
> common cases and aborts, but let's say you don't. Who cares? I can write
> things that go exponential for memory or clock ticks in any of the
> languages I am even trivially familiar with.
>
>> Otherwise, people will do anything they want to do, theory and
>> precedent notwithstanding.  This only demonstrates the lack of culture
>> of the newcomers.
>
> Or it demonstrates the need to get things done. I can write a regex to
> do a transform on 1000 text files in a directory and do the operation
> before you have closed the last paren on your parser.
>

I'm always amazed at these sorts of claims because they are just so
meaningless. For every concrete example you can come up with, we can
come up with others where writing the parser will be faster and more
reliable than using REs. 

The real 'trick' is knowing when an RE is the best solution and when a
parser is better. Ironically, it's often the individual's grasp of the
underlying theory that will tend to determine whether they make the
right or wrong decision. 

> But I do appreciate theoretical purity and those who have the expanses
> of free time in which to cultivate it.
>

Making some artificial distinction between the theoretical and the
practical is nonsense. You need both. A very high proportion of problems
I've seen people having with REs have been due to a lack of
theoretical understanding of how REs work, their strengths and their
weaknesses. 

I've seen far more bad uses of REs than good ones. A common example
is using regexps to parse HTML documents. This is almost always the wrong
solution and will generally only give you a partially correct result.
Correctness will degrade sharply with the number of HTML docs needing
to be processed, i.e. if you just have one document, you can probably
tweak your RE to give a correct result, but if you have to process
hundreds or thousands of such documents, you will end up spending far
more time constantly tweaking and maintaining your regular expressions
than you would have spent writing a simple html parser. Much of the
reason why REs are not good for this job is bound up in the theory
underlying REs.
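
The theory in question is mostly about nesting: a regular expression
in the strict sense cannot keep count of arbitrarily deep open/close
pairs, which is what tags inside tags require. A minimal sketch of the
counting that even a tiny hand-written scanner can do, using a
hypothetical function name and plain parentheses standing in for real
HTML tags:

```elisp
;; Hypothetical helper: return the deepest nesting level of the OPEN and
;; CLOSE characters in STRING, or nil if they are not balanced.
(defun my-max-nesting-depth (string open close)
  (let ((depth 0)
        (max-depth 0)
        (balanced t))
    (dolist (ch (string-to-list string))
      (cond ((eq ch open)
             (setq depth (1+ depth)
                   max-depth (max max-depth depth)))
            ((eq ch close)
             (setq depth (1- depth))
             ;; A closer with no matching opener can never be repaired.
             (when (< depth 0) (setq balanced nil)))))
    (and balanced (zerop depth) max-depth)))

(my-max-nesting-depth "(a (b (c)) d)" ?\( ?\))   ; => 3
(my-max-nesting-depth "(a (b)" ?\( ?\))          ; => nil
```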

It's interesting to note that one of the more significant issues facing
ruby has been with respect to its handling of REs and the problems they
have had in getting them to work efficiently. It's been a while since I
examined progress in this area, but the last time I looked, the extended
RE features being discussed here were the central problem and had
resulted in a situation where they are slow enough to make them pretty
much worthless from a practical standpoint of getting work done! I
encountered this first hand where I needed to parse a large number of
log files that were quite large. While the RE version worked, it was too
slow and used a lot of memory. In the end, I re-wrote the scripts using a
simple parser. It took less time to write, ran faster and used less
memory. The code was also a lot clearer and easier to maintain. As a
consequence, if I need to use REs I'll use perl, but if I plan to use a
simple parser, I'll probably use ruby.

Another interesting point is that I suspect ruby has a lot more active
contributors than emacs, yet they hadn't been able to greatly improve
the efficiency of REs despite considerable effort being put into the
problem (not sure what the current state is).

This I think supports Pascal's point. What use would extensions to emacs
REs be if those extensions so adversely affected performance that using
them became impractical for anything but trivial RE problems that can
already be handled with what we have?

Emacs REs are certainly not my favorite RE implementation, but that's not
because of a lack of the RE extensions that perl has made so popular. I
personally find all the '\' a much bigger PITA. 
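
For the backslashes specifically, the `rx' macro that ships with Emacs
lets a regexp be written as a Lisp form instead of a string. A minimal
sketch, with hypothetical variable names, showing an equivalent
pattern both ways:

```elisp
(require 'rx)

;; String syntax: group parentheses need a backslash, and each backslash
;; is doubled because the regexp lives inside a Lisp string.
(defvar my-pair-regexp-string "\\([0-9]+\\)-\\([a-z]+\\)")

;; rx syntax for an equivalent pattern, with no backslashes at all.
(defvar my-pair-regexp-rx
  (rx (group (one-or-more (any "0-9")))
      "-"
      (group (one-or-more (any "a-z")))))

;; Both forms find the pair starting at index 3 of the test string.
(let ((case-fold-search nil))
  (list (string-match my-pair-regexp-string "id 42-foo")
        (string-match my-pair-regexp-rx "id 42-foo")))
;; => (3 3)
```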

Emacs, like other open source projects, is largely about scratching your
own itch. If emacs doesn't have RE features someone wants, either they
use a different tool or they get off their arses and implement it.
Moaning about it and being critical because it doesn't have a feature is
just a lot of hot air. The fact it hasn't been done yet probably means
everyone else who has wanted that particular itch scratched has found a
more efficient means of doing it using another tool or a different
approach. 

Sometimes it may be simply that the person feels the task is too
daunting or too demanding or they simply don't have the
necessary skills. If this is the case, then maybe a better approach is
to play the role of facilitator or coordinator and try to find others
who are interested in contributing towards the same goals. On the other
hand, if all anyone is interested in is just moaning and doing nothing,
then they will get pretty much exactly what they deserve - sweet FA.

The same goes for posts like the OP's in this thread. Rather than asking
someone else to do the work, why not do it and contribute it
back to the project? If you don't have the skills, make a start and do
what you can and then ask for help. It is far more likely others will be
willing to assist when they see a real effort being put in. Having a go
will also result in more specific questions, which are always easier to
answer than vague broad ones. If done well, the information could easily
be added to the manual and contribute to overall improvements for other
users. It could even be started as a page on the emacs wiki, which would
make it easier for others to contribute and improve.

Posts like "Please, someone else do something that I want" rarely
achieve anything other than making readers think it's just a moan from
someone who is frustrated but not frustrated enough to do anything about
their problem except moan. While it's fine to be lazy, being lazy and
fussy is just a recipe to make one miserable. Being lazy, fussy and a
moaner just adds noise that makes it harder to find relevant
information. 

Tim

-- 
tcross (at) rapttech dot com dot au


* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-19  6:48             ` Tim X
@ 2010-02-20 21:14               ` John Withers
       [not found]               ` <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 16+ messages in thread
From: John Withers @ 2010-02-20 21:14 UTC (permalink / raw)
  To: Tim X; +Cc: help-gnu-emacs

Tim, 

You are completely correct on all counts. What I should have said was
that for many classes of problems I run into during my daily work the
ability to write a regex is much faster than using a parser (and
definitely than writing one). And that I find the classes of problems
that fit that mold increased by having lookahead/behind assertions.

I use parsers more frequently than I use regexes, but for a lot of the
one-shot work I do on logs, semi-structured text files of various types
and, in very limited cases, html that is already processed in some way,
a quick regex is much faster for me, and I imagine for almost everyone,
but I could be wrong.

But in reality, as you pointed out, I shouldn't have been in the
discussion at all. Next week I am going to have time to look at Tomohiro
Matsuyama's patch that I referenced in the first of my posts in this
thread. My comments should have been restricted to just saying that I am
looking forward to doing so. 

Thank you for pointing this out.

john withers
 
On Fri, 2010-02-19 at 17:48 +1100, Tim X wrote:
> John Withers <grayarea@reddagger.org> writes:
> 
> > On Fri, 2010-02-19 at 02:06 +0100, Pascal J. Bourguignon wrote:
> >
> >> 
> >> One difficulty when you try to extend regular expression is that the
> >> time and space complexity of matching such an extended regular
> >> expression easily becomes exponential.  In these cases, it may be easier
> >> to write a parser, than to try to force it thru regular expressions,
> >> both for the programmer's brain and for the CPU processor...
> >
> > Sure exponential backtracking can happen, you can write checks for
> > common cases and aborts, but let's say you don't. Who cares? I can write
> > things that go exponential for memory or clock ticks in any of the
> > languages I am even trivially familiar with.
> >
> >> Otherwise, people will do anything they want to do, theory and
> >> precedent notwithstanding.  This only demonstrates the lack of culture of the
> >> newcomers.
> >
> > Or it demonstrates the need to get things done. I can write a regex to
> > do a transform on 1000 text files in a directory and do the operation
> > before you have closed the last paren on your parser.
> >
> 
> I'm always amazed at these sorts of claims because they are just so
> meaningless. For every concrete example you can come up with, we can
> come up with others where writing the parser will be faster and more
> reliable than using REs. 
<snip>
> 
> Posts like "Please, someone else do something that I want" rarely
> achieve anything other than making readers think it's just a moan from
> someone who is frustrated but not frustrated enough to do anything about
> their problem except moan. While it's fine to be lazy, being lazy and
> fussy is just a recipe to make one miserable. Being lazy, fussy and a
> moaner just adds noise that makes it harder to find relevant
> information. 
> 
> Tim
> 
> 






[parent not found: <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org>]

* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
       [not found]               ` <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org>
@ 2010-02-23 12:33                 ` Tim Landscheidt
  0 siblings, 0 replies; 16+ messages in thread
From: Tim Landscheidt @ 2010-02-23 12:33 UTC (permalink / raw)
  To: help-gnu-emacs

John Withers <grayarea@reddagger.org> wrote:

> You are completely correct on all counts. What I should have said was
> that for many classes of problems I run into during my daily work the
> ability to write a regex is much faster than using a parser (and
> definitely than writing one). And that I find the classes of problems
> that fit that mold increased by having lookahead/behind assertions.

> I use parsers more frequently than I use regexes, but for a lot of the
> one-shot work I do on logs, semi-structured text files of various types
> and, in very limited cases, html that is already processed in some way,
> a quick regex is much faster for me, and I imagine for almost everyone,
> but I could be wrong.
> [...]

Same with me; and I would add that maintainability increases
drastically as well. If I have to revisit a term
like:

| (while (re-search-forward "foo" nil t)
|   ;; Keep the "foo" match data intact while peeking ahead.
|   (unless (save-match-data (looking-at "bar"))
|     (replace-match "XYZ")))

or even:

| foo\($\|[^b]\|b[^a]\|ba[^r]\)

I need much more time to understand what I tried to achieve
than with:

| foo(?!bar)

Furthermore, if the handmade parser has a subtle deviation
from what you would expect it to look like, you spend even
more time figuring out whether that was a wanted effect or a
bug not yet discovered.
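
A rough sketch of how the loop above might be wrapped up for reuse, since Emacs regexps have no `(?!...)`. The helper name `my-replace-unless-followed-by` is hypothetical (nothing like it ships with Emacs), and the sample strings are invented. The second form illustrates the "subtle deviation" point: the hand-expanded alternation needs a following character (or end of line) and swallows it, so it does not behave exactly like `foo(?!bar)`.

```elisp
;; Hypothetical helper, not part of Emacs: roughly emulates
;; s/REGEXP(?!LOOKAHEAD)/REPLACEMENT/g from point to end of buffer.
(defun my-replace-unless-followed-by (regexp lookahead replacement)
  "Replace each match of REGEXP with REPLACEMENT unless LOOKAHEAD follows it.
Return the number of replacements made."
  (let ((count 0))
    (while (re-search-forward regexp nil t)
      ;; `looking-at' sets the match data when it succeeds, so shield the
      ;; match that `replace-match' relies on.
      (unless (save-match-data (looking-at lookahead))
        (replace-match replacement t t)
        (setq count (1+ count))))
    count))

(with-temp-buffer
  (insert "foobar foobaz foo")
  (goto-char (point-min))
  (my-replace-unless-followed-by "foo" "bar" "XYZ")
  (buffer-string))
;; => "foobar XYZbaz XYZ"

;; The hand-expanded alternation from above, checked on two strings.
(let ((expanded "foo\\($\\|[^b]\\|b[^a]\\|ba[^r]\\)"))
  (list (string-match-p expanded "foob")   ; foo(?!bar) would match here
        (and (string-match expanded "foo baz")
             (match-end 0))))              ; match also covers the space
;; => (nil 4)
```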

Tim



* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-18  6:10 Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)? David Combs
  2010-02-18 11:46 ` Pascal J. Bourguignon
@ 2010-02-18 16:23 ` Tyler Smith
       [not found] ` <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>
  2010-02-24 19:54 ` Stefan Monnier
  3 siblings, 0 replies; 16+ messages in thread
From: Tyler Smith @ 2010-02-18 16:23 UTC (permalink / raw)
  To: help-gnu-emacs

dkcombs@panix.com (David Combs) writes:

> Subj:  Perl, etc has these "?"-prefix modifiers/codes/whatever.  
>        Precisely which does emacs have (and NOT have)?
>
>
> Please, someone, make an ascii or html table or even plain text 
> list of all these neat "new" non-standard ops that perl and
> even php and ruby etc seem to have now, comparing them
> to what Emacs has or doesn't have.

I don't understand your question. Emacs' regexp facilities are
explained in the manual, (info "(emacs)Regexps"). You will find on that
page a link to further details for programmers. I don't know what new
non-standard ops perl and php and ruby have, but if they aren't in the
Emacs manual, then they most probably aren't in Emacs.

> So you see a nifty regexp in that book, and you want to try
> it in emacs --- what a bear, trying to convert it into
> something that emacs understands.

It sounds like you're trying to use a manual for one group of
applications to learn another application. No question, that must be
quite a bear. Probably easier to use the manual for Emacs to learn
Emacs.

Tyler


[parent not found: <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>]

* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
       [not found] ` <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>
@ 2010-02-19  0:59   ` David Combs
  2010-02-19  3:22     ` Tyler Smith
  0 siblings, 1 reply; 16+ messages in thread
From: David Combs @ 2010-02-19  0:59 UTC (permalink / raw)
  To: help-gnu-emacs

In article <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>,
Tyler Smith  <tyler.smith@eku.edu> wrote:
>dkcombs@panix.com (David Combs) writes:
>
>> Subj:  Perl, etc has these "?"-prefix modifiers/codes/whatever.  
>>        Precisely which does emacs have (and NOT have)?
>>
>>
>> Please, someone, make an ascii or html table or even plain text 
>> list of all these neat "new" non-standard ops that perl and
>> even php and ruby etc seem to have now, comparing them
>> to what Emacs has or doesn't have.
>
>I don't understand your question. Emacs' regexp facilities are
>explained in the manual, (info "(emacs)Regexps"). You will find on that
>page a link to further details for programmers. I don't know what new
>non-standard ops perl and php and ruby have, ...

Well, just get "Mastering Regular Expressions", by Jeffrey Friedl,
2nd edition -- it'll blow you away.

The title maybe should have been

        Everything you wanted to know about regular expressions,
               but were afraid to ask.

Plus a lot of stuff maybe you never did want to know, too.

On regexps, this book is THE BIBLE!

Next time you're in a bookstore and you spy one, have a look.

Ten to one you'll buy the thing.


David





* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-19  0:59   ` David Combs
@ 2010-02-19  3:22     ` Tyler Smith
  0 siblings, 0 replies; 16+ messages in thread
From: Tyler Smith @ 2010-02-19  3:22 UTC (permalink / raw)
  To: help-gnu-emacs

dkcombs@panix.com (David Combs) writes:

> In article <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>,
> Tyler Smith  <tyler.smith@eku.edu> wrote:
>>dkcombs@panix.com (David Combs) writes:
>>
>>> Subj:  Perl, etc has these "?"-prefix modifiers/codes/whatever.  
>>>        Precisely which does emacs have (and NOT have)?
>>>
>>>
>>> Please, someone, make an ascii or html table or even plain text 
>>> list of all these neat "new" non-standard ops that perl and
>>> even php and ruby etc seem to have now, comparing them
>>> to what Emacs has or doesn't have.
>>
>>I don't understand your question. Emacs' regexp facilities are
>>explained in the manual, (info "(emacs)Regexps"). You will find on that
>>page a link to further details for programmers. I don't know what new
>>non-standard ops perl and php and ruby have, ...
>
> Well, just get "Mastering Regular Expressions", by Jeffrey Friedl,
> 2nd edition -- it'll blow you away.
>

I have already read this book, and it did inspire me to incorporate
regexps into my emacs toolbox. However, it *does* include a fairly
complete discussion of Emacs' regexps. So if you already have both that
book and the built-in Emacs manual, what further documentation do you
want? If you don't find a feature in the Emacs chapter of Friedl's book or
the manual, it doesn't exist _in Emacs_.

But maybe, your first post notwithstanding, you aren't really asking for
new documentation, but in fact new features to be added to the regexps
available in Emacs? That's another issue entirely, as other posters have
commented on.

Tyler


* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
  2010-02-18  6:10 Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)? David Combs
                   ` (2 preceding siblings ...)
       [not found] ` <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>
@ 2010-02-24 19:54 ` Stefan Monnier
  3 siblings, 0 replies; 16+ messages in thread
From: Stefan Monnier @ 2010-02-24 19:54 UTC (permalink / raw)
  To: help-gnu-emacs

> Please, someone, make an ascii or html table or even plain text 
> list of all these neat "new" non-standard ops that perl and
> even php and ruby etc seem to have now, comparing them
> to what Emacs has or doesn't have.

As mentioned, Emacs regexps are fairly well described in the Emacs
manual and the Elisp manual.

Indeed Emacs regexps don't support all the fancy additions present in
things like Perl.  IIRC someone provided a patch for the look-ahead and
look-behind features, but it got lost somewhere along the way.

Emacs likes regexps, so it makes sense to support these things.  At the
same time, I've known them for a long time but have rarely found a need
for them in my Emacs experience.

Personally I'd like to add a DFA-based regex engine to Emacs, so as to
get rid of the exponential backtracking problem that shows up every once
in a while (it also has a few other advantages, such as the ability to
do the regexp matching a chunk at a time).
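
To make the DFA idea a little more concrete, here is a toy sketch: a hand-built automaton for the made-up pattern `ab*c`, matched against the whole input. It is not how the Emacs regex engine works and not a proposal for it; `my-dfa-transitions` and `my-dfa-feed` are hypothetical names. Each character costs one table lookup, so there is nothing to backtrack over, and the state can be carried from one chunk of input to the next.

```elisp
;; Hypothetical sketch: a hand-built DFA for the regexp "ab*c".
;; None of these names exist in Emacs.
(defconst my-dfa-transitions
  '((start  . ((?a . seen-a)))
    (seen-a . ((?b . seen-a) (?c . accept)))
    (accept . ()))
  "Alist mapping a state to an alist of (CHAR . NEXT-STATE).")

(defun my-dfa-feed (state chunk)
  "Advance STATE over the string CHUNK and return the new state.
Return the symbol `dead' once no transition applies."
  (let ((i 0)
        (len (length chunk)))
    (while (and (< i len) (not (eq state 'dead)))
      ;; One lookup per character; a missing entry means the match failed.
      (setq state (or (cdr (assq (aref chunk i)
                                 (cdr (assq state my-dfa-transitions))))
                      'dead))
      (setq i (1+ i)))
    state))

;; Feed "abbbc" in three chunks, keeping only the state in between.
(let ((state 'start))
  (dolist (chunk '("ab" "bb" "c"))
    (setq state (my-dfa-feed state chunk)))
  (eq state 'accept))
;; => t
```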

        Stefan


end of thread, 2010-02-24 19:54 UTC

Thread overview: 16+ messages
2010-02-18  6:10 Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)? David Combs
2010-02-18 11:46 ` Pascal J. Bourguignon
2010-02-18 16:57   ` John Withers
     [not found]   ` <mailman.1450.1266512270.14305.help-gnu-emacs@gnu.org>
2010-02-18 19:02     ` Pascal J. Bourguignon
2010-02-18 21:38       ` John Bokma
2010-02-18 21:42       ` John Withers
     [not found]       ` <mailman.1460.1266529372.14305.help-gnu-emacs@gnu.org>
2010-02-19  0:53         ` David Combs
2010-02-19  1:06         ` Pascal J. Bourguignon
2010-02-19  2:36           ` John Withers
     [not found]           ` <mailman.1470.1266547034.14305.help-gnu-emacs@gnu.org>
2010-02-19  6:48             ` Tim X
2010-02-20 21:14               ` John Withers
     [not found]               ` <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org>
2010-02-23 12:33                 ` Tim Landscheidt
2010-02-18 16:23 ` Tyler Smith
     [not found] ` <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>
2010-02-19  0:59   ` David Combs
2010-02-19  3:22     ` Tyler Smith
2010-02-24 19:54 ` Stefan Monnier
