* Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
@ 2010-02-18  6:10 David Combs
  2010-02-18 11:46 ` Pascal J. Bourguignon
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: David Combs @ 2010-02-18  6:10 UTC
To: help-gnu-emacs

Subj: Perl, etc has these "?"-prefix modifiers/codes/whatever.
      Precisely which does emacs have (and NOT have)?

Please, someone, make an ascii or html table or even plain text
list of all these neat "new" non-standard ops that perl and
even php and ruby etc seem to have now, comparing them
to what Emacs has or doesn't have.

Also point to any .el-files that upgrade that stuff.

---

Friedl's book "Mastering Regular Expressions", both editions,
covered Emacs re its regexp ability. Very Nice!

Now there's A NEW REGEXP BOOK (also O'Reilly), a "COOKBOOK" on
regexps, which covers, with examples and pointed-out differences,
a whole bunch of systems: perl, php, ruby, ... <I forget, but
there's a bunch of them>, but does NOT (DAMMIT!) even MENTION emacs!

Note: the credits say thanks to Friedl (maybe also Ilya, I forget)
for suggestions on the various drafts.

So you see a nifty regexp in that book, and you want to try it
in emacs --- what a bear, trying to convert it into something
that emacs understands.

What say, some guru in emacs regexps? (Ilya? Friedl? ...)

Thanks!

David
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Pascal J. Bourguignon @ 2010-02-18 11:46 UTC
To: help-gnu-emacs

dkcombs@panix.com (David Combs) writes:

> Subj: Perl, etc has these "?"-prefix modifiers/codes/whatever.
>       Precisely which does emacs have (and NOT have)?
>
>
> Please, someone, make an ascii or html table or even plain text
> list of all these neat "new" non-standard ops that perl and
> even php and ruby etc seem to have now, comparing them
> to what Emacs has or don't have.

emacs lisp has a lot of data types. But in lisp, the types are not
associated with the variables, but with the values. Therefore names
(symbols are used to name things in general in lisp) don't need to be
marked in any special way.

In lisps where there are both lexical bindings and dynamic bindings,
such as Common Lisp, there's a convention to distinguish dynamic
variables from lexical variables:

- Dynamic variables are surrounded by stars: *variable*

- Lexical variables are surrounded by nothing: variable

in addition:

- Constant variables are surrounded by pluses: +variable+

But in emacs lisp, there is only dynamic binding, so this convention is
not applied by emacs lisp programmers in general (but Common Lisp
programmers writing emacs lisp code tend to use it, so you may see it
applied for *global-variables* vs. local-variables).


Finally, in lisp, a name may have several meanings. We often
distinguish the variable and the function meanings, and call lisps that
distinguish them "lisp-2", while lisps that don't (eg. scheme or LeLisp)
are called "lisp-1". But there are a lot of other meanings a name may
take. For example, in emacs lisp, the same name may be used for a
variable, a function, a tagbody tag, a catch tag (catch also takes other
objects), a block name, etc. And moreover, as a programmer, you can add
new meanings to a name by writing new functions and macros (so the
classifications should really be "lisp-infinity+1" vs. "lisp-infinity").

Anyways, the distinction of meaning of a name in lisp is not done by the
form of the name, but by the context in which it is found.

For example, in a function call, a name in first position is interpreted
as a function name, while the same name in another position would be
interpreted as a variable name. In the case of a block name, the first
argument (second position in the block form) is interpreted as a block
name.

    (defun f (f) (1+ f))
    (let ((f 41))
      (block f
        (return-from f (f f)))) ; On this line, first f is a block name,
                                ; second f is a function name,
                                ; third f is a variable name.
    --> 42


Here is a non-exhaustive example of the various meanings the name haha
may be attached to in lisp:

    (require 'cl)

    (defmacro show (&rest exprs)
      `(progn
         ,@(mapcar (lambda (expr)
                     `(insert (format "%60s = %S\n" ',expr ,expr)))
                   exprs)))


    (defvar haha 0)
    (defun haha () 1)
    (progn
      (let ((haha 3))
        (flet ((haha () 4))
          (block haha
            (catch 'haha
              (tagbody
                 (if (zerop haha) (go haha))
                 (print '(it was not zero))
                 (show haha (symbol-value 'haha)
                       (haha) (funcall (function haha)) (funcall 'haha))
                 (throw 'haha nil)
               haha
                 (print '(it was zero))
                 (show haha (symbol-value 'haha)
                       (haha) (funcall (function haha)) (funcall 'haha))
                 (return-from haha t))))))
      (show haha (symbol-value 'haha)
            (haha) (funcall (function haha)) (funcall 'haha)))

    (it was not zero)
                           haha = 3
    (symbol-value (quote haha)) = 3
                         (haha) = 4
      (funcall (function haha)) = 4
         (funcall (quote haha)) = 4
                           haha = 0
    (symbol-value (quote haha)) = 0
                         (haha) = 1
      (funcall (function haha)) = 1
         (funcall (quote haha)) = 1


(In Common Lisp, output would be different, because it has lexical
bindings as an additional meaning for names:

    (IT WAS NOT ZERO)
                    HAHA = 3
    (SYMBOL-VALUE 'HAHA) = 3
                  (HAHA) = 4
        (FUNCALL #'HAHA) = 4
         (FUNCALL 'HAHA) = 1
                    HAHA = 0
    (SYMBOL-VALUE 'HAHA) = 0
                  (HAHA) = 1
        (FUNCALL #'HAHA) = 1
         (FUNCALL 'HAHA) = 1
)

--
__Pascal Bourguignon__
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: John Withers @ 2010-02-18 16:57 UTC
To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

On Thu, 2010-02-18 at 12:46 +0100, Pascal J. Bourguignon wrote:
> dkcombs@panix.com (David Combs) writes:
>
> > Subj: Perl, etc has these "?"-prefix modifiers/codes/whatever.
> >       Precisely which does emacs have (and NOT have)?
> >
> >
> > Please, someone, make an ascii or html table or even plain text
> > list of all these neat "new" non-standard ops that perl and
> > even php and ruby etc seem to have now, comparing them
> > to what Emacs has or don't have.
>
> emacs lisp has a lot of data types. But in lisp, the types are not
> associated to the variables, but to the values. Therefore names
> (symbols are used to name things in general in lisp) don't need to be
> marked in any special way.

No, what he wants is for someone to go through and make a list of all
the perl lookahead/behind assertions for regular expressions, even
though the data is very easily found with a single google search and
comes down to pretty much this: if it has a (?<symbol> then emacs
doesn't have it, because the regexes in emacs haven't been touched
since the neolithic.

And finally he is looking for a code patch, or pointers to where to look
for something like this patch, which you can find with a simple google
search:
http://emacsbugs.donarmstrong.com/cgi/bugreport.cgi?msg=1;bug=5393

And while I am more than happy to take digs at the lack of basic google
searches and lazyweb requests, I do get the sentiment. The entire rest
of the world moved on to perl-style regular expressions a good decade
ago, and unlike many things about the world moving in a different
direction than emacs, in this case they have more functionality.
Lookahead and lookbehind assertions are useful.
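For the question that started the thread, a rough inventory may help. As
far as I know, stock Emacs regexps accept only a few "?"-flavoured
constructs: shy groups \(?: ... \), explicitly numbered groups
\(?NUM: ... \) in recent versions, and the non-greedy operators *?, +?
and ??. Lookahead (?=...) and (?!...), lookbehind (?<=...) and (?<!...),
named groups, inline modifiers such as (?i) and atomic groups (?>...)
are not available without a patch like the one linked above. A minimal
sketch of the supported ones, with made-up sample strings:

```elisp
;; Shy (non-capturing) group: \(?: ... \) groups without taking a number.
(let ((s "foobar"))
  (when (string-match "\\(?:foo\\)\\(bar\\)" s)
    (match-string 1 s)))
;; => "bar"   (the shy group is not counted, so "bar" is group 1)

;; Explicitly numbered groups: \(?NUM: ... \) chooses the group number.
(let ((s "2010-02-18"))
  (when (string-match "\\(?3:[0-9]+\\)-\\(?2:[0-9]+\\)-\\(?1:[0-9]+\\)" s)
    (list (match-string 1 s) (match-string 2 s) (match-string 3 s))))
;; => ("18" "02" "2010")

;; Non-greedy repetition: *? matches as little as possible.
(let ((s "<a><b>"))
  (when (string-match "<.*?>" s)
    (match-string 0 s)))
;; => "<a>"
```

Anything in a Perl-style pattern that starts with "(?" and is not one of
these has to be rewritten by hand before Emacs will accept it.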
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Pascal J. Bourguignon @ 2010-02-18 19:02 UTC
To: help-gnu-emacs

John Withers <grayarea@reddagger.org> writes:

> On Thu, 2010-02-18 at 12:46 +0100, Pascal J. Bourguignon wrote:
>> dkcombs@panix.com (David Combs) writes:
>>
>> > Subj: Perl, etc has these "?"-prefix modifiers/codes/whatever.
>> >       Precisely which does emacs have (and NOT have)?
>> >
>> >
>> > Please, someone, make an ascii or html table or even plain text
>> > list of all these neat "new" non-standard ops that perl and
>> > even php and ruby etc seem to have now, comparing them
>> > to what Emacs has or don't have.
>>
>> emacs lisp has a lot of data types. But in lisp, the types are not
>> associated to the variables, but to the values. Therefore names
>> (symbols are used to name things in general in lisp) don't need to be
>> marked in any special way.
>
> No, what he wants is for someone to go through and make a list of all
> the perl lookahead/behind assertions for regular expressions, even
> though the data is very easily found with a single google search and
> comes down to pretty much if it has a (?<symbol> then emacs doesn't have
> it, because the regexes in emacs haven't been touched since the
> neolithic.
>
> And finally he is looking for a code patch or pointers to where to look
> for something like this patch you can find with a simple google search:
> http://emacsbugs.donarmstrong.com/cgi/bugreport.cgi?msg=1;bug=5393
>
> And while I am more than happy to take digs at the lack of basic google
> searches and lazyweb requests, I do get the sentiment. At this point the
> entire rest of the world has moved on to perl-style regular expressions
> a good decade ago, and unlike many things about the world moving in a
> different direction than emacs, in this case they have more
> functionality. Lookahead and lookbehind assertions are useful.

Ah, I thought he meant the $x @x #x whatever...


In the case of "regular expressions", when you add certain extensions,
they're not regular expressions at all, so, I will just cite Jamie
Zawinski:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

--
__Pascal Bourguignon__
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: John Bokma @ 2010-02-18 21:38 UTC
To: help-gnu-emacs

pjb@informatimago.com (Pascal J. Bourguignon) writes:

>     Some people, when confronted with a problem, think "I know, I'll use
>     regular expressions." Now they have two problems.

Some people, when confronted with a problem, think "I know a funny
quote!" Now they have two problems.

--
John Bokma                                                        j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: John Withers @ 2010-02-18 21:42 UTC
To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

On Thu, 2010-02-18 at 20:02 +0100, Pascal J. Bourguignon wrote:
<snip>
>
> In the case of "regular expressions", when you add certain extensions,
> they're not regular expressions at all, so, I will just cite Jamie
> Zawinski:
<snip quote we all know here>

While in the strict sense you are obviously correct, that really doesn't
matter when you are a programmer using a feature of your text editor. In
practice we aren't using them to strictly define regular languages in
some kind of formal language theory bakeoff. Well, I don't know that for
a fact. You seem like a pretty smart guy and that might be your hobby.
But in general, we are using them to get crap done.

It doesn't matter to me if, for reasons of formal definition, we rename
a modern regular expression engine as a MooCowPerlCrap engine in order
not to conflict with the formal definition. I will still argue that
having a MCPC engine would be a nice feature. Heck, emacs has a rich
history of using terms that no one in the wider, growing world gets as
time goes on anyway (I am talking to you, frames).

sed, grep, xemacs and pretty much the entire rest of the ecosystem
caught this idea quite some time ago, and it would be nice to have these
features in emacs.

The quote you are pulling is from a discussion of exactly this issue, as
I am sure you are aware. But the funny thing here is that Jamie in the
last few years was using Perl extensively. He might not like it, but he
was using it:
http://regex.info/blog/2006-09-15/247#comment-3085

I don't disagree that regexes might be a pain or overused. What I don't
get is the idea that, if you are going to have them in the first place,
you don't add some pretty handy features that the rest of the ecosystem
has been using for decades now. They won't degrade the base features if,
for some reason of formal purity, you decide to use only those.

I dunno, then again, I might just not be getting the emacs way. I have
only been using emacs a few years and my lisp skills aren't that strong,
and except for org-mode I use my emacs almost always in its tertiary
role as a programmer's text editor.

john withers
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: David Combs @ 2010-02-19  0:53 UTC
To: help-gnu-emacs

Yes, I was asking about Perl features in emacs' regexps.

Like all the operators with "?" prefixes.

Actually, I'm surprised someone hasn't already (years ago, even)
given some of that stuff a try. Ilya, for instance -- he's such
a whiz at implementing those fancy regexp features for perl,
making (I think) use-anywhere libraries of them, and so on.

---

Just in passing, that Friedl book, "Mastering Regular Expressions",
is really worth having in the library. (So you can dream of what
you could do so easily were they in emacs too!)

Thanks for all the comments!

David
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Pascal J. Bourguignon @ 2010-02-19  1:06 UTC
To: help-gnu-emacs

John Withers <grayarea@reddagger.org> writes:

> On Thu, 2010-02-18 at 20:02 +0100, Pascal J. Bourguignon wrote:
> <snip>
>>
>> In the case of "regular expressions", when you add certain extensions,
>> they're not regular expressions at all, so, I will just cite Jamie
>> Zawinski:
> <snip quote we all know here>
>
> While in the strict sense you are obviously correct, that really doesn't
> matter when you are programmer using a feature of your text editor. In
> practice we aren't using them to strictly define regular languages in
> some kind of formal language theory bakeoff. Well, I don't know that for
> a fact. You seem like a pretty smart guy and that might be your hobby.
> But in general, we are using them to get crap done.
>
> It doesn't matter to me if for reasons of formal definition we rename a
> modern regular expression engine as a MooCowPerlCrap engine in order not
> to conflict with the formal definition. I still will argue that having a
> MCPC engine would be a nice feature. Heck, emacs has a rich history of
> using terms that no one in the wider, growing world gets as time goes on
> anyway (I am talking to you, frames).
>
> sed, grep, xemacs and pretty much the entire rest of the ecosystem
> caught this idea quite some time ago, and it would be nice to have these
> features in emacs.
>
> The quote you are pulling is from a discussion of exactly this issue, as
> I am sure you are aware. But the funny thing here is that Jamie in the
> last few years was using Perl extensively. He might not like it, but he
> was using it:
> http://regex.info/blog/2006-09-15/247#comment-3085
>
> I don't disagree that regexes might be a pain or overused, but what I
> don't get is the idea that if you are going to have them in the first
> place, you don't add some pretty handy features that the rest of the
> ecosystem has been using for decades now and won't degrade the base
> features, if for some reason of formal purity you decide to use only
> those.
>
> I dunno, then again, I might just not be getting the emacs way. I have
> only been using emacs a few years and my lisp skills aren't that strong,
> and except for org-mode I use my emacs almost always in its tertiary
> role as a programmers text editor.

One difficulty when you try to extend regular expressions is that the
time and space complexity of matching such an extended regular
expression easily becomes exponential. In these cases, it may be easier
to write a parser than to try to force it through regular expressions,
both for the programmer's brain and for the CPU...

Otherwise, people will do anything they want to do, theory and
precedent notwithstanding. This only demonstrates the lack of culture
of the newcomers.

--
__Pascal Bourguignon__
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: John Withers @ 2010-02-19  2:36 UTC
To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

On Fri, 2010-02-19 at 02:06 +0100, Pascal J. Bourguignon wrote:

>
> One difficulty when you try to extend regular expression is that the
> time and space complexity of matching such an extended regular
> expression easily becomes exponential. In these cases, it may be easier
> to write a parser, than to try to force it thru regular expressions,
> both for the programmer's brain and for the CPU processor...

Sure, exponential backtracking can happen. You can write checks for
common cases and aborts, but let's say you don't. Who cares? I can write
things that go exponential for memory or clock ticks in any of the
languages I am even trivially familiar with.

> Otherwise, people will do anything they want to do, theory and
> precedent notwithstanding. This only demonstrates the lack of culture
> of the newcomers.

Or it demonstrates the need to get things done. I can write a regex to
do a transform on 1000 text files in a directory and do the operation
before you have closed the last paren on your parser.

But I do appreciate theoretical purity and those who have the expanses
of free time in which to cultivate it.

john withers
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Tim X @ 2010-02-19  6:48 UTC
To: help-gnu-emacs

John Withers <grayarea@reddagger.org> writes:

> On Fri, 2010-02-19 at 02:06 +0100, Pascal J. Bourguignon wrote:
>
>>
>> One difficulty when you try to extend regular expression is that the
>> time and space complexity of matching such an extended regular
>> expression easily becomes exponential. In these cases, it may be easier
>> to write a parser, than to try to force it thru regular expressions,
>> both for the programmer's brain and for the CPU processor...
>
> Sure exponential backtracking can happen, you can write checks for
> common cases and aborts, but let's say you don't. Who cares? I can write
> things that go exponential for memory or clock ticks in any of the
> languages I am even trivially familiar with.
>
>> Otherwise, people will do anything they want to do, theory and
>> precedent notwithstanding. This only demonstrates the lack of culture
>> of the newcomers.
>
> Or it demonstrates the need to get things done. I can write a regex to
> do a transform on 1000 text files in a directory and do the operation
> before you have closed the last paren on your parser.
>

I'm always amazed at these sorts of claims because they are just so
meaningless. For every concrete example you can come up with, we can
come up with others where writing the parser will be faster and more
reliable than using REs. The real 'trick' is knowing when an RE is the
best solution and when a parser is better. Ironically, it's often the
individual's grasp of the underlying theory that will tend to determine
whether they make the right or wrong decision.

> But I do appreciate theoretical purity and those who have the expanses
> of free time in which to cultivate it.
>

Making some artificial distinction between the theoretical and the
practical is nonsense. You need both. A very high proportion of the
problems I've seen people have with REs have been due to a lack of
theoretical understanding of how REs work, their strengths and their
weaknesses.

I've seen far more bad uses of REs than good ones. A common example is
using regexps to parse HTML documents. This is almost always the wrong
solution and will generally only give you a partially correct result.
Correctness will degrade sharply with the number of HTML docs needing
to be processed. i.e. if you just have one document, you can probably
tweak your RE to give a correct result, but if you have to process
hundreds or thousands of such documents, you will end up spending far
more time constantly tweaking and maintaining your regular expressions
than you would have spent writing a simple html parser. Much of the
reason why REs are not good for this job is bound up in the theory
underlying REs.

It's interesting to note that one of the more significant issues facing
ruby has been with respect to its handling of REs and the problems they
have had in getting them to work efficiently. It's been a while since I
examined progress in this area, but the last time I looked, the extended
RE features being discussed here were the central problem and had
resulted in a situation where REs were slow enough to make them pretty
much worthless from a practical standpoint of getting work done! I
encountered this first hand when I needed to parse a large number of
quite large log files with REs. While the result was correct, it was too
slow and used a lot of memory. In the end, I re-wrote the scripts using
a simple parser. It took less time to write, ran faster and used less
memory. The code was also a lot clearer and easier to maintain. As a
consequence, if I need to use REs I'll use perl, but if I plan to use a
simple parser, I'll probably use ruby.

Another interesting point is that I suspect ruby has a lot more active
contributors than emacs, yet they hadn't been able to greatly improve
the efficiency of REs despite considerable effort being put into the
problem (not sure what the current state is). This I think supports
Pascal's point. What use would extensions to emacs REs be if those
extensions so adversely affected performance that using them became
impractical for anything but trivial RE problems that can already be
handled with what we have?

Emacs REs are certainly not my favorite RE implementation, but that's
not because of a lack of the RE extensions that perl has made so
popular. I personally find all the '\' a much bigger PITA.

Emacs, like other open source projects, is largely about scratching your
own itch. If emacs doesn't have RE features someone wants, either they
use a different tool or they get off their arses and implement it.
Moaning about it and being critical because it doesn't have a feature
is just a lot of hot air. The fact it hasn't been done yet probably
means everyone else who has wanted that particular itch scratched has
found a more efficient means of doing it using another tool or a
different approach.

Sometimes it may be simply that the person feels the task is too
daunting or too demanding, or they simply don't have the necessary
skills. If this is the case, then maybe a better approach is to play the
role of facilitator or coordinator and try to find others who are
interested in contributing towards the same goals. On the other hand, if
all anyone is interested in is just moaning and doing nothing, then they
will get pretty much exactly what they deserve - sweet FA.

The same goes for posts like the OP's in this thread. Rather than asking
someone else to do the work, why not do it and contribute it back to the
project? If you don't have the skills, make a start and do what you can
and then ask for help. It is far more likely others will be willing to
assist when they see a real effort being put in. Having a go will also
result in more specific questions, which are always easier to answer
than vague broad ones. If done well, the information could easily be
added to the manual and contribute to overall improvements for other
users. It could even be started as a page on the emacs wiki, which would
make it easier for others to contribute and improve.

Posts like "Please, someone else do something that I want" rarely
achieve anything other than making readers think it's just a moan from
someone who is frustrated but not frustrated enough to do anything about
their problem except moan. While it's fine to be lazy, being lazy and
fussy is just a recipe to make one miserable. Being lazy, fussy and a
moaner just adds noise that makes it harder to find relevant
information.

Tim

--
tcross (at) rapttech dot com dot au
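The point about nested markup can be seen on a toy case. The sketch
below uses balanced parentheses as a stand-in for nested HTML tags (an
assumption made to keep it short; it is not an HTML parser). A regexp
stops at the first closing delimiter it meets, while a syntax-aware
scanner such as forward-sexp finds the matching one:

```elisp
(let ((s "(a (b) c) d"))
  (list
   ;; Regexp attempt: the non-greedy match ends at the first ")",
   ;; which belongs to the inner group.
   (when (string-match "(.*?)" s)
     (match-string 0 s))
   ;; Scanner attempt: `forward-sexp' follows the nesting and stops
   ;; after the paren that closes the outer group.
   (with-temp-buffer
     (insert s)
     (goto-char (point-min))
     (forward-sexp)
     (buffer-substring (point-min) (point)))))
;; => ("(a (b)" "(a (b) c)")
```

A greedy "(.*)" would get this one string right, but it would swallow
too much as soon as the text held two top-level groups, which is the
kind of per-document tweaking described above.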
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)? 2010-02-19 6:48 ` Tim X @ 2010-02-20 21:14 ` John Withers [not found] ` <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 16+ messages in thread From: John Withers @ 2010-02-20 21:14 UTC (permalink / raw) To: Tim X; +Cc: help-gnu-emacs Tim, You are completely correct on all counts. What I should have said was that for many classes of problems I run into during my daily work the ability to write a regex is much faster than using a parser (and definitely than writing one). And that I find the classes of problems that fit that mold increased by having lookahead/behind assertions. I use parsers more frequently than I use regexes, but a lot of the one shot work I do on logs, semi-structured text files of various types and in very, very limited cases some html where the html is already processed in some way; a quick regex is much faster for me, and I imagine almost everyone, but I could be wrong. But in reality, as you pointed out, I shouldn't have been in the discussion at all. Next week I am going to have time to look at Tomohiro Matsuyamas patch that I referenced in the first of my posts in this string. My comments should have been restricted to just saying that I am looking forward to doing so. Thank you for pointing this out. john withers On Fri, 2010-02-19 at 17:48 +1100, Tim X wrote: > John Withers <grayarea@reddagger.org> writes: > > > On Fri, 2010-02-19 at 02:06 +0100, Pascal J. Bourguignon wrote: > > > >> > >> One difficulty when you try to extend regular expression is that the > >> time and space complexity of matching such an extended regular > >> expression easily becomes exponential. In these cases, it may be easier > >> to write a parser, than to try to force it thru regular expressions, > >> both for the programmer's brain and for the CPU processor... 
> >
> > Sure, exponential backtracking can happen. You can write checks for
> > common cases and aborts, but let's say you don't. Who cares? I can write
> > things that go exponential for memory or clock ticks in any of the
> > languages I am even trivially familiar with.
> >
> >> Otherwise, people will do anything they want to do, theory and
> >> precedent notwithstanding. This only demonstrates the lack of culture
> >> of the newcomers.
> >
> > Or it demonstrates the need to get things done. I can write a regex to
> > do a transform on 1000 text files in a directory and do the operation
> > before you have closed the last paren on your parser.
> >
>
> I'm always amazed at these sorts of claims because they are just so
> meaningless. For every concrete example you can come up with, we can
> come up with others where writing the parser will be faster and more
> reliable than using REs.

<snip>

>
> Posts like "Please, someone else do something that I want" rarely
> achieve anything other than making readers think it's just a moan from
> someone who is frustrated, but not frustrated enough to do anything about
> their problem except moan. While it's fine to be lazy, being lazy and
> fussy is just a recipe for making oneself miserable. Being lazy, fussy
> and a moaner just adds noise that makes it harder to find relevant
> information.
>
> Tim
>
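As an editorial illustration of the exponential backtracking mentioned in the exchange above, here is a minimal Python sketch. Python's `re` module stands in for any backtracking engine; it is not Emacs's matcher. The two patterns and the helper name `time_match` are chosen for this example only. Both patterns accept the same strings, but the nested quantifier gives the engine many ways to split a run of a's, and on a failing input it typically tries them all, so the time tends to roughly double with each extra character.

```python
import re
import time

# Nested quantifier: a run of a's can be split between the inner and
# outer repetition in many different ways.
NESTED = re.compile(r"(a+)+$")
# Single quantifier: accepts the same strings, with only one way to split.
FLAT = re.compile(r"a+$")


def time_match(pattern, text):
    """Return (matched, seconds) for one match attempt anchored at the start."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18):
        # The trailing "b" guarantees failure, which forces the engine to
        # exhaust its alternatives before giving up.
        text = "a" * n + "b"
        _, nested_secs = time_match(NESTED, text)
        _, flat_secs = time_match(FLAT, text)
        print(f"n={n:2d}  nested={nested_secs:.6f}s  flat={flat_secs:.6f}s")
```

Keep `n` small when trying this; values in the high twenties can take a very long time with the nested pattern.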
[parent not found: <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org>]
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Tim Landscheidt @ 2010-02-23 12:33 UTC
To: help-gnu-emacs

John Withers <grayarea@reddagger.org> wrote:

> You are completely correct on all counts. What I should have said was
> that, for many classes of problems I run into during my daily work,
> writing a regex is much faster than using a parser (and definitely
> faster than writing one). And I find that the classes of problems that
> fit that mold are increased by having lookahead/lookbehind assertions.

> I use parsers more frequently than I use regexes, but for a lot of the
> one-shot work I do on logs, on semi-structured text files of various
> types, and in very, very limited cases on some HTML where the HTML is
> already processed in some way, a quick regex is much faster for me, and
> I imagine for almost everyone, but I could be wrong.

> [...]

Same with me; and I would add that the maintainability increases drastically as well. If I have to revisit a term like:

| ;; Replace each "foo" that is not immediately followed by "bar".
| (while (re-search-forward "foo" nil t)
|   (unless (save-match-data (looking-at "bar"))
|     (replace-match "XYZ")))

or even:

| foo\($\|[^b]\|b[^a]\|ba[^r]\)

I need much more time to understand what I tried to achieve than with:

| foo(?!bar)

Furthermore, if the handmade parser has a subtle deviation from what you would expect it to look like, you spend even more time figuring out whether that was a wanted effect or a bug not yet discovered.

Tim
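A quick way to see the kind of subtle deviation described above is to run both forms side by side. The sketch below uses Python's `re` module because Emacs regexps have no lookahead. The hand-made alternation is transliterated from Emacs syntax into Python syntax, which is an assumption of this example (for instance, `$` here means end of string rather than end of line). The names `LOOKAHEAD`, `HANDMADE` and `compare` are illustrative only.

```python
import re

# Perl-style negative lookahead: "foo" not followed by "bar".
LOOKAHEAD = re.compile(r"foo(?!bar)")
# Hand-made alternation, transliterated from foo\($\|[^b]\|b[^a]\|ba[^r]\).
HANDMADE = re.compile(r"foo($|[^b]|b[^a]|ba[^r])")


def compare(text):
    """Return (lookahead_found, handmade_found) for a search in text."""
    return (
        LOOKAHEAD.search(text) is not None,
        HANDMADE.search(text) is not None,
    )


if __name__ == "__main__":
    for sample in ("foobar", "foobaz", "foo", "foob", "fooba", "foo bar"):
        print(repr(sample), compare(sample))
```

With these definitions the two forms agree on `foobar` and `foobaz` but disagree on `foob` and `fooba`, where the text ends before the alternation can consume the characters it needs. The alternation also consumes up to three characters after `foo`, which matters as soon as the match is used for replacement.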
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Tyler Smith @ 2010-02-18 16:23 UTC
To: help-gnu-emacs

dkcombs@panix.com (David Combs) writes:

> Subj: Perl, etc has these "?"-prefix modifiers/codes/whatever.
> Precisely which does emacs have (and NOT have)?
>
>
> Please, someone, make an ascii or html table or even plain text
> list of all these neat "new" non-standard ops that perl and
> even php and ruby etc seem to have now, comparing them
> to what Emacs has or doesn't have.

I don't understand your question. Emacs' regexp facilities are explained in the manual, (info "(emacs)Regexps"). You will find on that page a link to further details for programmers. I don't know what new non-standard ops perl and php and ruby have, but if they aren't in the Emacs manual, then they most probably aren't in Emacs.

> So you see a nifty regexp in that book, and you want to try
> it in emacs --- what a bear, trying to convert it into
> something that emacs understands.

It sounds like you're trying to use a manual for one group of applications to learn another application. No question, that must be quite a bear. Probably easier to use the manual for Emacs to learn Emacs.

Tyler
[parent not found: <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>]
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: David Combs @ 2010-02-19 0:59 UTC
To: help-gnu-emacs

In article <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>,
Tyler Smith <tyler.smith@eku.edu> wrote:
>dkcombs@panix.com (David Combs) writes:
>
>> Subj: Perl, etc has these "?"-prefix modifiers/codes/whatever.
>> Precisely which does emacs have (and NOT have)?
>>
>>
>> Please, someone, make an ascii or html table or even plain text
>> list of all these neat "new" non-standard ops that perl and
>> even php and ruby etc seem to have now, comparing them
>> to what Emacs has or doesn't have.
>
>I don't understand your question. Emacs' regexp facilities are
>explained in the manual, (info "(emacs)Regexps"). You will find on that
>page a link to further details for programmers. I don't know what new
>non-standard ops perl and php and ruby have, ...

Well, just get "Mastering Regular Expressions", by Jeffrey Friedl, 2nd edition -- it'll blow you away.

The title maybe should have been

   Everything you wanted to know about regular expressions, but were afraid to ask. Plus a lot of stuff maybe you never did want to know, too.

On regexps, this book is THE BIBLE!

Next time you're in a bookstore and spy one, have a look. Ten to one you'll buy the thing.

David
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Tyler Smith @ 2010-02-19 3:22 UTC
To: help-gnu-emacs

dkcombs@panix.com (David Combs) writes:

> In article <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>,
> Tyler Smith <tyler.smith@eku.edu> wrote:
>>dkcombs@panix.com (David Combs) writes:
>>
>>> Subj: Perl, etc has these "?"-prefix modifiers/codes/whatever.
>>> Precisely which does emacs have (and NOT have)?
>>>
>>>
>>> Please, someone, make an ascii or html table or even plain text
>>> list of all these neat "new" non-standard ops that perl and
>>> even php and ruby etc seem to have now, comparing them
>>> to what Emacs has or doesn't have.
>>
>>I don't understand your question. Emacs' regexp facilities are
>>explained in the manual, (info "(emacs)Regexps"). You will find on that
>>page a link to further details for programmers. I don't know what new
>>non-standard ops perl and php and ruby have, ...
>
> Well, just get "Mastering Regular Expressions", by Jeffrey Friedl,
> 2nd edition -- it'll blow you away.
>

I have already read this book, and it did inspire me to incorporate regexps into my Emacs toolbox. However, it *does* include a fairly complete discussion of Emacs' regexps. So if you already have both that book and the built-in Emacs manual, what further documentation do you want? If you don't find them in the Emacs chapter of Friedl's book or the manual, they don't exist _in Emacs_.

But maybe, your first post notwithstanding, you aren't really asking for new documentation, but in fact for new features to be added to the regexps available in Emacs? That's another issue entirely, as other posters have commented.

Tyler
* Re: Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)?
From: Stefan Monnier @ 2010-02-24 19:54 UTC
To: help-gnu-emacs

> Please, someone, make an ascii or html table or even plain text
> list of all these neat "new" non-standard ops that perl and
> even php and ruby etc seem to have now, comparing them
> to what Emacs has or doesn't have.

As mentioned, Emacs regexps are fairly well described in the Emacs manual and the Elisp manual. Indeed Emacs regexps don't support all the fancy additions present in things like Perl.

IIRC someone provided a patch for the look-ahead and look-behind features, but it got lost somewhere along the way. Emacs likes regexps, so it makes sense to support these things. At the same time, I've known them for a long time but have rarely found a need for them in my Emacs experience.

Personally I'd like to add a DFA-based regexp engine to Emacs, so as to get rid of the exponential backtracking problem that shows up every once in a while (it also has a few other advantages, such as the ability to do the regexp matching a chunk at a time).

Stefan
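To make the chunk-at-a-time idea concrete, here is a small Python sketch of a table-driven DFA. It is neither Emacs's engine nor a proposed design: the transition table is written by hand for the regexp `a(b|c)*d`, whereas a real engine would compile it from the regexp, and the names `ChunkedDFA` and `make_dfa` are invented for this example. Each input character is examined once and the only thing carried between chunks is the current state, so there is nothing to backtrack over.

```python
class ChunkedDFA:
    """Table-driven DFA that can be fed its input in arbitrary chunks."""

    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # maps (state, char) -> next state
        self.accepting = accepting      # set of accepting states
        self.state = start              # None is used as the dead state

    def feed(self, chunk):
        """Advance the automaton over one chunk of input."""
        for ch in chunk:
            if self.state is None:      # already dead: no input can revive it
                return
            self.state = self.transitions.get((self.state, ch))

    def accepted(self):
        """True if the input fed so far is a complete match."""
        return self.state in self.accepting


def make_dfa():
    """Hand-built DFA for the regexp a(b|c)*d, matching the whole input."""
    table = {
        (0, "a"): 1,
        (1, "b"): 1,
        (1, "c"): 1,
        (1, "d"): 2,
    }
    return ChunkedDFA(table, start=0, accepting={2})


if __name__ == "__main__":
    dfa = make_dfa()
    for chunk in ("abc", "cb", "d"):    # "abccbd" delivered in three pieces
        dfa.feed(chunk)
    print(dfa.accepted())
```

The trade-off is that a plain DFA like this cannot express backreferences, which is one reason backtracking engines remain common.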
end of thread, other threads:[~2010-02-24 19:54 UTC | newest]

Thread overview: 16+ messages
2010-02-18 6:10 Perl, etc has these "?"-prefix modifiers/codes/whatever. Precisely which does emacs have (and NOT have)? David Combs
2010-02-18 11:46 ` Pascal J. Bourguignon
2010-02-18 16:57 ` John Withers
[not found] ` <mailman.1450.1266512270.14305.help-gnu-emacs@gnu.org>
2010-02-18 19:02 ` Pascal J. Bourguignon
2010-02-18 21:38 ` John Bokma
2010-02-18 21:42 ` John Withers
[not found] ` <mailman.1460.1266529372.14305.help-gnu-emacs@gnu.org>
2010-02-19 0:53 ` David Combs
2010-02-19 1:06 ` Pascal J. Bourguignon
2010-02-19 2:36 ` John Withers
[not found] ` <mailman.1470.1266547034.14305.help-gnu-emacs@gnu.org>
2010-02-19 6:48 ` Tim X
2010-02-20 21:14 ` John Withers
[not found] ` <mailman.1559.1266700478.14305.help-gnu-emacs@gnu.org>
2010-02-23 12:33 ` Tim Landscheidt
2010-02-18 16:23 ` Tyler Smith
[not found] ` <mailman.1449.1266510261.14305.help-gnu-emacs@gnu.org>
2010-02-19 0:59 ` David Combs
2010-02-19 3:22 ` Tyler Smith
2010-02-24 19:54 ` Stefan Monnier