all messages for Emacs-related lists mirrored at yhetil.org
* I need help with a regular expression
@ 2010-04-16 21:25 Cecil Westerhof
  2010-04-16 22:16 ` José A. Romero L.
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Cecil Westerhof @ 2010-04-16 21:25 UTC
  To: help-gnu-emacs

I have written some code to count the number of functions in a buffer.
At the moment I use the following regular expression for this:
    "^(defun "

This works fine, but then the defuns have to be at the start of the
line. This is the most logical, but better safe than sorry.
This is why I wanted to write a more robust regular expression. I was
thinking about something like:
    "^[^;]+(defun "

But that does not work. It marks the following as one match, instead
of the three on their own:
    (defun a () (message "a"))
    (defun b () (message "b"))
    (defun c () (message "c"))

Why is this? And how can I make a regular expression that does what I
want?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



* Re: I need help with a regular expression
  2010-04-16 21:25 I need help with a regular expression Cecil Westerhof
@ 2010-04-16 22:16 ` José A. Romero L.
  2010-04-17  6:20   ` Cecil Westerhof
  2010-05-04 21:21   ` Lennart Borgman
  2010-05-05 15:51 ` J. David Boyd
       [not found] ` <mailman.7.1273074686.2155.help-gnu-emacs@gnu.org>
  2 siblings, 2 replies; 6+ messages in thread
From: José A. Romero L. @ 2010-04-16 22:16 UTC
  To: help-gnu-emacs

On 16 Apr, 23:25, Cecil Westerhof <Ce...@decebal.nl> wrote:
(...)
> thinking about something like:
>     "^[^;]+(defun "
>
> But that does not work. It marks the following as one match, instead
> of the three on their own:
>     (defun a () (message "a"))
>     (defun b () (message "b"))
>     (defun c () (message "c"))
>
> Why is this? And how can I make a regular expression that does what I
> want?

Because Emacs regular expressions are multi-line by default? After
all, an Emacs buffer is just a very long stream of characters, so this
approach makes sense, I guess.

Try this instead: "^[^;\n]*(defun " - and remember: regexp-builder is
your friend ;-)
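A minimal Emacs Lisp sketch of the difference, counting matches in a temporary buffer that holds the three one-line defuns from the question. The helper my-count-matches is hypothetical (it is not a built-in). Because [^;] also matches a newline, the first pattern can run past the end of a line, and backtracking makes it stop at the last "(defun " it can reach, so the whole block counts as one match. Inside the Lisp string, \n puts a literal newline into the bracket expression, which keeps every match on a single line. The switch from + to * matters as well: with +, at least one character is required before "(defun ", so a defun in column 0 is never found.

```elisp
;; Hypothetical helper (not a built-in): count the matches of REGEXP
;; in the current buffer, searching from the beginning.
(defun my-count-matches (regexp)
  "Return the number of non-overlapping matches of REGEXP in the buffer.
REGEXP must not match the empty string, or the loop would not advance."
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))

(with-temp-buffer
  (insert "(defun a () (message \"a\"))\n"
          "(defun b () (message \"b\"))\n"
          "(defun c () (message \"c\"))\n")
  (list
   ;; [^;] also matches newlines, so one match runs from line a to line c.
   (my-count-matches "^[^;]+(defun ")
   ;; [^;\n] stops at the end of the line, so each defun is found once.
   (my-count-matches "^[^;\n]*(defun ")
   ;; With + instead of *, a defun in column 0 is never matched.
   (my-count-matches "^[^;\n]+(defun ")))
;; => (1 3 0)
```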

Cheers,
--
José A. Romero L.
escherdragon at gmail
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)



* Re: I need help with a regular expression
  2010-04-16 22:16 ` José A. Romero L.
@ 2010-04-17  6:20   ` Cecil Westerhof
  2010-05-04 21:21   ` Lennart Borgman
  1 sibling, 0 replies; 6+ messages in thread
From: Cecil Westerhof @ 2010-04-17  6:20 UTC
  To: help-gnu-emacs

José A. Romero L. <escherdragon@gmail.com> writes:

>> thinking about something like:
>>     "^[^;]+(defun "
>>
>> But that does not work. It marks the following as one match, instead
>> of the three on their own:
>>     (defun a () (message "a"))
>>     (defun b () (message "b"))
>>     (defun c () (message "c"))
>>
>> Why is this? And how can I make a regular expression that does what I
>> want?
>
> Because Emacs regular expressions are multi-line by default? After
> all, an Emacs buffer is just a very long stream of characters, so this
> approach makes sense, I guess.
>
> Try this instead: "^[^;\n]*(defun " - and remember: regexp-builder is
> your friend ;-)

I thought I had tried that, but probably not, otherwise I would not
have asked this question. Thanks.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



* Re: I need help with a regular expression
  2010-04-16 22:16 ` José A. Romero L.
  2010-04-17  6:20   ` Cecil Westerhof
@ 2010-05-04 21:21   ` Lennart Borgman
  1 sibling, 0 replies; 6+ messages in thread
From: Lennart Borgman @ 2010-05-04 21:21 UTC
  To: José A. Romero L.; +Cc: help-gnu-emacs

2010/4/17 José A. Romero L. <escherdragon@gmail.com>:
>
> Try this instead: "^[^;\n]*(defun " - and remember: regexp-builder is
> your friend ;-)

And rx of course, which can be used in regexp-builder too.
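As a sketch of that suggestion, José's pattern might be built with rx like this. The constant name my-defun-regexp is hypothetical, and the exact string rx returns (for example the order of characters inside the brackets) can vary between Emacs versions, so the sample below checks behavior rather than the literal string. The commented-out defun is skipped because its line starts with ";", while the indented one is still found.

```elisp
(require 'rx)

;; Hypothetical constant equivalent to "^[^;\n]*(defun ":
;; start of line, any run of characters that are neither `;' nor a
;; newline, then the literal text "(defun ".
(defconst my-defun-regexp
  (rx line-start
      (zero-or-more (not (any ";\n")))
      "(defun "))

;; Count matches in a small sample buffer.
(with-temp-buffer
  (insert "(defun a () (message \"a\"))\n"
          ";; (defun commented-out ())\n"
          "  (defun indented () nil)\n")
  (goto-char (point-min))
  (let ((count 0))
    (while (re-search-forward my-defun-regexp nil t)
      (setq count (1+ count)))
    count))
;; => 2
```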





* Re: I need help with a regular expression
  2010-04-16 21:25 I need help with a regular expression Cecil Westerhof
  2010-04-16 22:16 ` José A. Romero L.
@ 2010-05-05 15:51 ` J. David Boyd
       [not found] ` <mailman.7.1273074686.2155.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 6+ messages in thread
From: J. David Boyd @ 2010-05-05 15:51 UTC
  To: help-gnu-emacs

Cecil Westerhof <Cecil@decebal.nl> writes:

> I have written some code to count the number of functions in a buffer.
> At the moment I use the following regular expression for this:
>     "^(defun "
>
> This works fine, but then the defuns have to be at the start of the
> line. This is the most logical, but better safe than sorry.
> This is why I wanted to write a more robust regular expression. I was
> thinking about something like:
>     "^[^;]+(defun "
>
> But that does not work. It marks the following as one match, instead
> of the three on their own:
>     (defun a () (message "a"))
>     (defun b () (message "b"))
>     (defun c () (message "c"))
>
> Why is this? And how can I make a regular expression that does what I
> want?

There's a book that explains this; sorry, but I can't remember its
name, something to do, of course, with "Regular Expressions".

The problem is that the expression you gave it is, as the author
explains, "hungry" (the usual term is "greedy").

It tries to match as much as possible, not as little as possible.

In your case, it sees the '^' (start of line), then looks as far as
possible for the 'defun'.

You did have a blank line after your last defun, right?  Otherwise, it
would have kept on going.
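A small Emacs Lisp sketch of the greedy behavior, comparing the first match of three patterns against the same three-line sample. The helper my-first-match is hypothetical. Note that [^;] also matches a newline, so a line break does not end the match; the greedy version stops at the last "(defun " only because the pattern has to end there. Emacs regexps also support the non-greedy +?, but as the second pattern shows, that only moves the end of the match to the next line's "(defun ". Excluding the newline from the class is what keeps each match on its own line.

```elisp
;; Hypothetical helper: return the text of the first match of REGEXP
;; in STRING, or nil when there is none.
(defun my-first-match (regexp string)
  (when (string-match regexp string)
    (match-string 0 string)))

(let ((sample (concat "(defun a () (message \"a\"))\n"
                      "(defun b () (message \"b\"))\n"
                      "(defun c () (message \"c\"))\n")))
  (list
   ;; Greedy: [^;]+ runs to the end, then backs off to the LAST
   ;; "(defun ", so the match covers lines a and b plus "(defun " of c.
   (my-first-match "^[^;]+(defun " sample)
   ;; Non-greedy: stops at the FIRST "(defun " after at least one
   ;; character, which is still on the next line (line b).
   (my-first-match "^[^;]+?(defun " sample)
   ;; Excluding the newline keeps the match on the first line.
   (my-first-match "^[^;\n]*(defun " sample)))
```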

Go to O'Reilly and hunt for books on regular expressions. It was only
a few hundred pages, a good price, and explained a great deal.

Good luck!






* Re: I need help with a regular expression
       [not found] ` <mailman.7.1273074686.2155.help-gnu-emacs@gnu.org>
@ 2010-05-06 10:17   ` Xah Lee
  0 siblings, 0 replies; 6+ messages in thread
From: Xah Lee @ 2010-05-06 10:17 UTC
  To: help-gnu-emacs

On May 5, 8:51 am, da...@adboyd.com (J. David Boyd) wrote:
> Cecil Westerhof <Ce...@decebal.nl> writes:
> > I have written some code to count the number of functions in a buffer.
> > At the moment I use the following regular expression for this:
> >     "^(defun "
>
> > This works fine, but then the defuns have to be at the start of the
> > line. This is the most logical, but better safe than sorry.
> > This is why I wanted to write a more robust regular expression. I was
> > thinking about something like:
> >     "^[^;]+(defun "
>
> > But that does not work. It marks the following as one match, instead
> > of the three on their own:
> >     (defun a () (message "a"))
> >     (defun b () (message "b"))
> >     (defun c () (message "c"))
>
> > Why is this? And how can I make a regular expression that does what I
> > want?
>
> There's a book that explains this; sorry, but I can't remember its
> name, something to do, of course, with "Regular Expressions".
>
> The problem is that the expression you gave it is, as the author
> explains, "hungry" (the usual term is "greedy").
>
> It tries to match as much as possible, not as little as possible.
>
> In your case, it sees the '^' (start of line), then looks as far as
> possible for the 'defun'.
>
> You did have a blank line after your last defun, right?  Otherwise, it
> would have kept on going.
>
> Go to O'Reilly and hunt for books on regular expressions. It was only
> a few hundred pages, a good price, and explained a great deal.
>
> Good luck!

I read the first edition (1997) in 1999.
(See Perl book reviews here: http://xahlee.org/UnixResource_dir/perlr.html )

Last I looked, in the 3rd edition (2006) they dropped coverage of Emacs
regex.

In general, I don't recommend the book unless you do regex research.
Regex is useful for matching simple words or phrases. When your
pattern-matching needs are slightly more complicated than phrases,
regex quickly stops being useful.

I've also come across a page that heavily criticizes the book, citing
many errors and showing another regex engine that's much faster.
(I haven't verified it or read it in depth.)

http://swtch.com/~rsc/regexp/regexp1.html

quote:
«Finally, any discussion of regular expressions would be incomplete
without mentioning Jeffrey Friedl's book Mastering Regular
Expressions, perhaps the most popular reference among today's
programmers. Friedl's book teaches programmers how best to use today's
regular expression implementations, but not how best to implement
them. What little text it devotes to implementation issues perpetuates
the widespread belief that recursive backtracking is the only way to
simulate an NFA. Friedl makes it clear that he neither understands nor
respects the underlying theory.
»

Also, today there are lots of tools for text pattern matching. One I
recommend is Parsing Expression Grammar (PEG). There are 2 Emacs
implementations (on emacswiki.org), but both are hard to use and lack
much documentation. (The “regular expression” we know today, since Unix
grep of the 1990s or earlier, is derived by happenstance from 4-decade-old
theory on parsing, based on the then so-called theories of so-called
automata.)

For your need, I just recommend reading the Emacs info page on its
regex in detail.

• Text Pattern Matching in Emacs (emacs regex tutorial)
  http://xahlee.org/emacs/emacs_regex.html

• Regular Expressions - GNU Emacs Lisp Reference Manual
  http://xahlee.org/elisp/Regular-Expressions.html

for some more opinions on regex, pattern matching, parsing, see:

• Pattern Matching vs Lexical Grammar Specification
  http://xahlee.org/cmaci/notation/pattern_matching_vs_pattern_spec.html

  Xah
∑ http://xahlee.org/
