From: Xah Lee <xahlee@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: I need help with a regular expression
Date: Thu, 6 May 2010 03:17:16 -0700 (PDT)
Message-ID: <bee00772-01be-406c-b3da-837fff039839@a34g2000yqn.googlegroups.com>
In-Reply-To: mailman.7.1273074686.2155.help-gnu-emacs@gnu.org

On May 5, 8:51 am, da...@adboyd.com (J. David Boyd) wrote:
> Cecil Westerhof <Ce...@decebal.nl> writes:
> > I have written some code to count the number of functions in a buffer.
> > At the moment I use the following regular expression for this:
> >     "^(defun "
>
> > This works fine, but then the defuns have to be at the start of the
> > line. This is the most logical, but it is better to be safe than sorry.
> > This is why I wanted to write a more robust regular expression. I was
> > thinking about something like:
> >     "^[^;]+(defun "
>
> > But that does not work. It matches the following as one whole, instead
> > of matching each of the three on its own:
> >     (defun a () (message "a"))
> >     (defun b () (message "b"))
> >     (defun c () (message "c"))
>
> > Why is this? And how can I make a regular expression that does what I
> > want?
>
> There's a book that explains this, sorry but I can't remember the name
> of it, something to do, of course, with "Regular Expressions".
>
> The problem is that the expression you gave it, is, as the author
> explains, "hungry".
>
> It tries to match as much as possible, not as little as possible.
>
> In your case, it sees the '^' (start of line), then looks as far as
> possible for the 'defun'.
>
> You did have a blank line after your last defun, right?  Otherwise, it
> would have kept on going.
>
> Go to O'Reilly, and hunt for books on regular expressions.  It was only
> a few 100 pages, good price, and explained a great deal.
>
> Good luck!

I read the first edition (1997) in 1999.
(See Perl book reviews here: http://xahlee.org/UnixResource_dir/perlr.html )

Last I looked, the 3rd edition (2006) had dropped its coverage of Emacs
regex.

In general, I don't recommend the book, unless you do regex research.
Regex is useful for matching simple words or phrases. When your
pattern-matching needs are even slightly more complicated than phrases,
regex quickly stops being useful.

I've also come across a page that heavily criticizes the book, citing
many errors, and showing another regex engine that's much faster.
(I haven't verified it or read it in depth.)

http://swtch.com/~rsc/regexp/regexp1.html

quote:
«Finally, any discussion of regular expressions would be incomplete
without mentioning Jeffrey Friedl's book Mastering Regular
Expressions, perhaps the most popular reference among today's
programmers. Friedl's book teaches programmers how best to use today's
regular expression implementations, but not how best to implement
them. What little text it devotes to implementation issues perpetuates
the widespread belief that recursive backtracking is the only way to
simulate an NFA. Friedl makes it clear that he neither understands nor
respects the underlying theory.
»

Also, today there are lots of tools for text pattern matching. One I
recommend is Parsing Expression Grammar. There are 2 Emacs
implementations (on emacswiki.org), but both are hard to use and lack
documentation. (The “regular expression” we know today, from the unix
grep of the 1990s or earlier, is derived by happenstance from
4-decade-old theory on parsing, based on what were then called theories
of automata.)

For your need, I just recommend reading the Emacs info page on its
regex in detail.

• Text Pattern Matching in Emacs (emacs regex tutorial)
  http://xahlee.org/emacs/emacs_regex.html

• Regular Expressions - GNU Emacs Lisp Reference Manual
  http://xahlee.org/elisp/Regular-Expressions.html
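
As a rough sketch of what goes wrong with the pattern in the question: in
an Emacs regexp, [^;] also matches a newline, so the greedy [^;]+ runs
across lines and the whole text becomes one match ending at the last
"(defun ". Leaving newline out of the character class keeps each match
on its own line. The names my-count-matches and my-sample-text below are
made up for illustration, and the pattern would still count a "(defun "
that sits inside a string, so treat it as a starting point only.

```elisp
;; Hypothetical sample: three real defuns plus one commented-out defun.
(defvar my-sample-text
  (concat "(defun a () (message \"a\"))\n"
          "(defun b () (message \"b\"))\n"
          "(defun c () (message \"c\"))\n"
          ";; (defun d () nil)\n"))

;; Hypothetical helper, not part of Emacs.
(defun my-count-matches (regexp text)
  "Return how many times REGEXP matches in TEXT, searching forward."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search nil)
          (count 0))
      ;; Each successful search leaves point at the end of the match,
      ;; so the next search continues from there.
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))

;; Original pattern: [^;]+ crosses newlines, so everything up to the
;; last uncommented "(defun " is swallowed by a single match.
(my-count-matches "^[^;]+(defun " my-sample-text)    ; => 1

;; Newline excluded: one match per line, and the line that starts
;; with a semicolon is skipped.
(my-count-matches "^[^;\n]*(defun " my-sample-text)  ; => 3
```

The * instead of + lets a defun at the very start of a line match too.
A line holding two defuns is still counted once, since the match is
anchored at the beginning of the line.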

For some more opinions on regex, pattern matching, and parsing, see:

• Pattern Matching vs Lexical Grammar Specification
  http://xahlee.org/cmaci/notation/pattern_matching_vs_pattern_spec.html

  Xah
∑ http://xahlee.org/
