From: ken <gebser@mousecar.com>
To: PJ Weisberg <pj@irregularexpressions.net>
Cc: GNU Emacs List <help-gnu-emacs@gnu.org>
Subject: Re: bug in elisp... or in elisper???
Date: Wed, 23 Mar 2011 10:18:34 -0400
Message-ID: <4D8A013A.1030804@mousecar.com>
In-Reply-To: <AANLkTi=GgPtD8ZEj1S1H1BnM7HaNi0qNypEp505AfU7O@mail.gmail.com>

On 03/22/2011 08:15 PM PJ Weisberg wrote:
> On 3/22/11, ken <gebser@mousecar.com> wrote:
>> Fellow elispers,
>>
>> Something seems to be amiss in the search syntax here:
>>
>>  (setq aname-re-str
>> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\( \\|\t\\|\n\\)*?\\)>" )
>>
> ...
>> The problem is that the 5th match-string should be either empty or
>> whitespace.  But it consistently contains the last character of the
>> 4th match-string.  And these two matches are separated by the literal
>> character string, "</a"!!  What's up with this?
> 
> You miscounted your '('s.  The fifth group IS inside the fourth group,
> matching . or \n.
> 
> -PJ

It wasn't that I miscounted.  I read a doc which said that I couldn't
embed one potential match expression inside another.  (I mentioned this,
I believe, in a previous email.)  So I figured that, if this wasn't
allowed, I certainly couldn't count each expression inside a pair of
parens as another match.  But it seems that doc was wrong.
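
The counting rule PJ describes can be sketched with a much smaller
pattern than aname-re-str.  Groups are numbered left to right by their
opening "\\(", so a group nested inside another simply gets the next
number.  The function name and the sample string below are made up for
illustration only.

```elisp
;; Hypothetical helper, not part of the original code.
;; Group 1 is the whole link body.  Group 2 is nested inside group 1
;; and is repeated, so it keeps only the text of its last iteration,
;; which is the last character of group 1.
(defun demo-nested-groups (s)
  "Return (GROUP-1 GROUP-2) for the first <a>...</a> found in S."
  (when (string-match "<a>\\(\\(.\\|\n\\)*?\\)</a>" s)
    (list (match-string 1 s)
          (match-string 2 s))))

(demo-nested-groups "<a>xyz</a>")   ; => ("xyz" "z")
```

This is the same effect seen with the 4th and 5th match-strings of the
full pattern.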

So this is actually good news: my RE works just as I want it to *and*
there's no bug in elisp to contend with.  I am, however, starting to
have trust issues with documentation I find on the web.  But I have you
guys here on this list as a reality check.

If one match expression *can* be embedded within another, I can write
more comprehensive REs.  I.e., instead of writing RE #1 to locate a
section of text and then RE #2 to parse just that section, REs #1 and
#2 can be combined into one RE.  Radically cool.

So some further questions:

You might have noticed I use "\\([\s-\\|\n]+?\\)" to non-greedily match
one or more whitespace characters.  Can one "[...]" be nested inside
another...?  e.g., "[[\s-\\|\n]+?]" or some syntax like that?
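
A sketch of what that bracket expression actually contains, as far as
I can tell.  Inside "[...]" the "\\|" operator is not special, and in a
Lisp string "\s" reads as a plain space, so "[\s-\\|\n]" is the range
from space to backslash plus "|" and newline.  Bracket expressions do
not nest; listing the wanted characters directly, as in "[ \t\n]", is
one alternative.  (A class such as "[[:space:]]" may also appear inside
brackets, but what it matches depends on the buffer's syntax table.)
The function name is hypothetical.

```elisp
;; Hypothetical helper, not part of the original code.
(defun demo-whitespace-class ()
  "Compare the original bracket expression with an explicit one."
  (list
   ;; The original class matches "A", because A lies in the range
   ;; from space (32) to backslash (92).
   (string-match "[\s-\\|\n]" "A")
   ;; An explicit set of space, tab and newline does not match "A".
   (string-match "[ \t\n]+" "A")
   ;; It does match the run of whitespace starting at index 1 ...
   (string-match "[ \t\n]+" "a \n\tb")
   ;; ... and ending at index 4.
   (match-end 0)))

(demo-whitespace-class)   ; => (0 nil 1 4)
```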

The "specialness" of "." seems to be lost when inside brackets; that is,
in "[.\n]*?" it seems to represent a regular period (.) rather than "any
character except newline".  Is there some way to bring back that
specialness?  Or is there some other RE to represent "multiple instances
of any character, including a newline"?
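
One way to get "any character, including a newline" is the alternation
already used in aname-re-str, ".\\|\n".  The sketch below wraps it in a
shy group, "\\(?: ... \\)", which groups without taking a match-string
number.  Inside brackets "." stays literal, so "[.\n]" only matches
periods and newlines.  The function name and the <b> tag are made up
for illustration.

```elisp
;; Hypothetical helper, not part of the original code.
(defun demo-any-char (s)
  "Return the text between <b> and </b> in S, newlines included."
  ;; "\\(?:.\\|\n\\)*?" is a non-greedy run of any character or
  ;; newline; being a shy group, it does not become group 2.
  (when (string-match "<b>\\(\\(?:.\\|\n\\)*?\\)</b>" s)
    (match-string 1 s)))

(demo-any-char "<b>one\ntwo</b>")   ; => "one\ntwo"
```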

Is it actually true (what the docs say) that there's a limit of nine
sub-expression match-strings per RE?  Or can I do, e.g., "(match-string
12)" and "(match-string 15)"?  What is the actual limit?  Whatever it
is, is this hard-coded into elisp... or can it be changed/configured to
something else?
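
As far as I know, the limit of nine applies to back-references written
inside the regexp itself ("\\1" through "\\9"), not to match-string.
A quick way to check is a pattern with twelve groups; the function name
and test string are made up for illustration.

```elisp
;; Hypothetical helper, not part of the original code.
(defun demo-twelve-groups (s)
  "Match S against twelve one-character groups; return groups 10 and 12."
  ;; Build "\\(.\\)" repeated twelve times.
  (let ((re (mapconcat (lambda (_) "\\(.\\)")
                       (number-sequence 1 12)
                       "")))
    (when (string-match re s)
      (list (match-string 10 s)
            (match-string 12 s)))))

(demo-twelve-groups "abcdefghijkl")   ; => ("j" "l")
```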


Thanks for the illumination.



