bug in elisp... or in elisper???


* bug in elisp... or in  elisper???
@ 2011-03-22 23:37 ken
  2011-03-23  0:15 ` PJ Weisberg
  0 siblings, 1 reply; 13+ messages in thread
From: ken @ 2011-03-22 23:37 UTC (permalink / raw)
  To: GNU Emacs List

Fellow elispers,

Something seems to be amiss in the search syntax here:

 (setq aname-re-str
"<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\(
\\|\t\\|\n\\)*?\\)>" )

;;Here's a function to use the above RE and return diagnostics:

(defun test-aname-search ()
  (interactive)
  (re-search-forward aname-re-str)
  (message "1: \"%s\" 2: \"%s\" 3: \"%s\" 4: \"%s\" 5: \"%s\" 6: \"%s\"
7: \"%s\" 8: \"%s\""
	   (match-string 1)
	   (match-string 2)
	   (match-string 3)
	   (match-string 4)
	   (match-string 5)
	   (match-string 6)
	   (match-string 7)
	   (match-string 8)))


Here are some strings to search on:

<h3><a name="thisname">Any Text--
Hot Stuff</a></h3>

<h1
class="title"
><a
name="heres-a-name"
>
the</a
></h1
>

<h3><a name="duplicate">Any Text--
Hot Crud</a></h3>


The problem is that the 5th match-string should be either empty or
whitespace.  But it consistently contains the last character of the
4th match-string.  And these two matches are separated by the literal
character string, "</a"!!  What's up with this?


Wishing I hadn't quit beer,
ken

-- 
Anything is easy if you know how to do it.




* Re: bug in elisp... or in  elisper???
       [not found] <mailman.11.1300837050.13753.help-gnu-emacs@gnu.org>
@ 2011-03-22 23:50 ` David Kastrup
  2011-03-23 15:21   ` ken
  2011-03-23  7:01 ` bug in elisp... or in elisper??? Tim X
  1 sibling, 1 reply; 13+ messages in thread
From: David Kastrup @ 2011-03-22 23:50 UTC (permalink / raw)
  To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> Fellow elispers,
>
> Something seems to be amiss in the search syntax here:
>
>  (setq aname-re-str
> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\(
> \\|\t\\|\n\\)*?\\)>" )
>
> ;;Here's a function to use the above RE and return diagnostics:
>
> (defun test-aname-search ()
>   (interactive)
>   (re-search-forward aname-re-str)
>   (message "1: \"%s\" 2: \"%s\" 3: \"%s\" 4: \"%s\" 5: \"%s\" 6: \"%s\"
> 7: \"%s\" 8: \"%s\""
> 	   (match-string 1)
> 	   (match-string 2)
> 	   (match-string 3)
> 	   (match-string 4)
> 	   (match-string 5)
> 	   (match-string 6)
> 	   (match-string 7)
> 	   (match-string 8)))
>
>
> The problem is that the 5th match-string should be either empty or
> whitespace.

Uh what?

\\(.\\|\n\\)*?

Matches _any_ character.

> But it consistently contains the last character of the 4th
> match-string.

That is because it _is_ the last matched character of the 4th
match-string.

> And these two matches are separated by the literal
> character string, "</a"!!  What's up with this?

Your ability to count \\( strings?  They are assigned match numbers from
left to right, regardless of whether they are nested or not.
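
A minimal sketch of that numbering rule, using a cut-down version of the
regexp from the original post.  The helper name demo-nested-groups is
hypothetical and exists only for this illustration.  Group 1 is the outer
group because its "\\(" opens first; group 2 is the inner group, and since
it is repeated it ends up holding whatever its last iteration matched.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-nested-groups (string)
  "Return the text of groups 1 and 2 after matching STRING.
Group 1 is the outer group; group 2 is nested inside it."
  ;; Groups are numbered by the position of their opening \\( :
  ;;   \\(        -> group 1 (outer, everything before \"</a\")
  ;;     \\(      -> group 2 (inner, a single character or newline)
  (when (string-match "\\(\\(.\\|\n\\)*?\\)</a" string)
    (list (match-string 1 string)
          (match-string 2 string))))

(demo-nested-groups "Hot Stuff</a>")
;; => ("Hot Stuff" "f")
```

The second element is the last character of the first, which is the
behaviour described in the original post.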

-- 
David Kastrup



* Re: bug in elisp... or in elisper???
  2011-03-22 23:37 ken
@ 2011-03-23  0:15 ` PJ Weisberg
  2011-03-23 14:18   ` ken
       [not found]   ` <mailman.3.1300889938.15160.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 13+ messages in thread
From: PJ Weisberg @ 2011-03-23  0:15 UTC (permalink / raw)
  To: gebser; +Cc: GNU Emacs List

On 3/22/11, ken <gebser@mousecar.com> wrote:
> Fellow elispers,
>
> Something seems to be amiss in the search syntax here:
>
>  (setq aname-re-str
> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\(
> \\|\t\\|\n\\)*?\\)>" )
>
...
> The problem is that the 5th match-string should be either empty or
> whitespace.  But it consistently contains the last character of the
> 4th match-string.  And these two matches are separated by the literal
> character string, "</a"!!  What's up with this?

You miscounted your '('s.  The fifth group IS inside the fourth group,
matching . or \n.
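
One way to keep the numbering in line with what was expected is a shy
(non-capturing) group, "\\(?: ... \\)", which groups without taking a
number.  The sketch below is an assumption about the intent, not the
original poster's code: demo-shy-groups is a hypothetical name, and the
trailing whitespace group is simplified to a bracket expression.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-shy-groups (string)
  "Return the anchor text and the whitespace after \"</a\" in STRING."
  ;; \\(?: ... \\) groups for repetition but is not numbered, so the
  ;; whitespace after "</a" becomes group 2 instead of group 4 or 5.
  (when (string-match "\\(\\(?:.\\|\n\\)*?\\)</a\\([ \t\n]*\\)>" string)
    (list (match-string 1 string)
          (match-string 2 string))))

(demo-shy-groups "Hot Stuff</a\n>")
;; => ("Hot Stuff" "\n")
```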

-PJ




* Re: bug in elisp... or in  elisper???
       [not found] <mailman.11.1300837050.13753.help-gnu-emacs@gnu.org>
  2011-03-22 23:50 ` bug in elisp... or in elisper??? David Kastrup
@ 2011-03-23  7:01 ` Tim X
  2011-03-23 15:56   ` ken
  1 sibling, 1 reply; 13+ messages in thread
From: Tim X @ 2011-03-23  7:01 UTC (permalink / raw)
  To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> Fellow elispers,
>
> Something seems to be amiss in the search syntax here:
>
>  (setq aname-re-str
> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\(
> \\|\t\\|\n\\)*?\\)>" )
>
> ;;Here's a function to use the above RE and return diagnostics:
>
> (defun test-aname-search ()
>   (interactive)
>   (re-search-forward aname-re-str)
>   (message "1: \"%s\" 2: \"%s\" 3: \"%s\" 4: \"%s\" 5: \"%s\" 6: \"%s\"
> 7: \"%s\" 8: \"%s\""
> 	   (match-string 1)
> 	   (match-string 2)
> 	   (match-string 3)
> 	   (match-string 4)
> 	   (match-string 5)
> 	   (match-string 6)
> 	   (match-string 7)
> 	   (match-string 8)))
>
>
> Here are some strings to search on:
>
> <h3><a name="thisname">Any Text--
> Hot Stuff</a></h3>
>
> <h1
> class="title"
> ><a
> name="heres-a-name"
> >
> the</a
> ></h1
> >
>
> <h3><a name="duplicate">Any Text--
> Hot Crud</a></h3>
>
>
> The problem is that the 5th match-string should be either empty or
> whitespace.  But it consistently contains the last character of the
> 4th match-string.  And these two matches are separated by the literal
> character string, "</a"!!  What's up with this?
>
>
> Wishing I hadn't quit beer,
> ken

I don't think your re is matching what you think it is. I strongly recommend
you try using re-builder as this will give you a visual representation
of what your re is matching (with different colours representing the
various match groups).
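
re-builder is interactive (M-x re-builder).  For a quick non-interactive
check, something along these lines lists every group by number so nothing
has to be counted by hand.  This is only a sketch; the function name
demo-list-match-groups is made up for the example.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-list-match-groups (regexp string)
  "Return an alist of (GROUP-NUMBER . TEXT) for REGEXP matched in STRING.
Group 0 is the whole match.  Return nil if REGEXP does not match."
  (when (string-match regexp string)
    ;; `match-data' holds a start and an end position per group,
    ;; so half its length is the number of groups to report.
    (let ((count (/ (length (match-data)) 2))
          (result '()))
      (dotimes (i count)
        (push (cons i (match-string i string)) result))
      (nreverse result))))

(demo-list-match-groups "\\(a\\(b\\)\\)c" "abc")
;; => ((0 . "abc") (1 . "ab") (2 . "b"))
```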

Tim

-- 
tcross (at) rapttech dot com dot au



* Re: bug in elisp... or in elisper???
  2011-03-23  0:15 ` PJ Weisberg
@ 2011-03-23 14:18   ` ken
  2011-03-25  3:44     ` Kevin Rodgers
       [not found]   ` <mailman.3.1300889938.15160.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 13+ messages in thread
From: ken @ 2011-03-23 14:18 UTC (permalink / raw)
  To: PJ Weisberg; +Cc: GNU Emacs List

On 03/22/2011 08:15 PM PJ Weisberg wrote:
> On 3/22/11, ken <gebser@mousecar.com> wrote:
>> Fellow elispers,
>>
>> Something seems to be amiss in the search syntax here:
>>
>>  (setq aname-re-str
>> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\(
>> \\|\t\\|\n\\)*?\\)>" )
>>
> ...
>> The problem is that the 5th match-string should be either empty or
>> whitespace.  But it consistently contains the last character of the
>> 4th match-string.  And these two matches are separated by the literal
>> character string, "</a"!!  What's up with this?
> 
> You miscounted your '('s.  The fifth group IS inside the fourth group,
> matching . or \n.
> 
> -PJ

It wasn't that I miscounted.  I read a doc which said that I couldn't
embed one potential match expression inside another.  (I mentioned this,
I believe, in a previous email.)  So I figured that, if this wasn't
allowed, I certainly couldn't count each expression inside a pair of
parens as another match.  But it seems that doc was wrong.

So this is actually good news: my RE works just as I want it to *and*
there's no bug in elisp to contend with.  I am, however, starting to
have trust issues with documentation I find on the web.  But I have you
guys here on this list as a reality check.

If one match expression *can* be embedded within another, this is good
news: it means I can write more comprehensive REs.  I.e., instead of
writing RE #1 to locate a section of text and then RE #2 to parse just
that section, REs #1 and #2 can be combined into one RE.  Radically cool.

So some further questions:

You might have noticed I use "\\([\s-\\|\n]+?\\)" to non-greedily match
one or more whitespace characters.  Can one "\\[...\\]" be nested inside
another...?  e.g., "[[\s-\\|\n]+?]" or some syntax like that?

The "specialness" of "." seems to be lost when inside brackets; that is,
in "[.\n]*?" it seems to represent a regular period (.) rather than "any
character except newline".  Is there some way to bring back that
specialness?  Or is there some other RE to represent "multiple instances
of any character, including a newline"?

Is it actually true (what the docs say) that there's a limit of nine
sub-expression match-strings per RE?  Or can I do, e.g., "(match-string
12)" and "(match-string 15)"?  What is the actual limit?  Whatever it
is, is this hard-coded into elisp... or can it be changed/configured to
something else?

Thanks for the illumination.


* Re: bug in elisp... or in  elisper???
  2011-03-22 23:50 ` bug in elisp... or in elisper??? David Kastrup
@ 2011-03-23 15:21   ` ken
  2011-03-23 15:38     ` David Kastrup
  0 siblings, 1 reply; 13+ messages in thread
From: ken @ 2011-03-23 15:21 UTC (permalink / raw)
  To: David Kastrup; +Cc: help-gnu-emacs

On 03/22/2011 07:50 PM David Kastrup wrote:
> ken <gebser@mousecar.com> writes:
> 
>> Fellow elispers,
>>
>> Something seems to be amiss in the search syntax here:
>>
>>  (setq aname-re-str
>> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\(
>> \\|\t\\|\n\\)*?\\)>" )
>>
>> ....
> 
> Uh what?
> 
> \\(.\\|\n\\)*?
> 
> Matches _any_ character.

Yes.  Why not?  Users' texts can and do contain any sort of character,
multiple instances of them in fact... and, moreover, in any languages'
character sets they might want.  They're allowed to do this.

Perhaps you're perplexed because you're not noting the RE immediately
following: "</a".  IOW, elisp should keep reading chars until the first
instance of "</a".  Seems to me to be a perfectly rational request.  In
the small bit of testing I've done, it seems also to work just fine.
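
A small sketch of the difference the non-greedy "*?" makes here, with a
simplified tag pattern and a hypothetical function name
(demo-first-anchor-text), neither of which comes from the code above:

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-first-anchor-text (string &optional greedy)
  "Return the text captured between \"<a>\" and \"</a>\" in STRING.
With GREEDY non-nil use \".*\", otherwise the non-greedy \".*?\"."
  (when (string-match (if greedy
                          "<a>\\(.*\\)</a>"
                        "<a>\\(.*?\\)</a>")
                      string)
    (match-string 1 string)))

(demo-first-anchor-text "<a>x</a><a>y</a>")
;; => "x"              (stops at the first "</a")
(demo-first-anchor-text "<a>x</a><a>y</a>" t)
;; => "x</a><a>y"      (runs on to the last "</a")
```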

> 
>> But it consistently contains the last character of the 4th
>> match-string.
> 
> That is because it _is_ the last matched character of the 4th
> match-string.
> 
>> And these two matches are separated by the literal
>> character string, "</a"!!  What's up with this?
> 
> Your ability to count \\( strings?  They are assigned match numbers from
> left to right, regardless of whether they are nested or not.

An inability to count would be the most derogatory interpretation.  But
the function I wrote (here elided) actually did the counting for me, so
that would not be a cogent interpretation.  A mere mortal, I wasn't born
knowing that REs could be nested (documentation I read in fact stated
they couldn't), of course then also not that in such cases both inner
and outer REs are counted separately by match-string.  So once again,
the more charitable interpretation is the more perspicacious... and vice
versa.

-- 
One is not superior merely because one
sees the world as odious.
                -- Chateaubriand (1768-1848)


* Re: bug in elisp... or in elisper???
       [not found]   ` <mailman.3.1300889938.15160.help-gnu-emacs@gnu.org>
@ 2011-03-23 15:27     ` Stefan Monnier
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Monnier @ 2011-03-23 15:27 UTC (permalink / raw)
  To: help-gnu-emacs

> I am, however, starting to have trust issues with documentation I find
> on the web.

Don't believe everything you read.

> Is it actually true (what the docs say) that there's a limit of nine
> sub-expression match-strings per RE?

No.

> Or can I do, e.g., "(match-string 12)" and "(match-string 15)"?

Yes.

> What is the actual limit?

The limit currently is around 255 sub-groups (or maybe 127), IIRC.
OTOH back-references can only refer to subgroups 1-9 (because we
haven't bothered to introduce a syntax for other cases).
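
A quick way to see both halves of that answer, as a sketch only.  The
helper name demo-twelfth-group is invented for the example; it builds a
regexp with twelve one-character groups and asks for the twelfth.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-twelfth-group (string)
  "Return the text of group 12 when STRING has at least 12 characters."
  ;; Build "\\(.\\)" twelve times: twelve numbered groups in a row.
  (let ((re (mapconcat (lambda (_i) "\\(.\\)")
                       (number-sequence 1 12)
                       "")))
    (when (string-match re string)
      (match-string 12 string))))

(demo-twelfth-group "abcdefghijkl")
;; => "l"

;; Back-references take a single digit, so "\\10" reads as
;; back-reference 1 followed by a literal "0", not "group 10".
(string-match "\\(a\\)\\1" "aa")     ; => 0
(string-match "\\(a\\)\\10" "aa0")   ; => 0
```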

> Whatever it is, is this hard-coded into elisp... or can it be
> changed/configured to something else?

It's hardcoded in the C code of the regexp engine.

BTW, I recommend you use the "online" documentation distributed with
Emacs.  There are function and variable docstrings (C-h f, C-h v), plus
Info documents (Emacs manual, Elisp manual).  We work pretty hard to keep
those up-to-date and of good quality.  And if you find something to be
untrue in there, please report it via M-x report-emacs-bug.

        Stefan


* Re: bug in elisp... or in  elisper???
  2011-03-23 15:21   ` ken
@ 2011-03-23 15:38     ` David Kastrup
  2011-03-23 16:40       ` Irrelevant digression [was: Re: bug in elisp... or in elisper???] ken
  0 siblings, 1 reply; 13+ messages in thread
From: David Kastrup @ 2011-03-23 15:38 UTC (permalink / raw)
  To: gebser; +Cc: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> An inability to count would be the most derogatory interpretation.
> But the function I wrote (here elided) actually did the counting for
> me, so that would not be a cogent interpretation.  A mere mortal, I
> wasn't born knowing that REs could be nested (documentation I read in
> fact stated they couldn't),

Emacs comes with its own hyperlinked, up to date, maintained, indexed
fast documentation accessible via Help menu and keybindings.

There is no reason to promote random garbage found somewhere on the
internet to "documentation".  In particular not concerning software that
has a history of 30 years, where consequently most documentation in
existence that might at one point even have been accurate is no longer
so due to being prehistoric.

Still I have my doubts that the documentation you are alluding to even
was ever part of Emacs.

> of course then also not that in such cases both inner and outer REs
> are counted separately by match-string.  So once again, the more
> charitable interpretation is the more perspicacious... and vice versa.

Care to provide a pointer to the "documentation" you are referring to?
While I have my doubts it will lead to a much more charitable
interpretation, I certainly am willing to let myself be surprised.

-- 
David Kastrup


* Re: bug in elisp... or in  elisper???
  2011-03-23  7:01 ` bug in elisp... or in elisper??? Tim X
@ 2011-03-23 15:56   ` ken
  0 siblings, 0 replies; 13+ messages in thread
From: ken @ 2011-03-23 15:56 UTC (permalink / raw)
  To: tcross; +Cc: help-gnu-emacs


Anything is easy if you know how to do it.


On 03/23/2011 03:01 AM Tim X wrote:
> ken <gebser@mousecar.com> writes:
> 
>> Fellow elispers,
>>
>> Something seems to be amiss in the search syntax here:
>>
>>  (setq aname-re-str
>> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\(
>> \\|\t\\|\n\\)*?\\)>" )
>>
>> ;;Here's a function to use the above RE and return diagnostics:
>>
>> (defun test-aname-search ()
>>   (interactive)
>>   (re-search-forward aname-re-str)
>>   (message "1: \"%s\" 2: \"%s\" 3: \"%s\" 4: \"%s\" 5: \"%s\" 6: \"%s\"
>> 7: \"%s\" 8: \"%s\""
>> 	   (match-string 1)
>> 	   (match-string 2)
>> 	   (match-string 3)
>> 	   (match-string 4)
>> 	   (match-string 5)
>> 	   (match-string 6)
>> 	   (match-string 7)
>> 	   (match-string 8)))
>>
>>
>> Here are some strings to search on:
>>
>> <h3><a name="thisname">Any Text--
>> Hot Stuff</a></h3>
>>
>> <h1
>> class="title"
>> ><a
>> name="heres-a-name"
>> >
>> the</a
>> ></h1
>> >
>>
>> <h3><a name="duplicate">Any Text--
>> Hot Crud</a></h3>
>>
>>
>> The problem is that the 5th match-string should be either empty or
>> whitespace.  But it consistently contains the last character of the
>> 4th match-string.  And these two matches are separated by the literal
>> character string, "</a"!!  What's up with this?
>>
>>
>> Wishing I hadn't quit beer,
>> ken
> 
> I don't think your re is matching what you think it is. I strongly recommend
> you try using re-builder as this will give you a visual representation
> of what your re is matching (with different colours representing the
> various match groups).
> 
> Tim

Well, I was missing a crucial bit of knowledge about REs (explained in
two previous posts here) and that was causing me to misinterpret
results.  PJ's reply pointed me in the direction I needed to go to
figure out what the problem was.  And I think it was a mistake for me to
post such a complex example, but I couldn't think of how else to do it.

I read mention of re-builder, but must admit I haven't tried it yet.
With your recommendation, I'm sure I'll be giving it a try on some
future RE puzzle.  The mere fact that this tool exists is comforting...
tells me that I'm not the only one who's occasionally perplexed by REs.

Thanks for the suggestion.





* Irrelevant digression [was: Re: bug in elisp... or in  elisper???]
  2011-03-23 15:38     ` David Kastrup
@ 2011-03-23 16:40       ` ken
  2011-03-23 16:52         ` Le Wang
  0 siblings, 1 reply; 13+ messages in thread
From: ken @ 2011-03-23 16:40 UTC (permalink / raw)
  To: David Kastrup; +Cc: help-gnu-emacs

On 03/23/2011 11:38 AM David Kastrup wrote:
> ken <gebser@mousecar.com> writes:
> 
>> An inability to count would be the most derogatory interpretation.
>> But the function I wrote (here elided) actually did the counting for
>> me, so that would not be a cogent interpretation.  A mere mortal, I
>> wasn't born knowing that REs could be nested (documentation I read in
>> fact stated they couldn't),
> 
> Emacs comes with its own hyperlinked, up to date, maintained, indexed
> fast documentation accessible via Help menu and keybindings.

Thanks, David, but I knew that already.  Though I've read quite a bit of
it, admittedly, I didn't read the entirety of the emacs and elisp
documentation.  I'm sure you're not suggesting that as requisite
preparation for writing a few elisp functions as that would preclude
most all of us from ever attempting it.

> 
> There is no reason to promote random garbage found somewhere on the
> internet to "documentation".  In particular not concerning software that
> has a history of 30 years, where consequently most documentation in
> existence that might at one point even have been accurate is no longer
> so due to being prehistoric.

And I certainly didn't "promote" it.  The web is what it is.  Haven't
you ever googled for something?

> 
> Still I have my doubts that the documentation you are alluding to even
> was ever part of Emacs.

Someone had a webpage with information on it, much, perhaps most, of it
good information.  I never said that webpage was "part of Emacs".  It's
the web.  Somebody made a mistake.  Humans do that occasionally.

> 
>> of course then also not that in such cases both inner and outer REs
>> are counted separately by match-string.  So once again, the more
>> charitable interpretation is the more perspicacious... and vice versa.
> 
> Care to provide a pointer to the "documentation" you are referring to?
> While I have my doubts it will lead to a much more charitable
> interpretation, I certainly am willing to let myself be surprised.

I read dozens of pages and see no gain or merit in reading back through
all of them to verify what I read... unless I had some neurotic desire
to win points in an irrelevant and fruitless discussion-- which I don't.
Nor would I want to obligate anyone to be charitable.  That doesn't
work.  Either they got it or they don't.


* Re: Irrelevant digression [was: Re: bug in elisp... or in elisper???]
  2011-03-23 16:40       ` Irrelevant digression [was: Re: bug in elisp... or in elisper???] ken
@ 2011-03-23 16:52         ` Le Wang
  2011-03-23 17:46           ` ken
  0 siblings, 1 reply; 13+ messages in thread
From: Le Wang @ 2011-03-23 16:52 UTC (permalink / raw)
  To: gebser; +Cc: help-gnu-emacs, David Kastrup

On Thu, Mar 24, 2011 at 12:40 AM, ken <gebser@mousecar.com> wrote:
> I read dozens of pages and see no gain or merit in reading back through
> all of them to verify what I read

You don't see any merit to removing misinformation so that others who
do the same search as you aren't led down the wrong path?  Isn't that
why we are here, to help each other better use Emacs?

Provide the link, so we can go about trying to get the author to
change his documentation.  At a minimum, when people search for that
link in particular, they'll see this thread on gmane.

-- 
Le


* Re: Irrelevant digression [was: Re: bug in elisp... or in elisper???]
  2011-03-23 16:52         ` Le Wang
@ 2011-03-23 17:46           ` ken
  0 siblings, 0 replies; 13+ messages in thread
From: ken @ 2011-03-23 17:46 UTC (permalink / raw)
  To: Le Wang; +Cc: help-gnu-emacs, David Kastrup

On 03/23/2011 12:52 PM Le Wang wrote:
> On Thu, Mar 24, 2011 at 12:40 AM, ken <gebser@mousecar.com> wrote:
>> I read dozens of pages and see no gain or merit in reading back through
>> all of them to verify what I read
> 
> You don't see any merit to removing misinformation so that others who
> do the same search as you aren't led down the wrong path?  Isn't that
> why we are here, to help each other better use Emacs?

Le, you bring up a good point.  However, it's not often possible to contact
the author of a webpage and if you do somehow do that, who knows if that
author reads email and if so would bother to make the change.  More
importantly, as said, I don't have the time or the inclination to search
out that page.  I understand the web has bad information and accept that
fact.  If I wanted to fix inaccuracies on the web, there are many more
of vastly greater import.   If I do happen to run across the page again,
however, I'll post it back here and then anyone who wants to can have at
the author.  Also, you could search for yourself; I googled for things
like: emacs, elisp, regular expression(s), and other things, all of
which I can't recall now.

In addition, these list discussions are archived, right?  So people
will--  or should-- find them and hopefully won't succumb to the same
inaccuracy as I did.

If you feel the situation demands more than that, then you have your
website with quite a bit of information on elisp.  (I've read quite a
bit there.  It's a pretty good site... and gets respectable search
rankings.)  Post the information there.  Heck, the article is virtually
written for you already.


* Re: bug in elisp... or in elisper???
  2011-03-23 14:18   ` ken
@ 2011-03-25  3:44     ` Kevin Rodgers
  0 siblings, 0 replies; 13+ messages in thread
From: Kevin Rodgers @ 2011-03-25  3:44 UTC (permalink / raw)
  To: help-gnu-emacs

On 3/23/11 8:18 AM, ken wrote:
...
> You might have noticed I use "\\([\s-\\|\n]+?\\)" to non-greedily match
> one or more whitespace characters.  Can one "\\[...\\]" be nested inside
> another...?  e.g., "[[\s-\\|\n]+?]" or some syntax like that?

No.

It is not clear whether you mean "\s" (space) followed by "-" (which is
special within "[]"), or you actually meant "\\s-" (i.e. any character
with whitespace syntax).  The problem with "\\s-" is that it depends on
the buffer's syntax table, as does "[[:space:]]" -- see section 34.3.1.2
(Character Classes) in the Emacs Lisp manual for an explanation of
"[[:space:]]" and other POSIX-inspired character classes:

http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes

If you are going to add "\n" to "\\s-" or "[:space:]", within "[]" or
"\\(\\|\\)", because you can't be sure whether the buffer's syntax table
assigns whitespace syntax to newline, then how can you be sure that it
assigns whitespace syntax to space, tab, formfeed, return, and vertical
tab?

So you may as well be explicit about what you mean by whitespace e.g.
"[ \f\t\n\r\v]"
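
A sketch of the explicit approach.  The names demo-whitespace-re and
demo-whitespace-p are hypothetical.  The last form illustrates the "-"
point: as far as I can tell, in a Lisp string "\s" reads as a plain
space, so "[\s-\\|\n]" contains the range from space to backslash and
matches far more than whitespace.

```elisp
;; Hypothetical names, for illustration only.
(defconst demo-whitespace-re "[ \f\t\n\r\v]+"
  "Explicit whitespace: space, formfeed, tab, newline, return, vertical tab.")

(defun demo-whitespace-p (string)
  "Return t if STRING consists only of explicit whitespace characters."
  ;; \\` and \\' anchor the match to the start and end of STRING.
  (and (string-match-p (concat "\\`" demo-whitespace-re "\\'") string)
       t))

(demo-whitespace-p " \t\n")   ; => t
(demo-whitespace-p "7")       ; => nil

;; The bracket expression from the original regexp is not
;; whitespace-only: the digit 7 lies inside the space-to-backslash range.
(string-match-p "[\s-\\|\n]" "7")   ; => 0
```

This does not depend on the buffer's syntax table.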

> The "specialness" of "." seems to be lost when inside brackets; that is,
> in "[.\n]*?" it seems to represent a regular period (.) rather than "any
> character except newline".  Is there some way to bring back that
> specialness?  Or is there some other RE to represent "multiple instances
> of any character, including a newline"?

No, inside "[]", "." is not special.

The right way is: "\\(.\\|\n\\)*"

There may be other ways, but they will be longer and unnecessarily complex.
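
A short sketch contrasting the two forms.  The helper demo-span-any is
hypothetical, and it uses the shy-group variant "\\(?:.\\|\n\\)" so the
inner alternation does not take a group number; that variation is my
assumption, not part of the answer above.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-span-any (string)
  "Return the text between the first \"a\" and the nearest following \"z\".
Newlines are allowed in between."
  (when (string-match "a\\(\\(?:.\\|\n\\)*?\\)z" string)
    (match-string 1 string)))

(demo-span-any "a\nb\nz")
;; => "\nb\n"

;; Inside brackets "." is an ordinary period, so "[.\n]" only matches
;; periods and newlines:
(string-match-p "a[.\n]*z" "a\nb\nz")   ; => nil
(string-match-p "a[.\n]*z" "a.\n.z")    ; => 0
```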

> Is it actually true (what the docs say) that there's a limit of nine
> sub-expression match-strings per RE?  Or can I do, e.g., "(match-string
> 12)" and "(match-string 15)"?  What is the actual limit?  Whatever it
> is, is this hard-coded into elisp... or can it be changed/configured to
> something else?

No, there is no such limit on match-string, but back-references ("\\1"
through "\\9") can only refer to the first 9 sub-expressions (actually,
the text matched by each of the first 9 preceding sub-expressions).

-- 
Kevin Rodgers
Denver, Colorado, USA
