* regexp that matches newline characters?
@ 2008-05-09 16:53 Dmitri Minaev
0 siblings, 0 replies; 9+ messages in thread
From: Dmitri Minaev @ 2008-05-09 16:53 UTC (permalink / raw)
To: EMACS list
I tried to extract a tag from an xml file to parse it later, but I
can't find a regexp that would match an xml tag with its content,
including newlines. Dot doesn't match newlines. The elisp manual
mentions that "complemented character alternative" matches a newline,
so I used this funny template: \\(<author>[^±]*?</author>\\). Of
course, this is not the right thing to do. What would be the correct
regular expression?
--
With best regards,
Dmitri Minaev
Russian history blog: http://minaev.blogspot.com
* Re: regexp that matches newline characters?
[not found] <mailman.11388.1210352002.18990.help-gnu-emacs@gnu.org>
@ 2008-05-09 20:07 ` Xah
2008-05-09 22:18 ` Dmitri Minaev
[not found] ` <mailman.11398.1210371539.18990.help-gnu-emacs@gnu.org>
2008-05-09 22:56 ` harven
1 sibling, 2 replies; 9+ messages in thread
From: Xah @ 2008-05-09 20:07 UTC (permalink / raw)
To: help-gnu-emacs
On May 9, 9:53 am, "Dmitri Minaev" <min...@gmail.com> wrote:
> I tried to extract a tag from an xml file to parse it later, but I
> can't find a regexp that would match an xml tag with its content,
> including newlines. Dot doesn't match newlines. The elisp manual
> mentions that "complemented character alternative" matches a newline,
> so I used this funny template:\\(<author>[^±]*?</author>\\). Of
> course, this is not the right thing to do. What would be the correct
> regular expression?
Line ending char can be matched by \n, but you'll need to double the
backslash.
However, this is prob what you want:
(some-regex-func "<author>\([^<]+\)</author>" ...)
which captures the content.
See here for some explanation and frequently used patterns:
http://xahlee.org/emacs/emacs_regex.html
Xah
xah@xahlee.org
∑ http://xahlee.org/
☄
* Re: regexp that matches newline characters?
2008-05-09 20:07 ` regexp that matches newline characters? Xah
@ 2008-05-09 22:18 ` Dmitri Minaev
2008-05-09 22:28 ` Lennart Borgman (gmail)
[not found] ` <mailman.11398.1210371539.18990.help-gnu-emacs@gnu.org>
1 sibling, 1 reply; 9+ messages in thread
From: Dmitri Minaev @ 2008-05-09 22:18 UTC (permalink / raw)
To: Xah; +Cc: help-gnu-emacs
On Sat, May 10, 2008 at 1:07 AM, Xah <xahlee@gmail.com> wrote:
> Line ending char can be matched by \n, but you'll need to double the
> backslash.
>
> However, this is prob what you want:
>
> (some-regex-func "<author>\([^<]+\)</author>" ...)
Thanks, but it won't do the job -- there are embedded tags inside
<author>. That's why I preferred ± to < :)
The regexp should eat anything, like dot, but including all kinds of
whitespaces. Is it possible to do it with character classes? Something
like [[:alnum:][:space:]]* (this one didn't work for me) ?
>
> See here for some explanation and frequently used patterns:
> http://xahlee.org/emacs/emacs_regex.html
Very good page, but too short :) Thanks!
--
With best regards,
Dmitri Minaev
Russian history blog: http://minaev.blogspot.com
* Re: regexp that matches newline characters?
2008-05-09 22:18 ` Dmitri Minaev
@ 2008-05-09 22:28 ` Lennart Borgman (gmail)
0 siblings, 0 replies; 9+ messages in thread
From: Lennart Borgman (gmail) @ 2008-05-09 22:28 UTC (permalink / raw)
To: Dmitri Minaev; +Cc: help-gnu-emacs, Xah
Dmitri Minaev wrote:
> The regexp should eat anything, like dot, but including all kinds of
> whitespaces. Is it possible to do it with character classes? Something
> like [[:alnum:][:space:]]* (this one didn't work for me) ?
\(.\|
\)
but double the \.
* Re: regexp that matches newline characters?
[not found] <mailman.11388.1210352002.18990.help-gnu-emacs@gnu.org>
2008-05-09 20:07 ` regexp that matches newline characters? Xah
@ 2008-05-09 22:56 ` harven
2008-05-11 17:45 ` Dmitri Minaev
1 sibling, 1 reply; 9+ messages in thread
From: harven @ 2008-05-09 22:56 UTC (permalink / raw)
To: help-gnu-emacs
On May 9, 6:53 pm, "Dmitri Minaev" <min...@gmail.com> wrote:
> I tried to extract a tag from an xml file to parse it later, but I
> can't find a regexp that would match an xml tag with its content,
> including newlines. Dot doesn't match newlines. The elisp manual
> mentions that "complemented character alternative" matches a newline,
> so I used this funny template:\\(<author>[^±]*?</author>\\). Of
> course, this is not the right thing to do. What would be the correct
> regular expression?
>
> --
> With best regards,
> Dmitri Minaev
>
> Russian history blog:http://minaev.blogspot.com
"\\(.\\|\n\\)" matches everything.
It stands for: any character but a new-line, or a new-line.
Do not double-backslash the \n.
The regexp must be entered as a string in an elisp expression.
In a string, \n stands as newline, \t as tab, \\ as backslash.
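A minimal sketch of the escaping rules above, assuming the goal is the original poster's <author> element. The function name `sketch-author-content` and the input in the usage line are made up for illustration; `string-match` and `match-string` are standard Emacs Lisp.

```elisp
(defun sketch-author-content (text)
  "Return what lies between <author> and </author> in TEXT, or nil.
Hypothetical helper, for illustration only."
  ;; Read as a Lisp string, \\( \\| \\) become the regexp tokens \( \| \),
  ;; while the single-backslash \n is already a newline character.
  ;; \\(?: ... \\) is a shy (non-capturing) group; *? is non-greedy.
  (when (string-match "<author>\\(\\(?:.\\|\n\\)*?\\)</author>" text)
    (match-string 1 text)))

;; Usage with made-up input that spans several lines:
(sketch-author-content "<author>\n<name>Leo</name>\n</author>")
```

On long inputs a repeated alternation like this may run into the regexp matcher's stack limit, so it is probably best kept for short elements.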
* Re: regexp that matches newline characters?
[not found] ` <mailman.11398.1210371539.18990.help-gnu-emacs@gnu.org>
@ 2008-05-10 2:51 ` Xah
0 siblings, 0 replies; 9+ messages in thread
From: Xah @ 2008-05-10 2:51 UTC (permalink / raw)
To: help-gnu-emacs
Sorry, I seem to have misunderstood your question.
The following also works, for what it's worth.
(search-forward-regexp "<pre class=\"mma\">\\([^•]*\\)</pre>")
<pre class="mma">
something here
<p>some</p>
and there
</pre>
Xah
xah@xahlee.org
∑ http://xahlee.org/
☄
* Re: regexp that matches newline characters?
2008-05-09 22:56 ` harven
@ 2008-05-11 17:45 ` Dmitri Minaev
2008-05-11 18:11 ` Peter Dyballa
0 siblings, 1 reply; 9+ messages in thread
From: Dmitri Minaev @ 2008-05-11 17:45 UTC (permalink / raw)
To: harven; +Cc: help-gnu-emacs
On Sat, May 10, 2008 at 3:56 AM, harven <harven@free.fr> wrote:
> "\\(.\\|\n\\)" matches everything.
Thanks to everyone. Parenthesized alternative works, but I found a
solution based on character classes:
\\(<author>[[:print:][:space:]]*?</author>\\)
So far, it works. It will help me to get rid of nested groups.
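A short sketch of the character-class approach, with one assumption spelled out: [:space:] matches characters that have whitespace syntax in the current syntax table, and some major modes give newline a different syntax (comment-end, for example), so the sketch pins the standard syntax table. The name `sketch-author-element` and the usage input are hypothetical.

```elisp
(defun sketch-author-element (text)
  "Return the first <author>...</author> element in TEXT, or nil.
Hypothetical helper, for illustration only."
  ;; [:print:] covers printing characters (including the plain space);
  ;; [:space:] depends on the syntax table, so use the standard one,
  ;; where newline and tab count as whitespace.
  (with-syntax-table (standard-syntax-table)
    (when (string-match "\\(<author>[[:print:][:space:]]*?</author>\\)" text)
      (match-string 1 text))))

;; Usage with made-up input containing an embedded tag and newlines:
(sketch-author-element "x<author>\n<b>A</b>\n</author>y")
```

Control characters that are neither printing nor whitespace would still stop the match, which may or may not matter for a given XML file.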
> (search-forward-regexp "<pre class=\"mma\">\\([^•]*\\)</pre>")
Yes, but this is the same hack I wanted to avoid: taking a character
which is not supposed to be found inside the tag and matching anything
except for this character. What if this character appears in some
author's name? What if Prince changes his name again? :)
Is there a comparison of various regexp tools' efficiency: are
character classes fast enough? would parenthesized groups be faster?
or character alternatives (like that [^±])?
Thank you.
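One way to answer the speed question for a particular input is to time the three variants directly. This is only a sketch: `benchmark-run` comes from the standard `benchmark` library, while the function name `sketch-compare-author-regexps`, the repetition count of 1000 and the sample input are arbitrary choices for illustration, and the numbers will depend on the Emacs version and the text.

```elisp
(require 'benchmark)

(defun sketch-compare-author-regexps (text)
  "Time three ways of matching an <author> element in TEXT.
Return an alist of (LABEL . SECONDS).  Hypothetical helper."
  ;; Pin the standard syntax table so that [:space:] covers newline.
  (with-syntax-table (standard-syntax-table)
    (list
     ;; Parenthesized alternative: any character but newline, or newline.
     (cons 'alternation
           (car (benchmark-run 1000
                  (string-match "<author>\\(?:.\\|\n\\)*?</author>" text))))
     ;; Character classes.
     (cons 'classes
           (car (benchmark-run 1000
                  (string-match "<author>[[:print:][:space:]]*?</author>" text))))
     ;; Complemented character alternative (the [^±] trick).
     (cons 'complement
           (car (benchmark-run 1000
                  (string-match "<author>[^±]*?</author>" text)))))))

;; Usage with made-up input:
(sketch-compare-author-regexps "<author>\n<b>Some Name</b>\n</author>")
```

The first element of the list that `benchmark-run` returns is the elapsed time in seconds, which is what each entry of the result holds.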
--
With best regards,
Dmitri Minaev
Russian history blog: http://minaev.blogspot.com
* Re: regexp that matches newline characters?
2008-05-11 17:45 ` Dmitri Minaev
@ 2008-05-11 18:11 ` Peter Dyballa
2008-05-11 18:32 ` Dmitri Minaev
0 siblings, 1 reply; 9+ messages in thread
From: Peter Dyballa @ 2008-05-11 18:11 UTC (permalink / raw)
To: Dmitri Minaev; +Cc: emacs list
Am 11.05.2008 um 19:45 schrieb Dmitri Minaev:
> Is there a comparison of various regexp tools' efficiency: are
> character classes fast enough? would parenthesized groups be faster?
> or character alternatives (like that [^±])?
Could be this helps: http://swtch.com/~rsc/regexp/regexp1.html
--
Greetings
Pete
Bake pizza not war!
* Re: regexp that matches newline characters?
2008-05-11 18:11 ` Peter Dyballa
@ 2008-05-11 18:32 ` Dmitri Minaev
0 siblings, 0 replies; 9+ messages in thread
From: Dmitri Minaev @ 2008-05-11 18:32 UTC (permalink / raw)
To: Peter Dyballa; +Cc: emacs list
On Sun, May 11, 2008 at 11:11 PM, Peter Dyballa <Peter_Dyballa@web.de> wrote:
> Could be this helps: http://swtch.com/~rsc/regexp/regexp1.html
>
Not really, I'm afraid :). What inspired me to a certain degree was a
quotation from an old e-mail by Jamie Zawinski:
"The heavy use of regexps in Perl is due to them being far and away
the most obvious hammer in the box.
The heavy use of regexps in Emacs is due almost entirely to
performance issues: because of implementation details, Emacs code that
uses regexps will almost always run faster than code that uses more
traditional control structures." (from
http://regex.info/blog/2006-09-15/247)
Let's hope it still holds true...
--
With best regards,
Dmitri Minaev
Russian history blog: http://minaev.blogspot.com