* Regexp to match any character, including newline?
@ 2003-10-04 22:02 Joe Fineman
2003-10-04 22:46 ` Stefan Monnier
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Joe Fineman @ 2003-10-04 22:02 UTC (permalink / raw)
It is sometimes a nuisance that "." in a regexp does not match
newlines. For example, I want a regexp for text in parentheses that
contains the word "and" followed (anywhere) by a date.
(.+ and .+ [1-2][0-9][0-9][0-9].+)
works only if the expression happens to be on one line. I have tried
[^ ] with the space replaced by an unlikely character such as ASCII
000; that seems to work in isolation, but when I substitute it for
. in the above regexp, the result misbehaves, missing all the right
matches & finding the odd wrong one. Is there an obvious solution to
this problem?
--
--- Joe Fineman jcf@TheWorld.com
||: Look on yonder, see that eagle rise. :||
||: He was born on land, but he sure enjoys the skies. :||
* Re: Regexp to match any character, including newline?
From: Stefan Monnier @ 2003-10-04 22:46 UTC (permalink / raw)
> It is sometimes a nuisance that "." in a regexp does not match
> newlines. For example, I want a regexp for text in parentheses that
> contains the word "and" followed (anywhere) by a date.
> (.+ and .+ [1-2][0-9][0-9][0-9].+)
> works only if the expression happens to be on one line. I have tried
> [^ ] with the space replaced by an unlikely character such as ASCII
> 000; that seems to work in isolation, but when I substitute it for
> . in the above regexp, the result misbehaves, missing all the right
> matches & finding the odd wrong one.
(re-search-forward "([^\000]+and[^\000]+[12][0-9]\\{3\\}[^\000]+)")
seems to work just fine here. Note that if you enter the RE interactively,
you have to use [ ^ C-q 0 RET ] and not [ ^ \ 0 0 0 ] (which would
match any char except for backslash and 0).
> Is there an obvious solution to this problem?
Use \(.\|\n\), which matches any character (when entering the RE
interactively, type the newline with C-q C-j rather than \n).
Stefan
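A minimal sketch of the two suggestions above written as Lisp strings.
The sample text and the names my-sample, my-any-re, my-nul-re and
my-paren-date-p are hypothetical, not from the thread. In a Lisp
string the backslashes of \( \| \) are doubled, while \n and \000 are
turned into the newline and NUL characters by the Lisp reader itself.

```elisp
;; Hypothetical sample: a parenthesised remark that spans two lines.
(defvar my-sample "(alpha and beta\ngamma 1987 delta)")

;; Alternation: "." or a newline, repeated one or more times.
(defvar my-any-re
  "(\\(.\\|\n\\)+ and \\(.\\|\n\\)+ [12][0-9]\\{3\\}\\(.\\|\n\\)+)")

;; Negated class: any character except NUL, which includes newline.
(defvar my-nul-re
  "([^\000]+ and [^\000]+ [12][0-9]\\{3\\}[^\000]+)")

(defun my-paren-date-p (regexp)
  "Return non-nil if REGEXP matches `my-sample' from its first character."
  (eq 0 (string-match regexp my-sample)))
```

Both (my-paren-date-p my-any-re) and (my-paren-date-p my-nul-re)
should return t, whereas the single-line version built from ".+"
returns nil on this sample because "." stops at the newline.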
* Re: Regexp to match any character, including newline?
From: Jesper Harder @ 2003-10-04 22:59 UTC (permalink / raw)
Joe Fineman <jcf@TheWorld.com> writes:
> It is sometimes a nuisance that "." in a regexp does not match
> newlines. For example, I want a regexp for text in parentheses that
> contains the word "and" followed (anywhere) by a date.
>
> (.+ and .+ [1-2][0-9][0-9][0-9].+)
>
> works only if the expression happens to be on one line. I have tried
> [^ ] with the space replaced by an unlikely character such as ASCII
> 000; that seems to work in isolation, but when I substitute it for
> . in the above regexp, the result misbehaves, missing all the right
> matches & finding the odd wrong one. Is there an obvious solution to
> this problem?
You can use "\\(?:.\\|\n\\)+" to match _anything_ including newlines.
Also, look at the very cool 'rx' package, which provides a much nicer
syntax for regexps than the usual line noise. For instance:
(rx (and "(" (* anything) "and" (* anything)
(in "1-2") (repeat 4 digit) (* anything) ")"))
=>
"\\(?:(\\(?:.\\|
\\)*and\\(?:.\\|
\\)*[1-2][[:digit:]]\\{4\\}\\(?:.\\|
\\)*)\\)"
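As a hedged sketch of how such an rx form might be used on a string:
the names my-paren-date-rx and my-find-paren-date are hypothetical,
and this variant differs from the example above in two ways. It uses
the non-greedy (+? anything) so that a match stops at the first
closing parenthesis, and it uses three digits after the leading 1 or 2
so that the year has four digits in total, as in the original
question.

```elisp
(require 'rx)

;; Hypothetical name: a regexp for "( ... and ... YEAR ... )".
;; `anything' matches any character, newline included.
(defvar my-paren-date-rx
  (rx (and "(" (+? anything) " and " (+? anything) " "
           (any "12") (repeat 3 digit)
           (+? anything) ")")))

(defun my-find-paren-date (text)
  "Return the first match of `my-paren-date-rx' in TEXT, or nil."
  (when (string-match my-paren-date-rx text)
    (match-string 0 text)))
```

For example, (my-find-paren-date "x (alpha and beta\ngamma 1987 delta) y (z)")
returns "(alpha and beta\ngamma 1987 delta)"; with the greedy
(+ anything) the match would run on to the last ")" in the text.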
* Re: Regexp to match any character, including newline?
From: Martin Stone Davis @ 2003-10-05 7:09 UTC (permalink / raw)
Jesper Harder wrote:
> Also, look at the very cool 'rx' package, which provides a much nicer
> syntax for regexps than the usual line noise. For instance:
>
> (rx (and "(" (* anything) "and" (* anything)
> (in "1-2") (repeat 4 digit) (* anything) ")"))
> =>
> "\\(?:(\\(?:.\\|
> \\)*and\\(?:.\\|
> \\)*[1-2][[:digit:]]\\{4\\}\\(?:.\\|
> \\)*)\\)"
omfg that is the best!
Is this documented anywhere? I was able to find out about it by typing
M-x apr <RET> rx <RET>
but it is not anywhere in the Elisp info files. I'm sure there are
other gems like it in emacs, so how should I go about finding them?
-Martin
* Re: Regexp to match any character, including newline?
From: Jesper Harder @ 2003-10-05 20:35 UTC (permalink / raw)
Martin Stone Davis <m0davis@pacbell.net> writes:
[rx.el]
> Is this documented anywhere?
I think the only mention is if you use `C-h p' (finder-by-keyword),
where it's listed under "Extensions".
> but it is not anywhere in the Elisp info files.
Maybe it should at least be mentioned in the regexp section of the
manual -- send a suggestion to bug-lisp-manual@gnu.org.
> I'm sure there are other gems like it in emacs
For regexps `M-x re-builder' is also very useful.
* Re: Regexp to match any character, including newline?
From: Joe Fineman @ 2003-10-05 21:45 UTC (permalink / raw)
Thanks very much for all the things to try.
--
--- Joe Fineman jcf@TheWorld.com
||: Dying isn't so bad. It's being buried that gets you down. :||
* Regexp to match any character, including newline?
From: Gian Uberto Lauri @ 2003-10-06 8:20 UTC (permalink / raw)
Cc: help-gnu-emacs
>>>>> "JF" == Joe Fineman <jcf@TheWorld.com> writes:
JF> It is sometimes a nuisance that "." in a regexp does not match
JF> newlines. For example, I want a regexp for text in parentheses that
JF> contains the word "and" followed (anywhere) by a date.
JF> (.+ and .+ [1-2][0-9][0-9][0-9].+)
What about something like:
\(.*[\n]\)
;; Warning, can cause regexp stack overflow.
font-lock has (or should have) font-lock-multiline since Emacs 21,
though I do not know how it is meant to be used.
/\ ___
/___/\__|_|\_|__|___Gian Uberto Lauri_____________________
//--\ | | \| | Integralista GNUslamico e fancazzista
\/