unofficial mirror of help-gnu-emacs@gnu.org
* Regexp to match any character, including newline?
From: Joe Fineman @ 2003-10-04 22:02 UTC (permalink / raw)


It is sometimes a nuisance that "." in a regexp does not match
newlines.  For example, I want a regexp for text in parentheses that
contains the word "and" followed (anywhere) by a date.

  (.+ and .+ [1-2][0-9][0-9][0-9].+)

works only if the expression happens to be on one line.  I have tried
[^ ] with the space replaced by an unlikely character such as ASCII
000; that seems to work in isolation, but when I substitute it for
. in the above regexp, the result misbehaves, missing all the right
matches & finding the odd wrong one.  Is there an obvious solution to
this problem?
-- 
---  Joe Fineman    jcf@TheWorld.com

||:  Look on yonder, see that eagle rise.                :||
||:  He was born on land, but he sure enjoys the skies.  :||

* Re: Regexp to match any character, including newline?
From: Stefan Monnier @ 2003-10-04 22:46 UTC (permalink / raw)


> It is sometimes a nuisance that "." in a regexp does not match
> newlines.  For example, I want a regexp for text in parentheses that
> contains the word "and" followed (anywhere) by a date.

>   (.+ and .+ [1-2][0-9][0-9][0-9].+)

> works only if the expression happens to be on one line.  I have tried
> [^ ] with the space replaced by an unlikely character such as ASCII
> 000; that seems to work in isolation, but when I substitute it for
> . in the above regexp, the result misbehaves, missing all the right
> matches & finding the odd wrong one.

(re-search-forward "([^\000]+and[^\000]+[12][0-9]\\{3\\}[^\000]+)")
seems to work just fine here.  Note that if you enter the RE interactively,
you have to use [ ^ C-q 0 RET ] and not [ ^ \ 0 0 0 ] (which would
match any char except for backslash and 0).

> Is there an obvious solution to this problem?

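Use \(.\|\n\), which matches any character, newline included.  In a
Lisp string that is "\\(.\\|\n\\)"; when typing it interactively, enter
the newline with C-q C-j rather than \n.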
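
For readers who want to try this from Lisp, here is a minimal sketch of
the difference between "." and the any-character group.  The names
my-any-char, my-sample and my-paren-and-date-p and the sample text are
made up for illustration; only string-match and the regexp syntax come
from Emacs itself.

```elisp
;; Sketch only: every `my-' name below is hypothetical.
(defconst my-any-char "\\(?:.\\|\n\\)"
  "Regexp fragment matching any single character, newline included.")

(defconst my-sample "see (apples\nand pears,\n1999 harvest) here"
  "Made-up text whose parenthesized part spans three lines.")

(defun my-paren-and-date-p (text)
  "Return the match start if TEXT contains \"( .. and .. YEAR .. )\", else nil."
  (string-match (concat "(" my-any-char "+and" my-any-char "+"
                        "[12][0-9]\\{3\\}" my-any-char "+)")
                text))

;; The any-character group crosses the newlines:
(my-paren-and-date-p my-sample)
;; => 4

;; The same pattern written with "." stops at the first newline:
(string-match "(.+and.+[12][0-9]\\{3\\}.+)" my-sample)
;; => nil
```

On long buffers the repeated group can backtrack heavily, so this is
best treated as a starting point rather than a tuned search.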


        Stefan

* Re: Regexp to match any character, including newline?
From: Jesper Harder @ 2003-10-04 22:59 UTC (permalink / raw)


Joe Fineman <jcf@TheWorld.com> writes:

> It is sometimes a nuisance that "." in a regexp does not match
> newlines.  For example, I want a regexp for text in parentheses that
> contains the word "and" followed (anywhere) by a date.
>
>   (.+ and .+ [1-2][0-9][0-9][0-9].+)
>
> works only if the expression happens to be on one line.  I have tried
> [^ ] with the space replaced by an unlikely character such as ASCII
> 000; that seems to work in isolation, but when I substitute it for
> . in the above regexp, the result misbehaves, missing all the right
> matches & finding the odd wrong one.  Is there an obvious solution to
> this problem?

You can use "\\(?:.\\|\n\\)+" to match _anything_ including newlines.

Also, look at the very cool 'rx' package, which provides a much nicer
syntax for regexps than the usual line noise.  For instance:

(rx (and "(" (* anything) "and" (* anything)
         (in "1-2") (repeat 3 digit) (* anything) ")"))
=>
"\\(?:(\\(?:.\\|
\\)*and\\(?:.\\|
\\)*[1-2][[:digit:]]\\{3\\}\\(?:.\\|
\\)*)\\)"
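
As a rough sketch of how such an rx form might be used in a search, the
function below looks for the first parenthesized "... and ... year ..."
span in a string.  The name my-find-paren-date and the sample text are
invented for illustration, and the form uses (repeat 3 digit) after
(in "1-2") so that the year has four digits, as in the original
question.

```elisp
(require 'rx)

;; Sketch only: `my-find-paren-date' is a hypothetical helper.
(defun my-find-paren-date (text)
  "Return the first parenthesized \"... and ... YEAR ...\" span in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; `anything' matches any character, newline included.
    (when (re-search-forward
           (rx (and "(" (+ anything) "and" (+ anything)
                    (in "1-2") (repeat 3 digit) (+ anything) ")"))
           nil t)
      (match-string 0))))

(my-find-paren-date "x (tea\nand cake,\n2003 menu) y")
;; => "(tea\nand cake,\n2003 menu)"
```

Because the repetitions are greedy, a match in a larger buffer runs to
the last ")" that still satisfies the pattern, not the nearest one.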

* Re: Regexp to match any character, including newline?
From: Martin Stone Davis @ 2003-10-05  7:09 UTC (permalink / raw)


Jesper Harder wrote:

> Also, look at the very cool 'rx' package, which provides a much nicer
> syntax for regexps than the usual line noise.  For instance:
> 
> (rx (and "(" (* anything) "and" (* anything)
>          (in "1-2") (repeat 3 digit) (* anything) ")"))
> =>
> "\\(?:(\\(?:.\\|
> \\)*and\\(?:.\\|
> \\)*[1-2][[:digit:]]\\{3\\}\\(?:.\\|
> \\)*)\\)"

omfg that is the best!

Is this documented anywhere?  I was able to find out about it by typing

M-x apropos <RET> rx <RET>

but it is not anywhere in the Elisp info files.  I'm sure there are 
other gems like it in emacs, so how should I go about finding them?

-Martin

* Re: Regexp to match any character, including newline?
From: Jesper Harder @ 2003-10-05 20:35 UTC (permalink / raw)


Martin Stone Davis <m0davis@pacbell.net> writes:

[rx.el]

> Is this documented anywhere?

I think the only mention is if you use `C-h p' (finder-by-keyword),
where it's listed under "Extensions".

> but it is not anywhere in the Elisp info files.

Maybe it should at least be mentioned in the regexp section of the
manual -- send a suggestion to bug-lisp-manual@gnu.org.

> I'm sure there are other gems like it in emacs

For regexps `M-x re-builder' is also very useful.

* Re: Regexp to match any character, including newline?
From: Joe Fineman @ 2003-10-05 21:45 UTC (permalink / raw)


Thanks very much for all the things to try.
-- 
---  Joe Fineman    jcf@TheWorld.com

||:  Dying isn't so bad.  It's being buried that gets you down.  :||

* Regexp to match any character, including newline?
From: Gian Uberto Lauri @ 2003-10-06  8:20 UTC (permalink / raw)
  Cc: help-gnu-emacs

>>>>> "JF" == Joe Fineman <jcf@TheWorld.com> writes:

JF> It is sometimes a nuisance that "." in a regexp does not match
JF> newlines.  For example, I want a regexp for text in parentheses that
JF> contains the word "and" followed (anywhere) by a date.

JF>   (.+ and .+ [1-2][0-9][0-9][0-9].+)

What about something like:

\(.*[\n]\) 

;; Warning, can cause regexp stack overflow.

font-lock has had font-lock-multiline since Emacs 21, though I do not
know how to use it.

 /\            ___
/___/\__|_|\_|__|___Gian Uberto Lauri_____________________
  //--\ | | \|  |   Integralista GNUslamico e fancazzista 
\/
