unofficial mirror of help-gnu-emacs@gnu.org
* line-spanning regexp
From: Tennis Smith @ 2003-01-14 23:59 UTC (permalink / raw)


Hi,

How do I construct a regexp that looks for two strings that *might* span
two consecutive lines?  

For example, I need a regexp that will find string1 and string2 and 
everything in between for the following scenarios:


blah blah blah blah string1 blah blah string2 blah blah blah

-OR-

blah blah string1 blah
string2 blah blah

TIA,
-Tennis


* Re: line-spanning regexp
From: Greg Hill @ 2003-01-15  1:47 UTC (permalink / raw)


At 3:59 PM -0800 1/14/03, Tennis Smith wrote:
>Hi,
>
>How do I construct a regexp that looks for two strings that *might* span
>two consecutive lines? 
>
>For example, I need a regexp that will find string1 and string2 and
>everything in between for the following scenarios:
>
>
>blah blah blah blah string1 blah blah string2 blah blah blah
>
>-OR-
>
>blah blah string1 blah
>string2 blah blah
>
>TIA,
>-Tennis

"string1[^\n]*[\n]?[^\n]*string2"
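
A minimal sketch of how this pattern behaves on the two sample inputs,
using Python's re module as a stand-in.  This is an assumption: Emacs
regexp syntax differs from Python's in places, but this particular
pattern reads the same in both once \n denotes an actual newline.  The
names PATTERN, one_line, two_lines and three_lines are made up for
illustration.

```python
import re

# Same pattern as above.  In a Python raw string the regex engine itself
# turns \n into a newline, which plays the role the Lisp reader plays for
# the double-quoted Emacs string.
PATTERN = re.compile(r"string1[^\n]*[\n]?[^\n]*string2")

one_line = "blah blah blah blah string1 blah blah string2 blah blah blah"
two_lines = "blah blah string1 blah\nstring2 blah blah"
three_lines = "blah string1 blah\nblah\nstring2 blah"

for text in (one_line, two_lines, three_lines):
    m = PATTERN.search(text)
    # Print the matched span, or None when the pattern does not match.
    print(repr(m.group(0)) if m else None)
```

The first two inputs match; the third does not, because the pattern
allows at most one newline between string1 and string2.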

--Greg


* RE: line-spanning regexp
From: Bingham, Jay @ 2003-01-15 18:00 UTC (permalink / raw)


On Tuesday, January 14, 2003 7:47 PM Greg Hill Wrote
>
>
>At 3:59 PM -0800 1/14/03, Tennis Smith wrote:
>>Hi,
>>
>>How do I construct a regexp that looks for two strings that *might* span
>>two consecutive lines?
>>
>>For example, I need a regexp that will find string1 and string2 and
>>everything in between for the following scenarios:
>>
>>
>>blah blah blah blah string1 blah blah string2 blah blah blah
>>
>>-OR-
>>
>>blah blah string1 blah
>>string2 blah blah
>>
>>TIA,
>>-Tennis
>
>"string1[^\n]*[\n]?[^\n]*string2"
>

The above pattern for a regexp may NOT work in all circumstances.
Specifically it may not work correctly when used in interactive regular
expression searches (isearch-forward-regexp, C-M-s;
isearch-backward-regexp, C-M-r; search-forward-regexp and
search-backward-regexp).  The reason is that the escape sequences \n and
\t, when typed into an interactive regexp, DO NOT match newline and tab.
The Search -> Regexp Search info node does not mention this restriction,
and information in the Search -> Regexps info node might be read as
indicating that they do.

However, in the example given by the OP it works, but not for the reason
that one might think.  It works in this case because the expression
[^\n] will match anything that is not a "\" or an "n".  Since a newline
is neither a backslash nor the letter "n", either instance of [^\n]* can
match across the line break, and the [\n]? in the middle matches at most
one backslash or "n".  The pattern therefore fails as soon as the text
between string1 and string2 contains more than one backslash or "n" (for
example "string1 and then string2"), and it is not limited to two
consecutive lines.
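
To see what the typed form does, here is a sketch in Python (a stand-in,
not Emacs itself).  It assumes that inside a bracket expression typed
interactively the backslash is an ordinary character, so [^\n] excludes
only a backslash and the letter "n"; the Python class [^\\n] is written
to imitate that reading.  TYPED and the sample strings are illustrative
names.

```python
import re

# [^\\n] in Python: any character except a backslash or the letter "n".
# This imitates the interactive reading of [^\n]; note that it happily
# matches a real newline character.
TYPED = re.compile(r"string1[^\\n]*[\\n]?[^\\n]*string2")

two_lines = "blah blah string1 blah\nstring2 blah blah"
many_lines = "string1 a\nb\nc\nstring2"
two_ns = "string1 and then string2"

print(bool(TYPED.search(two_lines)))   # the line break is crossed
print(bool(TYPED.search(many_lines)))  # so are several line breaks
print(bool(TYPED.search(two_ns)))      # two n's in between block the match
```

Under that assumption the first two searches succeed and the third
fails, so the typed pattern is sensitive to the letters between the two
strings rather than to the number of lines.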

The correct regexp (that does not depend on the presence of an n or \)
to use in interactive searches is (as typed to enter it):

"string1[^C-qC-j]*[C-qC-j]?[^C-qC-j]*string2"

This will produce a string that looks like this when displayed:

"string1[^^J]*[^J]?[^^J]*string2"

The pattern suggested by Greg may also produce undesired results when
the following condition exists in the buffer:

blah blah string1 blah string2 blah blah blah
blah blah string1 blah blah string2 blah blah blah

In this case it will match from the start of string1 on the first line
to the end of string2 on the second line.  If this is not the desired
result, the regexp can be modified to match the shortest rather than the
longest string.  In Emacs 21.1 and later versions the regexp to do this
is:

"string1[^\n]*?[\n]?[^\n]*?string2"
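
The difference between the two forms can be sketched in Python, whose
*? operator is assumed here to behave like the Emacs 21 non-greedy
operator for this pattern.  The names buffer_text, greedy and lazy are
illustrative only.

```python
import re

# The two-line buffer from the example above.
buffer_text = (
    "blah blah string1 blah string2 blah blah blah\n"
    "blah blah string1 blah blah string2 blah blah blah"
)

greedy = re.compile(r"string1[^\n]*[\n]?[^\n]*string2")
lazy = re.compile(r"string1[^\n]*?[\n]?[^\n]*?string2")

# Greedy: runs from string1 on line 1 to string2 on line 2.
print(repr(greedy.search(buffer_text).group(0)))
# Non-greedy: stops at the first string2 it can reach.
print(repr(lazy.search(buffer_text).group(0)))
```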

Earlier versions of Emacs require a different construct, the regexp to
use in those versions is:

"string1\\(\\|[^\n]\\)*[\n]?\\(\\|[^\n]\\)*string2"

See http://www.emacswiki.org/cgi-bin/wiki.pl?NonGreedyRegexp for more
information.

Happy emacsing
-_
J_)
C_)ingham
.    HP - NonStop Austin Software & Services - Software Quality Assurance
.    Austin, TX
. "Language is the apparel in which your thoughts parade in public.
.  Never clothe them in vulgar and shoddy attire."     -Dr. George W. Crane-


* Re: line-spanning regexp
From: Kevin Rodgers @ 2003-01-15 20:13 UTC (permalink / raw)


Bingham, Jay wrote:

> On Tuesday, January 14, 2003 7:47 PM Greg Hill Wrote
>>
>>"string1[^\n]*[\n]?[^\n]*string2"
> 
> The above pattern for a regexp may NOT work in all circumstances.
> Specifically it may not work correctly when used in interactive regular
> expression searches (isearch-forward-regexp, C-M-s;
> isearch-backward-regexp, C-M-r; search-forward-regexp and
> search-backward-regexp).


I assumed the presence of the delimiting double quotes was to indicate that
he meant a string to be passed to the non-interactive functions and not a
key sequence to be typed to the interactive commands.

...


> The correct regexp (that does not depend on the presence of an n or \)
> to use in interactive searches is (as typed to enter it):
> 
> "string1[^C-qC-j]*[C-qC-j]?[^C-qC-j]*string2"


Isn't `[^C-qC-j]' equivalent to `.'?
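
In Emacs regexps `.' matches any single character except a newline, so
the two forms should be interchangeable.  A rough check in Python, where
`.' has the same meaning as long as the DOTALL flag is not set (the
names bracket, dot and samples are illustrative):

```python
import re

bracket = re.compile(r"string1[^\n]*[\n]?[^\n]*string2")
dot = re.compile(r"string1.*\n?.*string2")

samples = [
    "blah blah blah blah string1 blah blah string2 blah blah blah",
    "blah blah string1 blah\nstring2 blah blah",
    "blah string1 blah\nblah\nstring2 blah",
    "no match here",
]

def matched(regex, text):
    """Return the matched text, or None when there is no match."""
    m = regex.search(text)
    return m.group(0) if m else None

# Each line prints True when both forms agree on the sample.
for text in samples:
    print(matched(bracket, text) == matched(dot, text))
```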


-- 
Kevin Rodgers


* RE: line-spanning regexp
From: Greg Hill @ 2003-01-15 21:46 UTC (permalink / raw)


At 12:00 PM -0600 1/15/03, Bingham, Jay wrote:
<snip>
>Earlier versions of Emacs require a different construct, the regexp to
>use in those versions is:
>
>"string1\\(\\|[^\n]\\)*[\n]?\\(\\|[^\n]\\)*string2"
>
>See http://www.emacswiki.org/cgi-bin/wiki.pl?NonGreedyRegexp for more
>information.

Jay,

Thanks for pointing that out.  I didn't know about this hack for 
non-greedy searches in pre-21 emacs.

Unfortunately the web page you cited doesn't really explain it, just 
calls it a "hack" and provides a single minimalist example.  I'm sure 
I could remember it better and use it more effectively if I 
understood why it works, but that is not apparent to me.  Are you 
aware of a more in-depth discussion of this topic that I could 
consult?

Thanks.

--Greg

