From: "Bingham, Jay"
Newsgroups: gmane.emacs.help
Subject: RE: line-spanning regexp
Date: Wed, 15 Jan 2003 12:00:36 -0600

On Tuesday, January 14, 2003 7:47 PM Greg Hill wrote:
>
>
>At 3:59 PM -0800 1/14/03, Tennis Smith wrote:
>>Hi,
>>
>>How do I construct a regexp that looks for two strings that *might* span
>>two consecutive lines?
>>
>>For example, I need a regexp that will find string1 and string2 and
>>everything in between for the following scenarios:
>>
>>
>>blah blah blah blah string1 blah blah string2 blah blah blah
>>
>>-OR-
>>
>>blah blah string1 blah
>>string2 blah blah
>>
>>TIA,
>>-Tennis
>
>"string1[^\n]*[\n]?[^\n]*string2"
>

The above regexp pattern may NOT work in all circumstances. Specifically, it
may not work correctly when used in interactive regular expression searches
(isearch-forward-regexp, C-M-s; isearch-backward-regexp, C-M-r;
search-forward-regexp and search-backward-regexp).

It may not work because the escape sequences \n and \t, when typed into an
interactive regexp, DO NOT match newline and tab. Those escapes are
interpreted by the Emacs Lisp reader when it reads a string constant, not by
the regexp engine. In a Lisp program "\n" has already become a newline
character before the regexp is used, but at an interactive prompt the regexp
engine receives a backslash followed by the letter "n". The
Search -> Regexp Search info node does not mention this restriction, and the
information in the Search -> Regexps info node might be read as saying that
they do match newline and tab.

In the example given by the OP, however, the pattern does work, but not for
the reason one might think.
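One way to see the difference is to run both readings of Greg's pattern through a regexp engine outside Emacs. The sketch below uses Python's re module as a stand-in. That is an assumption: for the simple bracket, *, ? and literal-text constructs used here, Python's backtracking engine should pick the same matches Emacs would, but the two regexp dialects differ elsewhere.

LISP_STRING_PATTERN is the pattern after the Lisp reader has turned each \n into a newline. INTERACTIVE_PATTERN is what the engine receives when the same characters are typed at a search prompt. The sample texts and the helper name first_match are made up for illustration.

```python
import re

# Greg's pattern as the regexp engine sees it when it comes from an Emacs Lisp
# string literal: the Lisp reader has already turned each \n into a real
# newline character (Python's ordinary "\n" escape does the same here).
LISP_STRING_PATTERN = "string1[^\n]*[\n]?[^\n]*string2"

# The same characters typed at an interactive regexp-search prompt: the engine
# receives a backslash followed by the letter n. Emacs does not treat a
# backslash as special inside brackets, so [^\n] means "neither \ nor n".
# Python's re needs the backslash doubled to denote that same character set.
INTERACTIVE_PATTERN = r"string1[^\\n]*[\\n]?[^\\n]*string2"

# Hypothetical sample buffers, chosen only to illustrate the two readings.
SAMPLES = {
    "one line": "blah blah blah blah string1 blah blah string2 blah blah blah",
    "two lines": "blah blah string1 blah\nstring2 blah blah",
    "three lines": "string1 blah\nblah\nblah string2",
    "two n's between": "string1 and then string2",
}


def first_match(pattern, text):
    """Return the text matched by a forward search, or None if no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label:16} lisp={first_match(LISP_STRING_PATTERN, text)!r}"
              f" interactive={first_match(INTERACTIVE_PATTERN, text)!r}")
```

Both readings agree on the OP's two examples. The interactive reading differs in two ways: it also crosses any number of newlines, and it gives up as soon as two "n" characters stand between the strings.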
It works in the OP's examples because, in an interactive search, the bracket
expression [^\n] matches any character that is not a backslash or the letter
"n". A newline is neither, so it is matched by the first or second [^\n]*,
while the optional [\n]? can absorb at most one backslash or "n". The pattern
therefore succeeds only when at most one backslash or "n" occurs between
string1 and string2. For example, it fails on

  string1 and then string2

(the "n" in "and" and the "n" in "then" make two), yet it will happily match
across any number of line breaks as long as the text contains no "n" or
backslash.

The correct regexp to use in interactive searches (one that does not depend
on whether an "n" or "\" happens to appear in the text) is, as typed to
enter it:

  "string1[^C-qC-j]*[C-qC-j]?[^C-qC-j]*string2"

where C-q C-j means typing Control-q followed by Control-j, which inserts a
literal newline into the search string. It is displayed like this:

  "string1[^^J]*[^J]?[^^J]*string2"

The pattern suggested by Greg may also produce undesired results when the
buffer contains something like this:

  blah blah string1 blah string2 blah blah
  blah blah string1 blah blah string2 blah blah blah

In this case it matches from the start of string1 on the first line to the
end of string2 on the second line. If that is not the desired result, the
regexp can be modified to match the shortest rather than the longest string.
In Emacs 21.1 and later versions the regexp to do this is:

  "string1[^\n]*?[\n]?[^\n]*?string2"

Earlier versions of Emacs lack the non-greedy *? operator and require a
different construct:

  "string1\\(\\|[^\n]\\)*[\n]?\\(\\|[^\n]\\)*string2"

Like Greg's original, these two patterns are written as Lisp strings. To use
them in an interactive search, type C-q C-j in place of each \n and a single
backslash in place of each \\.

See http://www.emacswiki.org/cgi-bin/wiki.pl?NonGreedyRegexp for more
information.

Happy emacsing
-_
 J_)
C_)ingham
 .    HP - NonStop Austin Software & Services - Software Quality Assurance
 .    Austin, TX
 .  "Language is the apparel in which your thoughts parade in public.
 .   Never clothe them in vulgar and shoddy attire."
                                         -Dr. George W. Crane-
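The greedy versus shortest-match behaviour described above can be checked outside Emacs with a small sketch. It borrows Python's re module as a stand-in for the Emacs engine. That is an assumption which should hold for these simple constructs, and Python happens to spell the non-greedy operator *? as well.

The sketch covers only Greg's pattern and the Emacs 21.1 form, not the pre-21.1 construct. The names TEXT, GREEDY, NON_GREEDY and span are illustrative only.

```python
import re

# Two buffer lines, each containing string1 ... string2.
# Python's "\n" is a real newline, just as in an Emacs Lisp string literal.
TEXT = ("blah blah string1 blah string2 blah blah\n"
        "blah blah string1 blah blah string2 blah blah blah")

# Greg's greedy pattern: the first [^\n]* runs to the end of line 1, the
# optional [\n] takes the newline, and the engine then backtracks only as far
# as the last string2 on line 2.
GREEDY = "string1[^\n]*[\n]?[^\n]*string2"

# The Emacs 21.1+ non-greedy form from the post. Each [^\n]*? grows one
# character at a time, so the first reachable string2 ends the match.
NON_GREEDY = "string1[^\n]*?[\n]?[^\n]*?string2"


def span(pattern, text):
    """Return the text matched by the first forward search, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(repr(span(GREEDY, TEXT)))
    print(repr(span(NON_GREEDY, TEXT)))
    # The non-greedy form still crosses one line break when it has to.
    print(repr(span(NON_GREEDY, "blah blah string1 blah\nstring2 blah blah")))
```

The greedy form returns everything from the first string1 through the string2 on the second line. The non-greedy form stops at "string1 blah string2" but still spans a line break in the OP's two-line case.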