From: Luc Teirlinck
Newsgroups: gmane.emacs.devel
Subject: Re: Matches for multiline regexps
Date: Fri, 17 Jun 2005 21:48:32 -0500 (CDT)
Message-ID: <200506180248.j5I2mW504853@raven.dms.auburn.edu>
References: <200506160140.j5G1eFJ26066@raven.dms.auburn.edu> <200506170326.j5H3Qxc01563@raven.dms.auburn.edu>
Original-To: rms@gnu.org
Cc: emacs-devel@gnu.org
In-reply-to: (message from Richard Stallman on Fri, 17 Jun 2005 10:58:35 -0400)

Richard Stallman wrote:

       Additional remark: from simpler examples, it appears that they
       are _intended_ to be line numbers.  If so, this is a bug.

   Yes, it seems to be a bug in counting the line numbers.
   Could you fix that too?

I will take a look at it, but first a decision has to be made on how
we treat overlapping matches.  (I am talking about matches that
themselves overlap.  I have no problem handling a match that starts on
the same line on which a previous match ended, but later on the line,
so that the matches themselves do not overlap, only one of their
lines.)

The current occur implementation for multiline regexps has _several_
problems.  Apart from getting the line numbers wrong, the matches do
not get correctly displayed: only their first line is shown.

The current implementation _tries_ to "correctly" (in one of the two
possible interpretations of what is "correct") find all matches in
case there are overlapping matches.  But it does not come close to
succeeding in that.  Worse, it has to pay for its attempt to do so by
failing to find all matches in more natural cases where there are no
overlapping matches and only one possible interpretation of "correct".
The present occur implementation differs radically in philosophy from
all other word or regexp search functions in Emacs and is backward
incompatible with Emacs 21.

I propose to have occur treat overlapping matches the same way the
other Emacs search functions do, which is also the way occur behaved
before Emacs 22.  That is, given a buffer with the following five
lines:

11
11
11
11
11

`M-x occur RET 11 C-q C-j 11 RET' will find two matches, one on line 1
and one on line 3.  Those are the only matches that
`C-M-s 11 C-q C-j 11 RET C-s C-s C-s...' at the beginning of the
buffer is going to find.  It is what occur does in Emacs 21.
Implementing this correctly seems relatively easy and does not require
paying a price in efficiency.  If this interpretation is good enough
for C-M-s, then why not for occur?

Trying to fix occur to handle the other interpretation of "correct"
(matches at lines 1, 2, 3 and 4) is possible but more difficult.  (The
current occur version can do that correctly in this example, but fails
for many other examples.)  Even a completely correct implementation
would still present problems.  It could make the handling of more
natural regexps less efficient, it clashes with all other search
functions in its philosophy, and it would not be clear how to display
all multiline matches in a way that is clear and avoids excessive
redundancy, because there could be a _lot_ of overlapping lines
between matches.  With my proposal only _consecutive_ entries in the
*Occur* buffer could overlap and the overlap would be at most one
line.  With a correct implementation of the other interpretation,
there is no limit on the amount of overlap.

Sincerely,

Luc.
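
To make the two interpretations of "correct" concrete, here is a
minimal sketch in Python.  It is not Emacs Lisp and not the occur
code: it assumes Python's `re` module as a stand-in for the Emacs
regexp engine, and the names `BUFFER`, `line_of`,
`sequential_matches` and `overlapping_matches` are hypothetical,
chosen only for this illustration.  Each match is reported as a pair
(first line, last line) for the five-line buffer above.

```python
import re

# The five-line example buffer from the message (assumed to end with a newline).
BUFFER = "11\n11\n11\n11\n11\n"


def line_of(text, pos):
    """Return the 1-based line number of character offset `pos`."""
    return text.count("\n", 0, pos) + 1


def span_lines(text, match):
    """Return (first_line, last_line) covered by a regexp match."""
    last_char = max(match.start(), match.end() - 1)
    return (line_of(text, match.start()), line_of(text, last_char))


def sequential_matches(pattern, text):
    """First interpretation: each search resumes where the previous
    match ended, so matches never overlap (like repeated C-M-s)."""
    return [span_lines(text, m) for m in re.finditer(pattern, text)]


def overlapping_matches(pattern, text):
    """Second interpretation: retry the search one character after the
    start of the previous match, so overlapping matches are found too."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        spans.append(span_lines(text, m))
        pos = m.start() + 1
    return spans


print(sequential_matches(r"11\n11", BUFFER))   # [(1, 2), (3, 4)]
print(overlapping_matches(r"11\n11", BUFFER))  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```

In this model the first function reports matches starting on lines 1
and 3, and the second reports matches starting on lines 1, 2, 3 and 4,
with every interior line shared by two entries.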