From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Davis Herring" <herring@lanl.gov>
Newsgroups: gmane.emacs.devel
Subject: Re: Structural regular expressions
Date: Thu, 9 Sep 2010 13:47:00 -0700 (PDT)
Message-ID: <46875.130.55.118.19.1284065220.squirrel@webmail.lanl.gov>
References: <loom.20100907T212314-566@post.gmane.org>
	<AANLkTimYvE0aqrG-OQxuY6BTca7ngzrfQUa62mOxyV=+@mail.gmail.com>
	<loom.20100907T222143-475@post.gmane.org> <87sk1lt4uf.fsf@gmail.com>
	<jwvsk1kaav2.fsf-monnier+emacs@gnu.org> <pvhphbi0wq0d.fsf@gmx.li>
	<jwvlj7c9ura.fsf-monnier+emacs@gnu.org>
Reply-To: herring@lanl.gov
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: dough.gmane.org 1284065239 9746 80.91.229.12 (9 Sep 2010 20:47:19 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 9 Sep 2010 20:47:19 +0000 (UTC)
Cc: Lawrence Mitchell <wence@gmx.li>, emacs-devel@gnu.org
To: "Stefan Monnier" <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Sep 09 22:47:18 2010
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1Oto1c-0008Fb-7h
	for ged-emacs-devel@m.gmane.org; Thu, 09 Sep 2010 22:47:16 +0200
Original-Received: from localhost ([127.0.0.1]:49250 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1Oto1b-0008OR-L0
	for ged-emacs-devel@m.gmane.org; Thu, 09 Sep 2010 16:47:15 -0400
Original-Received: from [140.186.70.92] (port=38693 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Oto1T-0008No-Qm
	for emacs-devel@gnu.org; Thu, 09 Sep 2010 16:47:08 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <herring@lanl.gov>) id 1Oto1S-0004RL-HZ
	for emacs-devel@gnu.org; Thu, 09 Sep 2010 16:47:07 -0400
Original-Received: from proofpoint2.lanl.gov ([204.121.3.26]:49525)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <herring@lanl.gov>) id 1Oto1S-0004R2-9X
	for emacs-devel@gnu.org; Thu, 09 Sep 2010 16:47:06 -0400
Original-Received: from mailrelay2.lanl.gov (mailrelay2.lanl.gov [128.165.4.103])
	by proofpoint2.lanl.gov (8.14.3/8.14.3) with ESMTP id o89LP9EE023480;
	Thu, 9 Sep 2010 15:25:09 -0600
Original-Received: from localhost (localhost.localdomain [127.0.0.1])
	by mailrelay2.lanl.gov (Postfix) with ESMTP id 39F991A8995E;
	Thu,  9 Sep 2010 14:47:01 -0600 (MDT)
X-NIE-2-Virus-Scanner: amavisd-new at mailrelay2.lanl.gov
Original-Received: from webmail1.lanl.gov (webmail1.lanl.gov [128.165.4.106])
	by mailrelay2.lanl.gov (Postfix) with ESMTP id 1C8651A8994A;
	Thu,  9 Sep 2010 14:47:01 -0600 (MDT)
Original-Received: by webmail1.lanl.gov (Postfix, from userid 48)
	id 1A2441CA82DE; Thu,  9 Sep 2010 14:47:00 -0600 (MDT)
Original-Received: from 130.55.118.19 (SquirrelMail authenticated user 196434)
	by webmail.lanl.gov with HTTP; Thu, 9 Sep 2010 13:47:00 -0700 (PDT)
In-Reply-To: <jwvlj7c9ura.fsf-monnier+emacs@gnu.org>
User-Agent: SquirrelMail/1.4.8-5.el5_4.10.lanl3
X-Priority: 3 (Normal)
Importance: Normal
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.0.10011, 1.0.148,
	0.0.0000
	definitions=2010-09-09_11:2010-09-09, 2010-09-09,
	1970-01-01 signatures=0
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:129834
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/129834>

> Indeed, we could probably go a long way by simply extending our notion
> of region so as to allow it to be non-contiguous.
>
> Patches welcome,

This is no patch, but I had an idea for the interface for this:

Definition: simple region
The interval (possibly empty) between point and mark, exactly as it is now.

Variable: region-list
A set of non-empty, disjoint intervals, always local to each buffer.  Each
is a cons of two markers.  Typically each is highlighted in a subtle
fashion, even outside Transient Mark Mode.

Function: multi-region
Returns the union of the region list and the simple region (using
`point-marker' and/or `mark-marker' as needed).  (If the simple region is
empty and the region list is not, the simple region is ignored and the
return value equals `region-list'.)  This is the user-visible
possibly-disconnected upgrade to the region concept.
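
A rough sketch of how `multi-region' might be computed, offered only as
an illustration of the definition above.  It assumes `region-list' holds
conses of markers as described; the helper `mr--merge-intervals' is a
hypothetical name, the result is returned as integer positions rather
than markers for simplicity, and touching intervals are coalesced (an
assumption the proposal does not spell out).

```elisp
;; Sketch only: `region-list' follows the proposal above;
;; `mr--merge-intervals' is a hypothetical helper.
(defvar-local region-list nil
  "Disjoint (START . END) marker pairs, local to each buffer.")

(defun mr--merge-intervals (intervals)
  "Return INTERVALS sorted by start, with overlapping ones coalesced.
INTERVALS is a list of (START . END) conses of integer positions."
  (let (result)
    (dolist (iv (sort (copy-sequence intervals)
                      (lambda (a b) (< (car a) (car b)))))
      (if (and result (<= (car iv) (cdr (car result))))
          ;; Overlaps or touches the previous interval: extend it.
          (setcdr (car result) (max (cdr (car result)) (cdr iv)))
        ;; Otherwise start a new interval (a fresh cons, so the
        ;; caller's data is never modified).
        (push (cons (car iv) (cdr iv)) result)))
    (nreverse result)))

(defun multi-region ()
  "Return the union of `region-list' and the simple region.
An empty simple region is ignored unless `region-list' is empty too."
  (let ((ivs (mapcar (lambda (c)
                       (cons (marker-position (car c))
                             (marker-position (cdr c))))
                     region-list))
        (m (mark t)))                   ; nil if no mark is set
    (when (and m (or (/= m (point)) (null ivs)))
      (push (cons (min m (point)) (max m (point))) ivs))
    (mr--merge-intervals ivs)))
```

With intervals 2..5 and 10..12 in the region list, a simple region of
4..11 bridges the two and the result is the single interval 2..12.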

User option: multi-region-separator (default: "\n")
String to insert between separate intervals of the multi-region when
concatenated.

(defun multi-region-string (&optional sep)
  "Return the contents of the multi-region.
Separate intervals with SEP (or `multi-region-separator' if omitted)."
  (mapconcat (lambda (c) (buffer-substring (car c) (cdr c)))
             (multi-region) (or sep multi-region-separator)))

Rule: (interactive "r") maps over the multi-region.
`call-interactively' would handle the interactive spec once (including
any prompting), then call the function repeatedly, with start and end set
to the bounds of each interval in the multi-region in turn, in buffer
order.  There should perhaps be some way to disable this (a prefix
command, or just a quick way to suppress/restore the region list while
leaving the simple region alone).
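
The repeated-call half of this rule can be sketched as below.  This is
not a change to `call-interactively' itself; `mr--map-over-intervals' is
a hypothetical helper that takes the intervals as integer positions.
The one non-obvious point is that each call may edit the buffer, so the
bounds are turned into markers up front to keep later intervals valid.

```elisp
(defun mr--map-over-intervals (fn intervals)
  "Call FN with START and END for each of INTERVALS, in buffer order.
INTERVALS is a list of (START . END) conses of positions.
Hypothetical helper, not part of Emacs."
  (let ((marked (mapcar (lambda (c)
                          ;; Markers track edits made by earlier calls.
                          (cons (copy-marker (car c))
                                (copy-marker (cdr c))))
                        (sort (copy-sequence intervals)
                              (lambda (a b) (< (car a) (car b)))))))
    (dolist (c marked)
      (funcall fn (marker-position (car c)) (marker-position (cdr c)))
      ;; Detach the markers so they do not slow down later edits.
      (set-marker (car c) nil)
      (set-marker (cdr c) nil))))
```

For instance, (mr--map-over-intervals #'upcase-region '((1 . 4) (9 . 12)))
in a buffer containing "foo bar baz qux" leaves "FOO bar BAZ qux".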

Rationale: This is a very intrusive change!  But it's often the right
thing (delete-region, upcase-region, ispell-region, translate-region,
underline-region, indent-region, count-lines-region,
expand-region-abbrevs, and probably eval-region) and is one of very few
ways of letting existing code apply in any sense to multi-regions.  (If
doing it by default is too much, a prefix "multify" command could be
provided instead, and all of this could be optional.)

Another spec ("R"?) could be added for commands like `narrow-to-region'
that should operate only on the simple region (or fail if the region
isn't simple?).  Yet another spec might pass all of the
multi-region at once so that commands like `kill-region' and
`write-region' could use `multi-region-string' or otherwise act on them
coherently.

Command: keep-region
Unions the current simple region into the region list (may coalesce
existing intervals).  Immediately afterwards, the simple region is
entirely redundant and has no effect (until point or mark moves).
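
A minimal sketch of `keep-region', assuming the marker-pair
representation of `region-list' described earlier.  The helpers
`mr--positions' and `mr--union' are hypothetical names; the old markers
are simply dropped rather than reused, which a real implementation would
probably want to avoid.

```elisp
(defvar-local region-list nil
  "Disjoint (START . END) marker pairs, local to each buffer.")

(defun mr--positions (intervals)
  "Return INTERVALS with each marker replaced by its integer position."
  (mapcar (lambda (c) (cons (marker-position (car c))
                            (marker-position (cdr c))))
          intervals))

(defun mr--union (intervals beg end)
  "Return INTERVALS with BEG..END merged in, sorted and coalesced."
  (let (result)
    (dolist (iv (sort (cons (cons beg end) (copy-sequence intervals))
                      (lambda (a b) (< (car a) (car b)))))
      (if (and result (<= (car iv) (cdr (car result))))
          ;; Overlaps or touches the previous interval: extend it.
          (setcdr (car result) (max (cdr (car result)) (cdr iv)))
        (push (cons (car iv) (cdr iv)) result)))
    (nreverse result)))

(defun keep-region (beg end)
  "Union the simple region BEG..END into `region-list'."
  (interactive "r")
  (unless (= beg end)                   ; intervals must be non-empty
    (let ((merged (mr--union (mr--positions region-list) beg end)))
      (setq region-list
            (mapcar (lambda (c) (cons (copy-marker (car c))
                                      (copy-marker (cdr c))))
                    merged)))))
```

Keeping 2..5, then 10..12, then 4..11 leaves a single interval 2..12,
which is the coalescing case mentioned above.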

Command: drop-region
Removes the current simple region from the region list (may split existing
intervals).  Immediately afterwards, the multi-region is no different!
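
The splitting behaviour can be illustrated with a pure function on
integer positions.  `mr--subtract' is a hypothetical helper; a real
`drop-region' would presumably apply it to the region list with the
bounds of the simple region and convert the result back to markers.

```elisp
(defun mr--subtract (intervals beg end)
  "Return INTERVALS with the span BEG..END removed.
INTERVALS is a sorted list of disjoint (START . END) conses of
integers.  An interval straddling the span is split in two."
  (let (result)
    (dolist (iv intervals)
      (let ((s (car iv))
            (e (cdr iv)))
        (if (or (<= e beg) (>= s end))
            ;; No overlap: keep the interval as it is.
            (push (cons s e) result)
          ;; Keep whatever sticks out on either side of the span.
          (when (< s beg) (push (cons s beg) result))
          (when (> e end) (push (cons end e) result)))))
    (nreverse result)))
```

Removing 5..8 from ((2 . 12) (20 . 25)) gives ((2 . 5) (8 . 12) (20 . 25)).
Since the multi-region unions the simple region back in, the hole is
filled again until point or mark moves, which is why the multi-region
is unchanged immediately afterwards.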

Command: drop-this-region
Remove the interval that contains point from the region list.

Command: drop-multi-region
Clears the region list (causing the multi-region to equal the simple region).

These low-level commands would be too tedious to be the principal user
mechanism for manipulating the multi-region.  So we add:

Command: mark-regexp
Add to the region list all matches for a regexp (following point, for
consistency with `how-many' and `keep-lines').  Framing the regexp as
^.*REGEXP.*$ allows this command to mark lines (or a separate command could
do that for you).  Even when lines are marked in that fashion, the
newlines between them are not, so each line is a separate interval.
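
The match-collecting part of `mark-regexp' might look roughly like the
following.  `mr--regexp-matches' is a hypothetical helper that only
gathers the intervals; merging them into the region list is left out.
Empty matches are skipped, since the region list holds only non-empty
intervals.

```elisp
(defun mr--regexp-matches (regexp)
  "Return (START . END) conses for non-empty matches of REGEXP after point.
Point is left where it was.  Hypothetical helper, not part of Emacs."
  (save-excursion
    (let (matches)
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (if (= (match-beginning 0) (match-end 0))
            ;; Empty match: step forward so the loop terminates.
            (unless (eobp) (forward-char 1))
          (push (cons (match-beginning 0) (match-end 0)) matches)))
      (nreverse matches))))
```

In a buffer containing the three lines "alpha", "beta" and "gamma", the
regexp "^.*a.*$" searched from the start yields one interval per line,
each ending just before its newline.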

Command: unmark-regexp
Delete from the region list all intervals within which a match for a
regexp exists.

These are analogous to the "highlight all" feature in Firefox, for
instance.  Then we can navigate among them:

Command: next-region
Move point to the beginning of the nearest following region list interval.
This could be used in macros.
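
A possible shape for `next-region', again assuming `region-list' is a
list of marker pairs.  Signalling a `user-error' when nothing follows
point is an assumption, chosen so that a keyboard macro stops there.

```elisp
(defvar-local region-list nil
  "Disjoint (START . END) marker pairs, local to each buffer.")

(defun next-region ()
  "Move point to the nearest interval start in `region-list' after point."
  (interactive)
  (let (target)
    (dolist (c region-list)
      (let ((start (marker-position (car c))))
        ;; Remember the smallest start that lies strictly after point.
        (when (and (> start (point))
                   (or (null target) (< start target)))
          (setq target start))))
    (if target
        (goto-char target)
      (user-error "No following region"))))
```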

Command: count-regions
Display in the echo area how many intervals are in the region list and the
multi-region (which may be one more or many fewer).

Since region lists are complicated things, the user might want to save
them and reuse them later, so letting registers hold them would be good. 
(Should they store the region list or the multi-region?)

WDOT?

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.