From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Drew Adams <drew.adams@oracle.com>
Newsgroups: gmane.emacs.devel
Subject: RE: A vision for multiple major modes: some design notes
Date: Thu, 21 Apr 2016 09:05:23 -0700 (PDT)
Message-ID: <64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default>
References: <20160420194450.GA3457@acm.fritz.box>
	<05d5bd7e-1cea-4336-a37c-fe6bd6752558@default>
	<20160421124325.GC1775@acm.fritz.box>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Cc: emacs-devel@gnu.org
To: Alan Mackenzie <acm@muc.de>
In-Reply-To: <20160421124325.GC1775@acm.fritz.box>
Xref: news.gmane.org gmane.emacs.devel:203147
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/203147>

> This is a good point.  Maybe it would be better to match an island or
> the gap between two chained islands with any regexp element which
> matches the space (the good old 0x20 character).

See also Eli's feedback about this.  I think I agree with him
that trying to repurpose whitespace matching for this is maybe
not the best approach.  A separate kind of matching should
perhaps be used - nothing to do with whitespace per se, even if
that matching might (also) take whitespace into account.

> > I'm pretty sure I would want to be able to do things throughout
> > a chain that spans different buffers.  If it were I, I would
> > think about defining all that you are doing using a structure
> > that is multi-buffer.
>
> I don't envisage that the island chains will really be that useful for
> (user initiated) searching, etc.  The idea is that, to the user, such a
> buffer will look much like it already does, except that the font locking
> will be appropriate for each island, the major mode key map will be
> right for each island, and so on.

I see it differently.  I think you see it that way because for
you the major mode is an essential part of the feature you want
to implement - it is primary.  To me, chains of islands should
be the primary thing, and a very general one.  One (important)
use of them would be to apply a mode to them ("multi-modes").

IOW, I see (lots of) possible uses for chains of islands that
go beyond (i.e., do not necessarily involve) the application
of a particular mode to them.  And in the general case I see
no reason to limit chains to a single buffer.

That doesn't mean that there wouldn't be important cases that
do limit the use to either (a) applying a given major mode or
(b) a single buffer.  I just don't see why we would build
such limits into the design (i.e., hardcoded, making it hard
to extend to either (a) mode-agnostic or (b) multi-buffer).

> > [That is what I did for zones.el, for instance - sets of such
> > text zones are delimited by markers, which automatically record
> > the buffer they pertain to.  And they can be persistent, as well.
> > Have you considered the possibility of persisting island chains?]

Any thoughts on persistence?

> > And I would probably want user-level operations, to combine
> > chains (append, intersect, union/coalesce, difference).
> > And why not be able to do that for chains that cross buffers?
>
> The chains will be disjoint, so intersection/difference wouldn't be
> useful.

I understand that the islands in a chain would be disjoint.
But why would chains necessarily be disjoint?  Why shouldn't
chains be independent (at least be able to be independent)?
Why would defining one chain impose limits on defining other
chains (any new chains would need to be disjoint from existing
ones)?
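
To make the combining operations concrete, here is a rough
sketch.  It is in Python only to keep it short, and it is not how
zones.el or any proposed Emacs code does it.  The `Zone' triple
(buffer name, start, end) and the function names `coalesce',
`intersect' and `difference' are hypothetical, chosen just for
illustration.  The point is that such operations are well defined
for chains that overlap each other and that span buffers.

```python
from typing import List, Tuple

# Hypothetical representation: (buffer name, start, end), end exclusive.
Zone = Tuple[str, int, int]


def coalesce(zones: List[Zone]) -> List[Zone]:
    """Union: sort zones, then merge those that overlap or touch
    within the same buffer."""
    merged: List[Zone] = []
    for buf, start, end in sorted(zones):
        if merged and merged[-1][0] == buf and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (buf, prev[1], max(prev[2], end))
        else:
            merged.append((buf, start, end))
    return merged


def intersect(a: List[Zone], b: List[Zone]) -> List[Zone]:
    """Text covered by both A and B."""
    out: List[Zone] = []
    for buf_a, s_a, e_a in coalesce(a):
        for buf_b, s_b, e_b in coalesce(b):
            if buf_a == buf_b:
                start, end = max(s_a, s_b), min(e_a, e_b)
                if start < end:
                    out.append((buf_a, start, end))
    return out


def difference(a: List[Zone], b: List[Zone]) -> List[Zone]:
    """Text covered by A but not by B."""
    out: List[Zone] = []
    for buf, start, end in coalesce(a):
        pos = start
        for buf_b, s_b, e_b in coalesce(b):
            # Skip zones of B that cannot affect what remains of this zone.
            if buf_b != buf or e_b <= pos or s_b >= end:
                continue
            if s_b > pos:
                out.append((buf, pos, s_b))
            pos = max(pos, e_b)
        if pos < end:
            out.append((buf, pos, end))
    return out
```

Appending a chain in one buffer to a chain in another is then
just list concatenation followed by `coalesce'.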

See above, regarding the utility of being able to ignore a
chain's mode for certain operations (and the ability for a
chain to not even have an associated mode).  I suspect that
you are not seeing the use cases I am, which involve doing
all kinds of things to/with the text in a chain of islands.

As Eli suggested, think of a chain of islands as an extension
of narrowing.  Now think of the many different kinds of things
you (or code) do to a narrowed region.  This should be a more
general feature, I think, than what is available in something
like MuMaMo or mmm.  "Multi-modes" is a subcase.

Again, I see a chain of (ordered) text regions as the primary,
general feature, and the mapping (restriction) of a major mode
to such a chain as a subsidiary feature.

> Given that the essential feature of a chain is its major mode,

That is where we differ, and that explains, I think, the
narrower focus you have.  I wouldn't limit the feature to
being coupled to a mode.  That should be a possibility but
not a requirement.

> it wouldn't make sense to combine chains (which will usually
> have different major modes).

It would make sense, depending on what kind of operation you
wanted to apply to the text in chains.  And chains with the
same mode could also be combined, whether in the same buffer
or not.

> I'm still trying to think through the idea of a
> chain having islands in several buffers.

Think of the chains first as just buffer narrowings that
are multi-region, i.e., ignoring all the syntax and
major-mode features that you are thinking about.  (You
can still think of those, but they come in at a different
level - a specific subfeature or set of use cases.)
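
As an illustration of "multi-region narrowing", here is a small
sketch (Python, purely illustrative; `apply_to_chain' is a
hypothetical name and buffer text is modeled as a plain string).
It applies some operation to the text of each island and leaves
everything outside the chain alone, with no mode involved.

```python
from typing import Callable, List, Tuple


def apply_to_chain(text: str,
                   chain: List[Tuple[int, int]],
                   fn: Callable[[str], str]) -> str:
    """Apply FN to the text of each (start, end) island in CHAIN."""
    # Work from the last island to the first, so that a change in
    # length does not shift the positions of islands still to be done.
    for start, end in sorted(chain, reverse=True):
        text = text[:start] + fn(text[start:end]) + text[end:]
    return text
```

For example, `apply_to_chain("aa bb cc", [(0, 2), (6, 8)],
str.upper)' changes only the first and last words.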

> > Being able to add (e.g. append) a chain in one buffer to a chain
> > in another buffer is one simple example.  Anything you might want
> > to do with one chain you will likely want to be able to do with
> > a set of chains, or at least with a chain that results from
> > composing a set of chains in various ways.
>
> > Also, I'm guessing/hoping, but I'm not sure I saw this explicitly,
> > that you can have multiple chains (e.g. in the same buffer) that
> > use the same major mode.
>
> Indeed, yes.
>
> > Being associated with a major mode is only one possible attribute of a
> > chain - it is not required, and other attributes and uses of a chain
> > are not dependent on it, right?  IOW, it is not necessary to think of
> > chains as mode-related - that is just one (albeit common) use &
> > interpretation, right?
>
> Not right, sorry.  The major mode is an essential attribute of an
> island chain.

Why?  What's necessarily essential about it?  That's a design
choice, no?  Would you consider dropping it as a requirement
and keeping it as an option (for any given chain)?

> There will be a slot for it in the structure which holds chain
> data, just as there is currently a slot for it in the (C) buffer
> structure.

Must the slot be filled?  Always?  (Why?)

> There will likewise be slots for the syntax table, major
> mode key map, and so on.  None of these slots would work well with a
> null value.

Why not optional?  Of course if such a slot is not used then
it, and anything that depends on it, would not "work well".
But that should not prevent other, non-mode-related uses of
a chain from working OK.
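
A sketch of what I mean by optional slots (Python, illustrative
only; the `Chain' class, its field names and the two helper
functions are hypothetical, not the proposed C structure).  Mode
slots default to "unset", mode-related lookups fall back to the
buffer's values, and uses that ignore the mode never touch them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Chain:
    islands: List[Tuple[int, int]]                 # (start, end) pairs
    major_mode: Optional[str] = None               # optional slot
    syntax_table: Optional[Dict[str, str]] = None  # optional slot
    keymap: Optional[Dict[str, str]] = None        # optional slot
    local_vars: Dict[str, object] = field(default_factory=dict)


def lookup_key(chain: Chain, key: str,
               buffer_keymap: Dict[str, str]) -> Optional[str]:
    """Use the chain's keymap if it has one, else the buffer's."""
    if chain.keymap is not None and key in chain.keymap:
        return chain.keymap[key]
    return buffer_keymap.get(key)


def chain_text(chain: Chain, text: str) -> List[str]:
    """A mode-agnostic use: collect the text of each island."""
    return [text[start:end] for start, end in chain.islands]
```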

> > >   o - An island will be delimited in two complementary ways:
> > >     * - It will be enclosed syntactically by characters with
> > >       "open island" and "close island" syntax (see section (v)).
> > >       Both of these syntactic markers will include a flag "chain"
> > >       indicating whether there is a previous/next island in the
> > >       chain.  The cdr of the syntax value will be
> > >       the island chain to which the island belongs.
> > >     * - It will be covered by the text property `island', whose
> > >       value will be the pertinent island or island chain
>
> > Are both always required, or is either sufficient for most
> > purposes?
>
> Both are required, yes.  They will both be used.

Why required?  Why can't the design tolerate not having
syntax-based delimiting?

I would prefer to see what you're envisaging placed within
the context of a more general feature.  I see 3 possible
levels, in fact:

1. Arbitrary sets of text zones.  Not necessarily ordered
   (e.g. by buffer position).  Not necessarily without
   overlap.

2. #1, but as chains: ordered, non-overlapping.

3. #2, but with an associated major mode per chain.
   This is essentially what you have in mind, I think.

For all 3 levels I can see use cases for chains that cross
buffers and use cases for chain-combining operations.
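
The three levels can be sketched like this (Python, illustrative
only; `is_chain', `ModeChain' and `attach_mode' are hypothetical
names, and zones are single-buffer (start, end) pairs for
brevity).  Level 1 is any list of zones, level 2 is such a list
that passes `is_chain', and level 3 adds a mode on top of that.

```python
from typing import List, NamedTuple, Tuple

Zone = Tuple[int, int]  # (start, end), end exclusive


def is_chain(zones: List[Zone]) -> bool:
    """Level 2 test: zones are in position order and do not overlap."""
    return all(prev[1] <= nxt[0] for prev, nxt in zip(zones, zones[1:]))


class ModeChain(NamedTuple):
    """Level 3: a chain plus the major mode applied to it."""
    zones: List[Zone]
    major_mode: str


def attach_mode(zones: List[Zone], major_mode: str) -> ModeChain:
    """Promote a level-2 chain to level 3; reject level-1-only sets."""
    if not is_chain(zones):
        raise ValueError("zones must be ordered and non-overlapping")
    return ModeChain(zones, major_mode)
```

Anything written against level 1 or 2 keeps working on a level 3
chain, since the mode is just an extra attribute.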

I can also imagine using some chain-local variables that
are not buffer-specific or mode-specific.  (You already
allow for that, IIUC.)

> > I'm thinking that in many contexts I would not care about
> > delimiting by syntax, and I might not even care about
> > associating a given chain with a mode.  Would I be able to
> > use such chains nevertheless (e.g. search/replace across them)?
>
> I'm not sure this island mechanism is the right tool for doing what
> you're suggesting.

Depends on what it ends up being. ;-)

> For searching/replacing at the user level, some
> extra option meaning "only in the current chain" would need to be
> added to the user interface.

FWIW, I've done this for arbitrary sets of zones (including
across buffers).  The code is in `isearch-prop.el' (which
depends on `zones.el' for this feature).
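
The general idea, sketched in Python (this is not the code in
`isearch-prop.el'; `search_zones' is a hypothetical name, and
buffers are modeled as a dict from buffer name to text): restrict
each search to the bounds of a zone, and walk the zones in
whatever buffers they live in.

```python
import re
from typing import Dict, Iterator, List, Tuple


def search_zones(buffers: Dict[str, str],
                 zones: List[Tuple[str, int, int]],
                 pattern: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (buffer, match start, match end) for each match of
    PATTERN that lies entirely within one of ZONES."""
    regexp = re.compile(pattern)
    for buf, start, end in zones:
        # finditer's pos/endpos arguments limit matching to the zone.
        for match in regexp.finditer(buffers[buf], start, end):
            yield (buf, match.start(), match.end())
```

Replacing is the same loop with a substitution instead of a
report, done from the last match to the first so that positions
stay valid.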

Also, wrt "the current chain": You might want to look at
the zones.el code for the use of variables (which can be
buffer-local, but need not be) that hold sets of zones
(including sets that are "chains") - how users can create
them, choose among them, clone them, persist them, etc.

> > A priori, I would like to have a chain data structure, and
> > as much of the rest of the features as possible, be available
> > and manipulable from Lisp.  Something like this has lots of
> > enhancement possibilities and use cases that we are unlikely
> > to imagine at the outset.  Implementing more than an absolute
> > minimum in C hampers that exploration and improvement.
>
> One idea would be to implement a chain feature, one of whose uses would
> be the major mode islands I've been trying to specify.

That's what I've been trying to suggest: chains of zones are
more general than the feature you've described.  That doesn't
take away from the importance of the use case you have in mind.

> A significant
> part of this would have to be implemented at the C level for speed -
> chain local variables are already going to be slower to access than
> buffer local variables.  We must keep that difference to a minimum.

I have no problem with stuff being in C for performance reasons.
When that is not critical, keeping stuff in Lisp is good.

Especially for a new and very general feature: let folks play
with it and experiment with new possibilities.  We can later
optimize any parts we like.

We should avoid doing that prematurely, as always - but
especially for Emacs, where Lisp enhancement by users is
really the name of the game.

Thanks again for opening this discussion and providing
a detailed first proposal.