From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: A vision for multiple major modes: some design notes
Date: Fri, 22 Apr 2016 11:48:52 +0300
Message-ID: <83a8km58qz.fsf@gnu.org>
References: <20160420194450.GA3457@acm.fritz.box>
	<8360vb6o7u.fsf@gnu.org> <20160421221943.GE1775@acm.fritz.box>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1461314967 15844 80.91.229.3 (22 Apr 2016 08:49:27 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 22 Apr 2016 08:49:27 +0000 (UTC)
Cc: emacs-devel@gnu.org, dgutov@yandex.ru
To: Alan Mackenzie <acm@muc.de>
In-reply-to: <20160421221943.GE1775@acm.fritz.box> (message from Alan
	Mackenzie on Thu, 21 Apr 2016 22:19:43 +0000)
Xref: news.gmane.org gmane.emacs.devel:203165
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/203165>

> Date: Thu, 21 Apr 2016 22:19:43 +0000
> Cc: dgutov@yandex.ru, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > A more subtle issue is with point movements that are not shown to the
> > user (those done by Lisp code of some command, before redisplay kicks
> > in) -- what will be the effect of those? do they trigger redisplay,
> > for example?
> 
> They shouldn't trigger redisplay, no.

But if that code calls sit-for or somesuch, they will, and the result
will be flickering.  But that's not a very important issue.

> > >     * - [Island] will be covered by the text property `island', whose value will be
> > >       the pertinent island or island chain (see section (ii)) (not yet
> > >       decided).  Note that if islands are enclosed inside other islands, the
> > >       value is the innermost island.  There is the possibility of using an
> > >       interval tree independent of the one for text properties to increase
> > >       performance.
> 
> > I don't understand the notion of "enclosed" islands: wouldn't such
> > "enclosing" simply break the "outer" island into two separate islands?
> 
> If we mark island start and end with the syntax-table text properties
> "{" and "}", we're going to have something like
> 
>     {     a{  }b    }
> 
> .  Simply to break the outer island into two pieces, we'd really need to
> apply delimiters at a and b, giving:
> 
>     {     }{  }{    }
> 
> .  This would overwrite the previous syntaxes at a and b, and this might
> be a Bad Thing.

We could design the stuff so that Bad Things won't happen.  I consider
this nesting of islands a (possibly unnecessary) complication that we
shouldn't accept unless we have a very good reason.  Nesting
immediately requires a plethora of operations that are otherwise not
necessary.
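
A minimal sketch of the flat alternative, assuming islands are kept in
a sorted table separate from the syntax-table properties (so nothing at
positions a and b is overwritten).  All names here (`struct island`,
`island_at`, `example_islands`) are hypothetical and not taken from
Emacs.  The layout "{  a{ }b  }" becomes three adjacent spans, and
lookup by position is a plain binary search with no stack of enclosing
islands:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat island record: a half-open span [beg, end) of
   buffer positions owned by one chain.  Not an Emacs structure.  */
struct island
{
  long beg;
  long end;
  int chain_id;
};

/* Return the island containing POS, or NULL if POS lies in no island.
   ISLANDS must be sorted by BEG and must not overlap, which is what
   giving up nesting makes possible.  */
static const struct island *
island_at (const struct island *islands, size_t n, long pos)
{
  size_t lo = 0, hi = n;

  assert (islands != NULL || n == 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (pos < islands[mid].beg)
        hi = mid;
      else if (pos >= islands[mid].end)
        lo = mid + 1;
      else
        return &islands[mid];
    }
  return NULL;
}

/* The outer island split into two pieces of chain 1 around an inner
   island of chain 2.  The positions are made up for the example.  */
static const struct island example_islands[] = {
  { 1, 7, 1 },
  { 7, 11, 2 },
  { 11, 16, 1 },
};
```

The two outer pieces share a chain id, so a mode working on chain 1
still sees them as one logical unit.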

> > >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
> > >     whitespace, much as they do comments.  They will also treat as whitespace
> > >     the gap between two islands in a chain.
> 
> > Why whitespace? why not some new category?  By overloading whitespace,
> > you make things harder on the underlying infrastructure, like regexp
> > search and matching.
> 
> I think it's clear that the "foreign" island's syntax has no interaction
> with the current island.

This doesn't contradict what I suggested.  The new category could be
treated the same as whitespace in its effect on syntax-related
issues.  By contrast, having the whitespace regexp class be
indistinguishable from an island probably means complications at a
very low level of matching regular expressions and syntax constructs,
something that I fear will get in the way.

> If we treat it as whitespace, that should minimise the amount of
> adapting we need to do to existing major modes.

We need to consider the amount of adaptation in the low-level
infrastructure code as well, not only at the application level.

> I envisage that a regexp element will match the "foreign" island if that
> element would match a space.  I know this sounds horrible, but I haven't
> come up with a scenario where this wouldn't work well.

And I say this is a bomb waiting to go off.  It is relatively easy to
add a new regexp construct for an island (e.g., we already support
categories in regexps, so just defining a category is one easy way),
and treat that as whitespace, while still keeping our options open to
make it behave slightly differently if needed, and still allowing the
applications to specify one, but not the other.  By contrast, if we
decide that whitespace matches an island, we are opening a giant can
of worms.  Here's one worm out of that can: some low-level operations
need to search the buffer using regexps disregarding any narrowing --
what you suggest means these operations cannot safely use whitespace
in their regexps.  This is something to stay away from, IMO.

> > Extending [:space:] that way seems to be an implementation detail
> > leaking to user level.  I think we should avoid that at all costs.
> 
> Why?  I don't understand your last paragraph.

See above.  [:space:] is something used a lot in Lisp applications, so
we leak the implementation of islands to that level: from now on, each
Lisp application will need to consider the possibility that searching
for [:space:] will find an island, something that might have no
relation to whitespace.
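
The distinction can be sketched as follows.  This is a toy model, not
Emacs code: the enum and all four function names are assumptions made
up for illustration.  The point is only that the syntax scanners can
skip a foreign island as if it were blank, while a whitespace class in
a regexp keeps matching real whitespace and nothing else:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical character classes for this sketch; these are not the
   Emacs syntax codes or regexp categories.  */
enum char_class
{
  CLASS_WORD,
  CLASS_SPACE,
  CLASS_FOREIGN_ISLAND   /* text belonging to another island chain */
};

/* What a regexp whitespace class such as [:space:] would match:
   real whitespace only.  */
static bool
matches_space_class (enum char_class c)
{
  return c == CLASS_SPACE;
}

/* What a dedicated island construct would match.  */
static bool
matches_island_category (enum char_class c)
{
  return c == CLASS_FOREIGN_ISLAND;
}

/* What the syntax scanners would skip: whitespace and foreign
   islands alike.  */
static bool
skipped_by_syntax_scan (enum char_class c)
{
  return matches_space_class (c) || matches_island_category (c);
}

/* Skip forward from index I over everything the syntax scanner treats
   as blank; return the index of the first character it must examine,
   or LEN if there is none.  */
static int
skip_blank_forward (const enum char_class *text, int len, int i)
{
  assert (0 <= i && i <= len);
  while (i < len && skipped_by_syntax_scan (text[i]))
    i++;
  return i;
}
```

Code that searches with whitespace in its regexps, including code that
ignores narrowing, would then be unaffected by islands unless it asks
for the island construct explicitly.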

> > I'm not sure I understand the details.  E.g., where will the
> > island-chain local values be stored?
> 
> In a C struct chain, analogous to struct buffer, using much the same
> mechanisms.

What object(s) will that chain be rooted at?  And how will it be
related to its buffer?

> > To remind you, buffer-local variables have a special object in their
> > symbol value cell, and BVAR only works for the few buffer-local
> > variables that are stored in the buffer object itself.  I'm not sure I
> > understand how CVAR could solve the problem you need to solve, which
> > is keeping multiple chains per buffer, each one with its values of
> > these variables.
> 
> CVAR would get the current chain from the `island' (or `chain') text
> property at the position.

If it is stored in the text property, then you will have to decide
what happens when text is copied and yanked elsewhere.

> If this is nil, it would do what BVAR does.

Once again, BVAR only handles variables that are part of the buffer
object itself.  The other buffer-local variables (which are the
majority) are handled as part of switching the buffer, and the C code
simply refers to them by name.  So BVAR is not necessarily the correct
model for what you are designing.
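
A toy model of the two access styles described above may help.  It is
a simplification under stated assumptions, not the real Emacs
definitions: `struct toy_buffer`, `TOY_BVAR`, `toy_face_remapping` and
`toy_set_buffer` are all hypothetical, and the real mechanism for
swapping values is more involved than a single saved field.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these are the real Emacs definitions.  */
struct toy_buffer
{
  int tab_width_;        /* lives in the buffer object itself */
  int saved_remapping;   /* this buffer's value of the global below */
};

/* Style 1: slot access in the buffer object, in the manner of BVAR.  */
#define TOY_BVAR(buf, field) ((buf)->field##_)

/* Style 2: a plain C global that C code refers to by name.  It holds
   the current buffer's value and is swapped when the current buffer
   changes.  */
static int toy_face_remapping;
static struct toy_buffer *toy_current_buffer;

static void
toy_set_buffer (struct toy_buffer *b)
{
  assert (b != NULL);
  if (toy_current_buffer)
    /* Swap out the outgoing buffer's value.  */
    toy_current_buffer->saved_remapping = toy_face_remapping;
  /* Swap in the incoming buffer's value.  */
  toy_face_remapping = b->saved_remapping;
  toy_current_buffer = b;
}
```

In style 2 the reader of `toy_face_remapping` passes neither a buffer
nor a position, so a macro taking (variable, buffer, position) has
nothing to hook into there; those call sites would need a different
treatment.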

> Otherwise it would access the appropriate named element in the struct
> chain.  I think CVAR would take three parameters: the variable name, the
> buffer, and the buffer position.

Can you show pseudo-code for CVAR?  I'm afraid I'm missing something
here, because I don't see clearly what you have in mind.

> Other chain local variables would be accessed through an alist in the
> struct chain holding miscellaneous variables, exactly as is done for
> the other buffer local variables in struct buffer.

There's no such alist in how we access buffer-local variables, not
AFAIK.  Again, I must be missing something here.

> > This actually sounds like a simple extension of narrowing, so I wonder
> > why do we need so many new object types and notions.
> 
> I think it's more like a complicated extension of narrowing.  :-)

It's simple because instead of one region you have more than one, and
the user-level commands don't affect them.  All the other changes are
an exact reproduction of what narrowing does.

> I think that chain local variables are essential to multiple major
> modes - you can't have m.m.m. without some sort of chain locality.

What is "chain locality"?

> I also think that for a major mode to work transparently over
> several chained islands, all the irrelevant stuff between the
> islands needs to be made, er, transparent.

Yes, but how is that related to my comment about extending narrowing?

> > I don't see any discussion of how redisplay will deal with islands.
> > To remind you, redisplay moves through portions of the buffer, without
> > moving point, and access buffer-local variables for its job.  You need
> > to augment the design with something that will allow redisplay see the
> > correct values of variables depending on the buffer position it is at.
> > The same problem exists for any features that use display simulation
> > for making decisions about movement and layout, e.g. vertical-motion.
> 
> I think redisplay is mostly controlled by variables (such as
> `scroll-margin') accessed by BVAR.  These calls could be replaced by
> CVAR.

That's not the whole story; once again, you forget about buffer-local
variables that are not part of the buffer object; BVAR is not used for
those.  I gave an example of one such variable: face-remapping-alist,
and I selected that variable for a reason.  Here's how the display
engine refers to it in the current codebase:

	  base_face_id = it->string_from_prefix_prop_p
	    ? (!NILP (Vface_remapping_alist)
	       ? lookup_basic_face (it->f, DEFAULT_FACE_ID)
	       : DEFAULT_FACE_ID)
	    : underlying_face_id (it);

Another example (which I also mentioned) is standard-display-table:

  /* Use the standard display table for displaying strings.  */
  if (DISP_TABLE_P (Vstandard_display_table))
    it->dp = XCHAR_TABLE (Vstandard_display_table);

See? no BVAR anywhere in sight.

> Problems will arise if redisplay reads the variable once, and
> fails to read it again when its current position moves into or out of an
> island.  Redisplay would have to be aware of island boundaries, and
> re-read the controlling variables on passing a boundary.  Other than
> that, I can't see any big problems.  Not yet, anyway.

To remind you, the display engine works by examining characters from
the buffer text one by one.  Are you saying that, for each character
it examines, it will have to look up the island chain for possible
changes?  That would make it abysmally slow, I think.

IOW, part of your design needs to provide some efficient means for
redisplay to "be aware of island boundaries, and re-read the
controlling variables on passing a boundary".
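
One possible shape for such a means, as a sketch only: cache the
position of the next island boundary in the iterator, so that the
per-character cost is a single comparison and the expensive lookup
happens only when a boundary is actually crossed.  Everything below
(`struct boundary_table`, `struct toy_it`, the function names) is
hypothetical, and it assumes the boundaries are available as a sorted
array, which the design has not yet established.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical island boundaries, sorted ascending.  Each entry is a
   buffer position at which the island chain, and hence the values of
   the variables, may change.  */
struct boundary_table
{
  const long *pos;
  size_t n;
};

/* Toy display iterator; not the real display iterator structure.  */
struct toy_it
{
  long charpos;         /* position being examined */
  long next_boundary;   /* first boundary > charpos, or LONG_MAX */
  long reloads;         /* how many times variables were re-read */
};

/* Return the first boundary strictly after CHARPOS, or LONG_MAX.  */
static long
next_boundary_after (const struct boundary_table *t, long charpos)
{
  size_t lo = 0, hi = t->n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (t->pos[mid] <= charpos)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo < t->n ? t->pos[lo] : LONG_MAX;
}

static void
toy_it_init (struct toy_it *it, const struct boundary_table *t, long start)
{
  assert (it != NULL && t != NULL);
  it->charpos = start;
  it->next_boundary = next_boundary_after (t, start);
  it->reloads = 0;
}

/* Advance by N characters.  The comparison is the only per-character
   cost; the table is consulted only on crossing a boundary.  */
static void
toy_it_advance_by (struct toy_it *it, const struct boundary_table *t, long n)
{
  while (n-- > 0)
    {
      it->charpos++;
      if (it->charpos >= it->next_boundary)
        {
          it->reloads++;   /* stand-in for re-reading the variables */
          it->next_boundary = next_boundary_after (t, it->charpos);
        }
    }
}
```

With boundaries at 5 and 10, walking from position 1 to 15 re-reads
the variables twice rather than fourteen times.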

There's one more complication, which is related to redisplay, but not
only to it.  You write:

> (ix) Miscellaneous commands and functions.
>   o - `point-min' and `point-max' will, when `in-islands' is non-nil, return
>     the max/min point in the visible region in the same chain of islands as
>     point.
>   o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to
>     the current island chain when `in-islands' is non-nil.
>   o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in
>     the current island chain (how?) when `in-islands' is non-nil.
>   o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the
>     Right Thing in island chains when `in-islands' is non-nil.
>   o - New functions `island-min', `island-max', `island-chain-min' and
>     `island-chain-max' will do what their names say.
>   o - There will be no restrictions on the use of widening/narrowing, as have
>     been proposed for other support engines for multiple major modes.
>   o - New commands like `beginning-of-island', `narrow-to-island', etc. will
>     be wanted.  More difficultly, bindings for them will be needed.

Something bothers me there.  What will "M-<" and "M->" do, if
point-min and point-max are limited to the current island?  Likewise
the search commands -- they cannot be limited to the current island,
unless the user explicitly says so (and personally, I don't envision
users asking to be so limited).

There's a dichotomy here, between the underlying C-level variables
that currently are set to the limits of the narrowed region, and
affect all user commands and internal operations (e.g., the display
engine never looks beyond these limits); and the multi-mode
functionality that needs to narrow the view even more.  If you
propagate the island-level limitations too deep, they will affect user
commands and features (like display) that have nothing to do with the
reason for which islands are being designed.  E.g., a naïve
replacement of C macros BEGV and ZV with something that returns the
beginning and end of the current island will cause the display to show
only the current island, as if you narrowed the buffer to that
island.  I'm sure that's not what we want.
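
To illustrate the separation argued for here, a toy model with two
levels of limits.  The names (`struct toy_limits`, `display_min`,
`mode_min`, and so on) are hypothetical; only BEGV, ZV and
`in-islands' come from the discussion.  Display and user commands keep
reading the narrowing limits, and only island-aware code gets the
tighter ones, and only when it asks for them:

```c
#include <assert.h>

/* Toy model; the field names echo the discussion but this is not
   Emacs code.  */
struct toy_limits
{
  long begv, zv;                /* narrowing: what user and display see */
  long island_beg, island_end;  /* current island */
  int in_islands;               /* nonzero while island limits are wanted */
};

/* Limits for redisplay and for user commands such as M-< and M->:
   never affected by islands.  */
static long
display_min (const struct toy_limits *l)
{
  return l->begv;
}

static long
display_max (const struct toy_limits *l)
{
  return l->zv;
}

/* Limits for island-aware code: clamp to the island, but never beyond
   the narrowing, and only when in_islands is set.  */
static long
mode_min (const struct toy_limits *l)
{
  assert (l->begv <= l->zv);
  if (!l->in_islands)
    return l->begv;
  return l->island_beg > l->begv ? l->island_beg : l->begv;
}

static long
mode_max (const struct toy_limits *l)
{
  assert (l->begv <= l->zv);
  if (!l->in_islands)
    return l->zv;
  return l->island_end < l->zv ? l->island_end : l->zv;
}
```

With the buffer narrowed to 1..100 and the current island at 20..40,
the display limits stay 1 and 100 whether or not the island limits are
in force.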