A vision for multiple major modes: some design notes

* A vision for multiple major modes: some design notes
@ 2016-04-20 19:44 Alan Mackenzie
  2016-04-20 21:06 ` Drew Adams
                   ` (4 more replies)
  0 siblings, 5 replies; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-20 19:44 UTC
  To: emacs-devel, Dmitry Gutov

Hello, Dmitry and Emacs.

This post describes my notion of how multiple major modes {c,sh}ould be
implemented.  Key notions are "islands", "island chains", and "chain
local" variable bindings.

In this scheme, "super modes" will not have to do anything to swap in/out
local variable bindings pertinent to islands; this will be done by the
underlying C code.  Narrowing/widening will not be (ab)used by the super
mode mechanism.  Major modes will continue to be able to use the entire
range of Emacs facilities.

Here are some design notes:

(i) Overview and motivation.
  o - The aim is to support several major modes simultaneously in a single
    buffer.
  o - The "super mode" will set up "chains of islands" (see below).
    * - Each chain will have its own major mode, key map, syntax table, etc.
    * - In each chain, "chain local" variable bindings will exist.  Such a
      binding will be current when point is within an island in the chain.
    * - The coordination of these bindings will be carried out by the
      mechanisms described below, without explicit coding in the super mode.
  o - To the user, the current major mode will be that of the island where
    point is.  All familiar commands will work without restriction.
  o - To the writer of major modes, a minimal set of restrictions will apply:
    * - For some major mode commands, the mode will have to bind the variable
      `in-islands' (see below) to non-nil.
    * - For regexps which recognise whitespace, the regexp must contain "\\s-"
      or "\\s " or "[[:space:]]" so that the regexp engine will handle
      "foreign" islands and gaps between chained islands as whitespace.
    * - All other Emacs facilities will be available for use, being adapted as
      necessary for the island mechanism.

(ii) Definitions and concepts.
  o - An @dfn{island} is a contiguous portion of a buffer marked at each end.
    Its attributes are those of the chain of islands of which it is an
    element.
  o - A @dfn{chain} of islands is a canonically ordered chain of islands in a
    single buffer.  An island chain has its own major mode; it has its own
    syntax table, abbreviation table, font lock settings, etc.  It has its own
    bindings of (most) "buffer" local variables.
  o - An island chain will have @dfn{chain local} variable bindings.  Such a
    binding will become current and accessible when point is within one of the
    chain's islands.  When point is not in an island, the buffer local binding
    of the variable will be current.  Most variables which are currently
    buffer local in Emacs 25 will become chain local.  Those (relatively few)
    variables which must retain a single value over an entire buffer will be
    marked as such with a non-nil value of the `entire-buffer' property.
  o - The variable `using-islands' will be set non-nil to indicate the current
    buffer is using the island mechanism.
  o - The variable `in-islands' will control island and island chain
    facilities.  When this variable is bound to non-nil, the facilities
    described here (such as chain local variables) are active.  When the
    variable is nil, (most of) the new facilities are inactive, and Emacs
    behaves as Emacs 25.

(iii) Island Chains.
  o - An island chain will be a Lisp object which is a C struct similar to
    struct buffer.  In particular, it will contain slots for common chain
    local variables, and an association list for bindings of other chain local
    variables.
  o - An island chain might contain pointers to the first and last of its
    islands (still to be decided).

(iv) Islands.
  o - An island will be delimited in two complementary ways:
    * - It will be enclosed syntactically by characters with "open island" and
      "close island" syntax (see section (v)).  Both of these syntactic
      markers will include a flag "chain" indicating whether there is a
      previous/next island in the chain.  The cdr of the syntax value will be
      the island chain to which the island belongs.
    * - It will be covered by the text property `island', whose value will be
      the pertinent island or island chain (see section (ii)) (not yet
      decided).  Note that if islands are enclosed inside other islands, the
      value is the innermost island.  There is the possibility of using an
      interval tree independent of the one for text properties to increase
      performance.
  o - An island might be represented by a C or Lisp structure, or it might
    not (not yet decided).  This structure would hold the containing chain,
    markers pointing to the start and end of the island, and the previous and
    next islands in the chain.

(v) Syntax, etc.
  o - Two new syntax classes, "open island" and "close island" will be
    introduced.  These will be designated by the characters "{" and "}".  Their
    "matching character" slots will contain the island's chain.  There will be
    an extra flag "chain" (denoted by "i") indicating whether there is a
    previous/next island in the chain.
  o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
    whitespace, much as they do comments.  They will also treat as whitespace
    the gap between two islands in a chain.
  o - The (currently 11 element) parser state will be enhanced to support
    islands as follows:
    * - A twelfth element will be introduced.  This will contain an
      association list whose elements will have the form (island-chain
      . 12-element parse state); each element will contain the suspended state
      of parsing in the island chain which is the car of the element.  An
      element with a car of nil will represent the suspended parsing state of
      the buffer outside of islands.
    * - Elements 12, 13, .... will be island chains of the enclosing islands,
      elt 12 being that of the innermost enclosing island, etc.  An element
      with a value of nil indicates being outside all islands.
  o - `parse-partial-sexp' will create and use an enhanced parser state as
    described above.  Note that a two character construct (such as a C comment
    opener) can not enclose an island, and special handling will be required
    to exclude this.  The syntax table in use will change as the current
    position passes between islands.
  o - `syntax-ppss' will do the right thing with the extended parser state.
    Alternatively, `syntax-ppss' will have an independent 12-element state in
    each island chain, where elt. 11 is always nil.  Its cache mechanism will
    be enhanced such that buffer changes outside of an island chain need not
    invalidate the stored cache pertaining to the chain.
  o - The facilities in this section are active even when `in-islands' is
    nil.
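
As a point of reference for the extended parser state, here is a minimal
sketch of how the existing state is read from Lisp today.  It uses only
`parse-partial-sexp' and the standard `emacs-lisp-mode-syntax-table'.  The
helper name `sketch-parse-summary' is made up for illustration, and nothing
in it implements the proposed island elements.

```elisp
(defun sketch-parse-summary (text)
  "Parse TEXT with Emacs Lisp syntax and summarise the resulting state.
Return a list (DEPTH IN-STRING IN-COMMENT) taken from elements 0, 3
and 4 of the state returned by `parse-partial-sexp'."
  (with-temp-buffer
    (insert text)
    (with-syntax-table emacs-lisp-mode-syntax-table
      (let ((state (parse-partial-sexp (point-min) (point-max))))
        (list (nth 0 state)               ; paren depth
              (and (nth 3 state) t)       ; non-nil inside a string
              (and (nth 4 state) t))))))  ; non-nil inside a comment
```

Under the proposal, the same call would presumably also return the
per-chain suspended states and the enclosing chains in the extra elements.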

(vi) Regexps.
  o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ",
    and "[[:space:]]" will match an entire island.
  o - The gap between two islands in a chain will also be matched by the above
    regexps.
  o - This treatment of an island, and of a gap between two islands, as
    whitespace will occur only when `in-islands' is non-nil.
  o - When `in-islands' is nil, there will be no reliable way of scanning over
    an island by regexps, since it is a potentially nested structure, and FSMs
    don't recognise arbitrarily nested structures.
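
To make the three regexp forms concrete, here is a small sketch of what
they match in current Emacs with the standard syntax table.  The names
`sketch-whitespace-regexps' and `sketch-whitespace-matches' are invented
for this example, and the trailing "+" is added only so that each regexp
consumes a whole run of whitespace.  The proposed behaviour (matching a
foreign island as whitespace) is not modelled.

```elisp
(defconst sketch-whitespace-regexps '("\\s-+" "\\s +" "[[:space:]]+")
  "The three whitespace regexp forms named in the design notes.")

(defun sketch-whitespace-matches (text)
  "Return the first match in TEXT for each of `sketch-whitespace-regexps'.
The result has one element per regexp: the matched string, or nil."
  (with-temp-buffer
    (insert text)
    (mapcar (lambda (re)
              (goto-char (point-min))
              (and (re-search-forward re nil t)
                   (match-string 0)))
            sketch-whitespace-regexps)))
```

All three forms consult the syntax table, which is why they are the
natural places to hook island skipping into the regexp engine.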

(vii) Variables.
  o - Island chain local variable bindings will come into existence.  These
    bindings depend on the island point is in.  There will be lower level
    routines that will have "position" parameters as an alternative to using
    point.
  o - All variables which are currently buffer local will become chain local
    except for those whose symbols are given a non-nil `entire-buffer'
    property.  There will be no new functions like
    `make-chain-local-variable'.
  o - When the `entire-buffer' property is nil, the buffer local binding of a
    variable will hold the value pertinent to the areas of the buffer outside
    of islands.  When that property is non-nil, the binding holds the value
    for the entire buffer.
  o - When `in-islands' is nil, the chain local mechanism described here is
    not used - instead the familiar buffer local binding is used.
  o - The current binding for a local variable will be the chain local binding
    of the island chain of the island containing point.  If point is not in an
    island, the buffer local binding is current.
  o - If a chain local binding is current, and its value is unbound, the
    binding of an enclosing scope is NOT used in its place.  Probably the
    variable's default-value should be used when reading.
  o - In buffer.h, a new macro CVAR ("island chain variable") analogous to
    BVAR will be introduced.  It will use BVAR as a fall back.  Most
    invocations of BVAR will be changed to CVAR.
  o - In data.c, the mechanism for accessing local variable bindings
    (e.g. `swap_in_symval_forwarding') will be enhanced to test `in-islands'
    and handle chain local bindings appropriately.
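
The lookup rules in this section can be sketched in Lisp as follows.  This
is only a model under loud assumptions: an island chain is represented as
a plain alist of (SYMBOL . VALUE) bindings stored in the `island' text
property, `sketch-in-islands' stands in for the proposed `in-islands', and
`sketch-chain-local-value' is a hypothetical helper.  The real mechanism
would live in C (data.c) and would not look like this.

```elisp
(defvar sketch-in-islands t
  "Hypothetical stand-in for the proposed `in-islands' variable.")

(defun sketch-chain-local-value (symbol &optional pos)
  "Return the value SYMBOL would have at POS under the proposed rules.
An island chain is modelled as an alist of (SYMBOL . VALUE) bindings
stored in the `island' text property; POS defaults to point."
  (let ((chain (and sketch-in-islands
                    (get-text-property (or pos (point)) 'island))))
    (cond
     ;; Outside islands, or an `entire-buffer' variable: buffer binding.
     ((or (null chain) (get symbol 'entire-buffer))
      (symbol-value symbol))
     ;; The chain has its own binding for SYMBOL.
     ((assq symbol chain)
      (cdr (assq symbol chain)))
     ;; No chain binding: use the default value, not the buffer's value.
     (t (default-value symbol)))))
```

The last clause reflects the suggestion above that an unbound chain local
binding should read as the variable's default value rather than fall
through to an enclosing scope.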

(viii) Change hooks.
  o - There will be two additional abnormal hooks,
    `island-before-change-function' and `island-after-change-function', which
    will each hold a single function or nil.  These will take the same
    parameters as `before-change-functions' and `after-change-functions'
    respectively.
  o - The return value of these functions will be an association list with
    members whose car is an island chain (or nil, meaning "outside all
    islands") and whose cdr is the list of parameters to supply to
    `before/after-change-functions' for that chain.  Usually, the alist will
    have just one member containing BEG, END, and for `after-..' OLD-LEN
    unchanged.
  o - After calling each of these functions, Emacs will invoke
    `before/after-change-functions' on each chain in the returned alist.  This
    will be in place of the standard calls to `before/after-change-functions'.
  o - The intention of these hooks is that super modes will use them to detect
    the deletion and insertion of islands, and to do the "de-islandification"
    and "islandification" as needed.
  o - `before/after-change-functions' will be normal chain local variables.
    A chain local binding will hold functions for the individual chain.  The
    buffer local binding will hold functions for the parts of the buffer
    outside of islands.
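
A rough sketch of the return-value protocol described in this section, for
the usual one-member case.  Everything here is hypothetical: the function
names are invented, the chain is taken to be whatever value the `island'
text property holds at BEG, and the dispatcher merely imitates what Emacs
itself would do after calling the hook.

```elisp
(defun sketch-island-after-change (beg end old-len)
  "Hypothetical `island-after-change-function' for the common case.
Return an alist with one member: the chain at BEG (nil outside all
islands) paired with the unchanged parameters BEG, END and OLD-LEN."
  (list (list (get-text-property beg 'island) beg end old-len)))

(defun sketch-dispatch-after-change (alist handler)
  "Call HANDLER once per member of ALIST, as Emacs would per chain.
HANDLER receives the chain followed by that chain's parameters."
  (dolist (entry alist)
    (apply handler (car entry) (cdr entry))))
```

A super mode that detects the insertion or deletion of island delimiters
would return more than one member, one for each chain affected.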

(ix) Miscellaneous commands and functions.
  o - `point-min' and `point-max' will, when `in-islands' is non-nil, return
    the min/max point in the visible region in the same chain of islands as
    point.
  o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to
    the current island chain when `in-islands' is non-nil.
  o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in
    the current island chain (how?) when `in-islands' is non-nil.
  o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the
    Right Thing in island chains when `in-islands' is non-nil.
  o - New functions `island-min', `island-max', `island-chain-min' and
    `island-chain-max' will do what their names say.
  o - There will be no restrictions on the use of widening/narrowing, as have
    been proposed for other support engines for multiple major modes.
  o - New commands like `beginning-of-island', `narrow-to-island', etc. will
    be wanted.  Harder still, key bindings for them will be needed.
  o - ??? Other commands to be amended.

(x) Emacs subsystems and `in-islands'.
  o - Redisplay will bind `in-islands' to non-nil, but will successfully
    display all islands wholly or partially in windows being displayed.
  o - Font Lock will bind `in-islands' to non-nil, but will successfully
    fontify all pertinent islands.
  o - `island-before/after-change-function' will be called with `in-islands'
    nil.
  o - `before/after-change-functions' will be called with `in-islands' bound
    to non-nil.
  o - Major modes will need to bind `in-islands' to non-nil for such things as
    indentation.
  o - For normal user interaction, `in-islands' will be nil.

-- 
Alan Mackenzie (Nuremberg, Germany).




* RE: A vision for multiple major modes: some design notes
  2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie
@ 2016-04-20 21:06 ` Drew Adams
  2016-04-20 23:00   ` Drew Adams
  2016-04-21 12:43   ` Alan Mackenzie
  2016-04-20 22:27 ` Phillip Lord
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 45+ messages in thread
From: Drew Adams @ 2016-04-20 21:06 UTC
  To: Alan Mackenzie, emacs-devel

Sounds very good, a priori.  And I commend you for actually
putting together a clear and comprehensive design proposal
for discussion, instead of just implementing something.
Especially for something that is likely to lead to new uses
and further possibilities, it is good to open up the big
picture for discussion (regardless of the outcome).

Some feedback, mostly minor -

>     * - For regexps which recognise whitespace, the regexp must contain
>         "\\s-" or "\\s " or "[[:space:]]" so that the regexp engine will
>         handle "foreign" islands and gaps between chained islands as whitespace.

I understand the motivation (you explain it further on).  But this
hardcoding of what can constitute a "whitespace-matching" pattern
seems a bit rigid.  No way to flexibly allow for different meanings
of whitespace here?  What if some code wants to handle \n or \t or
\f etc. differently, or to even treat some set of (normally
non-whitespace) chars as if they too were whitespace for island
purposes?

>   o - A @dfn{chain} of islands is a canonically ordered chain of islands in
>       a single buffer.

Why limit it necessarily to a single buffer?  It is common to
want to do things (search etc.) across multiple buffers, and
sometimes regardless of mode.  That doesn't diminish just
because one might want to use chains of non-contiguous text
zones.

I'm pretty sure I would want to be able to do things throughout
a chain that spans different buffers.  If it were I, I would
think about defining all that you are doing using a structure
that is multi-buffer.

[That is what I did for zones.el, for instance - sets of such
text zones are delimited by markers, which automatically record
the buffer they pertain to.  And they can be persistent, as well.
Have you considered the possibility of persisting island chains?]

And I would probably want user-level operations, to combine
chains (append, intersect, union/coalesce, difference). 
And why not be able to do that for chains that cross buffers?
Being able to add (e.g. append) a chain in one buffer to a chain
in another buffer is one simple example.  Anything you might want
to do with one chain you will likely want to be able to do with
a set of chains, or at least with a chain that results from
composing a set of chains in various ways.

Also, I'm guessing/hoping, but I'm not sure I saw this explicitly,
that you can have multiple chains (e.g. in the same buffer) that
use the same major mode.  Being associated with a major mode is
only one possible attribute of a chain - it is not required, and
other attributes and uses of a chain are not dependent on it, right?
IOW, it is not necessary to think of chains as mode-related - that
is just one (albeit common) use & interpretation, right?

>   o - An island will be delimited in two complementary ways:
>     * - It will be enclosed syntactically by characters with
>       "open island" and "close island" syntax (see section (v)).
>       Both of these syntactic markers will include a flag "chain"
>       indicating whether there is a previous/next island in the
>       chain.  The cdr of the syntax value will be
>       the island chain to which the island belongs.
>     * - It will be covered by the text property `island', whose
>       value will be the pertinent island or island chain

Are both always required, or is either sufficient for most
purposes?  Is the syntax one needed only when you need to
take advantage of it?  Can you do most things using either,
so that a given operation (that is not specific to only one
of them, e.g. not specific to syntax) can be done regardless
of which is available?

I'm thinking that in many contexts I would not care about
delimiting by syntax, and I might not even care about
associating a given chain with a mode.  Would I be able to
use such chains nevertheless (e.g. search/replace across them)?

>       Note that if islands are enclosed inside other islands,

Maybe you can elaborate on overlapping islands and chains? 
What caveats or use cases do you see?

A priori, I would like to have a chain data structure, and
as much of the rest of the features as possible, be available
and manipulable from Lisp.  Something like this has lots of
enhancement possibilities and use cases that we are unlikely
to imagine at the outset.  Implementing more than an absolute
minimum in C hampers that exploration and improvement.

HTH.  I don't claim to have grasped all of what you envisage.
It's great food for thought, in any case.

(I asked a couple of times, in the bug thread(s) and here,
for just this sort of top-level picture of what was envisaged.
I gave up hoping that someone might actually make clear what
the question/project/plan is.  This is a welcome, if unexpected,
development.)


* RE: A vision for multiple major modes: some design notes
  2016-04-20 21:06 ` Drew Adams
@ 2016-04-20 23:00   ` Drew Adams
  2016-04-21 12:43   ` Alan Mackenzie
  1 sibling, 0 replies; 45+ messages in thread
From: Drew Adams @ 2016-04-20 23:00 UTC
  To: Alan Mackenzie, emacs-devel

I said:

> And I would probably want user-level operations, to
> combine chains (append, intersect, union/coalesce,
> difference).

And complement - get a new chain as the complement of a
chain, i.e., the islands of one are the non-islands of the
other.  You should easily be able to search etc. _outside_
the islands of a given chain.

--

I did that for zones.el, for instance:

  zz-zones-complement is a Lisp function in `zones.el'.
  (zz-zones-complement ZONES &optional BEG END BUFFER)

  Return a list of zones that is the complement of ZONES, from BEG to END.
  ZONES is assumed to be a union, i.e., sorted by car, with no overlaps.
  Any extra info in a zone of ZONES, i.e., after the cadr, is ignored.

(The bit about being a union and sorted by car just means
that the list of zones must be like your chain of islands:
no overlaps and ordered by buffer position.  The bit about
ignoring what follows the cadr has to do with the fact that a
zone (~island) can contain other information, in addition to
its limits.)
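
For illustration, the complement operation described in that docstring
could be sketched like this.  This is not the code from zones.el: the name
`sketch-zones-complement' is invented, zones are assumed to be lists of
the form (BEG END . EXTRA) holding plain integer positions, and the
optional BUFFER argument is left out.

```elisp
(defun sketch-zones-complement (zones beg end)
  "Return the gaps between ZONES within the range BEG to END.
ZONES must be sorted by car and free of overlaps.  Each zone is a
list (ZBEG ZEND . EXTRA); anything after ZEND is ignored."
  (let ((pos beg)
        (result '()))
    (dolist (zone zones)
      (let ((gap-end (min (car zone) end)))
        ;; Record the stretch before this zone, if it is non-empty.
        (when (< pos gap-end)
          (push (list pos gap-end) result)))
      ;; Continue scanning from the end of this zone.
      (setq pos (max pos (cadr zone))))
    ;; Record whatever remains after the last zone.
    (when (< pos end)
      (push (list pos end) result))
    (nreverse result)))
```

Applied to a chain of islands, the result is the list of non-island
stretches, which is the range a search "outside the islands" would cover.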


* Re: A vision for multiple major modes: some design notes
  2016-04-20 21:06 ` Drew Adams
  2016-04-20 23:00   ` Drew Adams
@ 2016-04-21 12:43   ` Alan Mackenzie
  2016-04-21 14:24     ` Stefan Monnier
                       ` (3 more replies)
  1 sibling, 4 replies; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-21 12:43 UTC
  To: Drew Adams; +Cc: emacs-devel

Hello, Drew.

On Wed, Apr 20, 2016 at 02:06:37PM -0700, Drew Adams wrote:
> Sounds very good, a priori.  And I commend you for actually
> putting together a clear and comprehensive design proposal
> for discussion, instead of just implementing something.
> Especially for something that is likely to lead to new uses
> and further possibilities, it is good to open up the big
> picture for discussion (regardless of the outcome).

Thanks, that's appreciated.

> Some feedback, mostly minor -

;-)

> >     * - For regexps which recognise whitespace, the regexp must contain
> >         "\\s-" or "\\s " or "[[:space:]]" so that the regexp engine will
> >         handle "foreign" islands and gaps between chained islands as whitespace.

> I understand the motivation (you explain it further on).  But this
> hardcoding of what can constitute a "whitespace-matching" pattern
> seems a bit rigid.  No way to flexibly allow for different meanings
> of whitespace here?  What if some code wants to handle \n or \t or
> \f etc. differently, or to even treat some set of (normally
> non-whitespace) chars as if they too were whitespace for island
> purposes?

This is a good point.  Maybe it would be better to match an island or
the gap between two chained islands with any regexp element which
matches the space (the good old 0x20 character).

> >   o - A @dfn{chain} of islands is a canonically ordered chain of islands in
> >       a single buffer.

> Why limit it necessarily to a single buffer?  It is common to
> want to do things (search etc.) across multiple buffers, and
> sometimes regardless of mode.  That doesn't diminish just
> because one might want to use chains of non-contiguous text
> zones.

Why limit it?  A buffer is a natural unit of editing.  The island chain
concept is primarily to allow different regions of a buffer to have
different major modes, whilst minimising ugly workarounds, artificial
restrictions, and so on.

> I'm pretty sure I would want to be able to do things throughout
> a chain that spans different buffers.  If it were I, I would
> think about defining all that you are doing using a structure
> that is multi-buffer.

I don't envisage that the island chains will really be that useful for
(user initiated) searching, etc.  The idea is that, to the user, such a
buffer will look much like it already does, except that the font locking
will be appropriate for each island, the major mode key map will be
right for each island, and so on.

> [That is what I did for zones.el, for instance - sets of such
> text zones are delimited by markers, which automatically record
> the buffer they pertain to.  And they can be persistent, as well.
> Have you considered the possibility of persisting island chains?]

> And I would probably want user-level operations, to combine
> chains (append, intersect, union/coalesce, difference). 
> And why not be able to do that for chains that cross buffers?

The chains will be disjoint, so intersection/difference wouldn't be
useful.  Given that the essential feature of a chain is its major mode,
it wouldn't make sense to combine chains (which will usually have
different major modes).  I'm still trying to think through the idea of a
chain having islands in several buffers.

> Being able to add (e.g. append) a chain in one buffer to a chain
> in another buffer is one simple example.  Anything you might want
> to do with one chain you will likely want to be able to do with
> a set of chains, or at least with a chain that results from
> composing a set of chains in various ways.

> Also, I'm guessing/hoping, but I'm not sure I saw this explicitly,
> that you can have multiple chains (e.g. in the same buffer) that
> use the same major mode.

Indeed, yes.

> Being associated with a major mode is only one possible attribute of a
> chain - it is not required, and other attributes and uses of a chain
> are not dependent on it, right?  IOW, it is not necessary to think of
> chains as mode-related - that is just one (albeit common) use &
> interpretation, right?

Not right, sorry.  The major mode is an essential attribute of an island
chain.  There will be a slot for it in the structure which holds chain
data, just as there is currently a slot for it in the (C) buffer
structure.  There will likewise be slots for the syntax table, major
mode key map, and so on.  None of these slots would work well with a
null value.

> >   o - An island will be delimited in two complementary ways:
> >     * - It will be enclosed syntactically by characters with
> >       "open island" and "close island" syntax (see section (v)).
> >       Both of these syntactic markers will include a flag "chain"
> >       indicating whether there is a previous/next island in the
> >       chain.  The cdr of the syntax value will be
> >       the island chain to which the island belongs.
> >     * - It will be covered by the text property `island', whose
> >       value will be the pertinent island or island chain

> Are both always required, or is either sufficient for most
> purposes?

Both are required, yes.  They will both be used.

> Is the syntax one needed only when you need to take advantage of it?
> Can you do most things using either, so that a given operation (that
> is not specific to only one of them, e.g. not specific to syntax) can
> be done regardless of which is available?

Primarily, the text property is to allow the chain local variable
mechanism quickly to find the correct chain for accessing the variables
from.  There is a worry that the extra cost of accessing this text
property may slow Emacs down excessively.  There will probably have to
be some sort of caching of the current island.

> I'm thinking that in many contexts I would not care about
> delimiting by syntax, and I might not even care about
> associating a given chain with a mode.  Would I be able to
> use such chains nevertheless (e.g. search/replace across them)?

I'm not sure this island mechanism is the right tool for doing what
you're suggesting.  For searching/replacing at the user level, some
extra option meaning "only in the current chain" would need to be added
to the user interface.

> >       Note that if islands are enclosed inside other islands,

> Maybe you can elaborate on overlapping islands and chains? 
> What caveats or use cases do you see?

Islands would not be permitted (not sure how at this stage) to "overlap"
each other.  Two islands must either be disjoint, or one completely
contain the other.  The major mode for any position would be that of the
"innermost" current island.

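The "disjoint or nested" rule can be written down as a small predicate.
This is a sketch only: islands are assumed to be conses (BEG . END) of
integer positions, and `sketch-islands-compatible-p' is a made-up name,
not part of the proposal.

```elisp
(defun sketch-islands-compatible-p (a b)
  "Return non-nil if islands A and B are disjoint or one contains the other.
Each island is a cons (BEG . END) with BEG <= END."
  (let ((a-beg (car a)) (a-end (cdr a))
        (b-beg (car b)) (b-end (cdr b)))
    (or (<= a-end b-beg)                          ; A lies before B
        (<= b-end a-beg)                          ; B lies before A
        (and (<= a-beg b-beg) (<= b-end a-end))   ; A contains B
        (and (<= b-beg a-beg) (<= a-end b-end))))) ; B contains A
```

A super mode could run such a check before marking up a new island, and
refuse (or split the island) when it returns nil.
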
> A priori, I would like to have a chain data structure, and
> as much of the rest of the features as possible, be available
> and manipulable from Lisp.  Something like this has lots of
> enhancement possibilities and use cases that we are unlikely
> to imagine at the outset.  Implementing more than an absolute
> minimum in C hampers that exploration and improvement.

One idea would be to implement a chain feature, one of whose uses would
be the major mode islands I've been trying to specify.  A significant
part of this would have to be implemented at the C level for speed -
chain local variables are already going to be slower to access than
buffer local variables.  We must keep that difference to a minimum.

> HTH.  I don't claim to have grasped all of what you envisage.
> It's great food for thought, in any case.

> (I asked a couple of times, in the bug thread(s) and here,
> for just this sort of top-level picture of what was envisaged.
> I gave up hoping that someone might actually make clear what
> the question/project/plan is.  This is a welcome, if unexpected,
> development.)

Thanks!

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: A vision for multiple major modes: some design notes
  2016-04-21 12:43   ` Alan Mackenzie
@ 2016-04-21 14:24     ` Stefan Monnier
  2016-04-23  2:20       ` zhanghj
  2016-04-23 22:36       ` Dmitry Gutov
  2016-04-21 16:05     ` Drew Adams
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 45+ messages in thread
From: Stefan Monnier @ 2016-04-21 14:24 UTC
  To: emacs-devel

I haven't kept up with this discussion, but I think it'd be worthwhile
taking a look at what things like SublimeText do for syntax
highlighting, because it's a lot more powerful than what font-lock does
(IOW it lets you define contexts and is hence closer to a parser whereas
font-lock is closer to a lexer), and it might be an interesting starting
point for multiple major modes.

I think font-lock is old and deserves a replacement.

        Stefan


* Re: A vision for multiple major modes: some design notes
  2016-04-21 14:24     ` Stefan Monnier
@ 2016-04-23  2:20       ` zhanghj
  2016-04-23 22:36       ` Dmitry Gutov
  1 sibling, 0 replies; 45+ messages in thread
From: zhanghj @ 2016-04-23  2:20 UTC
  To: Stefan Monnier; +Cc: netjune, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I haven't kept up with this discussion, but I think it'd be worthwhile
> taking a look at what things like SublimeText do for syntax
> highlighting, because it's a lot more powerful than what font-lock does
> (IOW it lets you define contexts and is hence closer to a parser whereas
> font-lock is closer to a lexer), and it might be an interesting starting
> point for multiple major modes.
>
> I think font-lock is old and deserves a replacement.
>
>
>         Stefan

Yes. It can also do symbol indexing (like imenu in emacs, or ctags
tools) based on the syntax files.


* Re: A vision for multiple major modes: some design notes
  2016-04-21 14:24     ` Stefan Monnier
  2016-04-23  2:20       ` zhanghj
@ 2016-04-23 22:36       ` Dmitry Gutov
  1 sibling, 0 replies; 45+ messages in thread
From: Dmitry Gutov @ 2016-04-23 22:36 UTC
  To: Stefan Monnier, emacs-devel

On 04/21/2016 05:24 PM, Stefan Monnier wrote:
> I haven't kept up with this discussion, but I think it'd be worthwhile
> taking a look at what things like SublimeText do for syntax
> highlighting, because it's a lot more powerful than what font-lock does
> (IOW it lets you define contexts and is hence closer to a parser whereas
> font-lock is closer to a lexer), and it might be an interesting starting
> point for multiple major modes.

Indeed. If anyone's interested, here's some documentation: 
https://www.sublimetext.com/docs/3/syntax.html

Apparently, Sublime, TextMate, Atom and even Vim all use this or similar 
approaches. If it were somehow adopted for Emacs (plenty of details 
would need to be worked out), it would allow describing more complex 
grammars, and e.g. support a related Ruby feature that I'm having 
difficulty implementing right now.

Further, if it provides a different mechanism of syntactic parsing, it 
could be an alternative to using islands to make parse-partial-sexp skip 
over "foreign" regions. Although, unless we're going to change how we 
write indentation code, we'd still need to be able to compute the 
current paren nesting. Ultimately, the new way of defining a grammar 
could also be a way to define and apply "island" boundaries 
automatically, without the need for third-party code.

Where it's less likely to help, though, is with being able to combine 
and reuse settings and code from multiple major modes in one file. For 
anything like that to happen, the syntax definitions would have to be 
using a format that's highly composable, at least. I'm not sure I'm 
seeing that in any of the current grammars in the aforementioned 
editors. And the current way to combine the functionality from different 
languages is to call different major mode functions and switch between 
sets of buffer-local variables. Not sure what's the alternative for that.


* RE: A vision for multiple major modes: some design notes
  2016-04-21 12:43   ` Alan Mackenzie
  2016-04-21 14:24     ` Stefan Monnier
@ 2016-04-21 16:05     ` Drew Adams
  2016-04-21 16:31       ` Eli Zaretskii
       [not found]     ` <<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default>
       [not found]     ` <<<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default>
  3 siblings, 1 reply; 45+ messages in thread
From: Drew Adams @ 2016-04-21 16:05 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

> This is a good point.  Maybe it would be better to match an island or
> the gap between two chained islands with any regexp element which
> matches the space (the good old 0x20 character).

See also Eli's feedback about this.  I think I agree with him
that trying to repurpose whitespace matching for this is maybe
not the best approach.  A separate matching should perhaps be used -
nothing to do with whitespace per se, even if the matching used
might take whitespace (also) into account.

> > I'm pretty sure I would want to be able to do things throughout
> > a chain that spans different buffers.  If it were I, I would
> > think about defining all that you are doing using a structure
> > that is multi-buffer.
>
> I don't envisage that the island chains will really be that useful for
> (user initiated) searching, etc.  The idea is that, to the user, such a
> buffer will look much like it already does, except that the font locking
> will be appropriate for each island, the major mode key map will be
> right for each island, and so on.

I see it differently.  I think you see it that way because for
you the major mode thing is an essential part of the feature
you want to implement - it is primary.  To me, chains of islands
should be the primary, and a very general, thing, and one
(important) use of them would be to apply a mode to them
("multi-modes").

IOW, I see (lots of) possible uses for chains of islands that
go beyond (i.e., do not necessarily involve) the application
of a particular mode to them.  And in the general case I see
no reason to limit chains to a single buffer.

That doesn't mean that there wouldn't be important cases that
do limit the use to either (a) applying a given major mode or
(b) a single buffer.  I just don't see why we would build
such limits into the design (i.e., hardcoded, making it hard
to extend to either (a) mode-agnostic or (b) multi-buffer). 

> > [That is what I did for zones.el, for instance - sets of such
> > text zones are delimited by markers, which automatically record
> > the buffer they pertain to.  And they can be persistent, as well.
> > Have you considered the possibility of persisting island chains?]

Persistence?

> > And I would probably want user-level operations, to combine
> > chains (append, intersect, union/coalesce, difference).
> > And why not be able to do that for chains that cross buffers?
> 
> The chains will be disjoint, so intersection/difference wouldn't be
> useful.

I understand that the islands in a chain would be disjoint.
But why would chains necessarily be disjoint?  Why shouldn't
chains be independent (at least be able to be independent)?
Why would defining one chain impose limits on defining other
chains (any new chains would need to be disjoint from existing
ones)?

See above, regarding the utility of being able to ignore a
chain's mode for certain operations (and the ability for a
chain to not even have an associated mode).  I suspect that
you are not seeing the use cases I am, which involve doing
all kinds of things to/with the text in a chain of islands.

As Eli suggested, think of a chain of islands as an extension
of narrowing.  Now think of the many different kinds of things
you (or code) do to a narrowed region.  This should be a more
general feature, I think, than what is available in something
like MuMaMo or mmm.  "Multi-modes" is a subcase.

Again, I see a chain of (ordered) text regions as the primary,
general feature, and the mapping (restriction) of a major mode
to such a chain as a subsidiary feature.

> Given that the essential feature of a chain is its major mode,

That is where we differ, and that explains, I think, the
narrower focus you have.  I wouldn't limit the feature to
being coupled to a mode.  That should be a possibility but
not a requirement.

> it wouldn't make sense to combine chains (which will usually
> have different major modes).

It would make sense, depending on what kind of operation you
wanted to apply to the text in chains.  And chains with the
same mode could also be combined, whether in the same buffer
or not.

> I'm still trying to think through the idea of a
> chain having islands in several buffers.

Think of the chains first as just buffer narrowings that
are multi-region, i.e., ignoring all the syntax and
major-mode features that you are thinking about.  (You
can still think of those, but they come in at a different
level - a specific subfeature or set of use cases.)

> > Being able to add (e.g. append) a chain in one buffer to a chain
> > in another buffer is one simple example.  Anything you might want
> > to do with one chain you will likely want to be able to do with
> > a set of chains, or at least with a chain that results from
> > composing a set of chains in various ways.
> 
> > Also, I'm guessing/hoping, but I'm not sure I saw this explicitly,
> > that you can have multiple chains (e.g. in the same buffer) that
> > use the same major mode.
> 
> Indeed, yes.
> 
> > Being associated with a major mode is only one possible attribute of a
> > chain - it is not required, and other attributes and uses of a chain
> > are not dependent on it, right?  IOW, it is not necessary to think of
> > chains as mode-related - that is just one (albeit common) use &
> > interpretation, right?
> 
> Not right, sorry.  The major mode is an essential attribute of an
> island chain.

Why?  What's necessarily essential about it?  That's a design
choice, no?  Would you consider dropping it as a requirement
and keeping it as an option (for any given chain)?

> There will be a slot for it in the structure which holds chain
> data, just as there is currently a slot for it in the (C) buffer
> structure.

Must the slot be filled?  Always?  (Why?)

> There will likewise be slots for the syntax table, major
> mode key map, and so on.  None of these slots would work well with a
> null value.

Why not optional?  Of course if such a slot is not used then
it, and anything that depends on it, would not "work well".
But that should not prevent other, non-mode-related uses of
a chain from working OK.

> > >   o - An island will be delimited in two complementary ways:
> > >     * - It will be enclosed syntactically by characters with
> > >       "open island" and "close island" syntax (see section (v)).
> > >       Both of these syntactic markers will include a flag "chain"
> > >       indicating whether there is a previous/next island in the
> > >       chain.  The cdr of the syntax value will be
> > >       the island chain to which the island belongs.
> > >     * - It will be covered by the text property `island', whose
> > >       value will be the pertinent island or island chain
> 
> > Are both always required, or is either sufficient for most
> > purposes?
> 
> Both are required, yes.  They will both be used.

Why required?  Why can't the design tolerate not having
syntax-based delimiting?

I would prefer to see what you're envisaging placed within
the context of a more general feature.  I see 3 possible
levels, in fact:

1. Arbitrary sets of text zones.  Not necessarily ordered
   (e.g. by buffer position).  Not necessarily without
   overlap.

2. #1, but as chains: ordered, non-overlapping.

3. #2, but with an associated major mode per chain.
   This is essentially what you have in mind, I think.

For all 3 levels I can see use cases for chains that cross
buffers and use cases for chain-combining operations.
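
The step from level 1 to level 2 can be sketched in a few lines
of Lisp.  This is only an illustration under simplifying
assumptions: a zone is a plain (BEG . END) pair in a single buffer,
and the names `my-zones-to-chain' and `my-chain-intersection' are
made up for the example; they are not part of zones.el or of
Alan's proposal.

```elisp
;; Hypothetical helpers for illustration only.

(defun my-zones-to-chain (zones)
  "Return ZONES, a list of (BEG . END) pairs, as an ordered chain.
Overlapping or touching zones are coalesced, so the result is
sorted by position and non-overlapping.  ZONES is not modified."
  (let ((sorted (sort (copy-sequence zones)
                      (lambda (a b) (< (car a) (car b)))))
        chain)
    (dolist (zone sorted)
      (let ((prev (car chain)))
        (if (and prev (<= (car zone) (cdr prev)))
            ;; Overlaps or touches the previous island: extend it.
            (setcdr prev (max (cdr prev) (cdr zone)))
          ;; Otherwise start a new island (a fresh cons, so that the
          ;; caller's pairs are never mutated).
          (push (cons (car zone) (cdr zone)) chain))))
    (nreverse chain)))

(defun my-chain-intersection (chain1 chain2)
  "Return the chain of text common to CHAIN1 and CHAIN2.
Both arguments must already be ordered and non-overlapping."
  (let (result)
    (dolist (a chain1)
      (dolist (b chain2)
        (let ((beg (max (car a) (car b)))
              (end (min (cdr a) (cdr b))))
          (when (< beg end)
            (push (cons beg end) result)))))
    (nreverse result)))
```

For example, (my-zones-to-chain '((10 . 20) (1 . 5) (15 . 30)))
gives ((1 . 5) (10 . 30)).  Level 3 would then attach a mode to
such a chain; a multi-buffer variant would need markers rather
than bare positions.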

I can also imagine using some chain-local variables that
are not buffer-specific or mode-specific.  (You already
allow for that, IIUC.)

> > I'm thinking that in many contexts I would not care about
> > delimiting by syntax, and I might not even care about
> > associating a given chain with a mode.  Would I be able to
> > use such chains nevertheless (e.g. search/replace across them)?
> 
> I'm not sure this island mechanism is the right tool for doing what
> you're suggesting.

Depends on what it ends up being. ;-)

> For searching/replacing at the user level, some
> extra option meaning "only in the current chain" would need to be
> added to the user interface.

FWIW, I've done this for arbitrary sets of zones (including
across buffers).  The code is in `isearch-prop.el' (which
depends on `zones.el' for this feature).

Also, wrt "the current chain": You might want to look at
the zones.el code for the use of variables (which can be
buffer-local, but need not be) that hold sets of zones
(including sets that are "chains") - how users can create
them, choose among them, clone them, persist them, etc.

> > A priori, I would like to have a chain data structure, and
> > as much of the rest of the features as possible, be available
> > and manipulable from Lisp.  Something like this has lots of
> > enhancement possibilities and use cases that we are unlikely
> > to imagine at the outset.  Implementing more than an absolute
> > minimum in C hampers that exploration and improvement.
> 
> One idea would be to implement a chain feature, one of whose uses would
> be the major mode islands I've been trying to specify.

That's what I've been trying to suggest: chains of zones are
more general than the feature you've described.  That doesn't
take away from the importance of the use case you have in mind.

> A significant
> part of this would have to be implemented at the C level for speed -
> chain local variables are already going to be slower to access than
> buffer local variables.  We must keep that difference to a minimum.

I have no problem with stuff being in C for performance reasons.
When that is not critical, keeping stuff in Lisp is good.

Especially for a new and very general feature: let folks play
with it and experiment with new possibilities.  We can later
optimize any parts we like.

We should avoid doing that prematurely, as always - but
especially for Emacs, where Lisp enhancement by users is
really the name of the game.

Thanks again for opening this discussion and providing
a detailed first proposal.


* Re: A vision for multiple major modes: some design notes
  2016-04-21 16:05     ` Drew Adams
@ 2016-04-21 16:31       ` Eli Zaretskii
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-21 16:31 UTC (permalink / raw)
  To: Drew Adams; +Cc: acm, emacs-devel

> Date: Thu, 21 Apr 2016 09:05:23 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
> 
> I have no problem with stuff being in C for performance reasons.
> When that is not critical, keeping stuff in Lisp is good.
> 
> Especially for a new and very general feature: let folks play
> with it and experiment with new possibilities.  We can later
> optimize any parts we like.

The parts that affect redisplay must at least partially be in C,
because there's no existing infrastructure that I'm aware of that can
be piggy-backed to do this kind of stuff.




* RE: A vision for multiple major modes: some design notes
       [not found]       ` <<83oa926i0e.fsf@gnu.org>
@ 2016-04-21 16:59         ` Drew Adams
  2016-04-21 19:55           ` Eli Zaretskii
  0 siblings, 1 reply; 45+ messages in thread
From: Drew Adams @ 2016-04-21 16:59 UTC (permalink / raw)
  To: Eli Zaretskii, Drew Adams; +Cc: acm, emacs-devel

> > I have no problem with stuff being in C for performance reasons.
> > When that is not critical, keeping stuff in Lisp is good.
> >
> > Especially for a new and very general feature: let folks play
> > with it and experiment with new possibilities.  We can later
> > optimize any parts we like.
> 
> The parts that affect redisplay must at least partially be in C,
> because there's no existing infrastructure that I'm aware of that
> can be piggy-backed to do this kind of stuff.

Anything that must be in C must be in C, of course. ;-)

But just what does "the parts that affect redisplay" mean?
If we mean parts that need to do something particular wrt
redisplay, then yes, that makes sense.

If we mean also some parts that would just be faster if
done in C then maybe, or maybe not.

You mentioned earlier that redisplay needs to access
buffer-local variables as it moves through the buffer.
And you said that redisplay needs to get the right values
of such variables.

But for some island-chain operations, e.g. some that I'm
thinking of that do not care about the mode of a chain
or whether it even has a mode, I don't see why redisplay
would need to do anything special.

No, I don't claim to understand this.  I'll stick with
agreeing that if there is an effect from this feature on
redisplay, or if redisplay affects this feature somehow,
and if that means that some bits of the feature must be
implemented in C, that's fine.

I would just prefer that we not go overboard wrt a C
implementation, just because we can or because something
might be faster in C.

I'd just as soon have a general, open, and
easy-to-modify-&-extend feature at the outset, and
worry later about optimizing bits of it that are
important to optimize.  From my point of view, a feature
such as this opens new possibilities that are ripe for
exploration.  And that shouts, "Lisp, please!".

Anyway, not knowing anything about this part of things,
I'll shut up about C vs Lisp, at least for now.


* Re: A vision for multiple major modes: some design notes
  2016-04-21 16:59         ` Drew Adams
@ 2016-04-21 19:55           ` Eli Zaretskii
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-21 19:55 UTC (permalink / raw)
  To: Drew Adams; +Cc: acm, emacs-devel

> Date: Thu, 21 Apr 2016 09:59:02 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: acm@muc.de, emacs-devel@gnu.org
> 
> But just what does "the parts that affect redisplay" mean?
> If we mean parts that need to do something particular wrt
> redisplay, then yes, that makes sense.

I mean the part that is needed for redisplay to behave in each island
according to user expectations.  For example, imagine that a mode that
is relevant to a certain island chain sets up face-remapping-alist in
some particular way -- when redisplay does its job, it repeatedly
consults this variable when it needs to compute faces.  I'm saying
that the part of the changes for this feature that affects redisplay
will have to arrange for recalculation of the value of
face-remapping-alist when the display engine gets to examining the
portion of buffer text that belongs to this island chain.  Since the
position where the display engine processes is not visible to Lisp,
this arrangement will have to be in C.  And similarly with any other
variable whose value the display engine accesses from its C code, like
standard-display-table, for example.

> You mentioned earlier that redisplay needs to access
> buffer-local variables as it moves through the buffer.
> And you said that redisplay needs to get the right values
> of such variables.
> 
> But for some island-chain operations, e.g. some that I'm
> thinking of that do not care about the mode of a chain
> or whether it even has a mode, I don't see why redisplay
> would need to do anything special.

This could be so in some particular use cases, but it's not so in
general.  Modes do affect the way text is displayed.  Besides, Alan
says that "most" buffer-local variables will become island-chain
local.  If we believe him, then the use cases you mention above are
lucky exceptions rather than the rule.


* RE: A vision for multiple major modes: some design notes
       [not found]           ` <<83eg9y68jy.fsf@gnu.org>
@ 2016-04-21 20:26             ` Drew Adams
  0 siblings, 0 replies; 45+ messages in thread
From: Drew Adams @ 2016-04-21 20:26 UTC (permalink / raw)
  To: Eli Zaretskii, Drew Adams; +Cc: acm, emacs-devel

> > But just what does "the parts that affect redisplay" mean?
> > If we mean parts that need to do something particular wrt
> > redisplay, then yes, that makes sense.
> 
> I mean the part that is needed for redisplay to behave in each island
> according to user expectations.  For example, imagine that a mode that
> is relevant to a certain island chain sets up face-remapping-alist in
> some particular way -- when redisplay does its job, it repeatedly
> consults this variable when it needs to compute faces.  I'm saying
> that the part of the changes for this feature that affects redisplay
> will have to arrange for recalculation of the value of
> face-remapping-alist when the display engine gets to examining the
> portion of buffer text that belongs to this island chain.  Since the
> position where the display engine processes is not visible to Lisp,
> this arrangement will have to be in C.  And similarly with any other
> variable whose value the display engine accesses from its C code, like
> standard-display-table, for example.

Thanks for the example.  That's the kind of thing I thought
you had in mind.

> > You mentioned earlier that redisplay needs to access
> > buffer-local variables as it moves through the buffer.
> > And you said that redisplay needs to get the right values
> > of such variables.
> >
> > But for some island-chain operations, e.g. some that I'm
> > thinking of that do not care about the mode of a chain
> > or whether it even has a mode, I don't see why redisplay
> > would need to do anything special.
> 
> This could be so in some particular use cases, but it's not
> so in general.

Depends on what one means by "in general". ;-) To me, having
a different mode associated with a chain is a special case of
either having such a mode or not having one.  Likewise, for
having chain-local variables or not.  Both having and not
having are special cases of "in general".

> Modes do affect the way text is displayed.

Yes.  But if a chain does not use a mode that is different
from the buffer's mode, then there should be no special
mode-specific handling needed for it.

> Besides, Alan says that "most" buffer-local variables will
> become island-chain local.  If we believe him, then the
> use cases you mention above are lucky exceptions rather
> than the rule.

I don't see them as either lucky exceptions or the rule.
I imagine that there are lots of possible uses of a chain
of islands of text, some of which involve a different mode
or in some other way involve different display possibilities,
and some of which do not.

From the point of view of C code (e.g. redisplay) modification,
the latter use cases would I guess be lucky (little or nothing
new to do).  That doesn't mean they would be exceptional (rare)
in terms of user use cases.  (Dunno whether they would be.)


* Re: A vision for multiple major modes: some design notes
  2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie
  2016-04-20 21:06 ` Drew Adams
@ 2016-04-20 22:27 ` Phillip Lord
  2016-04-21  9:14   ` Alan Mackenzie
  2016-04-21 14:17 ` Eli Zaretskii
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Phillip Lord @ 2016-04-20 22:27 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Dmitry Gutov, emacs-devel


A few comments, rather than an in-depth analysis, I'm afraid.

Alan Mackenzie <acm@muc.de> writes:
> (iv) Islands.
>   o - An island will be delimited in two complementary ways:
>     * - It will be enclosed syntactically by characters with "open island" and
>       "close island" syntax (see section (v)).  Both of these syntactic
>       markers will include a flag "chain" indicating whether there is a
>       previous/next island in the chain.  The cdr of the syntax value will be
>       the island chain to which the island belongs.
>     * - It will be covered by the text property `island', whose value will be
>       the pertinent island or island chain (see section (ii)) (not yet
>       decided).  Note that if islands are enclosed inside other islands, the
>       value is the innermost island.  There is the possibility of using an
>       interval tree independent of the one for text properties to increase
>       performance.

When you say "complementary" do you mean alternative or simultaneous?
I.e. will an island always be enclosed by syntax markers and always have
a text property. Or can it have either?

I'm still not understanding how the chain of islands is set up. Is this
entirely the super mode's responsibility? The use of "syntax" suggests
that the islands can be detected *purely* syntactically. But, there are
many places where this is not true: consider org-mode:

#+begin_src emacs-lisp
(message "hello world")
#+end_src

We cannot assume that "#+end_src" is the end of an island.

Also, how will the regexp engine work when it spans an island? I ask
because, if we use the regexp engine to match delimiters, then which
syntax do we use, if there are multiple modes in the buffer?


>   o - An island might be represented by a C or Lisp structure, it might not
>     (not yet decided).  This structure would hold the containing chain,
>     markers pointing to the start and end of the chain, and the previous and
>     next islands in the chain.
>
> (v) Syntax, etc.
>   o - Two new syntax classes, "open island" and "close island" will be
>     introduced.  These will be designated by the characters "{" and "}".  Their
>     "matching character" slots will contain the island's chain.  There will be
>     an extra flag "chain" (denoted by "i") indicating whether there is a
>     previous/next island in the chain.
>   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
>     whitespace, much as they do comments.  They will also treat as whitespace
>     the gap between two islands in a chain.

Difficult to say, but this might produce some counter intuitive
behaviour. So, for example, consider some text like so:

=== Example

(here is some lisp)


;; This is a long and tedious piece of documentation in my lisp program.
(here is some more lisp)


=== End Example

Now moving backward a paragraph will have a significant difference in
behaviour -- on the "(" of "here is some more lisp", we move to "(here
is some lisp)", while on the char before, we move to "This is a long".
Good, bad, expected? Don't know.



>   o - The (currently 11 element) parser state will be enhanced to support
>     islands as follows:
>     * - A twelfth element will be introduced.  This will contain an
>       association list whose elements will have the form (island-chain
>       . 12-element parse state); each element will contain the suspended state
>       of parsing in the island chain which is the car of the element.  An
>       element with a car of nil will represent the suspended parsing state of
>       the buffer outside of islands.
>     * - Elements 12, 13, .... will be island chains of the enclosing islands,
>       elt 12 being that of the innermost enclosing island, etc.  An element
>       with a value of nil indicates being outside all islands.
>   o - `parse-partial-sexp' will create and use an enhanced parser state as
>     described above.  Note that a two character construct (such as a C comment
>     opener) can not enclose an island, and special handling will be required
>     to exclude this.  The syntax table in use will change as the current
>     position passes between islands.
>   o - `syntax-ppss' will do the right thing with the extended parser state.
>     Alternatively, `syntax-ppss' will have an independent 12-element state in
>     each island chain, where elt. 11 is always nil.  Its cache mechanism will
>     be enhanced such that buffer changes outside of an island chain need not
>     invalidate the stored cache pertaining to the chain.
>   o - The facilities in this section are active even when `in-islands' is
>     nil.
>
> (vi) Regexps.
>   o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ",
>     and "[[:space:]]" will match an entire island.
>   o - The gap between two islands in a chain will also be matched by the above
>     regexps.
>   o - This treatment of an island, and a gap between two islands, as WS will
>     occur only when `in-islands' is non-nil.
>   o - When `in-islands' is nil, there will be no reliable way of scanning over
>     an island by regexps, since it is a potentially nested structure, and FSMs
>     don't recognise arbitrarily nested structures.
>
> (vii) Variables.
>   o - Island chain local variable bindings will come into existence.  These
>     bindings depend on the island point is in.  There will be lower level
>     routines that will have "position" parameters as an alternative to using
>     point.
>   o - All variables which are currently buffer local will become chain local
>     except for those whose symbols are given a non-nil `entire-buffer'
>     property.  There will be no new functions like
>     `make-chain-local-variable'.

What is the default-value of a chain local variable, if the variable is
also buffer-local?

Will we need functions for setting all chains in a certain mode in a
single buffer?


Phil




* Re: A vision for multiple major modes: some design notes
  2016-04-20 22:27 ` Phillip Lord
@ 2016-04-21  9:14   ` Alan Mackenzie
  2016-04-22 12:45     ` Phillip Lord
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-21  9:14 UTC (permalink / raw)
  To: Phillip Lord; +Cc: emacs-devel, Dmitry Gutov

Hello, Phillip

On Wed, Apr 20, 2016 at 11:27:34PM +0100, Phillip Lord wrote:

> A few comments, rather than an in-depth analysis, I'm afraid.

Thanks!

> Alan Mackenzie <acm@muc.de> writes:
> > (iv) Islands.
> >   o - An island will be delimited in two complementary ways:
> >     * - It will be enclosed syntactically by characters with "open island" and
> >       "close island" syntax (see section (v)).  Both of these syntactic
> >       markers will include a flag "chain" indicating whether there is a
> >       previous/next island in the chain.  The cdr of the syntax value will be
> >       the island chain to which the island belongs.
> >     * - It will be covered by the text property `island', whose value will be
> >       the pertinent island or island chain (see section (ii)) (not yet
> >       decided).  Note that if islands are enclosed inside other islands, the
> >       value is the innermost island.  There is the possibility of using an
> >       interval tree independent of the one for text properties to increase
> >       performance.

> When you say "complementary" do you mean alternative or simultaneous?
> I.e. will an island always be enclosed by syntax markers and always have
> a text property. Or can it have either?

Sorry, that wasn't very clear.  It would always have both.  The text
property would enable the code for chain local variables quickly to
determine the current chain.  The syntactic markers would enable
efficient scanning by parse-partial-sexp, etc.

> I'm still not understanding how the chain of islands is set up. Is this
> entirely the super mode's responsibility?

Yes, it would be entirely for the super mode to do.  There would be a
set of functions to do this, for example:

    (defun create-island-chain (beg end major-mode ...) ...)  (where BEG
    and END would be the bounds of the first island in the chain).

    (defun add-island-to-chain (chain beg end ...) ...)  (which would
    delimit (BEG END) as an island, and link it into CHAIN)

There would also be functions for removing islands from a chain, etc.  I
should really have put this into the notes.  Thanks!
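
To make the shape of such an interface concrete, here is a rough
Lisp-only mock-up.  It is an assumption-laden sketch, not the
proposed implementation: the `my-' names are hypothetical, a chain
is modelled as a plain property list, and only the `island' text
property is set (the syntactic markers and the C-level switching of
variable bindings are left out entirely).

```elisp
;; Hypothetical mock-up; the real functions would live partly in C.

(defun my-add-island-to-chain (chain beg end)
  "Mark BEG..END as an island and link it into CHAIN.  Return CHAIN.
CHAIN is a plist with the keys :mode and :islands."
  ;; The text property lets code find the chain from any position.
  (put-text-property beg end 'island chain)
  ;; Record the island's bounds as markers so they follow edits.
  ;; `plist-put' updates CHAIN in place because :islands already exists.
  (plist-put chain :islands
             (append (plist-get chain :islands)
                     (list (cons (copy-marker beg) (copy-marker end)))))
  chain)

(defun my-create-island-chain (beg end mode)
  "Create a chain for major mode MODE whose first island is BEG..END."
  (my-add-island-to-chain (list :mode mode :islands nil) beg end))

(defun my-island-demo ()
  "Set up a two-island chain in a scratch buffer and inspect it."
  (with-temp-buffer
    (insert "aaaa(one)bbbb(two)")
    (let ((chain (my-create-island-chain 5 10 'emacs-lisp-mode)))
      (my-add-island-to-chain chain 14 19)
      (list (eq (get-text-property 6 'island) chain)   ; inside "(one)"
            (get-text-property 11 'island)             ; inside "bbbb"
            (eq (get-text-property 15 'island) chain)  ; inside "(two)"
            (length (plist-get chain :islands))))))
```

Here a super mode would compute the island bounds however it sees
fit (regexps, a parser, ...) and then make calls like the two in
`my-island-demo'.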

> The use of "syntax" suggests that the islands can be detected *purely*
> syntactically.

No.  It would be up to the super mode to determine them (however is
appropriate), then to call, e.g., `create-island-chain' and
`add-island-to-chain'.

> But, there are many places where this is not true: consider org-mode:

> #+begin_src emacs-lisp
> (message "hello world")
> #+end_src

> We cannot assume that "#+end_src" is the end of an island.

> Also, how will the regexp engine work when it spans an island? I ask
> because, if we use the regexp engine to match delimiters, then which
> syntax do we use, if there are multiple modes in the buffer?

I imagine that the island-start/end syntactic markers would normally be
set by the super mode as syntax-table text properties.  These always
take priority over whatever the current syntax table would say.  These
markers would be considered to be in the enclosing scope, not part of
the island they define.
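
The priority of `syntax-table' text properties over the syntax table
is existing Emacs behaviour and can be shown with today's functions.
In this small sketch the punctuation class stands in for the proposed
"open island" class, which does not exist yet; the `my-' names are
invented for the example.

```elisp
(defun my-syntax-class-at (pos)
  "Return the syntax class code of the character at POS.
A `syntax-table' text property, if present, overrides the buffer's
syntax table."
  (let ((parse-sexp-lookup-properties t))
    (syntax-class (syntax-after pos))))

(defun my-syntax-override-demo ()
  "Return the class of \"(\" before and after overriding it."
  (with-temp-buffer
    (insert "(foo)")
    (let ((before (my-syntax-class-at 1)))
      ;; Stand-in for an island delimiter: make "(" punctuation.
      (put-text-property 1 2 'syntax-table (string-to-syntax "."))
      (list before (my-syntax-class-at 1)))))
```

In a fresh buffer with the standard syntax table this returns (4 1):
open-parenthesis class before the override, punctuation class after.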

The current syntax table would always be that of the island the current
position was in.  I suppose there is potential for an island to be
recognised as such in the "enclosing scope", but not in the island
itself.  This could be mitigated by warning super mode
programmers to use island-start/end syntaxes ONLY in syntax-table text
properties.

The actual matching of an island to "\\s-" would be delegated to the
syntax code (as is currently done for "\\s?" expressions).

> >   o - An island might be represented by a C or Lisp structure, it might not
> >     (not yet decided).  This structure would hold the containing chain,
> >     markers pointing to the start and end of the chain, and the previous and
> >     next islands in the chain.
> >
> > (v) Syntax, etc.
> >   o - Two new syntax classes, "open island" and "close island" will be
> >     introduced.  These will be designated by the characters "{" and "}".  Their
> >     "matching character" slots will contain the island's chain.  There will be
> >     an extra flag "chain" (denoted by "i") indicating whether there is a
> >     previous/next island in the chain.
> >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
> >     whitespace, much as they do comments.  They will also treat as whitespace
> >     the gap between two islands in a chain.

> Difficult to say, but this might produce some counter intuitive
> behaviour. So, for example, consider some text like so:

> === Example

> (here is some lisp)

> ;; This is a long and tedious piece of documentation in my lisp program.
> (here is some more lisp)

> === End Example

> Now moving backward a paragraph will have a significant difference in
> behaviour -- on the "(" of "here is some more lisp", we move to "(here
> is some lisp)", while on the char before, we move to "This is a long".
> Good, bad, expected? Don't know.

Assuming that the comment is set up as an island inside the lisp code
(which might not be the Right Thing to do) ....

As a user action, moving back that paragraph would move from "(here is
some more lisp)" to ";; This is a long ....", since `in-islands' would
be nil during command processing.

As part of a program's parsing, `in-islands' would be bound to non-nil,
and backward-paragraph would move from "(here is some more lisp)" to
"(here is some lisp)".

This is the intended processing.
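
A loose analogy with what exists today: sexp scanning already skips
comments when `parse-sexp-ignore-comments' is non-nil, which is
roughly the role `in-islands' would play for foreign islands.  The
sketch below uses only existing functions; `my-backward-two-sexps'
is a name made up for the illustration, and comments are merely a
stand-in for islands.

```elisp
(defun my-backward-two-sexps (ignore-comments)
  "Move back two sexps from the end of a sample buffer; return point.
IGNORE-COMMENTS is the value bound to `parse-sexp-ignore-comments'."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert "(here is some lisp)\n"
            ";; (a long and tedious comment)\n"
            "(here is some more lisp)")
    ;; Point is now at the end of the buffer.
    (let ((parse-sexp-ignore-comments ignore-comments))
      (backward-sexp 2))
    (point)))
```

With a non-nil argument the comment is skipped like whitespace and
point ends up at 1, the start of "(here is some lisp)".  With nil
the scan stops inside the comment instead.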

[ .... ]

> > (vii) Variables.
> >   o - Island chain local variable bindings will come into existence.  These
> >     bindings depend on the island point is in.  There will be lower level
> >     routines that will have "position" parameters as an alternative to using
> >     point.
> >   o - All variables which are currently buffer local will become chain local
> >     except for those whose symbols are given a non-nil `entire-buffer'
> >     property.  There will be no new functions like
> >     `make-chain-local-variable'.

> What is the default-value of a chain local variable, if the variable is
> also buffer-local?

This would be the (global) default value of the variable.  It would not
be the buffer-local value.  The intention is that the buffer-local value
is the value for the portions of the buffer which are not in any
islands.

> Will we need functions for setting all chains in a certain mode in a
> single buffer?

I'm not sure what you mean, here.  Does "in a certain mode" mean "INTO a
certain mode"?  If so, setting a major or minor mode in a chain will be
able to be done by putting point inside a pertinent island and calling
the mode function.  Maybe a new function `mapchains' could be useful.
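
Purely as an illustration of what such a function might look like, here
is a sketch that walks the `island' text property and calls a function
once per distinct chain.  It assumes the property value identifies the
chain; `my-mapchains' is a made-up name, and nothing like it exists yet.

```elisp
(defun my-mapchains (function)
  "Call FUNCTION once on each distinct `island' property value.
Chains are visited in order of first appearance in the accessible
part of the buffer.  Return the list of chains visited."
  (let ((pos (point-min))
        seen)
    (while (< pos (point-max))
      (let ((chain (get-text-property pos 'island)))
        (when (and chain (not (memq chain seen)))
          (push chain seen)
          (funcall function chain)))
      ;; Jump to the next place the property changes, or to the end.
      (setq pos (next-single-property-change pos 'island nil (point-max))))
    (nreverse seen)))
```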

> Phil

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: A vision for multiple major modes: some design notes
  2016-04-21  9:14   ` Alan Mackenzie
@ 2016-04-22 12:45     ` Phillip Lord
  0 siblings, 0 replies; 45+ messages in thread
From: Phillip Lord @ 2016-04-22 12:45 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel, Dmitry Gutov

Alan Mackenzie <acm@muc.de> writes:
>> When you say "complementary" do you mean alternative or simultaneous?
>> I.e. will an island always be enclosed by syntax markers and always have
>> a text property. Or can it have either?
>
> Sorry, that wasn't very clear.  It would always have both.  The text
> property would enable the code for chain local variables quickly to
> determine the current chain.  The syntactic markers would enable
> efficient scanning by parse-partial-sexp, etc.
>
>> I'm still not understanding how the chain of islands is set up. Is this
>> entirely the super mode's responsibility?
>
> Yes, it would be entirely for the super mode to do.  There would be a
> set of functions to do this, for example:
>
>     (defun create-island-chain (beg end major-mode ...) ...)  (where BEG
>     and END would be the bounds of the first island in the chain).
>
>     (defun add-island-to-chain (chain beg end ...) ...)  (which would
>     delimit (BEG END) as an island, and link it into CHAIN)
>
> There would also be functions for removing islands from a chain, etc.  I
> should really have put this into the notes.  Thanks!


I think that you would need some good utility functions, and callbacks
to support this though. Say, I have a mode with some syntactic markers
identifying islands. Who has the job of checking that the islands are
still the same?

I had this problem with "lentic" -- and it's hard work. You need to hook
into the various change functions, and sometimes rescan the entire
buffer. Using the change functions is a PITA anyway, and easy to get
wrong. And scanning the whole buffer after every change is best
avoided.

Font-lock avoids this, for instance, by getting core Emacs to tell each
mode when to re-fontify different regions. I think you would need
something similar.
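
To make the bookkeeping concrete, here is a minimal sketch of the usual
pattern: record the changed region from `after-change-functions' and
rescan only that region later.  The `my-islands-' names are made up for
illustration.  It stores plain positions, which go stale if a later edit
happens before the recorded region; a real implementation would want
markers, or would extend the region to whole lines.

```elisp
(defvar-local my-islands-dirty nil
  "Cons (BEG . END) covering text changed since the last rescan, or nil.")

(defun my-islands-note-change (beg end _len)
  "Widen `my-islands-dirty' to include the change between BEG and END."
  (setq my-islands-dirty
        (if my-islands-dirty
            (cons (min beg (car my-islands-dirty))
                  (max end (cdr my-islands-dirty)))
          (cons beg end))))

(defun my-islands-track-changes ()
  "Start recording changed regions in the current buffer."
  ;; The final t makes the hook buffer local.
  (add-hook 'after-change-functions #'my-islands-note-change nil t))
```

The super mode would then look for island delimiters only inside
`my-islands-dirty' and reset it to nil afterwards.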




>> Also, how will the regexp engine work when it spans an island? I ask
>> because, if we use the regexp engine to match delimiters, then which
>> syntax do we use, if there are multiple modes in the buffer.
>
> I imagine that the island-start/end syntactic markers would normally be
> set by the super mode as syntax-table text properties.  These always
> take priority over whatever the current syntax table would say.  These
> markers would be considered to be in the enclosing scope, not part of
> the island they define.
>
> The current syntax table would always be that of the island the current
> position was in.  I suppose there is potential for an island to be
> recognised as such in the "enclosing scope", but not in the island
> itself.  This could be mitigated by warning super mode
> programmers to use island-start/end syntaxes ONLY in syntax-table text
> properties.
>
> The actual matching of an island to "\\s-" would be delegated to the
> syntax code (as is currently done for "\\s?" expressions).

I am worried about syntax codes in general. Say, we have a syntax like

#+begin_src lisp

#+end_src


Whether "_" is a symbol constituent or not is mode specific. Say, we
have a buffer with mixed org-mode and lisp. The regexp we need to
identify #+end_src will depend on the mode of the buffer that #+end_src
is in. That is the point of course.

But, if you are using #+end_src to delineate islands in the first place,
then what mode the text is in is rather indeterminate -- you cannot
guarantee that the islands are in the correct place yet, because this is
why you are looking for #+end_src markers. So, you have to build a
regexp which does not use char classes which differ between modes.

For this to work, I think, you need to be able to say to regexp
functions "ignore islands". Binding "in-islands" to nil might work I
guess.
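
A small sketch of the difference, using today's Emacs only.  The first
regexp spells out its characters, so it matches whatever syntax table is
current; the second relies on word syntax, so its meaning changes with
the mode.  The `my-' names are made up for illustration.

```elisp
(defconst my-end-src-literal "^#\\+end_src[ \t]*$"
  "Matches the block terminator using literal characters only.")

(defconst my-end-src-by-syntax "^#\\+\\sw+[ \t]*$"
  "Similar, but relies on word syntax, so it depends on the syntax table.")

(defun my-matches-p (regexp text table)
  "Return non-nil if REGEXP matches TEXT while syntax TABLE is current."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table table
      (looking-at regexp))))
```

With "_" given word syntax both regexps match "#+end_src"; with "_"
given punctuation syntax only the literal one does.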




>
>> >   o - An island might be represented by a C or Lisp structure, it might not
>> >     (not yet decided).  This structure would hold the containing chain,
>> >     markers pointing to the start and end of the chain, and the previous and
>> >     next islands in the chain.
>> >
>> > (v) Syntax, etc.
>> >   o - Two new syntax classes, "open island" and "close island" will be
>> >     introduced.  These will be designated by the characters "{" and "}".  Their
>> >     "matching character" slots will contain the island's chain.  There will be
>> >     an extra flag "chain" (denoted by "i") indicating whether there is a
>> >     previous/next island in the chain.
>> >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
>> >     whitespace, much as they do comments.  They will also treat as whitespace
>> >     the gap between two islands in a chain.
>
>> Difficult to say, but this might produce some counter intuitive
>> behaviour. So, for example, consider some text like so:
>
>> === Example
>
>> (here is some lisp)
>
>
>> ;; This is a long and tedious piece of documentation in my lisp program.
>> (here is some more lisp)
>
>
>> === End Example
>
>> Now moving backward a paragraph will have a significant difference in
>> behaviour -- on the "(" of "here is some more lisp", we move to "(here
>> is some lisp)", while on the char before, we move to "This is a long".
>> Good, bad, expected? Don't know.
>
> Assuming that the comment is set up as an island inside the lisp code
> (which might not be the Right Thing to do) ....
>
> As a user action, moving back that paragraph would move from "(here is
> some more lisp)" to ";; This is a long ....", since `in-islands' would
> be nil during command processing.
>
> As part of a program's parsing, `in-islands' would be bound to non-nil,
> and backward-paragraph would move from "(here is some more lisp)" to
> "(here is some lisp)".
>
> This is the intended processing.
>
> [ .... ]
>
>> > (vii) Variables.
>> >   o - Island chain local variable bindings will come into existence.  These
>> >     bindings depend on the island point is in.  There will be lower level
>> >     routines that will have "position" parameters as an alternative to using
>> >     point.
>> >   o - All variables which are currently buffer local will become chain local
>> >     except for those whose symbols are given a non-nil `entire-buffer'
>> >     property.  There will be no new functions like
>> >     `make-chain-local-variable'.
>
>> What is the default-value of a chain local variable, if the variable is
>> also buffer-local?
>
> This would be the (global) default value of the variable.  It would not
> be the buffer-local value.  The intention is that the buffer-local value
> is the value for the portions of the buffer which are not in any
> islands.
>
>> Will we need functions for setting all chains in a certain mode in a
>> single buffer?
>
> I'm not sure what you mean, here.  Does "in a certain mode" mean "INTO a
> certain mode"?


Oh. Say I have a buffer, half in clojure mode, half in markdown mode. I
start cider which connects to a REPL. Currently, cider sets a
buffer-local variable called something like "cider-connected-to-repl-p"
to "t" to indicate the connection.

But, now, we have an island local variable instead. But, surely, if one
island is connected to a repl, then all the others should be as well.
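
If I read the notes right, the `entire-buffer' property from section
(vii) would be the way for a package to opt such a variable out.  A
sketch, with made-up names (`my-repl-connected-p' stands in for whatever
the cider variable is really called, and the property has no effect in
current Emacs):

```elisp
;; Hypothetical names throughout: `my-repl-connected-p' stands in for the
;; cider variable, and `entire-buffer' is the property proposed in the notes.
(defvar-local my-repl-connected-p nil
  "Non-nil when this buffer is connected to a REPL.")

;; Under the proposal, this would keep the variable buffer local rather
;; than chain local, so every island would see the same connection state.
(put 'my-repl-connected-p 'entire-buffer t)
```

That still leaves every such package needing to know about the property.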




> If so, setting a major or minor mode in a chain will be
> able to be done by putting point inside a pertinent island and calling
> the mode function.  Maybe a new function `mapchains' could be useful.


Yep, that sort of idea.

Phil




* Re: A vision for multiple major modes: some design notes
  2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie
  2016-04-20 21:06 ` Drew Adams
  2016-04-20 22:27 ` Phillip Lord
@ 2016-04-21 14:17 ` Eli Zaretskii
  2016-04-21 21:33   ` Alan Mackenzie
  2016-04-21 22:19   ` Alan Mackenzie
  2016-04-22 14:33 ` Dmitry Gutov
  2016-04-22 18:58 ` Richard Stallman
  4 siblings, 2 replies; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-21 14:17 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

> Date: Wed, 20 Apr 2016 19:44:50 +0000
> From: Alan Mackenzie <acm@muc.de>
> 
> This post describes my notion of how multiple major modes {c,sh}ould be
> implemented.  Key notions are "islands", "island chains", and "chain
> local" variable bindings.

Thank you for publishing this.  A few comments and questions below.
Please keep in mind that I never had to write any Lisp that deals with
these issues, so apologies in advance for possibly silly questions and
misunderstandings.

>   o - To the user, the current major mode will be that of the island where
>     point is.  All familiar commands will work without restriction.

Does this mean the display of mode line, menu bar, and tool bar will
change accordingly?

A more subtle issue is with point movements that are not shown to the
user (those done by Lisp code of some command, before redisplay kicks
in) -- what will be the effect of those? do they trigger redisplay,
for example?

>   o - An island chain will have @dfn{chain local} variable bindings.  Such a
>     binding will become current and accessible when point is within one of the
>     chain's islands.  When point is not in an island, the buffer local binding
>     of the variable will be current.

Emacs sometimes examines buffer text without moving point, and we
generally expect buffer-local bindings to be in effect regardless.
A prominent example is the display engine.  I will return to that
later.


>     * - [Island] will be covered by the text property `island', whose value will be
>       the pertinent island or island chain (see section (ii)) (not yet
>       decided).  Note that if islands are enclosed inside other islands, the
>       value is the innermost island.  There is the possibility of using an
>       interval tree independent of the one for text properties to increase
>       performance.

I don't understand the notion of "enclosed" islands: wouldn't such
"enclosing" simply break the "outer" island into two separate islands?

>   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
>     whitespace, much as they do comments.  They will also treat as whitespace
>     the gap between two islands in a chain.

Why whitespace? why not some new category?  By overloading whitespace,
you make things harder on the underlying infrastructure, like regexp
search and matching.

>   o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ",
>     and "[[:space:]]" will match an entire island.

Extending [:space:] that way seems to be an implementation detail
leaking to user level.  I think we should avoid that at all costs.

>   o - The gap between two islands in a chain will also be matched by the above
>     regexps.
>   o - This treatment of an island, and a gap between two islands, as WS will
>     occur only when `in-islands' is non-nil.
>   o - When `in-islands' is nil, there will be no reliable way of scanning over
>     an island by regexps, since it is a potentially nested structure, and FSMs
>     don't recognise arbitrarily nested structures.

> (vii) Variables.
>   o - Island chain local variable bindings will come into existence.  These
>     bindings depend on the island point is in.  There will be lower level
>     routines that will have "position" parameters as an alternative to using
>     point.
>   o - All variables which are currently buffer local will become chain local
>     except for those whose symbols are given a non-nil `entire-buffer'
>     property.  There will be no new functions like
>     `make-chain-local-variable'.
>   o - When the `entire-buffer' property is nil, the buffer local binding of a
>     variable will hold the value pertinent to the areas of the buffer outside
>     of islands.  When that property is non-nil, the binding holds the value
>     for the entire buffer.
>   o - When `in-islands' is nil, the chain local mechanism described here is
>     not used - instead the familiar buffer local binding is used.
>   o - The current binding for a local variable will be the chain local binding
>     of the island chain of the island containing point.  If point is not in an
>     island, the buffer local binding is current.
>   o - If a chain local binding is current, and its value is unbound, the
>     binding of an enclosing scope is NOT used in its place.  Probably the
>     variable's default-value should be used when reading.
>   o - In buffer.h, a new macro CVAR ("island chain variable") analogous to
>     BVAR will be introduced.  It will use BVAR as a fall back.  Most
>     invocations of BVAR will be changed to CVAR.
>   o - In data.c, the mechanism for accessing local variable bindings
>     (e.g. `swap_in_symval_forwarding') will be enhanced to test `in-islands'
>     and handle chain local bindings appropriately.

I'm not sure I understand the details.  E.g., where will the
island-chain local values be stored?  To remind you, buffer-local
variables have a special object in their symbol value cell, and BVAR
only works for the few buffer-local variables that are stored in the
buffer object itself.  I'm not sure I understand how CVAR could solve
the problem you need to solve, which is keeping multiple chains per
buffer, each one with its values of these variables.

> (ix) Miscellaneous commands and functions.
>   o - `point-min' and `point-max' will, when `in-islands' is non-nil, return
>     the min/max point in the visible region in the same chain of islands as
>     point.
>   o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to
>     the current island chain when `in-islands' is non-nil.
>   o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in
>     the current island chain (how?) when `in-islands' is non-nil.
>   o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the
>     Right Thing in island chains when `in-islands' is non-nil.
>   o - New functions `island-min', `island-max', `island-chain-min' and
>     `island-chain-max' will do what their names say.
>   o - There will be no restrictions on the use of widening/narrowing, as have
>     been proposed for other support engines for multiple major modes.
>   o - New commands like `beginning-of-island', `narrow-to-island', etc. will
>     be wanted.  More difficultly, bindings for them will be needed.
>   o - ??? Other commands to be amended.

This actually sounds like a simple extension of narrowing, so I wonder
why do we need so many new object types and notions.

> (x) Emacs subsystems and `in-islands'.
>   o - Redisplay will bind `in-islands' to non-nil, but will successfully
>     display all islands wholly or partially in windows being displayed.
>   o - Font Lock will bind `in-islands' to non-nil, but will successfully
>     fontify all pertinent islands.
>   o - `island-before/after-change-function' will be called with `in-islands'
>     nil.
>   o - `before/after-change-functions' will be called with `in-islands' bound
>     to non-nil.
>   o - Major modes will need to bind `in-islands' to non-nil for such things as
>     indentation.
>   o - For normal user interaction, `in-islands' will be nil.

I don't see any discussion of how redisplay will deal with islands.
To remind you, redisplay moves through portions of the buffer, without
moving point, and accesses buffer-local variables for its job.  You need
to augment the design with something that will allow redisplay to see the
correct values of variables depending on the buffer position it is at.
The same problem exists for any features that use display simulation
for making decisions about movement and layout, e.g. vertical-motion.

More generally, perhaps it will help if you publish the rationale for
at least the main points of this design, discussing possible
alternatives and explaining why you ended up with the one you present
as the design decision.  This could help us see the main issues that
are to be dealt with, and perhaps suggest better ways of dealing with
them.  Seeing just the final product of the design tends to limit the
discussions to low-level details, which could easily miss the broader
picture and issues.

Thanks.




* Re: A vision for multiple major modes: some design notes
  2016-04-21 14:17 ` Eli Zaretskii
@ 2016-04-21 21:33   ` Alan Mackenzie
  2016-04-21 22:01     ` Drew Adams
                       ` (2 more replies)
  2016-04-21 22:19   ` Alan Mackenzie
  1 sibling, 3 replies; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-21 21:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dgutov

Hello, Eli.

I'll get a fuller reply to you later.  But for now....

On Thu, Apr 21, 2016 at 05:17:09PM +0300, Eli Zaretskii wrote:
> > Date: Wed, 20 Apr 2016 19:44:50 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > 
> > This post describes my notion of how multiple major modes {c,sh}ould be
> > implemented.  Key notions are "islands", "island chains", and "chain
> > local" variable bindings.

> Thank you for publishing this.  A few comments and questions below.
> Please keep in mind that I never had to write any Lisp that deals with
> these issues, so apologies in advance for possibly silly questions and
> misunderstandings.

[ .... ]

> More generally, perhaps it will help if you publish the rationale for
> at least the main points of this design, discussing possible
> alternatives and explaining why you ended up with the one you present
> as the design decision.  This could help us see the main issues that
> are to be dealt with, and perhaps suggest better ways of dealing with
> them.  Seeing just the final product of the design tends to limit the
> discussions to low-level details, which could easily miss the broader
> picture and issues.

It would be nice if Emacs supported several major modes in a buffer, not
just by awkward workarounds, but fully and natively.  There's no magic
involved in the emergence of the design - it's basically a naive vision
of how things should be, given the current state of Emacs.

The essence of major mode support is buffer local variables.  (Things
like the syntax table and local key map are basically buffer local
variables, even though they are not accessible as such from Lisp.)  So,
at first sight, each "island" in the buffer needs its own set of "buffer
local" variables.

However, a set of variable bindings is a big overhead in terms of RAM,
so it would make sense, wherever possible, to share these bindings
between islands with the same major mode.  Furthermore, in some use
cases, there are sequences of islands which are in essence a single
stream of text.  It thus makes sense to have "chains of islands", all
islands in a chain sharing the "chain local" variable bindings.

There might be a need for actual "island local" variables, with a
separate value in each island.  However, Dmitry and I were unable to
identify any such variables in an earlier thread on emacs-devel.  If
any such variables became apparent, then would be the time to work out
how to implement them.

The parts of a buffer which are not in any island (we won't call these
"the ocean" ;-) also need their own variable bindings.  It seems to make
sense to use the standard buffer local bindings for these, since there
would otherwise be no use for them.  An alternative would be to construe
these regions as being islands in their own right, in their own island
chain.  However, that would fit badly with the syntactic delimiters for
islands (see below).

The above applies to most variables which are currently buffer local.
However, there are some such variables which are intrinsically to do
with the whole buffer, not individual islands within it.  These include
`buffer-undo-list', the mark, `mark-ring', .....  They must be marked as
belonging to the whole buffer, and handled as such, hence the
`entire-buffer' property applied to their symbols.

How do we implement chain local variable bindings?  Why not base them on
the implementation of buffer local bindings?  Some buffer local
variables are fixed slots in the struct buffer, the rest are elements in
an association list in the struct buffer.  Until there's a better idea,
we copy this scheme for chain local variables; the fixed slot variables,
currently accessed by the BVAR macro could instead get a somewhat more
involved macro called "CVAR" which will somehow use the current position
(whatever that means) to select the pertinent struct chain or the
familiar struct buffer.

Given a buffer position, we need to be able to find the corresponding
island chain.  "Obviously", we do this with a text property, which we
might as well call `island', or possibly `chain'.  Since successive
accesses to chain local variables are very likely to be in the same
chain most of the time, we will cache the "current" chain in buffer
local variables.
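
To make the lookup concrete, here is a Lisp model of what the C code
would do, covering only the association list part and leaving out the
cache.  It is a sketch under the assumptions above, not a design: the
`my-chain' structure and the function names are made up, and the real
thing would live in data.c.

```elisp
(require 'cl-lib)

(cl-defstruct my-chain
  mode       ; major mode symbol for the chain
  bindings)  ; alist of (SYMBOL . VALUE), like a buffer's local alist

(defun my-chain-at (pos)
  "Return the chain recorded in the `island' text property at POS, or nil."
  (get-text-property pos 'island))

(defun my-chain-local-value (symbol pos)
  "Return the value SYMBOL would have at POS under the proposal."
  (let* ((chain (my-chain-at pos))
         (cell (and chain (assq symbol (my-chain-bindings chain)))))
    (cond
     ;; Whole-buffer variables and text outside islands: buffer local.
     ((or (get symbol 'entire-buffer) (null chain))
      (symbol-value symbol))
     ;; Bound in the chain: use the chain local value.
     (cell (cdr cell))
     ;; In an island but unbound there: the global default, NOT the
     ;; binding of the enclosing scope.
     (t (default-value symbol)))))
```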

We want `parse-partial-sexp' and friends to work "properly" wrt islands.
It is immediately clear that the syntactic context of each island chain
is independent of other chains and of the regions outside islands.  It
is also clear that the syntactic context at the end of an island should
be preserved and used as the starting value at the start of the next
island in the same chain.  It thus seems sensible to introduce new
syntactic classes "open island" and "close island" to facilitate this.
Why not give them the characters "{" and "}", which are currently
unused?  This method of delimiting islands does, however, force us to
deal with nested islands.  Clearly, our parser state must be amended to
deal with these stacked and suspended states.

It is currently unclear whether `syntax-ppss' needs to return this
amended state, or whether the simple "state within the chain" would be
adequate.  It is clear that syntactic commands such as `forward-list'
(C-M-n) must confine their operation to a single island chain.

When it comes to movement and search primitives, we want to adapt these
so that the impact on existing major modes is minimised.  Ideally, we
would want major modes to "see" only their own islands (or lack
thereof).  Thus we treat irrelevant islands as blocks of whitespace.  It
seems to make sense to have such islands matched by subexpressions in
regexps which match spaces.  This would obviate the need to amend a
great number of regexps currently coded in major modes.

On the other hand, when a user does C-s or C-M-s, the Right Thing is
surely to search the buffer as a whole, without regard to islands.  We
therefore need a flag which instructs the primitives how to behave when
there are islands.  We might as well call this flag `in-islands', for
want of a better name.

The user will, from time to time, delete the delimiters which define
islands, and will insert other ones.  The super mode needs to be able to
react to these actions, amending its island chains appropriately.  I
have not been able to come up with an adequate scheme for this using
only before/after-change-functions.  These variables are going to be
chain local, and the buffer local values will hold functions for the
buffer regions not in islands.  So we introduce
`island-before/after-change-function', entire-buffer local variables,
each of which will hold a single function intended for adjusting island
chains.  Their return values will direct Emacs which islands need
`before/after-change-functions' invoking on them.

To minimise changes to major modes, quite a few primitives (such as
`skip-syntax-forward' and `next-single-property-change') will be amended
to restrict themselves to island chains when `in-islands' is bound to
non-nil.

Several Emacs subsystems will need enhancement, in particular redisplay
and font-lock.

Sorry this has turned out so long, so pedestrian, and so boring.  :-(
As promised, I have had no magic insights, no sparkling innovations in
drawing up these notes - just a sequence of humdrum decisions, one after
the other.  If I've missed out anything relevant, please say so, then I
can try and fill in the gap.

It's also clear that what I'm proposing can't be implemented in a couple
of weekends - it would be a long hard grind.  But it would enable super
modes to be written with comparative ease.

> Thanks.

-- 
Alan Mackenzie (Nuremberg, Germany).


* RE: A vision for multiple major modes: some design notes
  2016-04-21 21:33   ` Alan Mackenzie
@ 2016-04-21 22:01     ` Drew Adams
  2016-04-22  8:13       ` Alan Mackenzie
  2016-04-22  9:04     ` Eli Zaretskii
  2016-06-13 21:17     ` John Wiegley
  2 siblings, 1 reply; 45+ messages in thread
From: Drew Adams @ 2016-04-21 22:01 UTC (permalink / raw)
  To: Alan Mackenzie, Eli Zaretskii; +Cc: dgutov, emacs-devel

[More interesting details.  Thx.]

> Given a buffer position, we need to be able to find the corresponding
> island chain.  "Obviously", we do this with a text property, which we
> might as well call `island', or possibly `chain'.  Since successive
> accesses to chain local variables are very likely to be in the same
> chain most of the time, we will cache the "current" chain in buffer
> local variables.

I guess you are referring to the possibility of more than one
chain having an island at point, and wanting to pick up the right
one as the "current" chain - so you check a text property, which
identifies the chain that is currently active.  Is that right?

> When it comes to movement and search primitives, we want to adapt these
> so that the impact on existing major modes is minimised.  Ideally, we
> would want major modes to "see" only their own islands (or lack
> thereof).  Thus we treat irrelevant islands as blocks of whitespace.  It
> seems to make sense to have such islands matched by subexpressions in
> regexps which match spaces.  This would obviate the need to amend a
> great number of regexps currently coded in major modes.

For search, at least, I don't see why you don't make use of
`isearch-filter-predicate'.  That's what I do in my code, to
search only within (or without: complement) a set of zones
(~chain of islands).  That seems simple and cheap.
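
A minimal sketch of the idea (not the code from my library), assuming
island text carries a non-nil `island' text property as in the proposal.
The `my-isearch-' names are made up for illustration; the predicate
protocol (two arguments, BEG and END of the match) and the `add-function'
form are the existing ones.

```elisp
(defun my-isearch-in-island-p (beg end)
  "Return non-nil if the match from BEG to END lies wholly in island text."
  ;; `text-property-any' with value nil finds a character lacking the
  ;; property, so no such character means the whole match is covered.
  (not (text-property-any beg end 'island nil)))

(defun my-isearch-restrict-to-islands ()
  "Make Isearch in the current buffer skip matches outside islands."
  (add-function :before-while (local 'isearch-filter-predicate)
                #'my-isearch-in-island-p))
```

This checks only that the match is inside some island; restricting to one
particular chain would mean comparing the property value as well.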

[I also optionally dim the non-islands during search (or the
non-non-islands, if complementing), so the areas being searched
stand out more.]

> On the other hand, when a user does C-s or C-M-s, the Right Thing is
> surely to search the buffer as a whole, without regard to islands.  We
> therefore need a flag which instructs the primitives how to behave when
> there are islands.  We might as well call this flag `in-islands', for
> want of a better name.

`isearch-filter-predicate'.  It can let code know whether
you are island-searching or not.

> The user will, from time to time, delete the delimiters which define
> islands, and will insert other ones.

FWIW, markers as delimiters do not have that problem.

[The `isearch-prop.el' code can use zones defined by either
their limits (e.g., markers) or text or overlay properties
on their text.  It lets commands like `query-replace' do
similarly.]


* Re: A vision for multiple major modes: some design notes
  2016-04-21 22:01     ` Drew Adams
@ 2016-04-22  8:13       ` Alan Mackenzie
  2016-04-22 17:04         ` Drew Adams
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-22  8:13 UTC (permalink / raw)
  To: Drew Adams; +Cc: Eli Zaretskii, dgutov, emacs-devel

Hello, Drew.

On Thu, Apr 21, 2016 at 03:01:12PM -0700, Drew Adams wrote:
> [More interesting details.  Thx.]

> > Given a buffer position, we need to be able to find the corresponding
> > island chain.  "Obviously", we do this with a text property, which we
> > might as well call `island', or possibly `chain'.  Since successive
> > accesses to chain local variables are very likely to be in the same
> > chain most of the time, we will cache the "current" chain in buffer
> > local variables.

> I guess you are referring to the possibility of more than one
> chain having an island at point, and wanting to pick up the right
> one as the "current" chain - so you check a text property, which
> identifies the chain that is currently active.  Is that right?

Er, no.  ;-)  Even when there is only one island at point, we still need
to determine it.  A text property is a good way of doing this.

> > When it comes to movement and search primitives, we want to adapt these
> > so that the impact on existing major modes is minimised.  Ideally, we
> > would want major modes to "see" only their own islands (or lack
> > thereof).  Thus we treat irrelevant islands as blocks of whitespace.  It
> > seems to make sense to have such islands matched by subexpressions in
> > regexps which match spaces.  This would obviate the need to amend a
> > great number of regexps currently coded in major modes.

> For search, at least, I don't see why you don't make use of
> `isearch-filter-predicate'.  That's what I do in my code, to
> search only within (or without: complement) a set of zones
> (~chain of islands).  That seems simple and cheap.

Thanks, I didn't know about that variable.  But it may not be widely
applicable enough.

> [I also optionally dim the non-islands during search (or the
> non-non-islands, if complementing), so the areas being searched
> stand out more.]

That's another matter, at a different level of abstraction from the main
topic.

> > On the other hand, when a user does C-s or C-M-s, the Right Thing is
> > surely to search the buffer as a whole, without regard to islands.  We
> > therefore need a flag which instructs the primitives how to behave when
> > there are islands.  We might as well call this flag `in-islands', for
> > want of a better name.

> `isearch-filter-predicate'.  It can let code know whether
> you are island-searching or not.

That would only work for isearch.

> > The user will, from time to time, delete the delimiters which define
> > islands, and will insert other ones.

> FWIW, markers as delimiters do not have that problem.

I think they do.  What happens when you have two islands bounded by four
markers, and you delete a region containing the two middle markers;

      MaaaaaaaaaaaM       MbbbbbbbbbbbbbM
               dddddddddddddddddd

?  You might well not want the two islands a and b to be coalesced.
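
This can be tried in current Emacs.  A small sketch, with a made-up
function name, that sets up the four markers around two runs of text and
deletes a region spanning the middle two:

```elisp
(defun my-island-markers-after-deletion ()
  "Return the four marker positions after deleting across the middle two.
The buffer holds ten a's, five spaces and ten b's, with markers at
the bounds of each run."
  (with-temp-buffer
    (insert "aaaaaaaaaa     bbbbbbbbbb")
    (let ((markers (mapcar #'copy-marker '(1 11 16 26))))
      ;; Delete from inside the a's to inside the b's.
      (delete-region 8 20)
      (mapcar #'marker-position markers))))

;; (my-island-markers-after-deletion) => (1 8 8 14)
```

The two middle markers end up at the same position, so nothing but the
markers themselves records that a and b were ever separate islands.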

-- 
Alan Mackenzie (Nuremberg, Germany).




* RE: A vision for multiple major modes: some design notes
  2016-04-22  8:13       ` Alan Mackenzie
@ 2016-04-22 17:04         ` Drew Adams
  0 siblings, 0 replies; 45+ messages in thread
From: Drew Adams @ 2016-04-22 17:04 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, dgutov, emacs-devel

> > For search, at least, I don't see why you don't make use of
> > `isearch-filter-predicate'.  That's what I do in my code, to
> > search only within (or without: complement) a set of zones
> > (~chain of islands).  That seems simple and cheap.
> 
> Thanks, I didn't know about that variable.  But it may not be
> widely applicable enough.

I guess you're referring to point movements, among other things.

`isearch-filter-predicate', or similar, could presumably be made
so (more widely applicable).  It is also used by `perform-replace'
(e.g. `query-replace'), BTW - not just search.

The point is that a predicate is more general than a regexp, and
it doesn't interfere with the use of a regexp (and vice versa).

> > > On the other hand, when a user does C-s or C-M-s, the Right Thing is
> > > surely to search the buffer as a whole, without regard to islands.  We
> > > therefore need a flag which instructs the primitives how to behave
> > > when there are islands.
> >
> > `isearch-filter-predicate'.  It can let code know whether
> > you are island-searching or not.
> 
> That would only work for isearch.

Not if other code takes it into account.  It only worked for
search until `perform-replace' started taking it into account.

> > > The user will, from time to time, delete the delimiters
> > > which define islands, and will insert other ones.
> 
> > FWIW, markers as delimiters do not have that problem.
> 
> I think they do.  What happens when you have two islands bounded by four
> markers, and you delete a region containing the two middle markers;
> 
>       MaaaaaaaaaaaM       MbbbbbbbbbbbbbM
>                dddddddddddddddddd
> 
> ?  You might well not want the two islands a and b to be coalesced.

What's the alternative?  If you're worried about different
modes (for example) for aaaaaa and bbbbbb then consider
keeping lists of markers per mode (or whatever) - like we
sometimes handle overlays using one or more lists.

Anyway, it was only a "FWIW".  I use both text properties
and markers.  There are advantages and disadvantages to
any implementation.

Also, where I use markers I allow extra info in a given
zone, in addition to the markers:

;; A "basic zone" is a list of two buffer positions followed
;; by a possibly empty list of extra information:
;; (POS1 POS2 . EXTRA).

IOW, some info is location-specific (buffer and position),
and other info (EXTRA) is zone-specific.

In your scenario, if a zone's second marker is deleted
then code could decide, based on whatever (including whether
or not aaaaaaaa and bbbbbbbb are in the same mode or have
compatible "EXTRA" island info), whether to: coalesce them,
delete them (as islands, not the text), or keep them separate.
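
As a rough illustration of such a decision, the following C sketch reduces a zone's EXTRA info to a single mode identifier.  The `struct zone' layout, the `zone_action' enum and the policy itself are assumptions made for the example, not code from any existing package.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone: two positions plus zone-specific EXTRA info,
   reduced here to a mode identifier.  */
struct zone
{
  ptrdiff_t beg, end;
  int mode;
};

enum zone_action { ZONE_COALESCE, ZONE_DELETE, ZONE_KEEP_SEPARATE };

/* Decide what to do with zones A and B (A before B) after a deletion
   has moved their markers.  The policy is an assumed example:
   an emptied zone is dropped, adjacent zones in the same mode are
   merged, anything else stays separate.  */
static enum zone_action
decide_after_delete (const struct zone *a, const struct zone *b)
{
  assert (a->beg <= a->end && b->beg <= b->end);
  if (a->beg == a->end || b->beg == b->end)
    return ZONE_DELETE;
  if (a->end == b->beg && a->mode == b->mode)
    return ZONE_COALESCE;
  return ZONE_KEEP_SEPARATE;
}
```

A caller would run this from whatever hook notices the deletion; the mode comparison stands in for any richer compatibility test on EXTRA.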

The point is that the code can do anything.  But yes, a
single marker does not record more than a buffer and a
position.  I think, however, that the additional info
you are wanting to associate here is really (typically, at
least) info to associate with the island, and not info to
associate with an individual marker.


* Re: A vision for multiple major modes: some design notes
  2016-04-21 21:33   ` Alan Mackenzie
  2016-04-21 22:01     ` Drew Adams
@ 2016-04-22  9:04     ` Eli Zaretskii
  2016-06-13 21:17     ` John Wiegley
  2 siblings, 0 replies; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-22  9:04 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel, dgutov

> Date: Thu, 21 Apr 2016 21:33:23 +0000
> Cc: dgutov@yandex.ru, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> I'll get a fuller reply to you later.  But for now....

Thanks.

> Given a buffer position, we need to be able to find the corresponding
> island chain.  "Obviously", we do this with a text property, which we
> might as well call `island', or possibly `chain'.

Why text properties?  Have you considered a hash table?  I think it
will be faster, and will also avoid a few complications that text
properties bring as baggage you don't necessarily want (like what
happens when you copy text to another location or buffer).




* Re: A vision for multiple major modes: some design notes
  2016-04-21 21:33   ` Alan Mackenzie
  2016-04-21 22:01     ` Drew Adams
  2016-04-22  9:04     ` Eli Zaretskii
@ 2016-06-13 21:17     ` John Wiegley
  2016-06-14 13:13       ` Alan Mackenzie
  2 siblings, 1 reply; 45+ messages in thread
From: John Wiegley @ 2016-06-13 21:17 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, dgutov, emacs-devel

>>>>> Alan Mackenzie <acm@muc.de> writes:

> The essence of major mode support is buffer local variables. (Things like
> the syntax table and local key map are basically buffer local variables,
> even though they are not accessible as such from Lisp.) So, at first sight,
> each "island" in the buffer needs its own set of "buffer local" variables.

I don't agree that this is the essence of major mode support. Another aspect
of major modes is an expectation of which text properties might occur
throughout the buffer, and where and why.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




* Re: A vision for multiple major modes: some design notes
  2016-06-13 21:17     ` John Wiegley
@ 2016-06-14 13:13       ` Alan Mackenzie
  2016-06-14 16:27         ` John Wiegley
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-06-14 13:13 UTC (permalink / raw)
  To: emacs-devel

Hello, John.

Trust you've had a good holiday!

On Mon, Jun 13, 2016 at 02:17:40PM -0700, John Wiegley wrote:
> >>>>> Alan Mackenzie <acm@muc.de> writes:

> > The essence of major mode support is buffer local variables. (Things like
> > the syntax table and local key map are basically buffer local variables,
> > even though they are not accessible as such from Lisp.) So, at first sight,
> > each "island" in the buffer needs its own set of "buffer local" variables.

> I don't agree that this is the essence of major mode support. Another aspect
> of major modes is an expectation of which text properties might occur
> throughout the buffer, and where and why.

OK.  Shall we agree that the buffer local variables are a crucially
important part of what constitutes a major mode?  :-)  Clearly text
properties are important (indeed, in the case of, e.g., CC Mode
critically important) too.

> -- 
> John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
> http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: A vision for multiple major modes: some design notes
  2016-06-14 13:13       ` Alan Mackenzie
@ 2016-06-14 16:27         ` John Wiegley
  0 siblings, 0 replies; 45+ messages in thread
From: John Wiegley @ 2016-06-14 16:27 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

>>>>> Alan Mackenzie <acm@muc.de> writes:

>> I don't agree that this is the essence of major mode support. Another
>> aspect of major modes is an expectation of which text properties might
>> occur throughout the buffer, and where and why.

> OK. Shall we agree that the buffer local variables are a crucially important
> part of what constitutes a major mode? :-) Clearly text properties are
> important (indeed, in the case of, e.g., CC Mode critically important) too.

I'm sorry, after reading this again today I'm not sure why my reaction sounded
so strong.

Surely buffer local variables are a key, essential component to the picture.
There is a "context" that defines what a mode is (buffer local vars, text
properties, event bindings, etc), and we need a way of scoping such contexts
within buffers, which you've begun describing in your proposal.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




* Re: A vision for multiple major modes: some design notes
  2016-04-21 14:17 ` Eli Zaretskii
  2016-04-21 21:33   ` Alan Mackenzie
@ 2016-04-21 22:19   ` Alan Mackenzie
  2016-04-22  8:48     ` Eli Zaretskii
  2016-04-22 13:42     ` Andy Moreton
  1 sibling, 2 replies; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-21 22:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dgutov

Hello, Eli.

On Thu, Apr 21, 2016 at 05:17:09PM +0300, Eli Zaretskii wrote:
> > Date: Wed, 20 Apr 2016 19:44:50 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > 
> > This post describes my notion of how multiple major modes {c,sh}ould be
> > implemented.  Key notions are "islands", "island chains", and "chain
> > local" variable bindings.

> Thank you for publishing this.  A few comments and questions below.
> Please keep in mind that I never had to write any Lisp that deals with
> these issues, so apologies in advance for possibly silly questions and
> misunderstandings.

> >   o - To the user, the current major mode will be that of the island where
> >     point is.  All familiar commands will work without restriction.

> Does this mean the display of mode line, menu bar, and tool bar will
> change accordingly?

Yes, please!

> A more subtle issue is with point movements that are not shown to the
> user (those done by Lisp code of some command, before redisplay kicks
> in) -- what will be the effect of those? do they trigger redisplay,
> for example?

They shouldn't trigger redisplay, no.

> >   o - An island chain will have @dfn{chain local} variable bindings.  Such a
> >     binding will become current and accessible when point is within one of the
> >     chain's islands.  When point is not in an island, the buffer local binding
> >     of the variable will be current.

> Emacs sometimes examines buffer text without moving point, and we
> generally expect for buffer-local bindings to be in effect regardless.
> A prominent example is the display engine.  I will return to that
> later.

OK.

> >     * - [Island] will be covered by the text property `island', whose value will be
> >       the pertinent island or island chain (see section (ii)) (not yet
> >       decided).  Note that if islands are enclosed inside other islands, the
> >       value is the innermost island.  There is the possibility of using an
> >       interval tree independent of the one for text properties to increase
> >       performance.

> I don't understand the notion of "enclosed" islands: wouldn't such
> "enclosing" simply break the "outer" island into two separate islands?

If we mark island start and end with the syntax-table text properties
"{" and "}", we're going to have something like

    {     a{  }b    }

.  Simply to break the outer island into two pieces, we'd really need to
apply delimiters at a and b, giving:

    {     }{  }{    }

.  This would overwrite the previous syntaxes at a and b, and this might
be a Bad Thing.

> >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
> >     whitespace, much as they do comments.  They will also treat as whitespace
> >     the gap between two islands in a chain.

> Why whitespace? why not some new category?  By overloading whitespace,
> you make things harder on the underlying infrastructure, like regexp
> search and matching.

I think it's clear that the "foreign" island's syntax has no interaction
with the current island.  If we treat it as whitespace, that should
minimise the amount of adapting we need to do to existing major modes.

I envisage that a regexp element will match the "foreign" island if that
element would match a space.  I know this sounds horrible, but I haven't
come up with a scenario where this wouldn't work well.  (This is
assuming, of course, that the magic flag `in-islands' is non-nil.)

> >   o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ",
> >     and "[[:space:]]" will match an entire island.

> Extending [:space:] that way seems to be an implementation detail
> leaking to user level.  I think we should avoid that at all costs.

Why?  I don't understand your last paragraph.

> >   o - The gap between two islands in a chain will also be matched by the above
> >     regexps.
> >   o - This treatment of an island, and a gap between two islands, as WS will
> >     occur only when `in-islands' is non-nil.
> >   o - When `in-islands' is nil, there will be no reliable way of scanning over
> >     an island by regexps, since it is a potentially nested structure, and FSMs
> >     don't recognise arbitrarily nested structures.

> > (vii) Variables.
> >   o - Island chain local variable bindings will come into existence.  These
> >     bindings depend on the island point is in.  There will be lower level
> >     routines that will have "position" parameters as an alternative to using
> >     point.
> >   o - All variables which are currently buffer local will become chain local
> >     except for those whose symbols are given a non-nil `entire-buffer'
> >     property.  There will be no new functions like
> >     `make-chain-local-variable'.
> >   o - When the `entire-buffer' property is nil, the buffer local binding of a
> >     variable will hold the value pertinent to the areas of the buffer outside
> >     of islands.  When that property is non-nil, the binding holds the value
> >     for the entire buffer.
> >   o - When `in-islands' is nil, the chain local mechanism described here is
> >     not used - instead the familiar buffer local binding is used.
> >   o - The current binding for a local variable will be the chain local binding
> >     of the island chain of the island containing point.  If point is not in an
> >     island, the buffer local binding is current.
> >   o - If a chain local binding is current, and its value is unbound, the
> >     binding of an enclosing scope is NOT used in its place.  Probably the
> >     variable's default-value should be used when reading.
> >   o - In buffer.h, a new macro CVAR ("island chain variable") analogous to
> >     BVAR will be introduced.  It will use BVAR as a fall back.  Most
> >     invocations of BVAR will be changed to CVAR.
> >   o - In data.c, the mechanism for accessing local variable bindings
> >     (e.g. `swap_in_symval_forwarding') will be enhanced to test `in-islands'
> >     and handle chain local bindings appropriately.

> I'm not sure I understand the details.  E.g., where will the
> island-chain local values be stored?

In a C struct chain, analogous to struct buffer, using much the same
mechanisms.

> To remind you, buffer-local variables have a special object in their
> symbol value cell, and BVAR only works for the few buffer-local
> variables that are stored in the buffer object itself.  I'm not sure I
> understand how CVAR could solve the problem you need to solve, which
> is keeping multiple chains per buffer, each one with its values of
> these variables.

CVAR would get the current chain from the `island' (or `chain') text
property at the position.  If this is nil, it would do what BVAR does.
Otherwise it would access the appropriate named element in the struct
chain.  I think CVAR would take three parameters: the variable name, the
buffer, and the buffer position.

Other chain local variables would be accessed through an alist in the
struct chain holding miscellaneous variables, exactly as is done for
the other buffer local variables in struct buffer.

Unless there is a better solution, of course.
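
A toy version of that lookup might look like the following.  This is a sketch under heavy assumptions: the structs are stand-ins rather than Emacs's real `struct buffer', `chain_at' replaces the text property lookup with a linear scan, the value is a plain int instead of a Lisp_Object, and only variables stored directly in the structs are covered.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the real structures; all names are hypothetical.  */
struct chain { int tab_width; };
struct island { ptrdiff_t beg, end; struct chain *chain; };
struct buffer
{
  int tab_width;                  /* The buffer local value.  */
  const struct island *islands;   /* Islands in this buffer.  */
  size_t n_islands;
};

/* Stand-in for reading the `island' text property at POS: return the
   chain covering POS, or NULL when POS is outside every island.  */
static struct chain *
chain_at (const struct buffer *buf, ptrdiff_t pos)
{
  assert (buf != NULL);
  for (size_t i = 0; i < buf->n_islands; i++)
    if (buf->islands[i].beg <= pos && pos < buf->islands[i].end)
      return buf->islands[i].chain;
  return NULL;
}

/* Chain local value of FIELD at POS, falling back to the buffer's own
   value (what BVAR would give) when POS is not in an island.  */
#define CVAR(buf, pos, field) \
  (chain_at (buf, pos) ? chain_at (buf, pos)->field : (buf)->field)
```

With an island covering [10, 20) whose chain sets `tab_width' to 2 in a buffer where it is 8, CVAR yields 2 at position 15 and 8 at positions 5 and 20.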

> > (ix) Miscellaneous commands and functions.
> >   o - `point-min' and `point-max' will, when `in-islands' is non-nil, return
> >     the max/min point in the visible region in the same chain of islands as
> >     point.
> >   o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to
> >     the current island chain when `in-islands' is non-nil.
> >   o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in
> >     the current island chain (how?) when `in-islands' is non-nil.
> >   o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the
> >     Right Thing in island chains when `in-islands' is non-nil.
> >   o - New functions `island-min', `island-max', `island-chain-min' and
> >     `island-chain-max' will do what their names say.
> >   o - There will be no restrictions on the use of widening/narrowing, as have
> >     been proposed for other support engines for multiple major modes.
> >   o - New commands like `beginning-of-island', `narrow-to-island', etc. will
> >     be wanted.  More difficultly, bindings for them will be needed.
> >   o - ??? Other commands to be amended.

> This actually sounds like a simple extension of narrowing, so I wonder
> why do we need so many new object types and notions.

I think it's more like a complicated extension of narrowing.  :-)  I
think that chain local variables are essential to multiple major modes -
you can't have m.m.m. without some sort of chain locality.  I also think
that for a major mode to work transparently over several chained
islands, all the irrelevant stuff between the islands needs to be made,
er, transparent.  That is what section (ix) is about.

> > (x) Emacs subsystems and `in-islands'.
> >   o - Redisplay will bind `in-islands' to non-nil, but will successfully
> >     display all islands wholly or partially in windows being displayed.
> >   o - Font Lock will bind `in-islands' to non-nil, but will successfully
> >     fontify all pertinent islands.
> >   o - `island-before/after-change-function' will be called with `in-islands'
> >     nil.
> >   o - `before/after-change-functions' will be called with `in-islands' bound
> >     to non-nil.
> >   o - Major modes will need to bind `in-islands' to non-nil for such things as
> >     indentation.
> >   o - For normal user interaction, `in-islands' will be nil.

> I don't see any discussion of how redisplay will deal with islands.
> To remind you, redisplay moves through portions of the buffer, without
> moving point, and accesses buffer-local variables for its job.  You need
> to augment the design with something that will allow redisplay to see the
> correct values of variables depending on the buffer position it is at.
> The same problem exists for any features that use display simulation
> for making decisions about movement and layout, e.g. vertical-motion.

I think redisplay is mostly controlled by variables (such as
`scroll-margin') accessed by BVAR.  These calls could be replaced by
CVAR.  Problems will arise if redisplay reads the variable once, and
fails to read it again when its current position moves into or out of an
island.  Redisplay would have to be aware of island boundaries, and
re-read the controlling variables on passing a boundary.  Other than
that, I can't see any big problems.  Not yet, anyway.

[ .... ]

> Thanks.

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: A vision for multiple major modes: some design notes
  2016-04-21 22:19   ` Alan Mackenzie
@ 2016-04-22  8:48     ` Eli Zaretskii
  2016-04-22 22:35       ` Alan Mackenzie
  2016-04-22 13:42     ` Andy Moreton
  1 sibling, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-22  8:48 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel, dgutov

> Date: Thu, 21 Apr 2016 22:19:43 +0000
> Cc: dgutov@yandex.ru, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > A more subtle issue is with point movements that are not shown to the
> > user (those done by Lisp code of some command, before redisplay kicks
> > in) -- what will be the effect of those? do they trigger redisplay,
> > for example?
> 
> They shouldn't trigger redisplay, no.

But if that code calls sit-for or somesuch, they will, and the result
will be flickering.  But that's not a very important issue.

> > >     * - [Island] will be covered by the text property `island', whose value will be
> > >       the pertinent island or island chain (see section (ii)) (not yet
> > >       decided).  Note that if islands are enclosed inside other islands, the
> > >       value is the innermost island.  There is the possibility of using an
> > >       interval tree independent of the one for text properties to increase
> > >       performance.
> 
> > I don't understand the notion of "enclosed" islands: wouldn't such
> > "enclosing" simply break the "outer" island into two separate islands?
> 
> If we mark island start and end with the syntax-table text properties
> "{" and "}", we're going to have something like
> 
>     {     a{  }b    }
> 
> .  Simply to break the outer island into two pieces, we'd really need to
> apply delimiters at a and b, giving:
> 
>     {     }{  }{    }
> 
> .  This would overwrite the previous syntaxes at a and b, and this might
> be a Bad Thing.

We could design the stuff so that Bad Things won't happen.  I consider
this nesting of islands a (possibly unnecessary) complication that we
shouldn't accept unless we have a very good reason.  Nesting
immediately requires a plethora of operations that are otherwise not
necessary.

> > >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
> > >     whitespace, much as they do comments.  They will also treat as whitespace
> > >     the gap between two islands in a chain.
> 
> > Why whitespace? why not some new category?  By overloading whitespace,
> > you make things harder on the underlying infrastructure, like regexp
> > search and matching.
> 
> I think it's clear that the "foreign" island's syntax has no interaction
> with the current island.

This is not a contradiction to what I suggested.  The new category
could be treated the same as whitespace, in its effect on
syntax-related issues.  By contrast, having whitespace regexp class be
indistinguishable from an island probably means complications on a
very low level of matching regular expressions and syntax constructs,
something that I fear will get in the way.

> If we treat it as whitespace, that should minimise the amount of
> adapting we need to do to existing major modes.

We need to consider the amount of adaptations in the low-level
infrastructure code as well, not only on the application level.

> I envisage that a regexp element will match the "foreign" island if that
> element would match a space.  I know this sounds horrible, but I haven't
> come up with a scenario where this wouldn't work well.

And I say this is a bomb waiting to go off.  It is relatively easy to
add a new regexp construct for an island (e.g., we already support
categories in regexps, so just defining a category is one easy way),
and treat that as whitespace, while still keeping our options open to
make it behave slightly differently if needed, and still allowing the
applications to specify one, but not the other.  By contrast, if we
decide that whitespace matches an island, we are opening a giant can
of worms.  Here's one worm out of that can: some low-level operations
need to search the buffer using regexps disregarding any narrowing --
what you suggest means these operations cannot safely use whitespace
in their regexps.  This is something to stay away from, IMO.
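
The distinction can be sketched in a few lines of C.  The `text_class' enum and both predicates are hypothetical and only model the idea: a separate island class that syntactic scanning skips like whitespace when `in-islands' is set, while the whitespace regexp class keeps matching real whitespace only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a stretch of buffer text.  */
enum text_class { TEXT_WORD, TEXT_SPACE, TEXT_ISLAND };

/* What a whitespace regexp class such as [[:space:]] would match under
   this scheme: real whitespace only, never an island.  */
static bool
matches_space_class (enum text_class c)
{
  return c == TEXT_SPACE;
}

/* What syntactic scanning would skip over: whitespace always, and a
   foreign island only when IN_ISLANDS is set.  */
static bool
scan_skips (enum text_class c, bool in_islands)
{
  assert (c == TEXT_WORD || c == TEXT_SPACE || c == TEXT_ISLAND);
  return c == TEXT_SPACE || (in_islands && c == TEXT_ISLAND);
}
```

Low-level code that searches regardless of narrowing can then keep using whitespace in its regexps, because only `scan_skips' ever looks at the island class.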

> > Extending [:space:] that way seems to be an implementation detail
> > leaking to user level.  I think we should avoid that at all costs.
> 
> Why?  I don't understand your last paragraph.

See above.  [:space:] is something used a lot in Lisp applications, so
we leak the implementation of islands to that level: from now on, each
Lisp application will need to consider the possibility that searching
for [:space:] will find an island, something that might have no
relation to whitespace.

> > I'm not sure I understand the details.  E.g., where will the
> > island-chain local values be stored?
> 
> In a C struct chain, analogous to struct buffer, using much the same
> mechanisms.

What object(s) will that chain be rooted at?  And how will it be
related to its buffer?

> > To remind you, buffer-local variables have a special object in their
> > symbol value cell, and BVAR only works for the few buffer-local
> > variables that are stored in the buffer object itself.  I'm not sure I
> > understand how CVAR could solve the problem you need to solve, which
> > is keeping multiple chains per buffer, each one with its values of
> > these variables.
> 
> CVAR would get the current chain from the `island' (or `chain') text
> property at the position.

If it is stored in the text property, then you will have to decide
what happens when text is copied and yanked elsewhere.

> If this is nil, it would do what BVAR does.

Once again, BVAR only handles variables that are part of the buffer
object itself.  The other buffer-local variables (which are the
majority) are handled as part of switching the buffer, and the C code
simply refers to them by name.  So BVAR is not necessarily the correct
model for what you are designing.

> Otherwise it would access the appropriate named element in the struct
> chain.  I think CVAR would take three parameters: the variable name, the
> buffer, and the buffer position.

Can you show pseudo-code for CVAR?  I'm afraid I'm missing something
here, because I don't see clearly what you have in mind.

> Other chain local variables would be accessed through an alist in the
> struct chain holding miscellaneous variables, exactly as is done for
> the other buffer local variables in struct buffer.

There's no such alist in how we access buffer-local variables, not
AFAIK.  Again, I must be missing something here.

> > This actually sounds like a simple extension of narrowing, so I wonder
> > why do we need so many new object types and notions.
> 
> I think it's more like a complicated extension of narrowing.  :-)

It's simple because instead of one region you have more than one, and
the user-level commands don't affect them.  All the other changes are
exact reproduction of what narrowing does.

> I think that chain local variables are essential to multiple major
> modes - you can't have m.m.m. without some sort of chain locality.

What is "chain locality"?

> I also think that for a major mode to work transparently over
> several chained islands, all the irrelevant stuff between the
> islands needs to be made, er, transparent.

Yes, but how is that related to my comment about extending narrowing?

> > I don't see any discussion of how redisplay will deal with islands.
> > To remind you, redisplay moves through portions of the buffer, without
> > moving point, and accesses buffer-local variables for its job.  You need
> > to augment the design with something that will allow redisplay to see the
> > correct values of variables depending on the buffer position it is at.
> > The same problem exists for any features that use display simulation
> > for making decisions about movement and layout, e.g. vertical-motion.
> 
> I think redisplay is mostly controlled by variables (such as
> `scroll-margin') accessed by BVAR.  These calls could be replaced by
> CVAR.

That's not the whole story; once again, you forget about buffer-local
variables that are not part of the buffer object; BVAR is not used for
those.  I gave an example of one such variable: face-remapping-alist,
and I selected that variable for a reason.  Here's how the display
engine refers to it in the current codebase:

	  base_face_id = it->string_from_prefix_prop_p
	    ? (!NILP (Vface_remapping_alist)
	       ? lookup_basic_face (it->f, DEFAULT_FACE_ID)
	       : DEFAULT_FACE_ID)
	    : underlying_face_id (it);

Another example (which I also mentioned) is standard-display-table:

  /* Use the standard display table for displaying strings.  */
  if (DISP_TABLE_P (Vstandard_display_table))
    it->dp = XCHAR_TABLE (Vstandard_display_table);

See? no BVAR anywhere in sight.

> Problems will arise if redisplay reads the variable once, and
> fails to read it again when its current position moves into or out of an
> island.  Redisplay would have to be aware of island boundaries, and
> re-read the controlling variables on passing a boundary.  Other than
> that, I can't see any big problems.  Not yet, anyway.

To remind you, the display engine works by examining characters from
the buffer text one by one.  Are you saying that it will have, for
each character it examines, to look up the island chain for possible
changes?  That would make it abysmally slow, I think.

IOW, part of your design needs to provide some efficient means for
redisplay to "be aware of island boundaries, and re-read the
controlling variables on passing a boundary".
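
One possible shape for such a means, sketched in C under stated assumptions: a forward-only iterator caches the value in effect together with the next position at which it can change, and re-reads only on reaching that position.  `struct span', `struct iter' and the functions are hypothetical, and the linear scan stands in for whatever structure would really index the islands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical island with the chain local value in effect inside it.  */
struct span { ptrdiff_t beg, end; int value; };

struct iter
{
  const struct span *spans;   /* Sorted by position, non-overlapping.  */
  size_t n_spans;
  int default_value;          /* Buffer local value outside islands.  */
  int current_value;          /* Cached value, valid up to NEXT_STOP.  */
  ptrdiff_t next_stop;        /* Next position where it may change.  */
  size_t lookups;             /* Number of re-reads, for illustration.  */
};

/* Re-read the value at POS and record the next island boundary.  */
static void
iter_refresh (struct iter *it, ptrdiff_t pos)
{
  it->lookups++;
  it->current_value = it->default_value;
  it->next_stop = PTRDIFF_MAX;
  for (size_t i = 0; i < it->n_spans; i++)
    {
      const struct span *s = &it->spans[i];
      if (s->beg <= pos && pos < s->end)
        {
          it->current_value = s->value;   /* Inside this island.  */
          it->next_stop = s->end;
          return;
        }
      if (pos < s->beg)
        {
          it->next_stop = s->beg;         /* In the gap before it.  */
          return;
        }
    }
}

/* Value in effect at POS.  Positions must be visited in increasing
   order; the island list is consulted only at boundaries.  */
static int
iter_value_at (struct iter *it, ptrdiff_t pos)
{
  assert (it != NULL);
  if (pos >= it->next_stop)
    iter_refresh (it, pos);
  return it->current_value;
}
```

Starting with `next_stop' at 0, a scan over positions 1 to 49 with islands at [10, 20) and [30, 40) performs five re-reads rather than forty-nine.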

There's one more complication, which is related to redisplay, but not
only to it.  You write:

> (ix) Miscellaneous commands and functions.
>   o - `point-min' and `point-max' will, when `in-islands' is non-nil, return
>     the max/min point in the visible region in the same chain of islands as
>     point.
>   o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to
>     the current island chain when `in-islands' is non-nil.
>   o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in
>     the current island chain (how?) when `in-islands' is non-nil.
>   o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the
>     Right Thing in island chains when `in-islands' is non-nil.
>   o - New functions `island-min', `island-max', `island-chain-min' and
>     `island-chain-max' will do what their names say.
>   o - There will be no restrictions on the use of widening/narrowing, as have
>     been proposed for other support engines for multiple major modes.
>   o - New commands like `beginning-of-island', `narrow-to-island', etc. will
>     be wanted.  More difficultly, bindings for them will be needed.

Something bothers me there.  What will "M-<" and "M->" do, if
point-min and point-max are limited to the current island?  Likewise
the search commands -- they cannot be limited to the current island,
unless the user explicitly says so (and personally, I don't envision
users asking to be so limited).

There's a dichotomy here, between the underlying C-level variables
that currently are set to the limits of the narrowed region, and
affect all user commands and internal operations (e.g., the display
engine never looks beyond these limits); and the multi-mode
functionality that needs to narrow the view even more.  If you
propagate the island-level limitations too deep, they will affect user
commands and features (like display) that have nothing to do with the
reason for which islands are being designed.  E.g., a naïve
replacement of C macros BEGV and ZV with something that returns the
beginning and end of the current island will cause the display to show
only the current island, as if you narrowed the buffer to that
island.  I'm sure that's not what we want.
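
The separation can be sketched as two sets of limits kept side by side.  Everything in this C fragment is hypothetical (`struct limits', the function names, the int flag standing in for `in-islands'); it only shows the island limits being applied at the Lisp-visible level while the display side keeps using the narrowing limits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of limits kept side by side.  */
struct limits
{
  ptrdiff_t begv, zv;              /* Accessible (narrowed) region.  */
  ptrdiff_t chain_beg, chain_end;  /* Extent of the current chain.  */
};

/* What `point-min' might return: the island limit only when
   IN_ISLANDS is set, and never outside the narrowing.  */
static ptrdiff_t
lisp_point_min (const struct limits *l, int in_islands)
{
  assert (l->begv <= l->zv);
  if (in_islands && l->chain_beg > l->begv)
    return l->chain_beg;
  return l->begv;
}

/* Likewise for `point-max'.  */
static ptrdiff_t
lisp_point_max (const struct limits *l, int in_islands)
{
  assert (l->begv <= l->zv);
  if (in_islands && l->chain_end < l->zv)
    return l->chain_end;
  return l->zv;
}

/* What the display engine would keep using: the narrowing limit,
   whatever islands exist.  */
static ptrdiff_t
display_begv (const struct limits *l)
{
  return l->begv;
}
```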


* Re: A vision for multiple major modes: some design notes
  2016-04-22  8:48     ` Eli Zaretskii
@ 2016-04-22 22:35       ` Alan Mackenzie
  2016-04-23  7:39         ` Eli Zaretskii
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-22 22:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dgutov, emacs-devel

Hello, Eli.

On Fri, Apr 22, 2016 at 11:48:52AM +0300, Eli Zaretskii wrote:
> > Date: Thu, 21 Apr 2016 22:19:43 +0000
> > Cc: dgutov@yandex.ru, emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>

[ .... ]

> > > >     * - [Island] will be covered by the text property `island', whose value will be
> > > >       the pertinent island or island chain (see section (ii)) (not yet
> > > >       decided).  Note that if islands are enclosed inside other islands, the
> > > >       value is the innermost island.  There is the possibility of using an
> > > >       interval tree independent of the one for text properties to increase
> > > >       performance.

> > > I don't understand the notion of "enclosed" islands: wouldn't such
> > > "enclosing" simply break the "outer" island into two separate islands?

> > If we mark island start and end with the syntax-table text properties
> > "{" and "}", we're going to have something like

> >     {     a{  }b    }

> > .  Simply to break the outer island into two pieces, we'd really need to
> > apply delimiters at a and b, giving:

> >     {     }{  }{    }

> > .  This would overwrite the previous syntaxes at a and b, and this might
> > be a Bad Thing.

> We could design the stuff so that Bad Things won't happen.  I consider
> this nesting of islands a (possibly unnecessary) complication that we
> shouldn't accept unless we have a very good reason.  Nesting
> immediately requires a plethora of operations that are otherwise not
> necessary.

OK.  You're advocating, I think, not having well defined islands in a
chain (i.e., every island having a defined and marked beginning and end),
instead just having regions of text, each of which is associated with an
island chain (via the `island' text property, say).  This would make
syntactic scanning more difficult (though not impossible).

I can't judge at the moment which scheme is the better one.
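
To make the nested scheme concrete, here is a minimal sketch in C of the
"innermost island wins" rule from the quoted design note.  The record
type and the function name are hypothetical, chain ids are assumed to be
positive integers, and a linear scan stands in for whatever interval
structure would really be used.

```c
#include <stddef.h>

/* Hypothetical island record: half-open range [start, end) plus the id
   of the chain the island belongs to (assumed > 0).  */
struct nested_island { ptrdiff_t start, end; int chain; };

/* Return the chain of the innermost (shortest) island containing POS,
   or 0 if POS lies in no island at all.  */
int
innermost_chain_at (const struct nested_island *isl, size_t n, ptrdiff_t pos)
{
  int chain = 0;
  ptrdiff_t best_len = 0;

  for (size_t i = 0; i < n; i++)
    if (isl[i].start <= pos && pos < isl[i].end)
      {
        ptrdiff_t len = isl[i].end - isl[i].start;
        if (chain == 0 || len < best_len)
          {
            chain = isl[i].chain;
            best_len = len;
          }
      }
  return chain;
}
```

With an outer island over [0, 20) and an inner one over [7, 11), positions
on either side of the inner island resolve to the outer chain without any
extra delimiters being written at a and b.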

> > > >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
> > > >     whitespace, much as they do comments.  They will also treat as whitespace
> > > >     the gap between two islands in a chain.

> > > Why whitespace? why not some new category?  By overloading whitespace,
> > > you make things harder on the underlying infrastructure, like regexp
> > > search and matching.

> > I think it's clear that the "foreign" island's syntax has no interaction
> > with the current island.

> This is not a contradiction to what I suggested.  The new category
> could be treated the same as whitespace, in its effect on
> syntax-related issues.  By contrast, having whitespace regexp class be
> indistinguishable from an island probably means complications on a
> very low level of matching regular expressions and syntax constructs,
> something that I fear will get in the way.

> > If we treat it as whitespace, that should minimise the amount of
> > adapting we need to do to existing major modes.

> We need to consider the amount of adaptations in the low-level
> infrastructure code as well, not only on the application level.

I think the adaptations to the regexp engine would be far less work than
adapting many thousands of regexps in major modes we want to use as
sub-modes.  For example there are 115 occurrences in CC Mode of just the
exact string "[ \t".

> > I envisage that a regexp element will match the "foreign" island if that
> > element would match a space.  I know this sounds horrible, but I haven't
> > come up with a scenario where this wouldn't work well.

> And I say this is a bomb waiting to go off.  It is relatively easy to
> add a new regexp construct for an island (e.g., we already support
> categories in regexps, so just defining a category is one easy way),
> and treat that as whitespace, while still keeping our options open to
> make it behave slightly differently if needed, and still allowing the
> applications to specify one, but not the other.

Bear in mind that this matching of an island by a whitespace regexp
element would happen ONLY whilst `in-islands' was bound to non-nil, i.e.
when a major mode is working in its own island chain.  Are there any
circumstances in which we would not want the major mode to see the gap
between its islands as WS?  When `in-islands' is nil (i.e. when the super
mode's code is running, or the user is typing commands) the islands would
NOT match a WS regexp.
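
The rule just described can be sketched as a single predicate.  This is
only an illustration in C: the struct, the function and its parameters
are made up, and the real regexp engine has no such hook today.

```c
#include <stdbool.h>

/* Hypothetical view of one buffer position: the character there and the
   id of the island chain covering it (0 = a gap, covered by no chain).  */
struct pos_info { unsigned char ch; int chain; };

/* Would a whitespace regexp element match at P?
   IN_ISLANDS mirrors the proposed `in-islands' variable; CURRENT_CHAIN
   is the chain whose major mode code is running.  */
bool
ws_element_matches (struct pos_info p, bool in_islands, int current_chain)
{
  /* Foreign island or gap: seen as whitespace, but only in island mode.  */
  if (in_islands && p.chain != current_chain)
    return true;

  /* Otherwise only genuine whitespace characters match.  */
  return p.ch == ' ' || p.ch == '\t' || p.ch == '\n';
}
```

The same character in a foreign island thus matches or fails to match
depending solely on the binding of `in-islands'.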

> By contrast, if we decide that whitespace matches an island, we are
> opening a giant can of worms.  Here's one worm out of that can: some
> low-level operations need to search the buffer using regexps
> disregarding any narrowing -- what you suggest means these operations
> cannot safely use whitespace in their regexps.  This is something to
> stay away from, IMO.

It depends on whether these low level operations are working within an
island chain (`in-islands' non-nil) or on the buffer as a whole
(`in-islands' nil).  I think such operations would typically be run with
`in-islands' nil, hence would not run up against these problems.

> > > Extending [:space:] that way seems to be an implementation detail
> > > leaking to user level.  I think we should avoid that at all costs.

> > Why?  I don't understand your last paragraph.

> See above.  [:space:] is something used a lot in Lisp applications, so
> we leak the implementation of islands to that level: from now on, each
> Lisp application will need to consider the possibility that searching
> for [:space:] will find an island, something that might have no
> relation to whitespace.

I rather see it as major mode Lisp code not having to concern itself with
the possibility of (foreign) islands or gaps.  It merely has to do
(re-search-forward "....[ \t]...." ...), and it will end up at the next
valid place in its own island chain.  The aim would be for the major mode
to be as unaware of the island mechanism as possible.

Of course the super mode or the user would have to be aware of the
islands, and search for things like, e.g., "\\s{" and "\\s}" to match the
island boundaries.

> > > I'm not sure I understand the details.  E.g., where will the
> > > island-chain local values be stored?

> > In a C struct chain, analogous to struct buffer, using much the same
> > mechanisms.

> What object(s) will that chain be rooted at?  And how will it be
> related to its buffer?

The chain will be the value of the `island' text property set on all the
islands of the chain.  It would also occupy the "matching character" slot
of the "open island" and "close island" syntax descriptors (though I'm
having second thoughts about this bit).  Both of these couple the chain
with its buffer.

> > > To remind you, buffer-local variables have a special object in their
> > > symbol value cell, and BVAR only works for the few buffer-local
> > > variables that are stored in the buffer object itself.  I'm not sure I
> > > understand how CVAR could solve the problem you need to solve, which
> > > is keeping multiple chains per buffer, each one with its values of
> > > these variables.

> > CVAR would get the current chain from the `island' (or `chain') text
> > property at the position.

> If it is stored in the text property, then you will have to decide
> what happens when text is copied and yanked elsewhere.

It would be the job of the `island-after-change-function' to strip the
unwanted text properties (both the `island' and `syntax-table' ones) and
to apply any needed new ones to the yanked region.

> > If this is nil, it would do what BVAR does.

> Once again, BVAR only handles variables that are part of the buffer
> object itself.  The other buffer-local variables (which are the
> majority) are handled as part of switching the buffer, and the C code
> simply refers to them by name.  So BVAR is not necessarily the correct
> model for what you are designing.

> > Otherwise it would access the appropriate named element in the struct
> > chain.  I think CVAR would take three parameters: the variable name, the
> > buffer, and the buffer position.

> Can you show a pseudo-code of CVAR?  I'm afraid I'm missing something
> here, because I don't see clearly what you have in mind.

I'll try.  Something like this:

#define CVAR(var, buf, position) \
    chain = read_text_property (Qisland, buf, position), \
    chain ? chain.var \
          : BVAR (var, buf)
        
, but I don't think that would be a valid Lvalue in C.  :-(
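
On the Lvalue point: a conditional expression is not an lvalue in C, but
dereferencing a conditional between two pointers is.  The following is a
minimal sketch under heavy assumptions: the two structs are toy
stand-ins (the real struct buffer looks nothing like this and struct
chain does not exist), and island_chain_at is a hypothetical substitute
for reading the `island' text property.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only.  */
struct chain  { int tab_width; };
struct buffer { int tab_width; struct chain *island_at[8]; };

/* Hypothetical substitute for the text property lookup: return the
   chain covering POSITION, or NULL outside any island.  */
static struct chain *
island_chain_at (struct buffer *buf, size_t position)
{
  return position < 8 ? buf->island_at[position] : NULL;
}

/* Yield the chain's slot if POSITION is in an island, else the buffer's
   slot.  The result is an lvalue because it is a dereferenced pointer.
   Note that the lookup is evaluated twice in this simple form.  */
#define CVAR(var, buf, position)                        \
  (*(island_chain_at ((buf), (position))                \
     ? &island_chain_at ((buf), (position))->var        \
     : &(buf)->var))
```

Both `x = CVAR (tab_width, buf, pos)' and `CVAR (tab_width, buf, pos) = x'
compile with this form.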

> > Other chain local variables would be accessed through an alist in the
> > struct chain holding miscellaneous variables, exactly as is done for
> > the other buffer local variables in struct buffer.

> There's no such alist in how we access buffer-local variables, not
> AFAIK.  Again, I must be missing something here.

Or, maybe I am.  I thought that the slot `local_var_alist_' in the struct
buffer held the bindings of all the non-BVAR local variables, as an
alist.  I'm not at all clear on when and how buffer local variable
bindings get swapped in and out of, say, C variables like Vfoo.

> > > This actually sounds like a simple extension of narrowing, so I wonder
> > > why do we need so many new object types and notions.

> > I think it's more like a complicated extension of narrowing.  :-)

> It's simple because instead of one region you have more than one, and
> the user-level commands don't affect them.  All the other changes are
> exact reproduction of what narrowing does.

> > I think that chain local variables are essential to multiple major
> > modes - you can't have m.m.m. without some sort of chain locality.

> What is "chain locality"?

Having things (variables) which are local to a chain, as opposed to
global variables or buffer local variables or frame local variables.

> > I also think that for a major mode to work transparently over
> > several chained islands, all the irrelevant stuff between the
> > islands needs to be made, er, transparent.

> Yes, but how is that related to my comment about extending narrowing?

Maybe it's not, very much.

> > > I don't see any discussion of how redisplay will deal with islands.
> > > To remind you, redisplay moves through portions of the buffer, without
> > > moving point, and accesses buffer-local variables for its job.  You need
> > > to augment the design with something that will allow redisplay to see the
> > > correct values of variables depending on the buffer position it is at.
> > > The same problem exists for any features that use display simulation
> > > for making decisions about movement and layout, e.g. vertical-motion.

> > I think redisplay is mostly controlled by variables (such as
> > `scroll-margin') accessed by BVAR.  These calls could be replaced by
> > CVAR.

> That's not the whole story; once again, you forget about buffer-local
> variables that are not part of the buffer object; BVAR is not used for
> those.  I gave an example of one such variable: face-remapping-alist,
> and I selected that variable for a reason.  Here's how the display
> engine refers to it in the current codebase:

> 	  base_face_id = it->string_from_prefix_prop_p
> 	    ? (!NILP (Vface_remapping_alist)
> 	       ? lookup_basic_face (it->f, DEFAULT_FACE_ID)
> 	       : DEFAULT_FACE_ID)
> 	    : underlying_face_id (it);

> Another example (which I also mentioned) is standard-display-table:

>   /* Use the standard display table for displaying strings.  */
>   if (DISP_TABLE_P (Vstandard_display_table))
>     it->dp = XCHAR_TABLE (Vstandard_display_table);

> See? no BVAR anywhere in sight.

OK.  But `face-remapping-alist' can definitely be made buffer local, and
`standard-display-table' most probably can.  There will be some mechanism
(which I don't currently understand) by which buffer local values are
swapped into and out of Vface_remapping_alist when the current buffer
changes.  Surely a similar mechanism could be created for when the
current island changes.

> > Problems will arise if redisplay reads the variable once, and
> > fails to read it again when its current position moves into or out of an
> > island.  Redisplay would have to be aware of island boundaries, and
> > re-read the controlling variables on passing a boundary.  Other than
> > that, I can't see any big problems.  Not yet, anyway.

> To remind you, the display engine works by examining characters from
> the buffer text one by one.  Are you saying that it will have, for
> each character it examines, to look up the island chain for possible
> changes?  That would make it abysmally slow, I think.

> IOW, part of your design needs to provide some efficient means for
> redisplay to "be aware of island boundaries, and re-read the
> controlling variables on passing a boundary".

Yes.

> There's one more complication, which is related to redisplay, but not
> only to it.  You write:

> > (ix) Miscellaneous commands and functions.
> >   o - `point-min' and `point-max' will, when `in-islands' is non-nil, return
> >     the max/min point in the visible region in the same chain of islands as
> >     point.
> >   o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to
> >     the current island chain when `in-islands' is non-nil.
> >   o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in
> >     the current island chain (how?) when `in-islands' is non-nil.
> >   o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the
> >     Right Thing in island chains when `in-islands' is non-nil.
> >   o - New functions `island-min', `island-max', `island-chain-min' and
> >     `island-chain-max' will do what their names say.
> >   o - There will be no restrictions on the use of widening/narrowing, as have
> >     been proposed for other support engines for multiple major modes.
> >   o - New commands like `beginning-of-island', `narrow-to-island', etc. will
> >     be wanted.  More difficultly, bindings for them will be needed.

> Something bothers me there.  What will "M-<" and "M->" do, if
> point-min and point-max are limited to the current island?  Likewise
> the search commands -- they cannot be limited to the current island,
> unless the user explicitly says so (and personally, I don't envision
> users to ask to be so limited).

Those restrictions will only apply when `in-islands' is bound to non-nil,
i.e. when major mode code is running.  It will be nil when the user types
in M-<, hence point will move to the beginning of the (visible region of
the) buffer.

So, for example, if the super mode is shell script, and the major mode in
the current island is AWK Mode, (point-min) will return the start of the
AWK Mode island chain (which is useful to AWK Mode), not the very start
of the buffer.

> There's a dichotomy here, between the underlying C-level variables
> that currently are set to the limits of the narrowed region, and
> affect all user commands and internal operations (e.g., the display
> engine never looks beyond these limits); and the multi-mode
> functionality that needs to narrow the view even more.  If you
> propagate the island-level limitations too deep, they will affect user
> commands and features (like display) that have nothing to do with the
> reason for which islands are being designed.  E.g., a naïve
> replacement of C macros BEGV and ZV with something that returns the
> beginning and end of the current island will cause the display to show
> only the current island, as if you narrowed the buffer to that
> island.  I'm sure that's not what we want.

No, it's not.  I think BEGV will need to have different meanings
depending on the value of `in-islands'.  When it's nil, BEGV will have
the current meaning.  When it's non-nil, BEGV will mean "the lowest
buffer position which is both within the current island-chain and not
below the lowest visible position".  Or something like that.
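
As a rough sketch of that definition (in C, with hypothetical names; the
chain is assumed to be a sorted array of non-overlapping half-open
ranges, and what to return when no island is visible is a guess):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical island extent: half-open range [start, end).  */
struct island_range { ptrdiff_t start, end; };

/* Island-aware replacement for BEGV.  BEGV and ZV are the ordinary
   limits of the visible region.  With IN_ISLANDS false the ordinary
   BEGV is returned unchanged.  */
ptrdiff_t
effective_begv (const struct island_range *chain, size_t n,
                ptrdiff_t begv, ptrdiff_t zv, bool in_islands)
{
  assert (begv <= zv);
  if (!in_islands)
    return begv;

  /* First island that overlaps the visible region [begv, zv).  */
  for (size_t i = 0; i < n; i++)
    if (chain[i].end > begv && chain[i].start < zv)
      return chain[i].start > begv ? chain[i].start : begv;

  /* No part of the chain is visible: report an empty range (assumed).  */
  return zv;
}
```

A matching ZV would be computed symmetrically from the last visible
island.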

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: A vision for multiple major modes: some design notes
  2016-04-22 22:35       ` Alan Mackenzie
@ 2016-04-23  7:39         ` Eli Zaretskii
  2016-04-23 17:02           ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-23  7:39 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

> Date: Fri, 22 Apr 2016 22:35:08 +0000
> Cc: emacs-devel@gnu.org, dgutov@yandex.ru
> From: Alan Mackenzie <acm@muc.de>
> 
> > > > Why whitespace? why not some new category?  By overloading whitespace,
> > > > you make things harder on the underlying infrastructure, like regexp
> > > > search and matching.
> 
> > > I think it's clear that the "foreign" island's syntax has no interaction
> > > with the current island.
> 
> > This is not a contradiction to what I suggested.  The new category
> > could be treated the same as whitespace, in its effect on
> > syntax-related issues.  By contrast, having whitespace regexp class be
> > indistinguishable from an island probably means complications on a
> > very low level of matching regular expressions and syntax constructs,
> > something that I fear will get in the way.
> 
> > > If we treat it as whitespace, that should minimise the amount of
> > > adapting we need to do to existing major modes.
> 
> > We need to consider the amount of adaptations in the low-level
> > infrastructure code as well, not only on the application level.
> 
> I think the adaptations to the regexp engine would be far less work than
> adapting many thousands of regexps in major modes we want to use as
> sub-modes.  For example there are 115 occurrences in CC Mode of just the
> exact string "[ \t".

Please let's not forget that regexps are used in many places that have
no relation whatsoever to major modes, and searching for whitespace is
a very common operation using regular expressions.  Infecting all
those with this new meaning of whitespace that is totally alien to any
code that doesn't deal with major modes is IMO plain wrong.

More generally, I think we should first and foremost make our goal to
have a clean and reasonably simple design, and only care about the
amount of changes in major mode code as a secondary goal.  Thinking
about the changes in major modes first could easily lead us astray.

> Bear in mind that this matching of an island by a whitespace regexp
> element would happen ONLY whilst `in-islands' was bound to non-nil, i.e.
> when a major mode is working in its own island chain.

I understand, but I don't think this goes far enough to address my
concerns.  And my suggestion to have a separate class/category will
serve your needs just as well, so I'm unsure why we need to piggyback
[:space:].

> Are there any circumstances in which we would not want the major
> mode to see the gap between its islands as WS?

Who says that every major mode necessarily treats whitespace as you
assume?  Most (or even all) of those you know about might, but this is
not written anywhere as a limitation of a major mode.  By hard-wiring
this special meaning of [:space:] into your design, you are limiting
future (and possibly some rare extant) major modes.

> When `in-islands' is nil (i.e. when the super mode's code is
> running, or the user is typing commands) the islands would NOT match
> a WS regexp.

Are you sure that none of the background processing will ever need to
treat islands as such?  I'm talking about stuff like timers, process
filters and sentinels, hook functions run by redisplay and the command
loop, etc.  If any of these might need to observe the island rules and
restrictions, the design which builds on in-islands being bound to
non-nil _only_ when the major mode is running its own code is
unreliable, and will cause unrelated code to find itself dealing with
island peculiarities.  E.g., JIT font-lock runs off an idle timer, but
clearly needs to observe islands, so it sounds like the problem I'm
worried about is pretty much in our faces.

> > By contrast, if we decide that whitespace matches an island, we are
> > opening a giant can of worms.  Here's one worm out of that can: some
> > low-level operations need to search the buffer using regexps
> > disregarding any narrowing -- what you suggest means these operations
> > cannot safely use whitespace in their regexps.  This is something to
> > stay away from, IMO.
> 
> It depends on whether these low level operations are working within an
> island chain (`in-islands' non-nil) or on the buffer as a whole
> (`in-islands' nil).  I think such operations would typically be run with
> `in-islands' nil, hence would not run up against these problems.

"Typically" is not good enough, IMO.  We must convince ourselves that
this happens _always_, and there will _never_ be a reasonably
justifiable need to search the entire buffer for whitespace when
in-islands is non-nil, i.e. in any of the code that is running as a
side-effect of performing some major-mode related operation.

> > > CVAR would get the current chain from the `island' (or `chain') text
> > > property at the position.
> 
> > If it is stored in the text property, then you will have to decide
> > what happens when text is copied and yanked elsewhere.
> 
> It would be the job of the `island-after-change-function' to strip the
> unwanted text properties (both the `island' and `syntax-table' ones) and
> to apply any needed new ones to the yanked region.

The problem is the decision whether they are unwanted or not.  It's
usually not simple to make that decision for text properties that
change the way text is displayed, when surrounding text also affects
that.

> > > Otherwise it would access the appropriate named element in the struct
> > > chain.  I think CVAR would take three parameters: the variable name, the
> > > buffer, and the buffer position.
> 
> > Can you show a pseudo-code of CVAR?  I'm afraid I'm missing something
> > here, because I don't see clearly what you have in mind.
> 
> I'll try.  Something like this:
> 
> #define CVAR(var, buf, position) \
>     chain = read_text_property (Qisland, buf, position), \
>     chain ? chain.var \
>           : BVAR (var, buf)
>         
> , but I don't think that would be a valid Lvalue in C.  :-(

Didn't you talk about some alist to look up?  I see no alist look up
in this pseudo-code.  And 'chain.var' sounds wrong, since 'chain' is
definitely a Lisp object, not a C struct.  Or maybe I don't understand
what hides behind read_text_property.

> > > Other chain local variables would be accessed through an alist in the
> > > struct chain holding miscellaneous variables, exactly as is done for
> > > the other buffer local variables in struct buffer.
> 
> > There's no such alist in how we access buffer-local variables, not
> > AFAIK.  Again, I must be missing something here.
> 
> Or, maybe I am.  I thought that the slot `local_var_alist_' in the struct
> buffer held the bindings of all the non-BVAR local variables, as an
> alist.

Ah, you were talking about local_var_alist_...  OK, but then I don't
see anything like that in CVAR above.

> I'm not at all clear on when and how buffer local variable
> bindings get swapped in and out of, say, C variables like Vfoo.

This happens when we switch buffers, see set_buffer_internal_1.  But
that function is driven by an explicit event of switching buffers,
while in your design you need to do something similar when point
crosses some buffer position, which is a much more subtle event.
E.g., think about all the save-excursion and save-restriction code out
there.

> > > > This actually sounds like a simple extension of narrowing, so I wonder
> > > > why do we need so many new object types and notions.
> 
> > > I think it's more like a complicated extension of narrowing.  :-)
> 
> > It's simple because instead of one region you have more than one, and
> > the user-level commands don't affect them.  All the other changes are
> > exact reproduction of what narrowing does.
> 
> > > I think that chain local variables are essential to multiple major
> > > modes - you can't have m.m.m. without some sort of chain locality.
> 
> > What is "chain locality"?
> 
> Having things (variables) which are local to a chain, as opposed to
> global variables or buffer local variables or frame local variables.

OK, but no one said that applying a restriction and making
island-specific bindings of variables must be parts of the same
feature.  They could be 2 separate features instead.

> > 	  base_face_id = it->string_from_prefix_prop_p
> > 	    ? (!NILP (Vface_remapping_alist)
> > 	       ? lookup_basic_face (it->f, DEFAULT_FACE_ID)
> > 	       : DEFAULT_FACE_ID)
> > 	    : underlying_face_id (it);
> 
> > Another example (which I also mentioned) is standard-display-table:
> 
> >   /* Use the standard display table for displaying strings.  */
> >   if (DISP_TABLE_P (Vstandard_display_table))
> >     it->dp = XCHAR_TABLE (Vstandard_display_table);
> 
> > See? no BVAR anywhere in sight.
> 
> OK.  But `face-remapping-alist' can definitely be made buffer local, and
> `standard-display-table' most probably can.

They both are.

> There will be some mechanism (which I don't currently understand) by
> which buffer local values are swapped into and out of
> Vface_remapping_alist when the current buffer changes.

See above: that mechanism is part of the function that switches to
another buffer.

> Surely a similar mechanism could be created for when the current
> island changes.

The issue is to make it as cheap as possible, because redisplay code
is at liberty to move around the buffer at will, and the location
where it examines buffer text is not directly related to point.
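
One conceivable way to keep this cheap, sketched in C under the
assumption that the buffer is described by a sorted array of contiguous
runs (all names here are hypothetical): remember the run that was hit
last, so that the common case costs two integer comparisons and a real
lookup happens only when a boundary is crossed.

```c
#include <stddef.h>

/* Hypothetical run of text: half-open [start, end), chain 0 = gap.  */
struct run { ptrdiff_t start, end; int chain; };

/* Cache of the run that was hit last.  RUNS must be sorted, contiguous,
   non-empty, and must cover every position that is queried.  */
struct run_cache
{
  const struct run *runs;
  size_t n;
  size_t cur;              /* index of the cached run */
  unsigned long lookups;   /* number of slow-path searches so far */
};

/* Return the chain at POS.  */
int
cached_chain_at (struct run_cache *c, ptrdiff_t pos)
{
  const struct run *r = &c->runs[c->cur];

  if (pos < r->start || pos >= r->end)
    {
      /* Slow path: binary search for the last run starting at or
         before POS.  */
      size_t lo = 0, hi = c->n;
      while (hi - lo > 1)
        {
          size_t mid = lo + (hi - lo) / 2;
          if (c->runs[mid].start <= pos)
            lo = mid;
          else
            hi = mid;
        }
      c->cur = lo;
      c->lookups++;
      r = &c->runs[lo];
    }
  return r->chain;
}
```

The slow path is also where chain-local values would be re-read, so a
scan through a run of any length pays for that once.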

> > Something bothers me there.  What will "M-<" and "M->" do, if
> > point-min and point-max are limited to the current island?  Likewise
> > the search commands -- they cannot be limited to the current island,
> > unless the user explicitly says so (and personally, I don't envision
> > users to ask to be so limited).
> 
> Those restrictions will only apply when `in-islands' is bound to non-nil,
> i.e. when major mode code is running.  It will be nil when the user types
> in M-<, hence point will move to the beginning of the (visible region of
> the) buffer.

See above: there might be some situations, like JIT font-lock, where
you will want to have in-islands non-nil while running async code, and
that might make the islands visible to code that is not strictly part
of any major mode, like the infrastructure which invokes these async
parts of Emacs code.  So I think you need to consider the effects of
those on more than just major modes.


* Re: A vision for multiple major modes: some design notes
  2016-04-23  7:39         ` Eli Zaretskii
@ 2016-04-23 17:02           ` Alan Mackenzie
  2016-04-23 18:12             ` Eli Zaretskii
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-23 17:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dgutov, emacs-devel

Hello, Eli.

On Sat, Apr 23, 2016 at 10:39:55AM +0300, Eli Zaretskii wrote:
> > Date: Fri, 22 Apr 2016 22:35:08 +0000
> > Cc: emacs-devel@gnu.org, dgutov@yandex.ru
> > From: Alan Mackenzie <acm@muc.de>

[ .... ]

> Please let's not forget that regexps are used in many places that have
> no relation whatsoever to major modes, and searching for whitespace is
> a very common operation using regular expressions.  Infecting all
> those with this new meaning of whitespace that is totally alien to any
> code that doesn't deal with major modes is IMO plain wrong.

> More generally, I think we should first and foremost make our goal to
> have a clean and reasonably simple design, and only care about the
> amount of changes in major mode code as a secondary goal.  Thinking
> about the changes in major modes first could easily lead us astray.

We must consider both these things together.  A prime design goal is to
allow an arbitrary major mode to be used by a super mode with the minimum
of adaptation to the major mode, ideally none.

> > Bear in mind that this matching of an island by a whitespace regexp
> > element would happen ONLY whilst `in-islands' was bound to non-nil, i.e.
> > when a major mode is working in its own island chain.

> I understand, but I don't think this goes far enough to address my
> concerns.  And my suggestion to have a separate class/category will
> serve your needs just as well, so I'm unsure why we need to piggyback
> [:space:].

If this new category (say, "[:gap:]") only needed to be used in super
modes or subsystems, I might agree with you.  But if [:gap:] needs to be
used in major mode code, that involves massive amounts of editing, which
would make the new mechanism much less useful.

> > Are there any circumstances in which we would not want the major
> > mode to see the gap between its islands as WS?

> Who says that every major mode necessarily treats whitespace as you
> assume?  Most (or even all) of those you know about might, but this is
> not written anywhere as a limitation of a major mode.

It will become expected of a major mode that it doesn't tamper with stuff
outside of its own island chain(s).  That would violate the abstraction
that the island mechanism represents.

> By hard-wiring this special meaning of [:space:] into your design, you
> are limiting future (and possibly some rare extant) major modes.

I don't think it's all that special.  It's natural.  Ideally, a major
mode should see its island chain as the whole buffer.  It should be
unaware that it is running in an island at all.  I see this treatment of
a :gap: as whitespace as the most natural way of implementing this.

Major modes which would violate the island abstraction, if there are any,
shouldn't be used in this new multi major mode mechanism.

> > When `in-islands' is nil (i.e. when the super mode's code is
> > running, or the user is typing commands) the islands would NOT match
> > a WS regexp.

> Are you sure that none of the background processing will ever need to
> treat islands as such?  I'm talking about stuff like timers, process
> filters and sentinels, hook functions run by redisplay and the command
> loop, etc.

All these subsystems will need to be aware of whether they are dealing
with the buffer as a whole, or merely with an island chain.  They will
need to bind `in-islands' appropriately, frequently using the value that
was current when they were invoked.
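
A minimal sketch of "using the value that was current when they were
invoked", in C with made-up names: the job records the binding when it is
scheduled, and the runner reinstates it around the call.  In Emacs proper
this would presumably go through the usual dynamic binding machinery so
that non-local exits also restore the old value; the sketch ignores that.

```c
#include <stdbool.h>

/* Stand-in for the proposed `in-islands' variable.  */
static bool in_islands;

typedef void (*job_fn) (void *);

/* A deferred job (timer, filter, ...) plus the binding it was born in.  */
struct deferred_job
{
  job_fn fn;
  void *arg;
  bool in_islands_at_schedule;
};

/* Record FN, ARG and the current value of in_islands.  */
struct deferred_job
schedule_job (job_fn fn, void *arg)
{
  struct deferred_job j = { fn, arg, in_islands };
  return j;
}

/* Run J with in_islands rebound to the recorded value, then restore.  */
void
run_job (const struct deferred_job *j)
{
  bool saved = in_islands;
  in_islands = j->in_islands_at_schedule;
  j->fn (j->arg);
  in_islands = saved;
}

/* Example job: store the value of in_islands it observes into *OUT.  */
void
record_in_islands (void *out)
{
  *(bool *) out = in_islands;
}
```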

> If any of these might need to observe the island rules and
> restrictions, the design which builds on in-islands being bound to
> non-nil _only_ when the major mode is running its own code is
> unreliable, and will cause unrelated code to find itself dealing with
> island peculiarities.

Perhaps I expressed myself too forcefully and literally.  If any of these
things are dealing with an island as a unit, they are surely "running
major mode code" in this sense.  Clearly things like redisplay and font
lock will need explicitly to set and clear `in-islands' when they are
dealing with island chain stuff.

> E.g., JIT font-lock runs off an idle timer, but clearly needs to
> observe islands, so it sounds like the problem I'm worried about is
> pretty much in our faces.

The font-lock and jit-lock entry points will need to set `in-islands'.

> > > By contrast, if we decide that whitespace matches an island, we are
> > > opening a giant can of worms.  Here's one worm out of that can: some
> > > low-level operations need to search the buffer using regexps
> > > disregarding any narrowing -- what you suggest means these operations
> > > cannot safely use whitespace in their regexps.  This is something to
> > > stay away from, IMO.

> > It depends on whether these low level operations are working within an
> > island chain (`in-islands' non-nil) or on the buffer as a whole
> > (`in-islands' nil).  I think such operations would typically be run with
> > `in-islands' nil, hence would not run up against these problems.

> "Typically" is not good enough, IMO.  We must convince ourselves that
> this happens _always_, and there will _never_ be a reasonably
> justifiable need to search the entire buffer for whitespace when
> in-islands is non-nil, i.e. in any of the code that is running as a
> side-effect of performing some major-mode related operation.

I agree with that.

[ .... ]

> > > If it is stored in the text property, then you will have to decide
> > > what happens when text is copied and yanked elsewhere.

> > It would be the job of the `island-after-change-function' to strip the
> > unwanted text properties (both the `island' and `syntax-table' ones) and
> > to apply any needed new ones to the yanked region.

> The problem is the decision whether they are unwanted or not.  It's
> usually not simple to make that decision for text properties that
> change the way text is displayed, when surrounding text also affects
> that.

But that decision has to be made somewhere, somehow, by the super mode,
regardless of how multiple major modes are implemented.  Just for
clarity, `island-after-change-function' is a hook, not a fixed function,
and writing a super mode's function for this hook would be a substantial
part of writing that mode.

[ .... ]

> > #define CVAR(var, buf, position) \
> >     chain = read_text_property (Qisland, buf, position), \
> >     chain ? chain.var \
> >           : BVAR (var, buf)

> > , but I don't think that would be a valid Lvalue in C.  :-(

> Didn't you talk about some alist to look up?  I see no alist look up
> in this pseudo-code.  And 'chain.var' sounds wrong, since 'chain' is
> definitely a Lisp object, not a C struct.  Or maybe I don't understand
> what hides behind read_text_property.

I think we're talking at cross purposes.  I'm proposing that CVAR would
take the place of BVAR for many of the variables which are slots in the
struct buffer, and they would also become slots in the new struct chain.
The other variables, currently held in `local_var_alist_' in struct
buffer, would be accessed from a `local_var_alist_' element of the struct
chain in much the same way.

Or perhaps there is a better way of implementing chain local variables.
It is more of an implementation detail than an essential part of the
design.
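
To make the CVAR idea a little more concrete, here is a minimal sketch in
C.  It is a toy model only: `struct chain', `struct buffer', `chain_at'
and `cvar_tab_width' are hypothetical names invented for illustration,
not the real Emacs structures or functions.  The one firm point is that
a conditional expression is not an lvalue in C, so the accessor hands
back the address of the slot instead, and the caller reads or assigns
through that pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types are the real Emacs ones. */
struct chain {
  int tab_width;            /* a slot that has become chain local */
};

struct buffer {
  int tab_width;            /* the ordinary buffer local slot */
  size_t size;              /* number of character positions */
  struct chain **island;    /* island[pos]: owning chain, or NULL */
};

/* Stand-in for reading the `island' text property at POS. */
static struct chain *
chain_at (struct buffer *buf, size_t pos)
{
  assert (pos < buf->size);
  return buf->island[pos];
}

/* Return the address of the slot which is current at POS.  Handing
   back a pointer sidesteps the lvalue problem: the caller can both
   read and assign through it. */
static int *
cvar_tab_width (struct buffer *buf, size_t pos)
{
  struct chain *c = chain_at (buf, pos);
  return c ? &c->tab_width : &buf->tab_width;
}
```

With this shape, `*cvar_tab_width (buf, pos) = 4' changes the chain's
slot when POS is in an island, and the buffer's slot otherwise.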

[ .... ]

> > I'm not at all clear on when and how buffer local variable
> > bindings get swapped in and out of, say, C variables like Vfoo.

> This happens when we switch buffers, see set_buffer_internal_1.

Thanks!  I see that now.  I had spent some time looking for that code in
data.c.

> But that function is driven by an explicit event of switching buffers,
> while in your design you need to do something similar when point
> crosses some buffer position, which is a much more subtle event.  E.g.,
> think about all the save-excursion and save-restriction code out there.

A good point.

[ .... ]

> > > What is "chain locality"?

> > Having things (variables) which are local to a chain, as opposed to
> > global variables or buffer local variables or frame local variables.

> OK, but no one said that applying a restriction and making
> island-specific bindings of variables must be parts of the same
> feature.  They could be 2 separate features instead.

Maybe they could.  Maybe it wouldn't make much sense.  I'd have to think
about that a bit more.

[ .... ]

> > Surely a similar mechanism [ to swapping buffer local bindings into
> > and out of fixed variables in the C code ] could be created for when
> > the current island changes.

> The issue is to make it as cheap as possible, because redisplay code
> is at liberty to move around the buffer at will, and the location
> where it examines buffer text is not directly related to point.

Yes, this would require careful design and coding.  One detail struck me
immediately on seeing the code in set_buffer_internal_1.  The code has to
cdr its way down the entire list of variables in local_var_alist_,
despite the fact that only a few of them point to C variables.  Maybe it
would make sense to extract this smaller list into a separate chain.

[ .... ]

> See above: there might be some situations, like JIT font-lock, where
> you will want to have in-islands non-nil while running async code, and
> that might make the islands visible to code that is not strictly part
> of any major mode, like the infrastructure which invokes these async
> parts of Emacs code.  So I think you need to consider the effects of
> those on more than just major modes.

Yes, indeed.  The challenge here will be to identify all of the pertinent
subsystems.

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: A vision for multiple major modes: some design notes
  2016-04-23 17:02           ` Alan Mackenzie
@ 2016-04-23 18:12             ` Eli Zaretskii
  2016-04-23 18:26               ` Dmitry Gutov
  2016-04-23 21:08               ` Alan Mackenzie
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-23 18:12 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

> Date: Sat, 23 Apr 2016 17:02:08 +0000
> Cc: emacs-devel@gnu.org, dgutov@yandex.ru
> From: Alan Mackenzie <acm@muc.de>
> 
> > More generally, I think we should first and foremost make our goal to
> > have a clean and reasonably simple design, and only care about the
> > amount of changes in major mode code as a secondary goal.  Thinking
> > about the changes in major modes first could easily lead us astray.
> 
> We must consider both these things together.  A prime design goal is to
> allow an arbitrary major mode to be used by a super mode with the minimum
> of adaptation to the major mode, ideally none.

I think you make this goal the main one, and that is a mistake.  The
changes that will be needed for supporting multiple modes in the same
buffer will be extensive, whether you want it or not, so trying too
hard to make it easier on modes to adapt will skew the design.

> > By hard-wiring this special meaning of [:space:] into your design, you
> > are limiting future (and possibly some rare extant) major modes.
> 
> I don't think it's all that special.  It's natural.

IME, authors who write Emacs features are known to not limit
themselves to only those things that the infrastructure designers deem
"natural".

> > Are you sure that none of the background processing will ever need to
> > treat islands as such?  I'm talking about stuff like timers, process
> > filters and sentinels, hook functions run by redisplay and the command
> > loop, etc.
> 
> All these subsystems will need to be aware of whether they are dealing
> with the buffer as a whole, or merely with an island chain.  They will
> need to bind `in-islands' appropriately, frequently using the value that
> was current when they were invoked.

Which means that code that was never aware of any "current mode" will
need to adapt.  For example, BEGV and ZV (a.k.a. point-min and
point-max) will be suddenly limited to an island while such code runs.
That's a major issue, IMO, something that will need changes in many
places.

> > > > If it is stored in the text property, then you will have to decide
> > > > what happens when text is copied and yanked elsewhere.
> 
> > > It would be the job of the `island-after-change-function' to strip the
> > > unwanted text properties (both the `island' and `syntax-table' ones) and
> > > to apply any needed new ones to the yanked region.
> 
> > The problem is the decision whether they are unwanted or not.  It's
> > usually not simple to make that decision for text properties that
> > change the way text is displayed, when surrounding text also affects
> > that.
> 
> But that decision has to be made somewhere, somehow, by the super mode,
> regardless of how multiple major modes are implemented.

If the implementation is not based on text properties, then it doesn't
have to.

> One detail struck me immediately on seeing the code in
> set_buffer_internal_1.  The code has to cdr its way down the entire
> list of variables in local_var_alist_, despite the fact that only a
> few of them point to C variables.  Maybe it would make sense to
> extract this smaller list into a separate chain.

You can't: redisplay allows Lisp evaluation in some places (like the
mode line), and any Lisp run there will expect to find buffer-local
bindings of all the variables.




* Re: A vision for multiple major modes: some design notes
  2016-04-23 18:12             ` Eli Zaretskii
@ 2016-04-23 18:26               ` Dmitry Gutov
  2016-04-23 21:08               ` Alan Mackenzie
  1 sibling, 0 replies; 45+ messages in thread
From: Dmitry Gutov @ 2016-04-23 18:26 UTC (permalink / raw)
  To: Eli Zaretskii, Alan Mackenzie; +Cc: emacs-devel

On 04/23/2016 09:12 PM, Eli Zaretskii wrote:

>> We must consider both these things together.  A prime design goal is to
>> allow an arbitrary major mode to be used by a super mode with the minimum
>> of adaptation to the major mode, ideally none.
>
> I think you make this goal the main one, and that is a mistake.  The
> changes that will be needed for supporting multiple modes in the same
> buffer will be extensive, whether you want it or not, so trying too
> hard to make it easier on modes to adapt will skew the design.

+1. I also think we can afford to require some changes to the major mode 
code, as long as they're simple, and it's easy to spot whether they have 
been made. A hundred or so regexps to change is not that much if the 
design is otherwise sound.




* Re: A vision for multiple major modes: some design notes
  2016-04-23 18:12             ` Eli Zaretskii
  2016-04-23 18:26               ` Dmitry Gutov
@ 2016-04-23 21:08               ` Alan Mackenzie
  2016-04-24  6:29                 ` Eli Zaretskii
  1 sibling, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-23 21:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dgutov, emacs-devel

Hello, Eli.

On Sat, Apr 23, 2016 at 09:12:25PM +0300, Eli Zaretskii wrote:
> > Date: Sat, 23 Apr 2016 17:02:08 +0000
> > Cc: emacs-devel@gnu.org, dgutov@yandex.ru
> > From: Alan Mackenzie <acm@muc.de>

> > > More generally, I think we should first and foremost make our goal to
> > > have a clean and reasonably simple design, and only care about the
> > > amount of changes in major mode code as a secondary goal.  Thinking
> > > about the changes in major modes first could easily lead us astray.

> > We must consider both these things together.  A prime design goal is to
> > allow an arbitrary major mode to be used by a super mode with the minimum
> > of adaptation to the major mode, ideally none.

> I think you make this goal the main one, and that is a mistake.  The
> changes that will be needed for supporting multiple modes in the same
> buffer will be extensive, whether you want it or not, so trying too
> hard to make it easier on modes to adapt will skew the design.

Let me put things another way.  Above all, I want this new facility to be
based on clean abstractions.  Such are generally easier to code, easier
to understand, and easier to debug, should such be necessary.  And I
assure you that in my head, the abstractions, particularly that of
islands, came before the design.

I see three layers of software, here:  Major modes, super modes, and
subsystems.  What is the relationship of each of them to islands?

Super modes essentially deal with islands - that is what their main
purpose is.  They create islands, they destroy them, possibly they
coalesce them, they coordinate the rare interactions between islands
(yanking for example), they coordinate change hooks as they affect
islands.  Most of the changes I have proposed are in features which
directly support super modes' handling of islands.

Subsystems code, like redisplay, font locking, timers, ...., is going to
have to deal with islands incidentally - that is not its main purpose,
but there is no getting away from it.  A redisplay action might act on
several islands, so might a font locking action.  And so on.

But major modes?  The abstraction I propose is that major modes see their
own parts of the buffer as the entire buffer, and know nothing of
islands or gaps between them.  This is a clean abstraction and will lead
to all the advantages enumerated a few paragraphs back.

Eli, you seem to disagree with the above analysis.  Would you like to
outline your scheme of abstractions on this topic?

You say that extensive changes will be needed to support multiple modes
in a buffer, and this is clearly true.  Where we seem to differ is where
these changes should be made.  I want the vast bulk of these changes to
be in super mode support and subsystems.  You seem additionally to want
to make substantial changes in the major mode "layer".  I cannot see this
as a good thing at the moment.

[ .... ]

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: A vision for multiple major modes: some design notes
  2016-04-23 21:08               ` Alan Mackenzie
@ 2016-04-24  6:29                 ` Eli Zaretskii
  2016-04-24 16:57                   ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-24  6:29 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

> Date: Sat, 23 Apr 2016 21:08:07 +0000
> Cc: emacs-devel@gnu.org, dgutov@yandex.ru
> From: Alan Mackenzie <acm@muc.de>
> 
> I see three layers of software, here:  Major modes, super modes, and
> subsystems.  What is the relationship of each of them to islands?
> 
> Super modes essentially deal with islands - that is what their main
> purpose is.  They create islands, they destroy them, possibly they
> coalesce them, they coordinate the rare interactions between islands
> (yanking for example), they coordinate change hooks as they affect
> islands.  Most of the changes I have proposed are in features which
> directly support super modes' handling of islands.
> 
> Subsystems code, like redisplay, font locking, timers, ...., is going to
> have to deal with islands incidentally - that is not its main purpose,
> but there is no getting away from it.  A redisplay action might act on
> several islands, so might a font locking action.  And so on.
> 
> But major modes?  The abstraction I propose is that major modes see their
> own parts of the buffer as the entire buffer, and know nothing of
> islands or gaps between them.  This is a clean abstraction and will lead
> to all the advantages enumerated a few paragraphs back.
> 
> Eli, you seem to disagree with the above analysis.  Would you like to
> outline your scheme of abstractions on this topic?

Most of my comments were not about the abstractions.  I don't have any
alternative scheme to offer, because I have no experience in using,
let alone writing, multiple modes in the same buffer.

> You say that extensive changes will be needed to support multiple modes
> in a buffer, and this is clearly true.  Where we seem to differ is where
> these changes should be made.  I want the vast bulk of these changes to
> be in super mode support and subsystems.  You seem additionally to want
> to make substantial changes in the major mode "layer".  I cannot see this
> as a good thing at the moment.

I'm saying that worrying about the amount of changes in major modes at
this stage is premature optimization.  If major modes will have to
adapt themselves in non-trivial ways, e.g. by changing their regexps
or font-lock settings, it's not a big deal.  It is much more important
to make sure the design doesn't contradict more basic assumptions and
design principles of Emacs, including the low-level code which
implements searching, syntax, redisplay, etc., because if the
contradiction does happen, you will at best have a bunch of hairy
problems to solve, and at worst will simply fail to produce a workable
solution.

IOW, I suggest to forget for a while about the amount of changes major
modes will need, and leave that for later.  At this stage, you should
be worried much more about how core design features of Emacs will work
with islands, and make sure you have all that figured out, before you
decide that the island design is valid.  In practice, this means that,
for example, I would expect you to study all the uses of search in the
low-level code, before you decide that making [:space:] match an
island edge is sound.  E.g., did you know that even bidi.c, which is
about as low-level as you can get, uses regexp search to look for a
certain combination of whitespace characters?  Did you consider how
this will work when islands are in the way?  What about basic features
like find_newline -- did you look into that?  You see, if any of these
break due to islands, you have some major rewrites on your hands, and
the ripples will probably be very far-reaching.  The need to change
major modes pales by comparison.


* Re: A vision for multiple major modes: some design notes
  2016-04-24  6:29                 ` Eli Zaretskii
@ 2016-04-24 16:57                   ` Alan Mackenzie
  2016-04-24 19:59                     ` Eli Zaretskii
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-24 16:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dgutov, emacs-devel

Hello, Eli.

On Sun, Apr 24, 2016 at 09:29:58AM +0300, Eli Zaretskii wrote:
> > Date: Sat, 23 Apr 2016 21:08:07 +0000
> > Cc: emacs-devel@gnu.org, dgutov@yandex.ru
> > From: Alan Mackenzie <acm@muc.de>

> > I see three layers of software, here:  Major modes, super modes, and
> > subsystems.  What is the relationship of each of them to islands?

> > Super modes essentially deal with islands - that is what their main
> > purpose is.  They create islands, they destroy them, possibly they
> > coalesce them, they coordinate the rare interactions between islands
> > (yanking for example), they coordinate change hooks as they affect
> > islands.  Most of the changes I have proposed are in features which
> > directly support super modes' handling of islands.

> > Subsystems code, like redisplay, font locking, timers, ...., is going to
> > have to deal with islands incidentally - that is not its main purpose,
> > but there is no getting away from it.  A redisplay action might act on
> > several islands, so might a font locking action.  And so on.

> > But major modes?  The abstraction I propose is that major modes see their
> > own parts of the buffer as the entire buffer, and know nothing of
> > islands or gaps between them.  This is a clean abstraction and will lead
> > to all the advantages enumerated a few paragraphs back.

> > Eli, you seem to disagree with the above analysis.  Would you like to
> > outline your scheme of abstractions on this topic?

> Most of my comments were not about the abstractions.  I don't have any
> alternative scheme to offer, because I have no experience in using,
> let alone writing, multiple modes in the same buffer.

But you have undoubtedly suffered the frustration of bits of scripts,
possibly html files, and the like, not being properly fontified/indented
for lack of such multiple modes.

> > You say that extensive changes will be needed to support multiple modes
> > in a buffer, and this is clearly true.  Where we seem to differ is where
> > these changes should be made.  I want the vast bulk of these changes to
> > be in super mode support and subsystems.  You seem additionally to want
> > to make substantial changes in the major mode "layer".  I cannot see this
> > as a good thing at the moment.

> I'm saying that worrying about the amount of changes in major modes at
> this stage is premature optimization.

You seem to be saying I should abandon "abstraction A" (that major modes
should remain unaware of islands) as a design principle.  Without this
principle, I'm not sure how much of my design notes make any sense.  I
certainly have no idea of what to replace it by.

> If major modes will have to adapt themselves in non-trivial ways, e.g.
> by changing their regexps or font-lock settings, it's not a big deal.

How do you know?  What I foresee happening is a lot of island handling
code being duplicated many times over, over many major modes.  I think
that is a big deal.

> It is much more important to make sure the design doesn't contradict
> more basic assumptions and design principles of Emacs, including the
> low-level code which implements searching, syntax, redisplay, etc.,
> because if the contradiction does happen, you will at best have a bunch
> of hairy problems to solve, and at worst will simply fail to produce a
> workable solution.

The very basic assumption that each buffer has exactly one major mode is
being superseded.  That is bound to have repercussions on several other
assumptions which are dependent on it, including the ones you
identify.  Searching, syntax, redisplay, etc., will all need to be adapted
because that basic assumption (one major mode) will no longer hold.  The
challenge is to identify all the code that implicitly relies on that
assumption.

I think some of these other dependent assumptions will become ambiguous.
For example, at the moment BEG and Z point to the start and end of the
part of the buffer the current major mode administers (this being the
entire buffer).  Nobody up till now has bothered to separate those two
meanings of BEG and Z.  Such disambiguation will be necessary to support
multiple major modes.  I've already proposed doing this by means of the
magic variable `in-islands'.
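
As a rough illustration of that disambiguation, here is a minimal sketch
in C.  Everything in it is an assumption made for the example: `struct
toy_buffer', `effective_beg' and `effective_z' are hypothetical names,
and a real implementation would have to cope with narrowing and with
the gaps between islands, which this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; these are not the real Emacs buffer structures. */
struct toy_buffer {
  size_t beg, z;              /* bounds of the whole buffer */
  size_t chain_beg, chain_z;  /* start of the first and end of the last
                                 island in the current chain */
  bool in_islands;            /* stand-in for the `in-islands' variable */
};

/* Start of the text which the currently running code should see. */
static size_t
effective_beg (const struct toy_buffer *b)
{
  assert (b->beg <= b->chain_beg);
  return b->in_islands ? b->chain_beg : b->beg;
}

/* End of the text which the currently running code should see. */
static size_t
effective_z (const struct toy_buffer *b)
{
  assert (b->chain_z <= b->z);
  return b->in_islands ? b->chain_z : b->z;
}
```

The two meanings only come apart when the flag is set; code which leaves
it clear sees the whole buffer, exactly as it does today.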

> IOW, I suggest to forget for a while about the amount of changes major
> modes will need, and leave that for later.  At this stage, you should
> be worried much more about how core design features of Emacs will work
> with islands, and make sure you have all that figured out, before you
> decide that the island design is valid.

I have spent quite some time studying data.c, syntax.c, xdisp.c,
buffer.[ch], lots of font locking code, and likely quite a few other
relevant files.  I haven't come across anything that would be difficult
to adapt for the island mechanism - just there's a lot to adapt.

> In practice, this means that, for example, I would expect you to study
> all the uses of search in the low-level code, before you decide that
> making [:space:] match an island edge is sound.

[ Actually, it's the entire island I'm proposing be matched as WS. ]

I tend to approach it from the other direction: is that handling of an
island as whitespace a satisfactory abstraction or not?  If it is, the
code will follow.  If it's not, attempts to apply it will collapse in
confusion, probably quite quickly.

> E.g., did you know that even bidi.c, which is about as low-level as you
> can get, uses regexp search to look for a certain combination of
> whitespace characters?

No, but it doesn't surprise me.

> Did you consider how this will work when islands are in the way?

Yes.  The bulk of the adaptation to bidi.c will be the generic changes in
search.c, etc., so that the bidi.c regexps will continue to work despite
the text it's matching over being two islands with a gap in the middle.
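
A minimal sketch in C of what "matching an island as whitespace" could
mean for a simple forward skip.  This is a toy, not the regexp engine:
the `owner' array stands in for the `island' text property, and
`counts_as_space' and `skip_space_forward' are hypothetical helpers
invented for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* owner[i] is the id of the chain owning text[i], or 0 for none.
   With IN_ISLANDS set, any character not owned by CHAIN is treated
   as whitespace; otherwise only real whitespace counts. */
static bool
counts_as_space (const char *text, const int *owner, size_t i,
                 int chain, bool in_islands)
{
  if (in_islands && owner[i] != chain)
    return true;
  return isspace ((unsigned char) text[i]) != 0;
}

/* Return the first position at or after POS which is not "space". */
static size_t
skip_space_forward (const char *text, const int *owner, size_t len,
                    size_t pos, int chain, bool in_islands)
{
  assert (pos <= len);
  while (pos < len && counts_as_space (text, owner, pos, chain, in_islands))
    pos++;
  return pos;
}
```

With the flag set, a skip which starts on the blank before a foreign
stretch of text runs on to the next character owned by the chain; with
the flag clear it stops at the first foreign character, as now.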

I know little about bidi, but there might have to be design decisions
made about how it should behave when the text it's dealing with isn't
contiguous in the whole buffer.

> What about basic features like find_newline -- did you look into that?
> You see, if any of these break due to islands, you have some major
> rewrites on your hands, and the ripples will probably be very
> far-reaching.  The need to change major modes pales by comparison.

No, I hadn't looked at find_newline.  But it will need looking at
regardless of whether a space in a regexp matches an island.  At the very
least, it will have to behave differently for finding newlines in an
island chain rather than finding them in the whole buffer.
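
The difference between the two behaviours can be sketched in C as
follows.  This is a toy under stated assumptions, not the real
find_newline: `toy_find_newline' is a hypothetical function, and the
`owner' array stands in for whatever mechanism records which chain a
character belongs to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the index of the first newline at or after POS, or LEN if
   there is none.  owner[i] is the id of the chain owning text[i], or
   0 for none.  With IN_ISLANDS set, newlines outside CHAIN are not
   seen at all. */
static size_t
toy_find_newline (const char *text, const int *owner, size_t len,
                  size_t pos, int chain, bool in_islands)
{
  assert (pos <= len);
  for (; pos < len; pos++)
    if (text[pos] == '\n' && (!in_islands || owner[pos] == chain))
      return pos;
  return len;
}
```

Whether a newline in the gap between two islands should end a "line" of
the chain is exactly the kind of decision which would need making.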

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: A vision for multiple major modes: some design notes
  2016-04-24 16:57                   ` Alan Mackenzie
@ 2016-04-24 19:59                     ` Eli Zaretskii
  2016-04-25  6:49                       ` Andreas Röhler
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2016-04-24 19:59 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

> Date: Sun, 24 Apr 2016 16:57:21 +0000
> Cc: emacs-devel@gnu.org, dgutov@yandex.ru
> From: Alan Mackenzie <acm@muc.de>
> 
> > Most of my comments were not about the abstractions.  I don't have any
> > alternative scheme to offer, because I have no experience in using,
> > let alone writing, multiple modes in the same buffer.
> 
> But you have undoubtedly suffered the frustration of bits of scripts,
> possibly html files, and the like, not being properly fontified/indented
> for lack of such multiple modes.

No, not really.

> You seem to be saying I should abandon "abstraction A" (that major modes
> should remain unaware of islands) as a design principle.

It's not an abstraction, it's a design goal.  And yes, I think you
need to forget about it for a while.

> Without this principle, I'm not sure how much of my design notes
> make any sense.

I don't see how that invalidates your proposal.

> I certainly have no idea of what to replace it by.

I suggest replacing it with nothing.  Minimizing changes in major
modes (and elsewhere) is a simple economy principle; you don't need to
worry about us forgetting it.

> > If major modes will have to adapt themselves in non-trivial ways, e.g.
> > by changing their regexps or font-lock settings, it's not a big deal.
> 
> How do you know?  What I foresee happening is a lot of island handling
> code being duplicated many times over, over many major modes.  I think
> that is a big deal.

If it is, we will cross that bridge when we get to it.

> > It is much more important to make sure the design doesn't contradict
> > more basic assumptions and design principles of Emacs, including the
> > low-level code which implements searching, syntax, redisplay, etc.,
> > because if the contradiction does happen, you will at best have a bunch
> > of hairy problems to solve, and at worst will simply fail to produce a
> > workable solution.
> 
> The very basic assumption that each buffer has exactly one major mode is
> being superseded.  That is bound to have repercussions on several other
> assumptions which are dependent on it, including the ones you
> identify.  Searching, syntax, redisplay, etc., will all need to be adapted
> because that basic assumption (one major mode) will no longer hold.  The
> challenge is to identify all the code that implicitly relies on that
> assumption.

There are exactly zero references to major mode in C sources.  (There's
a function to store the major mode in the corresponding slot of the
buffer object, but I see no code looking at that slot's value.)  And for
a good reason: the major mode is an entirely Lisp-land phenomenon, it
does all of its work by setting local variables and hook functions.

So I think your assumption that having more than one mode in a buffer
is already a cataclysm is incorrect.

> I think some of these other dependent assumptions will become ambiguous.
> For example, at the moment BEG and Z point to the start and end of the
> part of the buffer the current major mode administers (this being the
> entire buffer).  Nobody up till now has bothered to separate those two
> meanings of BEG and Z.  Such disambiguation will be necessary to support
> multiple major modes.  I've already proposed doing this by means of the
> magic variable `in-islands'.

Indeed, I'm much more worried by the effect of islands in BEGV and ZV
than by the fact that there could be more than one major mode active.
Unlike references to the major mode, the number of places that use
BEGV and ZV is enormous, and the unwritten assumptions about them are
abundant and well entrenched.

> I have spent quite some time studying data.c, syntax.c, xdisp.c,
> buffer.[ch], lots of font locking code, and likely quite a few other
> relevant files.  I haven't come across anything that would be difficult
> to adapt for the island mechanism - just there's a lot to adapt.

We should try to minimize that impact as much as we can.

> I tend to approach it from the other direction: is that handling of an
> island as whitespace a satisfactory abstraction or not?

It's not an abstraction at all.  It's a trick, a device to make
adaptation to the island-world easier.  That text between two islands
of the same chain should be invisible for the mode that's active in
the chain -- that is an abstraction.  But no one says that text must
be treated as whitespace -- this is simply a convenient means to reach
your ends.  However, other means towards the same end might be
available, ones that don't overload [:space:] with an entirely alien
meaning.

> > Did you consider how [bidi.c search] will work when islands are in the way?
> 
> Yes.  The bulk of the adaptation to bidi.c will be the generic changes in
> search.c, etc., so that the bidi.c regexps will continue to work despite
> the text it's matching over being two islands with a gap in the middle.

Doesn't that contradict your design to limit point-min and point-max,
since redisplay must be island-aware?

> I know little about bidi, but there might have to be design decisions
> made about how it should behave when the text it's dealing with isn't
> contiguous in the whole buffer.

These design decisions should predate the island-as-whitespace
discussion, IMNSHO.  And if you are sure this feature cannot happen
without affecting bidi.c and search.c, then yes, you should study
those and understand what they do and how.

> No, I hadn't looked at find_newline.  But it will need looking at
> regardless of whether a space in a regexp matches an island.  At the very
> least, it will have to behave differently for finding newlines in an
> island chain rather than finding them in the whole buffer.

See, the ripples are starting already.  That is why I say we should
try to find a design that doesn't require rethinking, redesigning, and
rewriting every single piece of our infrastructure.  If we don't, we
are making the implementation and testing of this feature a much more
complex and hard job than it must be.


* Re: A vision for multiple major modes: some design notes
  2016-04-24 19:59                     ` Eli Zaretskii
@ 2016-04-25  6:49                       ` Andreas Röhler
  0 siblings, 0 replies; 45+ messages in thread
From: Andreas Röhler @ 2016-04-25  6:49 UTC (permalink / raw)
  To: emacs-devel; +Cc: Alan Mackenzie, Eli Zaretskii



On 24.04.2016 21:59, Eli Zaretskii wrote:
> [ ...  ]
>> I tend to approach it from the other direction: is that handling of an
>> island as whitespace a satisfactory abstraction or not?
> It's not an abstraction at all.  It's a trick, a device to make
> adaptation to the island-world easier.  That text between two islands
> of the same chain should be invisible for the mode that's active in
> the chain -- that is an abstraction.  But no one says that text must
> be treated as whitespace -- this is simply a convenient means to reach
> your ends.  However, other means towards the same end might be
> available, ones that don't overload [:space:] with an entirely alien
> meaning.
>

That sounds like a source of bugs.  It is hard to tell which conflicts
might show up, but there is some probability that they will.


[ ... ]







* Re: A vision for multiple major modes: some design notes
  2016-04-21 22:19   ` Alan Mackenzie
  2016-04-22  8:48     ` Eli Zaretskii
@ 2016-04-22 13:42     ` Andy Moreton
  2016-04-23 17:14       ` Alan Mackenzie
  1 sibling, 1 reply; 45+ messages in thread
From: Andy Moreton @ 2016-04-22 13:42 UTC (permalink / raw)
  To: emacs-devel

On Thu 21 Apr 2016, Alan Mackenzie wrote:

> Hello, Eli.
>
> On Thu, Apr 21, 2016 at 05:17:09PM +0300, Eli Zaretskii wrote:
>> > Date: Wed, 20 Apr 2016 19:44:50 +0000
>> > From: Alan Mackenzie <acm@muc.de>
>> > 
>> > This post describes my notion of how multiple major modes {c,sh}ould be
>> > implemented.  Key notions are "islands", "island chains", and "chain
>> > local" variable bindings.
>
>> Thank you for publishing this.  A few comments and questions below.
>> Please keep in mind that I never had to write any Lisp that deals with
>> these issues, so apologies in advance for possibly silly questions and
>> misunderstandings.
>
>> >   o - To the user, the current major mode will be that of the island where
>> >     point is.  All familiar commands will work without restriction.
>
>> Does this mean the display of mode line, menu bar, and tool bar will
>> change accordingly?
>
> Yes, please!
>
>> A more subtle issue is with point movements that are not shown to the
>> user (those done by Lisp code of some command, before redisplay kicks
>> in) -- what will be the effect of those? do they trigger redisplay,
>> for example?
>
> They shouldn't trigger redisplay, no.
>
>> >   o - An island chain will have @dfn{chain local} variable bindings.  Such a
>> >     binding will become current and accessible when point is within one of the
>> >     chain's islands.  When point is not in an island, the buffer local binding
>> >     of the variable will be current.
>
>> Emacs sometimes examines buffer text without moving point, and we
>> generally expect for buffer-local bindings to be in effect regardless.
>> A prominent example is the display engine.  I will return to that
>> later.
>
> OK.
>
>> >     * - [Island] will be covered by the text property `island', whose value will be
>> >       the pertinent island or island chain (see section (ii)) (not yet
>> >       decided).  Note that if islands are enclosed inside other islands, the
>> >       value is the innermost island.  There is the possibility of using an
>> >       interval tree independent of the one for text properties to increase
>> >       performance.
>
>> I don't understand the notion of "enclosed" islands: wouldn't such
>> "enclosing" simply break the "outer" island into two separate islands?
>
> If we mark island start and end with the syntax-table text properties
> "{" and "}", we're going to have something like
>
>     {     a{  }b    }
>
> .  Simply to break the outer island into two pieces, we'd really need to
> apply delimiters at a and b, giving:
>
>     {     }{  }{    }
>
> .  This would overwrite the previous syntaxes at a and b, and this might
> be a Bad Thing.

Care will be needed to allow more than one island chain using the same
inner mode, where the chains represent unrelated documents that are
independently embedded in the larger document.

>> >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
>> >     whitespace, much as they do comments.  They will also treat as whitespace
>> >     the gap between two islands in a chain.
>
>> Why whitespace? why not some new category?  By overloading whitespace,
>> you make things harder on the underlying infrastructure, like regexp
>> search and matching.
>
> I think it's clear that the "foreign" island's syntax has no interaction
> with the current island.  If we treat it as whitespace, that should
> minimise the amount of adapting we need to do to existing major modes.

There may be some interaction. The language used for the enclosing text
(using the super mode) may require quoting and escaping to be performed
on the content embedded in it. This means that the textual
representation of the content in the island chain may depend on what it
is embedded into.

The inner mode for the island chain will either need to be aware of this
quoting and escaping syntax (belonging to the super mode), or the text
in the island chain will need to be unescaped and unquoted for the inner
mode to make sense of it.
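
To make the second option concrete, here is a minimal sketch, written in
Python rather than Emacs Lisp and not part of the proposal. It assumes the
enclosing language is POSIX shell and that the island sits inside a
single-quoted word, where a literal single quote has to be written as the
four characters '\''. The function name `unquote_shell_single` is
hypothetical.

```python
def unquote_shell_single(island_text: str) -> str:
    # Inside POSIX single quotes nothing is escaped, so a literal quote is
    # written by closing the quote, adding \' and reopening the quote.
    # Collapsing that four-character sequence recovers the inner text.
    return island_text.replace("'\\''", "'")


# Text as it appears between the outer quotes of:
#   gawk '{ print "it'\''s" }'
embedded = "{ print \"it'\\''s\" }"

# What an AWK-aware inner mode would want to see.
print(unquote_shell_single(embedded))
```

Note that the unquoted text is shorter than the text in the buffer, so any
scheme along these lines would also have to map positions back and forth.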

    AndyM





* Re: A vision for multiple major modes: some design notes
  2016-04-22 13:42     ` Andy Moreton
@ 2016-04-23 17:14       ` Alan Mackenzie
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-23 17:14 UTC (permalink / raw)
  To: Andy Moreton; +Cc: emacs-devel

Hello, Andy.

On Fri, Apr 22, 2016 at 02:42:07PM +0100, Andy Moreton wrote:
> On Thu 21 Apr 2016, Alan Mackenzie wrote:

[ .... ]

> Care will be needed to allow more than one island chain using the same
> inner mode, where the chains represent unrelated documents that are
> independently embedded in the larger document.

This is built into the fabric of the mechanism, and shouldn't present a
problem.

> >> >   o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
> >> >     whitespace, much as they do comments.  They will also treat as whitespace
> >> >     the gap between two islands in a chain.

> >> Why whitespace? why not some new category?  By overloading whitespace,
> >> you make things harder on the underlying infrastructure, like regexp
> >> search and matching.

> > I think it's clear that the "foreign" island's syntax has no interaction
> > with the current island.  If we treat it as whitespace, that should
> > minimise the amount of adapting we need to do to existing major modes.

> There may be some interaction. The language used for the enclosing text
> (using the super mode) may require quoting and escaping to be performed
> on the content embedded in it. This means that the textual
> representation of the content in the island chain may depend on what it
> is embedded into.

Good point!  Thanks.

> The inner mode for the island chain will either need to be aware of this
> quoting and escaping syntax (belonging to the super mode), or the text
> in the island chain will need to be unescaped and unquoted for the inner
> mode to make sense of it.

Umm.  Yes, that could happen.  Hopefully it won't be a big problem in
practice.

>     AndyM

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: A vision for multiple major modes: some design notes
  2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie
                   ` (2 preceding siblings ...)
  2016-04-21 14:17 ` Eli Zaretskii
@ 2016-04-22 14:33 ` Dmitry Gutov
  2016-04-22 18:58 ` Richard Stallman
  4 siblings, 0 replies; 45+ messages in thread
From: Dmitry Gutov @ 2016-04-22 14:33 UTC (permalink / raw)
  To: Alan Mackenzie, emacs-devel

Hello Alan,

Thank you for writing this expansive summary. I'll comment on a couple 
of the items below, but overall, if you're asking my personal opinion, 
we should put a pin in this for now, and first see where the hard-widen 
feature gets us.

After that, we could see whether it wouldn't be easier to extract 
certain individual pieces of this proposal in an independent fashion. 
The main two I see are:

- A way to assign buffer boundaries that make certain core primitives 
treat some buffer regions as whitespace, maybe with support for nesting. 
I don't know if that should be via text properties. As long as this 
feature is only used dynamically, it could be a list structure stored in 
a dynamic variable. That way the `in-islands' variable would become 
redundant.

- A way to quickly store and switch between sets of buffer-local variables.
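
The second item can be pictured with a small model, given here in Python
rather than Emacs Lisp. It is only a sketch under stated assumptions: each
chain keeps a table of the variables it overrides, and lookups fall back to
the buffer-local table. The names `buffer_locals`, `chain_locals`,
`bindings_for` and the chain names are all hypothetical.

```python
from collections import ChainMap

# Buffer-local values, current when point is outside every island.
buffer_locals = {"major_mode": "html-mode", "indent_width": 2}

# One table per chain, holding only the variables that chain overrides.
chain_locals = {
    "js-chain": {"major_mode": "js-mode", "indent_width": 4},
    "css-chain": {"major_mode": "css-mode"},
}


def bindings_for(chain):
    """Return the variable view for `chain`; None means point is in no island."""
    if chain is None:
        return ChainMap(buffer_locals)
    # The chain's table shadows the buffer-local one; nothing is copied.
    return ChainMap(chain_locals[chain], buffer_locals)


print(bindings_for("js-chain")["indent_width"])   # chain-local value
print(bindings_for("css-chain")["indent_width"])  # falls back to buffer-local
```

Switching in this model means picking a different table rather than copying
values, which is the property that makes it quick.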

If you go ahead with this proposal, though, I think it should be 
implemented in close collaboration with an author of a related package. 
Vitalie Spinu and Christoph Wedler (polymode and antlr-mode maintainers) 
would be good candidates, and neither has shown up at this discussion 
yet. Unfortunately, I don't have a lot of time to dedicate to mmm-mode 
lately (and it probably has the highest backward compatibility 
expectations out of the three anyway).

The main drawbacks of this, IMHO, are that it's big (like you mentioned 
yourself), and that it's fairly opinionated. Hence the two-item list above.

On 04/20/2016 10:44 PM, Alan Mackenzie wrote:

>     * - The coordination of these bindings will be carried out by the
>       mechanisms described below, without explicit coding in the super mode.

This seems a little too optimistic. For instance:

>   o - To the user, the current major mode will be that of the island where
>     point is.  All familiar commands will work without restriction.

Imenu, as one example, will require coordination from the super mode, or 
from the multi-mode framework. The user will normally want to see all the
entries in the current buffer in the index, so something would have to
merge them.
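
As a rough illustration of that merging step, here is a sketch in Python
(not Emacs Lisp, and not Imenu's actual data format). It assumes each mode
has already produced (name, position) entries for its own islands; the
function name `merge_indexes` and the sample entries are hypothetical.

```python
def merge_indexes(per_mode):
    """Merge (name, position) entries from several modes into buffer order."""
    merged = [
        (pos, name, mode)
        for mode, entries in per_mode.items()
        for name, pos in entries
    ]
    # Sorting on position first interleaves the entries as they occur
    # in the buffer, regardless of which mode found them.
    merged.sort()
    return [(name, pos, mode) for pos, name, mode in merged]


index = merge_indexes({
    "js-mode": [("onLoad", 120), ("render", 480)],
    "css-mode": [(".header", 300)],
})
for entry in index:
    print(entry)
```

Tagging each entry with its mode leaves the caller free to show one flat
list or to group the entries per mode.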

>   o - To the writer of major modes, a minimal set of restrictions will apply:
>     * - For some major mode commands, the mode will have to bind the variable
>       `in-islands' (see below) to non-nil.

Ideally, the writers of the "island" major modes wouldn't do anything 
special to support multi-mode usage. It would be better if the 
"superior" major modes would have to do all the "special" things.

I.e., it's fine to have to introduce a new major mode for a templating 
language if it can easily use existing major modes for the code regions 
inside.

Here's a related question: would `indent-for-tab-command' bind 
`in-islands' to t, or not?

> (iv) Islands.
>   o - An island will be delimited in two complementary ways:
>     * - It will be enclosed syntactically by characters with "open island" and
>       "close island" syntax (see section (v)).  Both of these syntactic
>       markers will include a flag "chain" indicating whether there is a
>       previous/next island in the chain.  The cdr of the syntax value will be
>       the island chain to which the island belongs.
>     * - It will be covered by the text property `island', whose value will be
>       the pertinent island or island chain (see section (ii)) (not yet
>       decided).  Note that if islands are enclosed inside other islands, the
>       value is the innermost island.  There is the possibility of using an
>       interval tree independent of the one for text properties to increase
>       performance.

Going by the current implementation in mmm-mode, it would be handy if 
the islands could be distinguished using one text property only. Then we 
simply set it on all overlays that cover mmm-mode's subregions. But if 
all three elements are required, so be it.


* Re: A vision for multiple major modes: some design notes
  2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie
                   ` (3 preceding siblings ...)
  2016-04-22 14:33 ` Dmitry Gutov
@ 2016-04-22 18:58 ` Richard Stallman
  2016-04-22 20:22   ` Alan Mackenzie
  4 siblings, 1 reply; 45+ messages in thread
From: Richard Stallman @ 2016-04-22 18:58 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

The design seems to assume that every island starts with a
one-character delimiter that always starts an island, and that there
is another one-character delimiter that always ends an island.

Is that really the intention, or did I misunderstand?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.


* Re: A vision for multiple major modes: some design notes
  2016-04-22 18:58 ` Richard Stallman
@ 2016-04-22 20:22   ` Alan Mackenzie
  2016-04-23 12:27     ` Andreas Röhler
  2016-04-23 12:38     ` Richard Stallman
  0 siblings, 2 replies; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-22 20:22 UTC (permalink / raw)
  To: Richard Stallman; +Cc: emacs-devel, dgutov

Hello, Richard.

On Fri, Apr 22, 2016 at 02:58:53PM -0400, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> The design seems to assume that every island starts with a
> one-character delimiter that always starts an island, and that there
> is another one-character delimiter that always ends an island.

> Is that really the intention, or did I misunderstand?

That's not quite how I see it working.  There needs to be some sort of
delimiter to start an island which must be at least 1 character wide.
On this character/one of these characters, the "super mode" will set an
"open island" syntax-table text property.  Similarly, there must be some
delimiter at the end of the island to set a "close island" property on.

For example, in a shell script with an embedded AWK script:

VARIABLE=$(gawk '<script>' < <input-file>) ....
                ^        ^

, the text properties would be set on the marked characters, making
<script> an island which would be initialised to AWK Mode.  As a minor
point, the delimiters enclose an island, but aren't part of it - they
are part of the surrounding text.		
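
The relationship between the delimiters and the island can be modelled in a
few lines of Python. This is only a sketch of the idea, not Emacs code: the
variable names are hypothetical, and positions are 0-based string indices
rather than buffer positions.

```python
line = "VARIABLE=$(gawk '<script>' < <input-file>)"

# The two quote characters are where the "open island" and "close island"
# properties would be placed.
open_pos = line.index("'")
close_pos = line.index("'", open_pos + 1)

# The island is the text strictly between the delimiters; the quotes
# themselves stay with the surrounding shell text.
island = line[open_pos + 1:close_pos]
surrounding = line[:open_pos + 1] + line[close_pos:]

print(island)
print(surrounding)
```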

> -- 
> Dr Richard Stallman
> President, Free Software Foundation (gnu.org, fsf.org)
> Internet Hall-of-Famer (internethalloffame.org)
> Skype: No way! See stallman.org/skype.html.

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: A vision for multiple major modes: some design notes
  2016-04-22 20:22   ` Alan Mackenzie
@ 2016-04-23 12:27     ` Andreas Röhler
  2016-04-23 12:38     ` Richard Stallman
  1 sibling, 0 replies; 45+ messages in thread
From: Andreas Röhler @ 2016-04-23 12:27 UTC (permalink / raw)
  To: emacs-devel; +Cc: Alan Mackenzie



On 22.04.2016 22:22, Alan Mackenzie wrote:
> Hello, Richard.
>
> On Fri, Apr 22, 2016 at 02:58:53PM -0400, Richard Stallman wrote:
>> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
>> [[[ whether defending the US Constitution against all enemies,     ]]]
>> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>> The design seems to assume that every island starts with a
>> one-character delimiter that always starts an island, and that there
>> is another one-character delimiter that always ends an island.
>> Is that really the intention, or did I misunderstand?
> That's not quite how I see it working.  There needs to be some sort of
> delimiter to start an island which must be at least 1 character wide.

What about keeping a simple index of modes instead?

Just make major-mode work on current indexed chunk only instead of whole 
buffer.
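
One way to read this suggestion is as a sorted table of chunks that is
searched by position. The following Python sketch shows such a lookup; it
is an interpretation rather than anything specified in this thread, and the
names `chunks` and `mode_at`, the positions and the modes are made up for
illustration.

```python
import bisect

# Sorted, non-overlapping chunks: (start, end, mode), with end exclusive.
chunks = [(0, 40, "sh-mode"), (40, 95, "awk-mode"), (95, 200, "sh-mode")]
starts = [start for start, _end, _mode in chunks]


def mode_at(pos):
    """Return the mode of the chunk containing `pos`, or None."""
    # Find the last chunk whose start is <= pos, then check its end.
    i = bisect.bisect_right(starts, pos) - 1
    if i >= 0 and pos < chunks[i][1]:
        return chunks[i][2]
    return None


print(mode_at(10))
print(mode_at(40))
```

A flat table like this has no notion of nesting or of chains, so it would
need extending to cover islands enclosed inside other islands.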

Cheers,

Andreas








* Re: A vision for multiple major modes: some design notes
  2016-04-22 20:22   ` Alan Mackenzie
  2016-04-23 12:27     ` Andreas Röhler
@ 2016-04-23 12:38     ` Richard Stallman
  2016-04-23 17:31       ` Alan Mackenzie
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Stallman @ 2016-04-23 12:38 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

Are you saying that there won't be a character with the "open island"
in the syntax table, but rather a text property will give a particular
string in the buffer the "open island" syntax?

That makes more sense.  But I think that in some cases "separate
islands" might be a better designation.  For instance, consider the
three sections of a Bison input file.  They are separated by a
delimiter.  It would be artificial and arbitrary to try to divide up
the delimiter into a string to end the previous island and a string to
start the next one.

Which reminds me that the first island in the Bison input file
has no string to "open" it.  It starts at the start of the buffer.
And the third island ends at the end of the buffer.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.


* Re: A vision for multiple major modes: some design notes
  2016-04-23 12:38     ` Richard Stallman
@ 2016-04-23 17:31       ` Alan Mackenzie
  2016-04-24  9:22         ` Richard Stallman
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2016-04-23 17:31 UTC (permalink / raw)
  To: Richard Stallman; +Cc: dgutov, emacs-devel

Hello, Richard.

On Sat, Apr 23, 2016 at 08:38:04AM -0400, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> Are you saying that there won't be a character with the "open island"
> in the syntax table, but rather a text property will give a particular
> string in the buffer the "open island" syntax?

Yes.  I think it would be most unwise to assign a character such a syntax
in the syntax table - the island boundaries would be unstable, existing
or not existing depending on the syntax table of the island one was
currently in.  I anticipate writing a warning about this in the Emacs
Lisp manual.

> That makes more sense.  But I think that in some cases "separate
> islands" might be a better designation.  For instance, consider the
> three sections of a Bison input file.  They are separated by a
> delimiter.  It would be artificial and arbitrary to try to divide up
> the delimiter into a string to end the previous island and a string to
> start the next one.

I think Bison and Lex are somewhat special cases; they each divide a
file into three sections of equal status, rather than there being a
containing mode and sections contained within it.

> Which reminds me that the first island in the Bison input file
> has no string to "open" it.  It starts at the start of the buffer.
> And the third island ends at the end of the buffer.

A workaround for this would be to have the first section being the
"super mode" and containing the second and third sections.  The
delimiter "%%" between sections 2 and 3 has space to hold both an island
close and an island open, despite what you say about this being
artificial, etc.  I don't see there would be an absolute need for there
to be a "close island" mark at the end of the buffer.
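
The workaround can be sketched as follows, in Python rather than Emacs
Lisp. The sample grammar is made up, and which "%" of the first delimiter
would carry the "open island" mark is an assumption of this sketch; the
only thing taken from the text above is that the second "%%" carries a
close on one character and an open on the other, with no close mark at the
end of the buffer.

```python
text = "%token NUM\n%%\nexpr: NUM ;\n%%\nint main(void) { return 0; }\n"

first = text.index("%%")
second = text.index("%%", first + 2)

# Hypothetical placement of the island marks on the delimiter characters.
marks = {
    first + 1: "open",   # section 2 begins after the first "%%"
    second: "close",     # first "%" of the second "%%" ends section 2
    second + 1: "open",  # second "%" of the same "%%" begins section 3
}

section2 = text[first + 2:second]
section3 = text[second + 2:]  # runs to the end of the buffer, no close mark

print(repr(section2))
print(repr(section3))
```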

> -- 
> Dr Richard Stallman
> President, Free Software Foundation (gnu.org, fsf.org)
> Internet Hall-of-Famer (internethalloffame.org)
> Skype: No way! See stallman.org/skype.html.

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: A vision for multiple major modes: some design notes
  2016-04-23 17:31       ` Alan Mackenzie
@ 2016-04-24  9:22         ` Richard Stallman
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Stallman @ 2016-04-24  9:22 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dgutov, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I think Bison and Lex are somewhat special cases; they each divide a
  > file into three sections of equal status, rather than there being a
  > containing mode and sections contained within it.

You may be right that this kind of case is less common,
but the system should still handle it.

  > A workaround for this would be to have the first section being the
  > "super mode" and containing the second and third sections.  The
  > delimiter "%%" between sections 2 and 3 has space to hold both an island
  > close and an island open, despite what you say about this being
  > artificial, etc.

That could work, but I think it would be cleaner if "island separator"
were allowed too.

  >   I don't see there would be an absolute need for there
  > to be a "close island" mark at the end of the buffer.

I don't either.  I just thought the design required one.
If it doesn't, that is fine with me.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





end of thread, other threads:[~2016-06-14 16:27 UTC | newest]

Thread overview: 45+ messages
2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie
2016-04-20 21:06 ` Drew Adams
2016-04-20 23:00   ` Drew Adams
2016-04-21 12:43   ` Alan Mackenzie
2016-04-21 14:24     ` Stefan Monnier
2016-04-23  2:20       ` zhanghj
2016-04-23 22:36       ` Dmitry Gutov
2016-04-21 16:05     ` Drew Adams
2016-04-21 16:31       ` Eli Zaretskii
     [not found]     ` <<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default>
     [not found]       ` <<83oa926i0e.fsf@gnu.org>
2016-04-21 16:59         ` Drew Adams
2016-04-21 19:55           ` Eli Zaretskii
     [not found]     ` <<<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default>
     [not found]       ` <<<83oa926i0e.fsf@gnu.org>
     [not found]         ` <<791d74d1-2b1d-4304-8e7e-d6c31af7aa41@default>
     [not found]           ` <<83eg9y68jy.fsf@gnu.org>
2016-04-21 20:26             ` Drew Adams
2016-04-20 22:27 ` Phillip Lord
2016-04-21  9:14   ` Alan Mackenzie
2016-04-22 12:45     ` Phillip Lord
2016-04-21 14:17 ` Eli Zaretskii
2016-04-21 21:33   ` Alan Mackenzie
2016-04-21 22:01     ` Drew Adams
2016-04-22  8:13       ` Alan Mackenzie
2016-04-22 17:04         ` Drew Adams
2016-04-22  9:04     ` Eli Zaretskii
2016-06-13 21:17     ` John Wiegley
2016-06-14 13:13       ` Alan Mackenzie
2016-06-14 16:27         ` John Wiegley
2016-04-21 22:19   ` Alan Mackenzie
2016-04-22  8:48     ` Eli Zaretskii
2016-04-22 22:35       ` Alan Mackenzie
2016-04-23  7:39         ` Eli Zaretskii
2016-04-23 17:02           ` Alan Mackenzie
2016-04-23 18:12             ` Eli Zaretskii
2016-04-23 18:26               ` Dmitry Gutov
2016-04-23 21:08               ` Alan Mackenzie
2016-04-24  6:29                 ` Eli Zaretskii
2016-04-24 16:57                   ` Alan Mackenzie
2016-04-24 19:59                     ` Eli Zaretskii
2016-04-25  6:49                       ` Andreas Röhler
2016-04-22 13:42     ` Andy Moreton
2016-04-23 17:14       ` Alan Mackenzie
2016-04-22 14:33 ` Dmitry Gutov
2016-04-22 18:58 ` Richard Stallman
2016-04-22 20:22   ` Alan Mackenzie
2016-04-23 12:27     ` Andreas Röhler
2016-04-23 12:38     ` Richard Stallman
2016-04-23 17:31       ` Alan Mackenzie
2016-04-24  9:22         ` Richard Stallman
