* A vision for multiple major modes: some design notes @ 2016-04-20 19:44 Alan Mackenzie 2016-04-20 21:06 ` Drew Adams ` (4 more replies) 0 siblings, 5 replies; 45+ messages in thread From: Alan Mackenzie @ 2016-04-20 19:44 UTC (permalink / raw) To: emacs-devel, Dmitry Gutov Hello, Dmitry and Emacs. This post describes my notion of how multiple major modes {c,sh}ould be implemented. Key notions are "islands", "island chains", and "chain local" variable bindings. In this scheme, "super modes" will not have to do anything to swap in/out local variable bindings pertinent to islands; this will be done by the underlying C code. Narrowing/widening will not be (ab)used by the super mode mechanism. Major modes will continue to be able to use the entire range of Emacs facilities. Here are some design notes: (i) Overview and motivation. o - The aim is to support several major modes simultaneously in a single buffer. o - The "super mode" will set up "chains of islands" (see below). * - Each chain will have its own major mode, key map, syntax table, etc. * - In each chain, "chain local" variable bindings will exist. Such a binding will be current when point is within an island in the chain. * - The coordination of these bindings will be carried out by the mechanisms described below, without explicit coding in the super mode. o - To the user, the current major mode will be that of the island where point is. All familiar commands will work without restriction. o - To the writer of major modes, a minimal set of restrictions will apply: * - For some major mode commands, the mode will have to bind the variable `in-islands' (see below) to non-nil. * - For regexps which recognise whitespace, the regexp must contain "\\s-" or "\\s " or "[[:space:]]" so that the regexp engine will handle "foreign" islands and gaps between chained islands as whitespace. * - All other Emacs facilities will be available for use, being adapted as necessary for the island mechanism. (ii) Definitions and concepts. o - An @dfn{island} is a contiguous portion of a buffer marked at each end. Its attributes are those of the chain of islands of which it is an element. o - A @dfn{chain} of islands is a canonically ordered chain of islands in a single buffer. An island chain has its own major mode; it has its own syntax table, abbreviation table, font lock settings, etc. It has its own bindings of (most) "buffer" local variables. o - An island chain will have @dfn{chain local} variable bindings. Such a binding will become current and accessible when point is within one of the chain's islands. When point is not in an island, the buffer local binding of the variable will be current. Most variables which are currently buffer local in Emacs 25 will become chain local. Those (relatively few) variables which must retain a single value over an entire buffer will be marked as such with a non-nil value of the `entire-buffer' property. o - The variable `using-islands' will be set non-nil to indicate the current buffer is using the island mechanism. o - The variable `in-islands' will control island and island chain facilities. When this variable is bound to non-nil, the facilities described here (such as chain local variables) are active. When the variable is nil, (most of) the new facilities are inactive, and Emacs behaves as Emacs 25. (iii) Island Chains. o - An island chain will be a Lisp object which is a C struct similar to struct buffer. In particular, it will contain slots for common chain local variables, and an association list for bindings of other chain local variables. o - An island chain might contain pointers to the first and last of its islands (still to be decided). (iv) Islands. o - An island will be delimited in two complementary ways: * - It will be enclosed syntactically by characters with "open island" and "close island" syntax (see section (v)). Both of these syntactic markers will include a flag "chain" indicating whether there is a previous/next island in the chain. The cdr of the syntax value will be the island chain to which the island belongs. * - It will be covered by the text property `island', whose value will be the pertinent island or island chain (see section (ii)) (not yet decided). Note that if islands are enclosed inside other islands, the value is the innermost island. There is the possibility of using an interval tree independent of the one for text properties to increase performance. o - An island might be represented by a C or Lisp structure, it might not (not yet decided). This structure would hold the containing chain, markers pointing to the start and end of the chain, and the previous and next islands in the chain. (v) Syntax, etc. o - Two new syntax classes, "open island" and "close island" will be introduced. These will be designated by the characters "{" and "}". Their "matching character" slots will contain the island's chain. There will be an extra flag "chain" (denoted by "i") indicating whether there is a previous/next island in the chain. o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as whitespace, much as they do comments. They will also treat as whitespace the gap between two islands in a chain. o - The (currently 11 element) parser state will be enhanced to support islands as follows: * - A twelfth element will be introduced. This will contain an association list whose elements will have the form (island-chain . 12-element parse state); each element will contain the suspended state of parsing in the island chain which is the car of the element. An element with a car of nil will represent the suspended parsing state of the buffer outside of islands. * - Elements 12, 13, .... will be island chains of the enclosing islands, elt 12 being that of the innermost enclosing island, etc. An element with a value of nil indicates being outside all islands. o - `parse-partial-sexp' will create and use an enhanced parser state as described above. Note that a two character construct (such as a C comment opener) can not enclose an island, and special handling will be required to exclude this. The syntax table in use will change as the current position passes between islands. o - `syntax-ppss' will do the right thing with the extended parser state. Alternatively, `syntax-ppss' will have an independent 12-element state in each island chain, where elt. 11 is always nil. Its cache mechanism will be enhanced such that buffer changes outside of an island chain need not invalidate the stored cache pertaining to the chain. o - The facilities in this section are active even when `in-islands' is nil. (vi) Regexps. o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ", and "[[:space:]] will match an entire island. o - The gap between two islands in a chain will also be matched by the above regexps. o - This treatment of an island, and a gap between two islands, as WS will occur only when `in-islands' is non-nil. o - When `in-islands' is nil, there will be no reliable way of scanning over an island by regexps, since it is a potentially nested structure, and FSMs don't recognise arbitrarily nested structures. (vii) Variables. o - Island chain local variable bindings will come into existence. These bindings depend on the island point is in. There will be lower level routines that will have "position" parameters as an alternative to using point. o - All variables which are currently buffer local will become chain local except for those whose symbols are given a non-nil `entire-buffer' property. There will be no new functions like `make-chain-local-variable'. o - When the `entire-buffer' property is nil, the buffer local binding of a variable will hold the value pertinent to the areas of the buffer outside of islands. When that property is non-nil, the binding holds the value for the entire buffer. o - When `in-islands' is nil, the chain local mechanism described here is not used - instead the familiar buffer local binding is used. o - The current binding for a local variable will be the chain local binding of the island chain of the island containing point. If point is not in an island, the buffer local binding is current. o - If a chain local binding is current, and its value is unbound, the binding of an enclosing scope is NOT used in its place. Probably the variable's default-value should be used when reading. o - In buffer.h, a new macro CVAR ("island chain variable") analogous to BVAR will be introduced. It will use BVAR as a fall back. Most invocations of BVAR will be changed to CVAR. o - In data.c, the mechanism for accessing local variable bindings (e.g. `swap_in_symval_forwarding') will be enhanced to test `in-islands' and handle chain local bindings appropriately. (viii) Change hooks. o - There will be two additional abnormal hooks, `island-before-change-function' and `island-after-change-function', which will each hold a single function or nil. These will take the same parameters as `before-change-functions' and `after-change-functions' respectively. o - The return value of these functions will be an association list with members whose car is an island chain (or nil, meaning "outside all islands") and whose cdr is the list of parameters to supply to `before/after-change-functions for that chain. Usually, the alist will have just one member containing BEG, END, and for `after-..' OLD-LEN unchanged. o - After calling each of these functions, Emacs will invoke `before/after-change-functions' on each chain in the returned alist. This will be in place of the standard calls to `before/after-change-functions'. o - The intention of these hooks is that super modes will use them to detect the deletion and insertion of islands, and to do the "de-islandification" and "islandification" as needed. o - `before/after-change-functions' will be normal chain local variables. A chain local binding will hold functions for the individual chain. The buffer local binding will hold functions for the parts of the buffer outside of islands. (ix) Miscellaneous commands and functions. o - `point-min' and `point-max' will, when `in-islands' is non-nil, return the max/min point in the visible region in the same chain of islands as point. o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to the current island chain when `in-islands' is non-nil. o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in the current island chain (how?) when `in-islands' is non-nil. o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the Right Thing in island chains when `in-islands' is non-nil. o - New functions `island-min', `island-max', `island-chain-min' and `island-chain-max' will do what their names say. o - There will be no restrictions on the use of widening/narrowing, as have been proposed for other support engines for multiple major modes. o - New commands like `beginning-of-island', `narrow-to-island', etc. will be wanted. More difficultly, bindings for them will be needed. o - ??? Other commands to be amended. (x) Emacs subsystems and `in-islands'. o - Redisplay will bind `in-islands' to non-nil, but will successfully display all islands wholly or partially in windows being displayed. o - Font Lock will bind `in-islands' to non-nil, but will successfully fontify all pertinent islands. o - `island-before/after-change-function' will be called with `in-islands' nil. o - `before/after-change-functions' will be called with `in-islands' bound to non-nil. o - Major modes will need to bind `in-islands' to non-nil for such things as indentation. o - For normal user interaction, `in-islands' will be nil. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: A vision for multiple major modes: some design notes 2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie @ 2016-04-20 21:06 ` Drew Adams 2016-04-20 23:00 ` Drew Adams 2016-04-21 12:43 ` Alan Mackenzie 2016-04-20 22:27 ` Phillip Lord ` (3 subsequent siblings) 4 siblings, 2 replies; 45+ messages in thread From: Drew Adams @ 2016-04-20 21:06 UTC (permalink / raw) To: Alan Mackenzie, emacs-devel Sounds very good, a priori. And I commend you for actually putting together a clear and comprehensive design proposal for discussion, instead of just implementing something. Especially for something that is likely to lead to new uses and further possibilities, it is good to open up the big picture for discussion (regardless of the outcome). Some feedback, mostly minor - > * - For regexps which recognise whitespace, the regexp must contain > "\\s-" or "\\s " or "[[:space:]]" so that the regexp engine will > handle "foreign" islands and gaps between chained islands as whitespace. I understand the motivation (you explain it further on). But this hardcoding of what can constitute a "whitespace-matching" pattern seems a bit rigid. No way to flexibly allow for different meanings of whitespace here? What if some code wants to handle \n or \t or \f etc. differently, or to even treat some set of (normally non-whitespace) chars as if they too were whitespace for island purposes? > o - A @dfn{chain} of islands is a canonically ordered chain of islands in > a single buffer. Why limit it necessarily to a single buffer? It is common to want to do things (search etc.) across multiple buffers, and sometimes regardless of mode. That doesn't diminish just because one might want to use chains of non-contiguous text zones. I'm pretty sure I would want to be able do things throughout a chain that spans different buffers. If it were I, I would think about defining all that you are doing using a structure that is multi-buffer. [That is what I did for zones.el, for instance - sets of such text zones are delimited by markers, which automatically record the buffer they pertain to. And they can be persistent, as well. Have you considered the possibility of persisting island chains?] And I would probably want user-level operations, to combine chains (append, intersect, union/coalesce, difference). And why not be able to do that for chains that cross buffers? Being able to add (e.g. append) a chain in one buffer to a chain in another buffer is one simple example. Anything you might want to do with one chain you will likely want to be able to do with a set of chains, or at least with a chain that results from composing a set of chains in various ways. Also, I'm guessing/hoping, but I'm not sure I saw this explicitly, that you can have multiple chains (e.g. in the same buffer) that use the same major mode. Being associated with a major mode is only one possible attribute of a chain - it is not required, and other attributes and uses of a chain are not dependent on it, right? IOW, it is not necessary to think of chains as mode-related - that is just one (albeit common) use & interpretation, right? > o - An island will be delimited in two complementary ways: > * - It will be enclosed syntactically by characters with > "open island" and "close island" syntax (see section (v)). > Both of these syntactic markers will include a flag "chain" > indicating whether there is a previous/next island in the > chain. The cdr of the syntax value will be > the island chain to which the island belongs. > * - It will be covered by the text property `island', whose > value will be the pertinent island or island chain Are both always required, or is either sufficient for most purposes? Is the syntax one needed only when you need to take advantage of it? Can you do most things using either, so that a given operation (that is not specific to only one of them, e.g. not specific to syntax) can be done regardless of which is available? I'm thinking that in many contexts I would not care about delimiting by syntax, and I might not even care about associating a given chain with a mode. Would I be able to use such chains nevertheless (e.g. search/replace across them)? > Note that if islands are enclosed inside other islands, Maybe you can elaborate on overlapping islands and chains? What caveats or use cases do you see? A priori, I would like to have a chain data structure, and as much of the rest of the features as possible, be available and manipulable from Lisp. Something like this has lots of enhancement possibilities and use cases that we are unlikely to imagine at the outset. Implementing more than an absolute minimum in C hampers that exploration and improvement. HTH. I don't claim to have grasped all of what you envisage. It's great food for thought, in any case. (I asked a couple of times, in the bug thread(s) and here, for just this sort of top-level picture of what was envisaged. I gave up hoping that someone might actually make clear what the question/project/plan is. This is a welcome, if unexpected, development.) ^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: A vision for multiple major modes: some design notes 2016-04-20 21:06 ` Drew Adams @ 2016-04-20 23:00 ` Drew Adams 2016-04-21 12:43 ` Alan Mackenzie 1 sibling, 0 replies; 45+ messages in thread From: Drew Adams @ 2016-04-20 23:00 UTC (permalink / raw) To: Alan Mackenzie, emacs-devel I said: > And I would probably want user-level operations, to > combine chains (append, intersect, union/coalesce, > difference). And complement - get a new chain as the complement of a chain, i.e., the islands of one are the non-islands of the other. You should easily be able to search etc. _outside_ the islands of a given chain. -- I did that for zones.el, for instance: zz-zones-complement is a Lisp function in `zones.el'. (zz-zones-complement ZONES &optional BEG END BUFFER) Return a list of zones that is the complement of ZONES, from BEG to END. ZONES is assumed to be a union, i.e., sorted by car, with no overlaps. Any extra info in a zone of ZONES, i.e., after the cadr, is ignored. (The bit about being a union and sorted by car just means that the list of zones must be like your chain of islands: no overlaps and ordered by buffer position. The bit about ignoring cadr has to do with the fact that a zone (~island) can contain other information, in addition to its limits.) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-20 21:06 ` Drew Adams 2016-04-20 23:00 ` Drew Adams @ 2016-04-21 12:43 ` Alan Mackenzie 2016-04-21 14:24 ` Stefan Monnier ` (3 more replies) 1 sibling, 4 replies; 45+ messages in thread From: Alan Mackenzie @ 2016-04-21 12:43 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel Hello, Drew. On Wed, Apr 20, 2016 at 02:06:37PM -0700, Drew Adams wrote: > Sounds very good, a priori. And I commend you for actually > putting together a clear and comprehensive design proposal > for discussion, instead of just implementing something. > Especially for something that is likely to lead to new uses > and further possibilities, it is good to open up the big > picture for discussion (regardless of the outcome). Thanks, that's appreciated. > Some feedback, mostly minor - ;-) > > * - For regexps which recognise whitespace, the regexp must contain > > "\\s-" or "\\s " or "[[:space:]]" so that the regexp engine will > > handle "foreign" islands and gaps between chained islands as whitespace. > I understand the motivation (you explain it further on). But this > hardcoding of what can constitute a "whitespace-matching" pattern > seems a bit rigid. No way to flexibly allow for different meanings > of whitespace here? What if some code wants to handle \n or \t or > \f etc. differently, or to even treat some set of (normally > non-whitespace) chars as if they too were whitespace for island > purposes? This is a good point. Maybe it would be better to match an island or the gap between two chained islands with any regexp element which matches the space (the good old 0x20 character). > > o - A @dfn{chain} of islands is a canonically ordered chain of islands in > > a single buffer. > Why limit it necessarily to a single buffer? It is common to > want to do things (search etc.) across multiple buffers, and > sometimes regardless of mode. That doesn't diminish just > because one might want to use chains of non-contiguous text > zones. Why limit it? A buffer is a natural unit of editing. The island chain concept is primarily to allow different regions of a buffer to have different major modes, whilst minimising ugly workarounds, artificial restrictions, and so on. > I'm pretty sure I would want to be able do things throughout > a chain that spans different buffers. If it were I, I would > think about defining all that you are doing using a structure > that is multi-buffer. I don't envisage that the island chains will really be that useful for (user initiated) searching, etc. The idea is that, to the user, such a buffer will look much like it already does, except that the font locking will be appropriate for each island, the major mode key map will be right for each island, and so on. > [That is what I did for zones.el, for instance - sets of such > text zones are delimited by markers, which automatically record > the buffer they pertain to. And they can be persistent, as well. > Have you considered the possibility of persisting island chains?] > And I would probably want user-level operations, to combine > chains (append, intersect, union/coalesce, difference). > And why not be able to do that for chains that cross buffers? The chains will be disjoint, so intersection/difference wouldn't be useful. Given that the essential feature of a chain is its major mode, it wouldn't make sense to combine chains (which will usually have different major modes). I'm still trying to think through the idea of a chain having islands in several buffers. > Being able to add (e.g. append) a chain in one buffer to a chain > in another buffer is one simple example. Anything you might want > to do with one chain you will likely want to be able to do with > a set of chains, or at least with a chain that results from > composing a set of chains in various ways. > Also, I'm guessing/hoping, but I'm not sure I saw this explicitly, > that you can have multiple chains (e.g. in the same buffer) that > use the same major mode. Indeed, yes. > Being associated with a major mode is only one possible attribute of a > chain - it is not required, and other attributes and uses of a chain > are not dependent on it, right? IOW, it is not necessary to think of > chains as mode-related - that is just one (albeit common) use & > interpretation, right? Not right, sorry. The major mode is an essential attribute of an island chain. There will be a slot for it in the structure which holds chain data, just as there is currently a slot for it in the (C) buffer structure. There will likewise be slots for the syntax table, major mode key map, and so on. None of these slots would work well with a null value. > > o - An island will be delimited in two complementary ways: > > * - It will be enclosed syntactically by characters with > > "open island" and "close island" syntax (see section (v)). > > Both of these syntactic markers will include a flag "chain" > > indicating whether there is a previous/next island in the > > chain. The cdr of the syntax value will be > > the island chain to which the island belongs. > > * - It will be covered by the text property `island', whose > > value will be the pertinent island or island chain > Are both always required, or is either sufficient for most > purposes? Both are required, yes. They will both be used. > Is the syntax one needed only when you need to take advantage of it? > Can you do most things using either, so that a given operation (that > is not specific to only one of them, e.g. not specific to syntax) can > be done regardless of which is available? Primarily, the text property is to allow the chain local variable mechanism quickly to find the correct chain for accessing the variables from. There is a worry that the extra cost of accessing this text property may slow Emacs down excessively. There will probably have to be some sort of cacheing of the current island. > I'm thinking that in many contexts I would not care about > delimiting by syntax, and I might not even care about > associating a given chain with a mode. Would I be able to > use such chains nevertheless (e.g. search/replace across them)? I'm not sure this island mechanism is the right tool for doing what you're suggesting. For searching/replacing at the user level, some extra option meaning "only in the current chain" would need to be added to the user interface. > > Note that if islands are enclosed inside other islands, > Maybe you can elaborate on overlapping islands and chains? > What caveats or use cases do you see? Islands would not be permitted (not sure how at this stage) to "overlap" eachother. Two islands must either be disjoint, or one completely contain the other. The major mode for any position would be that of the "innermost" current island. > A priori, I would like to have a chain data structure, and > as much of the rest of the features as possible, be available > and manipulable from Lisp. Something like this has lots of > enhancement possibilities and use cases that we are unlikely > to imagine at the outset. Implementing more than an absolute > minimum in C hampers that exploration and improvement. One idea would be to implement a chain feature, one of whose uses would be the major mode islands I've been trying to specify. A significant part of this would have to be implemented at the C level for speed - chain local variables are already going to be slower to access than buffer local variables. We must keep that difference to a minimum. > HTH. I don't claim to have grasped all of what you envisage. > It's great food for thought, in any case. > (I asked a couple of times, in the bug thread(s) and here, > for just this sort of top-level picture of what was envisaged. > I gave up hoping that someone might actually make clear what > the question/project/plan is. This is a welcome, if unexpected, > development.) Thanks! -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 12:43 ` Alan Mackenzie @ 2016-04-21 14:24 ` Stefan Monnier 2016-04-23 2:20 ` zhanghj 2016-04-23 22:36 ` Dmitry Gutov 2016-04-21 16:05 ` Drew Adams ` (2 subsequent siblings) 3 siblings, 2 replies; 45+ messages in thread From: Stefan Monnier @ 2016-04-21 14:24 UTC (permalink / raw) To: emacs-devel I haven't kept up with this discussion, but I think it'd worthwhile taking a look at what things like SublimeText do for syntax highlighting, because it's a lot more powerful than what font-lock does (IOW it lets you define contexts and is hence closer to a parser whereas font-lock is closer to a lexer), and it might be an interesting starting point for multiple major modes. I think font-lock is old and deserves a replacement. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 14:24 ` Stefan Monnier @ 2016-04-23 2:20 ` zhanghj 2016-04-23 22:36 ` Dmitry Gutov 1 sibling, 0 replies; 45+ messages in thread From: zhanghj @ 2016-04-23 2:20 UTC (permalink / raw) To: Stefan Monnier; +Cc: netjune, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: > I haven't kept up with this discussion, but I think it'd worthwhile > taking a look at what things like SublimeText do for syntax > highlighting, because it's a lot more powerful than what font-lock does > (IOW it lets you define contexts and is hence closer to a parser whereas > font-lock is closer to a lexer), and it might be an interesting starting > point for multiple major modes. > > I think font-lock is old and deserves a replacement. > > > Stefan Yes. It can also do symbol indexing (like imenu in emacs, or ctags tools) based on the syntax files. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 14:24 ` Stefan Monnier 2016-04-23 2:20 ` zhanghj @ 2016-04-23 22:36 ` Dmitry Gutov 1 sibling, 0 replies; 45+ messages in thread From: Dmitry Gutov @ 2016-04-23 22:36 UTC (permalink / raw) To: Stefan Monnier, emacs-devel On 04/21/2016 05:24 PM, Stefan Monnier wrote: > I haven't kept up with this discussion, but I think it'd worthwhile > taking a look at what things like SublimeText do for syntax > highlighting, because it's a lot more powerful than what font-lock does > (IOW it lets you define contexts and is hence closer to a parser whereas > font-lock is closer to a lexer), and it might be an interesting starting > point for multiple major modes. Indeed. If anyone's interested, here's some documentation: https://www.sublimetext.com/docs/3/syntax.html Apparently, Sublime, TextMate, Atom and even Vim all use this or similar approaches. If it were somehow adopted for Emacs (plenty of details would need to be worked out), it would allow describing more complex grammars, and e.g. support a related Ruby feature that I'm having difficulty implementing right now. Further, if it provides a different mechanism of syntactic parsing, it could be an alternative to using islands to make parse-partial-sexp skip over "foreign" regions. Although, unless we're going to change how we write indentation code, we'd still need to be able to compute the current paren nesting. Ultimately, the new way of defining a grammar could also be a way to define and apply "island" boundaries automatically, without the need for third-party code. Where it's less likely to help, though, is with being able to combine and reuse settings and code from multiple major modes in one file. For anything like that to happen, the syntax definitions would have to be using a format that's highly composable, at least. I'm not sure I'm seeing that in any of the current grammars in the aforementioned editors. And the current way to combine the functionality from different languages is to call different major mode functions and switch between sets of buffer-local variables. Not sure what's the alternative for that. ^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: A vision for multiple major modes: some design notes 2016-04-21 12:43 ` Alan Mackenzie 2016-04-21 14:24 ` Stefan Monnier @ 2016-04-21 16:05 ` Drew Adams 2016-04-21 16:31 ` Eli Zaretskii [not found] ` <<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default> [not found] ` <<<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default> 3 siblings, 1 reply; 45+ messages in thread From: Drew Adams @ 2016-04-21 16:05 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel > This is a good point. Maybe it would be better to match an island or > the gap between two chained islands with any regexp element which > matches the space (the good old 0x20 character). See also Eli's feedback about this. I think I agree with him that trying to repurpose whitespace matching for this is maybe the best approach. A separate matching should perhaps be used - nothing to do with whitespace per se, even if the matching used might take whitespace (also) into account. > > I'm pretty sure I would want to be able do things throughout > > a chain that spans different buffers. If it were I, I would > > think about defining all that you are doing using a structure > > that is multi-buffer. > > I don't envisage that the island chains will really be that useful for > (user initiated) searching, etc. The idea is that, to the user, such a > buffer will look much like it already does, except that the font locking > will be appropriate for each island, the major mode key map will be > right for each island, and so on. I see it differently. I think you see it that way because for you the major mode thing is an essential part of the feature you want to implement - it is primary. To me, chains of islands should be the primary, and a very general, thing, and one (important) use of them would be to apply a mode to them ("multi-modes"). IOW, I see (lots of) possible uses for chains of islands that go beyond (i.e., do not necessarily involve) the application of a particular mode to them. And in the general case I see no reason to limit chains to a single buffer. That doesn't mean that there wouldn't be important cases that do limit the use to either (a) applying a given major mode or (b) a single buffer. I just don't see why we would build such limits into the design (i.e., hardcoded, making it hard to extend to either (a) mode-agnostic or (b) multi-buffer). > > [That is what I did for zones.el, for instance - sets of such > > text zones are delimited by markers, which automatically record > > the buffer they pertain to. And they can be persistent, as well. > > Have you considered the possibility of persisting island chains?] Persistence? > > And I would probably want user-level operations, to combine > > chains (append, intersect, union/coalesce, difference). > > And why not be able to do that for chains that cross buffers? > > The chains will be disjoint, so intersection/difference wouldn't be > useful. I understand that the islands in a chain would be disjoint. But why would chains necessarily be disjoint? Why shouldn't chains be independent (at least be able to be independent)? Why would defining one chain impose limits on defining other chains (any new chains would need to be disjoint from existing ones)? See above, regarding the utility of being able to ignore a chain's mode for certain operations (and the ability for a chain to not even have an associated mode). I suspect that you are not seeing the use cases I am, which involve doing all kinds of things to/with the text in a chain of islands. As Eli suggested, think of a chain of islands as an extension of narrowing. Now think of the many different kinds of things you (or code) do to a narrowed region. This should be a more general feature, I think, than what is available in something like MuMaMo or mmm. "Multi-modes" is a subcase. Again, I see a chain of (ordered) text regions as the primary, general feature, and the mapping (restriction) of a major mode to such a chain as a subsidiary feature. > Given that the essential feature of a chain is its major mode, That is where we differ, and that explains, I think, the narrower focus you have. I wouldn't limit the feature to being coupled to a mode. That should be a possibility but not a requirement. > it wouldn't make sense to combine chains (which will usually > have different major modes). It would make sense, depending on what kind of operation you wanted to apply to the text in chains. And chains with the same mode could also be combined, whether in the same buffer or not. > I'm still trying to think through the idea of a > chain having islands in several buffers. Think of the chains first as just buffer narrowings that are multi-region, i.e., ignoring all the syntax and major-mode features that you are thinking about. (You can still think of those, but they come in at a different level - a specific subfeature or set of use cases.) > > Being able to add (e.g. append) a chain in one buffer to a chain > > in another buffer is one simple example. Anything you might want > > to do with one chain you will likely want to be able to do with > > a set of chains, or at least with a chain that results from > > composing a set of chains in various ways. > > > Also, I'm guessing/hoping, but I'm not sure I saw this explicitly, > > that you can have multiple chains (e.g. in the same buffer) that > > use the same major mode. > > Indeed, yes. > > > Being associated with a major mode is only one possible attribute of a > > chain - it is not required, and other attributes and uses of a chain > > are not dependent on it, right? IOW, it is not necessary to think of > > chains as mode-related - that is just one (albeit common) use & > > interpretation, right? > > Not right, sorry. The major mode is an essential attribute of an > island chain. Why? What's necessarily essential about it? That's a design choice, no? Would you consider dropping it as a requirement and keeping it as an option (for any given chain)? > There will be a slot for it in the structure which holds chain > data, just as there is currently a slot for it in the (C) buffer > structure. Must the slot be filled? Always? (Why?) > There will likewise be slots for the syntax table, major > mode key map, and so on. None of these slots would work well with a > null value. Why not optional? Of course if such a slot is not used then it, and anything that depends on it, would not "work well". But that should not prevent other, non-mode-related uses of a chain from working OK. > > > o - An island will be delimited in two complementary ways: > > > * - It will be enclosed syntactically by characters with > > > "open island" and "close island" syntax (see section (v)). > > > Both of these syntactic markers will include a flag "chain" > > > indicating whether there is a previous/next island in the > > > chain. The cdr of the syntax value will be > > > the island chain to which the island belongs. > > > * - It will be covered by the text property `island', whose > > > value will be the pertinent island or island chain > > > Are both always required, or is either sufficient for most > > purposes? > > Both are required, yes. They will both be used. Why required? Why can't the design tolerate not having syntax-based delimiting? I would prefer to see what you're envisaging placed within the context of a more general feature. I see 3 possible levels, in fact: 1. Arbitrary sets of text zones. Not necessarily ordered (e.g. by buffer position). Not necessarily without overlap. 2. #1, but as chains: ordered, non-overlapping. 3. #2, but with an associated major mode per chain. This is essentially what you have in mind, I think. For all 3 levels I can see use cases for chains that cross buffers and use cases for chain-combining operations. I can also imagine using some chain-local variables that are not buffer-specific or mode-specific. (You already allow for that, IIUC.) > > I'm thinking that in many contexts I would not care about > > delimiting by syntax, and I might not even care about > > associating a given chain with a mode. Would I be able to > > use such chains nevertheless (e.g. search/replace across them)? > > I'm not sure this island mechanism is the right tool for doing what > you're suggesting. Depends on what it ends up being. ;-) > For searching/replacing at the user level, some > extra option meaning "only in the current chain" would need to be > added to the user interface. FWIW, I've done this for arbitrary sets of zones (including across buffers). The code is in `isearch-prop.el' (which depends on `zones.el' for this feature). Also, wrt "the current chain": You might want to look at the zones.el code for the use of variables (which can be buffer-local, but need not be) that hold sets of zones (including sets that are "chains") - how users can create them, choose among them, clone them, persist them, etc. > > A priori, I would like to have a chain data structure, and > > as much of the rest of the features as possible, be available > > and manipulable from Lisp. Something like this has lots of > > enhancement possibilities and use cases that we are unlikely > > to imagine at the outset. Implementing more than an absolute > > minimum in C hampers that exploration and improvement. > > One idea would be to implement a chain feature, one of whose uses would > be the major mode islands I've been trying to specify. That's what I've been trying to suggest: chains of zones are more general than the feature you've described. That doesn't take away from the importance of the use case you have in mind. > A significant > part of this would have to be implemented at the C level for speed - > chain local variables are already going to be slower to access than > buffer local variables. We must keep that difference to a minimum. I have no problem with stuff being in C for performance reasons. When that is not critical, keeping stuff in Lisp is good. Especially for a new and very general feature: let folks play with it and experiment with new possibilities. We can later optimize any parts we like. We should avoid doing that prematurely, as always - but especially for Emacs, where Lisp enhancement by users is really the name of the game. Thanks again for opening this discussion and providing a detailed first proposal. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 16:05 ` Drew Adams @ 2016-04-21 16:31 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2016-04-21 16:31 UTC (permalink / raw) To: Drew Adams; +Cc: acm, emacs-devel > Date: Thu, 21 Apr 2016 09:05:23 -0700 (PDT) > From: Drew Adams <drew.adams@oracle.com> > Cc: emacs-devel@gnu.org > > I have no problem with stuff being in C for performance reasons. > When that is not critical, keeping stuff in Lisp is good. > > Especially for a new and very general feature: let folks play > with it and experiment with new possibilities. We can later > optimize any parts we like. The parts that affect redisplay must at least partially be in C, because there's no existing infrastructure that I'm aware of that can be piggy-backed to do this kind of stuff. ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default>]
[parent not found: <<83oa926i0e.fsf@gnu.org>]
* RE: A vision for multiple major modes: some design notes [not found] ` <<83oa926i0e.fsf@gnu.org> @ 2016-04-21 16:59 ` Drew Adams 2016-04-21 19:55 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Drew Adams @ 2016-04-21 16:59 UTC (permalink / raw) To: Eli Zaretskii, Drew Adams; +Cc: acm, emacs-devel > > I have no problem with stuff being in C for performance reasons. > > When that is not critical, keeping stuff in Lisp is good. > > > > Especially for a new and very general feature: let folks play > > with it and experiment with new possibilities. We can later > > optimize any parts we like. > > The parts that affect redisplay must at least partially be in C, > because there's no existing infrastructure that I'm aware of that > can be piggy-backed to do this kind of stuff. Anything that must be in C must be in C, of course. ;-) But just what does "the parts that affect redisplay" mean? If we mean parts that need to do something particular wrt redisplay, then yes, that makes sense. If we mean also some parts that would just be faster if done in C then maybe, or maybe not. You mentioned earlier that redisplay needs to access buffer-local variables as it moves through the buffer. And you said that redisplay needs to get the right values of such variables. But for some island-chain operations, e.g. some that I'm thinking of that do not care about the mode of a chain or whether it even has a mode, I don't see why redisplay would need to do anything special. No, I don't claim to understand this. I'll stick with agreeing that if there is an effect from this feature on redisplay, or if redisplay affects this feature somehow, and if that means that some bits of the feature must be implemented in C, that's fine. I would just prefer that we not go overboard wrt a C implementation, just because we can or because something might be faster in C. I'd just as soon have a general, open, and easy-to-modify-&-extend feature at the outset, and worry later about optimizing bits of it that are important to optimize. From my point of view, a feature such as this opens new possibilities that are ripe for exploration. And that shouts, "Lisp, please!". Anyway, not knowing anything about this part of things, I'll shut up about C vs Lisp, at least for now. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 16:59 ` Drew Adams @ 2016-04-21 19:55 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2016-04-21 19:55 UTC (permalink / raw) To: Drew Adams; +Cc: acm, emacs-devel > Date: Thu, 21 Apr 2016 09:59:02 -0700 (PDT) > From: Drew Adams <drew.adams@oracle.com> > Cc: acm@muc.de, emacs-devel@gnu.org > > But just what does "the parts that affect redisplay" mean? > If we mean parts that need to do something particular wrt > redisplay, then yes, that makes sense. I mean the part that is needed for redisplay to behave in each island according to user expectations. For example, imagine that a mode that is relevant to a certain island chain sets up face-remapping-alist in some particular way -- when redisplay does its job, it repeatedly consults this variable when it needs to compute faces. I'm saying that the part of the changes for this feature that affects redisplay will have to arrange for recalculation of the value of face-remapping-alist when the display engine gets to examining the portion of buffer text that belongs to this island chain. Since the position where the display engine processes is not visible to Lisp, this arrangement will have to be in C. And similarly with any other variable whose value the display engine accesses from its C code, like standard-display-table, for example. > You mentioned earlier that redisplay needs to access > buffer-local variables as it moves through the buffer. > And you said that redisplay needs to get the right values > of such variables. > > But for some island-chain operations, e.g. some that I'm > thinking of that do not care about the mode of a chain > or whether it even has a mode, I don't see why redisplay > would need to do anything special. This could be so in some particular use cases, but it's not so in general. Modes do affect the way text is displayed. Besides, Alan says that "most" buffer-local variables will become island-chain local. If we believe him, then your use cases you mention above are lucky exceptions rather than the rule. ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <<<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default>]
[parent not found: <<<83oa926i0e.fsf@gnu.org>]
[parent not found: <<791d74d1-2b1d-4304-8e7e-d6c31af7aa41@default>]
[parent not found: <<83eg9y68jy.fsf@gnu.org>]
* RE: A vision for multiple major modes: some design notes [not found] ` <<83eg9y68jy.fsf@gnu.org> @ 2016-04-21 20:26 ` Drew Adams 0 siblings, 0 replies; 45+ messages in thread From: Drew Adams @ 2016-04-21 20:26 UTC (permalink / raw) To: Eli Zaretskii, Drew Adams; +Cc: acm, emacs-devel > > But just what does "the parts that affect redisplay" mean? > > If we mean parts that need to do something particular wrt > > redisplay, then yes, that makes sense. > > I mean the part that is needed for redisplay to behave in each island > according to user expectations. For example, imagine that a mode that > is relevant to a certain island chain sets up face-remapping-alist in > some particular way -- when redisplay does its job, it repeatedly > consults this variable when it needs to compute faces. I'm saying > that the part of the changes for this feature that affects redisplay > will have to arrange for recalculation of the value of > face-remapping-alist when the display engine gets to examining the > portion of buffer text that belongs to this island chain. Since the > position where the display engine processes is not visible to Lisp, > this arrangement will have to be in C. And similarly with any other > variable whose value the display engine accesses from its C code, like > standard-display-table, for example. Thanks for the example. That's the kind of thing I thought you had in mind. > > You mentioned earlier that redisplay needs to access > > buffer-local variables as it moves through the buffer. > > And you said that redisplay needs to get the right values > > of such variables. > > > > But for some island-chain operations, e.g. some that I'm > > thinking of that do not care about the mode of a chain > > or whether it even has a mode, I don't see why redisplay > > would need to do anything special. > > This could be so in some particular use cases, but it's not > so in general. Depends on what one means by "in general". ;-) To me, having a different mode associated with a chain is a special case of either having such a mode or not having one. Likewise, for having chain-local variables or not. Both having and not having are special cases of "in general". > Modes do affect the way text is displayed. Yes. But if a chain does not use a mode that is different from the buffer's mode, then there should be no special mode-specific handling needed for it. > Besides, Alan says that "most" buffer-local variables will > become island-chain local. If we believe him, then your > use cases you mention above are lucky exceptions rather > than the rule. I don't see them as either lucky exceptions or the rule. I imagine that there are lots of possible uses of a chain of islands of text, some of which involve a different mode or in some other way involve different display possibilities, and some of which do not. From the point of view of C code (e.g. redisplay) modification, the latter use cases would I guess be lucky (little or nothing new to do). That doesn't mean they would be exceptional (rare) in terms of user use cases. (Dunno know whether they would be.) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie 2016-04-20 21:06 ` Drew Adams @ 2016-04-20 22:27 ` Phillip Lord 2016-04-21 9:14 ` Alan Mackenzie 2016-04-21 14:17 ` Eli Zaretskii ` (2 subsequent siblings) 4 siblings, 1 reply; 45+ messages in thread From: Phillip Lord @ 2016-04-20 22:27 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Dmitry Gutov, emacs-devel A few comments, rather than an in-depth analysis, am afraid. Alan Mackenzie <acm@muc.de> writes: > (iv) Islands. > o - An island will be delimited in two complementary ways: > * - It will be enclosed syntactically by characters with "open island" and > "close island" syntax (see section (v)). Both of these syntactic > markers will include a flag "chain" indicating whether there is a > previous/next island in the chain. The cdr of the syntax value will be > the island chain to which the island belongs. > * - It will be covered by the text property `island', whose value will be > the pertinent island or island chain (see section (ii)) (not yet > decided). Note that if islands are enclosed inside other islands, the > value is the innermost island. There is the possibility of using an > interval tree independent of the one for text properties to increase > performance. When you say "complementary" do you mean alternative or simultaneous? I.e. will an island always be enclosed by syntax markers and always have a text property. Or can it have either? I'm still not understanding how the chain of islands is set up. Is this entirely the super modes responsibility? The use of "syntax" suggests that the islands can be detected *purely* syntactically. But, there are many places where this is not true: consider org-mode: #+begin_src emacs-lisp (message "hello world") #+end_src We cannot assume that "+end_src" is the end of a island. Also, how will the regexp engine work when it spans an island? I ask because, if we use the regexp engine to match delimiters, the which syntax do we use, if there are multiple modes in the buffer. > o - An island might be represented by a C or Lisp structure, it might not > (not yet decided). This structure would hold the containing chain, > markers pointing to the start and end of the chain, and the previous and > next islands in the chain. > > (v) Syntax, etc. > o - Two new syntax classes, "open island" and "close island" will be > introduced. These will be designated by the characters "{" and "}". Their > "matching character" slots will contain the island's chain. There will be > an extra flag "chain" (denoted by "i") indicating whether there is a > previous/next island in the chain. > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as > whitespace, much as they do comments. They will also treat as whitespace > the gap between two islands in a chain. Difficult to say, but this might produce some counter intuitive behaviour. So, for example, consider some text like so: === Example (here is some lisp) ;; This is a long and tedious piece of documentation in my lisp program. (here is some more lisp) === End Example Now moving backward a paragraph will have a significant difference in behaviour -- on the "(" of "here is some more lisp", we move to "(here is some lisp), while on the char before, we move the "This is a long". Good, bad, expected? Don't know. > o - The (currently 11 element) parser state will be enhanced to support > islands as follows: > * - A twelfth element will be introduced. This will contain an > association list whose elements will have the form (island-chain > . 12-element parse state); each element will contain the suspended state > of parsing in the island chain which is the car of the element. An > element with a car of nil will represent the suspended parsing state of > the buffer outside of islands. > * - Elements 12, 13, .... will be island chains of the enclosing islands, > elt 12 being that of the innermost enclosing island, etc. An element > with a value of nil indicates being outside all islands. > o - `parse-partial-sexp' will create and use an enhanced parser state as > described above. Note that a two character construct (such as a C comment > opener) can not enclose an island, and special handling will be required > to exclude this. The syntax table in use will change as the current > position passes between islands. > o - `syntax-ppss' will do the right thing with the extended parser state. > Alternatively, `syntax-ppss' will have an independent 12-element state in > each island chain, where elt. 11 is always nil. Its cache mechanism will > be enhanced such that buffer changes outside of an island chain need not > invalidate the stored cache pertaining to the chain. > o - The facilities in this section are active even when `in-islands' is > nil. > > (vi) Regexps. > o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ", > and "[[:space:]] will match an entire island. > o - The gap between two islands in a chain will also be matched by the above > regexps. > o - This treatment of an island, and a gap between two islands, as WS will > occur only when `in-islands' is non-nil. > o - When `in-islands' is nil, there will be no reliable way of scanning over > an island by regexps, since it is a potentially nested structure, and FSMs > don't recognise arbitrarily nested structures. > > (vii) Variables. > o - Island chain local variable bindings will come into existence. These > bindings depend on the island point is in. There will be lower level > routines that will have "position" parameters as an alternative to using > point. > o - All variables which are currently buffer local will become chain local > except for those whose symbols are given a non-nil `entire-buffer' > property. There will be no new functions like > `make-chain-local-variable'. What is the default-value of a chain local variable, if the variable is also buffer-local? Will we need functions for setting all chains in a certain mode in a single buffer? Phil ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-20 22:27 ` Phillip Lord @ 2016-04-21 9:14 ` Alan Mackenzie 2016-04-22 12:45 ` Phillip Lord 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-04-21 9:14 UTC (permalink / raw) To: Phillip Lord; +Cc: emacs-devel, Dmitry Gutov Hello, Phillip On Wed, Apr 20, 2016 at 11:27:34PM +0100, Phillip Lord wrote: > A few comments, rather than an in-depth analysis, am afraid. Thanks! > Alan Mackenzie <acm@muc.de> writes: > > (iv) Islands. > > o - An island will be delimited in two complementary ways: > > * - It will be enclosed syntactically by characters with "open island" and > > "close island" syntax (see section (v)). Both of these syntactic > > markers will include a flag "chain" indicating whether there is a > > previous/next island in the chain. The cdr of the syntax value will be > > the island chain to which the island belongs. > > * - It will be covered by the text property `island', whose value will be > > the pertinent island or island chain (see section (ii)) (not yet > > decided). Note that if islands are enclosed inside other islands, the > > value is the innermost island. There is the possibility of using an > > interval tree independent of the one for text properties to increase > > performance. > When you say "complementary" do you mean alternative or simultaneous? > I.e. will an island always be enclosed by syntax markers and always have > a text property. Or can it have either? Sorry, that wasn't very clear. It would always have both. The text property would enable the code for chain local variables quickly to determine the current chain. The syntactic markers would enable efficient scanning by parse-partial-sexp, etc. > I'm still not understanding how the chain of islands is set up. Is this > entirely the super modes responsibility? Yes, it would be entirely for the super mode to do. There would be a set of functions to do this, for example: (defun create-island-chain (beg end major-mode ...) ...) (where BEG and END would be the bounds of the first island in the chain). (defun add-island-to-chain (chain beg end ...) ...) (which would delimit (BEG END) as an island, and link it into CHAIN) There would also be functions for removing islands from a chain, etc. I should really have put this into the notes. Thanks! > The use of "syntax" suggests that the islands can be detected *purely* > syntactically. No. It would be up to the super mode to determine them (however is appropriate), then to call, e.g., `create-island-chain' and `add-island-to-chain'. > But, there are many places where this is not true: consider org-mode: > #+begin_src emacs-lisp > (message "hello world") > #+end_src > We cannot assume that "+end_src" is the end of a island. > Also, how will the regexp engine work when it spans an island? I ask > because, if we use the regexp engine to match delimiters, the which > syntax do we use, if there are multiple modes in the buffer. I imagine that the island-start/end syntactic markers would normally be set by the super mode as syntax-table text properties. These always take priority over whatever the current syntax table would say. These markers would be considered to be in the enclosing scope, not part of the island they define. The current syntax table would always be that of the island the current position was in. I suppose there is potential for an island to be recognised as such in the "enclosing scope", but not in the island itself. This could be mitigated against by warning super mode programmers to use island-start/end syntaxes ONLY in syntax-table text properties. The actual matching of an island to "\\s-" would be delegated to the syntax code (as is currently done for "\\s?" expressions). > > o - An island might be represented by a C or Lisp structure, it might not > > (not yet decided). This structure would hold the containing chain, > > markers pointing to the start and end of the chain, and the previous and > > next islands in the chain. > > > > (v) Syntax, etc. > > o - Two new syntax classes, "open island" and "close island" will be > > introduced. These will be designated by the characters "{" and "}". Their > > "matching character" slots will contain the island's chain. There will be > > an extra flag "chain" (denoted by "i") indicating whether there is a > > previous/next island in the chain. > > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as > > whitespace, much as they do comments. They will also treat as whitespace > > the gap between two islands in a chain. > Difficult to say, but this might produce some counter intuitive > behaviour. So, for example, consider some text like so: > === Example > (here is some lisp) > ;; This is a long and tedious piece of documentation in my lisp program. > (here is some more lisp) > === End Example > Now moving backward a paragraph will have a significant difference in > behaviour -- on the "(" of "here is some more lisp", we move to "(here > is some lisp), while on the char before, we move the "This is a long". > Good, bad, expected? Don't know. Assuming that the comment is set up as an island inside the lisp code (which might not be the Right Thing to do) .... As a user action, moving back that paragraph would move from "(here is some more lisp)" to ";; This is a long ....", since `in-islands' would be nil during command processing. As part of a program's parsing, `in-islands' would be bound to non-nil, and backward-paragraph would move from "(here is some more lisp)" to "(here is some lisp)". This is the intended processing. [ .... ] > > (vii) Variables. > > o - Island chain local variable bindings will come into existence. These > > bindings depend on the island point is in. There will be lower level > > routines that will have "position" parameters as an alternative to using > > point. > > o - All variables which are currently buffer local will become chain local > > except for those whose symbols are given a non-nil `entire-buffer' > > property. There will be no new functions like > > `make-chain-local-variable'. > What is the default-value of a chain local variable, if the variable is > also buffer-local? This would be the (global) default value of the variable. It would not be the buffer-local value. The intention is that the buffer-local value is the value for the portions of the buffer which are not in any islands. > Will we need functions for setting all chains in a certain mode in a > single buffer? I'm not sure what you mean, here. Does "in a certain mode" mean "INTO a certain mode"? If so, setting a major or minor mode in a chain will be able to be done by putting point inside a pertinent island and calling the mode function. Maybe a new function `mapchains' could be useful. > Phil -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 9:14 ` Alan Mackenzie @ 2016-04-22 12:45 ` Phillip Lord 0 siblings, 0 replies; 45+ messages in thread From: Phillip Lord @ 2016-04-22 12:45 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel, Dmitry Gutov Alan Mackenzie <acm@muc.de> writes: >> When you say "complementary" do you mean alternative or simultaneous? >> I.e. will an island always be enclosed by syntax markers and always have >> a text property. Or can it have either? > > Sorry, that wasn't very clear. It would always have both. The text > property would enable the code for chain local variables quickly to > determine the current chain. The syntactic markers would enable > efficient scanning by parse-partial-sexp, etc. > >> I'm still not understanding how the chain of islands is set up. Is this >> entirely the super modes responsibility? > > Yes, it would be entirely for the super mode to do. There would be a > set of functions to do this, for example: > > (defun create-island-chain (beg end major-mode ...) ...) (where BEG > and END would be the bounds of the first island in the chain). > > (defun add-island-to-chain (chain beg end ...) ...) (which would > delimit (BEG END) as an island, and link it into CHAIN) > > There would also be functions for removing islands from a chain, etc. I > should really have put this into the notes. Thanks! I think that you would need some good utility functions, and call backs to support this though. Say, I have a mode with some syntactic markers identifing islands. Who has the job of checking that the islands are still the same? I had this problem with "lentic" -- and it's hard work. You need to hook into the various change functions, and sometimes rescan the entire buffer. Using the change functions is a PITA anyway, and easy to get wrong. And, avoiding scanning the whole buffer after every change is good to avoid. Font-lock avoids this, for instance, by getting core Emacs to tell each mode when to re-fontifify different regions. I think you would need something similar. >> Also, how will the regexp engine work when it spans an island? I ask >> because, if we use the regexp engine to match delimiters, the which >> syntax do we use, if there are multiple modes in the buffer. > > I imagine that the island-start/end syntactic markers would normally be > set by the super mode as syntax-table text properties. These always > take priority over whatever the current syntax table would say. These > markers would be considered to be in the enclosing scope, not part of > the island they define. > > The current syntax table would always be that of the island the current > position was in. I suppose there is potential for an island to be > recognised as such in the "enclosing scope", but not in the island > itself. This could be mitigated against by warning super mode > programmers to use island-start/end syntaxes ONLY in syntax-table text > properties. > > The actual matching of an island to "\\s-" would be delegated to the > syntax code (as is currently done for "\\s?" expressions). I am worried about syntax codes in general. Say, we have a syntax like #+begin_src lisp #+end_src Whether "_" is a symbol constituent or not is mode specific. Say, we have a buffer with mixed org-mode and lisp. The regexp we need to identify #+end_src will depend on the mode of the buffer that #+end_src is in. That is the point of course. But, if you are using #+end_src to delineate islands in the first place, then what mode the text is in rather indeterminate -- you cannot guarantee that the islands are in the correct place yet, because this is why you are looking for #+end_src markers. So, you have to build a regexp which does not use char classes which differ between modes. For this to work, I think, you need to be able to say to regexp functions "ignore islands". Binding "in-islands" to nil might work I guess. > >> > o - An island might be represented by a C or Lisp structure, it might not >> > (not yet decided). This structure would hold the containing chain, >> > markers pointing to the start and end of the chain, and the previous and >> > next islands in the chain. >> > >> > (v) Syntax, etc. >> > o - Two new syntax classes, "open island" and "close island" will be >> > introduced. These will be designated by the characters "{" and "}". Their >> > "matching character" slots will contain the island's chain. There will be >> > an extra flag "chain" (denoted by "i") indicating whether there is a >> > previous/next island in the chain. >> > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as >> > whitespace, much as they do comments. They will also treat as whitespace >> > the gap between two islands in a chain. > >> Difficult to say, but this might produce some counter intuitive >> behaviour. So, for example, consider some text like so: > >> === Example > >> (here is some lisp) > > >> ;; This is a long and tedious piece of documentation in my lisp program. >> (here is some more lisp) > > >> === End Example > >> Now moving backward a paragraph will have a significant difference in >> behaviour -- on the "(" of "here is some more lisp", we move to "(here >> is some lisp), while on the char before, we move the "This is a long". >> Good, bad, expected? Don't know. > > Assuming that the comment is set up as an island inside the lisp code > (which might not be the Right Thing to do) .... > > As a user action, moving back that paragraph would move from "(here is > some more lisp)" to ";; This is a long ....", since `in-islands' would > be nil during command processing. > > As part of a program's parsing, `in-islands' would be bound to non-nil, > and backward-paragraph would move from "(here is some more lisp)" to > "(here is some lisp)". > > This is the intended processing. > > [ .... ] > >> > (vii) Variables. >> > o - Island chain local variable bindings will come into existence. These >> > bindings depend on the island point is in. There will be lower level >> > routines that will have "position" parameters as an alternative to using >> > point. >> > o - All variables which are currently buffer local will become chain local >> > except for those whose symbols are given a non-nil `entire-buffer' >> > property. There will be no new functions like >> > `make-chain-local-variable'. > >> What is the default-value of a chain local variable, if the variable is >> also buffer-local? > > This would be the (global) default value of the variable. It would not > be the buffer-local value. The intention is that the buffer-local value > is the value for the portions of the buffer which are not in any > islands. > >> Will we need functions for setting all chains in a certain mode in a >> single buffer? > > I'm not sure what you mean, here. Does "in a certain mode" mean "INTO a > certain mode"? Oh. Say I have a buffer, half in clojure mode, half in markdown mode. I start cider which connects to a REPL. Currently, cider sets a buffer-local variable called something like "cider-connected-to-repl-p" to "t" to indicate the connection. But, now, we have an island local variable instead. But, surely, if one island is connected to a repl, then all the others should be as well. > If so, setting a major or minor mode in a chain will be > able to be done by putting point inside a pertinent island and calling > the mode function. Maybe a new function `mapchains' could be useful. Yep, that sort of idea. Phil ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie 2016-04-20 21:06 ` Drew Adams 2016-04-20 22:27 ` Phillip Lord @ 2016-04-21 14:17 ` Eli Zaretskii 2016-04-21 21:33 ` Alan Mackenzie 2016-04-21 22:19 ` Alan Mackenzie 2016-04-22 14:33 ` Dmitry Gutov 2016-04-22 18:58 ` Richard Stallman 4 siblings, 2 replies; 45+ messages in thread From: Eli Zaretskii @ 2016-04-21 14:17 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel > Date: Wed, 20 Apr 2016 19:44:50 +0000 > From: Alan Mackenzie <acm@muc.de> > > This post describes my notion of how multiple major modes {c,sh}ould be > implemented. Key notions are "islands", "island chains", and "chain > local" variable bindings. Thank you for publishing this. A few comments and questions below. Please keep in mind that I never had to write any Lisp that deals with these issues, so apologies in advance for possibly silly questions and misunderstandings. > o - To the user, the current major mode will be that of the island where > point is. All familiar commands will work without restriction. Does this mean the display of mode line, menu bar, and tool bar will change accordingly? A more subtle issue is with point movements that are not shown to the user (those done by Lisp code of some command, before redisplay kicks in) -- what will be the effect of those? do they trigger redisplay, for example? > o - An island chain will have @dfn{chain local} variable bindings. Such a > binding will become current and accessible when point is within one of the > chain's islands. When point is not in an island, the buffer local binding > of the variable will be current. Emacs sometimes examines buffer text without moving point, and we generally expect for buffer-local bindings to be in effect regardless. A prominent example is the display engine. I will return to that later. > * - [Island] will be covered by the text property `island', whose value will be > the pertinent island or island chain (see section (ii)) (not yet > decided). Note that if islands are enclosed inside other islands, the > value is the innermost island. There is the possibility of using an > interval tree independent of the one for text properties to increase > performance. I don't understand the notion of "enclosed" islands: wouldn't such "enclosing" simply break the "outer" island into two separate islands? > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as > whitespace, much as they do comments. They will also treat as whitespace > the gap between two islands in a chain. Why whitespace? why not some new category? By overloading whitespace, you make things harder on the underlying infrastructure, like regexp search and matching. > o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ", > and "[[:space:]] will match an entire island. Extending [:space:] that way seems to be an implementation detail leaking to user level. I think we should avoid that at all costs. > o - The gap between two islands in a chain will also be matched by the above > regexps. > o - This treatment of an island, and a gap between two islands, as WS will > occur only when `in-islands' is non-nil. > o - When `in-islands' is nil, there will be no reliable way of scanning over > an island by regexps, since it is a potentially nested structure, and FSMs > don't recognise arbitrarily nested structures. > (vii) Variables. > o - Island chain local variable bindings will come into existence. These > bindings depend on the island point is in. There will be lower level > routines that will have "position" parameters as an alternative to using > point. > o - All variables which are currently buffer local will become chain local > except for those whose symbols are given a non-nil `entire-buffer' > property. There will be no new functions like > `make-chain-local-variable'. > o - When the `entire-buffer' property is nil, the buffer local binding of a > variable will hold the value pertinent to the areas of the buffer outside > of islands. When that property is non-nil, the binding holds the value > for the entire buffer. > o - When `in-islands' is nil, the chain local mechanism described here is > not used - instead the familiar buffer local binding is used. > o - The current binding for a local variable will be the chain local binding > of the island chain of the island containing point. If point is not in an > island, the buffer local binding is current. > o - If a chain local binding is current, and its value is unbound, the > binding of an enclosing scope is NOT used in its place. Probably the > variable's default-value should be used when reading. > o - In buffer.h, a new macro CVAR ("island chain variable") analogous to > BVAR will be introduced. It will use BVAR as a fall back. Most > invocations of BVAR will be changed to CVAR. > o - In data.c, the mechanism for accessing local variable bindings > (e.g. `swap_in_symval_forwarding') will be enhanced to test `in-islands' > and handle chain local bindings appropriately. I'm not sure I understand the details. E.g., where will the island-chain local values be stored? To remind you, buffer-local variables have a special object in their symbol value cell, and BVAR only works for the few buffer-local variables that are stored in the buffer object itself. I'm not sure I understand how CVAR could solve the problem you need to solve, which is keeping multiple chains per buffer, each one with its values of these variables. > (ix) Miscellaneous commands and functions. > o - `point-min' and `point-max' will, when `in-islands' is non-nil, return > the max/min point in the visible region in the same chain of islands as > point. > o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to > the current island chain when `in-islands' is non-nil. > o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in > the current island chain (how?) when `in-islands' is non-nil. > o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the > Right Thing in island chains when `in-islands' is non-nil. > o - New functions `island-min', `island-max', `island-chain-min' and > `island-chain-max' will do what their names say. > o - There will be no restrictions on the use of widening/narrowing, as have > been proposed for other support engines for multiple major modes. > o - New commands like `beginning-of-island', `narrow-to-island', etc. will > be wanted. More difficultly, bindings for them will be needed. > o - ??? Other commands to be amended. This actually sounds like a simple extension of narrowing, so I wonder why do we need so many new object types and notions. > (x) Emacs subsystems and `in-islands'. > o - Redisplay will bind `in-islands' to non-nil, but will successfully > display all islands wholly or partially in windows being displayed. > o - Font Lock will bind `in-islands' to non-nil, but will successfully > fontify all pertinent islands. > o - `island-before/after-change-function' will be called with `in-islands' > nil. > o - `before/after-change-functions' will be called with `in-islands' bound > to non-nil. > o - Major modes will need to bind `in-islands' to non-nil for such things as > indentation. > o - For normal user interaction, `in-islands' will be nil. I don't see any discussion of how redisplay will deal with islands. To remind you, redisplay moves through portions of the buffer, without moving point, and access buffer-local variables for its job. You need to augment the design with something that will allow redisplay see the correct values of variables depending on the buffer position it is at. The same problem exists for any features that use display simulation for making decisions about movement and layout, e.g. vertical-motion. More generally, perhaps it will help if you publish the rationale for at least the main points of this design, discussing possible alternatives and explaining why you ended up with the one you present as the design decision. This could help us see the main issues that are to be dealt with, and perhaps suggest better ways of dealing with them. Seeing just the final product of the design tends to limit the discussions to low-level details, which could easily miss the broader picture and issues. Thanks. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 14:17 ` Eli Zaretskii @ 2016-04-21 21:33 ` Alan Mackenzie 2016-04-21 22:01 ` Drew Adams ` (2 more replies) 2016-04-21 22:19 ` Alan Mackenzie 1 sibling, 3 replies; 45+ messages in thread From: Alan Mackenzie @ 2016-04-21 21:33 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, dgutov Hello, Eli. I'll get a fuller reply to you later. But for now.... On Thu, Apr 21, 2016 at 05:17:09PM +0300, Eli Zaretskii wrote: > > Date: Wed, 20 Apr 2016 19:44:50 +0000 > > From: Alan Mackenzie <acm@muc.de> > > > > This post describes my notion of how multiple major modes {c,sh}ould be > > implemented. Key notions are "islands", "island chains", and "chain > > local" variable bindings. > Thank you for publishing this. A few comments and questions below. > Please keep in mind that I never had to write any Lisp that deals with > these issues, so apologies in advance for possibly silly questions and > misunderstandings. [ .... ] > More generally, perhaps it will help if you publish the rationale for > at least the main points of this design, discussing possible > alternatives and explaining why you ended up with the one you present > as the design decision. This could help us see the main issues that > are to be dealt with, and perhaps suggest better ways of dealing with > them. Seeing just the final product of the design tends to limit the > discussions to low-level details, which could easily miss the broader > picture and issues. It would be nice if Emacs supported several major modes in a buffer, not just by awkward workarounds, but fully and natively. There's no magic involved in the emergence of the design - it's basically a naive vision of how things should be, given the current state of Emacs. The essence of major mode support is buffer local variables. (Things like the syntax table and local key map are basically buffer local variables, even though they are not accessible as such from Lisp.) So, at first sight, each "island" in the buffer needs its own set of "buffer local" variables. However, a set of variable bindings is a big overhead in terms of RAM, so it would make sense, wherever possible, to share these bindings between islands with the same major mode. Furthermore, in some use cases, there are sequences of islands which are in essence a single stream of text. It thus makes sense to have "chains of islands", all islands in a chain sharing the "chain local" variable bindings. There might be a need for actual "island local" variables, with a separate value in each island. However, Dmitry and I were unable to identify any such variables in an earlier thread on emacs-devel. If any such variables became apparent, then would be the time to work out how to implement them. The parts of a buffer which are not in any island (we won't call these "the ocean" ;-) also need their own variable bindings. It seems to make sense to use the standard buffer local bindings for these, since there would otherwise be no use for them. An alternative would be to construe these regions as being islands in their own right, in their own island chain. However, that would fit badly with the syntactic delimiters for islands (see below). The above applies to most variables which are currently buffer local. However, there are some such variables which are intrinsically to do with the whole buffer, not individual islands within it. These include `buffer-undo-list', the mark, `mark-ring', ..... They must be marked as belonging to the whole buffer, and handled as such, hence the `entire-buffer' property applied to their symbols. How do we implement chain local variable bindings? Why not base them on the implementation of buffer local bindings? Some buffer local variables are fixed slots in the struct buffer, the rest are elements in an association list in the struct buffer. Until there's a better idea, we copy this scheme for chain local variables; the fixed slot variables, currently accessed by the BVAR macro could instead get a somewhat more involved macro called "CVAR" which will somehow use the current position (whatever that means) to select the pertinent struct chain or the familiar struct buffer. Given a buffer position, we need to be able to find the corresponding island chain. "Obviously", we do this with a text property, which we might as well call `island', or possibly `chain'. Since successive accesses to chain local variables are very likely to be in the same chain most of the time, we will cache the "current" chain in buffer local variables. We want `parse-partial-sexp' and friends to work "properly" wrt islands. It is immediately clear that the syntactic context of each island chain is independent of other chains and of the regions outside islands. It is also clear that the syntactic context at the end of an island should be preserved and used as the starting value at the start of the next island in the same chain. It thus seems sensible to introduce new syntactic classes "open island" and "close island" to facilitate this. Why not give them the characters "{" and "}", which are currently unused? This method of delimiting islands does, however, force us to deal with nested islands. Clearly, our parser state must be amended to deal with these stacked and suspended states. It is currently unclear whether `syntax-ppss' needs to return this amended state, or whether the simple "state within the chain" would be adequate. It is clear that syntactic commands such as `forward-list' (C-M-n) must confine their operation to a single island chain. When it comes to movement and search primitives, we want to adapt these so that the impact on existing major modes is minimised. Ideally, we would want major modes to "see" only their own islands (or lack thereof). Thus we treat irrelevant islands as blocks of whitespace. It seems to make sense to have such islands matched by subexpressions in regexps which match spaces. This would obviate the need to amend a great number of regexps currently coded in major modes. On the other hand, when a user does C-s or C-M-s, the Right Thing is surely to search the buffer as a whole, without regard to islands. We therefore need a flag which instructs the primitives how to behave when there are islands. We might as well call this flag `in-islands', for want of a better name. The user will, from time to time, delete the delimiters which define islands, and will insert other ones. The super mode needs to be able to react to these actions, amending its island chains appropriately. I have not been able to come up with an adequate scheme for this using only before/after-change-functions. These variables are going to be chain local, and the buffer local values will hold functions for the buffer regions not in islands. So we introduce `island-before/after-change-function', entire-buffer local variables, each of which will hold a single function intended for adjusting island chains. Their return values will direct Emacs which islands need `before/after-change-functions' invoking on them. To minimise changes to major modes, quite a few primitives (such as `skip-syntax-forward' and `next-single-property-change') will be amended to restrict themselves to island chains when `in-islands' is bound to non-nil. Several Emacs subsystems will need enhancement, in particular redisplay and font-lock. Sorry this has turned out so long, so pedestrian, and so boring. :-( As promised, I have had no magic insights, no sparkling innovations in drawing up these notes - just a sequence of humdrum decisions, one after the other. If I've missed out anything relevant, please say so, then I can try and fill in the gap. It's also clear that what I'm proposing can't be implemented in a couple of weekends - it would be a long hard grind. But it would enable super modes to be written with comparative ease. > Thanks. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: A vision for multiple major modes: some design notes 2016-04-21 21:33 ` Alan Mackenzie @ 2016-04-21 22:01 ` Drew Adams 2016-04-22 8:13 ` Alan Mackenzie 2016-04-22 9:04 ` Eli Zaretskii 2016-06-13 21:17 ` John Wiegley 2 siblings, 1 reply; 45+ messages in thread From: Drew Adams @ 2016-04-21 22:01 UTC (permalink / raw) To: Alan Mackenzie, Eli Zaretskii; +Cc: dgutov, emacs-devel [More interesting details. Thx.] > Given a buffer position, we need to be able to find the corresponding > island chain. "Obviously", we do this with a text property, which we > might as well call `island', or possibly `chain'. Since successive > accesses to chain local variables are very likely to be in the same > chain most of the time, we will cache the "current" chain in buffer > local variables. I guess you are referring to the possibility of more than one chain having an island at point, and wanting to pick up the right one as the "current" chain - so you check a text property, which identifies the chain that is currently active. Is that right? > When it comes to movement and search primitives, we want to adapt these > so that the impact on existing major modes is minimised. Ideally, we > would want major modes to "see" only their own islands (or lack > thereof). Thus we treat irrelevant islands as blocks of whitespace. It > seems to make sense to have such islands matched by subexpressions in > regexps which match spaces. This would obviate the need to amend a > great number of regexps currently coded in major modes. For search, at least, I don't see why you don't make use of `isearch-filter-predicate'. That's what I do in my code, to search only within (or without: complement) a set of zones (~chain of islands). That seems simple and cheap. [I also optionally dim the non-islands during search (or the non-non-islands, if complementing), so the areas being searched stand out more.] > On the other hand, when a user does C-s or C-M-s, the Right Thing is > surely to search the buffer as a whole, without regard to islands. We > therefore need a flag which instructs the primitives how to behave when > there are islands. We might as well call this flag `in-islands', for > want of a better name. `isearch-filter-predicate'. It can let code know whether you are island-searching or not. > The user will, from time to time, delete the delimiters which define > islands, and will insert other ones. FWIW, markers as delimiters do not have that problem. [The `isearch-prop.el' code can use zones defined by either their limits (e.g., markers) or text or overlay properties on their text. It lets commands like `query-replace' do similarly.] ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 22:01 ` Drew Adams @ 2016-04-22 8:13 ` Alan Mackenzie 2016-04-22 17:04 ` Drew Adams 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-04-22 8:13 UTC (permalink / raw) To: Drew Adams; +Cc: Eli Zaretskii, dgutov, emacs-devel Hello, Drew. On Thu, Apr 21, 2016 at 03:01:12PM -0700, Drew Adams wrote: > [More interesting details. Thx.] > > Given a buffer position, we need to be able to find the corresponding > > island chain. "Obviously", we do this with a text property, which we > > might as well call `island', or possibly `chain'. Since successive > > accesses to chain local variables are very likely to be in the same > > chain most of the time, we will cache the "current" chain in buffer > > local variables. > I guess you are referring to the possibility of more than one > chain having an island at point, and wanting to pick up the right > one as the "current" chain - so you check a text property, which > identifies the chain that is currently active. Is that right? Er, no. ;-) Even when there is only one island at point, we still need to determine it. A text property is a good way of doing this. > > When it comes to movement and search primitives, we want to adapt these > > so that the impact on existing major modes is minimised. Ideally, we > > would want major modes to "see" only their own islands (or lack > > thereof). Thus we treat irrelevant islands as blocks of whitespace. It > > seems to make sense to have such islands matched by subexpressions in > > regexps which match spaces. This would obviate the need to amend a > > great number of regexps currently coded in major modes. > For search, at least, I don't see why you don't make use of > `isearch-filter-predicate'. That's what I do in my code, to > search only within (or without: complement) a set of zones > (~chain of islands). That seems simple and cheap. Thanks, I didn't know about that variable. But it may not be widely applicable enough. > [I also optionally dim the non-islands during search (or the > non-non-islands, if complementing), so the areas being searched > stand out more.] That's another matter, at a different level of abstraction from the main topic. > > On the other hand, when a user does C-s or C-M-s, the Right Thing is > > surely to search the buffer as a whole, without regard to islands. We > > therefore need a flag which instructs the primitives how to behave when > > there are islands. We might as well call this flag `in-islands', for > > want of a better name. > `isearch-filter-predicate'. It can let code know whether > you are island-searching or not. That would only work for isearch. > > The user will, from time to time, delete the delimiters which define > > islands, and will insert other ones. > FWIW, markers as delimiters do not have that problem. I think they do. What happens when you have two islands bounded by four markers, and you delete a region containing the two middle markers; MaaaaaaaaaaaM MbbbbbbbbbbbbbM dddddddddddddddddd ? You might well not want the two islands a and b to be coalesced. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: A vision for multiple major modes: some design notes 2016-04-22 8:13 ` Alan Mackenzie @ 2016-04-22 17:04 ` Drew Adams 0 siblings, 0 replies; 45+ messages in thread From: Drew Adams @ 2016-04-22 17:04 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, dgutov, emacs-devel > > For search, at least, I don't see why you don't make use of > > `isearch-filter-predicate'. That's what I do in my code, to > > search only within (or without: complement) a set of zones > > (~chain of islands). That seems simple and cheap. > > Thanks, I didn't know about that variable. But it may not be > widely applicable enough. I guess you're referring to point movements, among other things. `isearch-filter-predicate', or similar, could presumably be made so (more widely applicable). It is also used by `perform-replace' (e.g. `query-replace'), BTW - not just search. The point is that a predicate is more general than a regexp, and it doesn't interfere with the use of a regexp (and vice versa). > > > On the other hand, when a user does C-s or C-M-s, the Right Thing is > > > surely to search the buffer as a whole, without regard to islands. We > > > therefore need a flag which instructs the primitives how to behave > > > when there are islands. > > > > `isearch-filter-predicate'. It can let code know whether > > you are island-searching or not. > > That would only work for isearch. Not if other code takes it into account. It only worked for search until `perform-replace' started taking it into account. > > > The user will, from time to time, delete the delimiters > > > which define islands, and will insert other ones. > > > FWIW, markers as delimiters do not have that problem. > > I think they do. What happens when you have two islands bounded by four > markers, and you delete a region containing the two middle markers; > > MaaaaaaaaaaaM MbbbbbbbbbbbbbM > dddddddddddddddddd > > ? You might well not want the two islands a and b to be coalesced. What's the alternative? If you're worried about different modes (for example) for aaaaaa and bbbbbb then consider keeping lists of markers per mode (or whatever) - like we sometimes handle overlays using one or more lists. Anyway, it was only a "FWIW". I use both text properties and markers. There are advantages and disadvantages to any implementation. Also, where I use markers I allow extra info in a given zone, in addition to the markers: ;; A "basic zone" is a list of two buffer positions followed ;; by a possibly empty list of extra information: ;; (POS1 POS2 . EXTRA). IOW, some info is location-specific (buffer and position), and other info (EXTRA) is zone-specific. In your scenario, if a zone's second marker is deleted then code could decide, based on whatever (including whether or not aaaaaaaa and bbbbbbbb are in the same mode or have compatible "EXTRA" island info), whether to: coalesce them, delete them (as islands, not the text), or keep them separate. The point is that the code can do anything. But yes, a single marker does not record more than a buffer and a position. I think, however, that the additional info you are wanting to associate here is really (typically, at least) info to associate with the island, and not info to associate with an individual marker. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 21:33 ` Alan Mackenzie 2016-04-21 22:01 ` Drew Adams @ 2016-04-22 9:04 ` Eli Zaretskii 2016-06-13 21:17 ` John Wiegley 2 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2016-04-22 9:04 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel, dgutov > Date: Thu, 21 Apr 2016 21:33:23 +0000 > Cc: dgutov@yandex.ru, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > I'll get a fuller reply to you later. But for now.... Thanks. > Given a buffer position, we need to be able to find the corresponding > island chain. "Obviously", we do this with a text property, which we > might as well call `island', or possibly `chain'. Why text properties? Have you considered a hash table? I think it will be faster, and will also avoid a few complications that text properties bring as baggage you don't necessarily want (like what happens when you copy text to another location or buffer). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 21:33 ` Alan Mackenzie 2016-04-21 22:01 ` Drew Adams 2016-04-22 9:04 ` Eli Zaretskii @ 2016-06-13 21:17 ` John Wiegley 2016-06-14 13:13 ` Alan Mackenzie 2 siblings, 1 reply; 45+ messages in thread From: John Wiegley @ 2016-06-13 21:17 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, dgutov, emacs-devel >>>>> Alan Mackenzie <acm@muc.de> writes: > The essence of major mode support is buffer local variables. (Things like > the syntax table and local key map are basically buffer local variables, > even though they are not accessible as such from Lisp.) So, at first sight, > each "island" in the buffer needs its own set of "buffer local" variables. I don't agree that this is the essence of major mode support. Another aspect of major modes is an expectation of which text properties might occur throughout the buffer, and where and why. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-06-13 21:17 ` John Wiegley @ 2016-06-14 13:13 ` Alan Mackenzie 2016-06-14 16:27 ` John Wiegley 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-06-14 13:13 UTC (permalink / raw) To: emacs-devel Hello, John. Trust you've had a good holiday! On Mon, Jun 13, 2016 at 02:17:40PM -0700, John Wiegley wrote: > >>>>> Alan Mackenzie <acm@muc.de> writes: > > The essence of major mode support is buffer local variables. (Things like > > the syntax table and local key map are basically buffer local variables, > > even though they are not accessible as such from Lisp.) So, at first sight, > > each "island" in the buffer needs its own set of "buffer local" variables. > I don't agree that this is the essence of major mode support. Another aspect > of major modes is an expectation of which text properties might occur > throughout the buffer, and where and why. OK. Shall we agree that the buffer local variables are a crucially important part of what constitutes a major mode? :-) Clearly text properties are important (indeed, in the case of, e.g., CC Mode critically important) too. > -- > John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F > http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-06-14 13:13 ` Alan Mackenzie @ 2016-06-14 16:27 ` John Wiegley 0 siblings, 0 replies; 45+ messages in thread From: John Wiegley @ 2016-06-14 16:27 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel >>>>> Alan Mackenzie <acm@muc.de> writes: >> I don't agree that this is the essence of major mode support. Another >> aspect of major modes is an expectation of which text properties might >> occur throughout the buffer, and where and why. > OK. Shall we agree that the buffer local variables are a crucially important > part of what constitutes a major mode? :-) Clearly text properties are > important (indeed, in the case of, e.g., CC Mode critically important) too. I'm sorry, after reading this again today I'm not sure why my reaction sounded so strong. Surely buffer local variables are a key, essential component to the picture. There is a "context" that defines what a mode is (buffer local vars, text properties, event bindings, etc), and we need a way of scoping such contexts within buffers, which you've begun describing in your proposal. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 14:17 ` Eli Zaretskii 2016-04-21 21:33 ` Alan Mackenzie @ 2016-04-21 22:19 ` Alan Mackenzie 2016-04-22 8:48 ` Eli Zaretskii 2016-04-22 13:42 ` Andy Moreton 1 sibling, 2 replies; 45+ messages in thread From: Alan Mackenzie @ 2016-04-21 22:19 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, dgutov Hello, Eli. On Thu, Apr 21, 2016 at 05:17:09PM +0300, Eli Zaretskii wrote: > > Date: Wed, 20 Apr 2016 19:44:50 +0000 > > From: Alan Mackenzie <acm@muc.de> > > > > This post describes my notion of how multiple major modes {c,sh}ould be > > implemented. Key notions are "islands", "island chains", and "chain > > local" variable bindings. > Thank you for publishing this. A few comments and questions below. > Please keep in mind that I never had to write any Lisp that deals with > these issues, so apologies in advance for possibly silly questions and > misunderstandings. > > o - To the user, the current major mode will be that of the island where > > point is. All familiar commands will work without restriction. > Does this mean the display of mode line, menu bar, and tool bar will > change accordingly? Yes, please! > A more subtle issue is with point movements that are not shown to the > user (those done by Lisp code of some command, before redisplay kicks > in) -- what will be the effect of those? do they trigger redisplay, > for example? They shouldn't trigger redisplay, no. > > o - An island chain will have @dfn{chain local} variable bindings. Such a > > binding will become current and accessible when point is within one of the > > chain's islands. When point is not in an island, the buffer local binding > > of the variable will be current. > Emacs sometimes examines buffer text without moving point, and we > generally expect for buffer-local bindings to be in effect regardless. > A prominent example is the display engine. I will return to that > later. OK. > > * - [Island] will be covered by the text property `island', whose value will be > > the pertinent island or island chain (see section (ii)) (not yet > > decided). Note that if islands are enclosed inside other islands, the > > value is the innermost island. There is the possibility of using an > > interval tree independent of the one for text properties to increase > > performance. > I don't understand the notion of "enclosed" islands: wouldn't such > "enclosing" simply break the "outer" island into two separate islands? If we mark island start and end with the syntax-table text properties "{" and "}", we're going to have something like { a{ }b } . Simply to break the outer island into two pieces, we'd really need to apply delimiters at a and b, giving: { }{ }{ } . This would overwrite the previous syntaxes at a and b, and this might be a Bad Thing. > > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as > > whitespace, much as they do comments. They will also treat as whitespace > > the gap between two islands in a chain. > Why whitespace? why not some new category? By overloading whitespace, > you make things harder on the underlying infrastructure, like regexp > search and matching. I think it's clear that the "foreign" island's syntax has no interaction with the current island. If we treat it as whitespace, that should minimise the amount of adapting we need to do to existing major modes. I envisage that a regexp element will match the "foreign" island if that element would match a space. I know this sounds horrible, but I haven't come up with a scenario where this wouldn't work well. (This is assuming, of course, that the magic flag `in-islands' is non-nil.) > > o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ", > > and "[[:space:]] will match an entire island. > Extending [:space:] that way seems to be an implementation detail > leaking to user level. I think we should avoid that at all costs. Why? I don't understand your last paragraph. > > o - The gap between two islands in a chain will also be matched by the above > > regexps. > > o - This treatment of an island, and a gap between two islands, as WS will > > occur only when `in-islands' is non-nil. > > o - When `in-islands' is nil, there will be no reliable way of scanning over > > an island by regexps, since it is a potentially nested structure, and FSMs > > don't recognise arbitrarily nested structures. > > (vii) Variables. > > o - Island chain local variable bindings will come into existence. These > > bindings depend on the island point is in. There will be lower level > > routines that will have "position" parameters as an alternative to using > > point. > > o - All variables which are currently buffer local will become chain local > > except for those whose symbols are given a non-nil `entire-buffer' > > property. There will be no new functions like > > `make-chain-local-variable'. > > o - When the `entire-buffer' property is nil, the buffer local binding of a > > variable will hold the value pertinent to the areas of the buffer outside > > of islands. When that property is non-nil, the binding holds the value > > for the entire buffer. > > o - When `in-islands' is nil, the chain local mechanism described here is > > not used - instead the familiar buffer local binding is used. > > o - The current binding for a local variable will be the chain local binding > > of the island chain of the island containing point. If point is not in an > > island, the buffer local binding is current. > > o - If a chain local binding is current, and its value is unbound, the > > binding of an enclosing scope is NOT used in its place. Probably the > > variable's default-value should be used when reading. > > o - In buffer.h, a new macro CVAR ("island chain variable") analogous to > > BVAR will be introduced. It will use BVAR as a fall back. Most > > invocations of BVAR will be changed to CVAR. > > o - In data.c, the mechanism for accessing local variable bindings > > (e.g. `swap_in_symval_forwarding') will be enhanced to test `in-islands' > > and handle chain local bindings appropriately. > I'm not sure I understand the details. E.g., where will the > island-chain local values be stored? In a C struct chain, analogous to struct buffer, using much the same mechanisms. > To remind you, buffer-local variables have a special object in their > symbol value cell, and BVAR only works for the few buffer-local > variables that are stored in the buffer object itself. I'm not sure I > understand how CVAR could solve the problem you need to solve, which > is keeping multiple chains per buffer, each one with its values of > these variables. CVAR would get the current chain from the `island' (or `chain') text property at the position. If this is nil, it would do what BVAR does. Otherwise it would access the appropriate named element in the struct chain. I think CVAR would take three parameters: the variable name, the buffer, and the buffer position. Other chain local variables would be accessed through an alist in the struct chain holding miscellaneous variables, exactly as is done for the other buffer local variables in struct buffer. Unless there is a better solution, of course. > > (ix) Miscellaneous commands and functions. > > o - `point-min' and `point-max' will, when `in-islands' is non-nil, return > > the max/min point in the visible region in the same chain of islands as > > point. > > o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to > > the current island chain when `in-islands' is non-nil. > > o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in > > the current island chain (how?) when `in-islands' is non-nil. > > o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the > > Right Thing in island chains when `in-islands' is non-nil. > > o - New functions `island-min', `island-max', `island-chain-min' and > > `island-chain-max' will do what their names say. > > o - There will be no restrictions on the use of widening/narrowing, as have > > been proposed for other support engines for multiple major modes. > > o - New commands like `beginning-of-island', `narrow-to-island', etc. will > > be wanted. More difficultly, bindings for them will be needed. > > o - ??? Other commands to be amended. > This actually sounds like a simple extension of narrowing, so I wonder > why do we need so many new object types and notions. I think it's more like a complicated extension of narrowing. :-) I think that chain local variables are essential to multiple major modes - you can't have m.m.m. without some sort of chain locality. I also think that for a major mode to work transparently over several chained islands, all the irrelevant stuff between the islands needs to be made, er, transparent. That is what section (ix) is about. > > (x) Emacs subsystems and `in-islands'. > > o - Redisplay will bind `in-islands' to non-nil, but will successfully > > display all islands wholly or partially in windows being displayed. > > o - Font Lock will bind `in-islands' to non-nil, but will successfully > > fontify all pertinent islands. > > o - `island-before/after-change-function' will be called with `in-islands' > > nil. > > o - `before/after-change-functions' will be called with `in-islands' bound > > to non-nil. > > o - Major modes will need to bind `in-islands' to non-nil for such things as > > indentation. > > o - For normal user interaction, `in-islands' will be nil. > I don't see any discussion of how redisplay will deal with islands. > To remind you, redisplay moves through portions of the buffer, without > moving point, and access buffer-local variables for its job. You need > to augment the design with something that will allow redisplay see the > correct values of variables depending on the buffer position it is at. > The same problem exists for any features that use display simulation > for making decisions about movement and layout, e.g. vertical-motion. I think redisplay is mostly controlled by variables (such as `scroll-margin') accessed by BVAR. These calls could be replaced by CVAR. Problems will arise if redisplay reads the variable once, and fails to read it again when its current position moves into or out of an island. Redisplay would have to be aware of island boundaries, and re-read the controlling variables on passing a boundary. Other than that, I can't see any big problems. Not yet, anyway. [ .... ] > Thanks. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 22:19 ` Alan Mackenzie @ 2016-04-22 8:48 ` Eli Zaretskii 2016-04-22 22:35 ` Alan Mackenzie 2016-04-22 13:42 ` Andy Moreton 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2016-04-22 8:48 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel, dgutov > Date: Thu, 21 Apr 2016 22:19:43 +0000 > Cc: dgutov@yandex.ru, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > A more subtle issue is with point movements that are not shown to the > > user (those done by Lisp code of some command, before redisplay kicks > > in) -- what will be the effect of those? do they trigger redisplay, > > for example? > > They shouldn't trigger redisplay, no. But if that code calls sit-for or somesuch, they will, and the result will be flickering. But that's not a very important issue. > > > * - [Island] will be covered by the text property `island', whose value will be > > > the pertinent island or island chain (see section (ii)) (not yet > > > decided). Note that if islands are enclosed inside other islands, the > > > value is the innermost island. There is the possibility of using an > > > interval tree independent of the one for text properties to increase > > > performance. > > > I don't understand the notion of "enclosed" islands: wouldn't such > > "enclosing" simply break the "outer" island into two separate islands? > > If we mark island start and end with the syntax-table text properties > "{" and "}", we're going to have something like > > { a{ }b } > > . Simply to break the outer island into two pieces, we'd really need to > apply delimiters at a and b, giving: > > { }{ }{ } > > . This would overwrite the previous syntaxes at a and b, and this might > be a Bad Thing. We could design the stuff so that Bad Things won't happen. I consider this nesting of islands a (possibly unnecessary) complications that we shouldn't accept unless we have a very good reason. Nesting immediately requires a plethora of operations that are otherwise not necessary. > > > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as > > > whitespace, much as they do comments. They will also treat as whitespace > > > the gap between two islands in a chain. > > > Why whitespace? why not some new category? By overloading whitespace, > > you make things harder on the underlying infrastructure, like regexp > > search and matching. > > I think it's clear that the "foreign" island's syntax has no interaction > with the current island. This is not a contradiction to what I suggested. The new category could be treated the same as whitespace, in its effect on syntax-related issues. By contrast, having whitespace regexp class be indistinguishable from an island probably means complications on a very low level of matching regular expressions and syntax constructs, something that I fear will get in the way. > If we treat it as whitespace, that should minimise the amount of > adapting we need to do to existing major modes. We need to consider the amount of adaptations in the low-level infrastructure code as well, not only on the application level. > I envisage that a regexp element will match the "foreign" island if that > element would match a space. I know this sounds horrible, but I haven't > come up with a scenario where this wouldn't work well. And I say this is a bomb waiting to go off. It is relatively easy to add a new regexp construct for an island (e.g., we already support categories in regexps, so just defining a category is one easy way), and treat that as whitespace, while still keeping our options open to make it behave slightly differently if needed, and still allowing the applications to specify one, but not the other. By contrast, if we decide that whitespace matches an island, we are opening a giant can of worms. Here's one worm out of that can: some low-level operations need to search the buffer using regexps disregarding any narrowing -- what you suggest means these operations cannot safely use whitespace in their regexps. This is something to stay away of, IMO. > > Extending [:space:] that way seems to be an implementation detail > > leaking to user level. I think we should avoid that at all costs. > > Why? I don't understand your last paragraph. See above. [:space:] is something used a lot in Lisp applications, so we leak the implementation of islands to that level: from now on, each Lisp application will need to consider the possibility that searching for [:space:] will find an island, something that might have no relation to whitespace. > > I'm not sure I understand the details. E.g., where will the > > island-chain local values be stored? > > In a C struct chain, analogous to struct buffer, using much the same > mechanisms. What object(s) will that chain be rooted at? And how will it be related to its buffer? > > To remind you, buffer-local variables have a special object in their > > symbol value cell, and BVAR only works for the few buffer-local > > variables that are stored in the buffer object itself. I'm not sure I > > understand how CVAR could solve the problem you need to solve, which > > is keeping multiple chains per buffer, each one with its values of > > these variables. > > CVAR would get the current chain from the `island' (or `chain') text > property at the position. If it is stored in the text property, then you will have to decide what happens when text is copied and yanked elsewhere. > If this is nil, it would do what BVAR does. Once again, BVAR only handles variables that are part of the buffer object itself. The other buffer-local variables (which are the majority) are handled as part of switching the buffer, and the C code simply refers to them by name. So BVAR is not necessarily the correct model for what you are designing. > Otherwise it would access the appropriate named element in the struct > chain. I think CVAR would take three parameters: the variable name, the > buffer, and the buffer position. Can you show a pseudo-code of CVAR? I'm afraid I'm missing something here, because I don't see clearly what you have in mind. > Other chain local variables would be accessed through an alist in the > struct chain holding miscellaneous variables, exactly as is done for > the other buffer local variables in struct buffer. There's no such alist in how we access buffer-local variables, not AFAIK. Again, I must be missing something here. > > This actually sounds like a simple extension of narrowing, so I wonder > > why do we need so many new object types and notions. > > I think it's more like a complicated extension of narrowing. :-) It's simple because instead of one region you have more than one, and the user-level commands don't affect them. All the other changes are exact reproduction of what narrowing does. > I think that chain local variables are essential to multiple major > modes - you can't have m.m.m. without some sort of chain locality. What is "chain locality"? > I also think that for a major mode to work transparently over > several chained islands, all the irrelevant stuff between the > islands needs to be made, er, transparent. Yes, but how is that related to my comment about extending narrowing? > > I don't see any discussion of how redisplay will deal with islands. > > To remind you, redisplay moves through portions of the buffer, without > > moving point, and access buffer-local variables for its job. You need > > to augment the design with something that will allow redisplay see the > > correct values of variables depending on the buffer position it is at. > > The same problem exists for any features that use display simulation > > for making decisions about movement and layout, e.g. vertical-motion. > > I think redisplay is mostly controlled by variables (such as > `scroll-margin') accessed by BVAR. These calls could be replaced by > CVAR. That's not the whole story; once again, you forget about buffer-local variables that are not part of the buffer object; BVAR is not used for those. I gave an example of one such variable: face-remapping-alist, and I selected that variable for a reason. Here's how the display engine refers to it in the current codebase: base_face_id = it->string_from_prefix_prop_p ? (!NILP (Vface_remapping_alist) ? lookup_basic_face (it->f, DEFAULT_FACE_ID) : DEFAULT_FACE_ID) : underlying_face_id (it); Another example (which I also mentioned) is standard-display-table: /* Use the standard display table for displaying strings. */ if (DISP_TABLE_P (Vstandard_display_table)) it->dp = XCHAR_TABLE (Vstandard_display_table); See? no BVAR anywhere in sight. > Problems will arise if redisplay reads the variable once, and > fails to read it again when its current position moves into or out of an > island. Redisplay would have to be aware of island boundaries, and > re-read the controlling variables on passing a boundary. Other than > that, I can't see any big problems. Not yet, anyway. To remind you, the display engine works by examining characters from the buffer text one by one. Are you saying that it will have, for each character it examines, to look up the island chain for possible changes? That would make it abysmally slow, I think. IOW, part of your design needs to provide some efficient means for redisplay to "be aware of island boundaries, and re-read the controlling variables on passing a boundary". There's one more complication, which is related to redisplay, but not only to it. You write: > (ix) Miscellaneous commands and functions. > o - `point-min' and `point-max' will, when `in-islands' is non-nil, return > the max/min point in the visible region in the same chain of islands as > point. > o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to > the current island chain when `in-islands' is non-nil. > o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in > the current island chain (how?) when `in-islands' is non-nil. > o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the > Right Thing in island chains when `in-islands' is non-nil. > o - New functions `island-min', `island-max', `island-chain-min' and > `island-chain-max' will do what their names say. > o - There will be no restrictions on the use of widening/narrowing, as have > been proposed for other support engines for multiple major modes. > o - New commands like `beginning-of-island', `narrow-to-island', etc. will > be wanted. More difficultly, bindings for them will be needed. Something bothers me there. What will "M-<" and "M->" do, if point-min and point-max are limited to the current island? Likewise the search commands -- they cannot be limited to the current island, unless the user explicitly says so (and personally, I don't envision users to ask to be so limited). There's a dichotomy here, between the underlying C-level variables that currently are set to the limits of the narrowed region, and affect all user commands and internal operations (e.g., the display engine never looks beyond these limits); and the multi-mode functionality that needs to narrow the view even more. If you propagate the island-level limitations too deep, they will affect user commands and features (like display) that have nothing to do with the reason for which islands are being designed. E.g., a naïve replacement of C macros BEGV and ZV with something that returns the beginning and end of the current island will cause the display show only the current island, as if you narrowed the buffer to that island. I'm sure that's not what we want. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-22 8:48 ` Eli Zaretskii @ 2016-04-22 22:35 ` Alan Mackenzie 2016-04-23 7:39 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-04-22 22:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: dgutov, emacs-devel Hello, Eli. On Fri, Apr 22, 2016 at 11:48:52AM +0300, Eli Zaretskii wrote: > > Date: Thu, 21 Apr 2016 22:19:43 +0000 > > Cc: dgutov@yandex.ru, emacs-devel@gnu.org > > From: Alan Mackenzie <acm@muc.de> [ .... ] > > > > * - [Island] will be covered by the text property `island', whose value will be > > > > the pertinent island or island chain (see section (ii)) (not yet > > > > decided). Note that if islands are enclosed inside other islands, the > > > > value is the innermost island. There is the possibility of using an > > > > interval tree independent of the one for text properties to increase > > > > performance. > > > I don't understand the notion of "enclosed" islands: wouldn't such > > > "enclosing" simply break the "outer" island into two separate islands? > > If we mark island start and end with the syntax-table text properties > > "{" and "}", we're going to have something like > > { a{ }b } > > . Simply to break the outer island into two pieces, we'd really need to > > apply delimiters at a and b, giving: > > { }{ }{ } > > . This would overwrite the previous syntaxes at a and b, and this might > > be a Bad Thing. > We could design the stuff so that Bad Things won't happen. I consider > this nesting of islands a (possibly unnecessary) complication that we > shouldn't accept unless we have a very good reason. Nesting > immediately requires a plethora of operations that are otherwise not > necessary. OK. You're advocating, I think, not having well defined islands in a chain (i.e., every island having a defined and marked beginning and end), instead just having regions of text, each of which is associated with an island chain (via the `island' text property, say). This would make syntactic scanning more difficult (though not impossible). I can't judge at the moment which scheme is the better one. > > > > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as > > > > whitespace, much as they do comments. They will also treat as whitespace > > > > the gap between two islands in a chain. > > > Why whitespace? why not some new category? By overloading whitespace, > > > you make things harder on the underlying infrastructure, like regexp > > > search and matching. > > I think it's clear that the "foreign" island's syntax has no interaction > > with the current island. > This is not a contradiction to what I suggested. The new category > could be treated the same as whitespace, in its effect on > syntax-related issues. By contrast, having whitespace regexp class be > indistinguishable from an island probably means complications on a > very low level of matching regular expressions and syntax constructs, > something that I fear will get in the way. > > If we treat it as whitespace, that should minimise the amount of > > adapting we need to do to existing major modes. > We need to consider the amount of adaptations in the low-level > infrastructure code as well, not only on the application level. I think the adaptations to the regexp engine would be far less work than adapting many thousands of regexps in major modes we want to use as sub-modes. For example there are 115 occurrences in CC Mode of just the exact string "[ \t". > > I envisage that a regexp element will match the "foreign" island if that > > element would match a space. I know this sounds horrible, but I haven't > > come up with a scenario where this wouldn't work well. > And I say this is a bomb waiting to go off. It is relatively easy to > add a new regexp construct for an island (e.g., we already support > categories in regexps, so just defining a category is one easy way), > and treat that as whitespace, while still keeping our options open to > make it behave slightly differently if needed, and still allowing the > applications to specify one, but not the other. Bear in mind that this matching of an island by a whitespace regexp element would happen ONLY whilst `in-islands' was bound to non-nil, i.e. when a major mode is working in its own island chain. Are there any circumstances in which we would not want the major mode to see the gap between its islands as WS? When `in-islands' is nil (i.e. when the super mode's code is running, or the user is typing commands) the islands would NOT match a WS regexp. > By contrast, if we decide that whitespace matches an island, we are > opening a giant can of worms. Here's one worm out of that can: some > low-level operations need to search the buffer using regexps > disregarding any narrowing -- what you suggest means these operations > cannot safely use whitespace in their regexps. This is something to > stay away of, IMO. It depends on whether these low level operations are working within an island chain (`in-islands' non-nil) or on the buffer as a whole (`in-islands' nil). I think such operations would typically be run with `in-islands' nil, hence would not run up against these problems. > > > Extending [:space:] that way seems to be an implementation detail > > > leaking to user level. I think we should avoid that at all costs. > > Why? I don't understand your last paragraph. > See above. [:space:] is something used a lot in Lisp applications, so > we leak the implementation of islands to that level: from now on, each > Lisp application will need to consider the possibility that searching > for [:space:] will find an island, something that might have no > relation to whitespace. I rather see it as major mode Lisp code not having to concern itself with the possibility of (foreign) islands or gaps. It merely has to do (search-forward-re "....[ \t]...." ...), and it will end up at the next valid place in its own island chain. The aim would be for the major mode to be as unaware of the island mechanism as possible. Of course the super mode or the user would have to be aware of the islands, and search for things like, e.g., "\\s{" and "\\s}" to match the island boundaries. > > > I'm not sure I understand the details. E.g., where will the > > > island-chain local values be stored? > > In a C struct chain, analogous to struct buffer, using much the same > > mechanisms. > What object(s) will that chain be rooted at? And how will it be > related to its buffer? The chain will be the value of the `island' text property set on all the islands of the chain. It would also occupy the "matching character" slot of the "open island" and "close island" syntax descriptors (though I'm having second thoughts about this bit). Both of these couple the chain with its buffer. > > > To remind you, buffer-local variables have a special object in their > > > symbol value cell, and BVAR only works for the few buffer-local > > > variables that are stored in the buffer object itself. I'm not sure I > > > understand how CVAR could solve the problem you need to solve, which > > > is keeping multiple chains per buffer, each one with its values of > > > these variables. > > CVAR would get the current chain from the `island' (or `chain') text > > property at the position. > If it is stored in the text property, then you will have to decide > what happens when text is copied and yanked elsewhere. It would be the job of the `island-after-change-function' to strip the unwanted text properties (both the `island' and `syntax-table' ones) and to apply any needed new ones to the yanked region. > > If this is nil, it would do what BVAR does. > Once again, BVAR only handles variables that are part of the buffer > object itself. The other buffer-local variables (which are the > majority) are handled as part of switching the buffer, and the C code > simply refers to them by name. So BVAR is not necessarily the correct > model for what you are designing. > > Otherwise it would access the appropriate named element in the struct > > chain. I think CVAR would take three parameters: the variable name, the > > buffer, and the buffer position. > Can you show a pseudo-code of CVAR? I'm afraid I'm missing something > here, because I don't see clearly what you have in mind. I'll try. Something like this: #define CVAR(var, buf, position) \ chain = read_text_property (Qisland, buf, position), \ chain ? chain.var \ : BVAR (var, buf) , but I don't think that would be a valid Lvalue in C. :-( > > Other chain local variables would be accessed through an alist in the > > struct chain holding miscellaneous variables, exactly as is done for > > the other buffer local variables in struct buffer. > There's no such alist in how we access buffer-local variables, not > AFAIK. Again, I must be missing something here. Or, maybe I am. I thought that the slot `local_var_alist_' in the struct buffer held the bindings of all the non-BVAR local variables, as an alist. I'm not at all clear on when and how buffer local variable bindings get swapped in and out of, say, C variables like Vfoo. > > > This actually sounds like a simple extension of narrowing, so I wonder > > > why do we need so many new object types and notions. > > I think it's more like a complicated extension of narrowing. :-) > It's simple because instead of one region you have more than one, and > the user-level commands don't affect them. All the other changes are > exact reproduction of what narrowing does. > > I think that chain local variables are essential to multiple major > > modes - you can't have m.m.m. without some sort of chain locality. > What is "chain locality"? Having things (variables) which are local to a chain, as opposed to global variables or buffer local variables or frame local variables. > > I also think that for a major mode to work transparently over > > several chained islands, all the irrelevant stuff between the > > islands needs to be made, er, transparent. > Yes, but how is that related to my comment about extending narrowing? Maybe it's not, very much. > > > I don't see any discussion of how redisplay will deal with islands. > > > To remind you, redisplay moves through portions of the buffer, without > > > moving point, and access buffer-local variables for its job. You need > > > to augment the design with something that will allow redisplay see the > > > correct values of variables depending on the buffer position it is at. > > > The same problem exists for any features that use display simulation > > > for making decisions about movement and layout, e.g. vertical-motion. > > I think redisplay is mostly controlled by variables (such as > > `scroll-margin') accessed by BVAR. These calls could be replaced by > > CVAR. > That's not the whole story; once again, you forget about buffer-local > variables that are not part of the buffer object; BVAR is not used for > those. I gave an example of one such variable: face-remapping-alist, > and I selected that variable for a reason. Here's how the display > engine refers to it in the current codebase: > base_face_id = it->string_from_prefix_prop_p > ? (!NILP (Vface_remapping_alist) > ? lookup_basic_face (it->f, DEFAULT_FACE_ID) > : DEFAULT_FACE_ID) > : underlying_face_id (it); > Another example (which I also mentioned) is standard-display-table: > /* Use the standard display table for displaying strings. */ > if (DISP_TABLE_P (Vstandard_display_table)) > it->dp = XCHAR_TABLE (Vstandard_display_table); > See? no BVAR anywhere in sight. OK. But `face-remapping-alist' can definitely be made buffer local, and `standard-display-table' most probably can. There will be some mechanism (which I don't currently understand) by which buffer local values are swapped into and out of Vface_remapping_alist when the current buffer changes. Surely a similar mechanism could be created for when the current island changes. > > Problems will arise if redisplay reads the variable once, and > > fails to read it again when its current position moves into or out of an > > island. Redisplay would have to be aware of island boundaries, and > > re-read the controlling variables on passing a boundary. Other than > > that, I can't see any big problems. Not yet, anyway. > To remind you, the display engine works by examining characters from > the buffer text one by one. Are you saying that it will have, for > each character it examines, to look up the island chain for possible > changes? That would make it abysmally slow, I think. > IOW, part of your design needs to provide some efficient means for > redisplay to "be aware of island boundaries, and re-read the > controlling variables on passing a boundary". Yes. > There's one more complication, which is related to redisplay, but not > only to it. You write: > > (ix) Miscellaneous commands and functions. > > o - `point-min' and `point-max' will, when `in-islands' is non-nil, return > > the max/min point in the visible region in the same chain of islands as > > point. > > o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to > > the current island chain when `in-islands' is non-nil. > > o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in > > the current island chain (how?) when `in-islands' is non-nil. > > o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the > > Right Thing in island chains when `in-islands' is non-nil. > > o - New functions `island-min', `island-max', `island-chain-min' and > > `island-chain-max' will do what their names say. > > o - There will be no restrictions on the use of widening/narrowing, as have > > been proposed for other support engines for multiple major modes. > > o - New commands like `beginning-of-island', `narrow-to-island', etc. will > > be wanted. More difficultly, bindings for them will be needed. > Something bothers me there. What will "M-<" and "M->" do, if > point-min and point-max are limited to the current island? Likewise > the search commands -- they cannot be limited to the current island, > unless the user explicitly says so (and personally, I don't envision > users to ask to be so limited). Those restrictions will only apply when `in-islands' is bound to non-nil, i.e. when major mode code is running. It will be nil when the user types in M-<, hence point will move to the beginning of the (visible region of the) buffer. So, for example, if the super mode is shell script, and the major mode in the current island is AWK Mode, (point-min) will return the start of the AWK Mode island chain (which is useful to AWK Mode), not the very start of the buffer. > There's a dichotomy here, between the underlying C-level variables > that currently are set to the limits of the narrowed region, and > affect all user commands and internal operations (e.g., the display > engine never looks beyond these limits); and the multi-mode > functionality that needs to narrow the view even more. If you > propagate the island-level limitations too deep, they will affect user > commands and features (like display) that have nothing to do with the > reason for which islands are being designed. E.g., a naïve > replacement of C macros BEGV and ZV with something that returns the > beginning and end of the current island will cause the display show > only the current island, as if you narrowed the buffer to that > island. I'm sure that's not what we want. No, it's not. I think BEGV will need to have different meanings depending on the value of `in-islands'. When it's nil, BEGV will have the current meaning. When it's non-nil, BEGV will mean "the lowest buffer position which is both within the current island-chain and not below the lowest visible position". Or something like that. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-22 22:35 ` Alan Mackenzie @ 2016-04-23 7:39 ` Eli Zaretskii 2016-04-23 17:02 ` Alan Mackenzie 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2016-04-23 7:39 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel > Date: Fri, 22 Apr 2016 22:35:08 +0000 > Cc: emacs-devel@gnu.org, dgutov@yandex.ru > From: Alan Mackenzie <acm@muc.de> > > > > > Why whitespace? why not some new category? By overloading whitespace, > > > > you make things harder on the underlying infrastructure, like regexp > > > > search and matching. > > > > I think it's clear that the "foreign" island's syntax has no interaction > > > with the current island. > > > This is not a contradiction to what I suggested. The new category > > could be treated the same as whitespace, in its effect on > > syntax-related issues. By contrast, having whitespace regexp class be > > indistinguishable from an island probably means complications on a > > very low level of matching regular expressions and syntax constructs, > > something that I fear will get in the way. > > > > If we treat it as whitespace, that should minimise the amount of > > > adapting we need to do to existing major modes. > > > We need to consider the amount of adaptations in the low-level > > infrastructure code as well, not only on the application level. > > I think the adaptations to the regexp engine would be far less work than > adapting many thousands of regexps in major modes we want to use as > sub-modes. For example there are 115 occurrences in CC Mode of just the > exact string "[ \t". Please let's not forget that regexps are used in many places that have no relation whatsoever to major modes, and searching for whitespace is a very common operation using regular expressions. Infecting all those with this new meaning of whitespace that is totally alien to any code that doesn't deal with major mode is IMO plain wrong. More generally, I think we should first and foremost make our goal to have a clean and reasonably simple design, and only care about the amount of changes in major mode code as a secondary goal. Thinking about the changes in major modes first could easily lead us astray. > Bear in mind that this matching of an island by a whitespace regexp > element would happen ONLY whilst `in-islands' was bound to non-nil, i.e. > when a major mode is working in its own island chain. I understand, but I don't think this goes far enough to address my concerns. And my suggestion to have a separate class/category will serve your needs just as well, so I'm unsure why we need to piggyback [:space:]. > Are there any circumstances in which we would not want the major > mode to see the gap between its islands as WS? Who says that every major mode necessarily treats whitespace as you assume? Most (or even all) of those you know about might, but this is not written anywhere as a limitation of a major mode. By hard-wiring this special meaning of [:space:] into your design, you are limiting future (and possibly some rare extant) major modes. > When `in-islands' is nil (i.e. when the super mode's code is > running, or the user is typing commands) the islands would NOT match > a WS regexp. Are you sure that none of the background processing will ever need to treat islands as such? I'm talking about stuff like timers, process filters and sentinels, hook functions run by redisplay and the command loop, etc. If any of these might need to observe the island rules and restrictions, the design which builds on in-islands being bound to non-nil _only_ when the major mode is running its own code is unreliable, and will cause unrelated code to find itself dealing with island peculiarities. E.g., JIT font-lock runs off an idle timer, but clearly needs to observe islands, so it sounds like the problem I'm worried about is pretty much into our faces. > > By contrast, if we decide that whitespace matches an island, we are > > opening a giant can of worms. Here's one worm out of that can: some > > low-level operations need to search the buffer using regexps > > disregarding any narrowing -- what you suggest means these operations > > cannot safely use whitespace in their regexps. This is something to > > stay away of, IMO. > > It depends on whether these low level operations are working within an > island chain (`in-islands' non-nil) or on the buffer as a whole > (`in-islands' nil). I think such operations would typically be run with > `in-islands' nil, hence would not run up against these problems. "Typically" is not good enough, IMO. We must convince ourselves that this happens _always_, and there will _never_ be a reasonably justifiable need to search the entire buffer for whitespace when in-islands is non-nil, i.e. in any of the code that is running as a side-effect of performing some major-mode related operation. > > > CVAR would get the current chain from the `island' (or `chain') text > > > property at the position. > > > If it is stored in the text property, then you will have to decide > > what happens when text is copied and yanked elsewhere. > > It would be the job of the `island-after-change-function' to strip the > unwanted text properties (both the `island' and `syntax-table' ones) and > to apply any needed new ones to the yanked region. The problem is the decision whether they are unwanted or not. It's usually not simple to make that decision for text properties that change the way text is displayed, when surrounding text also affects that. > > > Otherwise it would access the appropriate named element in the struct > > > chain. I think CVAR would take three parameters: the variable name, the > > > buffer, and the buffer position. > > > Can you show a pseudo-code of CVAR? I'm afraid I'm missing something > > here, because I don't see clearly what you have in mind. > > I'll try. Something like this: > > #define CVAR(var, buf, position) \ > chain = read_text_property (Qisland, buf, position), \ > chain ? chain.var \ > : BVAR (var, buf) > > , but I don't think that would be a valid Lvalue in C. :-( Didn't you talk about some alist to look up? I see no alist look up in this pseudo-code. And 'chain.var' sounds wrong, since 'chain' is definitely a Lisp object, not a C struct. Or maybe I don't understand what hides behind read_text_property. > > > Other chain local variables would be accessed through an alist in the > > > struct chain holding miscellaneous variables, exactly as is done for > > > the other buffer local variables in struct buffer. > > > There's no such alist in how we access buffer-local variables, not > > AFAIK. Again, I must be missing something here. > > Or, maybe I am. I thought that the slot `local_var_alist_' in the struct > buffer held the bindings of all the non-BVAR local variables, as an > alist. Ah, you were talking about local_var_alist_... OK, but then I don't see anything like that in CVAR above. > I'm not at all clear on when and how buffer local variable > bindings get swapped in and out of, say, C variables like Vfoo. This happens when we switch buffers, see set_buffer_internal_1. But that function is driven by an explicit event of switching buffers, while in your design you need to do something similar when point crosses some buffer position, which is a much more subtle event. E.g., think about all the save-excursion and save-restriction code out there. > > > > This actually sounds like a simple extension of narrowing, so I wonder > > > > why do we need so many new object types and notions. > > > > I think it's more like a complicated extension of narrowing. :-) > > > It's simple because instead of one region you have more than one, and > > the user-level commands don't affect them. All the other changes are > > exact reproduction of what narrowing does. > > > > I think that chain local variables are essential to multiple major > > > modes - you can't have m.m.m. without some sort of chain locality. > > > What is "chain locality"? > > Having things (variables) which are local to a chain, as opposed to > global variables or buffer local variables or frame local variables. OK, but no one said that applying a restriction and making island-specific bindings of variables must be parts of the same feature. They could be 2 separate features instead. > > base_face_id = it->string_from_prefix_prop_p > > ? (!NILP (Vface_remapping_alist) > > ? lookup_basic_face (it->f, DEFAULT_FACE_ID) > > : DEFAULT_FACE_ID) > > : underlying_face_id (it); > > > Another example (which I also mentioned) is standard-display-table: > > > /* Use the standard display table for displaying strings. */ > > if (DISP_TABLE_P (Vstandard_display_table)) > > it->dp = XCHAR_TABLE (Vstandard_display_table); > > > See? no BVAR anywhere in sight. > > OK. But `face-remapping-alist' can definitely be made buffer local, and > `standard-display-table' most probably can. They both are. > There will be some mechanism (which I don't currently understand) by > which buffer local values are swapped into and out of > Vface_remapping_alist when the current buffer changes. See above: that mechanism is part of the function that switches to another buffer. > Surely a similar mechanism could be created for when the current > island changes. The issue is to make it as cheap as possible, because redisplay code is at liberty to move around the buffer at will, and the location where it examines buffer text is not directly related to point. > > Something bothers me there. What will "M-<" and "M->" do, if > > point-min and point-max are limited to the current island? Likewise > > the search commands -- they cannot be limited to the current island, > > unless the user explicitly says so (and personally, I don't envision > > users to ask to be so limited). > > Those restrictions will only apply when `in-islands' is bound to non-nil, > i.e. when major mode code is running. It will be nil when the user types > in M-<, hence point will move to the beginning of the (visible region of > the) buffer. See above: there might be some situations, like JIT font-lock, where you will want to have in-islands non-nil while running async code, and that might make the islands visible to code that is not strictly part of any major mode, like the infrastructure which invokes these async parts of Emacs code. So I think you need to consider the effects of those on more than just major modes. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-23 7:39 ` Eli Zaretskii @ 2016-04-23 17:02 ` Alan Mackenzie 2016-04-23 18:12 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-04-23 17:02 UTC (permalink / raw) To: Eli Zaretskii; +Cc: dgutov, emacs-devel Hello, Eli. On Sat, Apr 23, 2016 at 10:39:55AM +0300, Eli Zaretskii wrote: > > Date: Fri, 22 Apr 2016 22:35:08 +0000 > > Cc: emacs-devel@gnu.org, dgutov@yandex.ru > > From: Alan Mackenzie <acm@muc.de> [ .... ] > Please let's not forget that regexps are used in many places that have > no relation whatsoever to major modes, and searching for whitespace is > a very common operation using regular expressions. Infecting all > those with this new meaning of whitespace that is totally alien to any > code that doesn't deal with major mode is IMO plain wrong. > More generally, I think we should first and foremost make our goal to > have a clean and reasonably simple design, and only care about the > amount of changes in major mode code as a secondary goal. Thinking > about the changes in major modes first could easily lead us astray. We must consider both these things together. A prime design goal is to allow an arbitrary major mode to be used by a super mode with the minimum of adaptation to the major mode, ideally none. > > Bear in mind that this matching of an island by a whitespace regexp > > element would happen ONLY whilst `in-islands' was bound to non-nil, i.e. > > when a major mode is working in its own island chain. > I understand, but I don't think this goes far enough to address my > concerns. And my suggestion to have a separate class/category will > serve your needs just as well, so I'm unsure why we need to piggyback > [:space:]. If this new category (say, "[:gap:]") only needed to be used in super modes or subsystems, I might agree with you. But if [:gap:] needs to be used in major mode code, that involves massive amounts of editing, which would make the new mechanism much less useful. > > Are there any circumstances in which we would not want the major > > mode to see the gap between its islands as WS? > Who says that every major mode necessarily treats whitespace as you > assume? Most (or even all) of those you know about might, but this is > not written anywhere as a limitation of a major mode. It will become expected of a major mode that it doesn't tamper with stuff outside of its own island chain(s). That would violate the abstraction that the island mechanism represents. > By hard-wiring this special meaning of [:space:] into your design, you > are limiting future (and possibly some rare extant) major modes. I don't think it's all that special. It's natural. Ideally, a major mode should see its island chain as the whole buffer. It should be unaware that it is running in an island at all. I see this treatment of a :gap: as whitespace as the most natural way of implementing this. Major modes which would violate the island abstraction, if there are any, shouldn't be used in this new multi major mode mechanism. > > When `in-islands' is nil (i.e. when the super mode's code is > > running, or the user is typing commands) the islands would NOT match > > a WS regexp. > Are you sure that none of the background processing will ever need to > treat islands as such? I'm talking about stuff like timers, process > filters and sentinels, hook functions run by redisplay and the command > loop, etc. All these subsystems will need to be aware of whether they are dealing with the buffer as a whole, or merely with an island chain. They will need to bind `in-islands' appropriately, frequently using the value that was current when they were invoked. > If any of these might need to observe the island rules and > restrictions, the design which builds on in-islands being bound to > non-nil _only_ when the major mode is running its own code is > unreliable, and will cause unrelated code to find itself dealing with > island peculiarities. Perhaps I expressed myself too forcefully and literally. If any of these things are dealing with an island as a unit, they are surely "running major mode code" in this sense. Clearly things like redisplay and font lock will need explicitly to set and clear `in-islands' when they are dealing with island chain stuff. > E.g., JIT font-lock runs off an idle timer, but clearly needs to > observe islands, so it sounds like the problem I'm worried about is > pretty much into our faces. The font-lock and jit-lock entry points will need to set `in-islands'. > > > By contrast, if we decide that whitespace matches an island, we are > > > opening a giant can of worms. Here's one worm out of that can: some > > > low-level operations need to search the buffer using regexps > > > disregarding any narrowing -- what you suggest means these operations > > > cannot safely use whitespace in their regexps. This is something to > > > stay away of, IMO. > > It depends on whether these low level operations are working within an > > island chain (`in-islands' non-nil) or on the buffer as a whole > > (`in-islands' nil). I think such operations would typically be run with > > `in-islands' nil, hence would not run up against these problems. > "Typically" is not good enough, IMO. We must convince ourselves that > this happens _always_, and there will _never_ be a reasonably > justifiable need to search the entire buffer for whitespace when > in-islands is non-nil, i.e. in any of the code that is running as a > side-effect of performing some major-mode related operation. I agree with that. [ .... ] > > > If it is stored in the text property, then you will have to decide > > > what happens when text is copied and yanked elsewhere. > > It would be the job of the `island-after-change-function' to strip the > > unwanted text properties (both the `island' and `syntax-table' ones) and > > to apply any needed new ones to the yanked region. > The problem is the decision whether they are unwanted or not. It's > usually not simple to make that decision for text properties that > change the way text is displayed, when surrounding text also affects > that. But that decision has to made somewhere, somehow, by the super mode, regardless of how multiple major modes are implemented. Just for clarity, `island-after-change-function' is a hook, not a fixed function, and writing a super mode's function for this hook would be a substantial part of writing that mode. [ .... ] > > #define CVAR(var, buf, position) \ > > chain = read_text_property (Qisland, buf, position), \ > > chain ? chain.var \ > > : BVAR (var, buf) > > , but I don't think that would be a valid Lvalue in C. :-( > Didn't you talk about some alist to look up? I see no alist look up > in this pseudo-code. And 'chain.var' sounds wrong, since 'chain' is > definitely a Lisp object, not a C struct. Or maybe I don't understand > what hides behind read_text_property. I think we're talking at cross purposes. I'm proposing that CVAR would take the place of BVAR for many of the variables which are slots in the struct buffer, and they would also become slots in the new struct chain. The other variables, currently held in `local_var_alist_' in struct buffer, would be accessed from a `local_var_alist_' element of the struct chain in much the same way. Or perhaps there is a better way of implementing chain local variables. It is more of an implementation detail than an essential part of the design. [ .... ] > > I'm not at all clear on when and how buffer local variable > > bindings get swapped in and out of, say, C variables like Vfoo. > This happens when we switch buffers, see set_buffer_internal_1. Thanks! I see that now. I had spent some time looking for that code in data.c. > But that function is driven by an explicit event of switching buffers, > while in your design you need to do something similar when point > crosses some buffer position, which is a much more subtle event. E.g., > think about all the save-excursion and save-restriction code out there. A good point. [ .... ] > > > What is "chain locality"? > > Having things (variables) which are local to a chain, as opposed to > > global variables or buffer local variables or frame local variables. > OK, but no one said that applying a restriction and making > island-specific bindings of variables must be parts of the same > feature. They could be 2 separate features instead. Maybe they could. Maybe it wouldn't make much sense. I'd have to think about that a bit more. [ .... ] > > Surely a similar mechanism [ to swapping buffer local bindings into > > and out of fixed variables in the C code ] could be created for when > > the current island changes. > The issue is to make it as cheap as possible, because redisplay code > is at liberty to move around the buffer at will, and the location > where it examines buffer text is not directly related to point. Yes, this would require careful design and coding. One detail struck me immediately on seeing the code in set_buffer_internal_1. The code has to cdr its way down the entire list of variables in local_var_alist_, despite the fact that only a few of them point to C variables. Maybe it would make sense to extract this smaller list into a separate chain. [ .... ] > See above: there might be some situations, like JIT font-lock, where > you will want to have in-islands non-nil while running async code, and > that might make the islands visible to code that is not strictly part > of any major mode, like the infrastructure which invokes these async > parts of Emacs code. So I think you need to consider the effects of > those on more than just major modes. Yes, indeed. The challenge here will be to identify all of the pertinent subsystems. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-23 17:02 ` Alan Mackenzie @ 2016-04-23 18:12 ` Eli Zaretskii 2016-04-23 18:26 ` Dmitry Gutov 2016-04-23 21:08 ` Alan Mackenzie 0 siblings, 2 replies; 45+ messages in thread From: Eli Zaretskii @ 2016-04-23 18:12 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel > Date: Sat, 23 Apr 2016 17:02:08 +0000 > Cc: emacs-devel@gnu.org, dgutov@yandex.ru > From: Alan Mackenzie <acm@muc.de> > > > More generally, I think we should first and foremost make our goal to > > have a clean and reasonably simple design, and only care about the > > amount of changes in major mode code as a secondary goal. Thinking > > about the changes in major modes first could easily lead us astray. > > We must consider both these things together. A prime design goal is to > allow an arbitrary major mode to be used by a super mode with the minimum > of adaptation to the major mode, ideally none. I think you make this goal the main one, and that is a mistake. The changes that will be needed for supporting multiple modes in the same buffer will be extensive, whether you want it or not, so trying too hard to make it easier on modes to adapt will skew the design. > > By hard-wiring this special meaning of [:space:] into your design, you > > are limiting future (and possibly some rare extant) major modes. > > I don't think it's all that special. It's natural. IME, authors who write Emacs features are known to not limit themselves to only those things that the infrastructure designers deem "natural". > > Are you sure that none of the background processing will ever need to > > treat islands as such? I'm talking about stuff like timers, process > > filters and sentinels, hook functions run by redisplay and the command > > loop, etc. > > All these subsystems will need to be aware of whether they are dealing > with the buffer as a whole, or merely with an island chain. They will > need to bind `in-islands' appropriately, frequently using the value that > was current when they were invoked. Which means that code that was never aware of any "current mode" will need to adapt. For example, BEGV and ZV (a.k.a pointy-min and point-max) will be suddenly limited to an island while such code runs. That's a major issue, IMO, something that will need changes in many places. > > > > If it is stored in the text property, then you will have to decide > > > > what happens when text is copied and yanked elsewhere. > > > > It would be the job of the `island-after-change-function' to strip the > > > unwanted text properties (both the `island' and `syntax-table' ones) and > > > to apply any needed new ones to the yanked region. > > > The problem is the decision whether they are unwanted or not. It's > > usually not simple to make that decision for text properties that > > change the way text is displayed, when surrounding text also affects > > that. > > But that decision has to made somewhere, somehow, by the super mode, > regardless of how multiple major modes are implemented. If the implementation is not based on text properties, then it doesn't have to. > One detail struck me immediately on seeing the code in > set_buffer_internal_1. The code has to cdr its way down the entire > list of variables in local_var_alist_, despite the fact that only a > few of them point to C variables. Maybe it would make sense to > extract this smaller list into a separate chain. You can't: redisplay allows Lisp evaluation in some places (like the mode line), and any Lisp run there will expect to find buffer-local bindings of all the variables. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-23 18:12 ` Eli Zaretskii @ 2016-04-23 18:26 ` Dmitry Gutov 2016-04-23 21:08 ` Alan Mackenzie 1 sibling, 0 replies; 45+ messages in thread From: Dmitry Gutov @ 2016-04-23 18:26 UTC (permalink / raw) To: Eli Zaretskii, Alan Mackenzie; +Cc: emacs-devel On 04/23/2016 09:12 PM, Eli Zaretskii wrote: >> We must consider both these things together. A prime design goal is to >> allow an arbitrary major mode to be used by a super mode with the minimum >> of adaptation to the major mode, ideally none. > > I think you make this goal the main one, and that is a mistake. The > changes that will be needed for supporting multiple modes in the same > buffer will be extensive, whether you want it or not, so trying too > hard to make it easier on modes to adapt will skew the design. +1. I also think we can afford to require some changes to the major mode code, as long as they're simple, and it's easy to spot whether they have been made. A hundred or so regexps to change is not that much if the design is otherwise sound. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-23 18:12 ` Eli Zaretskii 2016-04-23 18:26 ` Dmitry Gutov @ 2016-04-23 21:08 ` Alan Mackenzie 2016-04-24 6:29 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-04-23 21:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: dgutov, emacs-devel Hello, Eli. On Sat, Apr 23, 2016 at 09:12:25PM +0300, Eli Zaretskii wrote: > > Date: Sat, 23 Apr 2016 17:02:08 +0000 > > Cc: emacs-devel@gnu.org, dgutov@yandex.ru > > From: Alan Mackenzie <acm@muc.de> > > > More generally, I think we should first and foremost make our goal to > > > have a clean and reasonably simple design, and only care about the > > > amount of changes in major mode code as a secondary goal. Thinking > > > about the changes in major modes first could easily lead us astray. > > We must consider both these things together. A prime design goal is to > > allow an arbitrary major mode to be used by a super mode with the minimum > > of adaptation to the major mode, ideally none. > I think you make this goal the main one, and that is a mistake. The > changes that will be needed for supporting multiple modes in the same > buffer will be extensive, whether you want it or not, so trying too > hard to make it easier on modes to adapt will skew the design. Let me put things another way. Above all, I want this new facility to be based on clean abstractions. Such are generally easier to code, easier to understand, and easier to debug, should such be necessary. And I assure you that in my head, the abstractions, particularly that of islands, came before the design. I see three layers of software, here: Major modes, super modes, and subsystems. What is the relationship of each of them to islands? Super modes essentially deal with islands - that is what their main purpose is. They create islands, they destroy them, possibly they coalesce them, they coordinate the rare interactions between islands (yanking for example), they coordinate change hooks as they affect islands. Most of the changes I have proposed is in features directly to support super modes' handling of islands. Subsystems code, like redisplay, font locking, timers, ...., is going to have to deal with islands incidentally - that is not its main purpose, but there is no getting away from it. A redisplay action might act on several islands, so might a font locking action. And so on. But major modes? The abstraction I propose is that major modes see their own parts of the buffer as the entire buffer, and know nothing of islands or gaps between them. This is a clean abstraction and will lead to all the advantages enumerated a few paragraphs back. Eli, you seem to disagree with the above analysis. Would you like to outline your scheme of abstractions on this topic? You say that extensive changes will be needed to support multiple modes in a buffer, and this is clearly true. Where we seem to differ is where these changes should be made. I want the vast bulk of these changes to be in super mode support and subsystems. You seem additionally to want to make subtantial changes in the major mode "layer". I cannot see this as a good thing at the moment. [ .... ] -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-23 21:08 ` Alan Mackenzie @ 2016-04-24 6:29 ` Eli Zaretskii 2016-04-24 16:57 ` Alan Mackenzie 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2016-04-24 6:29 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel > Date: Sat, 23 Apr 2016 21:08:07 +0000 > Cc: emacs-devel@gnu.org, dgutov@yandex.ru > From: Alan Mackenzie <acm@muc.de> > > I see three layers of software, here: Major modes, super modes, and > subsystems. What is the relationship of each of them to islands? > > Super modes essentially deal with islands - that is what their main > purpose is. They create islands, they destroy them, possibly they > coalesce them, they coordinate the rare interactions between islands > (yanking for example), they coordinate change hooks as they affect > islands. Most of the changes I have proposed is in features directly to > support super modes' handling of islands. > > Subsystems code, like redisplay, font locking, timers, ...., is going to > have to deal with islands incidentally - that is not its main purpose, > but there is no getting away from it. A redisplay action might act on > several islands, so might a font locking action. And so on. > > But major modes? The abstraction I propose is that major modes see their > own parts of the buffer as the entire buffer, and know nothing of > islands or gaps between them. This is a clean abstraction and will lead > to all the advantages enumerated a few paragraphs back. > > Eli, you seem to disagree with the above analysis. Would you like to > outline your scheme of abstractions on this topic? Most of my comments were not about the abstractions. I don't have any alternative scheme to offer, because I have no experience in using, let alone writing, multiple modes in the same buffer. > You say that extensive changes will be needed to support multiple modes > in a buffer, and this is clearly true. Where we seem to differ is where > these changes should be made. I want the vast bulk of these changes to > be in super mode support and subsystems. You seem additionally to want > to make subtantial changes in the major mode "layer". I cannot see this > as a good thing at the moment. I'm saying that worrying about the amount of changes in major modes at this stage is premature optimization. If major modes will have to adapt themselves in non-trivial ways, e.g. by changing their regexps or font-lock settings, it's not a big deal. It is much more important to make sure the design doesn't contradict more basic assumptions and design principles of Emacs, including the low-level code which implements searching, syntax, redisplay, etc., because if the contradiction does happen, you will at best have a bunch of hairy problems to solve, and at worst will simply fail to produce a workable solution. IOW, I suggest to forget for a while about the amount of changes major modes will need, and leave that for later. At this stage, you should be worried much more about how core design features of Emacs will work with islands, and make sure you have all that figured out, before you decide that the island design is valid. In practice, this means that, for example, I would expect you to study all the uses of search in the low-level code, before you decide that making [:space:] match an island edge is sound. E.g., did you know that even bidi.c, which is about as low-level as you can get, uses regexp search to look for a certain combination of whitespace characters? Did you consider how this will work when islands are in the way? What about basic features like find_newline -- did you look into that? You see, if any of these break due to islands, you have some major rewrites on your hands, and the ripples will probably be very far-reaching. The need to change major modes pales by comparison. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-24 6:29 ` Eli Zaretskii @ 2016-04-24 16:57 ` Alan Mackenzie 2016-04-24 19:59 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-04-24 16:57 UTC (permalink / raw) To: Eli Zaretskii; +Cc: dgutov, emacs-devel Hello, Eli. On Sun, Apr 24, 2016 at 09:29:58AM +0300, Eli Zaretskii wrote: > > Date: Sat, 23 Apr 2016 21:08:07 +0000 > > Cc: emacs-devel@gnu.org, dgutov@yandex.ru > > From: Alan Mackenzie <acm@muc.de> > > I see three layers of software, here: Major modes, super modes, and > > subsystems. What is the relationship of each of them to islands? > > Super modes essentially deal with islands - that is what their main > > purpose is. They create islands, they destroy them, possibly they > > coalesce them, they coordinate the rare interactions between islands > > (yanking for example), they coordinate change hooks as they affect > > islands. Most of the changes I have proposed is in features directly to > > support super modes' handling of islands. > > Subsystems code, like redisplay, font locking, timers, ...., is going to > > have to deal with islands incidentally - that is not its main purpose, > > but there is no getting away from it. A redisplay action might act on > > several islands, so might a font locking action. And so on. > > But major modes? The abstraction I propose is that major modes see their > > own parts of the buffer as the entire buffer, and know nothing of > > islands or gaps between them. This is a clean abstraction and will lead > > to all the advantages enumerated a few paragraphs back. > > Eli, you seem to disagree with the above analysis. Would you like to > > outline your scheme of abstractions on this topic? > Most of my comments were not about the abstractions. I don't have any > alternative scheme to offer, because I have no experience in using, > let alone writing, multiple modes in the same buffer. But you have undoubtedly suffered the frustration of bits of scripts, possibly html files, and the like, not being properly fontified/indented for lack of such multiple modes. > > You say that extensive changes will be needed to support multiple modes > > in a buffer, and this is clearly true. Where we seem to differ is where > > these changes should be made. I want the vast bulk of these changes to > > be in super mode support and subsystems. You seem additionally to want > > to make subtantial changes in the major mode "layer". I cannot see this > > as a good thing at the moment. > I'm saying that worrying about the amount of changes in major modes at > this stage is premature optimization. You seem to be saying I should abandon "abstraction A" (that major modes should remain unaware of islands) as a design principle. Without this principle, I'm not sure how much of my design notes make any sense. I certainly have no idea of what to replace it by. > If major modes will have to adapt themselves in non-trivial ways, e.g. > by changing their regexps or font-lock settings, it's not a big deal. How do you know? What I foresee happening is a lot of island handling code being duplicated many times over, over many major modes. I think that is a big deal. > It is much more important to make sure the design doesn't contradict > more basic assumptions and design principles of Emacs, including the > low-level code which implements searching, syntax, redisplay, etc., > because if the contradiction does happen, you will at best have a bunch > of hairy problems to solve, and at worst will simply fail to produce a > workable solution. The very basic assumption that each buffer has exactly one major mode is being superseded. That is bound to have repercussions on several other assumptions which are dependent on it, including in the ones you identify. Searching, syntax, redisply, etc., will all need to be adapted because that basic assumption (one major mode) will no longer hold. The challenge is to identify all the code that implicitly assumes that assumption. I think some of these other dependent assumptions will become ambiguous. For example, at the moment BEG and Z point to the start and end of the part of the buffer the current major mode administers (this being the entire buffer). Nobody up till now has bothered to separate those two meanings of BEG and Z. Such disambiguation will be necessary to support multiple major modes. I've already proposed doing this by means of the magic variable `in-islands'. > IOW, I suggest to forget for a while about the amount of changes major > modes will need, and leave that for later. At this stage, you should > be worried much more about how core design features of Emacs will work > with islands, and make sure you have all that figured out, before you > decide that the island design is valid. I have spent quite some time studying data.c, syntax.c, xdisp.c, buffer.[ch], lots of font locking code, and likely quite a few other relevant files. I haven't come across anything that would be difficult to adapt for the island mechanism - just there's a lot to adapt. > In practice, this means that, for example, I would expect you to study > all the uses of search in the low-level code, before you decide that > making [:space:] match an island edge is sound. [ Actually, it's the entire island I'm proposing be matched as WS. ] I tend to approach it from the other direction: is that handling of an island as whitespace a satisfactory abstraction or not? If it is, the code will follow. If it's not, attempts to apply it will collapse in confusion, probably quite quickly. > E.g., did you know that even bidi.c, which is about as low-level as you > can get, uses regexp search to look for a certain combination of > whitespace characters? No, but it doesn't surprise me. > Did you consider how this will work when islands are in the way? Yes. The bulk of the adaptation to bidi.c will be the generic changes in search.c, etc., so that the bidi.c regexps will continue to work despite the text it's matching over being two islands with a gap in the middle. I know little about bidi, but there might have to be design decisions made about how it should behave when the text it's dealing with isn't contiguous in the whole buffer. > What about basic features like find_newline -- did you look into that? > You see, if any of these break due to islands, you have some major > rewrites on your hands, and the ripples will probably be very > far-reaching. The need to change major modes pales by comparison. No, I hadn't looked at find_newline. But it will need looking at regardless of whether a space in a regexp matches an island. At the very least, it will have to behave differently for finding newlines in an island chain rather than finding them in the whole buffer. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-24 16:57 ` Alan Mackenzie @ 2016-04-24 19:59 ` Eli Zaretskii 2016-04-25 6:49 ` Andreas Röhler 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2016-04-24 19:59 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel > Date: Sun, 24 Apr 2016 16:57:21 +0000 > Cc: emacs-devel@gnu.org, dgutov@yandex.ru > From: Alan Mackenzie <acm@muc.de> > > > Most of my comments were not about the abstractions. I don't have any > > alternative scheme to offer, because I have no experience in using, > > let alone writing, multiple modes in the same buffer. > > But you have undoubtedly suffered the frustration of bits of scripts, > possibly html files, and the like, not being properly fontified/indented > for lack of such multiple modes. No, not really. > You seem to be saying I should abandon "abstraction A" (that major modes > should remain unaware of islands) as a design principle. It's not an abstraction, it's a design goal. And yes, I think you need to forget about it for a while. > Without this principle, I'm not sure how much of my design notes > make any sense. I don't see how that invalidates your proposal. > I certainly have no idea of what to replace it by. I suggest replacing it with nothing. Minimizing changes in major modes (and elsewhere) is a simple economy principle; you don't need to worry about us forgetting it. > > If major modes will have to adapt themselves in non-trivial ways, e.g. > > by changing their regexps or font-lock settings, it's not a big deal. > > How do you know? What I foresee happening is a lot of island handling > code being duplicated many times over, over many major modes. I think > that is a big deal. If it is, we will cross that bridge when we get to it. > > It is much more important to make sure the design doesn't contradict > > more basic assumptions and design principles of Emacs, including the > > low-level code which implements searching, syntax, redisplay, etc., > > because if the contradiction does happen, you will at best have a bunch > > of hairy problems to solve, and at worst will simply fail to produce a > > workable solution. > > The very basic assumption that each buffer has exactly one major mode is > being superseded. That is bound to have repercussions on several other > assumptions which are dependent on it, including in the ones you > identify. Searching, syntax, redisply, etc., will all need to be adapted > because that basic assumption (one major mode) will no longer hold. The > challenge is to identify all the code that implicitly assumes that > assumption. There's exactly zero references to major mode in C sources. (There's a function to store the major mode in the corresponding slot of the buffer object, but I see no code looking that slot's value.) And for a good reason: the major mode is an entirely Lisp-land phenomenon, it does all of its work by setting local variables and hook functions. So I think your assumption that having more than one mode in a buffer is already a cataclysm is incorrect. > I think some of these other dependent assumptions will become ambiguous. > For example, at the moment BEG and Z point to the start and end of the > part of the buffer the current major mode administers (this being the > entire buffer). Nobody up till now has bothered to separate those two > meanings of BEG and Z. Such disambiguation will be necessary to support > multiple major modes. I've already proposed doing this by means of the > magic variable `in-islands'. Indeed, I'm much more worried by the effect of islands in BEGV and ZV than by the fact that there could be more than one major mode active. Unlike references to the major mode, the number of places that use BEGV and ZV is enormous, and the unwritten assumptions about them are abundant and well entrenched. > I have spent quite some time studying data.c, syntax.c, xdisp.c, > buffer.[ch], lots of font locking code, and likely quite a few other > relevant files. I haven't come across anything that would be difficult > to adapt for the island mechanism - just there's a lot to adapt. We should try to minimize that impact as much as we can. > I tend to approach it from the other direction: is that handling of an > island as whitespace a satisfactory abstraction or not? It's not an abstraction at all. It's a trick, a device to make adaptation to the island-world easier. That text between two islands of the same chain should be invisible for the mode that's active in the chain -- that is an abstraction. But no one says that text must be treated as whitespace -- this is simply a convenient means to reach your ends. However, other means towards the same end might be available, onces that don't overload [:space:] with an entirely alien meaning. > > Did you consider how [bidi.c search] will work when islands are in the way? > > Yes. The bulk of the adaptation to bidi.c will be the generic changes in > search.c, etc., so that the bidi.c regexps will continue to work despite > the text it's matching over being two islands with a gap in the middle. Doesn't that contradict your design to limit point-min and point-max, since redisplay must be island-aware? > I know little about bidi, but there might have to be design decisions > made about how it should behave when the text it's dealing with isn't > contiguous in the whole buffer. These design decisions should predate the island-as-whitespace discussion, IMNSHO. And if you are sure this feature cannot happen without affecting bidi.c and search.c, then yes, you should study those and understand what they do and how. > No, I hadn't looked at find_newline. But it will need looking at > regardless of whether a space in a regexp matches an island. At the very > least, it will have to behave differently for finding newlines in an > island chain rather than finding them in the whole buffer. See, the ripples are starting already. That is why I say we should try to find a design that doesn't require rethinking, redesigning, and rewriting every single piece of our infrastructure. If we don't, we are making the implementation and testing of this feature a much more complex and hard job than it must be. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-24 19:59 ` Eli Zaretskii @ 2016-04-25 6:49 ` Andreas Röhler 0 siblings, 0 replies; 45+ messages in thread From: Andreas Röhler @ 2016-04-25 6:49 UTC (permalink / raw) To: emacs-devel; +Cc: Alan Mackenzie, Eli Zaretskii On 24.04.2016 21:59, Eli Zaretskii wrote: > [ ... ] >> I tend to approach it from the other direction: is that handling of an >> island as whitespace a satisfactory abstraction or not? > It's not an abstraction at all. It's a trick, a device to make > adaptation to the island-world easier. That text between two islands > of the same chain should be invisible for the mode that's active in > the chain -- that is an abstraction. But no one says that text must > be treated as whitespace -- this is simply a convenient means to reach > your ends. However, other means towards the same end might be > available, onces that don't overload [:space:] with an entirely alien > meaning. > Sounds bug-sourcing. Hard to tell which conflicts might show up, but there is some probability. [ ... ] ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-21 22:19 ` Alan Mackenzie 2016-04-22 8:48 ` Eli Zaretskii @ 2016-04-22 13:42 ` Andy Moreton 2016-04-23 17:14 ` Alan Mackenzie 1 sibling, 1 reply; 45+ messages in thread From: Andy Moreton @ 2016-04-22 13:42 UTC (permalink / raw) To: emacs-devel On Thu 21 Apr 2016, Alan Mackenzie wrote: > Hello, Eli. > > On Thu, Apr 21, 2016 at 05:17:09PM +0300, Eli Zaretskii wrote: >> > Date: Wed, 20 Apr 2016 19:44:50 +0000 >> > From: Alan Mackenzie <acm@muc.de> >> > >> > This post describes my notion of how multiple major modes {c,sh}ould be >> > implemented. Key notions are "islands", "island chains", and "chain >> > local" variable bindings. > >> Thank you for publishing this. A few comments and questions below. >> Please keep in mind that I never had to write any Lisp that deals with >> these issues, so apologies in advance for possibly silly questions and >> misunderstandings. > >> > o - To the user, the current major mode will be that of the island where >> > point is. All familiar commands will work without restriction. > >> Does this mean the display of mode line, menu bar, and tool bar will >> change accordingly? > > Yes, please! > >> A more subtle issue is with point movements that are not shown to the >> user (those done by Lisp code of some command, before redisplay kicks >> in) -- what will be the effect of those? do they trigger redisplay, >> for example? > > They shouldn't trigger redisplay, no. > >> > o - An island chain will have @dfn{chain local} variable bindings. Such a >> > binding will become current and accessible when point is within one of the >> > chain's islands. When point is not in an island, the buffer local binding >> > of the variable will be current. > >> Emacs sometimes examines buffer text without moving point, and we >> generally expect for buffer-local bindings to be in effect regardless. >> A prominent example is the display engine. I will return to that >> later. > > OK. > >> > * - [Island] will be covered by the text property `island', whose value will be >> > the pertinent island or island chain (see section (ii)) (not yet >> > decided). Note that if islands are enclosed inside other islands, the >> > value is the innermost island. There is the possibility of using an >> > interval tree independent of the one for text properties to increase >> > performance. > >> I don't understand the notion of "enclosed" islands: wouldn't such >> "enclosing" simply break the "outer" island into two separate islands? > > If we mark island start and end with the syntax-table text properties > "{" and "}", we're going to have something like > > { a{ }b } > > . Simply to break the outer island into two pieces, we'd really need to > apply delimiters at a and b, giving: > > { }{ }{ } > > . This would overwrite the previous syntaxes at a and b, and this might > be a Bad Thing. Care will be needed to allow more than one island chain using the same inner mode, where the chains represent unrelated documents that are independently embedded in the larger document. >> > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as >> > whitespace, much as they do comments. They will also treat as whitespace >> > the gap between two islands in a chain. > >> Why whitespace? why not some new category? By overloading whitespace, >> you make things harder on the underlying infrastructure, like regexp >> search and matching. > > I think it's clear that the "foreign" island's syntax has no interaction > with the current island. If we treat it as whitespace, that should > minimise the amount of adapting we need to do to existing major modes. There may be some interaction. The language used for the enclosing text (using the super mode) may require quoting and escaping to be performed on the content embedded in it. This means that the textual representation of the content in the island chain may depend on what it is embedded into. The inner mode for the island chain will either need to be aware of this quoting and escaping syntax (belonging to the super mode), or the text in the island chain will need to be unescaped and unquoted for the inner mode to make sense of it. AndyM ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-22 13:42 ` Andy Moreton @ 2016-04-23 17:14 ` Alan Mackenzie 0 siblings, 0 replies; 45+ messages in thread From: Alan Mackenzie @ 2016-04-23 17:14 UTC (permalink / raw) To: Andy Moreton; +Cc: emacs-devel Hello, Andy. On Fri, Apr 22, 2016 at 02:42:07PM +0100, Andy Moreton wrote: > On Thu 21 Apr 2016, Alan Mackenzie wrote: [ .... ] > Care will be needed to allow more than one island chain using the same > inner mode, where the chains represent unrelated documents that are > independently embedded in the larger document. This is built into the fabric of the mechanism, and shouldn't present a problem. > >> > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as > >> > whitespace, much as they do comments. They will also treat as whitespace > >> > the gap between two islands in a chain. > >> Why whitespace? why not some new category? By overloading whitespace, > >> you make things harder on the underlying infrastructure, like regexp > >> search and matching. > > I think it's clear that the "foreign" island's syntax has no interaction > > with the current island. If we treat it as whitespace, that should > > minimise the amount of adapting we need to do to existing major modes. > There may be some interaction. The language used for the enclosing text > (using the super mode) may require quoting and escaping to be performed > on the content embedded in it. This means that the textual > representation of the content in the island chain may depend on what it > is embedded into. Good point! Thanks. > The inner mode for the island chain will either need to be aware of this > quoting and escaping syntax (belonging to the super mode), or the text > in the island chain will need to be unescaped and unquoted for the inner > mode to make sense of it. Umm. Yes, that could happen. Hopefully it won't be a big problem in practice. > AndyM -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie ` (2 preceding siblings ...) 2016-04-21 14:17 ` Eli Zaretskii @ 2016-04-22 14:33 ` Dmitry Gutov 2016-04-22 18:58 ` Richard Stallman 4 siblings, 0 replies; 45+ messages in thread From: Dmitry Gutov @ 2016-04-22 14:33 UTC (permalink / raw) To: Alan Mackenzie, emacs-devel Hello Alan, Thank you for writing this expansive summary. I'll comment on a couple of the items below, but overall, if you're asking my personal opinion, we should put a pin in this for now, and first see where the hard-widen feature gets us. After that, we could see whether it wouldn't be easier to extract certain individual pieces of this proposal in an independent fashion. The main two I see are: - A way to assign buffer boundaries that make certain core primitives treat some buffer regions as whitespace, maybe with support for nesting. I don't know if that should be via text properties. As long as this feature is only used dynamically, it could be a list structure stored in a dynamic variable. That way the `in-islands' variable would become redundant. - A way to quickly store and switch between sets of buffer-local variables. If you go ahead with this proposal, though, I think it should be implemented in close collaboration with an author of a related package. Vitalie Spinu and Christoph Wedler (polymode and andlr-mode maintainers) would be good candidates, and neither has shown up at this discussion yet. Unfortunately, I don't have a lot of time to dedicate to mmm-mode lately (and it probably has the highest backward compatibility expectations out of the three anyway). The main drawbacks of this, IMHO, are that it's big (like you mentioned yourself), and that it's fairly opinionated. Hence the two-item list above. On 04/20/2016 10:44 PM, Alan Mackenzie wrote: > * - The coordination of these bindings will be carried out by the > mechanisms described below, without explicit coding in the super mode. This seems a little too optimistic. For instance: > o - To the user, the current major mode will be that of the island where > point is. All familiar commands will work without restriction. Imenu, as one example, will require coordination from the super mode, or from the multi-mode framework. The user will normally want to see all of entries in the current buffer in the index, so something would have to merge them. > o - To the writer of major modes, a minimal set of restrictions will apply: > * - For some major mode commands, the mode will have to bind the variable > `in-islands' (see below) to non-nil. Ideally, the writers of the "island" major modes wouldn't do anything special to support multi-mode usage. It would be better if the "superior" major modes would have to do all the "special" things. I.e., it's fine to have to introduce a new major mode for a templating language if it can easily use existing major modes for the code regions inside. Here's a related question: would `indent-for-tab-command' bind `in-islands' to t, or not? > (iv) Islands. > o - An island will be delimited in two complementary ways: > * - It will be enclosed syntactically by characters with "open island" and > "close island" syntax (see section (v)). Both of these syntactic > markers will include a flag "chain" indicating whether there is a > previous/next island in the chain. The cdr of the syntax value will be > the island chain to which the island belongs. > * - It will be covered by the text property `island', whose value will be > the pertinent island or island chain (see section (ii)) (not yet > decided). Note that if islands are enclosed inside other islands, the > value is the innermost island. There is the possibility of using an > interval tree independent of the one for text properties to increase > performance. Going by the current implementation in mmm-mode, it would be handy if the islands could be distinguished using one text property only. Then we simply set it on all overlays that cover mmm-mode's subregions. But if all three elements are required, so be it. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie ` (3 preceding siblings ...) 2016-04-22 14:33 ` Dmitry Gutov @ 2016-04-22 18:58 ` Richard Stallman 2016-04-22 20:22 ` Alan Mackenzie 4 siblings, 1 reply; 45+ messages in thread From: Richard Stallman @ 2016-04-22 18:58 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] The design seems to assume that every island starts with a one-character delimiter that always starts an island, and that there is anothehr one-character delimiter that always ends an island. Is that really the intention, or did I misunderstand? -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-22 18:58 ` Richard Stallman @ 2016-04-22 20:22 ` Alan Mackenzie 2016-04-23 12:27 ` Andreas Röhler 2016-04-23 12:38 ` Richard Stallman 0 siblings, 2 replies; 45+ messages in thread From: Alan Mackenzie @ 2016-04-22 20:22 UTC (permalink / raw) To: Richard Stallman; +Cc: emacs-devel, dgutov Hello, Richard. On Fri, Apr 22, 2016 at 02:58:53PM -0400, Richard Stallman wrote: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > The design seems to assume that every island starts with a > one-character delimiter that always starts an island, and that there > is anothehr one-character delimiter that always ends an island. > Is that really the intention, or did I misunderstand? That's not quite how I see it working. There needs to be some sort of delimiter to start an island which must be at least 1 character wide. On this character/one of these characters, the "super mode" will set an "open island" syntax-table text property. Similarly, there must be some delimiter at the end of the island to set a "close island" property on. For example, in a shell script with an embedded AWK script: VARIABLE=$(gawk '<script>' < <input-file>) .... ^ ^ , the text properties would be set on the marked characters, making <script> an island which would be initialised to AWK Mode. As a minor point, the delimiters enclose an island, but aren't part of it - they are part of the surrounding text. > -- > Dr Richard Stallman > President, Free Software Foundation (gnu.org, fsf.org) > Internet Hall-of-Famer (internethalloffame.org) > Skype: No way! See stallman.org/skype.html. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-22 20:22 ` Alan Mackenzie @ 2016-04-23 12:27 ` Andreas Röhler 2016-04-23 12:38 ` Richard Stallman 1 sibling, 0 replies; 45+ messages in thread From: Andreas Röhler @ 2016-04-23 12:27 UTC (permalink / raw) To: emacs-devel; +Cc: Alan Mackenzie On 22.04.2016 22:22, Alan Mackenzie wrote: > Hello, Richard. > > On Fri, Apr 22, 2016 at 02:58:53PM -0400, Richard Stallman wrote: >> [[[ To any NSA and FBI agents reading my email: please consider ]]] >> [[[ whether defending the US Constitution against all enemies, ]]] >> [[[ foreign or domestic, requires you to follow Snowden's example. ]]] >> The design seems to assume that every island starts with a >> one-character delimiter that always starts an island, and that there >> is anothehr one-character delimiter that always ends an island. >> Is that really the intention, or did I misunderstand? > That's not quite how I see it working. There needs to be some sort of > delimiter to start an island which must be at least 1 character wide. What about keeping a simple index of modes instead? Just make major-mode work on current indexed chunk only instead of whole buffer. Cheers, Andreas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-22 20:22 ` Alan Mackenzie 2016-04-23 12:27 ` Andreas Röhler @ 2016-04-23 12:38 ` Richard Stallman 2016-04-23 17:31 ` Alan Mackenzie 1 sibling, 1 reply; 45+ messages in thread From: Richard Stallman @ 2016-04-23 12:38 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] Are you saying that there won't be a character with thpe "open island" in the syntax table, but rather a text property will give a particular string in the buffer the "open island" syntax? That makes more sense. But I think that in some cases "separate islands" might be a better designation. For instance, consider the three sections of a Bison input file. They are separated by a delimiter. It would be artificial and arbitrary to try to divide up the delimiter into a string to end the previous island and a string to start the next one. Which reminds me that the first island in the Bison input file has no string to "open" it. It starts at the start of the buffer. And the third island ends and the end of the buffer. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-23 12:38 ` Richard Stallman @ 2016-04-23 17:31 ` Alan Mackenzie 2016-04-24 9:22 ` Richard Stallman 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2016-04-23 17:31 UTC (permalink / raw) To: Richard Stallman; +Cc: dgutov, emacs-devel Hello, Richard. On Sat, Apr 23, 2016 at 08:38:04AM -0400, Richard Stallman wrote: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > Are you saying that there won't be a character with the "open island" > in the syntax table, but rather a text property will give a particular > string in the buffer the "open island" syntax? Yes. I think it would most unwise to assign a character such a syntax in the syntax table - the island boundaries would be unstable, existing or not existing depending on the syntax table of the island one was currently in. I anticipate writing a warning about this in the Emacs Lisp manual. > That makes more sense. But I think that in some cases "separate > islands" might be a better designation. For instance, consider the > three sections of a Bison input file. They are separated by a > delimiter. It would be artificial and arbitrary to try to divide up > the delimiter into a string to end the previous island and a string to > start the next one. I think Bison and Lex are somewhat special cases; they each divide a file into three sections of equal status, rather than there being a containing mode and sections contained within it. > Which reminds me that the first island in the Bison input file > has no string to "open" it. It starts at the start of the buffer. > And the third island ends and the end of the buffer. A workaround for this would be to have the first section being the "super mode" and containing the second and third sections. The delimiter "%%" between sections 2 and 3 has space to hold both an island close and an island open, despite what you say about this being artificial, etc. I don't see there would be an absolute need for there to be a "close island" mark at the end of the buffer. > -- > Dr Richard Stallman > President, Free Software Foundation (gnu.org, fsf.org) > Internet Hall-of-Famer (internethalloffame.org) > Skype: No way! See stallman.org/skype.html. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: A vision for multiple major modes: some design notes 2016-04-23 17:31 ` Alan Mackenzie @ 2016-04-24 9:22 ` Richard Stallman 0 siblings, 0 replies; 45+ messages in thread From: Richard Stallman @ 2016-04-24 9:22 UTC (permalink / raw) To: Alan Mackenzie; +Cc: dgutov, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I think Bison and Lex are somewhat special cases; they each divide a > file into three sections of equal status, rather than there being a > containing mode and sections contained within it. You may be right that this kind of case is less common, but the system should still handle it. > A workaround for this would be to have the first section being the > "super mode" and containing the second and third sections. The > delimiter "%%" between sections 2 and 3 has space to hold both an island > close and an island open, despite what you say about this being > artificial, etc. That could work, but I think it would be cleaner if "island separator" were allowed too. > I don't see there would be an absolute need for there > to be a "close island" mark at the end of the buffer. I don't either. I just thought the design required one. If it doesn't, that is fine with me. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2016-06-14 16:27 UTC | newest] Thread overview: 45+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-04-20 19:44 A vision for multiple major modes: some design notes Alan Mackenzie 2016-04-20 21:06 ` Drew Adams 2016-04-20 23:00 ` Drew Adams 2016-04-21 12:43 ` Alan Mackenzie 2016-04-21 14:24 ` Stefan Monnier 2016-04-23 2:20 ` zhanghj 2016-04-23 22:36 ` Dmitry Gutov 2016-04-21 16:05 ` Drew Adams 2016-04-21 16:31 ` Eli Zaretskii [not found] ` <<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default> [not found] ` <<83oa926i0e.fsf@gnu.org> 2016-04-21 16:59 ` Drew Adams 2016-04-21 19:55 ` Eli Zaretskii [not found] ` <<<64f1d39a-dfd0-44ca-86c1-b4d6104b5702@default> [not found] ` <<<83oa926i0e.fsf@gnu.org> [not found] ` <<791d74d1-2b1d-4304-8e7e-d6c31af7aa41@default> [not found] ` <<83eg9y68jy.fsf@gnu.org> 2016-04-21 20:26 ` Drew Adams 2016-04-20 22:27 ` Phillip Lord 2016-04-21 9:14 ` Alan Mackenzie 2016-04-22 12:45 ` Phillip Lord 2016-04-21 14:17 ` Eli Zaretskii 2016-04-21 21:33 ` Alan Mackenzie 2016-04-21 22:01 ` Drew Adams 2016-04-22 8:13 ` Alan Mackenzie 2016-04-22 17:04 ` Drew Adams 2016-04-22 9:04 ` Eli Zaretskii 2016-06-13 21:17 ` John Wiegley 2016-06-14 13:13 ` Alan Mackenzie 2016-06-14 16:27 ` John Wiegley 2016-04-21 22:19 ` Alan Mackenzie 2016-04-22 8:48 ` Eli Zaretskii 2016-04-22 22:35 ` Alan Mackenzie 2016-04-23 7:39 ` Eli Zaretskii 2016-04-23 17:02 ` Alan Mackenzie 2016-04-23 18:12 ` Eli Zaretskii 2016-04-23 18:26 ` Dmitry Gutov 2016-04-23 21:08 ` Alan Mackenzie 2016-04-24 6:29 ` Eli Zaretskii 2016-04-24 16:57 ` Alan Mackenzie 2016-04-24 19:59 ` Eli Zaretskii 2016-04-25 6:49 ` Andreas Röhler 2016-04-22 13:42 ` Andy Moreton 2016-04-23 17:14 ` Alan Mackenzie 2016-04-22 14:33 ` Dmitry Gutov 2016-04-22 18:58 ` Richard Stallman 2016-04-22 20:22 ` Alan Mackenzie 2016-04-23 12:27 ` Andreas Röhler 2016-04-23 12:38 ` Richard Stallman 2016-04-23 17:31 ` Alan Mackenzie 2016-04-24 9:22 ` Richard Stallman
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.