From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: A vision for multiple major modes: some design notes
Date: Fri, 22 Apr 2016 22:35:08 +0000
Message-ID: <20160422223507.GD1873@acm.fritz.box>
References: <20160420194450.GA3457@acm.fritz.box> <8360vb6o7u.fsf@gnu.org> <20160421221943.GE1775@acm.fritz.box> <83a8km58qz.fsf@gnu.org>
To: Eli Zaretskii
Cc: dgutov@yandex.ru, emacs-devel@gnu.org
In-Reply-To: <83a8km58qz.fsf@gnu.org>

Hello, Eli.

On Fri, Apr 22, 2016 at 11:48:52AM +0300, Eli Zaretskii wrote:
> > Date: Thu, 21 Apr 2016 22:19:43 +0000
> > Cc: dgutov@yandex.ru, emacs-devel@gnu.org
> > From: Alan Mackenzie

[ .... ]

> > > > * - [Island] will be covered by the text property `island', whose value will be
> > > >   the pertinent island or island chain (see section (ii)) (not yet
> > > >   decided). Note that if islands are enclosed inside other islands, the
> > > >   value is the innermost island. There is the possibility of using an
> > > >   interval tree independent of the one for text properties to increase
> > > >   performance.

> > > I don't understand the notion of "enclosed" islands: wouldn't such
> > > "enclosing" simply break the "outer" island into two separate islands?

> > If we mark island start and end with the syntax-table text properties
> > "{" and "}", we're going to have something like
> >     {   a{    }b   }
> > . Simply to break the outer island into two pieces, we'd really need to
> > apply delimiters at a and b, giving:
> >     {   }{    }{   }
> > . This would overwrite the previous syntaxes at a and b, and this might
> > be a Bad Thing.
> We could design the stuff so that Bad Things won't happen. I consider
> this nesting of islands a (possibly unnecessary) complication that we
> shouldn't accept unless we have a very good reason. Nesting
> immediately requires a plethora of operations that are otherwise not
> necessary.

OK. You're advocating, I think, not having well defined islands in a
chain (i.e., every island having a defined and marked beginning and
end), instead just having regions of text, each of which is associated
with an island chain (via the `island' text property, say). This would
make syntactic scanning more difficult (though not impossible). I can't
judge at the moment which scheme is the better one.

> > > > o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as
> > > >   whitespace, much as they do comments. They will also treat as whitespace
> > > >   the gap between two islands in a chain.

> > > Why whitespace? why not some new category? By overloading whitespace,
> > > you make things harder on the underlying infrastructure, like regexp
> > > search and matching.

> > I think it's clear that the "foreign" island's syntax has no interaction
> > with the current island.

> This is not a contradiction to what I suggested. The new category
> could be treated the same as whitespace, in its effect on
> syntax-related issues. By contrast, having whitespace regexp class be
> indistinguishable from an island probably means complications on a
> very low level of matching regular expressions and syntax constructs,
> something that I fear will get in the way.

> > If we treat it as whitespace, that should minimise the amount of
> > adapting we need to do to existing major modes.

> We need to consider the amount of adaptations in the low-level
> infrastructure code as well, not only on the application level.

I think the adaptations to the regexp engine would be far less work than
adapting many thousands of regexps in major modes we want to use as
sub-modes.
For example there are 115 occurrences in CC Mode of just the exact
string "[ \t".

> > I envisage that a regexp element will match the "foreign" island if that
> > element would match a space. I know this sounds horrible, but I haven't
> > come up with a scenario where this wouldn't work well.

> And I say this is a bomb waiting to go off. It is relatively easy to
> add a new regexp construct for an island (e.g., we already support
> categories in regexps, so just defining a category is one easy way),
> and treat that as whitespace, while still keeping our options open to
> make it behave slightly differently if needed, and still allowing the
> applications to specify one, but not the other.

Bear in mind that this matching of an island by a whitespace regexp
element would happen ONLY whilst `in-islands' was bound to non-nil, i.e.
when a major mode is working in its own island chain. Are there any
circumstances in which we would not want the major mode to see the gap
between its islands as WS? When `in-islands' is nil (i.e. when the
super mode's code is running, or the user is typing commands) the
islands would NOT match a WS regexp.

> By contrast, if we decide that whitespace matches an island, we are
> opening a giant can of worms. Here's one worm out of that can: some
> low-level operations need to search the buffer using regexps
> disregarding any narrowing -- what you suggest means these operations
> cannot safely use whitespace in their regexps. This is something to
> stay away of, IMO.

It depends on whether these low level operations are working within an
island chain (`in-islands' non-nil) or on the buffer as a whole
(`in-islands' nil). I think such operations would typically be run with
`in-islands' nil, hence would not run up against these problems.

> > > Extending [:space:] that way seems to be an implementation detail
> > > leaking to user level. I think we should avoid that at all costs.

> > Why? I don't understand your last paragraph.

> See above.
> [:space:] is something used a lot in Lisp applications, so
> we leak the implementation of islands to that level: from now on, each
> Lisp application will need to consider the possibility that searching
> for [:space:] will find an island, something that might have no
> relation to whitespace.

I rather see it as major mode Lisp code not having to concern itself
with the possibility of (foreign) islands or gaps. It merely has to do
(re-search-forward "....[ \t]...." ...), and it will end up at the next
valid place in its own island chain. The aim would be for the major
mode to be as unaware of the island mechanism as possible.

Of course the super mode or the user would have to be aware of the
islands, and search for things like, e.g., "\\s{" and "\\s}" to match
the island boundaries.

> > > I'm not sure I understand the details. E.g., where will the
> > > island-chain local values be stored?

> > In a C struct chain, analogous to struct buffer, using much the same
> > mechanisms.

> What object(s) will that chain be rooted at? And how will it be
> related to its buffer?

The chain will be the value of the `island' text property set on all
the islands of the chain. It would also occupy the "matching
character" slot of the "open island" and "close island" syntax
descriptors (though I'm having second thoughts about this bit). Both of
these couple the chain with its buffer.

> > > To remind you, buffer-local variables have a special object in their
> > > symbol value cell, and BVAR only works for the few buffer-local
> > > variables that are stored in the buffer object itself. I'm not sure I
> > > understand how CVAR could solve the problem you need to solve, which
> > > is keeping multiple chains per buffer, each one with its values of
> > > these variables.

> > CVAR would get the current chain from the `island' (or `chain') text
> > property at the position.
> If it is stored in the text property, then you will have to decide
> what happens when text is copied and yanked elsewhere.

It would be the job of the `island-after-change-function' to strip the
unwanted text properties (both the `island' and `syntax-table' ones) and
to apply any needed new ones to the yanked region.

> > If this is nil, it would do what BVAR does.

> Once again, BVAR only handles variables that are part of the buffer
> object itself. The other buffer-local variables (which are the
> majority) are handled as part of switching the buffer, and the C code
> simply refers to them by name. So BVAR is not necessarily the correct
> model for what you are designing.

> > Otherwise it would access the appropriate named element in the struct
> > chain. I think CVAR would take three parameters: the variable name, the
> > buffer, and the buffer position.

> Can you show a pseudo-code of CVAR? I'm afraid I'm missing something
> here, because I don't see clearly what you have in mind.

I'll try. Something like this:

    #define CVAR(var, buf, position)                         \
        chain = read_text_property (Qisland, buf, position), \
        chain ? chain.var                                    \
              : BVAR (var, buf)

, but I don't think that would be a valid Lvalue in C. :-(

> > Other chain local variables would be accessed through an alist in the
> > struct chain holding miscellaneous variables, exactly as is done for
> > the other buffer local variables in struct buffer.

> There's no such alist in how we access buffer-local variables, not
> AFAIK. Again, I must be missing something here.

Or, maybe I am. I thought that the slot `local_var_alist_' in the
struct buffer held the bindings of all the non-BVAR local variables, as
an alist. I'm not at all clear on when and how buffer local variable
bindings get swapped in and out of, say, C variables like Vfoo.

> > > This actually sounds like a simple extension of narrowing, so I wonder
> > > why do we need so many new object types and notions.
> > I think it's more like a complicated extension of narrowing. :-)

> It's simple because instead of one region you have more than one, and
> the user-level commands don't affect them. All the other changes are
> exact reproduction of what narrowing does.

> > I think that chain local variables are essential to multiple major
> > modes - you can't have m.m.m. without some sort of chain locality.

> What is "chain locality"?

Having things (variables) which are local to a chain, as opposed to
global variables or buffer local variables or frame local variables.

> > I also think that for a major mode to work transparently over
> > several chained islands, all the irrelevant stuff between the
> > islands needs to be made, er, transparent.

> Yes, but how is that related to my comment about extending narrowing?

Maybe it's not, very much.

> > > I don't see any discussion of how redisplay will deal with islands.
> > > To remind you, redisplay moves through portions of the buffer, without
> > > moving point, and access buffer-local variables for its job. You need
> > > to augment the design with something that will allow redisplay see the
> > > correct values of variables depending on the buffer position it is at.
> > > The same problem exists for any features that use display simulation
> > > for making decisions about movement and layout, e.g. vertical-motion.

> > I think redisplay is mostly controlled by variables (such as
> > `scroll-margin') accessed by BVAR. These calls could be replaced by
> > CVAR.

> That's not the whole story; once again, you forget about buffer-local
> variables that are not part of the buffer object; BVAR is not used for
> those. I gave an example of one such variable: face-remapping-alist,
> and I selected that variable for a reason. Here's how the display
> engine refers to it in the current codebase:

>       base_face_id = it->string_from_prefix_prop_p
>         ? (!NILP (Vface_remapping_alist)
>            ? lookup_basic_face (it->f, DEFAULT_FACE_ID)
>            : DEFAULT_FACE_ID)
>         : underlying_face_id (it);

> Another example (which I also mentioned) is standard-display-table:

>   /* Use the standard display table for displaying strings. */
>   if (DISP_TABLE_P (Vstandard_display_table))
>     it->dp = XCHAR_TABLE (Vstandard_display_table);

> See? no BVAR anywhere in sight.

OK. But `face-remapping-alist' can definitely be made buffer local, and
`standard-display-table' most probably can. There will be some
mechanism (which I don't currently understand) by which buffer local
values are swapped into and out of Vface_remapping_alist when the
current buffer changes. Surely a similar mechanism could be created for
when the current island changes.

> > Problems will arise if redisplay reads the variable once, and
> > fails to read it again when its current position moves into or out of an
> > island. Redisplay would have to be aware of island boundaries, and
> > re-read the controlling variables on passing a boundary. Other than
> > that, I can't see any big problems. Not yet, anyway.

> To remind you, the display engine works by examining characters from
> the buffer text one by one. Are you saying that it will have, for
> each character it examines, to look up the island chain for possible
> changes? That would make it abysmally slow, I think.

> IOW, part of your design needs to provide some efficient means for
> redisplay to "be aware of island boundaries, and re-read the
> controlling variables on passing a boundary".

Yes.

> There's one more complication, which is related to redisplay, but not
> only to it. You write:

> > (ix) Miscellaneous commands and functions.
> > o - `point-min' and `point-max' will, when `in-islands' is non-nil, return
> >   the max/min point in the visible region in the same chain of islands as
> >   point.
> > o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to
> >   the current island chain when `in-islands' is non-nil.
> > o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in
> >   the current island chain (how?) when `in-islands' is non-nil.
> > o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the
> >   Right Thing in island chains when `in-islands' is non-nil.
> > o - New functions `island-min', `island-max', `island-chain-min' and
> >   `island-chain-max' will do what their names say.
> > o - There will be no restrictions on the use of widening/narrowing, as have
> >   been proposed for other support engines for multiple major modes.
> > o - New commands like `beginning-of-island', `narrow-to-island', etc. will
> >   be wanted. More difficultly, bindings for them will be needed.

> Something bothers me there. What will "M-<" and "M->" do, if
> point-min and point-max are limited to the current island? Likewise
> the search commands -- they cannot be limited to the current island,
> unless the user explicitly says so (and personally, I don't envision
> users to ask to be so limited).

Those restrictions will only apply when `in-islands' is bound to
non-nil, i.e. when major mode code is running. It will be nil when the
user types in M-<, hence point will move to the beginning of the
(visible region of the) buffer.

So, for example, if the super mode is shell script, and the major mode
in the current island is AWK Mode, (point-min) will return the start of
the AWK Mode island chain (which is useful to AWK Mode), not the very
start of the buffer.

> There's a dichotomy here, between the underlying C-level variables
> that currently are set to the limits of the narrowed region, and
> affect all user commands and internal operations (e.g., the display
> engine never looks beyond these limits); and the multi-mode
> functionality that needs to narrow the view even more.
> If you propagate the island-level limitations too deep, they will
> affect user commands and features (like display) that have nothing to
> do with the reason for which islands are being designed. E.g., a naïve
> replacement of C macros BEGV and ZV with something that returns the
> beginning and end of the current island will cause the display show
> only the current island, as if you narrowed the buffer to that
> island. I'm sure that's not what we want.

No, it's not. I think BEGV will need to have different meanings
depending on the value of `in-islands'. When it's nil, BEGV will have
the current meaning. When it's non-nil, BEGV will mean "the lowest
buffer position which is both within the current island-chain and not
below the lowest visible position". Or something like that.

-- 
Alan Mackenzie (Nuremberg, Germany).
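
A note on the Lvalue worry raised with the CVAR pseudo-code earlier in
this message: a bare `a ? b : c' is not an Lvalue in C, but
`*(a ? &b : &c)' is, so one possible way round the problem is to select
the address of the slot and dereference that. What follows is only a
toy model of that idea under loud assumptions, not Emacs code: the names
toy_chain, toy_buffer, toy_chain_at and TOY_CVAR are all invented for
illustration, a plain array stands in for the `island' text property
lookup, and an int stands in for a Lisp_Object slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed "struct chain".  */
struct toy_chain
{
  int tab_width;                  /* a chain-local value */
};

/* Hypothetical stand-in for struct buffer.  */
struct toy_buffer
{
  int tab_width;                  /* the buffer-wide value */
  size_t len;                     /* number of positions in the buffer */
  struct toy_chain **island_prop; /* island_prop[pos]: chain covering pos,
                                     or NULL outside any island */
};

/* Models reading the `island' property at POS (assumed, not a real API).  */
static struct toy_chain *
toy_chain_at (const struct toy_buffer *buf, size_t pos)
{
  assert (buf != NULL && pos < buf->len);
  return buf->island_prop[pos];
}

/* Yields an Lvalue: the chain's slot if POS is inside an island,
   otherwise the buffer's own slot.  The arguments are evaluated more
   than once, so they must be free of side effects.  */
#define TOY_CVAR(var, buf, pos)            \
  (*(toy_chain_at ((buf), (pos))           \
     ? &toy_chain_at ((buf), (pos))->var   \
     : &(buf)->var))
```

With this, `TOY_CVAR (tab_width, buf, pos) = 2' would assign to the
chain's slot when pos lies in an island and to the buffer's slot
otherwise. It says nothing about the variables that are not slots in
the buffer object, which would still need the swapping mechanism
discussed above.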