From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: A vision for multiple major modes: some design notes
Date: Wed, 20 Apr 2016 19:44:50 +0000
Message-ID: <20160420194450.GA3457@acm.fritz.box>
To: emacs-devel@gnu.org, Dmitry Gutov

Hello, Dmitry and Emacs.

This post describes my notion of how multiple major modes {c,sh}ould be implemented. Key notions are "islands", "island chains", and "chain local" variable bindings.

In this scheme, "super modes" will not have to do anything to swap in/out local variable bindings pertinent to islands; this will be done by the underlying C code. Narrowing/widening will not be (ab)used by the super mode mechanism. Major modes will continue to be able to use the entire range of Emacs facilities.

Here are some design notes:

(i) Overview and motivation.
  o - The aim is to support several major modes simultaneously in a single buffer.
  o - The "super mode" will set up "chains of islands" (see below).
    * - Each chain will have its own major mode, key map, syntax table, etc.
    * - In each chain, "chain local" variable bindings will exist. Such a binding will be current when point is within an island in the chain.
    * - The coordination of these bindings will be carried out by the mechanisms described below, without explicit coding in the super mode.
  o - To the user, the current major mode will be that of the island where point is. All familiar commands will work without restriction.
  o - To the writer of major modes, a minimal set of restrictions will apply:
    * - For some major mode commands, the mode will have to bind the variable `in-islands' (see below) to non-nil.
    * - For regexps which recognise whitespace, the regexp must contain "\\s-" or "\\s " or "[[:space:]]" so that the regexp engine will handle "foreign" islands and gaps between chained islands as whitespace.
    * - All other Emacs facilities will be available for use, being adapted as necessary for the island mechanism.

(ii) Definitions and concepts.
  o - An @dfn{island} is a contiguous portion of a buffer marked at each end. Its attributes are those of the chain of islands of which it is an element.
  o - A @dfn{chain} of islands is a canonically ordered chain of islands in a single buffer. An island chain has its own major mode; it has its own syntax table, abbreviation table, font lock settings, etc. It has its own bindings of (most) "buffer" local variables.
  o - An island chain will have @dfn{chain local} variable bindings. Such a binding will become current and accessible when point is within one of the chain's islands. When point is not in an island, the buffer local binding of the variable will be current. Most variables which are currently buffer local in Emacs 25 will become chain local. Those (relatively few) variables which must retain a single value over an entire buffer will be marked as such with a non-nil value of the `entire-buffer' property.
  o - The variable `using-islands' will be set non-nil to indicate the current buffer is using the island mechanism.
  o - The variable `in-islands' will control island and island chain facilities. When this variable is bound to non-nil, the facilities described here (such as chain local variables) are active. When the variable is nil, (most of) the new facilities are inactive, and Emacs behaves as Emacs 25.

(iii) Island Chains.
  o - An island chain will be a Lisp object which is a C struct similar to struct buffer. In particular, it will contain slots for common chain local variables, and an association list for bindings of other chain local variables.
  o - An island chain might contain pointers to the first and last of its islands (still to be decided).

(iv) Islands.
  o - An island will be delimited in two complementary ways:
    * - It will be enclosed syntactically by characters with "open island" and "close island" syntax (see section (v)). Both of these syntactic markers will include a flag "chain" indicating whether there is a previous/next island in the chain. The cdr of the syntax value will be the island chain to which the island belongs.
    * - It will be covered by the text property `island', whose value will be the pertinent island or island chain (see section (ii)) (not yet decided). Note that if islands are enclosed inside other islands, the value is the innermost island. There is the possibility of using an interval tree independent of the one for text properties to increase performance.
  o - An island might be represented by a C or Lisp structure, or it might not (not yet decided). This structure would hold the containing chain, markers pointing to the start and end of the chain, and the previous and next islands in the chain.

(v) Syntax, etc.
  o - Two new syntax classes, "open island" and "close island", will be introduced. These will be designated by the characters "{" and "}". Their "matching character" slots will contain the island's chain. There will be an extra flag "chain" (denoted by "i") indicating whether there is a previous/next island in the chain.
  o - `scan-lists', `scan-sexps', etc. will treat a "foreign" island as whitespace, much as they do comments. They will also treat as whitespace the gap between two islands in a chain.
  o - The (currently 11 element) parser state will be enhanced to support islands as follows:
    * - A twelfth element will be introduced. This will contain an association list whose elements will have the form (island-chain . 12-element parse state); each element will contain the suspended state of parsing in the island chain which is the car of the element.
      An element with a car of nil will represent the suspended parsing state of the buffer outside of islands.
    * - Elements 12, 13, ... will be island chains of the enclosing islands, elt 12 being that of the innermost enclosing island, etc. An element with a value of nil indicates being outside all islands.
  o - `parse-partial-sexp' will create and use an enhanced parser state as described above. Note that a two character construct (such as a C comment opener) cannot enclose an island, and special handling will be required to exclude this. The syntax table in use will change as the current position passes between islands.
  o - `syntax-ppss' will do the right thing with the extended parser state. Alternatively, `syntax-ppss' will have an independent 12-element state in each island chain, where elt. 11 is always nil. Its cache mechanism will be enhanced such that buffer changes outside of an island chain need not invalidate the stored cache pertaining to the chain.
  o - The facilities in this section are active even when `in-islands' is nil.

(vi) Regexps.
  o - The regexp engine will be enhanced such that the regexps "\\s-", "\\s ", and "[[:space:]]" will match an entire island.
  o - The gap between two islands in a chain will also be matched by the above regexps.
  o - This treatment of an island, and a gap between two islands, as WS will occur only when `in-islands' is non-nil.
  o - When `in-islands' is nil, there will be no reliable way of scanning over an island by regexps, since it is a potentially nested structure, and FSMs don't recognise arbitrarily nested structures.

(vii) Variables.
  o - Island chain local variable bindings will come into existence. These bindings depend on the island point is in. There will be lower level routines that will have "position" parameters as an alternative to using point.
  o - All variables which are currently buffer local will become chain local except for those whose symbols are given a non-nil `entire-buffer' property.
      There will be no new functions like `make-chain-local-variable'.
  o - When the `entire-buffer' property is nil, the buffer local binding of a variable will hold the value pertinent to the areas of the buffer outside of islands. When that property is non-nil, the binding holds the value for the entire buffer.
  o - When `in-islands' is nil, the chain local mechanism described here is not used - instead the familiar buffer local binding is used.
  o - The current binding for a local variable will be the chain local binding of the island chain of the island containing point. If point is not in an island, the buffer local binding is current.
  o - If a chain local binding is current, and its value is unbound, the binding of an enclosing scope is NOT used in its place. Probably the variable's default-value should be used when reading.
  o - In buffer.h, a new macro CVAR ("island chain variable") analogous to BVAR will be introduced. It will use BVAR as a fallback. Most invocations of BVAR will be changed to CVAR.
  o - In data.c, the mechanism for accessing local variable bindings (e.g. `swap_in_symval_forwarding') will be enhanced to test `in-islands' and handle chain local bindings appropriately.

(viii) Change hooks.
  o - There will be two additional abnormal hooks, `island-before-change-function' and `island-after-change-function', which will each hold a single function or nil. These will take the same parameters as `before-change-functions' and `after-change-functions' respectively.
  o - The return value of these functions will be an association list with members whose car is an island chain (or nil, meaning "outside all islands") and whose cdr is the list of parameters to supply to `before/after-change-functions' for that chain. Usually, the alist will have just one member containing BEG, END, and for `after-..' OLD-LEN unchanged.
  o - After calling each of these functions, Emacs will invoke `before/after-change-functions' on each chain in the returned alist.
      This will be in place of the standard calls to `before/after-change-functions'.
  o - The intention of these hooks is that super modes will use them to detect the deletion and insertion of islands, and to do the "de-islandification" and "islandification" as needed.
  o - `before/after-change-functions' will be normal chain local variables. A chain local binding will hold functions for the individual chain. The buffer local binding will hold functions for the parts of the buffer outside of islands.

(ix) Miscellaneous commands and functions.
  o - `point-min' and `point-max' will, when `in-islands' is non-nil, return the min/max point in the visible region in the same chain of islands as point.
  o - `search-\(forward\|backward\)\(-regexp\)?' will restrict themselves to the current island chain when `in-islands' is non-nil.
  o - `skip-\(chars\|syntax\)-\(forward\|backward\)' will likewise operate in the current island chain (how?) when `in-islands' is non-nil.
  o - `\(next\|previous\)-\(single\|char\)-property-change', etc., will do the Right Thing in island chains when `in-islands' is non-nil.
  o - New functions `island-min', `island-max', `island-chain-min' and `island-chain-max' will do what their names say.
  o - There will be no restrictions on the use of widening/narrowing, as have been proposed for other support engines for multiple major modes.
  o - New commands like `beginning-of-island', `narrow-to-island', etc. will be wanted. More difficult, key bindings for them will be needed.
  o - ??? Other commands to be amended.

(x) Emacs subsystems and `in-islands'.
  o - Redisplay will bind `in-islands' to non-nil, but will successfully display all islands wholly or partially in windows being displayed.
  o - Font Lock will bind `in-islands' to non-nil, but will successfully fontify all pertinent islands.
  o - `island-before/after-change-function' will be called with `in-islands' nil.
  o - `before/after-change-functions' will be called with `in-islands' bound to non-nil.
  o - Major modes will need to bind `in-islands' to non-nil for such things as indentation.
  o - For normal user interaction, `in-islands' will be nil.

-- 
Alan Mackenzie (Nuremberg, Germany).
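
For illustration only, the variable lookup rules of section (vii) can be modelled in a few lines of Python. This is a toy sketch, not Emacs code: the classes `IslandChain`, `Island` and `Buffer`, the functions `innermost_island` and `lookup`, and the choice of `buffer-file-name` as an `entire-buffer' variable are all assumptions made for the example. It only shows the order in which the bindings would be consulted: buffer local when `in-islands' is nil, when point is outside all islands, or when the variable is marked `entire-buffer'; otherwise the chain local binding of the innermost island, with the default value (and not an enclosing scope) used when that binding is unbound.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class IslandChain:
    """Hypothetical stand-in for an island chain and its chain local bindings."""
    name: str
    chain_locals: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Island:
    start: int  # first position inside the island
    end: int    # first position after the island
    chain: IslandChain


@dataclass
class Buffer:
    islands: List[Island] = field(default_factory=list)
    buffer_locals: Dict[str, Any] = field(default_factory=dict)
    defaults: Dict[str, Any] = field(default_factory=dict)
    entire_buffer: Set[str] = field(default_factory=set)  # names with the property set


def innermost_island(buf: Buffer, pos: int) -> Optional[Island]:
    """Return the smallest island containing POS, or None if there is none."""
    containing = [i for i in buf.islands if i.start <= pos < i.end]
    return min(containing, key=lambda i: i.end - i.start, default=None)


def lookup(buf: Buffer, name: str, pos: int, in_islands: bool) -> Any:
    """Resolve variable NAME at POS following the rules sketched in (vii)."""
    island = innermost_island(buf, pos) if in_islands else None
    if island is None or name in buf.entire_buffer:
        # Buffer local binding, falling back to the default value.
        if name in buf.buffer_locals:
            return buf.buffer_locals[name]
        return buf.defaults.get(name)
    # A chain local binding is current. If it is unbound, the enclosing
    # scope is NOT consulted; the default value is used instead.
    if name in island.chain.chain_locals:
        return island.chain.chain_locals[name]
    return buf.defaults.get(name)


# A buffer with a "js" island at 10..50 and a "css" island nested at 20..30.
js = IslandChain("js", {"comment-start": "// "})
css = IslandChain("css")
buf = Buffer(
    islands=[Island(10, 50, js), Island(20, 30, css)],
    buffer_locals={"comment-start": "<!-- ", "buffer-file-name": "page.html"},
    defaults={"comment-start": None},
    entire_buffer={"buffer-file-name"},
)
```

With this setup, `lookup(buf, "comment-start", 15, True)` consults the "js" chain, while the same call at position 25 lands in the nested "css" island, whose chain has no binding, and so yields the default rather than the value from the enclosing "js" island.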