From: Oliver Scholz
Newsgroups: gmane.emacs.devel
Subject: Re: enriched-mode and switching major modes.
Date: Wed, 22 Sep 2004 12:01:27 +0200
Original-To: rms@gnu.org
In-Reply-To: (Richard Stallman's message of "Tue, 21 Sep 2004 14:30:53 -0400")

Richard Stallman writes:

>     When rendered by a graphical, CSS2-enabled browser, you'll see two
>     paragraphs on a gray background surrounded by a dashed border. Those
>     two paragraphs are again contained in a larger paragraph on a purple
>     background surrounded by a solid border.
>
> It will be very hard to implement this in a way that fits in with
> Emacs.

Okay, maybe this is the time to lay out the design on which I have been
spending thought and code ever since I ditched the approach that I
already mentioned (as in `wp-example.el'). I fear, though, that you
won't like it.

The idea crossed my mind when I thought about how to implement a data
structure fit for XML + CSS in Emacs Lisp. In other words: how to make
Emacs a /rendering/ XML editor.

XML is by nature a tree-like format. The W3C has specified the structure
of the information contained in XML documents in a way that abstracts
from the pointy-bracket syntax; this abstract data set is called the
"XML Information Set": http://www.w3.org/TR/xml-infoset/

For simplicity, I focus on elements and character data here and talk
about them as "nodes" in a tree; thus we have "element nodes" and "text
nodes". An XHTML fragment like

  <h1>Some <em>meaningless</em> text</h1>

would be regarded as an `h1' element node which has three children: a
text node "Some ", an element node `em' (which itself has a text node as
its single child) and another text node " text".

I found out that I can translate any RTF document into an instance of
the XML info set. So I can reduce the problem of designing a data
structure for word processing in Emacs to the question of how to
implement the XML info set in such a way that text nodes are stored in a
buffer rather than in a string and that they are /editable/. And that
question I can reduce to: how can I implement a tree-like data structure
with text properties?

So far I have considered two ways to do this. Both have specific
disadvantages. But I'll come to that in a minute. [If desired, I have
prototype code for each of those two approaches to experiment with. :-/ ]

One way (call it #1) is to have a single, unique Lisp object, a vector
for example, stored in a text property, say `text-node'. That vector (or
list) would store a reference to its immediate parent (which is always
an element node). That parent would have a reference both to its
children and to its own parent, and so on. In addition, a buffer-local
variable would store the root element.

This has the advantage that I have two views of the document: one as a
Lisp object, a tree of vectors or of lists, stored in a variable; the
other as the contents of a buffer with specific text properties. The
former makes it possible to implement an API for accessing and modifying
the contents of the document. I am thinking of XPath, DOM and other W3C
standards here that many people are familiar with. If I have a text node
(the vector or list), then I can find its text in the buffer with

  (text-property-any (point-min) (point-max) 'text-node TEXT-NODE)

This should be fast.
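
To make the first approach a bit more concrete, here is a minimal
sketch of how it might look. It is not the prototype code mentioned
above: every `my-' name is hypothetical, and it uses `cl-defstruct'
records where the text speaks of vectors or lists. It builds the `h1'
example and looks up the text of a node with `text-property-any'.

```elisp
;; Sketch of approach #1.  All `my-' names are hypothetical.
(require 'cl-lib)

;; An element node knows its name, its parent and its children.
(cl-defstruct my-element name parent children)
;; A text node only knows its parent; its text lives in the buffer.
(cl-defstruct my-text parent)

(defvar-local my-root nil
  "Root element node of the document in the current buffer.")

(defun my-add-child (parent child)
  "Append CHILD to the children of PARENT and return CHILD."
  (setf (my-element-children parent)
        (append (my-element-children parent) (list child)))
  child)

(defun my-insert-text (parent string)
  "Insert STRING at point as a new text node under PARENT."
  (let ((node (my-add-child parent (make-my-text :parent parent)))
        (start (point)))
    (insert string)
    ;; The same Lisp object sits in the tree and on the text.
    (put-text-property start (point) 'text-node node)
    node))

(defun my-text-node-region (node)
  "Return (START . END) of the text of NODE in the buffer, or nil."
  (let ((start (text-property-any (point-min) (point-max)
                                  'text-node node)))
    (when start
      (cons start
            (next-single-property-change start 'text-node
                                         nil (point-max))))))

(defun my-demo ()
  "Build the `h1' example; return the region of the text inside `em'."
  (with-temp-buffer
    (let* ((h1 (setq my-root (make-my-element :name 'h1)))
           (em (make-my-element :name 'em :parent h1)))
      (my-insert-text h1 "Some ")
      (my-add-child h1 em)
      (let ((inner (my-insert-text em "meaningless")))
        (my-insert-text h1 " text")
        (my-text-node-region inner)))))
```

Since `text-property-any' compares values with `eq', each text node has
to be a distinct object, which is what the struct constructor gives us
here. Nothing in this sketch updates the tree when text is killed or
yanked.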

But this solution has an undesirable fragility: care must be taken, when
killing and yanking, that both the text properties and the tree are
updated accordingly (for example if the killing results in the deletion
of an entire text node). And if the tree is modified directly (via the
API), then the buffer contents need to be updated, too (for example when
this leads to transferring text nodes to another place in the tree).
Basically this is once more the problem of keeping two structures in
sync.

The other way (#2) is to have a text property `parents' on each text
node in the buffer. This would hold a list of all ancestor nodes in the
tree, starting with the immediate parent.

The disadvantage here is that finding nodes takes much more time.
Finding all the children or descendants of a node is especially slow.
Whereas in #1 I have a reference to the children in the node, here I
have to scan several ranges of text properties to determine the
children, e.g.

  1. Find the first position in the buffer where NODE is a member of
     the value of the text property `parents'.
  2. Push the value of `parents' onto a list.
  3. Find the next single property change of `parents'.
  4. Determine whether NODE is a member of the value of `parents'.
     If yes, go to 2. If no, go to 5.
  5. Determine children or descendants from the collected values.

Some care is necessary with copying and inserting text. But we avoid
having to keep two separate structures in sync, at the cost that
accessing nodes (and thus the API) is inefficient.

So how to handle formatting in the buffer? The element nodes would store
formatting information, either after applying a CSS stylesheet to the
tree or, in the case of RTF, right away when parsing the file. Functions
that apply the formatting in the buffer (i.e. filling and jit-lock) scan
the tree upwards until they find the information they need. I have not
yet determined whether #2 requires too much time for this.

The idea of #2 is rather new and not fully thought out and tested.
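
For illustration, steps 1 to 5 above could be sketched roughly as
follows. Again this is not the prototype code, and all `my-' names are
hypothetical. The sketch assumes that the text below NODE is contiguous
in the buffer, and since text nodes have no object of their own in #2,
it simply represents a text child by the symbol `text'.

```elisp
;; Sketch of approach #2.  All `my-' names are hypothetical.

(defun my-collect-parents (node)
  "Return the `parents' values of all text runs below NODE, in order.
This follows steps 1 to 4; the text below NODE must be contiguous."
  (let ((pos (point-min))
        (collected nil))
    ;; Step 1: find the first position whose `parents' contains NODE.
    (while (and (< pos (point-max))
                (not (memq node (get-text-property pos 'parents))))
      (setq pos (next-single-property-change pos 'parents
                                             nil (point-max))))
    ;; Steps 2 to 4: collect values while NODE is still an ancestor.
    (while (and (< pos (point-max))
                (memq node (get-text-property pos 'parents)))
      (push (get-text-property pos 'parents) collected)
      (setq pos (next-single-property-change pos 'parents
                                             nil (point-max))))
    (nreverse collected)))

(defun my-child-on-path (node parents)
  "Return the child of NODE that lies on the ancestor list PARENTS.
PARENTS starts with the immediate parent and must contain NODE.
If NODE is the immediate parent, the child is a text node: `text'."
  (let ((child 'text))
    (while (not (eq (car parents) node))
      (setq child (car parents)
            parents (cdr parents)))
    child))

(defun my-children (node)
  "Step 5: return the children of NODE in document order."
  (let ((children nil))
    (dolist (parents (my-collect-parents node))
      (let ((child (my-child-on-path node parents)))
        ;; Several runs below the same element child count only once.
        (unless (and (not (eq child 'text))
                     (eq child (car children)))
          (push child children))))
    (nreverse children)))

(defun my-demo-children ()
  "Build the `h1' example with `parents' properties and query it."
  (with-temp-buffer
    (let ((h1 (vector 'h1))
          (em (vector 'em)))
      (insert (propertize "Some " 'parents (list h1))
              (propertize "meaningless" 'parents (list em h1))
              (propertize " text" 'parents (list h1)))
      (list (my-children h1) (my-children em) em))))
```

Every call to `my-children' walks the property changes of the buffer
from the beginning, which is the inefficiency described above.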
I am not certain that I am aware of all possible pitfalls. Moreover, I
have not yet figured out every detail of handling formatting information
in general. I am still in the process of reading specifications in order
to get an overview.

I have to admit that I have also been wondering whether something could
be done on the C level to provide for such tree-like documents in an
Emacs buffer. I don't have a clue here, though.

[Yesterday or so, a third way to handle nested blocks crossed my mind.
Maybe each paragraph could have a `nesting-level' text property whose
value is an integer. For each nesting level N, with N > 0, the first
preceding block with a nesting level of N - 1 is the immediate parent. I
have no idea yet how, if at all, that would translate to the XML info
set, though.]

    Oliver
-- 
Oliver Scholz               1 Vendémiaire an 213 de la Révolution
Ostendstr. 61               Liberté, Egalité, Fraternité!
60314 Frankfurt a. M.