From: Oliver Scholz
Newsgroups: gmane.emacs.devel
Subject: Re: enriched-mode and switching major modes.
Date: Thu, 16 Sep 2004 19:04:53 +0200
To: rms@gnu.org
Cc: boris@gnu.org, emacs-devel@gnu.org, alex@emacswiki.org
In-Reply-To: (Richard Stallman's message of "Wed, 15 Sep 2004 11:42:52 -0400")

Richard Stallman writes:

>     I mean "the document's character data" here. The important point
>     is that formats suitable for WP (RTF, HTML ...) separate character
>     data from formatting information entirely.
>
> My point is that this is exactly what we must not do in Emacs, lest it
> ruin everything in a subtle way.

[I moved that quote to the top, because the issue it addresses seems
to me to be at the root of our disagreement. Everything I am going to
write below needs to refer to the distinction between the abstract
document and its visual rendering. I am sorry if I am taking up your
time by flooding you with unsolicited information; but since we are
discussing this prematurely, i.e. without me showing the code
implementing it, I want at least to express my points clearly.]

The distinction between character data and formatting information,
mentioned in the paragraph you quoted, is inherent in the type of
documents we discuss here. You can enter as many newline characters in
the source of an RTF document as you want; an RTF reader will simply
remove them.
You can also put as many newline characters and spaces into an HTML
document as you want; a browser will just treat each sequence of such
whitespace characters as a single space.

But you will probably challenge my statement that it is important to
keep the distinction between the abstract document and its visual
representation in mind when discussing the implementation of WP
functionality. This distinction is also inherent in the types of
documents we discuss here. But first I should make clear what I mean
by those terms.

I say that we need to distinguish, at least conceptually, between: the
abstract document, the data structure that represents the abstract
document, the encoded file and the visual representation of the
abstract document.

1) the abstract document

The abstract document is what the user of a word processor wants. She
doesn't specify: "I want four space characters before each line in
this area." nor "I want an opening tag before and a closing tag after
this piece of text." The former would refer to the way the visual
representation could happen to be implemented in Emacs, the latter
would refer to the document file format. The user may be---and users
of word processors typically are---blissfully ignorant of both.

Instead the user says: "This piece of text is a paragraph, because I
hit RET when I finished writing it. I want it to be a paragraph of the
type "Standard Text". I want "Standard Text" paragraphs to be indented
on the left by 1 cm and to have a font size of 10 pt. Except here, I
want this paragraph to have an indentation of 2 cm."

I call the ensemble of the character data and this latter
specification of what the character data should "be like" the abstract
document. It is what the user has expressed by typing text and
interacting with the UI. (It is thus important to design a good UI so
that a user may express her wishes as clearly as possible.)

2) the data structure used to represent that abstract document
   internally

I list this explicitly, because there are quite a few pitfalls here
when implementing WP functionality in Emacs. Among other things I
suspect that I have failed to make clear one or two requirements for
implementing the data structure in Emacs, if that data structure is
meant to express abstract documents that are suitable for processing
RTF (HTML ...). More on this below.

3) the encoded document file

This is the document as it would be written to a file: the concrete
RTF or HTML file. Not every file format is capable of expressing every
abstract document. The expressiveness of the available file formats
would limit---or even shape---the possible abstract documents that a
user may specify in a word processor.

4) the visual (aural) representation of the abstract document

This is how the document is rendered on screen (or by a speech
synthesizer; I won't discuss aural rendering further for now).

It is important to keep two things in mind:

a) For a given word processor application, two different documents
   could look exactly the same when rendered on the display. For
   instance, one document could specify as a paragraph property that
   the first line of a paragraph should be indented by 1 cm, while in
   another document a user could get exactly the same visual effect
   by entering a number of space characters. (Nota bene: for a given
   application; when transferred to another application or to another
   machine, the appearances of those two documents could -- and
   probably would -- differ.)

b) For a given document, two different applications or the same
   application on two different machines/operating systems might
   render two different visual representations. For example the font
   for a paragraph could be "Times New Roman" on MS Windows, but
   "Times" or even "New Century Schoolbook" on XFree. Or the rendering
   device could be a tty. In that case word wrapping would happen at
   different places.
   Or the user might not have specified a width for paragraphs at
   all; in that case the rendering could depend on
   application-specific defaults or even on the size the application
   window happens to have.

              +---------------------+
              |  abstract document  |
              +---------------------+
                         ^
                         |
                         v
          +-----------------------------+
          |       data structure        |
          +-----------------------------+
           ^             |            |
           |             |            |
           |             v            v
+-------------+ +-----------------+ +------------+
|user commands| |visual appearance| |encoded file|
+-------------+ +-----------------+ +------------+

Even if we put these things together in Emacs, for one reason or
another, we need to keep said distinctions in mind---again: at least
conceptually---or else confusion will arise. Any implementation of WP
functionality will have to account for how it deals with each of these
aspects, or else Emacs would become infamous as the "so-called word
processor that provides a lot of surprises for its users."

The implementation that I suggested in my last posting (the one which
would work without a box model supported by the display engine)
strives to preserve the distinction by

a) implementing a data structure that stores those parts of the
   abstract document which affect the formatting properties of
   paragraphs in a special text property of the paragraph text,

b) using font-lock and filling to make these formatting properties
   visible.

Since filling would add characters to the buffer content, my proposed
implementation would distinguish those characters added for visual
appearance from the characters which are part of the abstract document
by means of another special text property.

This is very important: If a user enters space characters into an
Emacs buffer, she wants there to be space characters. Those characters
would have to become part of the character data in the encoded file.
But if a user just specifies: "I want this paragraph to be indented",
then the space characters used to display the left margin _must_not_
become part of the encoded file.
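
A minimal sketch of this distinction in Emacs Lisp, assuming a
hypothetical marker property named `wp-sketch-formatting' (the
property name and both helper functions are invented for illustration
and are not taken from the code under discussion). Whitespace inserted
for layout carries the marker, and the encoder drops every marked
character before writing the file:

```elisp
;; Sketch only: `wp-sketch-formatting' is a hypothetical text property
;; that marks characters inserted purely for visual layout.

(defun wp-sketch-insert-formatting (string)
  "Insert STRING at point, marked as layout-only text."
  (insert (propertize string 'wp-sketch-formatting t)))

(defun wp-sketch-character-data (&optional buffer)
  "Return the text of BUFFER without any layout-only characters."
  (with-current-buffer (or buffer (current-buffer))
    (let ((pos (point-min))
          (chunks nil))
      (while (< pos (point-max))
        ;; NEXT is the end of the run that starts at POS; the LIMIT
        ;; argument guarantees a position instead of nil.
        (let ((next (next-single-property-change
                     pos 'wp-sketch-formatting nil (point-max))))
          ;; Keep only runs that do not carry the marker.
          (unless (get-text-property pos 'wp-sketch-formatting)
            (push (buffer-substring-no-properties pos next) chunks))
          (setq pos next)))
      (apply #'concat (nreverse chunks)))))
```

For example, four spaces added for the margin are dropped, while four
spaces typed by the user survive:

```elisp
(with-temp-buffer
  (wp-sketch-insert-formatting "    ")  ; margin added by filling
  (insert "    Typed by the user.")     ; spaces typed by the user
  (wp-sketch-character-data))
;; => "    Typed by the user."
```
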

I cannot put too much emphasis on this. Imagine that the space
characters and the newlines as they appear in the buffer were
preserved. An encoded RTF file could look like this:

    {\rtf\ansi This is just a paragraph. It contains pointless
    example text. Habe nun ach, Philosophie, Juristerei und Medizin
    und leider auch Theologie durchaus studiert mit heissem Bemuehn.
    Da steh' ich nun ich armer Tor und bin so klug als wie zuvor. To
    be or not to be this is the question. Whether it is nobler in
    the mind ...}

If the user visits that file again with Emacs on the same system and
has not changed any of her customisations, then it would look the same
again. So she would be content.

But try it! Write that text above to a file and open it with
OpenOffice or AbiWord or whatever. If I open it in OpenOffice on my
system, it looks like this (copied & pasted line by line, adding a
newline after each line):

    This is just a paragraph. It contains pointless example text.
    Habe nun ach, Philosophie, Juristerei und Medizin und leider auch
    Theologie durchaus studiert mit heissem Bemuehn. Da steh' ich nun
    ich armer Tor und bin so klug als wie zuvor. To be or not to be
    this is the question. Whether it is nobler in the mind ...

Obviously this is not an option.

`enriched-mode' has a nice UI bug here. If a range of text has a
`left-margin' text property, say with a value of 4, it removes the
spaces on the left when saving and puts the corresponding tags around
the text. But the fill function of text-mode also allows the user to
insert just four spaces at the beginning of a paragraph; auto-fill
will then fill it with four spaces of indentation. In the file, this
will result in exactly what I described for RTF above. In the buffer
this case is in no way distinguishable from the case where there is a
`left-margin' text property, except by carefully examining several
pieces of the buffer text with `C-u C-x ='.
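
The manual inspection with `C-u C-x =' can be approximated by a small
function. This is only an illustration: the function name is
hypothetical, and the sketch assumes that the `left-margin' property
sits on the paragraph text, so it is looked up at the first non-blank
character of the line.

```elisp
(defun wp-sketch-indentation-origin (&optional pos)
  "Say where the indentation of the line containing POS comes from.
Return `property' if the first non-blank character of the line has a
positive `left-margin' text property, `literal' if the line is
indented only by whitespace characters in the buffer, and nil if the
line is not indented at all."
  (save-excursion
    (goto-char (or pos (point)))
    ;; Move to the first non-blank character of the line.
    (back-to-indentation)
    (let ((margin (get-text-property (point) 'left-margin))
          (column (current-column)))
      (cond ((and (integerp margin) (> margin 0)) 'property)
            ((> column 0) 'literal)
            (t nil)))))
```

A line that mixes both kinds of indentation is reported as `property';
telling the two parts apart would need a comparison of the column with
the property value.
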

This bug has a great chance of going unnoticed for a while or even
forever, because a) text/enriched is an extremely simple markup
format, b) there is probably no other application that deals with
text/enriched files (as opposed to text/enriched e-mail) and c) the
text/enriched files of Emacs specify the value of fill-column in their
"header". So even if the document is transferred to another user who
has a different default value for fill-column, the buffer's
fill-column will be the same as in the Emacs where the document was
created. The circumstances under which the bug could show up are
probably rather rare.

We can by absolutely no means rely on things like this for RTF or HTML
or DocBook or TEI XML. Those formats are far too complex. Those
formats are widely used. Such inconsistencies in the user interface
would inevitably result in more bugs of this kind, which would
inevitably show up.

We do have to design WP functionality very carefully in order to
distinguish between the abstract document and its visual appearance.
And the user interface must at each point provide enough feedback
about what is part of which.

This bug is due to an ambiguous user interface in enriched-mode. There
is an easy fix here: let the fill function indent a paragraph if and
only if there is a `left-margin' text property.

Thank goodness, text/enriched has no concept of character formatting
properties applying to a paragraph (a block box), and thus the
abstract document that can be produced in enriched-mode does not
provide that either; which fits Emacs well, because the display engine
does not know that concept either. In an Emacs buffer as well as in a
text/enriched file, paragraph boundaries are defined by the newline
characters that precede and follow the paragraph; character formatting
properties are specified with text properties in an Emacs buffer and
with tags in a text/enriched file. So these fit together.

Not so RTF. Not so HTML.
A paragraph is specified not by newline characters before or after it,
but by syntactical markup. Character formatting properties may apply
not only to a range of characters (an "inline box") but also to a
whole paragraph (a "block box").

So where am I supposed to store those character formatting properties
applying to the whole paragraph? The only option I have is text
properties. But since text properties do not define a paragraph
visually (the display engine does not support this), I have to make
sure that a range of text defined to be a paragraph by means of text
properties does /appear/ as a paragraph in the buffer. That means that
I have to define a fill function as described in my last posting.

Perhaps I did not make one point clear enough: this fill function must
have _total_control_ over the whitespace formatting in the buffer. No
other Lisp function may interfere here. Otherwise the abstract
document/the data structure and the appearance on the screen would get
out of sync. If the user wrongly thinks that the visual appearance
matches the abstract document as specified in the data structure and
saves the file, she will be painfully surprised the next time she
visits it. In other words: we would have introduced an indirect way of
corrupting the user's files. We would have introduced a myriad of
possibilities for bugs like the one in enriched-mode described above.

>     The hairy part is whitespace formatting. The problems arise from
>     the fact that I can't tell Emacs: "Display this text from
>     position POS1 to POS2 as a paragraph with a left margin of 20 pt
>     and a right margin of 40 pt with 20 pt above and below --
>     *without* adding any character to the buffer."
>
> The idea of Emacs is that the appearance is represented by text in the
> buffer. We designed text properties so that the text in the buffer
> could be something more than a sequence of characters.
>
> Any extensions have to preserve this principle, or we will lose
> the benefits of Emacs.

I obviously disagree, since I argued for an enhancement that goes well
beyond this principle. I fail to see at all how we would lose the
benefits of Emacs this way. As I see it, it would only add to Emacs'
capabilities, extend the domain of documents it is suitable for and
enlarge its benefits this way.

>     If Emacs' display engine would support this, e.g. as a `block'
>     text property, then I could write:
>
>     (progn (switch-to-buffer (generate-new-buffer "*tmp*"))
>            (insert "Example text. Example paragraph. Example text.")
>            (put-text-property 15 33
>                               'block
>                               '(:margin (4 1 1 1) :border nil :padding nil)))
>
> If the block parameters are specified as a text property on the entire
> contents of the block, that might solve the problem. However, there
> are some details here that are nontrivial problems.
>
> 1. How to distinguish between two similar boxes with the same specs
> and a single longer box.

I have not spent much thought on this; I have only spent quite some
time trying to grok the display engine (in vain, so far) and have not
reached a stage where I could make reasonably well-founded proposals
for the design of that feature. Maybe they could be distinguished
programmatically by checking for `eq'uality rather than `equal'ity.
Maybe for the user this wouldn't suffice, because they would look the
same after `C-x =', which may or may not be confusing.

I have no idea how to specify /nested/ block boxes with text
properties, either. This is also a problem.

> 2. How to represent line breaks. Saying "break one long line at
> display time" would work ok for display, but all the commands that
> operate on lines would see just one long line there.

`C-a', `C-p' and the like would need to be enhanced to deal with this.
Since it should of course be possible to check from Lisp whether some
buffer text is the content of a box, I don't see any problems here
which would be greater than the problem of implementing the display
feature itself. In fact, there are at least three packages that
already implement this for wrapped long lines: screen-lines.el,
screenline.el and window-lines.el. (Ouch! They stopped working with
the current CVS version ...)

> 3. How to represent indentation. If the indentation appears only
> in redisplay, Lisp code that looks at the text will think it is not
> indented at all.

I actually regard this as a feature. In WP documents the left margin
has no more significance than the right margin, which is not currently
implemented by adding space characters, either. Functions that, for
instance, match on the space characters used to display a left margin
would match on something which is not part of the character data of
the document: if the user sends that document to somebody else using
another word processor, or uses another word processor herself, those
characters simply won't be there.

And with the implementation that I proposed in my last mail: if an
arbitrary Lisp function that is not part of the WP mode actually
changes the contents of a buffer containing a WP document, it had
better change only the character data, not any spaces or newlines
added programmatically for whitespace formatting, or else it could
wreak havoc in an intolerable way.

But the display feature I proposed would greatly increase the chance
that existing Emacs Lisp functions would seamlessly do the right thing
in a buffer containing such a WP document. Right now they work on
character data only and they should continue to work on character data
only in a WP document.
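
With the interim, Lisp-only implementation, a command that edits the
buffer would need a guard of roughly the following shape. This is a
sketch under assumptions: the marker property `wp-sketch-formatting'
and both function names are hypothetical; they stand in for whatever
property the fill function would use to mark layout-only whitespace.

```elisp
;; Sketch only: `wp-sketch-formatting' is a hypothetical text property
;; marking whitespace that was inserted purely for visual layout.

(defun wp-sketch-character-data-p (start end)
  "Return non-nil if no character between START and END is layout-only."
  ;; `text-property-not-all' returns the first position whose property
  ;; value is not nil, or nil if every character is unmarked.
  (not (text-property-not-all start end 'wp-sketch-formatting nil)))

(defun wp-sketch-replace-in-character-data (from to)
  "Replace the literal string FROM with TO in character data only.
Matches that touch layout-only text are left alone.  Return the
number of replacements made."
  (when (string= from "")
    (error "FROM must not be empty"))
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      (while (search-forward from nil t)
        (when (wp-sketch-character-data-p (match-beginning 0)
                                          (match-end 0))
          ;; Fixed case, literal replacement text.
          (replace-match to t t)
          (setq count (1+ count)))))
    count))
```

Replacing two spaces by one with this function would collapse a double
space typed by the user but leave a two-space margin inserted by the
fill function untouched. A plain `query-replace' offers no such
protection, which is the kind of interference described above.
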

> I think we need to look for a hybrid solution where there could be a
> text property saying how this box is supposed to look, but that works
> by inserting newlines, indentation, etc., so that the text can be seen
> in Lisp without need to decode the box.

You mean so that those characters are added and updated automagically?
Hmm, I seem to recall that I experimented a bit with using jit-lock to
do the whitespace formatting when I first started to think about WP in
Emacs. But I don't remember how well that worked. I did not follow
that approach further, because it seemed unorthodox to me at that
time. However, something like this could at least deal with some of
the problems of keeping the visual representation and the abstract
document in sync, so it would at any rate be a big improvement.

However, the implementation strategy that I described is meant as a
temporary solution to implement a box model in Lisp without support
from the display engine. If you see it as a final state, or moreover
if you think that it is fundamentally different from a box model, then
I have failed to explain it.

>     If Emacs display engine would support a block model, we would
>     just tell the display engine how to render the paragraphs. There
>     is not a single newline char and no space between paragraphs that
>     would be part of the character data. I.e.
>     `(buffer-substring-no-properties (point-min) (point-max))' would
>     return:
>
>     "Lirum larum (A headline)\"Marriage is the chief cause of divorce.\"\
>     This is just ordinary paragraph text. Nothing special here. This is\
>     a list item. It contains two subitems:One and Two This is another \
>     list item."
>
> This model fails to address those problems. It would work as a way
> of grafting a separate word processing facility into Emacs, but it
> would not integrate well with the existing Emacs Lisp world.

I don't understand why you say that. And I don't know which parts of
the existing Lisp world you mean.
`query-replace' and isearch, for example, would do the right thing.
Not so with the substitute for a real box model that I described.
`query-replace' could even indirectly lead to file corruption, as
explained above.

Please note also that a real box model supported by the display engine
is the only way to get tables with the capabilities that people will
expect (I am talking about columns with different font heights here
and about table borders). It is not possible to mimic it without.

> However, later you talk about an implementation more like what I have
> in mind, where the boxes and lists would be rendered by changing the
> buffer text; therefore, the buffer text would show what's really
> there.

Erm, what does the concept of "what's really there" mean in that
context? In the buffer, or more generally speaking in the data
structure, a containing block box or a text property storing
formatting information is, of course, no less there than any space or
newline character added for whitespace formatting. But when Emacs
writes the WP document out to a file again, say an RTF file, then it
would /remove/ those space and newline characters. It definitely has
to; anything else would be a most serious bug. So, speaking from the
point of view of the file format, those space and newline characters
are not "there" at all.

>     However, about one thing I am positive: there is absolutely no
>     room for a minor mode here. That's why I say that enriched-mode
>     (as a minor mode) is a dead end.
>
> I don't see that that follows from the rest of your points.

I don't understand why you say that. I described an implementation
that needs to be in full control of every aspect of whitespace
formatting (at the very least). I don't see how that could justifiably
be implemented as a minor mode. And if it were, I don't see which
major modes it should be able to work together with.

I can think of a third implementation strategy for editing RTF files
that /could/ be implemented as a minor mode; and if enriched-mode were
slightly changed, it could even be part of enriched-the-minor-mode. I
don't want to undermine my own position, but here it is:

It would be a heuristic approach. The encoding function (when saving
to a file) could determine whether a paragraph (defined by its
preceding and following newline characters) is indented by space
characters. It could then write the necessary markup to the file
accordingly. The problem of character formatting properties applying
to a paragraph could be solved in a similar, heuristic way: determine
whether---let's say---a `paragraph-stylesheet' with a particular
property predominates in a paragraph; if so, again write the
corresponding markup.

There are two important consequences implied in this approach.

About the first one I am not 100% sure. But it is quite possible that
we would not even try to come close to the visual appearance of the
document in other word processing applications. One might say: we
would try to preserve only the syntactic properties of the document
when reading it.

The other consequence is that this approach will invariably fail
sometimes; this is implied in the fact that this approach is
heuristic. If it is carefully implemented, it might work well enough
in common cases to be useful. I don't know; that area would need to be
explored.

Bulleted and numbered lists would probably have to be implemented as
an exception to the heuristic approach in a way that is similar to
what table.el does for tables.

However: it would be an incredible exaggeration to call Emacs a "word
processor" if it followed such a heuristic approach. And personally I
am not interested in pursuing it. Not the least reason is that some
particular aspects of what I personally want for WP in Emacs would not
be feasible this way. This is not the time to discuss the whole abyss
of my ideas.
Yet I want, for instance, an API that lets people access the contents
of a WP document syntactically by a query language (to name the devil:
XPath); so that people could specify in Lisp: give me all text areas
that are of the type "headline 2" etc.

I mention this approach only for completeness.

    Oliver
-- 
Oliver Scholz               30 Fructidor an 212 de la Révolution
Ostendstr. 61               Liberté, Egalité, Fraternité!
60314 Frankfurt a. M.