From: Oliver Scholz
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Re: Saving markup formats
Date: Tue, 19 Jun 2007 09:43:08 +0200

Juri Linkov writes:

>> This is a very old project of mine, and an abandoned one, I am afraid.
>> Of course, anybody is free to make use of the codebase, but I for
>> myself am convinced that it is the wrong approach.
>
> Could you tell why do you think it is the wrong approach? This would help
> someone who will do something similar to avoid mistakes you think you made.

Basically the approach was too naive. I started like this: "Hey,
implementing RTF can't be too hard. Let's just take the RTF spec,
write a parser for it, get the text with some text properties into a
buffer and write a major mode for editing it." Then I realised that
a) if you want word processing in Emacs, this SHOULD be designed with
different target formats in mind right from the start, and b) even for
RTF alone this approach is not sufficient.

IIRC I was originally planning for a flat data structure: RTF's
paragraph formatting properties and character formatting properties,
each stored as lists or vectors in a separate text property. Font-lock
would then resolve character formatting properties and apply faces;
fill-paragraph would resolve whitespace formatting properties.

This would work for simple cases. But it wouldn't in all cases
preserve the logical structure of the original document, if you got it
from somebody using a different word processor. This is a very bad
thing; a _reliable_ word processor---as opposed to an unreliable
hack---shouldn't make any changes to the logical structure of a
document unless explicitly ordered to do it.

Also, while it is o.k. to implement only a subset of RTF in the
beginning, the design (or lack thereof) of the data structure would
eventually lead to a dead end.

> How would you do things over if you had enough time?

I'd start designing the data structure. I would do it with an eye on
the various specifications for XML (most notably: the XML info set and
the style properties in CSS), for the simple reason that they were
designed to cover a wide range of needs for text/data representation,
formatting, text processing etc. and that in this area they are tested
by a lot of people. So looking at XML right from the start could help
avoid shortcomings in the design that lead to dead ends or crude
kludges later. Also, for a word processing suite in Emacs, XML file
formats would be the major target formats besides RTF: XHTML, TEI XML,
DocBook, eventually Texinfo XML.

So, IMNSHO giving thought to text representation and rendering in
Emacs is the _very first_ thing to do.

Once you have a capable data structure, parsing RTF is not too hard.
You can regard an RTF document as some sort of weird s-expression,
with "{" and "}" instead of parentheses.
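To make the s-expression comparison concrete, here is a minimal sketch
in Python. It is not taken from the abandoned codebase; the names
TOKEN and parse_groups are made up for illustration, and it assumes a
heavily simplified RTF: only groups, control words, control symbols
and plain text. It ignores \'hh hex escapes, \bin data and \u Unicode
handling, all of which a real reader would need.

```python
import re

# Hypothetical tokenizer for a simplified RTF subset:
#   control words   (\b, \par, \fs24; an optional trailing space is a delimiter)
#   control symbols (\{, \\, \~)
#   group delimiters { and }
#   runs of plain text
TOKEN = re.compile(r"\\[a-zA-Z]+-?\d* ?|\\[^a-zA-Z]|[{}]|[^\\{}]+")


def parse_groups(rtf):
    """Turn RTF source into nested lists, one list per {...} group."""
    root = []
    stack = [root]  # stack[-1] is the group currently being filled
    for token in TOKEN.findall(rtf):
        if token == "{":
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        elif token.startswith("\\") and token[1:2].isalpha():
            # Drop the delimiter space that terminates a control word.
            stack[-1].append(token.rstrip(" "))
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return root


if __name__ == "__main__":
    print(parse_groups(r"{\rtf1 Hello {\b bold} world}"))
```

Run on the sample string, this prints
[['\\rtf1', 'Hello ', ['\\b', 'bold'], ' world']], i.e. the group
nesting maps directly onto list nesting. Note that this only recovers
the syntactic grouping; what the control words mean for the logical
structure of the document is exactly the part that needs a well
designed data structure.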
RTF is still a dreadful file format, because of its lack of
constraints, but, again, if and only if you have a well designed data
structure, you have a fighting chance to deal with those horrors.

I'd start with designing a data structure that is a realisation of the
XML info set. (This has nothing to do with pointy brackets. The XML
info set is a specification of requirements for a data structure. IIRC
there is a W3C technical report out there.) This doesn't have to be
DOM. I am pretty confident that it is rather straightforward to parse
RTF into an instance of the XML info set.

Unfortunately, this is where the trouble starts. With the XML info set
the logical structure of the internal data structure is clear (except
for style properties). But the specifics depend on Emacs' ability to
render text on-screen. Of course, there are ways to implement a
tree-like data structure even with CDATA as text in a buffer right
now---I was experimenting with overlays, for instance. But eventually,
if you really go for word processing, you'd have to enhance the
display engine anyway to deal with certain style properties. So you
might as well design both, the tree-like data structure and how the
display engine deals with it, right from the beginning---thus gaining,
possibly, maximum reliability.

Also, after I discarded the naive approach I spent a lot of thought on
UI issues and I would advise anybody to do the same---again, right
from the beginning, before you write a single line of code. What is
word processing exactly? Current word processors are hybrid beasts,
undecided between DTP software (like Adobe InDesign, QuarkXPress,
Macromedia FreeHand, Inkscape, Scribus) and programmes dealing with
the logical structure of documents. Word processors have an ambiguous
editing model, resulting from a long history starting with their
origin as replacements for typewriters. Emacs' editing model on the
other hand is mostly about dealing with text/plain, even where it goes
beyond that.
So if you come from the Emacs world alone, you are bound to
unwittingly design your UI according to the editing model you know
best (especially because it is the easiest to implement in Emacs), and
maybe you even make decisions for the design of the data structure
which make that mandatory. Thus, you'd add to the confusion that's
already there. The result would probably be the worst word processor
ever, not the best. In that case, it would be better to stick with
packages like Muse or emacs-wiki or something similar, which are
one-way (no reading of other file formats), for your own documents
(that's what I do), and use AbiWord or OpenOffice if you receive
documents from somebody else. Word processors are good for document
exchange; that's why they are more popular than DTP software.

So, should the UI make the logical structure which would be stored in
the file explicit? What are the objects a user wants to interact with?
I.e. if there are four spaces at the left margin, and if the user can
put the cursor on those spaces, copy them etc., then those spaces are
an object the user can interact with. They are _there_. But is this
what he or she wants? Or does she just want a left margin that is 4em
wide? How do you distinguish a 4em margin from four spaces at the
left?

If somebody is seriously going to implement WP with a good large-scale
battle plan, then please drop me a line. I might find a little time to
contribute.

    Oliver
-- 
1 Messidor an 215 de la Révolution
Liberté, Egalité, Fraternité!