From: Charles Muller
Newsgroups: gmane.emacs.help
Subject: Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
Date: Fri, 27 Sep 2002 23:10:07 +0900 (JST)
Message-ID: <20020927.231007.59465175.acmuller@gol.com>
Cc: help-gnu-emacs@gnu.org

Kai wrote:

> Maybe it is sufficient to install Mule-UCS?  I guess that TEI Emacs
> is Emacs with Mule-UCS pre-installed (plus some other packages).

TEI-Emacs does install Mule-UCS, but that cannot be the whole reason it works, because I always install Mule-UCS with my Emacs, and it never renders CJK fonts in Unicode until I install the TEI package. Since all of my internet publication and data compilation has to be done in Unicode, that's always the first thing I check. But I don't know enough about Lisp programming to tell you exactly *what* the TEI package does to get this working. No doubt Sebastian Rahtz, Christian Wittern, and some of the others who wrote the package would be happy to tell you what the key routines are.

The first priority of the TEI people is to make sure that the SGML/XML/HTML modes work more precisely and comprehensively than in the standard Emacs package, since the target audience is mainly humanities scholars who are using TEI-XML to mark up literary texts. For example, the way PSGML is set up in the standard package, it is hard to get it to distinguish XML from SGML. They have also added a whole array of DTDs for various purposes, including the distinction between XHTML strict and transitional. There is also an added XSLT mode that allows for adjustments and debugging. Then, on top of that, because so many of us work with mixed international scripts (including CJK), apparently someone decided to figure out how to get all the fonts properly recognized.

I am guessing that part of the problem facing the standard installation of Emacs is that with any traditional encoding other than Unicode, such as Big5, JIS, or KSC, you always have at least one full font set that is traditionally mapped to the encoding. With Unicode, I don't know of a font designed to work readily in Linux/Emacs that covers all codepoints (the way MS Arial Unicode does in Windows, for example). So a function needs to be added which goes through the document and plugs in the proper font for each codepoint. I am not well enough versed in the technical side to be able to explain how they have accomplished this over in Oxford.

I noticed a good bit of negative reaction toward TEI-Emacs when I first mentioned it, with people expressing alarm about the TEI people not caring about the GPL and not reporting to the GNU development team. I think these concerns come from people not really checking into what the package is and what it does. It is not a new version of Emacs, such as XEmacs. It is simply an add-on that contains mode enhancements and some new modes of its own, just the way people are accustomed to adding on calendar modes, e-mail packages, or whatever.

I know many of the dedicated people in the Text Encoding Initiative very well, and there is no group around that is more concerned about free software and donating code. But when you really need to get a certain type of application set up for a certain use, I don't think you can just write to the GNU development team and then wait for it to be implemented in a future version. And after all, the Lisp code for TEI-Emacs is just as openly available as any other development based on Emacs. They have also made a concerted effort to have their add-on support GNU Emacs, rather than XEmacs, so I really see it as a very positive development that should be learned from, rather than disparaged.

Chuck

---------------------------
Charles Muller
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787