From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Implementing charset-aware X font names [was: Cyrillic vs UTF-8]
Date: Sat, 26 Apr 2003 12:55:49 +0900
Organization: The XEmacs Project
Message-ID: <87ist17vzu.fsf_-_@tleepslib.sk.tsukuba.ac.jp>
Original-To: emacs-devel@gnu.org
In-Reply-To: (Simon Josefsson's message of "Fri, 25 Apr 2003 18:54:21 +0200")

>>>>> "Simon" == Simon Josefsson writes:

    PROBLEMS> * Characters from the mule-unicode charsets aren't
    PROBLEMS> displayed under X.

    PROBLEMS> XFree86 4 contains many fonts in iso10646-1 encoding
    PROBLEMS> which have minimal character repertoires (whereas the
    PROBLEMS> encoding is meant to be a reasonable indication of the
    PROBLEMS> repertoire).

*sigh*  "iso10646" is not meant to be an indication of repertoire.
See section 13 of the ISO 10646 standard.  It's intended to fix the
ISO 8859 ambiguity.

There is a deficiency in XFree86, but it's not that the fonts are
incomplete (note the word "implicit" in the XLFD standard, which
refers to current national encoding practice at definition time, not
to UCSes); that's gonna happen.  Why should a Russian font designer
provide Thai glyphs?  And what Thai in her right mind would prefer
those over native-designed fonts (without looking at them)?

Rather, the font names and properties should provide encoding range
specifications in place of the useless "1" (which in ISO 10646-1 is
not an encoding specification, really).

As a first take, I think a reasonable way to do this would be to
specify that for the iso10646 registry the encoding field of an XLFD
name should contain either a comma-separated list of Unicode block
names or a comma-separated list of hex ranges xxxx..yyyy (can't use
hyphens for the ranges, obviously).
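
To make the proposed encoding field concrete, here is a minimal
Python sketch of how a client might read it.  Everything in it is
illustrative: the function names, the small block table, the
upper-case spelling of block names, and the sample font name are all
assumptions, not part of any existing X or Emacs interface.  It also
accepts block names and hex ranges mixed in one list, which is more
lenient than the proposal requires.

```python
import re

# Hypothetical subset of Unicode blocks; a real client would load the full
# table.  Only block names without hyphens are used here, since a hyphen
# cannot appear inside an XLFD field.
BLOCKS = {
    "BASIC LATIN": (0x0000, 0x007F),
    "CYRILLIC": (0x0400, 0x04FF),
    "THAI": (0x0E00, 0x0E7F),
    "HIRAGANA": (0x3040, 0x309F),
    "KATAKANA": (0x30A0, 0x30FF),
}

# A hex range in the proposed "xxxx..yyyy" form.
HEX_RANGE = re.compile(r"^([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})$")


def split_xlfd(name):
    """Return (registry, encoding) from a fully qualified XLFD name."""
    parts = name.split("-")
    # 14 hyphens give 15 pieces: an empty leading piece plus 14 fields.
    if len(parts) != 15 or parts[0] != "":
        raise ValueError("not a fully qualified XLFD name: %r" % name)
    return parts[13], parts[14]


def parse_encoding_field(field):
    """Turn the proposed encoding field into a list of (low, high) ranges."""
    ranges = []
    for item in field.split(","):
        item = item.strip()
        match = HEX_RANGE.match(item)
        if match:
            low, high = int(match.group(1), 16), int(match.group(2), 16)
            if low > high:
                raise ValueError("reversed range: %r" % item)
            ranges.append((low, high))
        elif item.upper() in BLOCKS:
            ranges.append(BLOCKS[item.upper()])
        else:
            raise ValueError("unknown block or malformed range: %r" % item)
    return ranges


def covers(ranges, code_point):
    """True if any advertised range contains the code point."""
    return any(low <= code_point <= high for low, high in ranges)


if __name__ == "__main__":
    # Hypothetical font name using the proposed encoding field.
    font = "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso10646-CYRILLIC,0000..007f"
    registry, encoding = split_xlfd(font)
    advertised = parse_encoding_field(encoding)
    print(registry, advertised, covers(advertised, 0x042F))
```

A client would call covers() for each character it needs to display
and fall back to another font when the advertised ranges do not
include it.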

As long as the XLFD is otherwise fully-qualified (i.e., contains 14
hyphens), the block name format allows you to query with
"-*-*-*-*-*-*-*-*-*-*-*-*-iso10646-*CYRILLIC*" and guarantee sane
results.  Mostly "*-iso10646-*CYRILLIC*" should work OK, too.

With the hex range format, the app has to work harder, querying with
"-*-*-*-*-*-*-*-*-*-*-*-*-iso10646-*" and checking for the ranges it
needs.

IIRC, since the actual font loaded is known to the server, you could
even have multiple such aliases, one for each block.  With languages
using multiple blocks (basically, all of them, since everybody uses
ASCII), you'd just want to be careful to query for the "rare" blocks
first.

This would also allow Emacs and other smart apps to create virtual
fonts (i.e., in faces) by requesting Ryumin Light for the Han and Kana
blocks and Times-Roman for the Basic Latin and Latin-1 Supplement
blocks, as an alternative to X Font Sets.  (This would be nearly
trivial to implement in XEmacs since we use specifiers to implement
faces, and specifiers already do magic to connect charsets to font
registries.  I suppose it would be more work in GNU Emacs, but I
haven't looked at Emacs's font set code.)

Does this look like something reasonable for Emacs (and XEmacs) to
implement on the client side?  If so, I'll play with it a bit (note
that implementing this server-side is simply a matter of editing
fonts.alias) and then put it in play with the X11 and XFree86 people.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.