From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: X11 Compound Text vs ISO 2022
Date: Wed, 07 Jul 2010 09:36:41 +0900
Message-ID: <87r5jgnn52.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <4C338FB2.3060900@harpegolden.net>
To: James Cloos
Cc: emacs-devel@gnu.org, David De La Harpe Golden

James Cloos writes:

> I think utf8 is the only significant difference between the upstream
> Xorg spec and the Xfree86 modification. I vaguely recall the
> discussions on the xfree86 list(s) when it was introduced (too many
> years ago, [SIGH]). The EWMH spec and the UTF8_STRING format came
> about, in part, out of that discussion, IIRC.

As of about 2004, the XFree86 spec was totally bogus (internally contradictory on the subject of encoding some ISO 8859 coded character sets), and the XFree86 implementation ignored it anyway in many cases.

> Emacs does need to limit what it is willing to encode in COMPOUND_TEXT,
> and to use utf8-in-ctext for everything which is not in the 8859, GB,
> JISX, KSC, CNS or BIG5 variants libX11 supports. I'd go a bit further
> and prefer utf8 over the CJK encodings for characters which are not
> part of a CJK string.

But that goes against the spec, which AFAIK still provides that in COMPOUND_TEXT the escape to non-ISO-2022 encodings should be used only for characters that are not in the repertoires of the registered charsets:

    Extended segments are not to be used for any character set
    encoding that can be constructed from a GL/GR pair of approved
    standard encodings. For example, it is incorrect to use an
    extended segment for any of the ISO 8859 family of encodings.
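To make the spec's rule concrete, here is a minimal sketch of a segmenter that follows it: each character goes into an approved charset if one can represent it, and only characters outside every approved repertoire fall back to UTF-8 (standing in for an extended segment). This is an illustration under loud assumptions, not Emacs's or libX11's code: the names `APPROVED`, `FALLBACK`, `encodable` and `segment` are hypothetical, Python codecs only approximate the registered ISO 2022 charsets, the list of charsets is arbitrary, and no actual escape sequences are emitted.

```python
# Hypothetical sketch of the "pieced together from approved encodings" policy.
# ASSUMPTION: Python codecs stand in for the registered ISO 2022 charsets;
# the list and its order are illustrative, not the X11 registry.
APPROVED = ["latin-1", "iso8859_2", "iso8859_5", "iso8859_7"]
FALLBACK = "utf-8"  # stands in for an extended (non-ISO-2022) segment


def encodable(ch: str, charset: str) -> bool:
    """True if the codec named `charset` can represent the character `ch`."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def segment(text: str) -> list[tuple[str, str]]:
    """Split text into (charset, run) pairs, using FALLBACK only when needed."""
    segments: list[tuple[str, str]] = []
    for ch in text:
        current = segments[-1][0] if segments else None
        if current is not None and current != FALLBACK and encodable(ch, current):
            # Stay in the current approved charset to avoid needless switches.
            charset = current
        else:
            # Per the spec, any approved charset beats the fallback.
            charset = next((c for c in APPROVED if encodable(ch, c)), FALLBACK)
        if segments and segments[-1][0] == charset:
            segments[-1] = (charset, segments[-1][1] + ch)
        else:
            segments.append((charset, ch))
    return segments
```

For instance, `segment("x\u2603y")` yields a Latin-1 run, a UTF-8 run holding only the snowman, and another Latin-1 run, whereas MIDDLE DOT (U+00B7) always lands in Latin-1 because that charset can represent it.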
I would argue that you have two choices here: consider the whole string to be Unicode and use an extended segment for the whole thing, or consider the string to be pieced together from segments in approved standard encodings, in which case any character that can be represented in one of those encodings should be encoded that way.

BTW, for the case of MIDDLE DOT using JIS X 0213: the most recent spec I could find on the web doesn't admit JIS X 0213 (or JIS X 0212, for that matter).

> The question, then, is how best to do that?

Wouldn't it be better to avoid using COMPOUND_TEXT targets? How many apps prefer it to UTF8_STRING? For example, when asked for supported targets, Emacs could list UTF8_STRING first.