From: James Cloos
Newsgroups: gmane.emacs.devel
Subject: Re: X11 Compound Text vs ISO 2022
Date: Tue, 06 Jul 2010 18:30:36 -0400
References: <4C338FB2.3060900@harpegolden.net>
To: David De La Harpe Golden
Cc: emacs-devel@gnu.org

>>>>> "DDLHG" == David De La Harpe Golden writes:

DDLHG> But anyway, if emacs isn't using one of the character sets listed in
DDLHG> the table in sect. 4/5 of "the" spec [1] or utf-8 as per sect.7,
DDLHG> presumably it's an emacs bug unless emacs has successfully "registered
DDLHG> the encoding with the X consortium" as per sect. 6 (and I don't see
DDLHG> that happening...).

Exactly.

Xorg libX11 supports what is in that spec (including the utf8 which was
first added by XFree86, but was not added to the upstream spec), a
couple of other charsets "for compatability with Xfree86 3.1" and two
sets which are "used by Emacs, but not backed by ISO-IR".

Xorg's luit app has its own 2022 encoder/decoder which supports several
additional charsets, such as "DEC Special", "DEC Technical", four KOI8
variations, cp125[012], cp437, cp850 and cp866.  But it does not use
those for COMPOUND_TEXT, only as its internal encoding, much like Emacs
used to do.

DDLHG> Conversely, if emacs is sending a charset that IS listed in the table
DDLHG> in sect. 4/5 or utf-8 as per sect. 7, then libX11 and other apps are
DDLHG> "at fault" if they don't recognise them.

Emacs sends as COMPOUND_TEXT a 2022 encoding which appears to be exactly
what it used to use internally, rather than keeping to the ctext spec.

DDLHG> But err... the spec on freedesktop.org seems a lot older, not even
DDLHG> mentioning utf-8 ???

I think utf8 is the only significant difference between the upstream
Xorg spec and the XFree86 modification.  I vaguely recall the
discussions on the xfree86 list(s) when it was introduced (too many
years ago, [SIGH]).  The EWMH spec and the UTF8_STRING format came
about, in part, out of that discussion, IIRC.

Emacs does need to limit what it is willing to encode in COMPOUND_TEXT,
and to use utf8-in-ctext for everything which is not in the 8859, GB,
JISX, KSC, CNS or BIG5 variants libX11 supports.

I'd go a bit further and prefer utf8 over the CJK encodings for
characters which are not part of a CJK string.  (As an example, Emacs
uses japanese-jisx0213-1 for U+2022 MIDDLE DOT; it would be better to
use utf-8 unless the MIDDLE DOT is in a string which was entered via the
Japanese input method, or LANG is ja_JP, or something of that sort.)

The question, then, is how best to do that?

-JimC
-- 
James Cloos OpenPGP: 1024D/ED7DAEA6
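
As a starting point for the "how best to do that" question, here is a rough sketch of the utf8-in-ctext fallback in Python.  It is not what Emacs or libX11 actually does.  It assumes only the Compound Text initial state (ISO 8859-1 in GL and GR, plus TAB and NEWLINE) is sent directly, and it wraps everything else in the UTF-8 escape pair from the XFree86 extension (ESC % G to enter, ESC % @ to leave).  The other 8859 parts and the CJK designations are left out, and the names encode_ctext_sketch and in_initial_state are made up for illustration.

```python
ESC = b"\x1b"
UTF8_START = ESC + b"%G"  # ESC 02/05 04/07: switch to UTF-8
UTF8_END = ESC + b"%@"    # ESC 02/05 04/00: return to ISO 2022


def in_initial_state(ch: str) -> bool:
    """True if ch can be sent as-is in the ctext initial state.

    Assumption: only TAB, NEWLINE, and the printable halves of
    ISO 8859-1 (0x20-0x7E and 0xA0-0xFF) count as directly encodable.
    """
    cp = ord(ch)
    return ch in "\t\n" or 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF


def encode_ctext_sketch(text: str) -> bytes:
    """Encode text as Latin-1 where possible, UTF-8 segments otherwise."""
    out = bytearray()
    in_utf8 = False
    for ch in text:
        if in_initial_state(ch):
            if in_utf8:
                # Leave the UTF-8 segment before emitting Latin-1 bytes.
                out += UTF8_END
                in_utf8 = False
            out += ch.encode("latin-1")
        else:
            if not in_utf8:
                # Open a UTF-8 segment for characters outside Latin-1.
                out += UTF8_START
                in_utf8 = True
            out += ch.encode("utf-8")
    if in_utf8:
        # Always return to the ISO 2022 state at the end of the string.
        out += UTF8_END
    return bytes(out)
```

With this policy a lone U+2022 in otherwise Latin text goes out as a short UTF-8 segment rather than pulling in a JIS X 0213 designation.  A fuller version would first try the charsets libX11 is known to accept, and would take a hint (input method, locale) before choosing a CJK charset over UTF-8.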