From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: X11 Compound Text vs ISO 2022
Date: Wed, 07 Jul 2010 09:36:41 +0900
Message-ID: <87r5jgnn52.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <m3zky4wpgw.fsf@carbon.jhcloos.org>
	<4C338FB2.3060900@harpegolden.net>
	<m3aaq4w8d7.fsf@carbon.jhcloos.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: dough.gmane.org 1278463339 26290 80.91.229.12 (7 Jul 2010 00:42:19 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Wed, 7 Jul 2010 00:42:19 +0000 (UTC)
Cc: emacs-devel@gnu.org, David De La Harpe Golden <david@harpegolden.net>
To: James Cloos <cloos@jhcloos.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Jul 07 02:42:16 2010
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1OWIiN-0008IL-Og
	for ged-emacs-devel@m.gmane.org; Wed, 07 Jul 2010 02:42:16 +0200
Original-Received: from localhost ([127.0.0.1]:55239 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1OWIiM-0000eC-OA
	for ged-emacs-devel@m.gmane.org; Tue, 06 Jul 2010 20:42:14 -0400
Original-Received: from [140.186.70.92] (port=54904 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OWIiE-0000e7-Sk
	for emacs-devel@gnu.org; Tue, 06 Jul 2010 20:42:07 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <stephen@xemacs.org>) id 1OWIiD-000369-Fv
	for emacs-devel@gnu.org; Tue, 06 Jul 2010 20:42:06 -0400
Original-Received: from mtps01.sk.tsukuba.ac.jp ([130.158.97.223]:49174)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <stephen@xemacs.org>) id 1OWIiC-00035j-SM
	for emacs-devel@gnu.org; Tue, 06 Jul 2010 20:42:05 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mtps01.sk.tsukuba.ac.jp (Postfix) with ESMTP id C56A21535A8;
	Wed,  7 Jul 2010 09:42:00 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id B4F421A291A; Wed,  7 Jul 2010 09:36:41 +0900 (JST)
In-Reply-To: <m3aaq4w8d7.fsf@carbon.jhcloos.org>
X-Mailer: VM 8.0.12-devo-585 under 21.5 (beta29) "garbanzo" 6466bc9ebf15
	XEmacs Lucid (x86_64-unknown-linux)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6, seldom 2.4 (older,
	4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:126853
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/126853>

James Cloos writes:

 > I think utf8 is the only significant difference between the upstream
 > Xorg spec and the Xfree86 modification.  I vaguely recall the
 > discussions on the xfree86 list(s) when it was introduced (too many
 > years ago, [SIGH]).  The EWMH spec and the UTF8_STRING format came
 > about, in part, out of that discussion, IIRC.

As of about 2004, the XFree86 spec was totally bogus (internally
contradictory on the subject of encoding some ISO 8859 coded character
sets), and the XFree86 implementation ignored it anyway in many cases.

 > Emacs does need to limit what it is willing to encode in COMPOUND_TEXT,
 > and to use utf8-in-ctext for everything which is not in the 8859, GB,
 > JISX, KSC, CNS or BIG5 variants libX11 supports.  I'd go a bit further
 > and prefer utf8 over the CJK encodings for characters which are not
 > part of a CJK string.

But that goes against the spec, which AFAIK still provides that in
COMPOUND_TEXT the escape to non-ISO-2022 should only be used for
characters not in the repertoires of the registered charsets:

    Extended segments are not to be used for any character set
    encoding that can be constructed from a GL/GR pair of approved
    standard encodings. For example, it is incorrect to use an
    extended segment for any of the ISO 8859 family of encodings.

I would argue that you have two choices here: consider the whole
string to be Unicode, and use an extended segment for the whole
thing; or consider the string to be pieced together from segments in
approved standard encodings, in which case a character that can be
represented in those encodings should be.
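
To make the second choice concrete, here is a rough Python sketch of
the per-character decision.  Everything in it is an assumption made
for illustration: the name plan_segments is hypothetical, the list of
"approved" encodings is cut down to two ISO 8859 parts, and the
"utf-8" label merely stands in for whatever escape to non-ISO-2022 is
used.  It plans segments only and emits no COMPOUND_TEXT bytes.

```python
# Hypothetical sketch: label runs of characters by the first approved
# encoding that can represent them, and fall back only when none can.
APPROVED = ("iso8859_1", "iso8859_2")  # illustrative subset, not the spec's list
FALLBACK = "utf-8"                     # stand-in for the non-ISO-2022 escape


def can_encode(ch, codec):
    """Return True if the single character ch is representable in codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def plan_segments(text, approved=APPROVED, fallback=FALLBACK):
    """Split text into (label, substring) runs.

    A character that fits an approved encoding is never given the
    fallback label, which is the rule argued for above.
    """
    segments = []
    for ch in text:
        label = next((c for c in approved if can_encode(ch, c)), fallback)
        if segments and segments[-1][0] == label:
            # Same label as the previous character: extend the current run.
            segments[-1] = (label, segments[-1][1] + ch)
        else:
            segments.append((label, ch))
    return segments


if __name__ == "__main__":
    # MIDDLE DOT (U+00B7) is in ISO 8859-1, so it stays in that segment.
    print(plan_segments("a\u00b7b \u4e2d\u6587"))
```

The first choice (the whole string is Unicode) needs no planning at
all, which is part of its appeal.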

BTW, for the case of MIDDLE DOT using JIS X 0213, the most recent spec
I could find on the web doesn't admit JIS X 0213 (or JIS X 0212 for
that matter).

 > The question, then, is how best to do that?

Wouldn't it be better to avoid use of COMPOUND_TEXT targets?  How many
apps prefer it to UTF8_STRING?  So, for example, when asked for
supported targets Emacs could list UTF8_STRING first.
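
A toy Python sketch of why the ordering could matter.  It assumes a
requestor that simply takes the first listed target it understands;
real requestors are free to apply their own preference, in which case
the order changes nothing.  The names OFFERED and first_understood are
hypothetical, and no X11 calls are involved.

```python
# Hypothetical order in which a selection owner might list text targets.
OFFERED = ["UTF8_STRING", "COMPOUND_TEXT", "STRING"]


def first_understood(offered, understood):
    """Return the first offered target the requestor understands, else None."""
    for target in offered:
        if target in understood:
            return target
    return None


if __name__ == "__main__":
    # A requestor that handles both ends up with UTF8_STRING.
    print(first_understood(OFFERED, {"COMPOUND_TEXT", "UTF8_STRING"}))
```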