From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: X11 Compound Text vs ISO 2022
Date: Thu, 29 Jul 2010 21:36:31 +0900
Message-ID: <tl7wrsesbw0.fsf@m17n.org>
References: <m3zky4wpgw.fsf@carbon.jhcloos.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: dough.gmane.org 1280407018 17204 80.91.229.12 (29 Jul 2010 12:36:58 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 29 Jul 2010 12:36:58 +0000 (UTC)
Cc: david@harpegolden.net, emacs-devel@gnu.org
To: James Cloos <cloos@jhcloos.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Jul 29 14:36:54 2010
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1OeSM1-0005sK-Ih
	for ged-emacs-devel@m.gmane.org; Thu, 29 Jul 2010 14:36:54 +0200
Original-Received: from localhost ([127.0.0.1]:46649 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1OeSM0-0006ha-Kr
	for ged-emacs-devel@m.gmane.org; Thu, 29 Jul 2010 08:36:52 -0400
Original-Received: from [140.186.70.92] (port=52116 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OeSLs-0006fX-3c
	for emacs-devel@gnu.org; Thu, 29 Jul 2010 08:36:47 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <handa@m17n.org>) id 1OeSLm-0007r7-Nz
	for emacs-devel@gnu.org; Thu, 29 Jul 2010 08:36:43 -0400
Original-Received: from mx1.aist.go.jp ([150.29.246.133]:64206)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <handa@m17n.org>) id 1OeSLm-0007pd-6U
	for emacs-devel@gnu.org; Thu, 29 Jul 2010 08:36:38 -0400
Original-Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123])
	by mx1.aist.go.jp  with ESMTP id o6TCaW0Y002922;
	Thu, 29 Jul 2010 21:36:32 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp4.aist.go.jp
	by rqsmtp2.aist.go.jp  with ESMTP id o6TCaWht012656;
	Thu, 29 Jul 2010 21:36:32 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp4.aist.go.jp  with ESMTP id o6TCaV6d020485;
	Thu, 29 Jul 2010 21:36:31 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from handa by etlken with local (Exim 4.71)
	(envelope-from <handa@m17n.org>)
	id 1OeSLf-0002lF-EX; Thu, 29 Jul 2010 21:36:31 +0900
In-Reply-To: <m3zky4wpgw.fsf@carbon.jhcloos.org> (message from James Cloos on
	Tue, 06 Jul 2010 12:21:11 -0400)
X-detected-operating-system: by eggs.gnu.org: Solaris 9
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:127959
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/127959>

Very sorry for the late response on this matter.

In article <m3zky4wpgw.fsf@carbon.jhcloos.org>, James Cloos <cloos@jhcloos.com> writes:

> While testing my recently applied patch, I've discovered that Emacs will
> produce ISO-2022 output for COMPOUND_TEXT which other libs and apps --
> notably including libX11 -- cannot decode.

> As an example, (encode-coding-string "•" 'compound-text) ; U+2022 BULLET
> produces "^[$(O#@^[(B".  '$(O' is ISO-IR 228¹, JIS X 0213:2000.  But
> libX11 only knows about the $( charsets:  0, 1, A-D and G-M.

> A number of characters are output in '^[$-1'; such as:

> (encode-coding-string "ℜ" 'compound-text) ; U+211C BLACK-LETTER CAPITAL R
> "^[$-1\365\334^[-A"
> (encode-coding-string "ʻ" 'compound-text) ; U+02BB MODIFIER LETTER TURNED COMMA
> "^[$-1\244\333^[-A"

> That is encoded in mule-unicode-0100-24ff, essentially unknown outside
> Emacs.

I admit that this behaviour is not good now.  When I first
implemented ctext in Emacs, there was neither UTF8_STRING
nor CTEXT_with_UTF8_extended_segment.  So, I added more
character sets to it for cut&paste between two running
Emacsen.  As Emacs was the only application that supported
many character sets at that time, no one complained about
that behaviour of ctext.  The other applications couldn't
handle that many characters anyway.
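
To make the quoted examples concrete, here is a minimal sketch
in Python (not Emacs code) that lists the ISO 2022 escape
sequences in an encoded string and flags the "$(" designations
outside the set that libX11 is reported to know.  The function
names are hypothetical, and the set of known final bytes is
copied from the quoted report, not checked against the libX11
sources.

```python
import re

# An ISO 2022 escape sequence: ESC, zero or more intermediate bytes
# (0x20-0x2F), then a single final byte (0x30-0x7E).
ESCAPE_SEQUENCE = re.compile(rb"\x1b([\x20-\x2f]*[\x30-\x7e])")

# Final bytes of the "$(" charsets that libX11 is said to know.
# ASSUMPTION: taken verbatim from the quoted report ("0, 1, A-D and G-M").
KNOWN_FINALS = set(b"01ABCDGHIJKLM")


def designations(encoded):
    """Return every escape sequence in `encoded`, without the leading ESC."""
    return ESCAPE_SEQUENCE.findall(encoded)


def unknown_multibyte_g0(encoded):
    """Return the "$(F" sequences whose final byte F is not in KNOWN_FINALS."""
    return [
        seq
        for seq in designations(encoded)
        if len(seq) == 3 and seq.startswith(b"$(") and seq[2] not in KNOWN_FINALS
    ]


if __name__ == "__main__":
    bullet = b"\x1b$(O#@\x1b(B"          # quoted output for U+2022
    fraktur_r = b"\x1b$-1\xf5\xdc\x1b-A"  # quoted output for U+211C
    print(designations(bullet), unknown_multibyte_g0(bullet))
    print(designations(fraktur_r), unknown_multibyte_g0(fraktur_r))
```

For the first quoted string this reports "$(O" as unknown.  The
second string uses "$-1", which is not a "$(" sequence, so it
appears in the list of designations but is not covered by the
second check.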

> Other libs/apps prefer to use utf-8³ in compound_text for such chars.

> I understand *why* this happens, given that Emacs used to use 2022
> internally, but it confuses other X11 apps.

Actually, the latest Emacs (Emacs 23 and later) uses
Unicode internally.

> I am not fully fluent in Emacs' internal charset conversion routines;
> is there an easy way to tell it to limit which 2022 charsets it will
> use when converting a string into a 2022 encoding?  A better way?

It's fairly easy to limit the charsets of ctext.  But I care
about backward compatibility.  As ctext is the only coding
system that is compatible with iso-8859-1 and can encode
many other character sets, there will be old users who still
use it for file/process encodings.

And since ctext is not used for selections anyway, I'd rather
just document that ctext is not fully compatible with X's
COMPOUND_TEXT spec, but is an extended version.

For WM_NAME, etc., yes, we should use ctext-with-extensions.
As ctext-with-extensions is not intended to be used
directly by users, I think it won't cause actual problems
even if we change it so that more characters are encoded
using UTF8-extended-segment.  So, I'll work on it soon.

The only problem with ctext-with-extensions is that it is
currently implemented in Elisp, and thus it may cause GC.
I'm not sure it is safe to call Lisp at the place where we
convert WM_NAME etc.  If it is not safe, I'll implement
ctext-with-extensions in C.

---
Kenichi Handa
handa@m17n.org