From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.pretest.bugs,gmane.emacs.devel
Subject: Re: UTF-8 paste from xterm picks Chinese charset
Date: Tue, 06 Mar 2007 15:24:04 +0900
Message-ID: <E1HOT5g-0004vU-Gd@etlken.m17n.org>
References: <87bqjkr60c.fsf@gmail.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1173162273 29321 80.91.229.12 (6 Mar 2007 06:24:33 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 6 Mar 2007 06:24:33 +0000 (UTC)
Cc: emacs-pretest-bug@gnu.org, emacs-devel@gnu.org
To: eb001@sanevision.com
Original-X-From: emacs-pretest-bug-bounces+gebp-emacs-pretest-bug=gmane.org@gnu.org Tue Mar 06 07:24:26 2007
Return-path: <emacs-pretest-bug-bounces+gebp-emacs-pretest-bug=gmane.org@gnu.org>
Envelope-to: gebp-emacs-pretest-bug@gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1HOT62-00043P-C0
	for gebp-emacs-pretest-bug@gmane.org; Tue, 06 Mar 2007 07:24:26 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1HOT62-0002Ph-TY
	for gebp-emacs-pretest-bug@gmane.org; Tue, 06 Mar 2007 01:24:26 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1HOT5t-0002MZ-FW
	for emacs-pretest-bug@gnu.org; Tue, 06 Mar 2007 01:24:17 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1HOT5s-0002KZ-9g
	for emacs-pretest-bug@gnu.org; Tue, 06 Mar 2007 01:24:16 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1HOT5s-0002KA-5Z; Tue, 06 Mar 2007 01:24:16 -0500
Original-Received: from mx1.aist.go.jp ([150.29.246.133])
	by monty-python.gnu.org with esmtp (Exim 4.52)
	id 1HOT5r-0006Z0-A2; Tue, 06 Mar 2007 01:24:15 -0500
Original-Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123])
	by mx1.aist.go.jp  with ESMTP id l266OBrU011340;
	Tue, 6 Mar 2007 15:24:11 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp2.aist.go.jp
	by rqsmtp2.aist.go.jp  with ESMTP id l266O9vd010029;
	Tue, 6 Mar 2007 15:24:09 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp2.aist.go.jp  with ESMTP id l266O4aG003288;
	Tue, 6 Mar 2007 15:24:04 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from handa by etlken.m17n.org with local (Exim 4.63)
	(envelope-from <handa@m17n.org>)
	id 1HOT5g-0004vU-Gd; Tue, 06 Mar 2007 15:24:04 +0900
In-reply-to: <87bqjkr60c.fsf@gmail.com> (message from Martins Krikis on Fri,
	23 Feb 2007 22:08:51 +0200)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/22.0.95 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)
X-detected-kernel: Solaris 8 (1)
X-BeenThere: emacs-pretest-bug@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Bug reports for CVS Emacs." <emacs-pretest-bug.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug>,
	<mailto:emacs-pretest-bug-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-pretest-bug>
List-Post: <mailto:emacs-pretest-bug@gnu.org>
List-Help: <mailto:emacs-pretest-bug-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug>,
	<mailto:emacs-pretest-bug-request@gnu.org?subject=subscribe>
Original-Sender: emacs-pretest-bug-bounces+gebp-emacs-pretest-bug=gmane.org@gnu.org
Errors-To: emacs-pretest-bug-bounces+gebp-emacs-pretest-bug=gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.pretest.bugs:17388 gmane.emacs.devel:67406
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/67406>

Sorry for the late response on this matter.

In article <87bqjkr60c.fsf@gmail.com>, Martins Krikis <eb001@sanevision.com> writes:

> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
> I noticed that pasting them out of Emacs and into, say, xterm works.  However,
> pasting them back does not quite work---all the lowercase vowels with macrons
> get understood as Chinese characters and lose their previous looks. These are
> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
> is limited, but working with such text is still a torture.

That is because your xterm (or X library) sends them encoded
as Chinese (or Japanese) characters when Emacs requests
COMPOUND_TEXT.  That in itself is not a bug, but a
misfeature.  I remember that some versions of xterm (or the X
library) used "UTF-8 extended segments" to embed Unicode
characters in COMPOUND_TEXT in such a case.  But it seems
that this is no longer true in their latest versions.  :-(

Anyway, I've just improved the function
x-select-utf8-or-ctext to prefer UTF-8 in such a case.
Please try with the latest CVS code.

> I tried setting the coding-system for X selection to
> utf-8, but then pasting produces complete gibberish. (And
> I'd say that's a different bug!) Changing language
> environments does not seem to have any effect on either of
> these bugs (tried Latvian, English, UTF-8).

It's not a bug.  Setting selection-coding-system only
changes how selection data is decoded; it doesn't change
which data type (UTF8_STRING, COMPOUND_TEXT, or just
STRING) is requested.  The latter is controlled by the
variable x-select-request-type.  I've just added more
explanation to the documentation of selection-coding-system.
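
As a minimal sketch of the distinction above, and assuming
you want Emacs to always ask the selection owner for
UTF8_STRING, an init-file setting might look like the
following.  Only the variable name and the data-type symbol
come from this message; whether this setting suits a given
xterm or X library is an assumption to check locally.

```elisp
;; Sketch only: choose which data type Emacs requests when pasting.
;; This affects the request, not how the returned bytes are decoded.
(setq x-select-request-type 'UTF8_STRING)

;; selection-coding-system is deliberately left alone here: it only
;; controls decoding, so changing it does not change the request.
```

Setting x-select-request-type back to nil should restore
the default behavior.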

> I've turned the utf-translate-cjk-mode off but this does not
> improve things, contrary to the very promising sounding help-text about it.
> (Not a word about it in info pages, BTW, that's another wishlist item.)

Which part makes you think so?  It doesn't affect which
data type is requested either.  Anyway, it's bad that
utf-translate-cjk-mode is not in Info.  Could someone put it
in Info?  I'm not good at writing Info.

---
Kenichi Handa
handa@m17n.org