From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Martins Krikis <eb001@sanevision.com>
Newsgroups: gmane.emacs.pretest.bugs,gmane.emacs.devel
Subject: Re: UTF-8 paste from xterm picks Chinese charset
Date: Tue, 27 Mar 2007 18:32:30 +0300
Message-ID: <87y7livh0x.fsf@gmail.com>
Reply-To: Martins Krikis <eb001@sanevision.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=gb2312
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1175009568 10308 80.91.229.12 (27 Mar 2007 15:32:48 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 27 Mar 2007 15:32:48 +0000 (UTC)
Cc: emacs-pretest-bug@gnu.org, eb001@sanevision.com, emacs-devel@gnu.org
To: Kenichi Handa <handa@m17n.org>
Xref: news.gmane.org gmane.emacs.pretest.bugs:17810 gmane.emacs.devel:68670
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/68670>

Sorry for responding so late---had no time to test a newer version.

Now I have done it (22.0.96.1) and must say that without additional
steps it still behaves as before (badly). If I do set the request type
explicitly, then it works fine now. A few more notes below.

>> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
>> I noticed that pasting them out of Emacs and into, say, xterm works.  However,
>> pasting them back does not quite work---all the lowercase vowels with macrons
>> get understood as Chinese characters and lose their previous looks. These are
>> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
>> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
>> is limited, but working with such text is still a torture.
>
> That is because your xterm (or X library) sends them encoded
> in Chinese (or Japanese) character when COMPOUND_TEXT is
> requested from Emacs.  It itself is not a bug, but a bad
> feature.  I remember that some version of xterm (or X
> library) uses "UTF-8 extended segments" to embed Unicode
> characters in COMPOUND_TEXT in such a case.  But it seems
> that that is not true in their latest versions.  :-(

I had no clue that there are different ways of requesting text
from an existing selection. Is there a reason that Emacs requests
COMPOUND_TEXT, when other X applications must be using some other
type, given that pasting into them seems to work?

> Anyway, I've just improved the function
> x-select-utf8-or-ctext to prefer UTF-8 in such a case.
> Please try with the latest CVS code.

I can't figure out whether I need to do something explicitly
with this function, but the paste test I made didn't seem to
work (unless I change the request type).

>> I tried setting the coding-system for X selection to
>> utf-8, but then pasting produces complete gibberish. (And
>> I'd say that's a different bug!) Changing language
>> environments does not seem to have any effect on either of
>> these bugs (tried Latvian, English, UTF-8).
>
> It's not a bug.  Setting selection-coding-system just
> changes a way how to decode a selection data, it doesn't
> change which data-type (UTF8_STRING, COMPOUND_TEXT, or just
> STRING) to request.  The latter is controlled by the
> variable x-select-request-type.  I've just added more words
> in the documentation of selection-coding-system.

OK, so it seems I should try setting x-select-request-type explicitly...
When I set it to UTF8_STRING, pasting back into Emacs does indeed
work. So the obvious questions are: why isn't that the default?
What am I sacrificing if I set it explicitly and thus lose the
ability to request COMPOUND_TEXT?

>> I've turned the utf-translate-cjk-mode off but this does not
>> improve things, contrary to the very promising sounding help-text about it.
>> (Not a word about it in info pages, BTW, that's another wishlist item.)
>
> Which part makes you think so?  It also doesn't affect which
> data-type to request.  Anyway, it's bad that
> utf-translate-cjk-mode is not in Info.  Could someone put it
> in Info?  I'm not good at writing Info.

Well, I did not know that Emacs turns Latvian characters into Chinese
ones because it requests COMPOUND_TEXT for the X selection, and because
X tags the selection with something that makes it more Chinese than I
could have imagined. So I read the help text for this function literally
and believed that Emacs was looking at UTF-8 text but simply choosing to
convert some characters to the Chinese charset they also belong to...
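
The overlap between the two charsets can be illustrated with a small
Python sketch. It only compares the encodings and is not anything Emacs
or xterm actually runs. The names VOWELS, encodings, table and roundtrip
are made up for the example, and the GB2312 byte values are simply
whatever Python's gb2312 codec produces for these letters.

```python
# Illustration only: the five Latvian vowels from the report exist both in
# Unicode (shown here as UTF-8 bytes) and in the Chinese GB2312 charset,
# where they appear as pinyin letters.
VOWELS = "āēīōū"


def encodings(ch):
    """Return (UTF-8 hex, GB2312 hex) for a single character."""
    return ch.encode("utf-8").hex(), ch.encode("gb2312").hex()


# Map each vowel to its two byte sequences.
table = {ch: encodings(ch) for ch in VOWELS}

for ch, (utf8_hex, gb_hex) in table.items():
    print(f"{ch}  UTF-8: 0x{utf8_hex}  GB2312: 0x{gb_hex}")

# Decoding the GB2312 bytes yields the same letters again; what differs is
# only which charset the bytes were labelled with on the way.
roundtrip = b"\xa8\xa1\xa8\xa5\xa8\xa9\xa8\xad\xa8\xb1".decode("gb2312")
```

Since the same letters are representable in a Chinese charset, a sender
that is asked for COMPOUND_TEXT has a Chinese encoding available for
them, which would be consistent with what I saw.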

My point is, without having intimate knowledge of X and Emacs internals,
it's hard to pinpoint why I can paste UTF8 text between most X apps,
but not into Emacs. The help text for this function seemed to provide
an explanation, but upon acting on it, I didn't achieve the result
I hoped for and therefore I filed a bug.
And if this function is mentioned in info, I haven't been able to find it.
Of course, I personally don't need it anymore.


Anyway, thanks very much for your help. You have provided one way to
make Emacs behave as I expected it should (by changing the request
method) and you have made me realize that I need to seriously read
up on the various methods X uses for passing the copy/paste text
around. Unfortunately, I am too busy right now for studying this.

Best regards,

  Martins Krikis
