From: Kenichi Handa
Newsgroups: gmane.emacs.pretest.bugs,gmane.emacs.devel
Subject: Re: UTF-8 paste from xterm picks Chinese charset
Date: Tue, 06 Mar 2007 15:24:04 +0900
To: eb001@sanevision.com
Cc: emacs-pretest-bug@gnu.org, emacs-devel@gnu.org
In-reply-to: <87bqjkr60c.fsf@gmail.com> (message from Martins Krikis on Fri, 23 Feb 2007 22:08:51 +0200)

Sorry for the late response on this matter.

In article <87bqjkr60c.fsf@gmail.com>, Martins Krikis writes:

> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
> I noticed that pasting them out of Emacs and into, say, xterm works. However,
> pasting them back does not quite work---all the lowercase vowels with macrons
> get understood as Chinese characters and lose their previous looks. These are
> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
> is limited, but working with such text is still a torture.
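As a quick check of the byte values quoted above, here is a minimal Python sketch (assuming Python 3.9+; the helper names `utf8_hex` and `decode_qp` are made up for this illustration) that prints each vowel's UTF-8 encoding and decodes the quoted-printable form `=C4=81=C4=93=C4=AB=C5=8D=C5=AB` in which these bytes travel through mail. The last step is an assumption, not something stated in the thread: it shows that GB2312, a Chinese charset, also contains these pinyin vowels, which is one way a COMPOUND_TEXT encoder could end up representing them as Chinese characters.

```python
import quopri

# The five Latvian lowercase vowels with macrons from the bug report.
VOWELS = "āēīōū"

# The same characters as they appear in a quoted-printable mail body.
QP_FORM = b"=C4=81=C4=93=C4=AB=C5=8D=C5=AB"


def utf8_hex(text: str) -> list[str]:
    """Return the UTF-8 encoding of each character as a 0x-prefixed hex string."""
    return ["0x" + ch.encode("utf-8").hex() for ch in text]


def decode_qp(data: bytes) -> str:
    """Undo the quoted-printable transfer encoding, then decode the bytes as UTF-8."""
    return quopri.decodestring(data).decode("utf-8")


if __name__ == "__main__":
    print(utf8_hex(VOWELS))
    print(decode_qp(QP_FORM))
    # Assumption for illustration: GB2312 includes these pinyin vowels, so an
    # encoder producing COMPOUND_TEXT may legitimately pick a Chinese charset.
    print(VOWELS.encode("gb2312").hex())
```

The first line of output should match the list in the report (0xc481 through 0xc5ab), and the second should be the original five vowels.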
That is because your xterm (or the X library) sends them encoded as Chinese (or Japanese) characters when Emacs requests COMPOUND_TEXT. That in itself is not a bug, but it is a bad feature. I remember that some versions of xterm (or the X library) used "UTF-8 extended segments" to embed Unicode characters in COMPOUND_TEXT in such a case, but that no longer seems to be true of their latest versions. :-(

Anyway, I've just improved the function x-select-utf8-or-ctext to prefer UTF-8 in such a case. Please try the latest CVS code.

> I tried setting the coding-system for X selection to
> utf-8, but then pasting produces complete gibberish. (And
> I'd say that's a different bug!) Changing language
> environments does not seem to have any effect on either of
> these bugs (tried Latvian, English, UTF-8).

It's not a bug. Setting selection-coding-system only changes how the selection data is decoded; it doesn't change which data type (UTF8_STRING, COMPOUND_TEXT, or just STRING) is requested. The latter is controlled by the variable x-select-request-type. I've just expanded the documentation of selection-coding-system.

> I've turned the utf-translate-cjk-mode off but this does not
> improve things, contrary to the very promising sounding help-text about it.
> (Not a word about it in info pages, BTW, that's another wishlist item.)

Which part makes you think so? It doesn't affect which data type is requested either.

Anyway, it's bad that utf-translate-cjk-mode is not documented in Info. Could someone add it? I'm not good at writing Info.

---
Kenichi Handa
handa@m17n.org