From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martins Krikis
Newsgroups: gmane.emacs.pretest.bugs,gmane.emacs.devel
Subject: Re: UTF-8 paste from xterm picks Chinese charset
Date: Tue, 27 Mar 2007 18:32:30 +0300
Message-ID: <87y7livh0x.fsf@gmail.com>
To: Kenichi Handa
Cc: emacs-pretest-bug@gnu.org, eb001@sanevision.com, emacs-devel@gnu.org

Sorry for responding so late---had no time to test a newer version.
Now I have done it (22.0.96.1) and must say that without additional
steps it still behaves as before (badly). If I do set the request type
explicitly, then it works fine now. A few more notes below.

>> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
>> I noticed that pasting them out of Emacs and into, say, xterm works. However,
>> pasting them back does not quite work---all the lowercase vowels with macrons
>> get understood as Chinese characters and lose their previous looks. These are
>> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
>> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
>> is limited, but working with such text is still a torture.
>
> That is because your xterm (or X library) sends them encoded
> in Chinese (or Japanese) character when COMPOUND_TEXT is
> requested from Emacs. It itself is not a bug, but a bad
> feature. I remember that some version of xterm (or X
> library) uses "UTF-8 extended segments" to embed Unicode
> characters in COMPOUND_TEXT in such a case. But it seems
> that that is not true in their latest versions. :-(

I had no clue that there are different ways of requesting text from an
already made selection.
Is there a reason that Emacs uses COMPOUND_TEXT, when other X
applications must be using some other method, given that pasting into
them seems to work?

> Anyway, I've just improved the function
> x-select-utf8-or-ctext to prefer UTF-8 in such a case.
> Please try with the latest CVS code.

I can't figure out whether I need to do something explicitly with
this function, but the paste test I made didn't seem to work
(unless I change the request type).

>> I tried setting the coding-system for X selection to
>> utf-8, but then pasting produces complete gibberish. (And
>> I'd say that's a different bug!) Changing language
>> environments does not seem to have any effect on either of
>> these bugs (tried Latvian, English, UTF-8).
>
> It's not a bug. Setting selection-coding-system just
> changes a way how to decode a selection data, it doesn't
> change which data-type (UTF8_STRING, COMPOUND_TEXT, or just
> STRING) to request. The latter is controlled by the
> variable x-select-request-type. I've just added more words
> in the documentation of selection-coding-system.

OK, so it seems I should try setting x-select-request-type
explicitly... When I set it to UTF8_STRING, pasting back into Emacs
does indeed work. So the obvious questions are: why isn't that the
default? What am I sacrificing if I set it explicitly and thus lose
the ability to request COMPOUND_TEXT?

>> I've turned the utf-translate-cjk-mode off but this does not
>> improve things, contrary to the very promising sounding help-text about it.
>> (Not a word about it in info pages, BTW, that's another wishlist item.)
>
> Which part makes you think so? It also doesn't affect which
> data-type to request. Anyway, it's bad that
> utf-translate-cjk-mode is not in Info. Could someone put it
> in Info? I'm not good at writing Info.

Well, I did not know that the conversion of Latvian characters to
Chinese happens because Emacs requests COMPOUND_TEXT for the X
selection, and because X qualifies the selection with something that
makes it more Chinese than I could have imagined. So I was literally
reading the help text for utf-translate-cjk-mode and believing that
Emacs was looking at UTF-8 text but just choosing to convert some
characters to this Chinese charset they also belonged to...

My point is, without having intimate knowledge of X and Emacs
internals, it's hard to pinpoint why I can paste UTF-8 text between
most X apps, but not into Emacs. The help text for this function
seemed to provide an explanation, but upon acting on it, I didn't
achieve the result I hoped for and therefore I filed a bug.

And if this function is mentioned in Info, I haven't been able to find
it. Of course, I personally don't need it anymore.

Anyway, thanks very much for your help. You have provided one way
to make Emacs behave as I expected it should (by changing the request
method) and you have made me realize that I need to seriously read up
on the various methods X uses for passing the copy/paste text around.
Unfortunately, I am too busy right now to study this.

Best regards,

Martins Krikis
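
As a quick check of the byte values mentioned in this message, here is a
minimal Python sketch. It assumes Python 3 and its built-in gb2312 codec;
the names LETTERS, utf8_hex and gb2312_hex are illustrative only. It
computes the UTF-8 encodings of the five vowels with macrons and tests
whether the same letters also exist in the GB2312 charset, which would be
consistent with them coming back tagged as Chinese characters.

```python
# The five lowercase vowels with macrons from the report.
LETTERS = "āēīōū"


def utf8_hex(ch):
    """Return the UTF-8 encoding of one character as hex, e.g. '0xc481'."""
    return "0x" + ch.encode("utf-8").hex()


def gb2312_hex(ch):
    """Return the GB2312 encoding as hex, or None if GB2312 lacks the character."""
    try:
        return "0x" + ch.encode("gb2312").hex()
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # One row per letter: the character, its UTF-8 bytes, its GB2312 bytes.
    for ch in LETTERS:
        print(ch, utf8_hex(ch), gb2312_hex(ch))
```

A letter for which gb2312_hex returns a value is representable in that
Chinese charset, so a converter that prefers legacy charsets has a
legitimate (if unhelpful) way to encode it.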