From: Kenichi Handa
Newsgroups: gmane.emacs.devel,gmane.emacs.gnus.general
Subject: Re: MML charset tag regression
Date: Mon, 28 Apr 2003 20:58:34 +0900 (JST)
Message-ID: <200304281158.UAA10974@etlken.m17n.org>
References: <8465p3kgpl.fsf@lucy.is.informatik.uni-duisburg.de> <84bryuogke.fsf@lucy.is.informatik.uni-duisburg.de>
To: cloos@jhcloos.com, jas@extundo.com
Cc: emacs-devel@gnu.org, ding@gnus.org

In article , "James H. Cloos Jr." writes:

>>>>>> "Simon" == Simon Josefsson writes:

Simon> For me, when I yanked the string into emacs from galeon it
Simon> becomes double-width. It is single-width in galeon though.

> I also see that; any pasting of cyrillic text via pasting X's
> primary or from the clipboard. The wide cyrillic is from the
> japanese-jisx0208 charset.

[...]

In article , Simon Josefsson writes:

> That may be interesting by itself. Go to
> http://www.nns.ru/persons/gorbach.html using galeon (or mozilla, I
> think). Cut'n'paste the first word and yank it in Emacs. It looks as
> single-width in galeon, but when yanked into emacs it becomes double
> width. Yanking it into xterm or gnome-terminal doesn't change the
> string, it looks like single-width. Save the HTML file and open it in
> emacs as a koi8 file (note that emacs doesn't auto detect it as koi8
> so you to do that manually), then it is single-width too.
>
> I guess it is the emacs X cut'n'paste code that somehow makes the
> string into double width japanese characters.

I don't think so. There's no code in Emacs that does such a
conversion. I think galeon sends those Cyrillic characters to Emacs
encoded in COMPOUND_TEXT using the JISX0208 charset.

Please try this: first, select some Cyrillic text in galeon. Then
type this in Emacs:

    C-x RET X raw-text RET C-y

You'll see something like this: "ESC $ ( B ...".

Next, try this: first, select some Cyrillic text in galeon. Then
evaluate this in Emacs:

    (decode-coding-string (x-get-selection 'PRIMARY 'UTF8_STRING) 'utf-8)

I think you'll see single-width Cyrillic characters (you need an
iso10646-1 font containing Cyrillic glyphs).

The selection problem is very deep. :-(

Ideally, the requester should be able to request the type 'TEXT
instead of the specific 'COMPOUND_TEXT or 'UTF8_STRING, and the
requestee should return the text in the first of these types that
can encode it: STRING, COMPOUND_TEXT, or UTF8_STRING (in this
priority order). Unfortunately, many X clients (requestees) don't
behave like that. If 'TEXT is requested, many return just "?????",
even if the text could be encoded correctly as COMPOUND_TEXT or
UTF8_STRING. So Emacs has to request the specific type
'COMPOUND_TEXT ('UTF8_STRING was introduced only recently in
XFree86, and many clients still don't support it).

Recently, many gtk clients have started supporting UTF8_STRING
without improving their COMPOUND_TEXT support. That may cause no
problem between gtk clients, because they request only the type
UTF8_STRING, but it's a shortsighted approach. :-(

The newer encoding method, which uses the "Non-Standard Character
Set Encodings" of COMPOUND_TEXT, makes the Cyrillic case much more
complicated. In some cases (perhaps only in a KOI8 locale), X
clients have recently started to encode Cyrillic characters as
"ESC % / 0 ...".
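To make the two encodings mentioned here concrete, the following is a minimal Python sketch (not Emacs code) that lists which character sets a raw COMPOUND_TEXT byte string switches to, such as the bytes `C-x RET X raw-text RET C-y` inserts. It recognizes only a small, assumed subset of ISO 2022 designations, and it parses extended segments following the layout in the X Compound Text Encoding specification: `ESC % / F M L`, then the encoding name, STX, and the text. All helper names are hypothetical.

```python
ESC, STX = 0x1B, 0x02

# A few ISO 2022 designations that can appear in COMPOUND_TEXT.
# Illustrative subset only (an assumption for this sketch, not a full table).
KNOWN = {
    b"(B": "ASCII (GL)",
    b"-A": "ISO 8859-1 right half (GR)",
    b"-L": "ISO 8859-5 Cyrillic right half (GR)",
    b"$(B": "JIS X 0208 (GL)",
    b"$)B": "JIS X 0208 (GR)",
}


def make_extended_segment(name: str, text: bytes, octets_per_char: int) -> bytes:
    """Build ESC % / F M L <name> STX <text>; octets_per_char 0 means variable."""
    if not 0 <= octets_per_char <= 4:
        raise ValueError("octets_per_char must be 0..4")
    body = name.encode("latin-1") + bytes([STX]) + text
    if len(body) >= 128 * 128:
        raise ValueError("segment too long")
    hi, lo = divmod(len(body), 128)
    return b"\x1b%/" + bytes([0x30 + octets_per_char, 0x80 + hi, 0x80 + lo]) + body


def parse_extended_segment(data: bytes, pos: int = 0):
    """Parse one extended segment at data[pos].

    Returns (octets_per_char, encoding_name, text_bytes, next_pos).
    M and L carry the high bit; (M - 128) * 128 + (L - 128) is the number
    of bytes that follow them (name + STX + text).
    """
    if data[pos:pos + 3] != b"\x1b%/" or len(data) < pos + 6:
        raise ValueError("not a complete extended segment header")
    f, hi, lo = data[pos + 3], data[pos + 4], data[pos + 5]
    if not 0x30 <= f <= 0x34 or hi < 0x80 or lo < 0x80:
        raise ValueError("malformed extended segment header")
    length = (hi - 0x80) * 128 + (lo - 0x80)
    start = pos + 6
    body = data[start:start + length]
    if len(body) != length:
        raise ValueError("truncated extended segment")
    stx = body.find(STX)
    if stx < 0:
        raise ValueError("missing STX after the encoding name")
    return f - 0x30, body[:stx].decode("latin-1"), body[stx + 1:], start + length


def list_designations(data: bytes):
    """Report which character sets a raw COMPOUND_TEXT string switches to."""
    found, i = [], 0
    while i < len(data):
        if data[i] != ESC:
            i += 1
            continue
        if data[i + 1:i + 3] == b"%/":
            _, name, _, i = parse_extended_segment(data, i)
            found.append("extended segment: " + name)
            continue
        for seq, label in KNOWN.items():
            if data[i + 1:i + 1 + len(seq)] == seq:
                found.append(label)
                i += 1 + len(seq)
                break
        else:
            i += 1  # unrecognized escape sequence: skip the ESC byte only
    return found
```

In the galeon case described above, the scan would be expected to report JIS X 0208. For an extended segment, the requester must also know the named encoding (for example koi8-r) to decode the text bytes, which is exactly where a requester running in another locale can fail.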
Those clients don't consider the case where the requester is running
in a different locale. :-(

Perhaps we should make Emacs request UTF8_STRING first if the locale
is UTF-8, and request COMPOUND_TEXT if that request fails.

---
Ken'ichi HANDA
handa@m17n.org
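The fallback proposed in the final paragraph, together with the ideal owner-side handling of a 'TEXT request described earlier, can be modelled roughly as below. This is a minimal Python sketch, not Emacs Lisp: the `request` callable stands in for a real X selection conversion request and is hypothetical, detecting a UTF-8 locale from LC_ALL, LC_CTYPE, and LANG is an assumed heuristic, and `ctext_can_encode` is a hypothetical predicate because Python has no COMPOUND_TEXT codec.

```python
import os
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical stand-in for an X selection request: given a target name,
# return the raw bytes the selection owner sent, or None if it refused.
Requester = Callable[[str], Optional[bytes]]


def locale_is_utf8(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumed heuristic: the first non-empty of LC_ALL, LC_CTYPE, LANG decides."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return "utf8" in value.lower().replace("-", "")
    return False


def request_text(request: Requester, utf8_locale: bool) -> Optional[Tuple[str, bytes]]:
    """Requester side: UTF8_STRING first in a UTF-8 locale, then COMPOUND_TEXT."""
    targets = ["UTF8_STRING", "COMPOUND_TEXT"] if utf8_locale else ["COMPOUND_TEXT"]
    for target in targets:
        data = request(target)
        if data is not None:
            return target, data
    return None


def choose_text_target(text: str, ctext_can_encode: Callable[[str], bool]) -> str:
    """Owner side, when asked for TEXT: the first type that can encode the
    text, in the order STRING (ISO 8859-1), COMPOUND_TEXT, UTF8_STRING."""
    try:
        text.encode("latin-1")
        return "STRING"
    except UnicodeEncodeError:
        pass
    if ctext_can_encode(text):
        return "COMPOUND_TEXT"
    return "UTF8_STRING"
```

In a non-UTF-8 locale this keeps the existing behaviour of asking only for COMPOUND_TEXT; the UTF8_STRING attempt is added only where the requester's own locale makes it the natural choice.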