From mboxrd@z Thu Jan 1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Simon Josefsson
Newsgroups: gmane.emacs.gnus.general,gmane.emacs.devel
Subject: Re: MML charset tag regression
Date: Tue, 29 Apr 2003 01:05:12 +0200
Sender: ding-owner@lists.math.uh.edu
Message-ID:
References: <8465p3kgpl.fsf@lucy.is.informatik.uni-duisburg.de>
	<84bryuogke.fsf@lucy.is.informatik.uni-duisburg.de>
	<200304281158.UAA10974@etlken.m17n.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1051571081 11822 80.91.224.249 (28 Apr 2003 23:04:41 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 28 Apr 2003 23:04:41 +0000 (UTC)
Cc: cloos@jhcloos.com, emacs-devel@gnu.org, ding@gnus.org
Original-X-From: ding-owner+M472@lists.math.uh.edu Tue Apr 29 01:04:36 2003
Return-path:
Original-Received: from malifon.math.uh.edu ([129.7.128.13])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19AHfo-00034I-00
	for ; Tue, 29 Apr 2003 01:04:36 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu)
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19AHgg-0003Ye-00; Mon, 28 Apr 2003 18:05:30 -0500
Original-Received: from sclp3.sclp.com ([64.157.176.121])
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19AHgY-0003YY-00
	for ding@lists.math.uh.edu; Mon, 28 Apr 2003 18:05:22 -0500
Original-Received: (qmail 57142 invoked by alias); 28 Apr 2003 23:05:22 -0000
Original-Received: (qmail 57137 invoked from network); 28 Apr 2003 23:05:21 -0000
Original-Received: from 178.230.13.217.in-addr.dgcsystems.net (HELO yxa.extundo.com) (217.13.230.178)
	by sclp3.sclp.com with SMTP; 28 Apr 2003 23:05:21 -0000
Original-Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178])
	(authenticated bits=0)
	by yxa.extundo.com (8.12.9/8.12.9) with ESMTP id h3SN5DbU004379
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK);
	Tue, 29 Apr 2003 01:05:13 +0200
Original-To: Kenichi Handa
Mail-Copies-To: nobody
X-Payment: hashcash 1.2 0:030428:handa@m17n.org:8399192059cf0034
X-Hashcash: 0:030428:handa@m17n.org:8399192059cf0034
X-Payment: hashcash 1.2 0:030428:cloos@jhcloos.com:1c3bacd6b60fc812
X-Hashcash: 0:030428:cloos@jhcloos.com:1c3bacd6b60fc812
X-Payment: hashcash 1.2 0:030428:emacs-devel@gnu.org:497825dfd1a736c7
X-Hashcash: 0:030428:emacs-devel@gnu.org:497825dfd1a736c7
X-Payment: hashcash 1.2 0:030428:ding@gnus.org:78d7dc42fbc76719
X-Hashcash: 0:030428:ding@gnus.org:78d7dc42fbc76719
In-Reply-To: <200304281158.UAA10974@etlken.m17n.org> (Kenichi Handa's message of "Mon, 28 Apr 2003 20:58:34 +0900 (JST)")
User-Agent: Gnus/5.09002 (Oort Gnus v0.20) XEmacs/21.4 (Portable Code, linux)
Precedence: bulk
Xref: main.gmane.org gmane.emacs.gnus.general:51929 gmane.emacs.devel:13521
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:13521

Kenichi Handa writes:

>> I guess it is the emacs X cut'n'paste code that somehow makes the
>> string into double width Japanese characters.
>
> I don't think so.  There's no code in Emacs that does
> such a conversion.

Emacs behaves differently from xterm, gnome-terminal, gedit, etc.,
though.

> I think galeon sends Emacs those Cyrillic characters by
> encoding them into COMPOUND_TEXT with the JISX0208 charset.
>
> Please try this:
>
> First, select some Cyrillic text in galeon.  Then type this
> in Emacs: C-x RET X raw-text RET C-y.  You'll see something
> like this: "ESC $ ( B ...".

I see ^[$(B'$'`'b'R'Q'i'V'S...

> Next, try this:
>
> First, select some Cyrillic text in galeon.  Then evaluate
> this in Emacs:
>   (decode-coding-string (x-get-selection 'PRIMARY 'UTF8_STRING) 'utf-8)
> I think you'll see single width Cyrillic chars (you have to
> have an iso10646-1 font containing Cyrillic glyphs).

Yes, this works fine.
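
For illustration only, here is a minimal Python sketch (not Emacs code) of how the raw bytes quoted above can be read. It assumes the pasted string is exactly one JIS X 0208 run introduced by the designation ESC $ ( B, and it borrows Python's `euc_jp` codec for the table lookup by setting the high bit on each byte. The names `RAW` and `decode_jisx0208_run` are made up for this example.

```python
# Raw bytes as shown in the message: ESC $ ( B, then two-byte pairs.
RAW = b"\x1b$(B'$'`'b'R'Q'i'V'S"

# ISO 2022 escape sequence that designates JIS X 0208 to G0.
JISX0208_G0 = b"\x1b$(B"


def decode_jisx0208_run(raw: bytes) -> str:
    """Decode a single JIS X 0208 run that starts with ESC $ ( B."""
    if not raw.startswith(JISX0208_G0):
        raise ValueError("expected an ESC $ ( B designation")
    payload = raw[len(JISX0208_G0):]
    if len(payload) % 2:
        raise ValueError("JIS X 0208 characters are two bytes each")
    # EUC-JP carries the same row/cell bytes with the high bit set,
    # so the euc_jp codec can map each pair to a Unicode character.
    return bytes(b | 0x80 for b in payload).decode("euc_jp")


text = decode_jisx0208_run(RAW)
print(text)
print([hex(ord(ch)) for ch in text])
```

Each pair begins with the byte for `'` (0x27), which selects row 7 of JIS X 0208, the row that holds Cyrillic letters. That fits the explanation in the quoted text: the sender encoded Cyrillic through a Japanese charset instead of a Cyrillic one.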
> Ideally, the requester should be able to request the type
> 'TEXT instead of the specific 'COMPOUND_TEXT or
> 'UTF8_STRING, and the requestee should return the text as
> whichever of these types can encode it:
> STRING, COMPOUND_TEXT, or UTF8_STRING (in this priority
> order).
>
> But, unfortunately, many X clients (requestees) don't behave
> like that.  If 'TEXT is requested, many return just "?????"
> even if the text can be correctly encoded by COMPOUND_TEXT
> or UTF8_STRING.

Is this a bug in that client?  Or maybe Emacs can detect that the TEXT
request failed?  Is "?????" some magic string Emacs can test for?  If
it could detect this, it could continue and try to ask for
COMPOUND_TEXT or UTF8_STRING.  This isn't the problem I'm seeing
though.

> So, it is necessary for Emacs to request a specific type,
> 'COMPOUND_TEXT ('UTF8_STRING has been recently introduced in
> XFree86, and there are many clients that still don't
> support it).

What does XFree86 recommend applications use?  UTF8_STRING with
fallback to COMPOUND_TEXT?  Or TEXT?  Unless there is some
well-agreed-on, non-controversial recommendation on how
internationalized X11 cut'n'paste should work, all attempts to get a
complete system working seem futile.

> Recently, many gtk clients have started supporting UTF8_STRING
> without making COMPOUND_TEXT support better.  It may cause
> no problem between gtk clients because they will request
> only the type UTF8_STRING.  But it's too shortsighted
> an approach. :-(

Ouch.  Some people claim GTK2 supports both UTF8_STRING and
COMPOUND_TEXT, but Galeon uses GTK2 and it obviously doesn't produce
good COMPOUND_TEXT.

> The new encoding method using "Non-Standard Character Set
> Encodings" of COMPOUND_TEXT makes the Cyrillic case much
> more complicated.  In some cases (perhaps only in a KOI8
> locale), X clients have recently started to encode Cyrillic
> characters in "ESC % / 0 ...".
> They don't consider the
> situation where the requester is running in a different
> locale. :-(

Do you mean the client sends data in a locale-specific charset via
COMPOUND_TEXT?  Ouch.

> Perhaps we should make Emacs request UTF8_STRING
> first if the locale is UTF8, and if that request fails,
> request COMPOUND_TEXT.

This sounds like a good idea to me.
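
The request order proposed above can be sketched roughly as follows. This is Python for illustration, not Emacs code, and every name in it (`request_text`, `looks_failed`, the fake selection owners) is hypothetical. Treating a reply made up only of "?" as a failure is an assumption taken from the question raised earlier in this message; nothing in the thread establishes it as a reliable signal, and it would wrongly reject a selection that really is a run of question marks.

```python
from typing import Callable, Optional, Tuple

# A fetcher takes a selection target name and returns the raw reply,
# or None if the selection owner refuses that target.
Fetcher = Callable[[str], Optional[bytes]]


def looks_failed(data: Optional[bytes]) -> bool:
    """Heuristic (assumption): no reply, empty, or only '?' placeholders."""
    return data is None or len(data) == 0 or set(data) == {ord("?")}


def request_text(fetch: Fetcher, locale_is_utf8: bool) -> Optional[Tuple[str, bytes]]:
    """Try UTF8_STRING first in a UTF-8 locale, then COMPOUND_TEXT."""
    if locale_is_utf8:
        order = ["UTF8_STRING", "COMPOUND_TEXT"]
    else:
        order = ["COMPOUND_TEXT"]
    for target in order:
        data = fetch(target)
        if not looks_failed(data):
            return target, data
    return None


# Two fake selection owners standing in for real X clients.
def utf8_only_owner(target: str) -> Optional[bytes]:
    replies = {"UTF8_STRING": "текст".encode("utf-8"), "TEXT": b"?????"}
    return replies.get(target)


def legacy_owner(target: str) -> Optional[bytes]:
    replies = {"COMPOUND_TEXT": b"plain ascii", "UTF8_STRING": b"?????"}
    return replies.get(target)


print(request_text(utf8_only_owner, locale_is_utf8=True))
print(request_text(legacy_owner, locale_is_utf8=True))
print(request_text(utf8_only_owner, locale_is_utf8=False))
```

The sketch returns the raw bytes together with the target that succeeded, so the caller can pick the matching decoder. In a non-UTF-8 locale it asks only for COMPOUND_TEXT, which mirrors the current behaviour described in the quoted text.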