unofficial mirror of emacs-devel@gnu.org 
From: Kenichi Handa <handa@m17n.org>
To: eb001@sanevision.com
Cc: emacs-pretest-bug@gnu.org, emacs-devel@gnu.org
Subject: Re: UTF-8 paste from xterm picks Chinese charset
Date: Tue, 06 Mar 2007 15:24:04 +0900
Message-ID: <E1HOT5g-0004vU-Gd@etlken.m17n.org>
In-Reply-To: <87bqjkr60c.fsf@gmail.com> (message from Martins Krikis on Fri, 23 Feb 2007 22:08:51 +0200)

Sorry for the late response on this matter.

In article <87bqjkr60c.fsf@gmail.com>, Martins Krikis <eb001@sanevision.com> writes:

> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
> I noticed that pasting them out of Emacs and into, say, xterm works.  However,
> pasting them back does not quite work---all the lowercase vowels with macrons
> get understood as Chinese characters and lose their previous looks. These are
> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
> is limited, but working with such text is still a torture.

That is because your xterm (or X library) sends them encoded
in a Chinese (or Japanese) character set when COMPOUND_TEXT is
requested by Emacs.  That in itself is not a bug, but a
misfeature.  I remember that some versions of xterm (or the X
library) used "UTF-8 extended segments" to embed Unicode
characters in COMPOUND_TEXT in such a case.  But it seems
that is no longer true in their latest versions.  :-(
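
As a rough illustration of why these particular letters can end up
tagged as Chinese, the following Python sketch checks the UTF-8 bytes
quoted in the report and shows that the same letters also exist in a
Chinese charset.  GB2312 is an assumption here (the message does not
say which charset xterm picked); it includes these vowels because they
are used as pinyin tone letters.

```python
# The five letters from the report.
letters = "āēīōū"

for ch in letters:
    utf8 = ch.encode("utf-8")    # two bytes each, matching the report
    gb = ch.encode("gb2312")     # assumption: GB2312 as the Chinese charset
    print(f"U+{ord(ch):04X}  utf-8={utf8.hex()}  gb2312={gb.hex()}")

# Collected for comparison with the values quoted in the report.
utf8_hex = [ch.encode("utf-8").hex() for ch in letters]

# True if every letter has a two-byte GB2312 encoding.
in_gb2312 = all(len(ch.encode("gb2312")) == 2 for ch in letters)
```

A sender that must pick a legacy charset for each character therefore
has a Chinese one available for all five letters, which fits the
behavior described above.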

Anyway, I've just improved the function
x-select-utf8-or-ctext to prefer UTF-8 in such a case.
Please try with the latest CVS code.

> I tried setting the coding-system for X selection to
> utf-8, but then pasting produces complete gibberish. (And
> I'd say that's a different bug!) Changing language
> environments does not seem to have any effect on either of
> these bugs (tried Latvian, English, UTF-8).

It's not a bug.  Setting selection-coding-system only
changes how selection data is decoded; it doesn't
change which data type (UTF8_STRING, COMPOUND_TEXT, or just
STRING) is requested.  The latter is controlled by the
variable x-select-request-type.  I've just added more
explanation to the documentation of selection-coding-system.
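
The difference between changing the decoder and changing the requested
data type can be sketched outside Emacs.  This is only a toy model in
Python: serve_selection is a hypothetical stand-in for the selection
owner, and GB2312 bytes stand in for a charset-tagged COMPOUND_TEXT
reply (real COMPOUND_TEXT uses ISO 2022 escape sequences, which are not
modeled here).

```python
text = "āēīōū"

def serve_selection(target: str) -> bytes:
    """Hypothetical selection owner: the bytes depend on the requested target."""
    if target == "UTF8_STRING":
        return text.encode("utf-8")
    # Stand-in for a charset-tagged COMPOUND_TEXT reply (assumption: GB2312).
    return text.encode("gb2312")

# Changing only the decoder: the non-UTF-8 bytes are still what gets requested.
garbled = serve_selection("COMPOUND_TEXT").decode("utf-8", errors="replace")

# Changing the requested target: the bytes now match the decoder.
intact = serve_selection("UTF8_STRING").decode("utf-8")

print(garbled == text, intact == text)
```

In this model, forcing a UTF-8 decoder onto the first reply yields
replacement characters, which corresponds to the gibberish in the
report, while requesting UTF8_STRING round-trips the text.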

> I've turned the utf-translate-cjk-mode off but this does not
> improve things, contrary to the very promising sounding help-text about it.
> (Not a word about it in info pages, BTW, that's another wishlist item.)

Which part makes you think so?  It doesn't affect which
data type is requested either.  Anyway, it's bad that
utf-translate-cjk-mode is not in Info.  Could someone put it
in Info?  I'm not good at writing Info.

---
Kenichi Handa
handa@m17n.org
