Re: UTF-8 paste from xterm picks Chinese charset

all messages for Emacs-related lists mirrored at yhetil.org

* Re: UTF-8 paste from xterm picks Chinese charset
       [not found] <87bqjkr60c.fsf@gmail.com>
@ 2007-03-06  6:24 ` Kenichi Handa
  0 siblings, 0 replies; 2+ messages in thread
From: Kenichi Handa @ 2007-03-06  6:24 UTC (permalink / raw)
  To: eb001; +Cc: emacs-pretest-bug, emacs-devel

Sorry for the late response on this matter.

In article <87bqjkr60c.fsf@gmail.com>, Martins Krikis <eb001@sanevision.com> writes:

> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
> I noticed that pasting them out of Emacs and into, say, xterm works.  However,
> pasting them back does not quite work---all the lowercase vowels with macrons
> get understood as Chinese characters and lose their previous looks. These are
> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
> is limited, but working with such text is still a torture.

That is because your xterm (or X library) sends them encoded
in a Chinese (or Japanese) character set when Emacs requests
COMPOUND_TEXT.  That in itself is not a bug, but a
misfeature.  I remember that some versions of xterm (or the X
library) used "UTF-8 extended segments" to embed Unicode
characters in COMPOUND_TEXT in such a case.  But it seems
that this is no longer true in their latest versions.  :-(

Anyway, I've just improved the function
x-select-utf8-or-ctext to prefer UTF-8 in such a case.
Please try with the latest CVS code.

> I tried setting the coding-system for X selection to
> utf-8, but then pasting produces complete gibberish. (And
> I'd say that's a different bug!) Changing language
> environments does not seem to have any effect on either of
> these bugs (tried Latvian, English, UTF-8).

It's not a bug.  Setting selection-coding-system only
changes how selection data is decoded; it doesn't
change which data type (UTF8_STRING, COMPOUND_TEXT, or just
STRING) is requested.  The latter is controlled by the
variable x-select-request-type.  I've just expanded the
documentation of selection-coding-system.
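
As a minimal sketch, assuming the application that owns the selection
can supply UTF8_STRING, the request type could be set in an init file
like this:

```elisp
;; Sketch for an init file such as ~/.emacs.
;; Assumption: the selection owner (e.g. xterm) can provide UTF8_STRING.
;; This changes which data type Emacs requests; it does not touch
;; selection-coding-system, which only governs decoding.
(setq x-select-request-type 'UTF8_STRING)
```

This leaves selection-coding-system untouched.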

> I've turned the utf-translate-cjk-mode off but this does not
> improve things, contrary to the very promising sounding help-text about it.
> (Not a word about it in info pages, BTW, that's another wishlist item.)

Which part makes you think so?  It also doesn't affect which
data-type to request.  Anyway, it's bad that
utf-translate-cjk-mode is not in Info.  Could someone put it
in Info?  I'm not good at writing Info.

---
Kenichi Handa
handa@m17n.org

* Re: UTF-8 paste from xterm picks Chinese charset
@ 2007-03-27 15:32 Martins Krikis
  0 siblings, 0 replies; 2+ messages in thread
From: Martins Krikis @ 2007-03-27 15:32 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-pretest-bug, eb001, emacs-devel

Sorry for responding so late---had no time to test a newer version.

Now I have done it (22.0.96.1) and must say that without additional
steps it still behaves as before (badly). If I do set the request type
explicitly, then it works fine now. A few more notes below.

>> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
>> I noticed that pasting them out of Emacs and into, say, xterm works.  However,
>> pasting them back does not quite work---all the lowercase vowels with macrons
>> get understood as Chinese characters and lose their previous looks. These are
>> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
>> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
>> is limited, but working with such text is still a torture.
>
> That is because your xterm (or X library) sends them encoded
> in a Chinese (or Japanese) character set when Emacs requests
> COMPOUND_TEXT.  That in itself is not a bug, but a
> misfeature.  I remember that some versions of xterm (or the X
> library) used "UTF-8 extended segments" to embed Unicode
> characters in COMPOUND_TEXT in such a case.  But it seems
> that this is no longer true in their latest versions.  :-(

I had no clue that there are different ways of requesting text
from an already made selection. Is there a reason that Emacs
uses COMPOUND_TEXT? Other X applications must be requesting the
text some other way, because pasting into them works.

> Anyway, I've just improved the function
> x-select-utf8-or-ctext to prefer UTF-8 in such a case.
> Please try with the latest CVS code.

I can't figure out whether I need to do something explicitly
with this function, but the paste test I made didn't seem to
work (unless I change the request type).

>> I tried setting the coding-system for X selection to
>> utf-8, but then pasting produces complete gibberish. (And
>> I'd say that's a different bug!) Changing language
>> environments does not seem to have any effect on either of
>> these bugs (tried Latvian, English, UTF-8).
>
> It's not a bug.  Setting selection-coding-system only
> changes how selection data is decoded; it doesn't
> change which data type (UTF8_STRING, COMPOUND_TEXT, or just
> STRING) is requested.  The latter is controlled by the
> variable x-select-request-type.  I've just expanded the
> documentation of selection-coding-system.

OK, so it seems I should try setting x-select-request-type explicitly...
When I set it to UTF8_STRING, pasting back into Emacs does indeed
work. So the obvious questions are: why isn't that the default?
What am I sacrificing if I set it explicitly and thus lose the
ability to request COMPOUND_TEXT?
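
One way to avoid giving up COMPOUND_TEXT entirely might be a list
value, tried in order. This is only a sketch, assuming the installed
Emacs accepts a list for this variable (C-h v x-select-request-type
should say whether it does):

```elisp
;; Sketch: prefer UTF8_STRING, but keep the other types as fallbacks.
;; Assumption: this Emacs documents a list value for the variable and
;; tries each type in the given order until one succeeds.
(setq x-select-request-type '(UTF8_STRING COMPOUND_TEXT STRING))
```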

>> I've turned the utf-translate-cjk-mode off but this does not
>> improve things, contrary to the very promising sounding help-text about it.
>> (Not a word about it in info pages, BTW, that's another wishlist item.)
>
> Which part makes you think so?  It also doesn't affect which
> data-type to request.  Anyway, it's bad that
> utf-translate-cjk-mode is not in Info.  Could someone put it
> in Info?  I'm not good at writing Info.

Well, I did not know that the conversion of Latvian characters to
Chinese happens because Emacs requests COMPOUND_TEXT for the X
selection and X tags the selection in a way that makes it more
Chinese than I could have imagined. So I read the help text for this
function literally and believed that Emacs was looking at UTF-8 text
but choosing to convert some characters to the Chinese charset they
also belong to...

My point is, without having intimate knowledge of X and Emacs internals,
it's hard to pinpoint why I can paste UTF8 text between most X apps,
but not into Emacs. The help text for this function seemed to provide
an explanation, but upon acting on it, I didn't achieve the result
I hoped for and therefore I filed a bug. 
And if this function is mentioned in info, I haven't been able to find it.
Of course, I personally don't need it anymore.

Anyway, thanks very much for your help. You have provided one way to
make Emacs behave as I expected it should (by changing the request
method) and you have made me realize that I need to seriously read
up on the various methods X uses for passing the copy/paste text
around. Unfortunately, I am too busy right now for studying this.

Best regards,

  Martins Krikis
