unofficial mirror of help-gnu-emacs@gnu.org
* getting Mule, Unicode & X selection to play together
From: Michael Livshin @ 2002-12-15  0:09 UTC



so the day had come and I decided to explore the wonderful world of
Emacs 21 and Mule, what with all the nice Debian packaging of them out
there.

so I installed Emacs 21 and mule-ucs (it seemed like a good idea, or
was it?), and I've put the following into .emacs:

(set-language-environment "Cyrillic-KOI8")      ; I want my cyrillics
(define-coding-system-alias 'mule-utf-8 'utf-8)  ; per mule-ucs README
(set-keyboard-coding-system 'utf-8) ; my keyboard generates
                                    ; Unicode-encoded cyrillic chars

now, I'm mostly interested in making X selection play well between
Emacs and several Unicode-based apps (mainly Mozilla and a couple of
GTK2-based critters).

I had to play with the locale settings, to get the X clipboard to
approach at least some sanity.  so they ended up like this
(considering that I don't want the programs to speak Russian at me and
I live in Israel):

LANG=ru_RU.UTF-8
LC_CTYPE=ru_RU.UTF-8
LC_NUMERIC=he_IL
LC_TIME=he_IL
LC_COLLATE=ru_RU.UTF-8
LC_MONETARY=he_IL
LC_MESSAGES=C
LC_PAPER=C
LC_NAME=C
LC_ADDRESS=he_IL
LC_TELEPHONE=he_IL
LC_MEASUREMENT=he_IL
LC_IDENTIFICATION=ru_RU.UTF-8
LC_ALL=
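A note on reading the list above: under POSIX rules a non-empty LC_ALL overrides everything, otherwise the specific LC_* variable applies, and LANG is only the fallback. The following is a minimal sketch of that lookup, not part of the original post; the function name `effective_locale` and the final "C" default are assumptions made for illustration.

```python
# Sketch of POSIX locale precedence: LC_ALL, then the category itself, then LANG.
def effective_locale(env, category):
    """Return the locale name that applies to `category` (e.g. "LC_CTYPE")."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # unset and empty both count as "not specified"
            return value
    return "C"  # assumed default; POSIX leaves this implementation-defined


# A subset of the settings quoted above.
settings = {
    "LANG": "ru_RU.UTF-8",
    "LC_CTYPE": "ru_RU.UTF-8",
    "LC_TIME": "he_IL",
    "LC_MESSAGES": "C",
    "LC_ALL": "",
}

for category in ("LC_CTYPE", "LC_TIME", "LC_MESSAGES", "LC_COLLATE"):
    print(category, "->", effective_locale(settings, category))
```

With LC_ALL left empty, each category keeps its own value, which is why messages can stay in English (LC_MESSAGES=C) while character handling is Russian UTF-8.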

if I select a chunk of cyrillic text in Emacs and paste it into
Mozilla, all is well.

now, if I select a chunk of cyrillic text in Mozilla and paste it into
Emacs, I do indeed get the same-looking text.  however, the char codes
are different from whatever Emacs itself chooses for the same entities
if I type them into it (which is just weird, but no biggie), and (as a
consequence, probably) the pasted text is shown in a different font
(which is butt ugly).

so basically I'd like Emacs to somehow recognize the cyrillic
characters in the X selection it receives, and to convert them into
the codes it itself uses for the same characters.  how do I do that?

-- 
There are few personal problems which can't be solved by the suitable
application of high explosives.


* Re: getting Mule, Unicode & X selection to play together
From: Eli Zaretskii @ 2002-12-15  6:14 UTC



On Sun, 15 Dec 2002, Michael Livshin wrote:

> so the day had come and I decided to explore the wonderful world of
> Emacs 21 and Mule, what with all the nice Debian packaging of them out
> there.
> 
> so I installed Emacs 21 and mule-ucs (it seemed like a good idea, or
> was it?)

It's not necessarily a good idea to bring Mule-UCS into this equation.  In 
any case, you will be much better off using the CVS version of Emacs 
where several related bugs were fixed lately.

> now, if I select a chunk of cyrillic text in Mozilla and paste it into
> Emacs, I do indeed get the same-looking text.  however, the char codes
> are different from whatever Emacs itself chooses for the same entities
> if I type them into it (which is just weird, but no biggie), and (as a
> consequence, probably) the pasted text is shown in a different font
> (which is butt ugly).

I suspect that Emacs converts the pasted text into Unicode codepoints, 
and that your Unicode font is ugly.  What does "C-u C-x =" tell you about 
the cyrillic characters you paste this way?

> so basically I'd like Emacs to somehow recognize the cyrillic
> characters in the X selection it receives, and to convert them into
> the codes it itself uses for the same characters.

The problem is, Emacs 21 uses two different sets of codepoints for 
Cyrillic characters: one based on ISO-8859-5, the other based on Unicode.  
Conversion between them is not supported in stock Emacs distributions; 
AFAIK you need either add-on packages (such as ucs-tables, which you can 
find on gnu.emacs.sources) or the latest development code from CVS.
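To make the "same character, different codes" point concrete, here is a small sketch (not from the original message). It uses Python's standard codecs as stand-ins for the character sets under discussion; it says nothing about how Emacs 21 stores characters internally, and the variable names are illustrative only.

```python
# Illustration only: Python codecs stand in for the charsets discussed here.
letter = "\u0410"  # CYRILLIC CAPITAL LETTER A

unicode_code_point = ord(letter)                 # number assigned by Unicode
iso8859_5_byte = letter.encode("iso8859_5")[0]   # byte in ISO-8859-5
koi8_r_byte = letter.encode("koi8_r")[0]         # byte in KOI8-R
utf8_bytes = letter.encode("utf-8")              # bytes on a UTF-8 wire

print(f"Unicode code point: U+{unicode_code_point:04X}")
print(f"ISO-8859-5 byte:    0x{iso8859_5_byte:02X}")
print(f"KOI8-R byte:        0x{koi8_r_byte:02X}")
print(f"UTF-8 bytes:        {utf8_bytes.hex()}")
```

A program that keeps text from these sources in separate character sets, without a conversion table between them, will treat the "same" letter as distinct characters and may pick a different font for each.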


* Re: getting Mule, Unicode & X selection to play together
From: Tatsuya Kinoshita @ 2002-12-15 21:33 UTC


On December 15, 2002 at 2:09AM +0200,
Michael Livshin <usenet@cmm.kakpryg.net> wrote:

> so the day had come and I decided to explore the wonderful world of
> Emacs 21 and Mule, what with all the nice Debian packaging of them out
> there.
> 
> so I installed Emacs 21 and mule-ucs

> now, if I select a chunk of cyrillic text in Mozilla and paste it into
> Emacs, I do indeed get the same-looking text.  however, the char codes
> are different from whatever Emacs itself chooses for the same entities
> if I type them into it (which is just weird, but no biggie), and (as a
> consequence, probably) the pasted text is shown in a different font
> (which is butt ugly).

Did you install the xfonts-base-transcoded package?

In Debian, fonts in several ISO 8859 encodings, transcoded from
ISO 10646-1, are provided by the xfonts-*-transcoded packages.
See also `apt-cache show xfonts-base-transcoded' and
`apt-cache search transcoded'.

-- 
Tatsuya Kinoshita


* Re: getting Mule, Unicode & X selection to play together
From: Roman Belenov @ 2002-12-16  9:30 UTC


Michael Livshin <usenet@cmm.kakpryg.net> writes:

> so basically I'd like Emacs to somehow recognize the cyrillic
> characters in the X selection it receives, and to convert them into
> the codes it itself uses for the same characters.  how do I do that?

I've patched utf-8.el (located in lisp/international) to make it use
the cyrillic-iso8859-5 character set (the patch is against the file
from GNU Emacs 21.2); I guess it should solve your problem.

The CVS version already has support for it (although I never tried it,
I just looked through the sources).


==============================================================================
--- utf-8.el.orig	2002-12-16 12:06:57.000000000 +0300
+++ utf-8.el	2002-09-04 17:39:50.000000000 +0400
@@ -116,14 +116,18 @@
 		      ((r0 = ,(charset-id 'latin-iso8859-1))
 		       (r1 -= 128)
 		       (write-multibyte-character r0 r1))
-
+            ((r2 = (r1 <= #x045f))
+             (if ((r1 >= #x0400) & r2)
+                 ((r0 = ,(charset-id 'cyrillic-iso8859-5))
+                  (r1 -= #x03e0)
+                  (write-multibyte-character r0 r1))
 		    ;; mule-unicode-0100-24ff (< 0800)
 		    ((r0 = ,(charset-id 'mule-unicode-0100-24ff))
 		     (r1 -= #x0100)
 		     (r2 = (((r1 / 96) + 32) << 7))
 		     (r1 %= 96)
 		     (r1 += (r2 + 32))
-		     (write-multibyte-character r0 r1)))))))
+            (write-multibyte-character r0 r1)))))))))
 
 	  ;; 3byte encoding
 	  ;; zzzzyyyyyyxxxxxx = 1110zzzz 10yyyyyy 10xxxxxx
@@ -246,6 +250,12 @@
 	     (r1 &= #x3f)
 	     (r1 |= #x80)
 	     (write r0 r1))
+      (if (r0 == ,(charset-id 'cyrillic-iso8859-5))
+          ((r0 = (((r1 - #x20) >> 6) | #xd0))
+           (r1 -= #x20)
+           (r1 &= #x3f)
+           (r1 |= #x80)
+           (write r0 r1))
 
 	  (if (r0 == ,(charset-id 'mule-unicode-0100-24ff))
 	      ((r0 = ((((r1 & #x3f80) >> 7) - 32) * 96))
@@ -327,7 +337,7 @@
 		    ;; Output U+FFFD, which is `ef bf bd' in UTF-8.
 		    ((write #xef)
 		     (write #xbf)
-		     (write #xbd)))))))))
+            (write #xbd))))))))))
       (repeat)))
     (if (r1 >= #xa0)
 	(write r1)
@@ -348,6 +358,7 @@
    eight-bit-control
    eight-bit-graphic
    latin-iso8859-1
+   cyrillic-iso8859-5
    mule-unicode-0100-24ff
    mule-unicode-2500-33ff
    mule-unicode-e000-ffff
@@ -367,6 +378,7 @@
     eight-bit-control
     eight-bit-graphic
     latin-iso8859-1
+    cyrillic-iso8859-5
     mule-unicode-0100-24ff
     mule-unicode-2500-33ff
     mule-unicode-e000-ffff)
==============================================================================
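For readers who do not read CCL, the arithmetic in the first two hunks can be sketched in Python. This is one reading of the patch, not a translation of it: the function names are made up, and it assumes that a cyrillic-iso8859-5 position code is the ISO-8859-5 byte with the high bit cleared.

```python
CYRILLIC_OFFSET = 0x03E0  # the constant subtracted in the decoding hunk


def decode_position(code_point):
    """U+0400..U+045F -> 7-bit position code (first hunk)."""
    if not 0x0400 <= code_point <= 0x045F:
        raise ValueError("outside the range the patch handles")
    return code_point - CYRILLIC_OFFSET


def encode_utf8(position):
    """7-bit position code -> two UTF-8 bytes (second hunk)."""
    offset = position - 0x20              # distance from U+0400
    lead = (offset >> 6) | 0xD0           # 0xD0 or 0xD1
    trail = (offset & 0x3F) | 0x80        # continuation byte
    return bytes([lead, trail])


def mismatches():
    """Code points where the fixed offset disagrees with Python's iso8859_5 codec."""
    bad = []
    for cp in range(0x0400, 0x0460):
        byte = decode_position(cp) | 0x80  # assumption: byte = position code + high bit
        if bytes([byte]).decode("iso8859_5") != chr(cp):
            bad.append(cp)
    return bad


print(hex(decode_position(0x0410)))       # position code for U+0410
print(encode_utf8(0x30).hex())            # UTF-8 bytes for that position code
print([f"U+{cp:04X}" for cp in mismatches()])
```

The two functions round-trip for the whole range. Under the stated assumption, the fixed offset matches ISO-8859-5 for all Russian letters, but four code points in the range (U+0400, U+040D, U+0450, U+045D) have no counterpart at the computed position, since ISO-8859-5 holds other characters there.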

-- 
 							With regards, Roman.


* Re: getting Mule, Unicode & X selection to play together
From: Tatsuya Kinoshita @ 2002-12-16 11:21 UTC


On December 15, 2002 at 2:09AM +0200,
Michael Livshin <usenet@cmm.kakpryg.net> wrote:

> LANG=ru_RU.UTF-8
> LC_CTYPE=ru_RU.UTF-8

> now, if I select a chunk of cyrillic text in Mozilla and paste it into
> Emacs, I do indeed get the same-looking text.  however, the char codes
> are different from whatever Emacs itself chooses for the same entities
> if I type them into it (which is just weird, but no biggie), and (as a
> consequence, probably) the pasted text is shown in a different font
> (which is butt ugly).

How about `/usr/bin/env LC_ALL=ru_RU.ISO-8859-5 /usr/bin/mozilla'?
(The UTF-8 bug might exist in Mozilla or in Xlib...)

-- 
Tatsuya Kinoshita


* Re: getting Mule, Unicode & X selection to play together
From: Michael Livshin @ 2002-12-17 23:02 UTC


Eli Zaretskii <eliz@is.elta.co.il> writes:

> The problem is, Emacs 21 uses two different sets of codepoints for 
> Cyrillic characters: one based on ISO-8859-5, the other based on Unicode.  
> Conversion between them is not supported in stock Emacs distributions; 
> AFAIK you need either add-on packages (such as ucs-tables, which you can 
> find on gnu.emacs.sources) or the latest development code from CVS.

thanks!  I installed the CVS version and got everything to work.

to possible future sufferers: the _key_ thing about getting anything
MULE-related to work seems to be /letting go/.  by any means, don't
try logic!  you'll spend hours in pain, you'll pull half your hair
out, and it just won't work for you.

my satori found me the minute I happened upon the following, in
fontset.el:

(defvar x-font-name-charset-alist
  '(...
    ("koi8" ascii cyrillic-iso8859-5)
    ...))

after seeing the above, it was a matter of slapping self on whatever
passes for forehead, setting the global locale to ru_RU.KOI8-R,
setting the Emacs language environment to "Cyrillic-ISO", not
forgetting to explicitly map the "cyrillic-iso8859-5" encoding to an
iso8859-5 font in the fontset (no sir, Emacs *won't* grok it by
itself, how could it?), and voila!

(OK, so as a consequence of setting the global locale to something
 un-Unicodelly, I won't be able to cut-n-paste in more than 2
 languages at the same time.  no biggie, at least cyrillics are
 working.)
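The trade-off in that parenthetical can be illustrated loosely in Python (this sketch is not from the original post). It only checks which text a given single-byte encoding can represent; real X selection transfer involves more than this, and the helper name `can_encode` is made up.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


russian_and_english = "привет, hello"
hebrew = "שלום"

print(can_encode(russian_and_english, "koi8_r"))        # Cyrillic + ASCII
print(can_encode(hebrew, "koi8_r"))                     # Hebrew is not in KOI8-R
print(can_encode(russian_and_english + " " + hebrew, "utf-8"))
```

A KOI8-R locale covers ASCII plus Cyrillic and nothing else, so anything outside those two scripts has nowhere to go, whereas a UTF-8 locale can carry all three at once.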

bitter unfunny sarcasm aside, Emacs 21.3.50 seems to be a *really*
nice piece of work, so far.

thanks all,
--m

-- 
This program posts news to billions of machines throughout the galaxy.  Your
message will cost the net enough to bankrupt your entire planet.  As a result
your species will be sold into slavery.  Be sure you know what you are doing.
Are you absolutely sure you want to do this? [yn] y


* Re: getting Mule, Unicode & X selection to play together
From: Eli Zaretskii @ 2002-12-18  5:47 UTC



On Wed, 18 Dec 2002, Michael Livshin wrote:

> to possible future sufferers: the _key_ thing about getting anything
> MULE-related to work seems to be /letting go/.  by any means, don't
> try logic!  you'll spend hours in pain, you'll pull half your hair
> out, and it just won't work for you.

OTOH, you _can_ try logic after you read the sources ;-)

> not
> forgetting to explicitly map the "cyrillic-iso8859-5" encoding to an
> iso8859-5 font in the fontset (no sir, Emacs *won't* grok it by
> itself, how could it?)

I'm surprised this is so, but if it is, it sounds like a bug, so please 
report it (with a precise test case to reproduce) to 
emacs-pretest-bug@gnu.org.  Thanks.

Btw, cyrillic-iso8859-5 is not an encoding, it's a character set.  But 
that's nitpicking.


* Re: getting Mule, Unicode & X selection to play together
From: Michael Livshin @ 2002-12-18 10:44 UTC


Eli Zaretskii <eliz@is.elta.co.il> writes:

>> not forgetting to explicitly map the "cyrillic-iso8859-5" encoding
>> to an iso8859-5 font in the fontset (no sir, Emacs *won't* grok it
>> by itself, how could it?)
>
> I'm surprised this is so, but if it is, it sounds like a bug, so please 
> report it (with a precise test case to reproduce) to 
> emacs-pretest-bug@gnu.org.  Thanks.

well, let me clarify.  Emacs *did* find an appropriately-encoded font,
the problem was that it took the iso8859-5 font from the "standard"
fontset and not from my fontset, even though there surely *is* a
matching font in my fontset (and it works great once I map
"cyrillic-iso8859-5" to it explicitly).

perhaps it's a kind of feature?

it would be nice to simply be able to tell Emacs to forget the
standard fontset altogether, or at least to completely ignore it.

> Btw, cyrillic-iso8859-5 is not an encoding, it's a character set.
> But that's nitpicking.

it's kind of subtle, so it's certainly worth pointing out.  thank you.

-- 
Incrementally extended heuristic algorithms tend inexorably toward the
incomprehensible.


* Re: getting Mule, Unicode & X selection to play together
From: Eli Zaretskii @ 2002-12-18 10:56 UTC



On Wed, 18 Dec 2002, Michael Livshin wrote:

> well, let me clarify.  Emacs *did* find an appropriately-encoded font,
> the problem was that it took the iso8859-5 font from the "standard"
> fontset and not from my fontset, even though there surely *is* a
> matching font in my fontset (and it works great once I map
> "cyrillic-iso8859-5" to it explicitly).
> 
> perhaps it's a kind of feature?

I still think it's worth reporting as a possible misfeature or bug.  (I 
myself don't know enough about fontsets to give an opinion.)
