all messages for Emacs-related lists mirrored at yhetil.org
* UTF-8 character question
@ 2008-05-12  6:49 horatio
  2008-05-12  7:14 ` Harald Hanche-Olsen
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: horatio @ 2008-05-12  6:49 UTC (permalink / raw)
  To: help-gnu-emacs

I downloaded Emacs 22.2.1 for Windows, and I was pleased to find that
Chinese characters work "out of the box" on my computer.  However, I
have a weird visualization problem for some characters.  One example
is 你你.  These two characters appear the same in Firefox, in Notepad,
in the file system (i.e. Explorer), and in various other places.
However, in Emacs, the character on the left appears as an empty
square, but the character on the right shows up as the Chinese
character for "you".  Is there some way to make Emacs correctly
display both versions of this character?



* Re: UTF-8 character question
  2008-05-12  6:49 UTF-8 character question horatio
@ 2008-05-12  7:14 ` Harald Hanche-Olsen
  2008-05-12  7:51   ` horatio
  2008-05-12  7:17 ` Harald Hanche-Olsen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-05-12  7:14 UTC (permalink / raw)
  To: help-gnu-emacs

+ horatio@gmail.com:

> I downloaded Emacs 22.2.1 for Windows, and I was pleased to find that
> Chinese characters work "out of the box" on my computer.  However, I
> have a weird visualization problem for some characters.  One example
> is 你你.  These two characters appear the same in Firefox, in Notepad,
> in the file system (i.e. Explorer), and in various other places.
> However, in Emacs, the character on the left appears as an empty
> square, but the character on the right shows up as the Chinese
> character for "you".

I am confused. They not only /look/ the same, they /are/ the same
character (U+4F60). Maybe your news posting software knows what emacs
doesn't, and has changed one of those so they are equal?

I'm afraid you will have to describe the difference between the two
characters somehow.
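
One way to describe the difference between two look-alike characters is
to list their code points outside Emacs.  The following is only a
minimal sketch in Python (not something used in this thread); the helper
name describe is made up for illustration.  It prints the code point,
the Unicode name and the UTF-8 bytes of each character in a string:

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME (utf-8: ..)' line per character in text."""
    lines = []
    for ch in text:
        # Fall back to a placeholder for characters without a Unicode name.
        name = unicodedata.name(ch, "<unnamed>")
        utf8 = ch.encode("utf-8").hex(" ")
        lines.append(f"U+{ord(ch):04X} {name} (utf-8: {utf8})")
    return lines


if __name__ == "__main__":
    # Paste the two characters in question between the quotes.
    for line in describe("你你"):
        print(line)
```

If both lines come out identical, the two characters really are the
same code point and the difference must lie in how they are displayed.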

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell



* Re: UTF-8 character question
  2008-05-12  6:49 UTF-8 character question horatio
  2008-05-12  7:14 ` Harald Hanche-Olsen
@ 2008-05-12  7:17 ` Harald Hanche-Olsen
  2008-05-12  9:21 ` Jason Rumney
  2008-05-12 15:42 ` Peter Dyballa
  3 siblings, 0 replies; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-05-12  7:17 UTC (permalink / raw)
  To: help-gnu-emacs

I should have made the following addition to my previous post:

+ horatio@gmail.com:

> However, in Emacs, the character on the left appears as an empty
> square, but

The empty box is emacs' way of displaying a character it doesn't know
how to display, meaning it is not present in the current fontset. This
doesn't tell you how to solve the problem of course, but it tells you
something about where to look for a solution.

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell



* Re: UTF-8 character question
  2008-05-12  7:14 ` Harald Hanche-Olsen
@ 2008-05-12  7:51   ` horatio
  2008-05-12  8:07     ` horatio
  0 siblings, 1 reply; 10+ messages in thread
From: horatio @ 2008-05-12  7:51 UTC (permalink / raw)
  To: help-gnu-emacs

On May 12, 12:14 am, Harald Hanche-Olsen <han...@math.ntnu.no> wrote:
> + hora...@gmail.com:
>
> > I downloaded Emacs 22.2.1 for Windows, and I was pleased to find that
> > Chinese characters work "out of the box" on my computer.  However, I
> > have a weird visualization problem for some characters.  One example
> > is 你你.  These two characters appear the same in Firefox, in Notepad,
> > in the file system (i.e. Explorer), and in various other places.
> > However, in Emacs, the character on the left appears as an empty
> > square, but the character on the right shows up as the Chinese
> > character for "you".
>
> I am confused. They not only /look/ the same, they /are/ the same
> character (U+4F60). Maybe your news posting software knows what emacs
> doesn't, and has changed one of those so they are equal?
>
> I'm afraid you will have to describe the difference between the two
> characters somehow.

I used Firefox to post, and yes, it replaced one of the characters for
me.  I don't know how to figure out what the encoding is for the
character Emacs is correctly displaying, but the character U+4F60 does
not display correctly in my version of Emacs.  Instead, it shows up as
the empty square.  There's another version of the same character that
does show up correctly in Emacs, but unfortunately it's not the one
used elsewhere.



* Re: UTF-8 character question
  2008-05-12  7:51   ` horatio
@ 2008-05-12  8:07     ` horatio
  2008-05-12  8:16       ` Harald Hanche-Olsen
  0 siblings, 1 reply; 10+ messages in thread
From: horatio @ 2008-05-12  8:07 UTC (permalink / raw)
  To: help-gnu-emacs

On May 12, 12:51 am, hora...@gmail.com wrote:
> On May 12, 12:14 am, Harald Hanche-Olsen <han...@math.ntnu.no> wrote:
>
>
>
> > + hora...@gmail.com:
>
> > > I downloaded Emacs 22.2.1 for Windows, and I was pleased to find that
> > > Chinese characters work "out of the box" on my computer.  However, I
> > > have a weird visualization problem for some characters.  One example
> > > is 你你.  These two characters appear the same in Firefox, in Notepad,
> > > in the file system (i.e. Explorer), and in various other places.
> > > However, in Emacs, the character on the left appears as an empty
> > > square, but the character on the right shows up as the Chinese
> > > character for "you".
>
> > I am confused. They not only /look/ the same, they /are/ the same
> > character (U+4F60). Maybe your news posting software knows what emacs
> > doesn't, and has changed one of those so they are equal?
>
> > I'm afraid you will have to describe the difference between the two
> > characters somehow.
>
> I used Firefox to post, and yes, it replaced one of the characters for
> me.  I don't know how to figure out what the encoding is for the
> character Emacs is correctly displaying, but the character U+4F60 does
> not display correctly in my version of Emacs.  Instead, it shows up as
> the empty square.  There's another version of the same character that
> does show up correctly in Emacs, but unfortunately it's not the one
> used elsewhere.

Fascinating.  I just found something else out.  When I save the file,
and then reload it, the character that was successfully displayed
earlier is now displayed as an empty box.  Maybe there is only one 你
character, and sometimes Emacs can show it, and sometimes it can't.

Furthermore, when I use Options->Mule->List Character Sets, some of
the supported character sets are entirely empty boxes.  The strange
thing about that is there are definitely some characters that it shows
fine, with none of these issues.  It's pretty strange that for some
characters, it can show the Chinese characters, and for others it
can't.

My guess is there's some basic option or package that I'm missing that
will make the problem go away.  Can you (or anyone else) copy and
paste that character into an Emacs buffer?  If it works, can you think
of anything in your setup that I might not have done?  I'll take a
look myself in the meantime.

Thanks for the help.




* Re: UTF-8 character question
  2008-05-12  8:07     ` horatio
@ 2008-05-12  8:16       ` Harald Hanche-Olsen
  2008-05-12  8:35         ` David Kastrup
  0 siblings, 1 reply; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-05-12  8:16 UTC (permalink / raw)
  To: help-gnu-emacs

+ horatio@gmail.com:

> My guess is there's some basic option or package that I'm missing that
> will make the problem go away.  Can you (or anyone else) copy and
> paste that character into an Emacs buffer?  If it works, can you think
> of anything in your setup that I might not have done?  I'll take a
> look myself in the meantime.

I can copy and paste it just fine.  However, you said you're running
emacs 22 on windows, right? I am running various versions of emacs 23
(the development version) on unix, so I very much doubt that you can
learn anything useful from my setup. I don't do anything out of the
ordinary with font setup anyway (other than using the Vera Sans Mono
font, which will affect only the latin characters). I think some other
users of emacs on windows will have to step in.

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell



* Re: UTF-8 character question
  2008-05-12  8:16       ` Harald Hanche-Olsen
@ 2008-05-12  8:35         ` David Kastrup
  2008-05-12  9:37           ` horatio
  0 siblings, 1 reply; 10+ messages in thread
From: David Kastrup @ 2008-05-12  8:35 UTC (permalink / raw)
  To: help-gnu-emacs

Harald Hanche-Olsen <hanche@math.ntnu.no> writes:

> + horatio@gmail.com:
>
>> My guess is there's some basic option or package that I'm missing
>> that will make the problem go away.  Can you (or anyone else) copy
>> and paste that character into an Emacs buffer?  If it works, can you
>> think of anything in your setup that I might not have done?  I'll
>> take a look myself in the meantime.
>
> I can copy and paste it just fine.  However, you said you're running
> emacs 22 on windows, right? I am running various versions of emacs 23
> (the development version) on unix, so I very much doubt that you can
> learn anything useful from my setup. I don't do anything out of the
> ordinary with font setup anyway (other than using the Vera Sans Mono
> font, which will affect only the latin characters). I think some other
> users of emacs on windows will have to step in.

If he is using Chinese or other CJK stuff a lot, he might want to bite
the bullet and switch to Emacs 23.

Almost all Emacs implementations that are around use MULE as an internal
encoding.  Emacs >= 23, and XEmacs starting from some rather unstable
21.5 version, use UTF-8 as an internal encoding.

The "problem" with MULE is that it represents characters as a
charset/character pair, and characters from different charsets are
basically different.  But character sets are coupled with encodings, and
so some characters exist in quite a number of charsets (like the basic
accented letters).  This necessitated functions for "charset
unification" which do a better or worse job depending on what they are
working with, and how much code has been written for the charsets.

Now Emacs 23 loses this information and keeps around only the Unicode
codepoint.  That means that you can't represent as much information as
previously, but usually the information you lose is that which you would
want to have disregarded, anyway.
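
The charset/character pair idea can be illustrated with a toy model.
This is only a sketch in Python and not how Emacs actually stores
characters: the class CharsetChar and the function unify are made-up
names, and Python's gb2312 and big5 codecs merely stand in for MULE
charsets.  It shows the same character being two different pairs until
both are mapped to one Unicode code point:

```python
from typing import NamedTuple


class CharsetChar(NamedTuple):
    """Toy stand-in for a charset/character pair (hypothetical type)."""
    charset: str  # a Python codec name standing in for a MULE charset
    code: bytes   # the character's code within that charset


def unify(ch: CharsetChar) -> int:
    """Map a (charset, code) pair to a single Unicode code point."""
    return ord(ch.code.decode(ch.charset))


# The same character as found in a simplified and a traditional charset.
simplified = CharsetChar("gb2312", "你".encode("gb2312"))
traditional = CharsetChar("big5", "你".encode("big5"))

if __name__ == "__main__":
    # As pairs the two are different characters ...
    print(simplified == traditional)
    # ... but both unify to the same Unicode code point.
    print(hex(unify(simplified)), hex(unify(traditional)))
```

Keeping only the result of unify corresponds to what is described above
for Emacs 23: the originating charset is lost, and what remains is the
code point.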

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum



* Re: UTF-8 character question
  2008-05-12  6:49 UTF-8 character question horatio
  2008-05-12  7:14 ` Harald Hanche-Olsen
  2008-05-12  7:17 ` Harald Hanche-Olsen
@ 2008-05-12  9:21 ` Jason Rumney
  2008-05-12 15:42 ` Peter Dyballa
  3 siblings, 0 replies; 10+ messages in thread
From: Jason Rumney @ 2008-05-12  9:21 UTC (permalink / raw)
  To: help-gnu-emacs

On May 12, 7:49 am, hora...@gmail.com wrote:
> I downloaded Emacs 22.2.1 for Windows, and I was pleased to find that
> Chinese characters work "out of the box" on my computer.  However, I
> have a weird visualization problem for some characters.  One example
> is 你你.  These two characters appear the same in Firefox, in Notepad,
> in the file system (i.e. Explorer), and in various other places.
> However, in Emacs, the character on the left appears as an empty
> square, but the character on the right shows up as the Chinese
> character for "you".  Is there some way to make Emacs correctly
> display both versions of this character?

What is your default language set to in Windows?  If it is not
Chinese, then Emacs might be making the wrong decision about how to
interpret 你 when it is encoded as Unicode, and is looking for a
Japanese or Korean font which you don't have.



* Re: UTF-8 character question
  2008-05-12  8:35         ` David Kastrup
@ 2008-05-12  9:37           ` horatio
  0 siblings, 0 replies; 10+ messages in thread
From: horatio @ 2008-05-12  9:37 UTC (permalink / raw)
  To: help-gnu-emacs

On May 12, 1:35 am, David Kastrup <d...@gnu.org> wrote:
> Harald Hanche-Olsen <han...@math.ntnu.no> writes:
> > + hora...@gmail.com:
>
> >> My guess is there's some basic option or package that I'm missing
> >> that will make the problem go away.  Can you (or anyone else) copy
> >> and paste that character into an Emacs buffer?  If it works, can you
> >> think of anything in your setup that I might not have done?  I'll
> >> take a look myself in the meantime.
>
> > I can copy and paste it just fine.  However, you said you're running
> > emacs 22 on windows, right? I am running various versions of emacs 23
> > (the development version) on unix, so I very much doubt that you can
> > learn anything useful from my setup. I don't do anything out of the
> > ordinary with font setup anyway (other than using the Vera Sans Mono
> > font, which will affect only the latin characters). I think some other
> > users of emacs on windows will have to step in.
>
> If he is using Chinese or other CJK stuff a lot, he might want to bite
> the bullet and switch to Emacs 23.
>
> Almost all Emacs implementations that are around use MULE as an internal
> encoding.  Emacs >= 23, and XEmacs starting from some rather unstable
> 21.5 version, use UTF-8 as an internal encoding.
>
> The "problem" with MULE is that it represents characters as a
> charset/character pair, and characters from different charsets are
> basically different.  But character sets are coupled with encodings, and
> so some characters exist in quite a number of charsets (like the basic
> accented letters).  This necessitated functions for "charset
> unification" which do a better or worse job depending on what they are
> working with, and how much code has been written for the charsets.
>
> Now Emacs 23 loses this information and keeps around only the Unicode
> codepoint.  That means that you can't represent as much information as
> previously, but usually the information you lose is that which you would
> want to have disregarded, anyway.
>
> --
> David Kastrup, Kriemhildstr. 15, 44793 Bochum

Hey, thanks for the suggestion.  I found a zip online built from the
cvs just a week ago (on the emacsw32 site), and that seems to fix my
unicode problems.

Thanks to everyone for the help.  It seems that the best way to work
with Chinese in emacs is to track down a recent build of Emacs 23.

John



* Re: UTF-8 character question
  2008-05-12  6:49 UTF-8 character question horatio
                   ` (2 preceding siblings ...)
  2008-05-12  9:21 ` Jason Rumney
@ 2008-05-12 15:42 ` Peter Dyballa
  3 siblings, 0 replies; 10+ messages in thread
From: Peter Dyballa @ 2008-05-12 15:42 UTC (permalink / raw)
  To: horatio; +Cc: help-gnu-emacs


On 12.05.2008 at 08:49, horatio wrote:

> Is there some way to make Emacs correctly
> display both versions of this character?


You can run C-u C-x = on each of the characters/boxes to see what
they actually are.

There might be another problem with the way MS Windows encodes the
snippet.  Maybe you need to tell Emacs which selection-coding-system
to use ...
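
Why the coding system of a pasted snippet matters can be shown with a
small sketch in Python.  It assumes the snippet arrives as UTF-16LE
bytes, which is what the Windows clipboard uses for Unicode text;
whether that is what happened in this case is not known.  The function
decode_clipboard is a made-up name, not an Emacs or Windows API:

```python
def decode_clipboard(raw: bytes, coding: str) -> str:
    """Decode raw clipboard bytes with the given coding system.

    Hypothetical helper; undecodable bytes become replacement characters.
    """
    return raw.decode(coding, errors="replace")


# Assumption: the snippet is handed over as UTF-16LE bytes.
raw = "你".encode("utf-16-le")

right = decode_clipboard(raw, "utf-16-le")  # matching coding system
wrong = decode_clipboard(raw, "latin-1")    # mismatched coding system

if __name__ == "__main__":
    print(repr(right), repr(wrong))
```

The bytes are the same in both calls; only the coding system used to
interpret them decides whether the original character comes back.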

--
Greetings

   Pete

A morning without coffee is like something without something else.







